
SAS Programming Basics

The document discusses an introductory SAS programming basics seminar. It covers navigating the SAS interface, creating and modifying datasets using data steps and proc steps, and manipulating data using operators and functions. Common SAS options and syntax errors are also examined.

Uploaded by

Junaid Faruqui
Copyright
© All Rights Reserved
  • SAS Programming Basics
  • Diagnosing and Correcting Syntax Errors
  • Manipulating Datasets
  • Data Step vs. Proc Step
  • Advanced Data Techniques
  • SAS Functions
  • Modifying SAS Output
  • Wrapping things up


SAS Programming Basics
SAS is a powerful and flexible statistical package that runs on many platforms, […] students in the class will have hands-on experience using SAS for data manipulation, including the use of arithmetic operators, conditional processing, SAS built-in functions, merging, appending, […]:

Comfortably navigate the SAS window environment
Subset and create new datasets
Create new variables
Write and debug basic SAS programs
Use SAS functions for basic data management tasks
Merge and append data
Modify SAS output for presentation

Please note that since we are using data files provided by SAS, […], this seminar page includes output from the SAS procedures used in the seminar.
For clarity, all SAS keywords will be in CAPITAL letters in order to distinguish them from the information that you as the user will provide.
Note: This seminar was developed in SAS 9.4.

1.0 SAS Refresher
1.1 Libname
We will start by setting our libname, which assigns a library reference (libref) to the directory where our SAS data files are stored.

*assign libname;
LIBNAME idre 'C:\';
SAS also allows you to clear a particular libname or use the _ALL_ keyword to clear all assigned libnames.

*clear libname;
LIBNAME idre CLEAR;
LIBNAME _ALL_ CLEAR;
* reassign library;
LIBNAME idre 'C:\';
1.1 SAS Windowing Environment
Let's […], Results, Program Editor, Log, and Output/Results […], when you start SAS, the windows that initially appear are the Log, […] under the View menu in the toolbar.
The SAS Explorer window allows you to manage files associated with your current SAS session, including viewing, deleting, moving, […] Editor window, which is literally just a text editor, permits you to enter, edit, […] information about their current session, including messages about submitted SAS programs such as successful execution, […] procedures. In SAS 9.4, the default output format is HTML.
1.2 Creating New SAS Datasets
As we will be using several different datasets in the seminar today, let's also cover how to create new permanent and temporary datasets from the data files you have been provided.

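*permanent dataset (two-level name, libref.dataset, saved in the library folder);
DATA [Link];
SET [Link];
RUN;
*temporary dataset (one-level name, stored in the WORK library and deleted when the session ends);
DATA new;
SET [Link];
RUN;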

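1.3 SAS Options
SAS includes a large suite of system options that will affect your SAS session. Specific options are invoked […] vary depending on what computing environment you are using (e.g., Windows, Unix). The OPTIONS procedure lists the current settings of SAS system options in the SAS log.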

PROC OPTIONS;
RUN;
SAS includes two types of options: […] different depending on which operating system you are using.
Below are some examples of common options and what they are responsible for doing.
[…] below, […], you will see that in the Log (shown below), SAS issues a warning that it assumed […], in the second example, where the automatic correction option is turned off, SAS issues an error and stops executing the procedure.

*autocorrect option (DATE= below is deliberately misspelled, the correct keyword is DATA=);
OPTIONS AUTOCORRECT; /*default*/
PROC FREQ DATE=[Link];
TABLE code;
RUN;
OPTIONS NOAUTOCORRECT;
PROC FREQ DATE=[Link];
TABLE code;
RUN;

[…], the default is for SAS to error […], the default option is invoked and, as you can see below, SAS issues an error message stating that the […], in the second example, where we tell SAS not to issue an error (NOFMTERR), SAS ignores the incorrectly used format and then executes the command without the format.

*format error (the $code. format has not been defined, so SAS cannot find it);
OPTIONS FMTERR;/*default*/
PROC PRINT DATA=[Link];
FORMAT code $code.;
RUN;
OPTIONS NOFMTERR;
PROC PRINT DATA=[Link];
FORMAT code $code.;
RUN;


2.0 Diagnosing and Correcting Syntax Errors
A main issue with learning a new programming language is the ability to identify […] of syntax errors.
2.1 Color-Coded Syntax
[…] diagnose syntax errors, […], CLASS, MODEL are […], the keyword will often remain black like the variable names because SAS does not […], the way to indicate a format is to put a period at the end and, once you do this, […], you will see that we are missing an end quote; thus all of […].

2.2 Log File
[…]:

In the syntax shown, […] "average" and "min" options to our statement to […], options should be colored in blue, and in this example "average" remains black, indicating that SAS is not recognizing it as a keyword.
[…] "average" was not […], in this instance, […], you will see that "mean" […] "average" with "mean", the procedure will execute as expected.

3.0 Data Step vs. Proc Step
SAS programs are made up of two distinct kinds of steps: data steps and proc steps. Data steps are written […], while procedures are prewritten programs that […], Data steps are used to read, modify and create data files and always begin with a "DATA" […] "PROC" […] including PROC PRINT, PROC MEANS, […].
In the following sections we will demonstrate how to use these two types of steps.
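A minimal sketch of the two kinds of steps, assuming the SASHELP.CLASS sample table that ships with SAS rather than one of the seminar's data files. The DATA step builds a new temporary dataset with a computed variable, and a PROC step then summarizes it.

```sas
* DATA step: read SASHELP.CLASS and create a temporary dataset;
DATA class_bmi;
   SET sashelp.class;
   /* Weight is in pounds and Height in inches, so 703 converts to kg/m**2 */
   BMI = 703 * Weight / Height**2;
RUN;

* PROC step: a prewritten procedure that summarizes the new dataset;
PROC MEANS DATA=class_bmi N MEAN MIN MAX;
   VAR BMI;
RUN;
```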

4.0 Manipulating Datasets
4.1 Operators
An operator in SAS is a symbol representing a comparison, logical operation or mathematical function.
4.1.1 Comparison Operators
[…] =, <, > but also have mnemonic equivalents like EQ, LT, or GT, […].
[…] "sales" data file, we have information on sales associates from Australia (AU) and the United States (US). If we only wanted to output records for Australian sales associates, we could use the = […] variable Country contains character information, not numeric, we need to put single quotes around 'AU'.

PROC PRINT DATA=[Link];
WHERE Country='AU';
RUN;
The IN operator can be used if you are trying to specify a list or range of values, as demonstrated below.

PROC PRINT DATA=[Link];
WHERE Country IN ('AU', 'US');
RUN;
[…] are less than (<) $30,000. […], we output salary values greater than or equal to (GE) $30,000.

PROC PRINT DATA=[Link];
WHERE Salary<30000;
RUN;
PROC PRINT DATA=[Link];
WHERE Salary GE 30000;
RUN;
One limitation of using a WHERE statement is that more than one cannot be used simultaneously, […] following syntax, SAS will issue a note in the Log stating "WHERE clause has been replaced." It will then execute the following syntax, omitting the first […], in the next section we will demonstrate how to combine comparison operators with logical operators to achieve the desired output.

PROC PRINT DATA=[Link];
WHERE Country='AU';
WHERE Salary<30000;
RUN;
For more examples, check out the SAS 9.4 Help and Documentation page on comparison operators.
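To try the comparison operators without the seminar files, the sketch below builds a small hypothetical STAFF dataset with invented values and applies the mnemonic forms EQ, LT and GE inside WHERE statements. Each DATA step writes only the matching rows to a new temporary dataset.

```sas
* Hypothetical stand-in for the sales data, values invented for illustration;
DATA staff;
   INPUT Name $ Country $ Salary;
   DATALINES;
Ann AU 27000
Bob US 31000
Cai AU 30000
Dee US 29500
;
RUN;

* EQ is the mnemonic for = ;
DATA au_only;
   SET staff;
   WHERE Country EQ 'AU';
RUN;

* LT is the mnemonic for < ;
DATA low_pay;
   SET staff;
   WHERE Salary LT 30000;
RUN;

* GE is the mnemonic for >= ;
DATA high_pay;
   SET staff;
   WHERE Salary GE 30000;
RUN;
```

Each of the three output datasets should contain two of the four rows.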
4.1.2 Logical Operators
The logical or Boolean operators include AND, OR, and NOT. […], these […]'s mnemonic alternative.

Symbol      Mnemonic
^  ~  ¬     NOT
&           AND
|           OR

In the previous section we learned that we cannot use two WHERE statements, but we can use the AND operator to combine the information contained in those two statements to achieve the desired result.
Below we use AND to output observations representing Australian sales associates that make less than $30,000. […] used and they give the same result.

PROC PRINT DATA=[Link];
WHERE Country='AU' AND Salary<30000;
RUN;
PROC PRINT DATA=[Link];
WHERE Country='AU' & Salary<30000;
RUN;

As with comparison operators, you can also combine AND, OR, and NOT with the IN operator. In the example below, the variable Job_Title includes seven […], […] we have more than one value we are trying to exclude.

PROC FREQ DATA=[Link];
TABLES Job_Title;
WHERE Job_Title NOT IN ('Sales Manager','Sales Rep. IV');
RUN;
For more examples, check out the SAS 9.4 Help and Documentation page on logical operators.
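A small sketch of NOT IN and of operator precedence, using an invented TITLES dataset (hypothetical values). In SAS, AND is evaluated before OR, so the last two WHERE clauses below differ only in their parentheses but select different rows.

```sas
* Hypothetical job-title data, commas separate the fields so titles can contain blanks;
DATA titles;
   INFILE DATALINES DLM=',';
   INPUT Job_Title :$25. Country $;
   DATALINES;
Sales Rep. I,AU
Sales Manager,US
Sales Rep. IV,AU
Chief Sales Officer,US
;
RUN;

* NOT IN drops every title in the list;
DATA no_mgrs;
   SET titles;
   WHERE Job_Title NOT IN ('Sales Manager', 'Chief Sales Officer');
RUN;

* Without parentheses this means US OR (AU AND Sales Rep. IV);
DATA no_parens;
   SET titles;
   WHERE Country='US' OR Country='AU' AND Job_Title='Sales Rep. IV';
RUN;

* With parentheses the OR is evaluated first;
DATA with_parens;
   SET titles;
   WHERE (Country='US' OR Country='AU') AND Job_Title='Sales Rep. IV';
RUN;
```

NO_MGRS keeps the two Sales Rep rows, NO_PARENS keeps three rows (both US rows plus the Australian Sales Rep. IV), and WITH_PARENS keeps only the Australian Sales Rep. IV.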
4.1.3 WHERE Operators
[…], SAS does include a set of special operators that can be used […].

Operator                Description                                    Char or Num
BETWEEN-AND             Allows for an inclusive range                  Both
CONTAINS                Includes a character string or substring       Character only
IS NULL or IS MISSING   Identifies missing values                      Both
LIKE                    Matches a pattern                              Character only
=*                      Sounds like                                    Character only
SAME AND or ALSO        Augments an existing WHERE clause without      Both
                        having to retype the original one

For example, here are three ways of specifying that we want SAS to output all sales associate records with salaries that range from $28,000 to $30,000. […] in any good programming language, there are always multiple ways of doing the same thing.

* We can use only comparison operators;
PROC PRINT DATA=idre.sales;
WHERE 28000<=Salary<=30000;
RUN;
*We can use a mix of comparison and logical operators;
PROC PRINT DATA=[Link];
WHERE Salary>=28000 & Salary<=30000;
RUN;
*We can use only the special WHERE operators;
PROC PRINT DATA=[Link];
WHERE Salary BETWEEN 28000 AND 30000;
RUN;
Earlier, we discussed that, in general, […] are the special operators "SAME AND" and "ALSO". […] the example below, the first condition subsets the data to Australian sales associates that make less than $26,000, and then we add the additional clause that they must also be female.

*Using SAME AND;
PROC PRINT DATA=[Link];
WHERE Country='AU' AND Salary<26000;
WHERE SAME AND Gender='F';
VAR First_Name Last_Name Gender Salary Country;
RUN;
*Using Also;
PROC PRINT DATA=[Link];
WHERE Country='AU' & Salary<26000;
WHERE ALSO Gender='F';
VAR First_Name Last_Name Gender Salary Country;
RUN;
Now, while some of these special operators are fairly self-explanatory, like "IS NULL", some may be less so, such as "=*" and "LIKE". These operators can be helpful for identifying issues such as misspelled information, incorrectly entered information, […], below is a dataset called "shoes_eclipse" that includes several different product names:

Let's suppose we are interested in identifying product names that include the word "Woman's". How could we do that? The LIKE operator could help us do […], a percent (%) sign and an underscore (_). The percent sign matches any sequence of zero or more characters, and the underscore matches exactly one character. […] only interested in products that start with "Woman's", then we don't care what comes after "Woman's":

PROC PRINT DATA=idre.shoes_eclipse;
VAR product_name;
WHERE product_name LIKE "Woman's %";
RUN;

I could also ask SAS to output any name that includes "Men's" anywhere in the title. This would require multiple % signs, because any product name with "Men's" may have characters before and after it.

PROC PRINT DATA=idre.shoes_eclipse;
VAR product_name;
WHERE product_name LIKE "% Men's %";
RUN;

For more information, check out the SAS Help and Documentation on special WHERE operators.
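The sketch below tries several special WHERE operators on a hypothetical PRODUCTS dataset (names and prices invented). It also shows a subtlety of the "% Men's %" pattern used above. Because that pattern has a blank before "Men's", it does not match a name that begins with "Men's", whereas CONTAINS does.

```sas
* Hypothetical product data, a period marks a missing price;
DATA products;
   INFILE DATALINES DLM=',';
   INPUT Product_Name :$40. Price;
   DATALINES;
Woman's Running Shoe,80
Men's Walking Shoe,.
Junior Men's Sneaker,45
Woman's Trail Shoe,95
;
RUN;

/* LIKE: the name must start with the literal text followed by a blank */
DATA womens;
   SET products;
   WHERE Product_Name LIKE "Woman's %";
RUN;

/* LIKE with % on both sides still needs a blank before the word */
DATA mens_like;
   SET products;
   WHERE Product_Name LIKE "% Men's %";
RUN;

/* CONTAINS matches the substring anywhere, including at the start */
DATA mens_contains;
   SET products;
   WHERE Product_Name CONTAINS "Men's";
RUN;

/* IS MISSING selects the unpriced product */
DATA no_price;
   SET products;
   WHERE Price IS MISSING;
RUN;

/* BETWEEN-AND is inclusive at both ends */
DATA mid_price;
   SET products;
   WHERE Price BETWEEN 45 AND 80;
RUN;
```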
4.1.4 Arithmetic Operators
Arithmetic operators, as you can probably tell from the name, […] symbols used in SAS.

Symbol   Description
**       Exponentiation
*        Multiplication
/        Division
+        Addition
-        Subtraction

[…], if you are calculating values using a variable(s) with missing data, the resulting value will also be missing. […], expressions are evaluated with respect to the traditional order of operations, with exponentiation taking the highest priority level, then multiplication/division and last addition/subtraction. […], as is the case with the other operators we have discussed, arithmetic operators can be used in conjunction with both logical and comparison operators.
Let's […] "sales_subset" from the "sales" […] contain only observations from Australian employees whose job title contains the word "Rep". So we are using a logical operator (&) and a special WHERE operator (CONTAINS).
Additionally, we are creating a new variable called "Bonus", which is calculated by multiplying "Salary" by .10.

DATA sales_subset;
SET [Link];
WHERE Country='AU' & Job_Title contains 'Rep';
Bonus=Salary*.10;
RUN;
Below we output the first 20 records of our new dataset.

In this second example, let's use parentheses to change the evaluation of a compound (more than one operator) expression. We will create two new variables, profit1 and profit2.

DATA profit;
SET idre.order_fact;
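/* without parentheses, the multiplication happens before the subtraction */
profit1 = total_retail_price - costPrice_per_unit * quantity;
/* with parentheses, the subtraction happens first */
profit2 = (total_retail_price - costPrice_per_unit) * quantity;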
RUN;
Let's see how the use of parentheses has changed our values.

For more examples, check out the SAS 9.4 Help and Documentation page on arithmetic operators.
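A quick check of precedence and missing-value propagation using constants only, so no seminar data is needed. The variable names are illustrative.

```sas
DATA arith;
   a = 2;
   b = 3;
   c = 4;
   m = .;                       /* a missing numeric value */
   no_parens    = a + b * c;    /* multiplication first: 2 + 12 */
   with_parens  = (a + b) * c;  /* parentheses first: 5 * 4 */
   power        = a ** b;       /* exponentiation: 2 cubed */
   with_missing = a + m;        /* any missing operand gives a missing result */
RUN;
```

NO_PARENS is 14, WITH_PARENS is 20, POWER is 8 and WITH_MISSING is missing. The log also notes that missing values were generated, which is a useful hint when a calculated variable is unexpectedly empty.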
4.2 Conditional Processing
4.2.1 WHERE and IF Statements
[…], […], while both WHERE and IF can be used with a Data step, […], if we add an IF statement to the PROC MEANS […].

However, if you use WHERE, the statement is blue.

If you attempt to execute the PROC MEANS using the incorrect IF statement, SAS will produce an error, but SAS will execute the command using the WHERE statement.

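Data steps will accept both WHERE and IF statements; however, only IF can test variables created by assignment statements in the same step. Below is an example […] $30,000 and assigning them, using THEN OUTPUT, to a new dataset called "highsales".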

DATA highsales ;
SET [Link];
IF salary GT 30000 THEN OUTPUT highsales;
RUN;
[…] of equivalent ways of subsetting the data?

DATA emps;
SET [Link];
WHERE Country='AU';
Bonus=Salary*.10;
IF Bonus>=3000;
RUN;
Moreover, […] "Bonus", SAS would have given us an error saying the "Bonus" […], […], SAS will execute the WHERE statement and create "Bonus" and then assess whether the IF condition is true.
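The sketch below mirrors the emps example on an invented three-row EMP dataset. WHERE filters on Country, which already exists in the input data, while the subsetting IF tests Bonus, which exists only after the assignment statement runs.

```sas
* Hypothetical employee data;
DATA emp;
   INPUT Country $ Salary;
   DATALINES;
AU 25000
AU 32000
US 40000
;
RUN;

DATA emps_bonus;
   SET emp;
   WHERE Country = 'AU';   /* WHERE sees only variables already on EMP */
   Bonus = Salary * .10;   /* Bonus is created here */
   IF Bonus >= 3000;       /* a subsetting IF can test the new variable */
RUN;
```

Only the Australian employee earning 32000 remains, with a Bonus of 3200. Writing WHERE Bonus >= 3000 in place of the IF would instead stop the step with an error, because Bonus is not a variable on EMP.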
4.2.2 IF-THEN Statement
[…] fulfills a certain condition.
We will once again create a variable called "Bonus", but assign the values based on a certain set of conditions that are defined by an employee's job title.

DATA comp1;
SET [Link];
IF Job_Title='Sales Rep. IV' THEN Bonus=1000;
IF Job_Title='Sales Manager' THEN Bonus=1500;
IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
RUN;

You will see in the output above, […] "Bonus" for all of the job titles.
A related statement to IF-THEN is the ELSE statement, which can be used when creating conditional statements around mutually exclusive groups.

DATA comp2;
SET [Link];
IF Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE IF Job_Title='Sales Manager' THEN Bonus=1500;
ELSE IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
ELSE IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
RUN;
[…] true, […], […], as was the case with the first IF-THEN example, we will end up with a lot of missing values using this syntax.
What if we had a scenario where we wanted to give all the remaining categories that did not fulfill the prescribed conditions […], we add an additional ELSE statement assigning all of the remaining job titles a bonus value of 500.

DATA comp3;
SET [Link];
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE IF Job_Title='Sales Manager' THEN Bonus=1500;
ELSE IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
ELSE IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
ELSE Bonus=500;
RUN;

Now, we have complete data for all observations.
[…], we delete all observations associated with three specific job titles.

DATA drop;
SET [Link];
IF Job_Title IN('Sales Manager', 'Senior Sales Manager', 'Chief Sales Officer') THEN DELETE;
RUN;
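A self-contained sketch of an IF-THEN/ELSE chain with a catch-all ELSE, followed by a DELETE, on a few invented job titles (hypothetical data, not the seminar's "sales" file):

```sas
* Hypothetical job titles, one per line;
DATA jobs;
   INFILE DATALINES TRUNCOVER;
   INPUT Job_Title $25.;
   DATALINES;
Sales Rep. IV
Sales Manager
Sales Rep. I
Chief Sales Officer
;
RUN;

DATA comp_sketch;
   SET jobs;
   IF Job_Title = 'Sales Rep. IV' THEN Bonus = 1000;
   ELSE IF Job_Title = 'Sales Manager' THEN Bonus = 1500;
   ELSE Bonus = 500;                                  /* every other title */
   IF Job_Title = 'Chief Sales Officer' THEN DELETE;  /* drop this row entirely */
RUN;
```

Three rows remain, each with a nonmissing Bonus (1000, 1500 and 500), because the final ELSE catches every title the earlier conditions miss.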
4.2.3 Using DO
[…], let's imagine that for each bonus value, I also want to create a variable called Freq that denotes how many times a year the sales associate can receive the bonus (e.g., once a year or twice a year). So we might try the following code using a logical operator.

DATA freq1;
SET [Link];
/* Incorrect on purpose: & is the logical AND operator and cannot join two assignments */
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000 & Freq = "once a year";
ELSE Bonus=500 & Freq = "twice a year";
RUN;
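While this syntax appears reasonable, SAS will execute the statement and then issue a note in the log that "Variable Freq is uninitialized". SAS prints this message when a variable is used in a DATA step without ever being assigned a value. If you look in the freq1 SAS dataset, you will see that SAS created the variable but set all of its values to missing. Bonus is also wrong: it receives the 0/1 result of the logical expression rather than 1000 or 500. […] "Freq" will require a separate statement instead of just a simple "&". You could try this: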

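DATA freq2;
SET [Link];
/* set the length first, otherwise Freq is 11 characters long and 'twice a year' is truncated */
LENGTH Freq $12;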
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE Bonus=500;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Freq = "once a year";
ELSE Freq = "twice a year";
RUN;
[…] multiple statements.

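DATA bonus;
SET [Link];
/* set the length first, otherwise Freq is 11 characters long and 'Twice a Year' is truncated */
LENGTH Freq $12;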
IF Country='US' THEN DO;
Bonus=500;
Freq='Once a Year';
END;

ELSE DO;
Bonus=300;
Freq='Twice a Year';
END;
RUN;
While the syntax looks similar to a traditional IF-THEN, […], […], instead of just ELSE we now have ELSE DO […], SAS will issue a warning in the log and fail to execute the Data step.
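A runnable sketch of DO-END groups on invented data. It includes a LENGTH statement so that Freq is long enough for 'Twice a Year'. Without it, Freq would take the length of the first value SAS sees at compile time, 'Once a Year' (11 characters), and the longer value would be cut to 'Twice a Yea'.

```sas
* Hypothetical country codes;
DATA staff_ctry;
   INPUT Country $;
   DATALINES;
US
AU
;
RUN;

DATA bonus_sketch;
   SET staff_ctry;
   LENGTH Freq $12;          /* long enough for the longer text value */
   IF Country = 'US' THEN DO;
      Bonus = 500;
      Freq = 'Once a Year';
   END;                      /* every DO needs a matching END */
   ELSE DO;
      Bonus = 300;
      Freq = 'Twice a Year';
   END;
RUN;
```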
4.3 SAS Functions
Functions accept arguments and then produce a particular value (numeric or character) […], common date functions, and some additional functions useful for specific data management tasks.
4.3.1 Arithmetic Functions
In the first example, we will use the "Oldbudget" data file to calculate the total and average amount budgeted for business operations over a five-year period.

DATA budget;
SET [Link];
sum1 = yr2003 + yr2004 + yr2005 + yr2006 + yr2007;
sum2 = SUM(yr2003, yr2004, yr2005, yr2006, yr2007);
sum3 = SUM( of yr2003-yr2007);
mean1 = (yr2003 + yr2004 + yr2005 + yr2006 + yr2007)/5;
mean2 = MEAN(yr2003, yr2004, yr2005, yr2006, yr2007);
mean3 = MEAN( of yr2003-yr2007);
RUN;
[…] "sum1" using an arithmetic operator to add the 5 budget […], we can use the SUM() function, […] "+", a case with missing values on any of the […] function, any missing values are ignored, which for a sum gives the same result as treating them as zero, […] also use similar syntax to demonstrate how to calculate the average, or mean, of the budget variables.

All the values produced for "sum1-sum3" and "mean1-mean3" […] mathematical functions, including absolute value, maximum, minimum and square root, that can be used in a similar manner.
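The sketch below uses two invented rows of budget figures to show why the operator and function versions can disagree once a value is missing. The + operator and manual division propagate the missing value, SUM adds whatever values are present, and MEAN divides by the number of nonmissing values, so MEAN does not treat missing values as zero.

```sas
DATA budget_sketch;
   INPUT yr2003 yr2004 yr2005;
   sum_op  = yr2003 + yr2004 + yr2005;        /* missing if any year is missing */
   sum_fn  = SUM(OF yr2003-yr2005);           /* adds the nonmissing values */
   mean_op = (yr2003 + yr2004 + yr2005) / 3;  /* also missing if any year is missing */
   mean_fn = MEAN(OF yr2003-yr2005);          /* averages the nonmissing values */
   DATALINES;
100 200 300
100 . 300
;
RUN;
```

In the second row, SUM_OP and MEAN_OP are missing, SUM_FN is 400, and MEAN_FN is 200 (400 divided by 2, not by 3).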
4.3.2 Date Functions
[…], SAS has some built-in functions that can assist users with managing this data type. SAS stores date information as numeric values representing the number of days before or after January 1, 1960. […] "Sales" dataset, which includes information on date of birth and hiring date for each employee, to demonstrate some date functions.

DATA comp;
SET [Link];
Hire_Month=MONTH(Hire_Date);
Birth_Day = WEEKDAY(Birth_date);
Day_Dif = DATDIF(Birth_date,Hire_Date, 'actual');
Month_dif= INTCK('years',Birth_date,Hire_Date);
Bonus_1 = INTNX('month', Hire_Date, 6);
RUN;
The MONTH function pulls the month from "Hire_date" and puts it in a variable called "Hire_month". The WEEKDAY function figures out which day of the week (1-7, where 1 is Sunday and 7 is Saturday) the date fell on and outputs this.
[…] 'actual' number of days, but we could choose other methods of calculation, such as assuming that each month has 30 days and that a year always has 360 days.
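INTCK counts the number of intervals between two dates. In our example, we asked SAS to output the number of years between an employee's date of birth and when they were hired (stored in the variable Month_dif, despite its name), which approximates the employee's age at the time of hire. By default, INTCK counts how many interval boundaries (for years, each January 1) fall between the two dates, so the result can be one year higher than the employee's actual age. The continuous method, requested with 'c' as an optional fourth argument, counts completed years instead.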

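INTNX is used to calculate the variable Bonus_1. […] arguments for this function are the unit of time, the variable representing the start date/[…], employees are eligible 6 months after their hire date. By default, INTNX returns the beginning of the resulting interval, so Bonus_1 falls on the first day of the month that is six months after the hire month.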
Below is the output of the first 10 observations of the "comp" dataset, […], SAS stores date information as […] (discussed further in the next section), it will display as just a number.

PROC PRINT DATA=comp (OBS=10);
VAR Employee_ID Hire_date Hire_Month Birth_date Birth_Day Day_dif Month_dif Bonus_1;
*FORMAT Hire_date Birth_date Bonus_1 mmddyy10.;
RUN;

More examples of SAS date functions can be found on the SAS Help and Documentation website.
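The sketch below uses two made-up dates so the results can be checked by hand. It uses the 'year' interval name and adds the optional fourth INTCK argument 'c' (continuous method) to contrast it with the default. Neither of these appears in the seminar code above.

```sas
DATA date_sketch;
   Birth_Date = '15JUN1980'd;
   Hire_Date  = '15MAR2010'd;                             /* a Monday */
   Hire_Month = MONTH(Hire_Date);                         /* 3 = March */
   Hire_Day   = WEEKDAY(Hire_Date);                       /* 1 = Sunday ... 7 = Saturday */
   Yrs_Disc   = INTCK('year', Birth_Date, Hire_Date);     /* year boundaries crossed: 30 */
   Yrs_Cont   = INTCK('year', Birth_Date, Hire_Date, 'c');/* completed years: 29 */
   Eligible   = INTNX('month', Hire_Date, 6);             /* start of the month 6 months later */
   FORMAT Birth_Date Hire_Date Eligible DATE9.;
RUN;
```

ELIGIBLE displays as 01SEP2010 rather than 15SEP2010, because INTNX aligns to the beginning of the interval by default.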
4.3.3 Other Functions
SAS includes several other types of functions designed for specific needs; many of these functions are helpful for data management of character or […], LENGTH tells the user the length of a character string, while COMPRESS will compress string values and remove unwanted […], in a similar way to extracting date information with the MONTH function, SAS has several functions, including SCAN and SUBSTR, that allow you to extract words from a phrase.
Let's […] "Shoes_eclipse" […] interest is to obtain the length of product_name, compress product_name to remove the blanks, and create a variable that extracts the brand name "Eclipse" from product_group.

DATA shoes;
SET idre.shoes_eclipse;
length_name = LENGTH(product_name);
comp_product = COMPRESS(product_name);
brand = SUBSTR(product_group, 1, 7);
brand2 = SCAN(product_group, 1, " ");
RUN;

[…], for the variable length_name, if you counted the number of letters and spaces in product_name […], the compressed version of product_name now includes no spaces. Third, both the SCAN and SUBSTR functions produced the same output. The SUBSTR function takes 3 arguments: the name of the variable with the information you want to extract, […] character string of length 7 starting at the first character position of "product_group", which would be the "E" […], this means whatever […] function, which works very similarly to SUBSTR except, instead of specifying the length of the string, […] indicates that the character string of interest starts at the first position and continues until a blank/[…] of delimiters including < ( + & ! $ * ) ^ / , %.
In the previous examples, we were extracting values from a string, […]. Below we want SAS to combine the character string information in first_name and last_name into one full name variable. Additionally, the function also […], the delimiter is just a blank, while in the second example the delimiter is a comma.

DATA salesquiz;
SET [Link];
sep = " ";
fullname = CATX(sep, first_name, last_name);
sep1 = ",";
fullname1 = CATX(sep1, last_name, first_name);
RUN;

The new variables are displayed above.
A list of all SAS functions, by category, can be found here on the SAS website.
Note: The order in which the variables are specified in the CATX function governs the order in which they will be combined.
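A self-contained sketch of this section's character functions on one invented row. It passes the delimiters to CATX as literals instead of through the sep and sep1 variables used above. The second delimiter here is ', ' (comma plus blank), so the result reads "Lee, Ann" rather than "Lee,Ann".

```sas
DATA char_sketch;
   LENGTH product_group $20 product_name $30 first_name last_name $12;
   product_group = 'Eclipse Shoes';
   product_name  = "Woman's Running Shoe";
   first_name    = 'Ann';
   last_name     = 'Lee';
   length_name  = LENGTH(product_name);              /* ignores trailing blanks: 20 */
   comp_product = COMPRESS(product_name);            /* one argument: removes blanks */
   brand        = SUBSTR(product_group, 1, 7);       /* 7 characters from position 1 */
   brand2       = SCAN(product_group, 1, ' ');       /* first word, blank as delimiter */
   fullname     = CATX(' ', first_name, last_name);  /* Ann Lee */
   fullname1    = CATX(', ', last_name, first_name); /* Lee, Ann */
RUN;
```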
4.4 Sorting, Merging and Appending
4.4.1 Sorting
[…], certain types of data management needs, like merging datasets or grouping observations by a particular characteristic, require sorting.
[…].

PROC SORT DATA=[Link] OUT=sales; /* OUT= is optional: without it, the input dataset is replaced by the sorted version */
BY Salary;
RUN;

Sorting can also be done using more than one variable.

PROC SORT DATA=[Link] OUT=sales;
BY Salary Country;
RUN;

As you can see, the data is sorted in ascending order by "Salary" first and then, when there are tied salaries from different countries, AU comes before US […] and/or adding in the DESCENDING option, which reverses the sort order for the variable that immediately follows it.

PROC SORT DATA=[Link] OUT=sales;
BY DESCENDING Salary DESCENDING Country;
RUN;

4.4.2 Merging
[…] (One-to-One) or multiple observations (One-to-Many) […], the datasets to be merged must be sorted by the same variable(s). In the example below, we will merge a dataset that has employee payroll information with a second dataset with employee addresses.
Since an employee's ID number (Employee_ID) is a unique identifier of each observation, we will use this variable to match observations.
First, we need to sort each dataset by Employee_ID.

PROC SORT DATA=idre.employee_payroll OUT=payroll;
BY Employee_ID;
RUN;

PROC SORT DATA=idre.employee_addresses OUT=addresses;
BY Employee_ID;
RUN;

[…], you will notice that the datasets "addresses" and "payroll" do not share any of the same variables except Employee_ID. In general, […], […], Employee_ID is unique in each dataset, so this will be a One-to-One merge.
Merging is done in a Data step similar to what we have been executing, […] for sorting.

DATA payadd;
MERGE payroll addresses;
BY Employee_ID;
RUN;
[…], Employee_Name is from the "addresses" data, and Birth_date and Salary are from the "payroll" data.

Now let's take a look at an example of a One-to-Many merge.
[…].
Because more than one item can be associated with a particular Order_ID, […], we will need to conduct a One-to-Many merge where each row in our "orders" data could be merged with multiple rows in the "order_item" […], we will begin by sorting both sets of data by Order_ID.

PROC SORT DATA=[Link] OUT=orders;
BY Order_ID;
RUN;

PROC SORT DATA=idre.order_item OUT=order_item;
BY Order_ID;
RUN;

[…] present in the final merged dataset.

DATA allorders;
MERGE orders order_item;
BY Order_ID;
KEEP Order_ID Order_Item_Num Order_Type Order_Date Quantity Total_Retail_Price;
RUN;

[…] variables that have missing information were both from the "orders" […] look at the "orders" data, we would see that there is no information for the order identifier "1243854878" but there is information in "order_item"; thus, when you merge the datasets together, all the variables from "orders" […], […], you can choose to control the observations output to the […](s) contributed to forming the observation in the final merged dataset. It is a temporary variable used in the merging process that is given a value of 0 if the dataset did not provide information or 1 if it […] Let's take a look at how we could apply this option in our previous merge.

DATA allorders2;
MERGE orders (in=a)
order_item (in=b);
BY Order_ID;
KEEP Order_ID Order_Item_Num Order_Type Order_Date Quantity Total_Retail_Price;
IF a;
RUN;
Using the IN= option with an IF statement keeps only observations whose Order_ID is present in "orders". If you have a value for Order_ID that is in "order_item" but not "orders", then it will not be used to construct observations for the "allorders2" dataset. Note: Using IF a is equivalent to saying IF a=1.
Thus, you will not end up with the missing "orders" values seen in the previous merge.
Now, […] records, the result is a somewhat unpredictable and often undesirable assortment of observations.
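A minimal One-to-Many merge on two invented datasets. The IN= flags are copied into ordinary variables (in_orders and in_items, hypothetical names) so you can see what IF a keeps. Order 3 exists only in ITEMS_S, so it is dropped.

```sas
* Hypothetical orders, one row per Order_ID;
DATA orders_s;
   INPUT Order_ID Order_Type;
   DATALINES;
1 1
2 2
;
RUN;

* Hypothetical items, possibly several rows per Order_ID;
DATA items_s;
   INPUT Order_ID Item_Num Quantity;
   DATALINES;
1 1 5
1 2 3
2 1 1
3 1 7
;
RUN;

PROC SORT DATA=orders_s; BY Order_ID; RUN;
PROC SORT DATA=items_s;  BY Order_ID; RUN;

DATA merged;
   MERGE orders_s (IN=a) items_s (IN=b);
   BY Order_ID;
   in_orders = a;   /* IN= variables are temporary, so copy them to keep them */
   in_items  = b;
   IF a;            /* keep only Order_IDs that appear in ORDERS_S */
RUN;
```

MERGED has three rows. Order 1's Order_Type is repeated for both of its items, and no row has a missing Order_Type.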
4.4.3 Appending
[…].
We will append three datasets that include information on orders from 3 consecutive months (July-September) […] records from each of the datasets to be appended.

[…] number of datasets does not matter.

DATA mnth7_8_9_2011 ;
SET idre.mnth7_2011 idre.mnth8_2011 idre.mnth9_2011;
RUN;
A portion of the newly appended dataset is below.

Now you can see that all 3 datasets have been appended, or "stacked", […]?
Take a look back at our "shoe" […], […] same variables except two, product_id and supplier_name.

What will happen when we attempt to append the data?

DATA shoes;
SET idre.shoes_eclipse idre.shoes_tracker;
RUN;

[…], in the new "shoes" data we created, all the records from the Eclipse dataset will be missing on the variables that were only in the Tracker dataset.
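A sketch of the same situation with two invented datasets that share Price but each have one variable of their own (hypothetical stand-ins for the Eclipse and Tracker files):

```sas
* Hypothetical Eclipse rows with a Product_ID but no Supplier_Name;
DATA eclipse;
   INPUT Product_ID Price;
   DATALINES;
101 50
102 60
;
RUN;

* Hypothetical Tracker row with a Supplier_Name but no Product_ID;
DATA tracker;
   INPUT Supplier_Name $ Price;
   DATALINES;
Acme 70
;
RUN;

* SET with two datasets stacks them, 2 + 1 = 3 rows;
DATA shoes_all;
   SET eclipse tracker;
RUN;
```

The two ECLIPSE rows get a blank Supplier_Name, and the TRACKER row gets a missing Product_ID.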

5.0 Modifying SAS Output
5.1 Titles and Footnotes
As you have probably already noticed, SAS provides a lot of output from many of its […].
Whenever you are presenting tables of information, […], it is also possible to add multiple titles to output in SAS, as well as footnotes, by just adding a numeric suffix to the statement indicating the desired ordering. SAS allows for up to 10 titles (TITLE1-TITLE10) and 10 footnotes (FOOTNOTE1-FOOTNOTE10).

TITLE1 'Orion Star Sales Staff';
TITLE2 'Salary Report';
FOOTNOTE1 'Confidential';
PROC PRINT DATA=[Link] (OBS=5);
VAR Employee_ID
Last_Name Salary;
RUN;


5.2 Label Options
Additionally, […], but if you have to label 10 variables, available […] control the display of each column label so that, instead of the label being one line, you can split it into two lines.

PROC PRINT DATA=[Link] (OBS=5) SPLIT='*';
VAR Employee_ID Last_Name Salary;
LABEL Employee_ID = 'Sales ID'
Last_Name = 'Last*Name'
Salary = 'Annual*Salary';
RUN;

[…]:

TITLE;
FOOTNOTE;
5.3 Formats
Beyond just labeling variables, […].
Formatting values changes the appearance of those values in output, but the underlying values do NOT change.
[…], we will focus on how to create and apply user-defined formats.
In SAS, […], take a look at the syntax below:

PROC FORMAT;
VALUE $ctryfmt 'AU'='Australia'
'US'='United States'
other ='Miscoded';
VALUE tiers 0-49999='Tier 1'
50000-99999='Tier 2'
100000-250000='Tier 3';
RUN;
[…] "$" in front of […] "Miscoded".
For numeric formats, […].

In both Data steps and Proc steps, SAS distinguishes formats from variables by the period at the end of the format name, which also turns the text green.

PROC PRINT DATA=[Link] (OBS=5);
VAR Employee_ID Salary Country Birth_Date Hire_Date;
FORMAT Salary tiers. Birth_Date Hire_Date monyy7. Country $ctryfmt.;
RUN;

[…] certain procedures in SAS, […], then you can use the same FORMAT statement in a Data step.
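A sketch that redefines the two formats from above and applies them to invented data. The PUT function, which the seminar code does not use, returns the formatted text as a new character variable. This makes the point that the FORMAT statement changes only how Salary is displayed, not the number stored.

```sas
PROC FORMAT;
   VALUE $ctryfmt 'AU' = 'Australia'
                  'US' = 'United States'
                  OTHER = 'Miscoded';
   VALUE tiers 0-49999 = 'Tier 1'
               50000-99999 = 'Tier 2'
               100000-250000 = 'Tier 3';
RUN;

* Hypothetical rows, including a miscoded country;
DATA fmt_demo;
   INPUT Country $ Salary;
   Country_Label = PUT(Country, $ctryfmt.);  /* formatted text as a new variable */
   Tier_Label    = PUT(Salary, tiers.);
   FORMAT Salary tiers.;                     /* display only, the value stays numeric */
   DATALINES;
AU 30000
XX 60000
;
RUN;
```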
5.4 Output Delivery System (ODS) Basics
Besides customizing the SAS default output, you may want to output results to different file types. By default, SAS 9.4 outputs results as HTML, and this is what you see in the "Results Viewer" […], you will need to use the Output Delivery System (ODS) statement.
This will allow for output in several different formats, including listing/text, RTF, […](s) is executed, […]:

/* the path macro variable must already hold the name of an output folder */
ODS PDF FILE="&path\[Link]";
ODS RTF FILE="&path\[Link]";
PROC FREQ DATA=<data>;
TABLES <variable>;
RUN;
ODS PDF CLOSE;
ODS RTF CLOSE;
In this case, […].
[…] Results tab.

ODS LISTING;
PROC FREQ DATA=[Link];
TABLES gender;
RUN;
ODS LISTING CLOSE;
As a user, you can also customize the style or look of the output when selecting either an HTML, […]:

ODS HTML FILE="C:\[Link]" STYLE=sasweb;
PROC FREQ DATA=[Link];
TABLES gender;
RUN;
ODS HTML CLOSE;

ODS PDF FILE="C:\[Link]" STYLE=printer; /*Default*/
ODS PDF FILE="C:\[Link]" STYLE=journal;
PROC FREQ DATA=[Link];
TABLES gender;
RUN;
ODS PDF CLOSE;

[…], be mindful that this will also close the default HTML destination, so you will need to […], SAS will issue the warning "No output destination active".

ODS _ALL_ CLOSE;
ODS HTML;

6.0 Special Issues
6.1 Dealing with Duplicates
[…].
[…] "nonsales" data file, we should have 235 unique employee […] ORDER=FREQ […] number in descending order.

PROC FREQ DATA=[Link] ORDER=FREQ;
TABLES Employee_ID;
RUN;

Above you can see that employee ID #120108 has two records associated with it, […] NLEVELS, which displays the number of distinct values for each variable.

PROC FREQ DATA=[Link] NLEVELS;
TABLES Employee_ID /NOPRINT;
RUN;

There should be 235 unique employees in the "nonsales" data, but there are only 234 unique levels, meaning that one employee ID # is duplicated.
Once the presence of duplicate ID numbers has been confirmed, you will most likely want to examine them to determine if they are indeed duplicate records […], what do you do when several IDs or records are duplicated? Let's separate the records with unique IDs from the duplicates using an IF statement.

PROC SORT DATA=[Link] OUT=ids2;
BY employee_id;
RUN;
DATA dupes nodupes;
SET ids2;
BY employee_id;
IF NOT (FIRST.employee_id and LAST.employee_id) THEN OUTPUT dupes;
ELSE OUTPUT nodupes;
RUN;
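Before looking at the output, here is a self-contained sketch of the same FIRST./LAST. logic on four invented IDs, one of them repeated. The second part shows PROC SORT with NODUPKEY and DUPOUT=, which the seminar does not use. That step keeps the first record for each ID and writes the extra copies to a separate dataset, so it answers a slightly different question than the dupes/nodupes split.

```sas
* Hypothetical IDs with 120108 appearing twice;
DATA ids;
   INPUT Employee_ID Dept $;
   DATALINES;
120101 A
120108 B
120108 B
120110 C
;
RUN;

PROC SORT DATA=ids OUT=ids_sorted;
   BY Employee_ID;
RUN;

DATA dupes_s nodupes_s;
   SET ids_sorted;
   BY Employee_ID;
   /* an ID seen once is both the first and the last row of its BY group */
   IF FIRST.Employee_ID AND LAST.Employee_ID THEN OUTPUT nodupes_s;
   ELSE OUTPUT dupes_s;
RUN;

* Alternative, one row per ID with the extra copies written to EXTRA_COPIES;
PROC SORT DATA=ids OUT=one_per_id NODUPKEY DUPOUT=extra_copies;
   BY Employee_ID;
RUN;
```

DUPES_S holds both 120108 rows and NODUPES_S holds the other two IDs, while ONE_PER_ID has three rows and EXTRA_COPIES has one.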
[…], […] IDs for which the first and last records of the BY group are not the same record to a dataset called "dupes", and all the other unique records are put in a dataset called "nodupes".
"Dupes" or duplicated employee ID numbers:

"No Dupes" or unique employee ID numbers.

6.2 Identifying Outliers
[…].
By default, […]. Let's examine outliers for product prices in the "price_new" dataset.

PROC UNIVARIATE DATA=idre.price_new;
VAR unit_cost_price;
RUN;

You can override this default by specifying the option NEXTROBS= […], SAS also provides an observation or row […]. Let's try adding the product identifier Product_ID to each of our extreme values.

PROC UNIVARIATE DATA=idre.price_new NEXTROBS=3;
VAR unit_cost_price;
ID Product_ID;
RUN;

Now each extreme value is associated with its ID number.
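The sketch below applies the same options to the SASHELP.CLASS sample table that ships with SAS. This is an assumption, since price_new is a seminar file. The ODS OUTPUT statement, which the seminar does not use, also saves the Extreme Observations table to a dataset, here named extremes.

```sas
ODS OUTPUT ExtremeObs=extremes;   /* save the Extreme Observations table */
PROC UNIVARIATE DATA=sashelp.class NEXTROBS=3;
   VAR Weight;
   ID Name;                       /* label each extreme value with the student name */
RUN;
```

With NEXTROBS=3, the table and the EXTREMES dataset list three low and three high values instead of the default five.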

7.0 Wrapping Things Up
As we stated in the beginning, SAS is a very flexible program with great features for data management.
This seminar only scratches the surface of all of the programming options available to users.
For more information on the topics discussed here, please explore our website.
Additionally, SAS has a host of courses designed to improve your programming skills, aimed at users of all levels.


The content of this website should not be construed as an endorsement of any particular website, book, or software product by the University of California.

IDRE RESEARCH TECHNOLOGY GROUP

2016 UC Regents
