SAS Programming Basics
SAS is a powerful and flexible statistical package that runs on many platforms, [Link]
[Link]
[Link]
students in the class will have hands-on experience using SAS for data manipulation including use of arithmetic operators, conditional processing, using SAS
built-in functions, merging, appending, [Link]:
Comfortably navigate the SAS window environment
Subset and create new datasets
Create new variables
Write and debug basic SAS programs
Use SAS functions for basic data management tasks
Merge and append data
Modify SAS output for presentation
Please note that since we are using data files provided by SAS, [Link], this seminar page includes
output from the SAS procedures used in the seminar.
For clarity, all SAS keywords will be in CAPITAL letters in order to distinguish them from the information that you as the user will provide.
Note: This seminar was developed in SAS 9.4.
1.0 SAS Refresher
1.1 Libname
We will start by setting our libname, which assigns a library reference that points to the directory where our SAS data files are stored.
*assign libname;
LIBNAME idre 'C:\';
SAS also allows you to clear a particular libname or use the _ALL_ keyword to clear all assigned libnames.
*clear libname;
LIBNAME idre CLEAR;
LIBNAME _ALL_ CLEAR;
* reassign library;
LIBNAME idre 'C:\';
1.1 SAS Windowing environment
Let's [Link], Results, Program Editor, Log, and Output/Results
[Link], when you start SAS, the windows that initially appear are the Log, [Link]
under the View menu in the toolbar.
The SAS Explorer window allows you to manage files associated with your current SAS session including viewing, deleting, moving, [Link]
Editor window, which is literally just a text editor, permits you to enter, edit, [Link]
information about their current session including messages about submitted SAS programs such as successful execution, [Link]
[Link]
procedures. In SAS 9.4, the default output format is HTML.
1.2 Creating new SAS datasets
As we will be using several different datasets in the seminar today, let's also cover how to create new permanent and temporary datasets from the data files
you have been provided.
*permanent dataset;
DATA [Link];
SET [Link];
RUN;
*temporary dataset;
DATA new;
SET [Link];
RUN;
1.3 SAS Options
[Link]
vary depending on what computing environment you are using ([Link], Unix). The OPTIONS procedure lists the current settings of SAS system options
in the SAS log.
PROC OPTIONS;
RUN;
SAS includes two types of options: [Link]
different depending on which operating system you are using.
Below are some examples of common options and what they are responsible for doing.
[Link]
below, [Link], you will see that in the Log (shown below), SAS issues a warning that it assumed
[Link], in the second example where the automatic correction option is turned
off, SAS issues an error and stops executing the procedure.
*autocorrect option;
OPTIONS AUTOCORRECT; /*default*/
PROC FREQ DATE=[Link];
TABLE code;
RUN;
OPTIONS NOAUTOCORRECT;
PROC FREQ DATE=[Link];
TABLE code;
RUN;
[Link], the default is for SAS to error
[Link], the default option is invoked and as you can see below SAS issues a warning that the
[Link], in the second example where we tell SAS to not issue an error (NOFMTERR), SAS ignores the incorrectly used
format and will then execute the command without the format.
*format error;
OPTIONS FMTERR;/*default*/
PROC PRINT DATA=[Link];
FORMAT code $code.;
RUN;
OPTION NOFMTERR;
PROC PRINT DATA=[Link];
FORMAT code $code.;
RUN;
2.0 Diagnosing and Correcting Syntax Errors
[Link]
of syntax errors.
2.1 Color Coded Syntax
[Link]
diagnose syntax errors, [Link]
[Link], CLASS, MODEL are
[Link], the keyword will often remain black like the variable names because SAS does not
[Link], the way to indicate a format is to put a period at the end and,
once you do this, [Link], you will see that we are missing an end quote, thus all of
[Link].
2.2 Log File
[Link]:
In the syntax shown, [Link] "average" and "min" options to our statement to
[Link], options should be colored in blue and
in this example "average" remains black, indicating that SAS is not recognizing it as a keyword.
[Link] "average" was not
[Link], in this instance, [Link], you will see that "mean"
[Link] "average" with "mean" the procedure will execute as expected.
[Link]
SAS programs are comprised of two distinct steps: [Link], while procedures are prewritten programs that
[Link], Data steps are used to read, modify and create data files and always begin with a "DATA" [Link]
[Link]
[Link] "PROC" [Link]
including PROC PRINT, PROC MEANS, [Link].
In the following sections we will demonstrate how to use these two types of steps.
4.0 Manipulating Datasets
4.1 Operators
An operator in SAS is a symbol representing a comparison, logical operation or mathematical function.
4.1.1 Comparison Operator
[Link] =, <, > but
also have mnemonic equivalents like EQ, LT, or GT, [Link].
[Link] "sales" data file, we have information on sales associates from
Australia (AU) and the United States (US). If we only wanted to output records for Australian sales associates we could use the = [Link]
variable country contains character information not numeric, we need to put single quotes around 'AU'.
PROC PRINT DATA=[Link];
WHERE Country='AU';
RUN;
The IN operator can be used if you are trying to specify a list or range of values, as demonstrated below.
PROC PRINT DATA=[Link];
WHERE Country IN ('AU', 'US');
RUN;
[Link]
are less than (<) $30,[Link], we output salary values greater than or equal to (GE) $30,000.
PROC PRINT DATA=[Link];
WHERE Salary<30000;
RUN;
PROC PRINT DATA=[Link];
WHERE Salary ge 30000;
RUN;
One limitation of using a WHERE statement is that more than one cannot be used simultaneously, [Link]
following syntax, SAS will issue a note in the Log stating "WHERE clause has been replaced." It will then execute the following syntax omitting the first
[Link], in the next section we will demonstrate how to combine comparison operators with logical operators to achieve the desired
output.
PROC PRINT DATA=[Link];
WHERE Country='AU';
WHERE Salary<30000;
RUN;
For more examples, check out the SAS 9.4 Help and Documentation page on comparison operators.
4.1.2 Logical Operators
The logical or Boolean operators include AND, OR, & [Link], these
[Link]'s mnemonic alternative.
Symbol    Mnemonic
^ ~       NOT
&         AND
|         OR
In the previous section we learned that we cannot use two WHERE statements, but we can use the AND operator to combine the information contained in
those two statements to achieve the desired result.
Below we use AND to output observations representing Australian sales associates that make less than $30,[Link]
used and they give the same result.
PROC PRINT DATA=[Link];
WHERE Country='AU' AND Salary<30000;
RUN;
PROC PRINT data=[Link];
WHERE Country='AU' & Salary<30000;
RUN;
As with comparison operators you can also combine AND, OR, & NOT with the IN operator. In the example below the variable job_title includes seven
[Link], [Link]
we have more than one value we are trying to exclude.
PROC FREQ DATA=[Link];
TABLES Job_Title;
WHERE Job_Title NOT IN ('Sales Manager','Sales Rep. IV');
RUN;
For more examples, check out the SAS 9.4 Help and Documentation page on logical operators.
4.1.3 Where Operators
[Link], SAS does include a set of special operators that can be used
[Link].
Operator                  Description                                                                    Char or Num
BETWEEN-AND               Allows for an inclusive range                                                  Both
CONTAINS                  Includes a character string or substring                                       Character only
IS NULL or IS MISSING     Identifies missing values                                                      Both
LIKE                      Matches a pattern                                                              Character only
=*                        Sounds like                                                                    Character only
SAME AND or ALSO          Augments an existing WHERE clause without having to retype the original one   Both
For example, here are three ways of specifying that we want SAS to output all sales associate records with salaries that range from $28,000 to $30,[Link]
in any good programming language, there are always multiple ways of doing the same thing.
* We can use only comparison operators;
PROC PRINT DATA=[Link];
WHERE 28000<=Salary<=30000;
RUN;
*We can use a mix of comparison and logical operators;
PROC PRINT DATA=[Link];
WHERE Salary>=28000 & Salary<=30000;
RUN;
*We can use only the special WHERE operators;
PROC PRINT DATA=[Link];
WHERE Salary BETWEEN 28000 AND 30000;
RUN;
Earlier, we discussed that, in general, [Link]
are the special operators "same and" and "also". [Link]
the example below the first condition subsets the data to Australian sales associates that make less than $26,000, and then we add the additional clause that
they must also be female.
*Using Same and;
PROC PRINT DATA=[Link];
WHERE Country='AU' and Salary<26000;
WHERE SAME AND Gender='F';
VAR First_Name Last_Name Gender Salary Country;
RUN;
*Using Also;
PROC PRINT DATA=[Link];
WHERE Country='AU' & Salary<26000;
WHERE ALSO Gender='F';
VAR First_Name Last_Name Gender Salary Country;
RUN;
Now while some of these special operators are fairly self-explanatory like "Is Null", some may be less so, such as "=*" and "Like". These operators can be
helpful for identifying issues such as misspelled information, incorrectly entered information, [Link],
below is a dataset called "shoes_eclipse" that includes several different product names:
Let's suppose we are interested in identifying product names that include the word "Woman's". How could we do that? The "Like" operator could help us do
[Link], a percent (%) sign and an underscore (_). The percent
[Link], [Link]
only interested in products that start with "Woman's", then we don't care how many characters come after "Woman's":
PROC PRINT DATA=idre.shoes_eclipse;
VAR product_name;
WHERE product_name LIKE "Woman's %";
RUN;
I also could ask SAS to output any name that includes "Men's" [Link] % signs because any product name with
"Men's" may have characters before and after it.
PROC PRINT DATA=idre.shoes_eclipse;
VAR product_name;
WHERE product_name LIKE "% Men's %";
RUN;
For more information, check out SAS Help and Documentation on special WHERE operators.
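The two LIKE wildcards can be tried on a few made-up product names. This is only a sketch: the dataset like_demo and its five rows are hypothetical and are not part of the seminar files.

```sas
* Hypothetical demo data, typed inline so the example is self-contained;
DATA like_demo;
    INPUT product_name $40.;
    DATALINES;
Woman's Air Runner
Womans Sandal
Big Guy Men's Air Deschutz
Men's Running Shoes
Womens Trainer
;
RUN;

* The percent sign stands for any number of characters, including none;
PROC PRINT DATA=like_demo;
    WHERE product_name LIKE "Woman's %";
RUN;

* The underscore stands for exactly one character;
PROC PRINT DATA=like_demo;
    WHERE product_name LIKE "Wom_ns %";
RUN;
```

With these rows, the first PROC PRINT should list only "Woman's Air Runner", while the second should list "Womans Sandal" and "Womens Trainer", which is one way to surface inconsistent spellings.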
4.1.4 Arithmetic operators
Arithmetic operators, as you can probably tell from the name, [Link]
symbols used in SAS.
Symbol    Description
**        Exponentiation
*         Multiplication
/         Division
+         Addition
-         Subtraction
[Link], if you are calculating values using a variable(s) with missing data, the resulting value will also be
[Link], expressions are evaluated with respect to the traditional order of operations with exponentiation taking the highest priority level, then
multiplication/division and last addition/[Link], as is the case with the other operators we
have discussed, arithmetic operators can be used in conjunction with both logical and comparison operators.
Let's [Link] "sales_subset" from the "sales" [Link]
contain only observations from Australian employees whose job title contains the word "Rep". So we are using a logical and a special WHERE operator.
Additionally, we are creating a new variable called "Bonus" which is calculated by multiplying "Salary" by .10.
DATA sales_subset;
SET [Link];
WHERE Country='AU' & Job_Title contains 'Rep';
Bonus=Salary*.10;
RUN;
Below we output the first 20 records of our new dataset.
In this second example let's use parentheses to change the evaluation of a compound (more than one operator) [Link]
create two new variables profit1 and profit2.
DATA profit;
SET idre.order_fact;
profit1 = total_retail_price - costPrice_per_unit * quantity;
profit2 = (total_retail_price - costPrice_per_unit) * quantity;
RUN;
Let's see how the use of parentheses has changed our values.
For more examples, check out the SAS 9.4 Help and Documentation page on arithmetic operators.
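Both points from this section (order of operations and the propagation of missing values) can be checked by hand on two made-up rows. The dataset arith_demo and the variables price, cost and qty are hypothetical stand-ins for the seminar's order data.

```sas
DATA arith_demo;
    INPUT price cost qty;
    * Multiplication is evaluated before subtraction;
    no_parens   = price - cost * qty;
    * Parentheses force the subtraction to happen first;
    with_parens = (price - cost) * qty;
    DATALINES;
10 4 2
10 . 2
;
RUN;

PROC PRINT DATA=arith_demo;
RUN;
```

For the first row, no_parens is 10 - (4 * 2) = 2 and with_parens is (10 - 4) * 2 = 12. In the second row cost is missing, so both results are missing, and the log carries a note that missing values were generated.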
4.2 Conditional Processing
4.2.1 WHERE and IF statements
[Link]
[Link], [Link], while both
WHERE and IF can be used with a Data step, [Link], if we add an IF statement to the PROC MEANS
[Link].
However, if you use WHERE the statement is blue.
If you attempt to execute the PROC MEANS using the incorrect IF statement SAS will produce an error, but SAS will execute the command using the
WHERE statement.
Data steps will accept both WHERE and IF statements, [Link]
[Link] $30,000 and assigning them, using THEN
OUTPUT, to a new dataset called "highsales".
DATA highsales ;
SET [Link];
IF salary GT 30000 THEN OUTPUT highsales;
RUN;
[Link]
of equivalent ways of subsetting the data?
DATA emps;
SET [Link];
WHERE Country='AU';
Bonus=Salary*.10;
IF Bonus>=3000;
RUN;
Moreover, [Link]
[Link]
[Link]
[Link] "Bonus", SAS
would have given us an error saying the "Bonus" [Link], [Link],
SAS will execute the WHERE statement and create "Bonus" and then assess whether the IF condition is true.
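The timing difference between WHERE and IF can be seen on a three-row example. This is a minimal sketch: emps_demo, bonus_ok and their values are invented for illustration and do not come from the seminar data.

```sas
* Hypothetical input data;
DATA emps_demo;
    INPUT name $ country $ salary;
    DATALINES;
Ann AU 45000
Bob AU 20000
Cy US 50000
;
RUN;

DATA bonus_ok;
    SET emps_demo;
    WHERE country='AU';      /* filters rows as they are read from emps_demo */
    bonus = salary * .10;    /* bonus exists only inside this Data step */
    IF bonus >= 3000;        /* subsetting IF runs after bonus is computed */
RUN;

PROC PRINT DATA=bonus_ok;
RUN;
```

Only Ann's row should remain (bonus of 4500). Replacing the IF line with WHERE bonus >= 3000 would stop the step with an error, because WHERE can only refer to variables that already exist in the dataset named on SET.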
4.2.2 If-Then statement
[Link]
fulfills a certain condition.
We will once again create a variable called "Bonus", but assign the values based on a certain set of conditions that are defined by an employee's job title.
DATA comp1;
SET [Link];
IF Job_Title='Sales Rep. IV' THEN Bonus=1000;
IF Job_Title='Sales Manager' THEN Bonus=1500;
IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
RUN;
You will see in the output above, [Link] "Bonus" for all of the
job titles.
A related statement to IF-THEN is the ELSE statement, which can be used when creating conditional statements around mutually exclusive groups.
DATA comp2;
SET [Link];
IF Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE IF Job_Title='Sales Manager' THEN Bonus=1500;
ELSE IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
ELSE IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
RUN;
[Link]
true, [Link], [Link], as was
the case with the first IF-THEN example, we will end up with a lot of missing values using this syntax.
What if we had a scenario where we wanted to give all the remaining categories, that did not fulfill the prescribed conditions, [Link]
[Link], we add an additional ELSE statement assigning all of the remaining job titles a bonus value
of 500.
DATA comp3;
SET [Link];
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE IF Job_Title='Sales Manager' THEN Bonus=1500;
ELSE IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
ELSE IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
ELSE Bonus=500;
RUN;
Now, we have complete data for all observations.
[Link]
[Link]
[Link], we delete all observations associated with three specific job titles.
DATA drop;
SET [Link];
IF Job_Title IN('Sales Manager', 'Senior Sales Manager', 'Chief Sales Officer') THEN DELETE;
RUN;
4.2.3 Using Do
[Link]
[Link], let's imagine that for each bonus value, I also want to create a
variable called freq that denotes how many times a year the sales associate can receive the bonus ([Link], twice a year). So we might try the
following code using a logical operator.
DATA freq1;
SET [Link];
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000 & Freq = "once a year";
ELSE Bonus=500 & Freq = "twice a year";
RUN;
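While this syntax appears reasonable, SAS will execute the statement and then issue a note in the log that "Variable Freq is uninitialized". When SAS is
unable to locate a variable in a DATA step, SAS prints this message. The reason is that everything after THEN is read as a single assignment: Bonus
receives the result of the logical expression to the right of the first equals sign, and Freq is only compared, never assigned. If you look in the freq1
SAS dataset you will see that SAS created the variable but sets
all of its [Link] "Freq" will require a separate statement instead of just a simple "&". You could try
this: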
DATA freq2;
SET [Link];
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE Bonus=500;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Freq = "once a year";
ELSE Freq = "twice a year";
RUN;
[Link]
multiple statements.
DATA bonus;
SET [Link];
IF Country='US' THEN DO;
Bonus=500;
Freq='Once a Year';
END;
ELSE DO;
Bonus=300;
Freq='Twice a Year';
END;
RUN;
While the syntax looks similar to a traditional IF-THEN, [Link], [Link]
[Link], [Link], instead of just ELSE we now have ELSE DO
[Link], SAS will issue a warning in the log and fail to execute the Data step.
4.3 SAS Functions
Functions accept arguments and then produce a particular value (numeric or character) [Link]
[Link]
[Link], common date functions, and some additional functions useful for specific data
management tasks.
4.3.1 Arithmetic Functions
In the first example, we will use the "Oldbudget" data file to calculate the total and average amount budgeted for business operations over a five-year period.
DATA budget;
SET [Link];
sum1 = yr2003 + yr2004 + yr2005 + yr2006 + yr2007;
sum2 = SUM(yr2003, yr2004, yr2005, yr2006, yr2007);
sum3 = SUM( of yr2003-yr2007);
mean1 = (yr2003 + yr2004 + yr2005 + yr2006 + yr2007)/5;
mean2 = MEAN(yr2003, yr2004, yr2005, yr2006, yr2007);
mean3 = MEAN( of yr2003-yr2007);
RUN;
[Link] "sum1" using an arithmetic operator to add the 5 budget
[Link], we can use the SUM() function, [Link]
[Link] "+", a case with missing values on any of the
[Link]() function, any missing values will be treated as though they were zero,
[Link]
[Link], [Link]
[Link]
also use similar syntax to demonstrate how to estimate the average or mean of the budget variables.
All the values produced for "sum1-sum3" and "mean1-mean3" [Link]
mathematical functions including absolute value, maximum, minimum and square root that can be used in a similar manner.
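The difference between the "+" operator and the SUM and MEAN functions only shows up when a value is missing, so here is a small sketch with one complete row and one incomplete row. The dataset sum_demo and its three year variables are hypothetical and shorter than the seminar's budget file.

```sas
DATA sum_demo;
    INPUT yr2003 yr2004 yr2005;
    * The operator returns missing as soon as any operand is missing;
    plus_total = yr2003 + yr2004 + yr2005;
    * The functions skip missing arguments;
    sum_total  = SUM(OF yr2003-yr2005);
    mean_value = MEAN(OF yr2003-yr2005);
    DATALINES;
10 20 30
10 . 30
;
RUN;

PROC PRINT DATA=sum_demo;
RUN;
```

The first row gives 60, 60 and 20. In the second row plus_total is missing, sum_total is 40, and mean_value is 20: MEAN divides by the two non-missing values rather than by three, so for MEAN a missing value is ignored, which is not the same as counting it as zero.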
4.3.2 Date Functions
[Link], SAS has some built-in functions that can assist
users with managing this data type. SAS stores date information as numeric values representing the number of days before or after Jan 1, [Link]
[Link] "Sales" dataset which includes information on date of birth and hiring date for each employee to
demonstrate some date functions.
DATA comp;
SET [Link];
Hire_Month=MONTH(Hire_Date);
Birth_Day = WEEKDAY(Birth_date);
Day_Dif = DATDIF(Birth_date,Hire_Date, 'actual');
Month_dif= INTCK('years',Birth_date,Hire_Date);
Bonus_1 = INTNX('month', Hire_Date, 6);
RUN;
The MONTH function pulls the month from "Hire_date" and puts it in a variable called "Hire_month". The WEEKDAY function figures out what day of the
week (1-7, where 1 is Sunday) the date would have fallen on and outputs this.
[Link]
[Link] 'actual' number of days, but we could choose other methods of calculation such as assuming that each month has
30 days and that a year always has 360 days.
INTCK counts the number of intervals between two dates. In our example we asked SAS to output the number of years between an employee's date of birth
and when they were hired, which is approximately the employee's age at the time of hire (by default INTCK counts the year boundaries crossed, so the
result can be one more than the exact age). Note that the variable is named "Month_dif" even though it holds a count of years.
INTNX is used to calculate the variable bonus_1. [Link]
arguments for this function are the unit of time, the variable representing the start date/[Link], employees are
eligible 6 months after their hire date.
Below is the output of the first 10 observations of the "comp" dataset, [Link], SAS stores date information as
[Link] (discussed further in the next section), it will display as just a number.
PROC PRINT DATA=comp (OBS=10);
VAR Employee_ID Hire_date Hire_Month Birth_date Birth_Day Day_dif Month_dif Bonus_1;
*FORMAT Hire_date Birth_date Bonus_1 mmddyy10.;
RUN;
More examples of SAS date functions can be found on the SAS Help and Documentation website.
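To make the "dates are numbers" point concrete, the sketch below builds a single hypothetical hire date and applies INTNX to it twice. The dataset date_demo and all variable names are invented for illustration; the fourth INTNX argument ('same') is an optional alignment setting that the seminar example does not use.

```sas
DATA date_demo;
    hire_date    = '15JAN2010'D;
    * Same value, but left unformatted so it prints as a number;
    raw_number   = hire_date;
    * Default alignment returns the first day of the target month;
    elig_default = INTNX('month', hire_date, 6);
    * The same alignment keeps the day of the month;
    elig_same    = INTNX('month', hire_date, 6, 'same');
    FORMAT hire_date elig_default elig_same DATE9.;
RUN;

PROC PRINT DATA=date_demo;
RUN;
```

Here raw_number prints as 18277 (days since January 1, 1960), elig_default prints as 01JUL2010 and elig_same as 15JUL2010. If "six months after the hire date" is meant to the day, the 'same' alignment is probably the one wanted.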
4.3.3 Other Functions
SAS includes several other types of functions designed for specific types of needs; many of these functions are helpful for data management of character or
[Link], LENGTH tells the user the length of a character string while COMPRESS will compress string values and remove unwanted
[Link], in a similar way to extracting date information with the MONTH function, SAS has several
functions including SCAN and SUBSTR that allow you to extract words from a phrase.
Let's [Link] "Shoes_eclipse" [Link]
interest is to obtain the length of product_name, compress product_name to remove the blanks, and create a variable that extracts the brand name
"Eclipse" from product_group.
DATA shoes;
SET idre.shoes_eclipse;
length_name = LENGTH(product_name);
comp_product = COMPRESS(product_name);
brand = SUBSTR(product_group, 1, 7);
brand2 = SCAN(product_group, 1, " ");
RUN;
[Link], for the variable length_name, if you counted the number of letters and spaces in product_name
[Link], the compressed version of product_name now includes no spaces. Third, both SCAN
and SUBSTR functions produced the same output. The SUBSTR function takes 3 arguments, the name of the variable with the information you want to extract,
[Link]
character string of length 7 starting at the first character position of "product_group", which would be the "E" [Link], this means whatever
[Link]
function, which works very similarly to SUBSTR except, instead of specifying the length of the string, [Link]
indicates that the character string of interest starts at the first position and continues until a blank/[Link]
of delimiters including < ( + & ! $ * ) ^ / , %.
In the previous examples, we were extracting values from a string, [Link].
Below we want SAS to combine the character string information in first_name and last_name into one full name variable. Additionally, the function also
[Link], the delimiter is just a blank while in the second
example the delimiter is a comma.
DATA salesquiz;
SET [Link];
sep = " ";
fullname = CATX(sep, first_name, last_name);
sep1 = ",";
fullname1 = CATX(sep1, last_name, first_name);
RUN;
The new variables are displayed above.
A list of all SAS functions, by category, can be found on the SAS website.
Note: The order in which the variables are specified in the CATX function governs the order in which they will be combined.
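The character functions from this section can be checked against literal strings whose results are easy to verify by eye. The sketch below assumes a made-up product group value ('Eclipse Shoes') and a made-up name; string_demo and its variables are hypothetical.

```sas
DATA string_demo;
    LENGTH product_group $30 first_name last_name $12;
    product_group = 'Eclipse Shoes';
    first_name    = 'Ann';
    last_name     = 'Lee';
    * Position of the last non-blank character;
    name_len = LENGTH(product_group);
    * Removes every blank;
    squeezed = COMPRESS(product_group);
    * Seven characters starting at position 1;
    brand    = SUBSTR(product_group, 1, 7);
    * First word, using a blank as the delimiter;
    brand2   = SCAN(product_group, 1, ' ');
    * Trims each piece and joins them with the delimiter;
    fullname = CATX(',', last_name, first_name);
RUN;

PROC PRINT DATA=string_demo;
RUN;
```

The expected values are name_len = 13, squeezed = EclipseShoes, brand = Eclipse, brand2 = Eclipse and fullname = Lee,Ann. SUBSTR depends on the brand being exactly seven characters long, while SCAN keeps working if the first word has a different length.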
4.4 Sorting, Merging and Appending
4.4.1 Sorting
[Link], certain types of data
management needs like merging datasets or grouping observations by a particular characteristic require sorting.
[Link].
PROC SORT DATA=[Link] OUT=sales; *OUT= is optional;
BY Salary;
RUN;
Sorting can also be done using more than one variable.
PROC SORT DATA=[Link] OUT=sales;
BY Salary Country;
RUN;
As you can see, the data is sorted in ascending order by "Salary" first and then, when there are tied salaries from different countries, AU comes before US
[Link]/or adding in the DESCENDING option, which reverses the sort
order for the variable that immediately follows it.
PROC SORT DATA=[Link] OUT=sales;
BY DESCENDING Salary DESCENDING Country;
RUN;
4.4.2 Merging
[Link] (One-to-One)
or multiple observations (One-to-Many) [Link], the datasets to be merged must be sorted by the
same variable(s). In the example below, we will merge a dataset that has employee payroll information with a second dataset with employee addresses.
Since an employee's ID number (employee_id) is a unique identifier of each observation, we will use this variable to match observations.
First, we need to sort each dataset by employee_id.
PROC SORT DATA=idre.employee_payroll OUT=payroll;
BY Employee_ID;
RUN;
PROC SORT DATA=idre.employee_addresses OUT=addresses;
BY Employee_ID;
RUN;
[Link], you will notice that datasets "addresses" and "payroll" do not share any of the same variables except Employee_ID. In
general, [Link]
[Link], [Link], Employee_ID is unique in each dataset, so
this will be a One-to-One merge.
Merging is done in a Data step similar to what we have been executing, [Link],
[Link]
for sorting.
DATA payadd;
MERGE payroll addresses;
BY Employee_ID;
RUN;
[Link], Employee_Name is from the "addresses" data and Birth_date and Salary are
from the "payroll" data.
Now let's take a look at an example of a One-to-Many merge.
[Link].
Because more than one item can be associated with a particular Order_ID, [Link], we will need to conduct a one-to-many
merge where each row in our "orders" data could be merged with multiple rows in the "order_item" [Link], we will begin by sorting both sets of data by
Order_ID.
PROC SORT DATA=[Link] OUT= orders;
BY Order_id;
RUN;
PROC SORT DATA=idre.order_item OUT= order_item;
BY Order_id;
RUN;
[Link]
present in the final merged dataset.
DATA allorders;
MERGE orders order_item;
BY Order_ID;
KEEP Order_ID Order_Item_Num Order_Type Order_Date Quantity Total_Retail_Price;
RUN;
[Link]
variables that have missing information were both from the "orders" [Link]
look at the "orders" data we would see that there is no information for the order identifier "1243854878" but there is information in "order_item", thus when
you merge the datasets together all the variables from "orders" [Link]
[Link], [Link], you can choose to control the observations output to the
[Link](s) contributed to forming the
observation in the final merged dataset. It is a temporary variable used in the merging process that is given a value of 0 if the dataset did not provide information or 1 if it
[Link]'s take a look at how we could
apply this option in our previous merge.
DATA allorders2;
MERGE orders (in=a)
order_item (in=b);
BY Order_ID;
KEEP Order_ID Order_Item_Num Order_Type Order_Date Quantity Total_Retail_Price;
IF a;
RUN;
Using the IN option with an IF statement selects observations to be matched by order_ID that are present in "orders". If you have a value for order_ID that is
in "order_item" but not "orders" then it will not be used to construct observations for the "allorders2" [Link]: Using IF a is equivalent to saying IF a=1.
Thus, you will not end up with any missing values on the variables that come from "orders".
Now, [Link]
records the result is a somewhat unpredictable and often undesirable assortment of observations.
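The effect of the IN= flags is easiest to see on two tiny datasets where one key is missing from each side. This is a sketch under assumed names: orders_demo, items_demo, in_orders and in_items are hypothetical, and the rows are invented.

```sas
* One row per order;
DATA orders_demo;
    INPUT order_id order_type;
    DATALINES;
1 1
2 3
;
RUN;

* One row per item, so order 1 appears twice and order 3 has no match in orders_demo;
DATA items_demo;
    INPUT order_id item_num quantity;
    DATALINES;
1 1 2
1 2 1
3 1 5
;
RUN;

* Both inputs are already in order_id order, so no PROC SORT is needed here;
DATA merged_demo;
    MERGE orders_demo (IN=in_orders)
          items_demo  (IN=in_items);
    BY order_id;
    * Keep only keys found in both datasets;
    IF in_orders AND in_items;
RUN;

PROC PRINT DATA=merged_demo;
RUN;
```

The result should hold the two item rows for order 1, each carrying order_type from orders_demo. Changing the subsetting line to IF in_orders would also keep order 2, with missing item_num and quantity, and would still drop order 3.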
4.4.3 Appending
[Link]
[Link].
We will append three datasets that include information on orders from 3 consecutive months (July-September) [Link]
records from each of the datasets to be appended.
[Link]
number of datasets does not matter.
DATA mnth7_8_9_2011 ;
SET idre.mnth7_2011 idre.mnth8_2011 idre.mnth9_2011;
RUN;
A portion of the newly appended dataset is below.
Now you can see that all 3 datasets have been appended or "stacked" [Link]
[Link]?
Take a look back at our "shoe" [Link], [Link]
same variables except two, product_id and supplier_name.
What will happen when we attempt to append the data?
DATA shoes;
SET idre.shoes_eclipse idre.shoes_tracker;
RUN;
[Link], in the new "shoes" data we created, all the records from the Eclipse dataset will be missing on the
variables that were only in the Tracker dataset.
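What happens to non-shared variables when datasets are stacked can be reproduced with one row per dataset. The names eclipse_demo, tracker_demo and stacked_demo are hypothetical, as is the choice of which variable lives in which dataset.

```sas
* Has product_id but no supplier_name;
DATA eclipse_demo;
    INPUT product_name :$20. product_id;
    DATALINES;
Runner 101
;
RUN;

* Has supplier_name but no product_id;
DATA tracker_demo;
    INPUT product_name :$20. supplier_name :$20.;
    DATALINES;
Hiker Acme
;
RUN;

* Stack the two datasets with a single SET statement;
DATA stacked_demo;
    SET eclipse_demo tracker_demo;
RUN;

PROC PRINT DATA=stacked_demo;
RUN;
```

The stacked dataset has two rows and three variables. The row from eclipse_demo has a blank supplier_name and the row from tracker_demo has a missing product_id, so the gap appears in both directions, not only for the first dataset.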
5.0 Modifying SAS Output
5.1 Titles and Footnotes
As you have probably already noticed, SAS provides a lot of output from many of its [Link]
[Link]
[Link].
Whenever you are presenting tables of information, [Link]
[Link], it is also possible to add multiple titles to output in SAS as well as footnotes by just adding a numeric suffix to the statement indicating the
desired ordering. SAS allows for up to 10 different titles and/or footnotes.
TITLE1 'Orion Star Sales Staff';
TITLE2 'Salary Report';
FOOTNOTE1 'Confidential';
PROC PRINT DATA=[Link] (OBS=5);
VAR Employee_ID
Last_Name Salary;
RUN;
5.2 Label Options
Additionally, [Link]
[Link], but if you have to label 10 variables, available
[Link]
control the display of the title so that instead of the label being one line, you can split it into two lines.
PROC PRINT DATA=[Link] (OBS=5) SPLIT='*';
VAR Employee_ID Last_Name Salary;
LABEL Employee_ID = 'Sales ID'
Last_Name = 'Last*Name'
Salary = 'Annual*Salary';
RUN;
[Link]
[Link]:
TITLE;
FOOTNOTE;
5.3 Formats
Beyond just labeling variables, [Link].
Formatting values changes the appearance of those values in output but the underlying values do NOT change.
[Link]
[Link], we will focus on how to create and apply user-defined formats.
In SAS, [Link], take a look at the syntax below:
PROC FORMAT;
VALUE $ctryfmt 'AU'='Australia'
'US'='United States'
other ='Miscoded';
VALUE tiers 0-49999='Tier 1'
50000-99999='Tier 2'
100000-250000='Tier 3';
RUN;
[Link] "$" in front of
[Link]
[Link] "Miscoded".
For numeric formats, [Link].
In both Data steps and Proc steps, SAS distinguishes formats from variables by ending them in a period, which then turns the text green.
PROC PRINT DATA=[Link] (OBS=5);
VAR Employee_ID Salary Country Birth_Date Hire_Date;
FORMAT Salary tiers. Birth_Date Hire_Date monyy7. Country $ctryfmt.;
RUN;
[Link]
certain procedures in SAS, [Link],
then you can use the same format statement in a Data Step.
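As a sketch of attaching a user-defined format inside a Data step, the example below defines a small character format and stores it with a variable, so that later procedures display the formatted values without their own FORMAT statement. The format name $ctry_demo and the dataset fmt_demo are hypothetical and chosen so they do not collide with the seminar's own formats.

```sas
PROC FORMAT;
    VALUE $ctry_demo 'AU'='Australia'
                     'US'='United States'
                     OTHER='Miscoded';
RUN;

DATA fmt_demo;
    INPUT country $;
    * The format is stored with the variable in the dataset;
    FORMAT country $ctry_demo.;
    DATALINES;
AU
US
XX
;
RUN;

* No FORMAT statement is needed here;
PROC PRINT DATA=fmt_demo;
RUN;
```

The printed values should read Australia, United States and Miscoded, while the stored values remain AU, US and XX.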
5.4 Output Delivery System (ODS) Basics
Besides customizing the SAS default output, you may want to output results to different file types. By default SAS 9.4 outputs results as HTML and this is
what you see in the "Results Viewer" [Link], you will need to use the Output Delivery System (ODS) statement.
This will allow for output in several different formats including listing/text, rtf, [Link].
[Link](s) is
executed, [Link]:
ODS PDF FILE="&path\[Link]";
ODS RTF FILE="&path\[Link]";
PROC FREQ DATA=<data>;
TABLES <variable>;
RUN;
ODS PDF CLOSE;
ODS RTF CLOSE;
In this case, [Link]
[Link].
[Link]
Results tab.
ODS LISTING;
PROC FREQ DATA=[Link];
TABLES gender;
RUN;
ODS LISTING CLOSE;
As a user, you can also customize the style or look of the output when selecting either a html, [Link]
[Link]:
ODS HTML FILE="C:\[Link]" STYLE=sasweb;
PROC FREQ DATA=[Link];
TABLES gender;
RUN;
ODS HTML CLOSE;
ODS PDF FILE="C:\[Link]" STYLE=printer; /*Default*/
ODS PDF FILE="C:\[Link]" STYLE=journal;
PROC FREQ DATA=[Link];
TABLES gender;
RUN;
ODS PDF CLOSE;
[Link], be mindful that this will also close the html default, so you will need to
[Link], SAS will issue the warning "No output
destination active".
ODS _ALL_ CLOSE;
ODS HTML;
6.0 Special Issues
6.1 Dealing with Duplicates
[Link].
[Link] "nonsales" data file, we should have 235 unique employee
[Link]=[Link]
number in descending order.
PROC FREQ DATA=[Link] ORDER=FREQ;
TABLES Employee_ID;
RUN;
Above you can see that the employee ID #120108 has two records associated with it, [Link]
NLEVELS, which displays the number of distinct values for each variable.
PROC FREQ DATA=[Link] NLEVELS;
TABLES Employee_ID /NOPRINT;
RUN;
There are 235 employee records in the "nonsales" data but only 234 unique levels, meaning that one employee ID # is duplicated.
Once the presence of duplicate ID numbers has been confirmed, you will most likely want to examine them to determine if they are indeed duplicate records
[Link], what do you do when several
IDs or records are duplicated? Let's separate the records with unique IDs from the duplicates using an IF statement.
PROC SORT DATA=[Link] OUT=ids2;
BY employee_id;
RUN;
DATA dupes nodupes;
SET ids2;
BY employee_id;
IF NOT (FIRST.employee_id and LAST.employee_id) THEN OUTPUT dupes;
ELSE OUTPUT nodupes;
RUN;
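The same FIRST./LAST. logic can be traced by hand on four made-up ID values. In this sketch ids_demo, dupes_demo and nodupes_demo are hypothetical names, and the input is typed in sorted order so that no PROC SORT is required.

```sas
* ID 102 appears twice;
DATA ids_demo;
    INPUT employee_id;
    DATALINES;
101
102
102
103
;
RUN;

DATA dupes_demo nodupes_demo;
    SET ids_demo;
    BY employee_id;
    * A row that is both first and last in its BY group is the only row with that ID;
    IF FIRST.employee_id AND LAST.employee_id THEN OUTPUT nodupes_demo;
    ELSE OUTPUT dupes_demo;
RUN;

PROC PRINT DATA=dupes_demo;
RUN;
```

Here nodupes_demo should receive IDs 101 and 103, and dupes_demo should receive both rows for ID 102, which keeps every copy of a duplicated record available for inspection.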
[Link]
[Link], [Link]'s where the first and last
records are not the same, to a dataset called "dupes", and all the other unique records are put in a dataset called "nodupes".
"Dupes" or duplicated employee ID numbers:
"NoDupes" or unique employee ID numbers.
6.2 Identifying Outliers
[Link].
By default, [Link]'s examine outliers for product prices in the "price_new"
dataset.
PROC UNIVARIATE DATA=idre.price_new;
VAR unit_cost_price;
RUN;
You can override this default by specifying the option NEXTROBS= [Link]
[Link], SAS also provides an observation or row
[Link], [Link]
[Link]'s try adding the product identifier Product_ID to each of our extreme values.
PROC UNIVARIATE DATA=idre.price_new NEXTROBS=3;
VAR unit_cost_price;
ID Product_ID;
RUN;
Now each extreme value is associated with its ID number.
7.0 Wrapping things up
As we stated in the beginning, SAS is a very flexible program with great features for data management.
This seminar only scratches the surface in describing all of the programming options available to users.
For more information on the topics discussed here, please explore our website.
Additionally, SAS has a host of courses designed to improve your programming skills, aimed at users of all levels.
The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.
2016 UC Regents