S for SAS-Post your comments at http://s4sas.blogspot.

com

1

S for SAS-Post your comments at http://s4sas.blogspot.com

2

Preface
SAS is one language everyone understands in Consumer Finance Analytics! That explains why an Analyst should learn it for an effective communication. This guide is an effort to equip a beginner consumer finance analyst with basic SAS skills, in a week’s time. This guide attempts to: 1. Simplify SAS for the beginners, trimming the possibilities down to a bare minimum. 2. Give an introduction to the possibilities SAS offers for data extraction, analysis and reporting. 3. Customize the content to Consumer Finance environment and the way SAS being used there. 4. Provide more examples and step-by-step instruction to the readers to practice SAS programs. Target audience is expected to have basic computer skills like working with a spreadsheet or word processor. Also the user should know how to edit a program in a Windows based SAS System. No statistics or programming background is expected. Practice problems at the end of the chapters are collected from SAS Institute website and are tested in Windows environment. Readers are encouraged to tryout each of them by reproducing them in SAS System. Your suggestions to improve this working guide are most welcome! Please go to the discussion forum http://s4sas.blogspot.com to post a comment.

JEOMOAN KURIAN Jeo.kurian@gmail.com

S for SAS-Post your comments at http://s4sas.blogspot.com

3

Contents
Preface ............................................................................................................. 2 Consumer Finance Analytics ......................................................................... 5 Scope of Analytics in Consumer Finance ................................................. 6 Introduction to SAS System ........................................................................... 8 SAS Language in Nutshell ............................................................................. 9 First Steps with Data and SAS..................................................................... 10 Data Value, Variable, Observations, and Dataset ................................. 10 Rules for SAS names................................................................................ 11 Rules for SAS statements ........................................................................ 12 A Simple Program Explained ....................................................................... 13 Getting Data In .............................................................................................. 17 Reading Data Instreams and External Files using INPUT .................... 17 Reading data using PROC IMPORT ....................................................... 21 Data Extraction using PROC SQL........................................................... 22 Reading EBCDIC files using SAS............................................................ 23 Working with Datasets .................................................................................. 25 SAS variables ............................................................................................ 25 Creating variables with INPUT Statement .............................................. 27 Specifying a New Variable in a LENGTH Statement ............................. 28 Creating variables through PROC SQL and PROC IMPORT............... 29 Array Variables .......................................................................................... 30 Variable LABELs ....................................................................................... 31 Variable FORMATs ................................................................................... 32 SAS Functions in DATA steps ................................................................. 34 MERGE –Combining datasets ................................................................. 36 Conditional Processing with WHERE, IF-ELSE, DO-END.................... 37 Procedures for Data Insights ....................................................................... 42 PROC DATASETS .................................................................................... 42 PROC PRINT............................................................................................. 43 PROC SORT.............................................................................................. 44 PROC TRANSPOSE ................................................................................ 46 PROC DOWNLOAD.................................................................................. 48 PROC FREQ.............................................................................................. 48 PROC MEANS........................................................................................... 51 PROC GPLOT ........................................................................................... 52 SAS Powered Reporting............................................................................... 56 PROC EXPORT ........................................................................................ 56 ODS HTML................................................................................................. 56 Dynamic Data Exchange (DDE) .............................................................. 58 SAS Macro Processing................................................................................. 61 SAS Macro Variables................................................................................ 61 Macro Programs ........................................................................................ 63 PROC SQL and SAS .................................................................................... 66 Data extraction with PROC SQL.............................................................. 66 SAS Data steps with PROC SQL ............................................................ 67 PROC SQL and SELECT statement ....................................................... 68 Data retrieval Methods using SELECT ................................................... 69 Unix for SAS Analysts................................................................................... 75

........................blogspot.................................. 85 Convert missing values to zero and values of zero to missing.......................................................... 90 Convert a SAS date to a character variable ........................................................................................ 98 PROC SUMMARY......................................................................... 80 Reading data using simple list input............. 93 Using _TEMPORARY_ arrays for Missing value Treatment............................................................csv extension .. 95 BASIC PROC SQL Exercises ........................................................................S for SAS-Post your comments at http://s4sas..................... 75 A List of Useful UNIX Commands.. 84 Merging and creation of subsets based on Origin ............................................................. 93 Using arrays and DO loop in Data Step................................................................................................................ 100 PROC GPLOT ....................... 91 DO LOOP block.............................................................................................................................. 86 Create and apply user-defined formats....................................................................................................... 83 Concatenating data sets Using SET ........ 99 Comparison of MEANS and SUMMARY Output.................................................... 97 PROC MEANS.......................... 79 Appendix-I Practice Programs ............................... 95 MERGING using PROC SQL...................... 76 Sum-up With OPTIONS... 94 A Simple SAS Macro .......Options available............... 101 Quick Index ......... 86 Convert values from character to numeric......................... 81 Reading multiple records to create one observation ..................... 80 Reading data using named input ........................ 88 Use of MDY function ........... and days between two dates.......................................................................................................... TAB or delimited file .......... 92 INDEX Function for String Search....................... 82 Reading a comma delimited file with a ..................................... 83 Creating an external file with column-aligned data .............................................................................................. 83 Creating a delimited file using a PUT statement ............................................................................................................................................................................................................................ 87 Working with Dates in the SAS System .......................... 90 Calculate number of years....................... 102 ................................ 75 SASWORK in UNIX .................................................................... 80 Reading data using formatted input.... 91 Using the SCAN function........................................................................... 90 Determine the week number of the year....................................................... 85 Convert selected numeric values from zero to missing........................................................................... 81 Reading a TAB delimited file............................................. 82 Use of PROC IMPORT to read a CSV............................................................... 96 PROC FREQ. 87 Convert values from numeric to character............................. 80 Reading data using column input ........................ 84 A Simple MERGE................................................................ 81 Reading comma delimited data with modified list input..................... months.....................................................com 4 Unix Server Spaces or Remote Storage.............................................................................

has their major share of income coming through PLCC cards (Private Label Credit Cards).S for SAS-Post your comments at http://s4sas. Auto loans and mortgages examples. Sam’s Club and labeled as their own store cards (hence the name Private Label).dominated by CITI and GE Money. The major consumer finance lenders in the US are CITIFinancial.blogspot. However. Sears. Unsecured Loans: The loans that are not backed by any security. . interest income from the customer for the credit that is revolved and other fee income like late fees. the customers often get special discounts and reward points when they shop in these stores with the retailer specific cards. However in today’s competitive world.that is lending to people with less than perfect credit history. Consumer finance covers a variety of products under secured and unsecured loans: Secured Loans: The loans that are guaranteed with some asset like a car or a house so that repossession of the asset is possible in case the consumer defaults the payment. HSBC and Wells Fargo. homes and such high value items. GE Money. it’s not hard to find that some finance companies pay a percentage of the business to the retailer or retailer is absorbing some loss from the finance companies(This mostly depend on the retailer’s standing in the Market). the term "consumer finance" often refers to sub prime lending. The finance companies get a discount income from the retailer (12%). These cards are typically issued for retail chains like Wal-Mart. Examples are Personal Loans and Credit cards. Consumer finance companies also offer credit through dealers and brokers (refereed as dealer financing or retail sales financing) for financing cars. The sub prime lending . All banks lend to the prime segments but the sub prime lending is usually done by specialized institutions. The retailers benefit from the increased sales as the cardholders shop in where they can get credit and special rewards. Consumer finance companies are the risk takers and any write off or defaults on the payments are usually absorbed by them. Availability of credit to the sub-prime segment is at a rate higher than the people with relatively better credit history. GAP. Home Depot. There is no repossession exists for this kind of loans but the lender can take a legal action in case of default.Apart from its sub-prime nature. boats. The financial institutions like GE Money manage the credit program for these retailers by funding the retailer next day of purchase and collecting it back from the customers.com 5 Consumer Finance Analytics The term “consumer finance” refers to any form of financing to consumers. over limit charges and insurance charges. in the United States. However the interest rate on these cards is often higher than the bankcard rates because of the average high risk profile of the customers.

Scoring and Modeling: Another major area of SAS application is scoring and modeling. SAS is used in meeting all or some of the above requirements. In such a scenario. SAS is used widely for all these purposes due to its flexibility to work with multiple platforms and databases and produce reports in various formats like excel html. managers depend on the analytics department to control the risk-returns statistically. SAS based scorecard and segmentation techniques are used to mine the address lists and demographic information available in the credit bureaus to arrive at this prospect lists. This requires data warehousing.S for SAS-Post your comments at http://s4sas. Often the pre approved (pre screened) credit cards are mailed to prospects. scorecard development. it important to monitor them for potential risk. tracking dashboards and comprehensive reporting systems. Let us look at some of these specific uses in detail. The company should have a robust MIS and reporting system for the managers to know how each segment of the customers is doing and take actions if things are not going in the right direction. acquisition and strategy tracking dashboards are developed to monitor the acquisition process and also to compare various strategies in a champion-challenger framework. Such scorecards predict the probability of a person responding to an offer mail or probability of a default or willingness of a customer to repay or rank orders any such desirability. to come out with a decision. Further. Logistics regression based scorecards are developed at various lifecycles of the products. average risk profile of the people. the scorecards developed in using SAS score the applications. credit line management. access or SAS itself. More than 60% of resources in any analytics departments spend time to develop and produce these important reports. Operations use SAS to analyze the call volume in their contact centers and how to optimize the system to . sales promotion. is relatively high. Segmentation tools in SAS are favorites of marketing managers as they can zero down to the prospects they want to do the campaign for activation or sales promotion. account activation. In cases of walk-in applications. A typical loss rates of PLCC cards are 7% of receivables but for a bankcard its just 5.com 6 As we have seen earlier.blogspot. If implemented. often online. Scope of Analytics in Consumer Finance Consumer finance is more like a volume based business where million of loan applications and accounts are handled with thin profit margins.This is the first place where analytics is used to target and acquire the right people. cross-selling and collections are some of the activities analytically driven in the consumer finance. a consumer finance company deal with.5%. Once the accounts are acquired. data segmentation and pattern analysis. Acquisition. Account Management –Authorization. This is one reason the interest charged on the customer is high for sub-prime segment.

marketing. all risk. some of the SAS tools/modules/utilities typically in consumer finance 7 environment are data extraction. In short. cluster and principal component analysis.com improve the collections or cross sell efforts. manipulation and formatting features of Base SAS . . SAS has built in modules and function to handle data from various sources and convert them into a format required for the analytics purposes. logistics regression .blogspot. transformation and Loading) process here requires working with internal data warehouse and other department specific files/databases. To summarize .S for SAS-Post your comments at http://s4sas. SQL procedures and reporting using ODS/DDE with excel interface. SAS data marts are quite popular among analytics users to subset data from these huge warehouses and make it time series based or department specific. Though most of the enterprise data warehouses now use Oracle for storing data . Data Management: SAS is also used for data warehousing. collections and operations departments use SAS to analyze data and reporting. ETL (extraction.

PROCs (PROCedures) . most of the interaction with SAS system in analytics is done through writing SAS programs. or PROCs. and mining Report writing and graphics Statistical Analysis and Operations Research Forecasting and Decision Support Data Warehousing (Extract. and the data are assigned to the Variables. For example: PROC PRINT: Prints the contents of a data set and create reports. Transform. Variables corresponding to the various elements of the data set are defined. originally Statistical Analysis System.the SAS language is organized into a series of procedures. platforms like Unix and Mainframe do not provide a menu driven interface for SAS. PROC REG: Performs regression analysis using the method of least squares. or they may be read in from a file. retrieval. Load) Though SAS system provides a menu driven interface. each of which is dedicated to a particular form of data manipulation or statistical analysis to be performed on data sets created in the DATA step.com 8 Introduction to SAS System The SAS System. management. . is an integrated system of software products provided by the SAS Institute that enables a user to perform: • • • • • Data entry.S for SAS-Post your comments at http://s4sas. PROC GPLOT: Constructs plots of the data as specified by the user. PROC TTEST: Computes the 2-sample t-test for comparing the means of 2 treatments. Data may be input manually in the body of the program. A SAS program consists of one or more DATA steps to get the data into a format that SAS can understand and one or more calls to PROCs to perform various analyses on the data. Also. standard deviations and other summary Statistics for some or all of the variables in a data set.the part of the program in which a structure for the data to be analyzed is created. PROC MEANS: Computes means. Additionally data may be stored in a data warehouse (for example Consumer Data Warehouse in Stoner server).blogspot. A SAS program is composed of two fundamental components: DATA step(s). SAS programs provide high level of flexibility to the user. PROC FREQ: Produces frequency and cross tabulation tables on the variables specified.

DATA. A SAS program consists of a series of DATA. SORT. data transformation and PROCedure statements. ELSE. INPUT. IF / THEN. PROC GPLOT.blogspot. An entry level SAS user should at least know how to use the following SAS statements to have a control over the language. . clean. CARDS. SAS programs are written using SAS language to manipulate. PROC SQL.com 9 SAS Language in Nutshell To use SAS it is necessary that you are acquainted with the scripting known as SAS language. LABEL. FORMAT. describe and to do data analysis. PROC MEANS. SET. TITLE. WHERE. you get more familiar with these statements/procedures and various features and options that are available with each one of them. In the chapters that follow. MERGE. PROC PRINT.S for SAS-Post your comments at http://s4sas. PROC FREQ.

Account # Open Date Status Code Credit Limit 1234670 11-Sep-04 Z 2000 1234671 12-Sep-04 3000 1234672 13-Sep-04 Z 2500 1234673 14-Sep-04 T 3200 1234674 15-Sep-04 8000 1234675 16-Sep-04 D 2000 1234676 17-Sep-04 4000 1234677 18-Sep-04 S 6000 1234678 19-Sep-04 T 8000 1234679 20-Sep-04 T 2000 Data Value. Account Open Date. and data set. Observations. It has all the account numbers of the sample we have for the credit card holders .‘D’. The DATA VALUES in the field ‘status code’ are ‘Z’. Each column of data values is a VARIABLE. the first column in our data set is reserved for the VARIABLE we'll call Account #. For example. Variable. Most of the times. VARIABLE A set of data values that describes a given attribute makes up a VARIABLE. Account Status Code and current Credit Limit. Let us start with a generic example to understand how SAS reads and understands data. the DATA VALUES are 2000. ‘S’ and ‘T’. Information includes account number. and Dataset DATA VALUE Data value is the basic unit of information.S for SAS-Post your comments at http://s4sas. But many a times we need to create our own datasets for testing certain procedures or functions so this learning comes handy. In the field containing information about the account holder’s credit limit. the data is extracted from a data warehouse. variable. Now let us look at some data features. Below is the table that provides some information about 10 accounts holders of a credit card company. 3000 etc.com 10 First Steps with Data and SAS Objectives of this chapter is to: • • Understand the terms: data value. observation.blogspot. Understand the rules for writing SAS statements and for naming variables and data sets.

.) for missing data. A SAS DATA SET is the special way that SAS organizes and stores the data. or a record and so on. The row below represents all the data values associated with OBSERVATION #1. numeric digits (0. . 1. In SAS. colons and percent signs. others have a maximum length of 8. procedures. Open date is a field that deserves special mention. OBSERVATION All the data values associated with a case. Subsequent characters can be letters. dollar signs. Many SAS names can be 32 characters long. 3. . Values of numeric variables can only be numbers or a period (. 1. There are certain procedures where the outcome of their execution results in creation of one or many datasets. an individual. The first character must be a letter (A. . Account # Open Date Status Code Credit Limit 1234670 11-Sep-04 Z 2000 DATA SET A DATA SET is a collection of data values usually arranged in a rectangular table (or matrix). Each row of the data table (or Matrix) represents one OBSERVATION.. You can use upper or lowercase letters. C.S for SAS-Post your comments at http://s4sas.blogspot. 2. make up an OBSERVATION. options. as well as numeric digits.numeric and character. 9). .. SAS data sets. Character variables can be made up of letters 11 and special characters such as plus signs. if we convert our sample into a SAS dataset we will have a data set with 4 columns (fields) and 10 rows with 3 numeric fields and 1 character field. The DATA step creates the SAS data set and the PROC steps are instructions indicating how the SAS data set is to be manipulated or analyzed. formats. For example.com SAS variables are of 2 types . B. . and statement labels. a year. a single entity. dates are stored as numeric but displayed in various format using SAS formats. In the sample data above. SAS processes names as uppercase regardless of how you type them. a subject. account number and credit limit are numeric variables and status code is a character variable. Z) or underscore (_). Rules for SAS names Among the kinds of SAS names that appear in SAS statements are variables names. or underscores. .

6. 4. 12 Special characters.S for SAS-Post your comments at http://s4sas.com 4. the blanks are not necessary. . are not allowed. If the items are special characters such as '='. SAS reserves a few names for automatic variables and variable lists. In file reference. Rules for SAS statements 1. you can use the dollar sign ($). pound sign (#). Blanks cannot appear in SAS names. except for the underscore. For example. 3. Some SAS statements may consist of more than one line of commands. SAS statements must end with a semicolon (. 5. and at sign (@). A SAS statement may continue over more than one line. 5. 2. '+'. '$'.). One or more blanks should be placed between items in SAS statements. _N_ and _ERROR_ .blogspot. SAS statements may begin in any column of the line.

blogspot. run.S for SAS-Post your comments at http://s4sas. 7. run. data rloc. Learn to use the CARDS statement 4. Learn to use the DATA statement 2. INPUT Account 1-7 OpenDate $ 9-17 StatusCode $ 21-22 CreditLimit 25-29. CARDS. and INPUT statements and also demonstrates the use of two procedures viz.sample_accounts. Data sample_accounts. Proc Print data = loc. Title " Account Sample". Libname loc ‘c:\mydata’. 1234670 11-Sep-04 Z 2000 1234671 12-Sep-04 3000 1234672 13-Sep-04 Z 2500 1234673 14-Sep-04 T 3200 1234674 15-Sep-04 8000 1234675 16-Sep-04 D 2000 1234676 17-Sep-04 4000 1234677 18-Sep-04 S 6000 1234678 19-Sep-04 T 8000 1234679 20-Sep-04 T 2000 .) 5. Learn RUN statement. Learn how to include TITLES on your output 6.sample_accouts. This program demonstrates the use of DATA. Two of them are numeric and rests of the columns are characters. This program creates a SAS dataset names Sample accounts with 4 columns. Learn how to use the semicolon (. run. CARDS. PROC PRINT and PROC FREQ. Learn how to run a SAS program Let us start with a simple SAS program.sample_accounts . Each and every component in the program is explained in detail below. Learn to use two Procedures – PRINT and FREQ 8. run. tables statuscode. Proc Freq data =loc. /* stores data permanently*/ set sample_accounts.com 13 A Simple Program Explained Objective of this chapter: 1. How to create a permanent dataset using LIBNAME 9. . Learn to use the INPUT statement 3.

Result: Input data are defined for SAS The INPUT statement specifies the names and order of the variables in your data. 14 Libname is used to declare a data library. Open date is read as character variable to simplify the example. DATA Statement Use: Names the SAS data set Syntax: DATA SOMENAME. INPUT Statement Use: Defines names and order of variables to SAS Syntax: INPUT variable variable_type column(s). These procedures are explained in details later in this book. Data sample_accounts. In this program. Although there are three types of INPUT statements. which can be mixed. data is stored in ‘C:\mydata’ using this library reference. LIBNAME Statement Use: define a SAS library name Syntax: Libname xx <folder reference>.S for SAS-Post your comments at http://s4sas. input and proc statements are case insensitive.blogspot.com Note: Data. Proc Print and Proc Freq are used to show how these two procedures used to print data. Sas data can be stored permanently by saving it to a library. . The DATA statement names the data set you are creating. INPUT Account 1-7 OpenDate $ 9-17 StatusCode $ 21-22 CreditLimit 25-29. Result: A temporary SAS data set named sample_accounts is created The DATA statement signals the beginning of a DATA step. the beginning SAS user should only be concerned with learning how to use the Column Input style. Libname loc ‘c:\mydata’. The general form of the SAS DATA statement is: DATA SOMENAME. The name should be 1-32 characters and must begin with a letter or underscore.

(DATA.blogspot.txt'. INPUT Account 1-7 CARDS. Result: Data can be processed for the SAS data set The CARDS statement signals that the data will follow next and immediately precedes your input data lines. the INPUT statement should indicate how many lines it takes to record all of the information for each case or observation. Result: SAS is signaled that the statement is complete .com The INPUT statement should indicate in which columns each variable might be found. CARDS Statement Use: Signals that input data will follow Syntax: CARDS. OpenDate $ 9-17 StatusCode $ 21-22 CreditLimit 25-29. Proc Print data = sample_accouts. 1234670 11-Sep-04 Z 2000 1234671 12-Sep-04 3000 1234672 13-Sep-04 Z 2500 1234673 14-Sep-04 T 3200 1234674 15-Sep-04 8000 1234675 16-Sep-04 D 2000 1234676 17-Sep-04 4000 1234677 18-Sep-04 S 6000 1234678 19-Sep-04 T 8000 1234679 20-Sep-04 T 2000 . In addition. (Example: INFILE 'c:\accounts. The general form of the CARDS statement is: INPUT Account 1-7 OpenDate $ 9-17 StatusCode $ 21-22 CreditLimit 25-29.S for SAS-Post your comments at http://s4sas. CARDS. run. instead of the CARDS. SEMICOLON Use: Signals the end of any SAS statement Syntax:A DATA Step or PROCedure statement. 15 Note: If the data is contained in an external file. The variables OperDate and StatusCode are character variable as indicated by the dollar sign ($) after the variable name The other variables are numeric. run.) DATA sample_accounts. Title " Account Sample". The general form of the SAS INPUT statement is: INPUT Account 1-7 OpenDate $ 9-17 StatusCode $ 21-22 CreditLimit 25-29.). you will use an INFILE statement to specify where that file resides.

Once the program is executed check the log for any errors or warning. PROC PRINT and PROC FREQ: These are two common procedures used to print the content of the dataset created and to produce a frequency table on a status code column. Program Editor is used to compose the programs and to execute it. Logs will normally give a clear idea whether the syntax was used correctly or the program was successfully run. which appears at the top of the output page.com The semicolon (.) is used as a delimiter to indicate the end of SAS statements. In Unix and Mainframe the programs are composed and executed differently. Please refer a relevant manual for the same. Log window displays the log of execution and Output window displays the output of the program. . Result: The statements and procedures specified in the SAS program blocks are executed. If we are using remote sign-on from windows. Many times we use remote ‘signon’ to log on to a Unix server and in that case the program is running is a remote server. How to Run a SAS Program 16 In a windows environment there are three different windows that help in successful program execution. TITLE Statement Use: Puts TITLES on your output Syntax: TITLE 'some title'. Result: A TITLE is added at the top of each page of the output printed. Good news is that SAS procedures and statements that are used in these platforms are same. RUN Statement Use: Instruct SAS to execute the SAS program Syntax: RUN. log and output are automatically downloaded to the local desktop. The TITLE statement assigns a title.S for SAS-Post your comments at http://s4sas. Title " Account Sample".blogspot.

Usually. Spaces are usually used to "delimit" (or separate) free formatted data. For example: DATA sample_accounts. because there are no delimiters (such as spaces. Data files from credit bureaus and other external data vendors are sent to us as large data files with various formats so its important to understand how to read them. 1234670 11-Sep-04 ZX 2000 1234671 12-Sep-04 N 3000 1234672 13-Sep-04 Z 2500 12346730 14-Sep-04 TN 3200 1234674 15-Sep-04 X 8000 1234675 16-Sep-04 D 2000 12346767 17-Sep-04 N 4000 1234677 18-Sep-04 S 6000 1234678 19-Sep-04 TS 8000 1234679 20-Sep-04 T 2000 . comma separated files. INPUT Account OpenDate $ StatusCode $ CreditLimit . To learn how to use INPUT. SAS understands only SAS datasets and formats so its important to convert all kinds of data into a form which SAS understands. RUN.that is. or tabs) to separate fixed formatted data. The data so converted are called SAS datasets.blogspot. PROC IMPORT and PROC SQL for reading data. commas. But its not unusual that we will have to read data from a variety of sources like spreadsheets.com 17 Getting Data In Objective of this Chapter • • To learn data sources and methods to read data into SAS. This approach is good for relatively small datasets. This chapter explains various ways SAS reads data to make SAS DATASETS. . Reading fixed formatted data instream(COLUMN INPUT) Fixed formatted data can also be read in-stream. INFILE. MS access tables or manually entering through program editor. Reading Data Instreams and External Files using INPUT Reading free formatted data instream(LIST INPUT) One of the most common ways to read data into SAS is by reading the data instream in a data step . In a Consumer Finance environment. by typing the data directly into the syntax of your SAS program. data warehouses are commonly used to store data. text files.S for SAS-Post your comments at http://s4sas. CARDS.

txt". INPUT Account 1-7 OpenDate $ 9-17 StatusCode $ 21-22 CreditLimit 25-29. RUN. if we rearrange the card accounts data from above. This also requires the data to be in the same columns for each case. Reading fixed formatted data from an external file Suppose you are working in a windows environment and you have a text file called ‘sample_accounts. INFILE "C:\Mydata\sample_accounts.txt’ look like: 1234670 1234671 1234672 1234673 1234674 1234675 1234676 1234677 1234678 1234679 11-Sep-04 12-Sep-04 13-Sep-04 14-Sep-04 15-Sep-04 16-Sep-04 17-Sep-04 18-Sep-04 19-Sep-04 20-Sep-04 Z Z T D S T T 2000 3000 2500 3200 8000 2000 4000 6000 8000 2000. 1234670 11-Sep-04 Z 2000 1234671 12-Sep-04 3000 1234672 13-Sep-04 Z 2500 1234673 14-Sep-04 T 3200 1234674 15-Sep-04 8000 1234675 16-Sep-04 D 2000 1234676 17-Sep-04 4000 1234677 18-Sep-04 S 6000 1234678 19-Sep-04 T 8000 1234679 20-Sep-04 T 2000.txt". Here is what the content of the ‘C:\Mydata\sample_accounts.blogspot. This file can be read into SAS by using an INFILE statement in DATA step Data sample_accounts. we can read it as fixed formatted data: 18 Data sample_accounts. INFILE statement can also be used to provide reference to the datafile. This is how it will look like: filename sacc "D:\Mydata\sample_accounts. INFILE sacc .S for SAS-Post your comments at http://s4sas. For example. Data sample_accounts. INPUT Account 1-7 OpenDate $ 9-17 StatusCode $ 21-22 CreditLimit 2529.txt’ in ‘C:\Mydata’ directory. RUN. INPUT Account 1-7 OpenDate $ 9-17 StatusCode $ 21-22 CreditLimit 2529. CARDS. RUN. That is.com column definitions are required for every variable in the dataset. Alternately. you need to provide the beginning and ending column numbers for each variable. .

.2000.20-Sep-04.16-Sep-04.19-Sep-04. RUN.Z.2000 1234676.T.T.8000 1234675. suppose you have a tab delimited file named sample_accounts.txt’ into SAS by the following method: .13-Sep-04. Reading Tab delimited data from an external file Free formatted data that is TAB delimited can also be read from an external file.2000 1234671.. Here's what the data in the file look like: 1234670.6000 1234678.txt that is stored in the ‘C:\Mydata’ directory of your computer.12-Sep-04.2500 1234673.D.T.S for SAS-Post your comments at http://s4sas.csv into SAS by using the following method: Data sample_accounts.3000 1234672.8000 1234679.14-Sep-04.15-Sep-04.com 19 Reading comma delimited data from an external file Free formatted data that is comma delimited can also be read from an external file.blogspot. INFILE "C:\Mydata\sample_accounts. INPUT Account OpenDate $ StatusCode $ CreditLimit .csv stands for comma separated values) that is stored in the C:\Mydata directory of your computer.17-Sep-04.S.'.Z. suppose you have a comma delimited file named sample_accounts..4000 1234677.csv (. For example. Here's what the data in the file look like: 1234670 1234671 1234672 1234673 1234674 1234675 1234676 1234677 1234678 1234679 11-Sep-04 12-Sep-04 13-Sep-04 14-Sep-04 15-Sep-04 16-Sep-04 17-Sep-04 18-Sep-04 19-Sep-04 20-Sep-04 Z Z T D S T T 2000 3000 2500 3200 8000 2000 4000 6000 8000 2000 We could read the data from ‘sample_accounts. We could read the data from sample_accounts.csv" DLM ='.18-Sep-04.11-Sep-04. For example.3200 1234674.

05 3320899201/19/2001$702. Many a time the data we get are formatted and its necessary to read them as it is.61 run. INPUT acctnum $8. Also the positions to read are specified. Note:. date mmddyy10. . the DLM= option and/or DSD option on the INFILE statement will need to be specified.blogspot.77 0345900601/19/2001$1.209. CARDS. For various informats for data and numbers please checkout the SAS help or Manual. TRUNCOVER enables you to read variable-length records when some records are shorter than coded for on the INPUT statement.59 1028754301/17/2001$672. If record lengths exceed 256 bytes then add the LRECL= option to the INFILE statement to specify a larger record length (.. Note that informats are specified in the INPUT statement to instruct SAS that the data following should be understood in that form (mmddyy is a date format and comma9. Have a look at the following example: DATA acctinfo. This is especially true when we read data containing date and decimal places from text files.S for SAS-Post your comments at http://s4sas. RUN. An informat gives the data type and the field width of an input value. amount comma9. Informats also read data that are stored in nonstandard form. 0074309801/15/2001$1. Note: If your data is delimited by another character other than a blank or space. an informat follows a variable name and defines how SAS reads the values of this variable. INFILE "C:\Mydata\sample_accounts.003. TRUNCOVER may need to be specified on the INFILE statement to prevent SAS from reading more than one record at a time when reading variable length. is a number format with commas to read easily). Also. or numbers that contain special characters such as commas. such as packed decimal. Some data files may also require additional INFILE statement options.txt" DLM='09'x INPUT Account OpenDate $ StatusCode $ CreditLimit .With formatted input. .com 20 Data sample_accounts. Reading data using formatted input Discussion on reading data into SAS would be incomplete without mentioning how to read the formatted input.

XLS . DBMS files and other delimited files.MDB . GETNAMES=YES.WK3 . Identifier EXCEL2000 ACCESS DBF WK1 WK3 WK4 EXCEL EXCEL4 EXCEL5 EXCEL97 DLM CSV TAB Input Data Source MS Excel Version 2000 Microsoft Access database dBASE file Lotus 1 spreadsheet Lotus 3 spreadsheet Lotus 4 spreadsheet Excel Version 4 or 5 spreadsheet Excel Version 4 spreadsheet Excel Version 5 spreadsheet Excel 97 spreadsheet delimited file (default delimiter is a blank) delimited file (comma-separated values) delimited file (tab-delimited values) Extension . There are several DBMS specifications available.WK4 .blogspot. PROC IMPORT OUT= WORK. By default the first row in excels would be read as column names. Suppose you have an excel sheet named Sample_Accounts.DBF .XLS .xls in ‘C:\mydata’ directory. The below table summarizes that. the following method could be used to read the data into a SAS dataset called Sample_accounts.XLS .xls" DBMS=EXCEL2000 REPLACE.Sample_accounts DATAFILE= "C:\Mydata\Sample_Accounts.WK1 . PROC IMPORT would be useful when we need to read the files repetitively for reporting and other analytics purposes.XLS .S for SAS-Post your comments at http://s4sas.* . RUN.CSV .com 21 Reading data using PROC IMPORT SAS uses a procedure called PROC IMPORT to read data from spreadsheets.XLS .TXT .

any data should be converted into a SAS dataset before SAS can carry out any operations on them. (a proprietary implementation of SQL by Oracle). SAS automatically converts the field formats in to SAS formats. a date field in Oracle will be converted into SAS date and an Oracle Varchar field will be converted into character. The below sample show how to use PROC SQL to extract data from one of the ORACLE data warehouses. Note that here SELECT that used is an ORACLE SQL PLUS command. billing_cycle_day AS billing_cycle_day FROM ACCOUNT_DIM WHERE CLIENT_ID='BROOK BROS' AND nvl(EXTERNAL_STATUS_REASON_CODE. Further. Further. As we have seen earlier.'0') <>'98').com 22 Data Extraction using PROC SQL In Consumer finance world. When querying a data warehouse. CREATE TABLE acct_status AS SELECT * FROM Connection To ORACLE (SELECT current_account_nbr AS account_number. each step is explained for a better understanding.external_status AS estatus. For example we embed Oracle SQL Plus code within PROC SQL so that data exaction is most optimized by Oracle. For example. PROC SQL is most widely used to extract data from various data warehouses. CREATE TABLE statement creates a SAS Dataset from the output of SELECT statement. Versatility of PROC SQL facilitates to use the RDBMS specific utilities and functions that make the data extraction process more efficient. PROC SQL.blogspot. . external_status_reason_code AS ext_rcode. Quit.S for SAS-Post your comments at http://s4sas. Disconnect From ORACLE. So one can use all the features of ORACLE while querying. Connect To ORACLE(User=501115644 Password=mypasswd Buffsize=10000 Path=CDCIT1 Preserve_Comments ). In the above example PROC SQL use the connect string “Connect To ORACLE(User=501115644 Password=mypasswd Buffsize=10000 Path=CDCIT1 Preserve_Comments )” to identify the oracle database (CDCITI) and use the user name and passwords specified to read data from it.

S for SAS-Post your comments at http://s4sas. DATA chm_rdc. Reading EBCDIC files using SAS In Consumer Finance environment we often come across EBCDIC data files whenever we deal with credit bureaus data and credit card processors like FDR. Library references: A common way to read a SAS dataset is to declare the folder that holds data sets as a SAS library. The above program declares a library called ‘rserver’ and assigns the folder ‘\projects\jkurian\mydata’. For example the below code tells SAS that the files in library rserver is version 6. INPUT @23 CHD_CLIENT_NUMBER $ebcdic4. set rserver. For instance. Further a dataset ‘cred_line’ is assigned(SET) to a data set ‘cred_line_new’ It is possible that there could be an older version of SAS dataset you need to read from an external source. Credit bureaus like Experian or Equifax and processors like FDR uses mainframe systems and hence their data files are EBCDIC. @31 CHD_PRIN_BANK $ebcdic4. but a 'P' in ASCII. . These two character sets represent the same data differently. Libname rserver '\projects\jkurian\mydata'. In such situation Libname should explicitly declare the dataset version. Windows. @35 CHD_AGENT_BANK $ebcdic4. @27 CHD_SYSTEM_NO $ebcdic4. The below example reads an EBCDIC data file named f95_sep from ‘C:\Mydata’.cred_line. Libname rserver v6 '\projects\jkurian\mydata'.com SAS datasets are used to store large amount of data in consumer finance environments. The below statement creates a SAS Library reference. the value '50'x is a '&' in EBCDIC.blogspot. 23 Some implementations of SAS Datasets like SAS SPDS are used for even data warehousing in Consumer Finance. UNIX. and Macintosh. Often we create SAS datasets and store them in shared drives and folders for future use. INFILE 'C:\Mydata\f95_sep' lrecl=32760 recfm =s370vb truncover . EBCDIC is the character encoding system of mainframes and ASCII is the encoding system on other machines such as VAX. run. data cred_line_new.

Note that @ sign is a column pointer. @39 reads data from column 39. Read the file into SAS and create a SAS Dataset.blogspot.18-Sep-04. Please refer the documentation on this at SAS support site for other options.13-Sep-04.6000 1234678.. Format. run.D.4000 1234677.T. Do a cross tab using PROC FREQ for account and open_date variables (First and second data fields) 1234670. 1234670 1234671 1234672 1234673 1234674 1234675 1234676 1234677 1234678 1234679 11-Sep-04 12-Sep-04 13-Sep-04 14-Sep-04 15-Sep-04 16-Sep-04 17-Sep-04 18-Sep-04 19-Sep-04 20-Sep-04 Z 2000 3000 Z 2500 T 3200 8000 D 2000 4000 S 6000 T 8000 T 2000 2.com 24 @39 CHD_ACCOUNT_NUMBER $ebcdic16. Key in the following data into SAS Program Editor and create a SAS dataset.8000 1234679. @130 CHD_EXTERNAL_STATUS $ebcdic1. Convert the date into MONYY7.txt Exercises: 1.Z.3000 1234672.12-Sep-04.S.2000 1234671. Create a file ‘sample_data.2000 1234676.Z. Read the date as mmddyy10. .com/techsup/technote/ts642.2000.11-Sep-04.3200 1234674. http://support. format and generate a PROC FREQ on the date variable.15-Sep-04.. ..17-Sep-04.8000 1234675.2500 1234673.19-Sep-04.sas.16-Sep-04.20-Sep-04.14-Sep-04.S for SAS-Post your comments at http://s4sas.T.T.txt’ with the data below and save it in your computer.

To simplify. drop. including dates and times. length. Characters are variables of type character that contain alphabetic characters. We did come across concepts like informats. SAS variables are containers that you create within a program to store and use character and numeric values. PROC contents. keep. Let us discuss some of these concepts in detail so that we understand some frequently used functions and methods used for data processing – to clean. i) ii) iii) iv) v) vi) vii) viii) How to create a variable and type? How to decide length of a variable? How to keep or drop a variable(s) from a dataset? What are array variables and how to declare them? How do we know the type of variables in an already created dataset? How to label and apply format to a variable? What’s a macro variable and how to declare them? What’s scope of a variable in a SAS program? How to create a variable . each and every field/column in a SAS dataset is a SAS variable.S for SAS-Post your comments at http://s4sas. and other special characters. Learn SAS commonly used functions for data processing. SAS variables Declaration. assignment.blogspot. Yes SAS stores date and time as Numbers. format and combine/subset data to make it ready for analytics purposes. Learn how to MERGE datasets. RENAME and FORMAT data. label. macro variables and scope of variables. IF/ELSE and DO-END • The previous chapters demonstrated how a simple SAS program looks like and how to read data from various sources into SAS. array. These are the questions we will discuss in this chapter around SAS variables. Learn conditional processing using WHERE. numeric digits 0 through 9.com 25 Working with Datasets Objectives of this Chapter: • • • • Learn about SAS Variables. There are two types of variables–Character and Numeric. formats and some procedures in SAS. Numeric variables are variables of type numeric that are stored as floating-point numbers. Learn how to LABEL.

1. Tot_cost is the variable that takes the value of a product of two other variables. run. as they are used 90% of the times. The above program creates a dataset ‘var_test’ with five variables. id ='JK'. proc print data=var_test . pro_price = 4. 1. run.S for SAS-Post your comments at http://s4sas. Have a look at the following program: Data var_test. Using an assignment statement This is the most common form of variable creation. ‘Id’ is assigned with a value of ‘JK’. final_price = tot_cost. -----Alphabetic List of Variables and Attributes----# Variable Type Len Pos --------------------------------------------2 NProducts Num 8 0 5 final_price Num 8 24 1 id Char 2 32 3 pro_price Num 8 8 4 tot_cost Num 8 16 .3 27. length and label of the variables in a dataset. As we have seen earlier.blogspot. final price variable is assigned with another variable in the dataset ie tot_cost. Using an assignment statement 2.55 27. And finally.run.3 PROC contents is the procedure used to know the data types. NProducts= 6. Nproducts and pro_price are assigned with numbers where latter is a decimal. Through a LENGTH statement and 4. As a result of a PROC SQL/PROC IMPORT. tot_cost = NProducts*pro_price. proc contents data =var_test. each variable will form a column/field in the dataset. There are many other ways too but we limit to these four types.com 26 There are four ways we commonly use to create a variable. Its not necessary that the variables should be declared well in advance. Using and INPUT statement 3.555. PROC Print prints the data sets and all variables and this is how the output looks like: pro_ final_ Obs id NProducts price tot_cost price -----------------------------------------------------1 JK 6 4.

05 3320899201/19/2001$702. proc contents data =acctinfo. types and lengths are assigned by SAS. SAS gives the variable a default type and length as shown in the examples in the following table. amount comma9. x='ABC'. INPUT acctnum $8.com These outputs together tell us how the variable creations are done and what values. Creating variables with INPUT Statement We have already seen how to use INPUT to read data into variables. This will lead to SAS truncating the variable in to the length of the variable created. date mmddyy10. In a DATA step.blogspot. SAS determines the length of a variable 27 from its first occurrence in the DATA step.. Below reproduced is one example to show how its done. x='ABCDE'.61 .run. Resulting Type of X Numeric variable Character variable Character variable Resulting Length of X 8 Explanation Default numeric length (8 bytes unless otherwise specified) Length of source variable 4 3 Length of first literal encountered Practical problems: Many a time the length of the variable is not sufficient to hold the value encountered during the data processing. This problem can be solved with declaring the length of the variable before assignment. 0074309801/15/2001$1.003. When the type and length of a variable are not explicitly set.209. a=’ABCD’ x=a.77 0345900601/19/2001$1.59 1028754301/17/2001$672. The new variable gets the same type and length as the expression on the right side of the assignment statement. We have also seen how to use SAS informats to tell SAS what kind of data its reading. DATA acctinfo. . CARDS.S for SAS-Post your comments at http://s4sas. run. Expression Numeric variable Character variable Character literal Example a=34 x=a. you can create a new variable and assign it a value by using it for the first time on the left side of an assignment statement. Now let us discuss some general rules of variable creation by assignment.

final_price = tot_cost.com Output : Alphabetic List of Variables and Attributes # Variable Type Len Pos ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 1 acctnum Char 8 16 3 amount Num 8 8 2 date Num 8 0 28 Here INPUT statement specifies the data type next to the variable and also how many positions (length). the length of the variable needs to be explicitly defined. length id $ 10.S for SAS-Post your comments at http://s4sas. tot_cost = NProducts*pro_price. Specifying a New Variable in a LENGTH Statement In practical situations. as automatically assigned by SAS. SAS reads only up to the length of first variable.blogspot. For example. So it’s a good programming practice to declare the variable with a LENGTH statement so that we are sure it can hold all kinds of values the data has. NProducts= 6. Output is : Alphabetic List of Variables and Attributes # Variable Type Len Pos ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 2 NProducts Num 4 24 5 final_price Num 8 16 1 id Char 10 28 3 pro_price Num 8 0 4 tot_cost Num 8 8 Output shows that now ID variable is a 10-character field so it can hold more characters. Without this explicit declaration.55. when we create new variables. . run. Suppose the second value is longer than the first. length NProducts 4.run. You can use the LENGTH statement to create a variable and set the length of the variable. pro_price = 4. Let us modify our earlier example: Data var_test. SAS assigns the length of the variable as that of the first. proc contents data =var_test. when we read two character values successively into a variable. id ='JK'. ID field can hold only two characters.

In the first case.767 bytes. you can change the length of the variable by using a subsequent LENGTH statement. Set base_data((keep = var1 var2 var3 etc). Run If you are reading a big dataset into SAS and require only a few variables from it. Here what happens during the process is SAS identifies the best format for the database fields you are extracting and apply the same to the datasets. In the second case. Creating variables through PROC SQL and PROC IMPORT We always extract data from data warehouses or import data using SAS import utilities and we find the dataset is created with all columns with various formats. Set base_data. Please note that we have to use such efficient methods to restrict the data read into the system to optimize the system resources such as SASWORK and shared drives. use the following statements in the program.blogspot. SAS reads from disk only the three variables you intend to keep. This will ensure that output dataset is created with required variables only. During the data processing we create several variables but need to save only select ones in the final dataset. KEEP and DROP statements are used often to control the number of variables (fields) read into and output into the datasets. Run.com For character variables. you must allow for the longest possible value in the first statement that uses the variable. PROC SQL provides more flexibility in formatting the variables. Run Alternately you can specify the first statement as follows: Data target_data . For numeric variables. . Data target_data . If you want to restrict the number of columns in output data set. The maximum length of any character variable in the SAS System is 32. keep = var1 var2 var3 . even though you only intend to use three variables. because you cannot change the length with a subsequent LENGTH 29 statement within the same DATA step. SAS reads the entire data set base_data.S for SAS-Post your comments at http://s4sas. Set base_data. Data target_data (keep = var1 var2 var3 etc). use the following method. KEEP and DROP statements. This is discussed separately in the appendix-II.

As we have seen earlier one of the basic use of array is to group similar variables . array weight{50} wt1-wt50. wt3…. Like this 50 times the loop is executed for each row of data and the update is made for fifty fields (columns) Line:6 Ending the loop . Assume it’s a very large data set and we want to replace all 999 with a ‘. Let us look at simple program to understand Data array_test. . Array Variables Unlike other programming languages. if weight{i}=999 then weight{i}= . Using array variable we can simplify this. End of the execution. SAS array variable is a set of similar variables grouped together with a name in an ARRAY statement. Now think about substituting fields with missing. If we are to do this data step. The closed bracket {50} specifies the number of elements in the array followed by the values of array elements*wt1. We use arrays very often when we work with datasets that are arranged as a Time Series. For example weight{40} = ‘wt40’ Line 4: Do loop 1 to 50 to hold the record to scan through 50 different fields specified.S for SAS-Post your comments at http://s4sas. run. weight {2} will have a value of wt2. all the fields are evaluated and updated with ‘. wt2. end. do i=1 to 50.’ for 999. Line 5: Remember weight{1} will have a value wt1 and that field will be evaluated in the following IF condition to check whether it has a value of 999. Set weight_data. Similarly when i=2. Time Series Example Now let us look at an example to know the way array is used in our environment (This program does not run as it requires some datasets) . we will have to write 50 statements ( if wt1 =999 then wt1=. Line 3: Here ‘array’ is the key word that tells SAS the following name (weight) is an array.’ for missing.. like that for each column) .wt50).com 30 The same way DROP statement can also be specified based on the data requirement of the user. In the above example ‘weight_data has weight measured at fifty different growth stages and wherever data is missing that data point is updated with 999.blogspot. Values with some other value.

Statement_month =st_month (i). declaring an array would solve the same. Variable LABELs A SAS Label describes a variable. Further. class Statement_month. Let us modify our earlier example: Data var_test. run. balance_due = bal_due(i). For example st_month(5) = ‘stmonth55’ In other words.S for SAS-Post your comments at http://s4sas. Proc summary data = Balance_due. array bal_due {6} baldue15 baldue25 baldue35 baldue45 baldue55 . length id $ 10. .blogspot. do i=1 to 5. The closed bracket (5) specifies the number of elements in the array followed by the values of array elements. id ='JK'. run. array st_month {6} stmonth15 stmonth25 stmonth35 stmonth45 stmonth55. NProducts= 6. var balance_due . end. length NProducts 4. set TS_49. (For example. When we produce reports we could print labels instead of variable names. If we need to do the same action repetitively on same group of variables. output. array st_month {5} stmonth15 stmonth25 stmonth35 stmonth45 stmonth55 Here ‘array’ is the key word that tells SAS the following name (st_month) is an array. (s)he could use PROC summary to do the summarization as data is now in a format that PROC Summary understands. No let us look at the ARRAY statement.com 31 Data Balance_due. using SAS ARRAY (s)he could achieve that easily. The above program reads a dataset (TS_49) where the variables are arranged in a time series format. When labels are assigned in the data step they are available for all procedures that use that data set. ARRAY help us to group a set of variables so that programming could be made short and flexible. baldue15 stands for balance due for the month of Jan 2005 and so on) The user wants to see the balance due for all accounts by statement month for 5 specific months .

run. Proc print data = var_test label. Variable FORMATs A format is an instruction that SAS uses to write data values. Note that a format does not change the original value of a variable.blogspot. run.S for SAS-Post your comments at http://s4sas. tot_cost = NProducts*pro_price. the labels can be created with PROC FORMAT procedure. LABEL id ="Vendor Name" NProducts ="Number of Products" pro_price = "Product Price" tot_cost ="Total Cost" final_price ="Final Price" . When you use PROC PRINT or some other Procedures you can print the labels instead of variable names. we will look at how we can use existing SAS internal formats using the FORMAT statement in PROCs and Data steps.com 32 pro_price = 4. which gives more information about the variable. . final_price = tot_cost. Since formats are primarily used to format output. You use formats to control the written appearance of data values. proc contents data =var_test. run. Let us look at the output of PROC Contents: -----Alphabetic List of Variables and Attributes----# Variable Type Len Pos Label ------------------------------------------------------------------2 NProducts Num 4 24 Number of Products 5 final_price Num 8 16 Final Price 1 id Char 10 28 Vendor Name 3 pro_price Num 8 0 Product Price 4 tot_cost Num 8 8 Total Cost The PROC output now show ‘Label’. For example the code below will print labels for the variables in var_test. Additionally. which is not discussed in this guide. The following example shows how to use a format statement in a data step.55.

dollar_amt dollar10. dollar_amt with dollar10. MONNAMEw. DDMMYYw. num_test = 1250. Character.S for SAS-Post your comments at http://s4sas. name $reverj7. There are three categories of formats. formats.Do consult the SAS manual for a complete list of formats. format num_test words40. . dollar_amt =13400. run.ss Writes date values as the day of the month Writes date values in the form ddmmyy or ddmmyyyy Writes date values in the form ddmmyy or ddmmyyyy with a specified separator Writes date values as the month and the year and separates them with a character Writes date values as the name of the month Writes date values as the month of the year Date and Time DATEw. name with reverj7. Description Writes standard character data Writes data values that are enclosed in double quotation marks Writes character data in reverse order and preserves blanks Converts character data to uppercase Writes standard character data Writes date values in the form ddmmmyy or ddmmmyyyy Writes datetime values in the form ddmmmyy:hh:mm:ss. The below list is not exhaustive.run. proc print data = format_test. Please have a look at the following table to understand what each format does to the display of the variable..d DAYw. today =today().com 33 data format_test. DATETIMEw. Name = 'J KURIAN'. date and time and numeric. Syntax for the format statement is: Format <variable name> <format name> .5.2 The above example formats four variables – num_test with words40. MMYYxw. today monyy7.2 and today with monyy7. $QUOTEw. $UPCASEw. Category Character Format $CHARw. . $w.. Lists of frequently used formats are provided below. MONTHw.. $REVERJw.blogspot. DDMMYYxw.

COMMAw. .d FLOATw. and decimal points Generates a native single-precision. Numeric BESTw.com 34 MONYYw. w.blogspot. QTRRw.S for SAS-Post your comments at http://s4sas.d NUMXw. YEARw. yyyy Writes date values as the year Writes date values as the year and month and separates them with a character SAS chooses the best notation Writes numeric values with commas and decimal points Writes numeric values with periods and commas Writes numeric values with dollar signs. Length_var = length(Max_var). WORDDATEw. WORDSw. Max_var= max(100. Writes date values as the month and the year in the form mmmyy or mmmyyyy Writes date values as the quarter of the year Writes date values as the quarter of the year in Roman numerals Writes date values as the day of the week and the date in the form day-of-week. WEEKDAYw. floating-point value by multiplying a number by 10 raised to the dth power Writes numeric values with a comma in place of the decimal point Writes Social Security numbers Writes standard numeric data one digit per byte Writes numeric values as words with fractions that are shown numerically Writes numeric values as words SAS Functions in DATA steps A SAS function performs a computation or manipulation on variables (arguments) and returns a value. WEEKDATEw. the day. SAS functions are mainly used in DATA step programming statements. month-name dd.d COMMAXw.101). Most functions use arguments supplied by the user. and the year in the form month-name dd. run. QTRw.d WORDFw. yy (or yyyy) Writes date values as the day of the week Writes date values as the name of the month.d SSNw. commas. This_month = month(today()). data function_test.d DOLLARw. YYMMxw.

blogspot. Function Name LENGTH LOWCASE SCAN SUBSTR (right of =) UPCASE DATEPART DAY INTCK INTNX Description Returns the length of an argument Converts all letters in an argument to lowercase Selects a given word from a character expression Extracts a substring from an argument Converts all letters in an argument to uppercase Extracts the date from a SAS datetime value Returns the day of the month from a SAS date value Returns the integer number of time intervals in a given time span Advances a date. LENGTH and MONTH functions are used. .run.argument-n>) In the above example.S for SAS-Post your comments at http://s4sas. . The syntax of a function is : Function-name (argument-1. and returns a date..com 35 proc print. or datetime value Returns the month from a SAS date value Returns the quarter of the year from a SAS date value Returns the current date as a SAS date value Returns the year from a SAS date value Returns the largest value Returns the arithmetic mean (average) Returns the smallest value Returns the sum of the nonmissing arguments Assigns DATA step information to a macro variable Returns the natural (base e) logarithm Returns the remainder value Returns the square root of a value Returns a random variate from a uniform distribution Returns the value produced when a SAS expression that uses a specified informat expression is read Returns values from a queue Returns a value using a specified format Converts ZIP codes to state postal codes Returns the smallest integer that is greater than or equal to the argument Returns the largest integer that is less than or equal to the argument Rounds to the nearest round-off unit MONTH QTR TODAY YEAR MAX MEAN MIN SUM CALL SYMPUT LOG MOD SQRT RANUNI INPUT LAG PUT ZIPSTATE CEIL FLOOR ROUND . . time. we have shown how MAX. or datetime value by a given interval. time. The table below has some frequently used functions in our programs.

DATALINES. RUN. For example to combine the application data with performance data we use MERGE utility in SAS. For example. 1002 2000 1003 4000 1004 3000 . Output looks like as follows: Obs account credit_ limit . Now we need to combine these into one dataset so that it will show the latest credit lines for each account. Let us look at the following examples. we try to match only those accounts while doing the performance data merging.com 36 TRUNC VLABEL VNAME VTYPE Truncates a numeric value to a specified length Returns the label that is associated with the specified variable Returns the name of the specified variable Returns the type (character or numeric) of the specified variable MERGE –Combining datasets One of the features frequently used while SAS programming is combining one or many datasets. INPUT account credit_limit. DATALINES. account number or Account Key is usually used for match merging. MERGE intial_cl new_cl. INPUT account credit_limit. if we are interested in the performance of the accounts that are opened only in Jan 2004. DATA intial_cl.blogspot. BY account. Intial_cl holds data for all initial credit lines assigned and new_cl has all increased credit lines. RUN. 1002 3000 1004 5000 .S for SAS-Post your comments at http://s4sas. DATA new_cl. DATA credit_limit. In consumer Finance. PROC PRINT. Most of the time a match merging is done on SAS datasets. We have two record sets that have information about credit lines of the accounts.

The above data step introduces a conditional processing using IF .blogspot. combines both the datasets with data updated into the first dataset specified (Outer join) Also it is possible to merge more than two data sets at a time. ‘IF a and b’ combines the dataset if the BY statement finds a match in the second dataset. Let us introduce some more complexity to reflect the way we use MERGE . Also a handle to the dataset is declared after the dataset name (in=a. Please go through the Appendix-I for more details. Output looks like as follows Account Sample Obs 1 2 account 1002 1004 credit_ limit 3000 5000 There are various combinations that suits to various requirements If a . BY account. IF-ELSE. What if we put ‘new_cl’ data set first in the MERGE statement? Note that the data overwritten on the first dataset. Only those records with a match are combined. RUN. = combines the data with data updated for all accounts in the second data set If a or b. MERGE intial_cl(in=a) new_cl(in=b) .com 1 2 3 1002 1003 1004 3000 4000 5000 37 So a simple merge statement combined the datasets and the BY statement made sure that its updated with the latest information.S for SAS-Post your comments at http://s4sas. Conditional Processing with WHERE. DO-END . in=b) . Using these handles we can combine the datasets conditionally. PROC SQL is also used to merge datasets. This is different from the first example where all records are output into the resultant dataset. IF a and b. DATA credit_limit. = combines the data with data updated for all accounts in the first dataset If b.

IF ELSE conditions are commonly used to flag accounts or create new variables based on certain conditions. 1005 800 Z 1006 450 D 1007 560 Z 1008 450 A 1009 900 C 1110 300 C . 1002 300 A 1003 20 A 1004 1200 . But WHERE is more efficient when we do the sub-setting. Let us look at an example now. if status_code eq 'Z' then status= "Bad-Charged Off".S for SAS-Post your comments at http://s4sas.run. Note that WHERE statement is used to evaluate two variables with and ‘and’ condition. IF can also be used instead of WHERE . The example below shows the usage of IF/ELSE Data perf. else run. Data perf. Status= "Good Account". Conditions behind creating this subset were the activity definitions we mentioned before.). DATA account_perf. Here dataset ‘perf’ is a subset of account_perf.blogspot. cards. else if status_code eq 'C' then status = "Cancelled".com 38 When working with data we come across situations where data need to be filtered or we may have to apply several conditions before we have the final data for analysis or modeling. set account_perf. For example. INPUT account current_os status_code $. When we want to report out the number of active accounts and total $ outstanding. (You can use IF current_os >100 and status_code ne 'Z'. where current_os >100 and status_code ne 'Z'. IF/ELSE can be used. length status $20. we need to subset the entire data so that only accounts that are active are included. Suppose we need to flag the accounts into Good and Bad based on certain conditions. The same way we can use OR also. run.. Suppose the active definition says “ An active account is an account where current $ outstanding is greater than 100 and Status_code is not Z”. set account_perf. .

blogspot.S for SAS-Post your comments at http://s4sas. Let us have a look at the example below. end. Data perf.com 39 Dataset used is same as the previous example. wo_amount = current_os . Cancelled and Good Accounts. Obs 1 2 3 4 5 6 7 8 9 account 1002 1003 1004 1005 1006 1007 1008 1009 1110 current_ os 300 20 1200 800 450 560 450 900 300 status_ code A A Z D Z A C C status Good Account Good Account Good Account Bad-Charged Off Good Account Bad-Charged Off Good Account Cancelled Cancelled When there is multiple processing done conditionally. if status_code eq 'Z' then do . Status= "Good Account". status= "Bad-Charged Off". The output looks like the following: Obs 1 2 3 4 5 6 7 8 9 account 1002 1003 1004 1005 1006 1007 1008 1009 1110 current_ os 300 20 1200 800 450 560 450 900 300 status_ code A A Z D Z A C C status Good Account Good Account Good Account Bad-Charged Off Good Account Bad-Charged Off Good Account Good Account Good Account wo_amount 0 0 0 800 0 560 0 0 0 . set account_perf. A new variable (status) is created based on the condition. run. proc print . Bad Charged off. then DO-END loop is used. this can be achieved using a DO-END loop. For example we want to categorize the good and bad accounts and also want to compute the write-off amount based on a same condition ie Status_code =Z. length status $20..run. else do. end. wo_amount = 0 . Here we are categorizing the data into three segments.

Write down your observations on how each merging worked. BY account. PROC PRINT. DATALINES. RUN. IF-ELSE. DATA credit_limit.S for SAS-Post your comments at http://s4sas. IF b . INPUT account credit_limit. PROC PRINT. RUN. INPUT account credit_limit. IF a or b . DATA new_cl. DATA credit_limit. RUN. RUN. DATA intial_cl.com Now let us summarize: 40 WHERE. Exercises: Create two data sets as showed below and try out the following combination of merging. BY account. BY account. RUN. RUN. DO-END are used for conditional processing in SAS data steps. PROC PRINT. MERGE intial_cl(in=a) new_cl(in=b) . DATALINES. MERGE intial_cl(in=a) new_cl(in=b) .blogspot. 1002 3000 1004 5000 1005 2500 . 1002 2000 1003 4000 1004 3000 . . WHERE and IF conditions are also used in many SAS PROCs to subset data and filter out unwanted data. IF a . MERGE intial_cl(in=a) new_cl(in=b) . DATA credit_limit.

S for SAS-Post your comments at http://s4sas. RUN.blogspot. . IF a=b .com 41 DATA credit_limit. PROC PRINT. RUN. BY account. MERGE intial_cl(in=a) new_cl(in=b) .

whether the data are compressed. DATA mylib. DATA mylib. The multi-user environments are constrained by the system resources like SAS Workspace or the shared folders.intial_cl. To remove unnecessary files and manage the datasets. you can • • • • • • • Delete SAS files List the SAS files that are contained in a SAS library Copy SAS files from one SAS library to another Rename SAS files List the attributes of a SAS data set. PROC MEANS and PROC GPLOT in this chapter. PROC FREQ. We will discuss PROC DATASETS. PROC DATASETS PROC DATASETS is a utility procedure that helps to manage the SAS datasets in various libraries. Append SAS data sets Create and delete indexes on SAS data sets The example below demonstrates three frequently used features of Proc Contents – Delete. DATALINES. Proc Datasets is often used in the main programs. DATALINES. PROC SORT.com 42 Procedures for Data Insights Objectives of this chapter: To learn frequently used procedures in Consumer Finance environment.new_cl. These procedures are used to understand the data we work with as well as to bring out business insights. PROC TRANSPOSE. whether the data are indexed etc. PROC PRINT. copy and Rename. INPUT account credit_limit. libname mylib 'D:\mydata'. INPUT account credit_limit.S for SAS-Post your comments at http://s4sas.blogspot. information such as the date the data were last modified. PROC DOWNLOAD. 1002 3000 1004 5000 1005 2500 . 1002 2000 1003 4000 1004 3000 . . With PROC DATASETS.

If the dataset is large. change new_cl=brand_new_cl. Line 1: tells SAS the dataset name. we would want to restrict the number of column and rows in our print report. var account_code current_balance credit_limit status_code fico_score. status_code and fico_score. run. credit_limit. it’s a good practice to have a look at a sample data extract. Also restricts the number of observations printed through OBS statement. PROC PRINT Proc PRINT is used to understand how the data looks and also for a variety of reporting purposes. This is how we do it. current_balance. The below example show how to do that.blogspot.S for SAS-Post your comments at http://s4sas. delete intial_cl. Assume that ‘statement’ is a dataset with 100 variables and 1 Million records. account_code. Line 3: Specifies the variables to be printed . Proc print in conjunction with ODS features can produce impressive reports. copy in=mylib out =work. Proc print data=statement (obs=10). filter data/variables and to do additional computation on the data. Line 2: specifies a title for the report. When we work with data. PROC Print has various options and features to control the display. title 'Sample records-Statement'. run. We want to see any 10 observations and 5 variables viz. Line 1: specifying the library to list the detail (filenames and attributes) Line 2: change the name ‘new_cl’ into ‘brand_new_cl’ Line 3: delete intial_cl from mylib Line 4: copy from ‘mylib’ library to ‘work’ library the file specified in select statement (brand_new_cl) select brand_new_cl.com 43 proc datasets library=mylib details.

pageby ext_status_code. BY statement: Produces a separate section of the report for each BY group. SUM: Computes the total for the specified variables DATA account_perf. SUM statement and PAGE statement. Note that BY statement in Proc Print require the data to be sorted by the variables in BY statement. run. Some of the features we use with 44 are: BY statement. Title "Example for SUM. BY and PAGEBY Statements". Computes the totals by page. Here are some examples on how to use them. cards. PAGEBY: Controls page ejects that occur before a page is full. When we do the data cleaning. INPUT account current_os ext_status_code $ int_status_code $. PROC SORT procedure should be used to do the sorting prior to the print.blogspot. Proc print data = account_perf.com What we have seen is a basic version of PROC PRINT. PROC SORT sorts the variable(s) in a dataset in an ascending or descending order. PROC SORT PROC SORT is a very useful utility we often use when work with data and procedures. sum current_os. Additionally it performs some other operations like deleting the duplicate rows and ordering the variables for certain procedures. by ext_status_code.S for SAS-Post your comments at http://s4sas. run. proc sort data = account_perf. 1002 300 A C 1003 20 A C 1004 1200 A D 1005 800 Z A 1006 450 Z A 1007 560 Z A 1008 450 A D 1009 900 Z D 1110 300 Z D . the accounts are appearing multiple times (duplicate and non-duplicate rows) and we want to keep the record . if specified. run. In the example shown below. by ext_status_code . PROC SORT is used to remove all the duplicate records or duplicate observation.

Our objective is to keep the record last updated so when SAS deletes the duplicates it keeps the first record in the order sorted. DATA statement. Line 1: specifies the data to read in and ‘out’ statement specifies the dataset to be created after sorting. Let us create a SAS dataset to demonstrate some of the capabilities of PROC SORT. run. by account descending Dt_updt.S for SAS-Post your comments at http://s4sas. Line 2: BY statement specifies the variables to be used for sorting. run. cards.blogspot.. Note that account number 1003 and 1002 have duplicate repords. we should first clean this dataset to make sure no doble couting is done on measurement variables. In order to do any analysis or reporting. INPUT account current_os ext_status_code $ format dt_updt mmddyy10. Here account variable is sorted in an ascending order (default) and Dt_updt is sorted in a descending order. and in this case ‘descending’ sort keyword brings the latest record first. Such issues are common with statement table as the customer can request to change the billing cycle for their credit card statements or problems with data loading at ETL stage. . 1002 300 A 03/15/2005 1003 20 A 03/15/2005 1003 20 A 03/15/2005 1004 1200 A 03/15/2005 1005 800 Z 03/15/2005 1006 450 Z 03/15/2005 1007 560 Z 03/15/2005 1002 300 Z 03/25/2005 1002 300 Z 03/25/2005 1009 900 Z 03/15/2005 1110 300 Z 03/15/2005 1004 1200 Z 03/26/2005 . Accounts 1002 and 1004 are repeated but they are not duplicated. ’NOHUP’ is a keyword tells SAS to remove all records that are identical and keep only the first one. .com last updated only. 45 Dt_updt mmddyy10. The DATA step above creates a dataset with 12 records. Using PROC SORT we can filter those accounts and create a clean SAS dataset. This is how we use a PROC SORT statement: proc sort data = statement out=statement1 nodup.

PROC PRINT. format Perfmonth MONYY7. We don’t have to sort data for PROC SUMMARY. tot_accounts newacct current_balance. PROC TRANSPOSE The TRANSPOSE procedure creates an output data set by restructuring the values in a SAS data set. It is possible to specify multiple variables in the BY statement and it is possible to control the order (ascending or descending) of the individual variables while performing the sorting. . PROC SUMMARY. So with these two sort steps we have a clean dataset with only latest information. actives . run. To remove them we use the ‘nodupkey’ keyword as follows: proc sort data = statement1 out=statement2 nodupkey. label Perfmonth ="Performance Month" tot_accounts ="Total Number of Accounts" actives= " #Active Accounts" newacct ="# New Accounts" current_balance ="$ Current Balances". Line 1: ‘nodupkey’ keyword is specified to instruct SAS to remove all the records appearing repetitively for the variables in BY statement.blogspot. It converts the row elements to columns. TIP: PROC SORT is very time/resource consuming process in Consumer Finance environment as we typically work with millions of records. If an account is repeated twice the fist record in the sort order is kept and rest are deleted from the output dataset. Some Procedures use BY statements to categorize the output – For example. Let us create a dataset to demonstrate an example: DATA statement_summary. by account .S for SAS-Post your comments at http://s4sas. unless it has a BY statement. INPUT Perfmonth date9. Make sure you sort the data by BY variables before you perform a procedure on it. transposing selected variables into observations.com 46 Notice that we still have multiple records for couple of accounts. PROC UNIVARIATE. Line 2: BY statement specifies the variables where SAS should look for duplicates.

63 31788023.08 30994371.71 36525256. Line 1: Specifies the dataset and output dataset – Proc Transpose does not print the results Line 2: Spcifies the field to be transposed through an ‘ID’ statement.01 32329068.30 This how we use a PROC TRANSPOSE proc transpose data =statement_summary out = ts_statement. id Perfmonth.83 33772968.41 34349188. This is how the output looks like .88 38625107.23 31551717. run.run.83 33772968. 8860303 8947743 9035228 9123510 9199904 9289735 9390095 9493607 9583579 9643529 9706194 9723757 86521 90272 90436 92157 79118 92682 10294 10673 93288 63447 66283 20907 1366811 1376739 1397670 1424469 1417949 1369723 1371610 1383296 1427501 1432429 1340457 1294083 32932422.01 32329068. 01-Apr-03 01-May-03 01-Jun-03 01-Jul-03 01-Aug-03 01-Sep-03 01-Oct-03 01-Nov-03 01-Dec-03 01-Jan-04 01-Feb-04 01-Mar-04 . you want the Performance months as columns .71 36525256.12 39782001.74 37758060.3 proc print data = statement_summary label.12 39782001.42 35398736.74 37758060.23 31551717. var tot_accounts newacct actives current_balance .63 31788023.41 34349188.run. Line 3: Specifies the variables to be transposed This is how it looks if we print a part of the data transposed (ts_statement) Obs 1 _NAME_ tot_accounts _LABEL_ APR2003 MAY2003 8947743 JUN2003 9035228 Tot Number of Accounts 860303 .com 47 cards. Now we want to produce a report that displays al metric in a time series format ie. Total Number of Accounts 8860303 8947743 9035228 9123510 9199904 9289735 9390095 9493607 9583579 9643529 9706194 9723757 Obs 1 2 3 4 5 6 7 8 9 10 11 12 Performance Month APR2003 MAY2003 JUN2003 JUL2003 AUG2003 SEP2003 OCT2003 NOV2003 DEC2003 JAN2004 FEB2004 MAR2004 # New Accounts 86521 90272 90436 92157 79118 92682 10294 10673 93288 63447 66283 20907 #Active Accounts 1366811 1376739 1397670 1424469 1417949 1369723 1371610 1383296 1427501 1432429 1340457 1294083 $ Current Balances 32932422.88 38625107.blogspot.08 30994371.42 35398736.S for SAS-Post your comments at http://s4sas. Then it becomes a typical case where we use PROC Transpose.

PROC Download enables to download the data into your local windows folders. the Sas datasets are created in the remote server (Stoner Unix server for example).S for SAS-Post your comments at http://s4sas. SAS remote sign on to a Unix server enables the user to compose and submit the program in their windows clients and see the SAS log and Output in their workstation. This field has all variable names and their respective labels so that we can identify them. Variable we specified in ID statement is now converted into columns. PROC DOWNLOAD PROC DOWNLOAD is used to download data from a remote server. Options obs=100 restricts the number of observations downloaded to 100. PROC FREQ PROC FREQ is used to produce frequency counts and cross tabulation tables for specified variables/field. . 1376739 90436 397670 48 current_balance $ Current Balances 3293242 30994371 31551717 Note that two variables are created by SAS ‘_Name_ ‘ and ‘_Label_ ‘ . Let us look at a program : libname loc 'c:\datasets'. endrsubmit. OBS=Max should be used if the intention is to download the complete dataset. PROC Transpose is often used to convert the variables into a time series format as shown above.cmart_auth out = loc.cmart_auth. run. The above program assumes that a signon to a remote server is already estabilished. Proc downlod data = user01. rsubmit. When you are submitting a program to remote server.com 2 3 4 newacct actives # New Accounts #Active Accounts 86521 136681 90272. there are several instances we would do a PROC transpose to reshape the data in to a structure that helps in automation. It can also produce the statistics like Chi Square to analyze relationships among variables. It is possible to use a ‘BY’ statement in a PROC Transpose to transpose them in a grouped manner. libname user01 '/projects/cmart' options obs =100.blogspot. There are two librariesdeclared – ‘loc’ is a local SAS library and ‘user01’ is a remote Sas library . when you are working with SAS remote submit session. When we work with SAS programs for automation.

A standard FREQ output shows a column. TITLE 'Frequency of External Status Code by Client'. WHERE client ne 'JCP'. PROC FREQ offers various features to control the output and the way the tables are produced. In this case. run. For example a cross tabulation table of FICO score segments to External Status code will tell us how the accounts are distributed across various score segments and statuses.com As a good practice. There are several options specified like missing. nocol and nopercent to control the way statistics are displayed. BY client. proc freq DATA = account_perf. Line 3 specifies SAS to generate a cross-tab of External status code and Internal Status code trough a TABLES statement. INPUT client $ account current_os ext_status_code $ int_status_code $. proc sort DATA = account_perf. Line 1 and 2 in PROC FREQ statement tells SAS which dataset to use and the title of the output. TABLES ext_status_code* int_status_code /missing norow nocol nopercent. Let us start with an example to understand them better. LABEL client ='Client Name'. Proc sort is used to sort the data by client as we use a ‘BY’ statement in PROC FREQ that follows. at a data processing stage we use PROC Freq on categorical fields to 49 understand the data and their frequency distributions. cards. frequencies are computed for each client group. norow. row and cell percentages. Line 4: BY Statement specifies the grouping criteria.S for SAS-Post your comments at http://s4sas.blogspot. DATA account_perf. The data step creates a SAS dataset named ‘account_perf’. D Cmart 1110 300 Z D . . run. Cmart 1002 300 A C Cmart 1003 20 A C JCP 1004 1200 A D JCP 1005 800 Z A GIA 1006 450 Z A GIA 1007 560 Z A JCP 1008 450 A D GIA 1009 900 . run. BY client.

Missing . Nopercent . Norow . List . ‘Out=perf_freq’ specifies the output dataset name to store the frequency output. PROC FREQ is a very useful and the most used of the SAS procedures. WHERE client ne 'JCP'. ii) Wants to order the values to be changed to ascending order of frequency counts (highest occurring first) and iii) The output to be listed instead of cross-tab. To sum up.please note that ‘order =freq’ is specified to tell SAS to order the output by descending frequency count Line: 3.com 50 Line 5: WHERE statement allows the conditional processing. Its also possible to use FORMAT statements to manipulate the display. The following program block does the job: proc freq DATA = account_perf order=freq. . Here is a list of commonly used options with TABLES statement Nocol .blogspot. At line : 1.performs several chi-square tests. run. Here all clients other than ‘JCP’ are taken for computation.suppresses printing of column percentages of a cross tab.S for SAS-Post your comments at http://s4sas.prints two-way to n-way tables in a list format rather than as cross tabulation tables Chisq . Title 'Frequency of External Status Code by Client'. In Consumer Finance environment. TABLES int_status_code*ext_status_code / list out = perf_freq . list and output data creation using OUT are often used options.suppresses printing of cell percentages of a cross tab.interprets missing values as non-missing and includes them in % and statistics calculations. Line 5: Labels the output. PROC FREQ is used mainly in the data preparation stage and for reporting.suppresses printing of row percentages of a cross tab.‘list’ keyword is used to tell SAS to list the output and not cross-tabulate. label client ='Client Name'. Frequency ordering. Now let us look at the following requirements: i) The user wants the output not to be printed but to be put into a SAS data set for further processing.

To Sum up. mean. Line:3-Tells SAS to group the values in client field and produce the statistics separately. Options like BY and CLASS statements enable the users to look at them by segmenting into various categories. sum. Minimum and Maximum. PROC Means is used for .blogspot. Here current_os is a field in the dataset given and that contain data for current Dollar outstanding for each account in the dataset. respectively. minimum and maximum this is how we do it.S for SAS-Post your comments at http://s4sas. output out = perf_mean n = count sum = total mean = avg run. BY client . PROC MEANS is used to understand the numeric variables. var current_os. label client ='Client Name' current_os= 'Current Outstanding' . Title 'Summary of Current Outstanding' . proc means DATA = account_perf var range sum mean Title 'Summary Statistics of Current Outstanding’. Additionally. Mean. Let us start with a simple example: proc means DATA = account_perf . by client. The above program requests three statistics are to be put into the output dataset ‘perf_mean’. total and avg fields. We can request additional statistics through the options. Suppose we want to create a SAS dataset with the statistics computed for further processing.com 51 PROC MEANS The PROC MEANS produces descriptive statistics for numeric variables. Just like we use FREQ on categorical and character fields. the MEANS procedure reports N (the number of non-missing observations). proc means DATA = account_perf var sum mean . median min max . in the output dataset. sum and mean for each client (BY Client) are created as count. Std Dev (Standard Deviation). var current_os. Total count. run. . By default. Means is used on numeric data. their distributions and other statistical characteristics.Specifies the variable for which the statistics are to be computed. range. this is how we instruct SAS. run. Now we want to request more statistic like variance. Title 'Summary of Current Outstanding' . Line:4. var current_os. median.

In conjunction with the SYMBOL statement the GPLOT procedure can produce join plots. . high-low plots. cards. needle plots. 52 PROC GPLOT PROC GPLOT is used to produce the graphical output in SAS. PROC GPLOT is considered as an improvement over the PROC PLOT procedure as it provides good quality presentation compared to simple PROC PLOT outputs. INPUT client $ account fico_seg $ current_os tot_payment ext_status_code $ . PROC GPLOT produces a variety of two-dimensional graphs including • • • • • Simple scatter plots Overlay plots in which multiple sets of data points display on one set of axes Plots against a second vertical axis Bubble plots Logarithmic plots (controlled by the AXIS statement). In this way it can be used as a substitute for PROC SUMMARY. The dataset and the program below gives a good understanding about the GPLOT options and how the graphs are created. DATA account_perf.blogspot. Cmart 1002 401-500 300 100 A C Cmart 1003 501-600 200 150 A C Cmart 1004 601-700 1200 180 A D Cmart 1005 701-800 800 190 Z A Cmart 1006 801-900 450 200 Z A GIA 1007 401-500 560 210 Z A GIA 1008 501-600 450 180 A D GIA 1009 601-700 900 145 A D GIA 1110 701-800 300 148 Z D . and plots with simple or spline-interpolated lines. The SYMBOL statement can also display regression lines on scatter plots. showing trends and patterns Interpolating between data points Extrapolating beyond existing data with the display of regression lines and confidence limits.com summarizing large datasets with a variety of statistics for reporting purposes.S for SAS-Post your comments at http://s4sas. The GPLOT procedure is useful for • • • Displaying long series of data. run.

('FICO SEGMENTS') WHITE 85 (NUMBER = 5) NONE. The purpose of the GOPTIONS statement is to apply levels of specific options to graphs created in the session or to override specific default values. PART-II SYMBOL1 SYMBOL2 SYMBOL3 PART-III AXIS1 LABEL COLOR LENGTH MAJOR MINOR AXIS2 LABEL COLOR LENGTH MAJOR MINOR PART-IV = = = = = = = = = = ('Current O/S') WHITE 60 (NUMBER = 5) NONE . PART-I GOPTIONS RESET GUNIT CBACK CTEXT /* DEVICE FONTRES HTITLE HTEXT HSIZE VSIZE INTERPOL NOBORDER.blogspot. RUN. by client. The GOPTIOND used above and explained below.S for SAS-Post your comments at http://s4sas. PLOT current_os*fico_seg tot_payment*fico_seg/overlay haxis=axis1 vaxis=axis2.com 53 proc sort data = account_perf. . reset = option resets graphics options to their default values. in order for the requested options to be applied to it. = = = = = global = PCT = BLACK = WHITE = GIF*/ PRESENTATION = 3 = 2 8 IN 6 IN JOIN PROC GPLOT DATA = Account_perf. HEIGHT = 2.5 WIDTH = 2.5 WIDTH = 2. however. VALUE = DOT COLOR = LIME VALUE = SQUARE COLOR = VIPK VALUE = TRIANGLE COLOR = STRO HEIGHT = 2.run. the first step is to become familiar with the GOPTIONS. by client. It can be located anywhere within your SAS program. HEIGHT = 2. it must be placed before the graphics procedure.5 WIDTH = 2. GOPTIONS: When working with GPLOT.

These definitions control: • • The appearance of plot symbols and plot lines. and area fills Interpolation methods Part III defines the Axes 1 and 2 where we can name the axis.com gunit = sets the units of character height measurement to percentage of display height. spiffy the color. the dataset should be sorted by the BY variable. 54 In Part II.. boxes. which are used by the GPLOT procedure. how to "connect" the points) border option includes a border around the graphics output area. including bars. Here we are plotting two variables – O/S and Payments in the same Y variable (ie FICO_score) . In this example.S for SAS-Post your comments at http://s4sas.e. Interpol: interpolation method (i. cfont= option sets the font color to white device = option selects the device driver to use to create a graphic file(GIF) htitle= option sets the text height for the first title to 3 (in percentage of display height). a graph will be produced by each client in the dataset. length of each axis and customize the number of major and minor divisions on it. Line:2 Tells SAS to produce a graph for each category in the variable specified in BY statement. definitions done in the Part-III are applied using Haxis and Vaxis options. cback = option sets the color background to black.blogspot. htext = option sets the text height for all text to 2 (in percentage of display height). Further. confidence limit lines. Line:3 specifies the X and Y axes in the graph. Many a time we would require to name the SAS graph files and store it in a specific directory so that . Having seen various options in GPLOT now let us look at the GPLOT statement in Part-IV. SYMBOL statements create SYMBOL definitions.The OVERLAY option on the PLOT statement makes sure that both plot lines appear on the same graph. Note that to use BY in GPLOT. POC GPLOT provides the flexibility to plot single or multiple variables on a same graph.

S for SAS-Post your comments at http://s4sas. This kind of complicated charting can be automated using various options of GPLOT. In scorecard tracking or Champion-challenger strategy tracking projects often there are many segmentations (like score segments or strategy identifiers or vintage segments) and charting for various performance variables are carried out to compare the segments. This can be achieved by DEVICE option in GOPTIONS.blogspot.com 55 we have control over the files when we do the automation. .

S for SAS-Post your comments at http://s4sas. So with appropriate changes in file extensions we can create XLS files through the ODS . PROC EXPORT PROC EXPORT procedure exports a SAS dataset into a MS Excel. ODS HTML or DDE methods to produce an excel or web based report. ODS HTML Output Delivery System (ODS) statement in SAS allows the analysts to create HTML. Windows SAS provides an interface to do the data set export and when you want to do it automatically. we may have to create the data set in SAS. In Consumer Finance. and Excel reports that are sharable. RUN. Line: 1 Specifies the SAS library and dataset name to be exported Line: 2 Specifies the path and Excel filename where the data to be explored Line: 3 tells SAS what data output format it is and to replace the file if existing already. Also the labels are not exported by default.xls" DBMS=EXCEL2000 REPLACE. This chapter briefs some techniques and possibilities to explore for various reporting tasks.blogspot. Analytics community prefers the reports to be in a spreadsheet (MS Excel) format as they can carry out further analysis. you can specify the same as a program block as shown below. Before we create reports using proc export. It can also export the data into other formats like Access or Lotus but we limit our discussion to Excel only. PROC EXPORT DATA= work. we use PROC EXPORT. HTML formats are sharable across platforms and its possible to open those files in excel. which could be directly exported to excel. For the regular production reporting automation is carried out to avoid manual work as much as possible.com 56 SAS Powered Reporting A lot of consumer finance analytics work involves some form of reporting. Let us look at some example to understand each one of them. Its possible to create templates for these reports and applies such formats on the final reports. This method is normally used when we have to send only data tables as reports on a regular basis. Note that not all formats applied on the variables are preserved during the export process.Account_perf OUTFILE= "D:\ex_Account_perf.

by client. Line: 1 of ODS statement specifies the path and file name of the output report. ODS HTML FILE='D:\ods_report_test. Cmart 1002 401-500 300 100 A C Cmart 1003 501-600 200 150 A C Cmart 1004 601-700 1200 180 A D Cmart 1005 701-800 800 190 Z A Cmart 1006 801-900 450 200 Z A GIA 1007 401-500 560 210 Z A GIA 1008 501-600 450 180 A D GIA 1009 601-700 900 145 A D GIA 1110 701-800 300 148 Z D . Title "# of Accounts by Client". Line: 2 – A print statement that prints data for each client separately. This is an advantage as many times customer prefers an Excel report but they cannot be produced in Unix environments. the basic call contains two parts: the ODS call statement to tell SAS that an HTM/XLS output is requested and the ODS close statement to tell SAS to close the report. label account="Account #" fico_seg = "Fico Segments". .a PROC FREQ and a PROC PRINT – output of these two procedures would be written into an HTML file.blogspot. ODS HTML CLOSE. format current_os dollar10. run. if they are specified between and open and close statements.2 tot_payment words40. run. tables client/list. run. Format and labels are also applied to demonstrate later that how ODS preserves them in the final output. INPUT client $ account fico_seg $ current_os tot_payment ext_status_code $ . Let us look at an example DATA account_perf. cards. proc print data = account_perf. When using the ODS HTML statement.html'. Proc Freq data = account_perf.S for SAS-Post your comments at http://s4sas. For example. we are doing two operations. title "Sample accounts with FICO Score". Any output operations in between call and close statements are directly written into an HTML output file. .com 57 HTML statement. The data step above creates a SAS dataset ‘account_perf’.

The standard template is used by default by SAS for the display.S for SAS-Post your comments at http://s4sas. Dynamic Data Exchange (DDE) Dynamic Data Exchange (DDE) is a MS Windows protocol for dynamically transferring data between Windows-based applications using a client/server model. SAS preserves the label. Now let us assume we designed one excel template and stored it as ‘D:\MyReports\DDE_test. The output template provided by default can be changed by creating templates using PROC Template procedure and the same can be applied at ODS run time. Using this protocol SAS System can request data or send data and commands to other windows applications like Excel or PowerPoint. which is an advantage over PROC EXPORT. How do we produce an Excel report? One can just modify Line: 1 and change the file extension to . Now let us look at an example to understand how DDE works in SAS. so that we could play with it?” We hear the all the time from the managers. Line: 10 -Tells SAS to close the ODS file opened for various output.blogspot. “Could you put the data in an Excel spreadsheet.com Line: 6. Web based reports or SAS based reports (PROC REPORT or PROC TABULATE) do not meet this requirement of the customer but mastering DDE can help the analyst to create the customized reports in Excel This is what you can achieve through DDE when working with Excel i) ii) iii) iv) You can have a nice template designed in Excel and direct the data into targeted cells in the template You can put data sheets in excel and make refresh the excel reports or graph You can save the report into a new name and save it in a specified folder You can run a macro you recorded in excel which does additional formatting to the output report. as they really like navigating through the spreadsheet and doing additional calculations.xls (instead of .html) to create an excel file! This file can be opened in Excel with all formats and labels preserved. In the earlier example we had a limitation that we could not use our own excel template for reporting .FREQ procedure is used to display the number of accounts for each client. format. titles that are applied on the data. Now we want SAS to open this template and fill in the data for various columns.html file as 58 output tables. When writing into the html file. . All the outputs of the procedure used above are written to D:\ods_report_test.xls’ .

SLEEP command to stop SAS system from processing for 3 minutes. Now we have the template open in Excel and ready to receive data. FILENAME DDE_SAS DDE "EXCEL|ACCOUNT_PERF!R2C1:R20C6" NOTAB. PUT '[OPEN("D:\MyReports\DDE_test. “09”x is a delimiter specification that tells SAS to put a TAB after each variable written into the file. Next step is to drive the data into the spread sheet in a DATA step. Please note that any data outside this defined area will be ignored. If you don’t have a shortcut created. We will do it using the same FILENAME statement. ‘Account_Perf’ in the template is defined as the target worksheet and the data area marked contains rows starting from 2(R2) and ending at 20 (R20) and columns 1(C1) to columns 6(C6). RUN.X command is used to invoke the excel short-cut in D: drive.) . FILE DDTEM.S for SAS-Post your comments at http://s4sas. PUT statement is used to write specified fields in to a file that is specified earlier with FILENAME statement. RC = SLEEP(3). please create one! Line:2 – A FILENAME statement that establish a link with Excel system opened with a keyword ‘DDE’ DDTM is the handle defined to refer to this link later in SAS data step. x '"C:\Program Files\Microsoft Office\Office\excel. This is to make sure that the template in Excel is opened before SAS proceeds with the data updating in the steps followed. We need to implement another link to target worksheet and define the area in the template before we tell SAS where to put the data. Line:4 – FILE statement to tell SAS which link is established with excel to be used Line:5 – PUT command – passes on the instruction to EXCEL system – Here the instruction is to open the template we created.exe" '. Line:3 – A temporary dataset is created to invoke the template. Line:6. Here. (Please note that dataset used in previous section is used here .blogspot. DATA _NULL_.com 59 First let us invoke excel and open the template we designed: options noxwait noxsync. FILENAME DDTEM DDE 'EXCEL|SYSTEM'.xls")]'. Here is how we do it. Line:1.

xls!painter")]'. SET account_perf.AS("D:\MyReports\Client_report_formatted. FILE DDE_SAS. PUT '[QUIT()]'. which does some formatting on the reports and graphs after the data was updated.blogspot. The above code use the link established with excel and drive the data from account_perf to ‘DDE_SAS’. SAS based reporting is largely limited to ODS HTML and DDE in analytics environments. PUT '[SAVE. PUT statement specifies the variable names and the order.S for SAS-Post your comments at http://s4sas. Suppose our file had a macro called ‘painter’ defined and we are now running the excel macro through SAS and saving the file with a different name. However SAS procedures like PROC REPORT and PROC TABULATE provide lot of features to customize and format the reports.xls")'. With DDE this can also be achieved from SAS. RUN.xls. PUT '[RUN("Client_report. DATA _NULL_. "09"x is the delimiter (ie TAB) . Flexibility is that we can define the specific areas in Excel sheet and put the values from SAS dataset there. .com 60 DATA _NULL_. FILE DDE_SAS. PUT client $ "09"x account "09"x fico_seg $ "09"x current_os "09"x tot_payment "09"x. RUN. Coupled with Macros. We have seen the basic form of how DDE is used in automating the Excel reports with SAS. DDE could be used as a powerful method to create EXCEL based reports. To do this task analyst have to open the file and run the macro. To conclude. Suppose there is an Excel macro on Client_report.

Cmart 1002 401-500 300 100 A C Cmart 1003 501-600 200 150 A C Cmart 1004 601-700 1200 180 A D Cmart 1005 701-800 800 190 Z A Cmart 1006 801-900 450 200 Z A GIA 1007 401-500 560 210 Z A GIA 1008 501-600 450 180 A D GIA 1009 601-700 900 145 A D GIA 1110 701-800 300 148 Z D . and when we refer a macro variable it is preceded by an ampersand sign &. When we submit our program. we can consider macro language to be composed of macro variables and macro programs. label account="Account #" fico_seg = "Fico Segments". These are symput and symget. For example its possible to declare a list of variables used in print and freq procedures repetitively as a macro variable and use it instead of a long list of variables. format current_os dollar10. Also its possible to compute the data range of the data for a report and assign that to a macro variable so that all procedures and data step can access this information. All the key words in statements that are related to macro variables or macro programs are preceded by percent sign %. substituting them with the text string they were defined to be and then process the program as a standard SAS program There are two functions that are particularly useful when we want to get information in and out of a data step. Now let us look at some examples: DATA account_perf. SAS will process the macro variables first. A macro variable can be created by using the %let statement.com 61 SAS Macro Processing The SAS macro language is help to reduce the amount of repetitive SAS code and it facilitates passing information from one procedure to another procedure. You use symput to get information from a data step into a macro variable and symget is used when we want to get information from a macro variable into a data step. Furthermore. Generally. INPUT client $ account fico_seg $ current_os tot_payment ext_status_code $ .S for SAS-Post your comments at http://s4sas.blogspot.2 . In this chapter will demonstrate how to create macro variables and how to write basic macro programs. we can use it to write SAS programs that are "dynamic" and flexible. cards. The scope of the variables is mostly limited to the data step or the procedures where they are created or used but through macro facility we can make them available throughout the program. SAS Macro Variables Many times we create macro variables to make them available through out various data steps and procedures.

blogspot. finally. this will give some idea about the scope of variables and their availability.S for SAS-Post your comments at http://s4sas.).edate). Though this example is not a good use of macro variables. run.'B'). Two columns are assigned to the variable. The program above is written for a weekly application reporting purpose where every table in the report should have a header with start date and end date. . First line after the DATA step uses %LET statement to declare a macro variable ‘varlist’. &sweek.com 62 .run. Further applications considered for these reporting should fall between these dates. run. bdate = put (bd.. var &varlist. Proc print data =account_perf. proc means data = account_perf. date9. edate = put(bd+6. %let sweek =-1. var &varlist. run. bd= INTNX('WEEK'.The PROC Means procedure also uses the same variable to generate the statistics. The program below demonstrates the use of SYMPUT function to create macro variables from a SAS data step. date9.). Line:7.(Note that there could be more variables or string that could be assigned) Line:4 – Var statement uses the ‘varlist’ variable with a ‘&’ prefix. Title " Weekly report ". %let varlist = current_os tot_payment. Title " Weekly report from &bdatec to &edatec". call symput ("bdatec". as many times we have to assign values to macro variables from a SAS data step. This is particularly useful. data _null_. Proc print data =account_perf. the report file also should have this date stamp. call symput ("edatec".bdate). var &varlist. run.today().

INPUT client $ account fico_seg $ ext_status_code $ . Lowes 1008 101-200 450 180 A Lowes 1009 201-300 900 145 A Lowes 1110 301-400 300 148 Z Lowes 1002 401-500 300 100 A Lowes 1003 501-600 200 150 A Lowes 1004 601-700 1200 180 A Lowes 1005 701-800 800 190 Z Lowes 1006 801-900 450 200 Z GIA 1008 101-200 450 180 A GIA 1009 201-300 900 145 A GIA 1110 301-400 300 148 Z GIA 1002 401-500 300 100 A GIA 1003 501-600 200 150 A GIA 1004 601-700 1200 180 A GIA 1005 701-800 800 190 Z GIA 1006 801-900 450 200 Z current_os tot_payment . DATA account_perf.blogspot. Sas programs are usually written to do a task repetitively. Line 1: declares macro variable ‘week’ with %let statement. cards. Now these dates are available during data extraction and reporting steps of the program. Line 3: INTX function is used to compute the beginning day of previous week.com To achieve this previous week start date and end dates are computed in a data step and assigned to macro variables. 63 There are some uses of macro variables but they are extensively used in macro processing to pass on values into a SAS Macro. Now let us look at a program to understand the macro processing . Macro Programs A macro program is similar to a subroutine or a function in other programming languages.S for SAS-Post your comments at http://s4sas. We will see that in the following sessions. Line 10: The macro variables are used in the title statement of Print procedure to print start date and end date. A sample data set is created with three clients and various score segments. Line 4: end date is computed using the beginning date. A macro program always starts with the %macro statement including the user defined program name and it ends with a %mend statement. Lines 6&7 : Call symput statement is used to assign the value of edate and bdate to macro variables edatec and bdatec.

proc print data = &dataset. These variables can take different values. run. %mend.'Z'.'A'. After the data step line 1: defines the macro with %macro statement followed by macro name ‘%score_seg’. where client=&client and ext_status_code =&st_code. The macro variable passed on are used to print a meaningful title . This is followed by. Title "FICO Segment for External Status=&st_code and Client =&client". %score_seg('GIA'.'Z'. The report needs to be produced individually for the clients specified.account_perf) %score_seg('Cmart'.'Z'. in parenthesis. Line 2: Proc print statement is specified and data statement takes a macro variable as the input data. every time a macro call is made .dataset). We can write a proc print step for each and every combination but this macro simplifies that. This task is repetitive.'A'.account_perf) Objective is to produce a set of reports to show FICO score segments and Current Outstanding based on the External Status code. run. the arguments or macro variables that are passed on to the macro.account_perf) %score_seg('Lowes'.account_perf) %score_seg('Lowes'.account_perf) %score_seg('GIA'. The value assigned to this macro variable will be read as dataset name Line 3: Title statement in PROC print . Now let us assume that the data set has more than 10 client and several status codes and a large number of records.'A'.com 64 Cmart Cmart Cmart Cmart Cmart Cmart Cmart Cmart . Only client status code and datasets are variables so they are the best candidates for parameters (macro variables) that can be passed on from a Macro call.S for SAS-Post your comments at http://s4sas.st_code.blogspot.account_perf) %score_seg('Cmart'. var client ext_status_code fico_seg current_os. 1008 1009 1110 1002 1003 1004 1005 1006 101-200 201-300 301-400 401-500 501-600 601-700 701-800 801-900 450 900 300 300 200 1200 800 450 180 145 148 100 150 180 190 200 A A Z A A A Z Z %macro score_seg (client.

Exercise for you: Collect 3 Macro programs from your team members written for various purposes.com 65 Line 5: Where statement takes the values from macro variables passed on . Macros are extremely useful in SAS programming environment and used in various contexts to simplify the programs or to bring efficiency to data processing. Values are passed on to the macro with a comma.S for SAS-Post your comments at http://s4sas. List down the functionality (what the macro does) and the context they are used. . Line 9 onwards: the same macro is called for various combinations. This will ensure that the data is filtered as per the specifications of the user. Explain what way it simplifies the program.blogspot. Line: 7 Tells SAS to end the macro with a %mend statement Line 8: Calls the macro ‘%score_seg’. Proc print is customized for the analyst requirement What we have seen above are basic macro constructs and some simple examples.

and you can use it with any SAS data set (table). . Connect To ORACLE(User=501115644 Password=ypasswd Buffsize=10000 Path=CDCIT1 Preserve_Comments ). Often.S for SAS-Post your comments at http://s4sas. PROC SQL is part of Base SAS software. data set options. widely used language that retrieves and updates data in relational tables and databases. PROC SQL can be an alternative to other SAS procedures or the DATA step. PROC SQL is used in analytics to : • • • • Retrieve data from database tables or views (Oracle or SQL Server) Combine SAS datasets from tables or views (MERGE) Create datasets and indexes Compute statistics and Generate reports Data extraction with PROC SQL We often use PROC SQL to extract data from various warehouses.blogspot. billing_cycle_day AS billing_cycle_day FROM ACCOUNT_DIM WHERE CLIENT_ID='BROOK BROS' AND nvl(EXTERNAL_STATUS_REASON_CODE. informats. Disconnect From ORACLE. external_status_reason_code AS ext_rcode. Quit. The SQL procedure is SAS’ implementation of Structured Query Language. CREATE TABLE acct_status AS SELECT * FROM Connection To ORACLE (SELECT current_account_nbr AS account_number. and formats with PROC SQL just as you can with other SAS procedures. PROC SQL. Below is an example reproduced from a previous chapter. functions.com 66 PROC SQL and SAS Structured Query Language (SQL) is a standardized.external_status AS estatus. You can use SAS language elements such as global statements.'0') <>'98').

SAS Data steps with PROC SQL PROC SQL can perform some of the operations that are provided by the DATA step and the PRINT. INPUT client $ account fico_seg $ cards.blogspot. DATA account_perf. When querying a data warehouse. sum(current_os) as tot_os . CREATE TABLE statement creates a SAS Dataset from the output of SELECT statement. Further. sum(tot_payment)as tot_payment from account_perf group by client order by tot_os descending .com In the above example PROC SQL use a connect string “Connect To ORACLE(User=501115644 Password=ypasswd Buffsize=10000 Path=CDCIT1 67 Preserve_Comments )” to identify the oracle database (CDCITI) and use the user name and passwords specified to read data from it. and SUMMARY procedures. Cmart 1002 401-500 300 100 Cmart 1003 501-600 200 150 Cmart 1004 601-700 1200 180 Cmart 1005 701-800 800 190 Cmart 1006 801-900 450 200 GIA 1007 401-500 560 210 GIA 1008 501-600 450 180 GIA 1009 601-700 900 145 GIA 1110 701-800 300 148 . select client. quit. current_os tot_payment . For example. SORT. Let us create a dataset and then we will see how PROC SQL works like a DATA step. Line 2: Assigns a title to the output of SELECT statement that follows Line 3: Select statement with group function SUM used to summarize the data. a date field in Oracle will be converted into SAS date and an Oracle Varchar field will be converted into character. PROC SORT and PROC SUMMARY. PROC SQL. title "Summary of O/S and Payments by Client". PROC SQL automatically converts the field formats in to SAS formats. GROUP BY clause is used to compute the SUM for each distinct group in the database. The above program block shows how a PROC SQL is substituting a PROC PRINT. ORDER BY is . run.S for SAS-Post your comments at http://s4sas.

SELECT client. In our example below. sum(current_os) as tot_os . PROC SQL and SELECT statement We have seen above how SELECT statements are used in PROC SQL.’ and it terminates the procedure. Line 3: Note the CREATE TABLE <table name> AS statement. title "Summary of O/S and Payments by Client". ORDER BY. Output is always printed to the screen (like PROC PRINT) and it’s also possible to create a SAS dataset from the output. We will see some sample select statements and how it is used conditionally to work with data.blogspot. In our example below total outstanding in sorted in a descending order. To create a dataset from the output the above program can be modified as follows: PROC SQL. sum(tot_payment)as tot_payment FROM account_perf WHERE client in ('GIA'.S for SAS-Post your comments at http://s4sas. Most of data retrieval and data combining are done using select statement. 68 A PROC SQL statement ends with ‘QUIT\.com used to sort the output and DESCENDING keyword is to control the sort order. Other clauses added to SLECT statements to restrict the data retrieval or conditional processing . sum(tot_payment)as tot_payment FROM account_perf The SELECT statement must contain a SELECT clause and a FROM clause. It is also possible to SORT by multiple variables. both of which are required in a PROC SQL query. GROUP BY and HAVING. SELECT client. The WHERE clause restrict the data that you retrieve by specifying a condition that each row of the table must satisfy. Those clauses are WHERE. sum(current_os) as tot_os . In the example above the simple SELECT statement is shown below. clients are restricted to GIA and Cmart only. sum(current_os) as tot_os . sum(tot_payment)as tot_payment from account_perf group by client order by tot_os descending . create table summary as select client. . 'CMART') ORDER BY sorts the output in ascending or descending order as specified. quit.

S for SAS-Post your comments at http://s4sas.blogspot.com

69

SELECT client, sum(current_os) as tot_os , sum(tot_payment)as tot_payment FROM account_perf

WHERE client in ('GIA', 'CMART') ORDER BY tot_os descending ;
GROUP BY computes the statistics for each category of values in the specified variable. A summary or group function like average or SUM in SELECT statement is followed by GROUP BY clause to instruct SAS that the statistics should be computed for each group of data. Let us look at our modified example to see how total outstanding and total payments are computed client wise. PROC SQL; SELECT client, sum(current_os) as tot_os , sum(tot_payment)as tot_payment FROM account_perf GROUP BY client; quit; The HAVING clause works with the GROUP BY clause to restrict the groups in a query’s results based on a given condition. PROC SQL applies the HAVING condition after grouping the data and applying aggregate functions. For example, the following query restricts the groups to include only the client GIA. PROC SQL; SELECT client, sum(current_os) as tot_os , sum(tot_payment)as tot_payment FROM account_perf GROUP BY client HAVING Client='GIA'; quit;

Data retrieval Methods using SELECT
Using the dataset in above example, we will demonstrate how to retrieve data from a single table and how to create SAS datasets from resultant output.

I. SELECT – All columns in a Table
PROC SQL; SELECT * FROM account_perf; quit; II. SELECT- Specific columns in a Table PROC SQL; SELECT client,current_os quit; FROM account_perf;

S for SAS-Post your comments at http://s4sas.blogspot.com III. SELECT-How to create dataset from SELECT statements

70

As we have seen from the previous examples, just add a CREATE TABLE <Table Name> AS before the SELECT statements. So the above example so modified would look like as follows PROC SQL; CREATE TABLE Sample_Dataset AS SELECT client,current_os FROM account_perf; quit;

IV. SELECT- Eliminating duplicate rows PROC SQL; SELECT DISTINCT client quit;

FROM account_perf;

V. SELECT-Computing values PROC SQL; SELECT client,(current_os/1000)as OS_in_1000 quit;

FROM account_perf;

Note that row level computing can be done using the formula or multiple columns. VI. SELECT-Assigning Column Alias and formatting it. We have seen earlier in our examples that a new column name is formed with ‘AS’ statement in SELECT statement. Its also possible to specify the format of that variable in PROC SQL. Let us have a look at how OS_in_1000 variable is formed. PROC SQL; SELECT client,(current_os/1000)as OS_in_1000 format =4.2 account_perf; quit; VII. SELECT – Conditional Assignment using CASE Using CASE statement for conditional processing is a powerful feature of SAS Data step and PROC SQL. Here is an example where in SELECT statement CASE statement is used to create a new field Risk_Category based on certain conditions. PROC SQL; SELECT client,current_os, CASE WHEN current_os <= 300 THEN 'Low Risk' WHEN current_os <= 800 THEN 'Med Risk' ELSE 'High Risk' END AS Risk_category from account_perf; quit; Note that unlike DATA step CASE constructs, in PROC SQL each line does not end with a semi column. Also ‘AS’ key word is logically follows after the END of the loop.

FROM

S for SAS-Post your comments at http://s4sas.blogspot.com

71

VIII. SELECT-Specifying COLUMN attributes You can specify the following column attributes, which determine how SAS data is displayed: FORMAT= INFORMAT= LABEL= LENGTH= If you do not specify these attributes, then PROC SQL uses attributes that are already saved in the table or, if no attributes are saved, then it uses the default attributes. Let us have look at an example: PROC SQL; SELECT client,current_os format =4.2 label ='Current Outstanding' FROM account_perf; quit; IX. SELECT – Using Sub queries SUB Queries and Queries inside a Query. Instances where the WHERE clause evaluates the output of another SELECT statement, the second SLECT statement is known as a SUB Query. Here is an example: PROC SQL; SELECT client,current_os FROM account_perf WHERE client IN (SELECT distinct client where tot_payment >180); quit;

FROM account_perf

Though not a real life scenario, the above example demonstrates how a Sub-Query is used. The sub-query returns a list of clients that had at least one payment more than $180 and their current outstanding is listed for all accounts. In real life, esp when we work with multiple tables, sub queries are very useful to frame the right WHERE clauses for data retrieval. Now let us look at some conditional operators used in a WHERE clause of a SELECT Statement. Exercise for you is to frame a query using these operators. Consult some online help for syntax help. Operator ANY ALL BETWEEN-AND CONTAINS EXISTS Definition Specifies that at least one of a set of values obtained from a sub query must satisfy a given condition Specifies that all of the values obtained from a Sub query must satisfy a given condition Tests for values within an inclusive range Tests for values that contain a specified string Tests for the existence of a set of values obtained From a sub query

PROC SQL. We commonly use SAS DATA step and MERGE method to achieve this task. below given is a list of group functions you can use in a PROC SQL statement. unless you use a GROUP BY clause. avg(tot_payment)as avg_payment FROM account_perf GROUP BY client. SELECT client. PROC SQL applies the function to the entire table. When you use an aggregate function. When you execute these program blocks. avg(tot_payment)as avg_payment FROM account_perf. FREQ.blogspot. avg(current_os) as avg_os . quit.GROUP FUNCTIONS There are a lot of group functions we can use in SELECT Statement of a PROC SQL. Function AVG. quit. the first PROC SQL computes the averages for each client group and the second PROC SQL computes them for the entire table. avg(current_os) as avg_os . MEAN COUNT. SELECT. Having seen how a group function is used. N CV MAX MIN NMISS RANGE STD STDERR SUM VAR Definition Mean or average of values Number of nonmissing values Coefficient of variation (percent) Largest value Smallest value Number of missing values Range of values Standard deviation Standard error of the mean Sum of values Variance XI. SELECT – Joining the tables When we work with multiple SAS tables we often will have to join them for data processing. PROC SQL can . Note that all of these are substitutes for a PROC SUMMMARY or PROC MEANS statistics.S for SAS-Post your comments at http://s4sas.com 72 IN IS NULL or IS MISSING LIKE Tests for values that match one of a list of values Tests for missing values Tests for values that match a specified pattern X. Here is an example: PROC SQL. SELECT client.

S for SAS-Post your comments at http://s4sas.blogspot.com be used as a simpler substitute for the MERGE data step method. Let us create another dataset so that we can demonstrate the example. DATA account_perf2; INPUT account fico_seg $ cards; 1002 401-500 400 100 1003 501-600 300 150 1004 601-700 400 180 1005 701-800 900 190 2009 601-700 600 145 1007 401-500 660 210 1008 501-600 750 180 2006 801-900 550 200 2009 601-700 600 145 2110 701-800 400 148

73

prev_os prev_payment

;

;run;
Now to join these tables for all common accounts (equi-join), we use DATA step and MERGE statement as follows. Note that in the data step, dataset should be sorted before we MERGE them. proc sort data=account_perf out=account_perf; by account; run; proc sort data=account_perf2 out =account_perf2; by account; run;

Data merged1; merge account_perf(in=a) account_perf2(in=b); by account; if a=b; run; Now the same results can be achieved using PEOC SQL as follows. Proc SQL; create table merged2 as Select a.*, b.* account_perf2 b where a.account=b.account; quit; The above example demonstrates that how PROC SQL can simplify the coding. Not only that we could avoid the data sort, now we can make use of the powerful WHERE clause to exactly tell SAS various conditions of merging. With the use of Sub-queries and conditional processing (like IN, NOT IN , LIKE , CONTAIN) in WHERE clause, we can achieve any combination of data merging.

from

account_perf a,

S for SAS-Post your comments at http://s4sas.blogspot.com PROC SQL in SAS also provides direct merging of multiple tables with RIGHT JOIN, LEFT JOIN and FULL JOIN keywords for various Outer joins. Readers are requested to explore them as well.

74

S for SAS-Post your comments at http://s4sas.blogspot.com

75

Unix for SAS Analysts
Unix version of SAS is commonly used in organizations with large number of users. Additionally, PC SAS and its Remote SUBMIT features are used to connect to Unix SAS. While working with Unix SAS an analyst or SAS Programmer should be comfortable with a few Unix navigation and utility commands that enables him/her to work independently and efficiently. Here are some topics I thought would be useful. Where and how typically Unix is used : • • • Mostly the servers are Unix based with an ip address (e.g 3.172.21.66). SAS will be installed in this server and also some facility for file storage. Mostly Databases/warehouses are in UNIX servers – Oracle and SAS based data warehouses. SAS in Unix Server using Signon, Rsubmit, and batch execution etc. Files/datasets are uploaded and downloaded to and from Unix Servers(server folders)

Unix Server Spaces or Remote Storage
Many a times the data is permanently stored in one of the Unix servers. One should know how to use a library function to store and retrieve a dataset in unix. For example, if the allocated space is in ‘/projects/dual_cards/jkurian ‘, to save or retrieve a SAS dataset into this location, you need to declare a library name in SAS program as follows: Libname rloc '/projects/dual_cards/jkurian'; proc datasets lib=rloc;run; Proc datasets will list all the datasets in this location. Note that in Unix, the ‘/’ sing is used to separate the directories. Reverse is the case when you work with windows.

SASWORK in UNIX
‘Saswork’ is a location in Unix (server space) where SAS does the data processing by default. It’s a shared space where each SAS Unix user is allocated with space to process their request. Typically the size of SASwork runs into 100 plus GBs but due to the number of users and volume of data used, this space gets consumed very fast. Exhausting this space can lead to terminating all the SAS program submitted by multiple users hence it’s a responsibility of an analyst to monitor this space.

700. sas pgm etc) . 775 – all access to the group. Always used when we want to list the files created by a user or space utilization.prints the contents to the screen chmod : Change file attributes (Read / Write / Executable for owner.sas – lists lines where there is a ‘jkurian’ occurrence anywhere inside the .protect files gzip : to compress a file Usage: gzip <filename> gunzip: Unzips the files that are compressed Usage: gunzip <filename> grep – a very powerful text searching command. empty or not cat : to read a text file (csv.S for SAS-Post your comments at http://s4sas. recursively. These commands are commonly used in consumer finance environment and meet 90% of the requirements while working with Unix SAS.blogspot.SAS files . group & others) Usage: chmod 777 <filename/folder name> Tips: 777. A List of Useful UNIX Commands pwd: to see what’s the present working directory Usage: /home/jkurian> pwd ls : Lists the files and folders ls –l : detailed listing of files and directories Wildcard usage: ls –l *. Usage 1: ls -l | grep jkurian – returns all files created by jkurian Usage 2: grep ‘jkurian’ *.com 76 Let us learn 15 Unix commands listed below.all access. sas7* Usage: /home/jkurian> ls –l mkdir : make a directory Usage: mkdir <filename> cd : change into a directory Usage: cd <folder Name> cp : to copy a files/directory Usage: cp <filename> <folder name/filename> rmdir: remove a directory rm : remove a file rm -R : Remove files and directories.

Telnet/Logon to the Server (your business server) 2. do ‘cd’ into that folder and use ‘rm <filename> ‘ command to remove the file.Kills a specified process that’s submitted by the user.S for SAS-Post your comments at http://s4sas.Lists the processes currently running. Issue command ‘df –k .’ The space utilization would be printed to the screen. FTP and How to work with it . Process ID or PIDs are required to kill a particular process.com 77 ps -ef . If you want to remove only files inside a folder. Useful when we want to kill some program we submitted. Space is showed in kilobytes. kill –9 . df -k . Shows the space utilized and space available for the root folder. 1. to get of folders created by you in SAS work. Usage: ps -ef | grep <user id> lists all the process id for that particular user. Steps to know how many folders are created in SASWORK 1 2 Telnet/Logon to the Server (your server) Issue command ‘cd /saswork’ 3 Issue command ls –l | grep ‘jkurian’ Substitute your username instead of ‘jkurian’. Issue command ‘cd /saswork’ .PID Obtained by using ‘ps’ command.blogspot. Steps to remove a folder in SASWORK that’s no more required 1 2 Go to /saswork Issue command ls –l | grep ‘jkurian’ to see the folders created by you 3 Issue command rm –R <foldername> to remove the entire folder.if /saswork is the work folder 3. 3. Its important to clean up the dead or orphan processes in your SASwork to optimize the saswork space. Usage : kill –9 <PID> . Helpful to estimate the space availability. Steps to see the SASWORK space availability 1. Don’t submit a program unless there is enough space free (At least 10% free) 2. 4.

Steps to run a SAS program in a Telnet Session (Unix Server) First create a SAS program and store it in one of the Unix folders OR FTP a program from the desktop. At the shell prompt issue the following command nohup sas <filename> For example: /home/jkurian> nohup sas scoretest.sas & nohup: to protect from hang ups – you can close the telnet window and the program would be still running! Sas: key word to invoke SAS program to execute the program &: Running the program in background – you can continue work in the same terminal You can locate the log and output files within the same directory as you submitted the program.S for SAS-Post your comments at http://s4sas.Issue ‘ ftp <ip address>‘ command in the Unix shell prompt. FTP from desktop .Go to START/RUN and type ftp <remote server name/ip> (Just like telnet) 78 FTP between two Unix servers (your server) .sas –matches the pattern mget: mget *. 5. Commonly Used FTP commands: put : put <filename> get : get <file name> mput: mput *. datasets etc lcd : change local directory path pwd : see present working directory ls: list files and folders in the remote server. text outputs) between servers and also to upload and download between desktop and servers.com File transfer protocol.blogspot. & . datasets.sas prompt: turns the prompt off bin: sets the transfer mode to binary.we use ftp utility/command to transfer files (sas pgms.use it for excel.

This is used when we wanted to read only a few records from large dataset for testing purposes. the date and time is always printed Usage: options nodate . The default is firstobs=1. Whenever we work with large data files make sure that this option is explicitly specified as it compressed the dataset up to 65% of its original size. this option forces SAS to begin reading at a specified line of data. COMPRESS=: Compresses the SAS dataset being created. Usage: options firstobs=3. SAS system options control many aspects of your SAS session. By default. There are a few other options that help to control the way SAS processes data. Usage: options obs=10 . and the attributes of SAS files and data libraries.blogspot. Usage: options compress =yes|No .com 79 Sum-up With OPTIONS Objective of this page is to discuss about various options in SAS that can be set locally to control the way SAS process data or display output. ERRORS= : controls the maximum number of observations for which complete error messages are printed. . COMPRESS is a great space saver . FIRSTOBS=: causes SAS to begin reading at a specified observation in a data set. Please consult the SAS help for a complete list. The default maximum number of complete error messages is errors=20. Usage: options errors =100. These options come with default values supplied by SAS or set locally.S for SAS-Post your comments at http://s4sas. Some of these options are often used. options obs=max . To return to using all observations in a data set use obs=max. OBS= : specifies the last observation from a data set or the last record from a raw data file that SAS is to read. If SAS is processing a file of raw data. NODATE= : suppresses the printing of date and time to the log and output window. including the efficiency of program execution.

77 0345900601/19/2001$1.9 3. Reading data using column input Note: Column input requires the data to be standard numeric or character data data employee.9 4.S for SAS-Post your comments at http://s4sas. Hourly Salary Salary Salary Reading data using formatted input This program reads in data using informats for a file with no delimiters.blogspot.7 3. the DLM= option and/or DSD option on the INFILE statement will need to be specified. amount comma9. A1237 3. infile datalines.2 3. This program reads in data separated by a blank space using simple list input style in the INPUT statement.com 80 Appendix-I Practice Programs Reading data using simple list input.05 3320899201/19/2001$702.6 3.0 .5 B3051 4. datalines. 0074309801/15/2001$1.61 .0 3. . proc print. proc print. format date mmddyy10.59 1028754301/17/2001$672. input acctnum $8. datalines. If your data is delimited by another character other than a blank or space. run. input lastname $ 1-10 fname $ datalines. date mmddyy10..8 3.209. proc print.. 12-21 ssn 23-31 status $ 33-38. data grades. run. data acctinfo.9 A9361 2.0 3. run. input student $ test1-test4.8 3. Green Samual 888888888 Brennon Carol 123456789 Wang Robert 999999999 Randolph Virginia 987654321 .003.

9.$300 . acct start=15-01-2000 first=Mary last=Lowe hr start=01-09-1984 first=Greg last=Richards oper start=01-11-1990 first=Cindy last=Lou oper start=15-05-1995 first=Julie last=Simpson acct start=01-03-1999 first=Sam last=Hampton . On EBCDIC systems (VM. datalines.txt'. MAC. input dept $ last= $ first= $ start= $10.txt' DSD dlm='09'x truncover.$500 "Chang. data grades. run. data info. MVS.3. infile 'c:\temp\tabdlm. run. UNIX.3.Daniel". infile datalines dsd.0. proc print. put "Suzy B. test1-test4 fee :dollar8..S for SAS-Post your comments at http://s4sas.Bertrum".. Thompson" '09'x "04/28/1995" '09'x "Raleigh". file 'c:\temp\tabdlm. input student :$20..5. DOB :mmddyy8.3. data deptinfo. datalines.com 81 Reading data using named input Read datalines or records that contain a variable name followed by an equal sign and the value. .3. data _null_.6.9. city :$20.. input name :$30.8. VMS) the hex representation of a TAB character is '09'x.0.0.3. Reading comma delimited data with modified list input Read in comma-delimited data by specifying the DSD option on the INFILE statement and using modified list-input style. put "Samuel B.4.blogspot.8.Fen". run.3.. Reading a TAB delimited file Read an external file into a SAS data set when the variables in the external file are separated with a TAB character.3. proc print.$400 "Elano. "Alexander. Note: On ASCII systems (PC.. VSE) the hex representation of a TAB is '05'x.4. run. Thomspon" '09'x "5/1/1993" '09'x "Wake Forest".

#2 @1 address1 $22. TAB or delimited file This program reads an exclamation point (!) delimited file variable names on the first row.! ".. . infile datalines n=4 truncover. run. getnames=yes.txt' out=work. Use of PROC IMPORT to read a CSV.!333!apple".test dbms=dlm replace.S for SAS-Post your comments at http://s4sas. run. data _null_. run. datalines. put"x1!x2!x3!x4".com 82 proc print./* Create test file to read using PROC IMPORT below. */ data test. proc print.. #4 @1 phone_no $12.txt'. format dob mmddyy8. proc print. input #1 @1 name $15. Sonya Larson 10054 Plum Tree Rd Buffalo NY 10068 716-555-1348 Kip Holfser 902 West Blvd Lansing. put "111!. */ /* Each person's data spans multiple records. /* note this is the first semi-colon */ delimiter='!'. run. proc import datafile='c:\temp\pipefile. */ file 'c:\temp\pipefile. #3 @1 address2 $30. /* Create sample data with variable length records. put "11!22!. MI 48910 517-555-0227 Chan Rong 3052 East Bank Way Savannah GA 30058 912-555-0025 Randy Nguyen 100 49th Street Harrisburg PA 19075 717-555-7773 .blogspot. Reading multiple records to create one observation This program creates one SAS record by reading multiple records from the source dataset.

put Region Returns Sales. set sashelp. statement.csv'.var2. run. proc import datafile='c:\temp\csvfile. Reading a comma delimited file with a .berry. file 'c:\temp\csvfile. run. you do not have to use the DELIMITER= required. data _null_. Creating an external file with column-aligned data With column style output.1 +5 height. Relative .coconut. put "apple.dewberry". */ data _null_. put"var1. Open the file 'd:\test. The sample below uses column style PUT for NAME and AGE. put name 1-8 age 13-15 @20 sex +5 weight 5. put "apricot. data in columns.var4".csv' out=work. run.blogspot. /* Create comma delimited test file to read using PROC IMPORT below.com 83 run. set sashelp. file xx dlm='~'. Also assuming the variable names are on the first row.S for SAS-Post your comments at http://s4sas. proc print. run. file log. an absolute control pointer ("@") specifies column 20 as the starting position for SEX.csv extension Since the DBMS= value is CSV. You can also use control pointers on a PUT statement to align control pointers ("+n") help align WEIGHT and HEIGHT. specify the starting and ending column numbers for the output data after the variable name.shoes (keep= Region Returns Sales obs=20). the GETNAMES= statement is also not Creating a delimited file using a PUT statement filename xx 'd:\test.txt' to see the output.txt'. Below.fruit dbms=csv replace.class (obs=8).var3.banana. run.crabapple. data _null_.date".

S for SAS-Post your comments at http://s4sas. input name $ age. proc print data=both. merge one two. set one two. A Simple MERGE /* Create sample data */ data one. run. . a amber b brown c cream c cocoa c carmel . Daniel 33 1 Terry 40 2 Michael 60 3 Tyrone 26 4 . datalines. input id $ color $.blogspot. run. data both. datalines.com 84 Concatenating data sets Using SET data one. by id. input name $ age group. run. datalines. Chris 36 Jane 21 Jerry 30 Joe 49 . input id $ fruit $. data both. data two. datalines. a apple a apple b banana c coconut . data two. proc print data=both.

proc print data=one_only. else output two_only. Convert missing values to zero and values of zero to missing This program converts missing values to zero and values of zero to missing for numeric variables.S for SAS-Post your comments at http://s4sas. title 'One only'. run. run. datalines. proc print data=both. input id $ name $ dept $ project $. proc print data=two_only.blogspot. .com 85 Merging and creation of subsets based on Origin This program demonstrates how to do Merging data sets by a common variable and create output data sets based upon observation origin. run. Method is to Use the ARRAY statement with the automatic _NUMERIC_ variable to process all the numeric variables from the input data set. title 'Two only'. else if in1 then output one_only. Use the DIM function to set the upper bound of an iterative DO to the number of elements in the array. run. if in1 and in2 then output both. data both one_only two_only. by id. data two. title 'Both'. merge one(in=in1) two(in=in2). 111 Fred 35 222 Diana 40 777 Steve 0 888 Monique 37 999 Vien 42 . data one. input id $ name $ projhrs. 000 Miguel A12 Document 111 Fred B45 Survey 222 Diana B45 Document 888 Monique A12 Document 999 Vien D03 Survey . datalines.

0 8 9 9 . set deptnum. run. then testmiss(i)=0. datalines. run. input dept qrt1 qrt2 qrt3 qrt4. do i = 1 to dim(testmiss). array testmiss(*) _numeric_.S for SAS-Post your comments at http://s4sas. value codesc 1-50='low' 50-high='high'. data deptnum. run. run. input var1 var2 var3.com 86 /* Example 1 . if testmiss(i)=. datalines. do i = 1 to dim(testzero). 101 3 0 4 9 410 8 7 5 8 600 0 0 6 7 700 6 5 6 9 901 3 8 7 0 . 7 1 4 . proc print. */ /* Create sample data */ data numbers. 5 6 2 8 3 0 .blogspot. data nozero(drop=i). end. end. proc print. set numbers.Convert all numeric missing values to zero. data nomiss(drop=i). Convert selected numeric values from zero to missing Use the ARRAY statement to define the specific numeric variables to change from a value of zero to a missing value. .. array testzero(*) qrt1-qrt4. run. Create and apply user-defined formats proc format. Use the DIM function to set the upper bound of an iterative DO to the number of elements in the array. if testzero(i)=0 then testzero(i)=.

To left align the resulting character value. specify -L after the format specification. Specify a numeric format that describes how to write the numeric value to the character variable. 123456 110204 1000 120504 . sasdate=input(date.blogspot. datalines. proc print data=values. set values. . Convert values from character to numeric This program converts a character value to a numeric value by using the INPUT variable..codesc. datalines.S for SAS-Post your comments at http://s4sas..). 64412 D0001568 49 64412 Z0056012 51 78001 C0000969 3 78001 F0032140 11 88204 B0000073 79 89569 R0022217 1 99301 H0009355 99 99301 C0000889 58 . format sasdate mmddyy10.com 87 data values. Specify a numeric informat that best describes how to read the data value into the numeric Convert values from numeric to character Converts a numeric value to a character value by using the PUT function.). data values.56 031704 3920 123104 . date :$6. data char. input string :$8.mmddyy6. numeric=input(string. codefmt=put(code. data num. function. input prodnum custnum $ code. run. run.. input num date: mmddyy6. datalines. 1234. proc print. run.).8.

proc print data=sortdate. run. run. proc print data=tourdate. format depart worddate18.com 88 data now_char. run. proc print data=dates. var depart country nights.blogspot.. run. .S for SAS-Post your comments at http://s4sas. -L). set dates. run. proc print. nights. by depart. num=put(oldnum.date9. run. title 'Report with Departure Date Spelled Out'. title 'Departure Dates Listed in Chronological Order'. run. data dates.6. run. date=put(olddate. Japan 13may89 8 Greece 17oct89 12 New Zealand 03feb90 16 Brazil 28feb90 8 Venezuela 10nov89 9 Italy 25apr89 8 USSR 03jun89 14 Switzerland 14jan90 9 Australia 24oct89 12 Ireland 27may89 7 . set num (rename=(num=oldnum date=olddate)). proc print data=dates. data tourdate. cards. format depart mmddyy8.. Working with Dates in the SAS System SAS System has various date formats a the programs below demonstrates various such formats in SAS data steps. proc contents data=tourdate.). title 'Departure Dates with SAS Date Values'. run. title 'Departure Dates in Calendar Form'. input country $ 1-11 @13 depart date7. proc sort data=tourdate out=sortdate.. format depart date7.

run. proc print data=temp. return=depart+nights. rightnow=today(). title 'Age in Years of Tradewinds Travel'. run. ageyrs=agedays/365.1 start rightnow date7. format now date7.. proc print data=home. set tourdate. title 'Date and Day of Week Payment Is Due'. format duedate weekdate29.blogspot. run. now=today(). run. run.S for SAS-Post your comments at http://s4sas. if now+90<=depart<=now+120. title 'Corrected Value for Switzerland'. if weekday(duedate)=1 then duedate=duedate-1. title 'Dates of Departure and Return'. format ageyrs 4. set tourdate. set tourdate... title 'Tours Departing between 90 and 120 Days from Today'. agedays=rightnow-start.25. run. data corrdate. start='08feb82'd. data ads. data pay. proc print data=pay. proc print data=corrdate. rightnow=today(). set tourdate. var country duedate. title 'Age of Tradewinds Travel'. run. format return date7. proc print data=ads.. start='08feb82'd.com 89 data home. age=rightnow-start. duedate=depart-30. run. /* Calculating a duration in years */ data temp2. format start rightnow date7. run. run. /* Calculating a duration in days */ data temp. . run. proc print data=temp2.. run. if country='Switzerland' then depart='21jan90'd.

com 90 Use of MDY function /* Use month. run.. set one. dd=bdays+dd. chardate=put(sasdate.day. proc print. mm=mm-1. tod=today(). months. datalines. end. input @1 dob mmddyy10.). Convert a SAS date to a character variable data one. 1 1 99 02 02 2000 . format sasdate mmddyy10. */ if dd < 0 then do. /* If the difference in days is a negative value. data two.. sasdate=mdy(month. 010199 . proc print.0)-1). /* Get the current date from operating system */ /* Determine number of days in the month prior to current month */ bdays=day(intnx('month'.mmddyy6.blogspot. run. mm=month(tod)-month(dob). . Calculate number of years. and years between */ /* start and end dates */ dd=day(tod)-day(dob). run. and days between two dates data a. run.S for SAS-Post your comments at http://s4sas. months. yy=year(tod)-year(dob). datalines.. data two. day and year variables to create a SAS date*/ data one.tod. input month day year. /* Find difference in days. input sasdate :mmddyy6. set one.year). add the number */ /* of days in the previous month and reduce the number of months */ /* by 1.

S for SAS-Post your comments at http://s4sas. proc print. /* Use INTNX to roll DATE back to the first of the year. end.. intrate.date)+1.com 91 /* If the difference in months is a negative number add 12 */ /* to the month count and reduce year count by 1. 010104 010404 041804 081804 123104 . Determine the week number of the year data test.intnx('year'. */ if mm < 0 then do. 01/01/1970 02/28/1992 01/01/2000 03/01/2000 05/10/1990 05/11/1990 05/12/1990 . datalines.blogspot. data getweek. run. run... yy=yy-1.. */ week=intck('week'. DO LOOP block This program demonstrates how to conditionally adjust variable values with a DO block in a SAS data step /* Create sample data */ data acctinfo. format duedate date9. * Create sample data */. input date :mmddyy6. mm=mm+12. run. . datalines. */ /* Pass the result as the 'start' parameter to INTCK.date. input duedate date9.0). format dob tod mmddyy10. proc print. datalines. set test. format date date9.

LAST_NAME = SCAN(NAME. TITLE "Names and Phone Numbers in Alphabetical Order (by Last Name)". if duedate lt today() then do. COLUMNS NAME PHONE LAST_NAME. and NEWLOAN */ /* based on the value of DUEDATE */ data acctinfo. ***Extract the last name from NAME.99.blogspot.10 10nov2010 1. intrate=intrate*1.. and Last name order.-1. proc print. status='On Time'. DEFINE LAST_NAME / ORDER NOPRINT WIDTH=20. Using the SCAN function Suppose you want to produce an alphabetical list by last name. @21 PHONE $13. The SCAN function makes quick work of this. DATA FIRST_LAST. /* Conditionally set values for STATUS. /* Scans from the right */ DATALINES. and LAST name.com 92 10oct2000 1. Snoker (908)782-4382 Raymond Albert (732)235-4444 Steven J. if duedate ge today() then do. newloan='Solicit'. end. newloan='Deny'. but your NAME variable contains FIRST. Foster (201)567-9876 Jose Romerez (516)593-2377 . . status='Late'. end. INTRATE. possibly a middle initial. INPUT @1 NAME $20. Jeff W. Note that the LAST_NAME variable in PROC REPORT has the attribute of ORDER and NOPRINT. set acctinfo. Middle. PROC REPORT DATA=FIRST_LAST NOWD. run.' ').S for SAS-Post your comments at http://s4sas.02. DEFINE PHONE / DISPLAY 'Phone Number' WIDTH=13 FORMAT=$13. intrate=intrate*.. run. DEFINE NAME / DISPLAY 'Name' LEFT WIDTH=20. so that the list is in alphabetical order of last name but all that shows up is the original NAME variable in First. RUN.12 .

proc print data=one. else no_numbers=string. run. input string $25. proc print data=two. datalines. /* Search for the letter 'c' */ datalines.blogspot. the cat came back catastrophic curious cat caterwauls .. input custno trip1 trip2 trip3 trip4 trip5 trip6 trip7 trip8 trip9 trip10. INDEXW functions searches a character expression for a string.'cat'). INDEXC. position=index(string. Using arrays and DO loop in Data Step This program demonstrates how to compute averages of variable values with arrays and DO loop. data tripinfo. my aunt amy in the army my oh my . datalines. input string $25. /* Sample 3: INDEXW */ data three. proc print data=three. infile datalines truncover. run.S for SAS-Post your comments at http://s4sas. * Sample 1: INDEX */ data one. 123 200 225 432 300 100 550 80 325 600 270 . if indexw(string. /* Search for the word 'cat' */ letter=INDEX(string. or word.. Box 101 Pine Street . input string $25.com 93 INDEX Function for String Search INDEX.. datalines.'c'). run.'my') > 0 then contains_the_word_my='yes'. if indexc(string.'0123456789')> 0 then has_numbers=string. * Sample 2: INDEXC */ data two. specific character.

. if i le 5 then do. keep custno avg5 avg10. /* The _TEMPORARY_ array values are used to populate the missing values*/ data new(drop=i). Use a DO */ /* loop to process variables in the array. if trip(i)=.. input var1 var2 var3. datalines. 10 20 30 100 . */ data average.S for SAS-Post your comments at http://s4sas. proc print. array now(3) var1 var2 var3. proc print. if now(i)=.blogspot. then now(i)=newval(i). then avg5=. determine */ /* if a condition is met and then perform a subsequent action. 300 . set test. end. array trip (10) trip1-trip10. 2000 3000 2205 1400 1385 1240 1000 900 890 1000 1025 1200 1120 1000 800 750 300 3000 3000 3000 3000 3000 699 599 /* Put variables TRIP1-TRIP10 into an array and with a DO block.1 . else avg10=mean(of trip1-trip10). then avg10=. Using _TEMPORARY_ arrays for Missing value Treatment /* Create sample data */ data test. if trip(i)=.2 . run. set tripinfo.3) . run. end. do i=1 to 3. run. do i=1 to 10. end. else avg5=mean(of trip1-trip5).com 94 124 125 126 127 . array newval(3)_TEMPORARY_ (. . run. 40 400 .

duration. data final. label title='Paper Title'..S for SAS-Post your comments at http://s4sas. run. x&i %end. x=1. run. cards. %mend test. test to plan 10:30 20 Peter Info SysArtificial Intelligence 9:30 45 Paul Info SysQuery Languages 10:30 40 Lewis Info SysQuery Optimisers 15:30 25 Jonas Users Starting a Local User Group 14:30 35 Jim Users Keeping power users happy 15:15 20 Janet Users Keeping everyone informed 15:45 30 Marti GraphicsMulti-dimensional graphics 16:30 35 Marge GraphicsMake your own point! 15:10 35 Mike GraphicsMaking do without color 15:50 15 */ . run. data x2. options mprint. x=3. input author$1-8 section$9-16 title$17-43 @45 time time5.com 95 A Simple SAS Macro This Macro shows a way to concatenate all datasets together without having to type in each one. data x1. %test proc print. %macro test.blogspot.. Tom Testing Automated Product Testing 9:00 35 Jerry Testing Involving Users 9:50 30 Nick Testing Plan to test. 1) as an empty copy of some other table 2) as the results of any valid SQL select expression 3) from the traditional SQL DML statements /* creates a base table for further use data paper. set %do i = 1 %to 3. BASIC PROC SQL Exercises This code shows three ways in which SQL can create SAS datasets. data x3. format time time5. run. x=2. run.

and empty copy of PAPER */ proc sql. P3. quit.S for SAS-Post your comments at http://s4sas. cards. proc sql. data parts. run. input cno $ pno $ qty. create table p3 as select * from paper where time > '12:00't.com 96 16:15 25 Jane . title2 'Table P3'. */ proc sql. 001 Part One 002 Part Two 003 Part Three . that contains all of the papers presented after 12:00.blogspot. proc contents data=p2. run. run. proc contents data=counts. this creates a table. proc print data=p3. create table counts( section char(20). run. * In one step. MERGING using PROC SQL This example demonstrate another example of merging using PROC SQL where the "common" variable has a different name in each table and the "common" variable has a different format and the ‘common’ variable has some prefix in some table and not in others. C001 P001 10 C001 P002 20 C002 P003 30 C002 P002 20 C003 P003 50 . title2 'Description of table COUNTS'. papers num). data orders. create table p2 like paper. cards. input no $ desc $ 4-20. */. use em! /* This creates table P2. unlike any existing table.quit. title2 'Description of table P2'. GraphicsPrimary colors. /* This creates a table.quit.

input a b @@. . p. 2 . cust c = c. run. proc freq. tables a*b / missprint. title 'NO TABLES STATEMENT'. 2) and substr(pno. title '2-WAY CONTINGENCY TABLE'. o. tables a*b. tables a / missprint. proc freq. title '2-WAY CONTINGENCY TABLE WITH MISSING OPTION'. run. 2) = quit. proc freq. . data cust. select o.Options available The examples below show various ways one can use the PROC FREQ procedure. title '2-WAY CONTINGENCY TABLE WITH MISSPRINT OPTION'.no. proc sql.cno. 1 2 2 1 .com 97 . run.no p. input no $ name $ 4-20. title '1-WAY FREQUENCY TABLE WITH MISSPRINT OPTION'.S for SAS-Post your comments at http://s4sas. cards. PROC FREQ. cards.pno. tables a*b / missing. data new. 1 1 .qty p. proc freq. c.blogspot. 001 Cust One 002 Cust Two 003 Cust Three . Please read the title of each step to understand what it does. 2 1 proc freq. options ls=132. from orders o.desc. o. tables a*b / list. run. parts where substr(cno. proc freq. run.name.

S for SAS-Post your comments at http://s4sas. data gains.3 98. class team. tables a*b / list. age . /*Example:2 */ input name $ height weight.0 Bennett 63. output out=results. PROC MEANS The examples below demonstrate the ways PROC MEANS used for basic statistics and variable checking. tables a*b / list sparse. cards. title '2-WAY FREQUENCY TABLE.com 98 title '2-WAY FREQUENCY TABLE'.blogspot. proc freq order=data. run.2 Carol 62. ORDER=DATA'.0 Barbara 65. run. title '2-WAY FREQUENCY TABLE WITH SPARSE OPTION'. run. run. class name. proc means nmiss n. Carol blue 5 Carlos blue 6 . 5 Bennett red .7 102.8 102.5 84.5 Carlos 63.2 96. /*Example:1 */ input name $ team $ cards. data gains. Alfred 69. .9 .5 Alicia 56. tables a*b / list missing. title '2-WAY FREQUENCY TABLE WITH MISSING OPTION'. run. proc freq. proc freq. run. proc print data=results.0 122. Alfred blue 6 Alicia red 5 Barbara . run. run. run. proc means noprint.

Rum them in SAS ans see how they are different from each other.0 AJH 1 Wakana F 63. var height weight.5 85. title 'Requesting Assorted Statistics'.5 84. proc means data=gains.1 92. A PA 87 A PA 90 A PM 82 A PM 71 A UN 72 A UN 77 B PA 79 B PA 80 B PM 73 B PM 72 B UN 70 B UN 66 C PA 77 C PA 81 C PM 72 C PM 68 C UN 62 C UN 61 . proc means data=gains maxdec=3 nmiss range uss css t prt sumwgt skewness kurtosis.0 AJH 2 Robert M 64. IF DILUTION='A' THEN DL=1.3 .0 AJH 1 Philip M 70.8 128. Alfred M 69.2 BJH 2 . BJH 2 Thomas M 57.5 86.3 118.0 115. 99 PROC SUMMARY A Number of PROC Summary examples are listed below.5 AJH 2 Alicia F 56. /* Use class variable COMPOUND to group data.0 BJH 1 Robert M 68.5 AJH 1 Alfred M 71. cards. class sex.3 99.0 AJH 1 Thomas M 59. ELSE IF DILUTION='B' THEN DL=2.0 130. /*Example 4:*/ title 'Statistics For All Numeric Variables'.0 122.3 AJH 2 Wakana F 61.8 102. DATA VIRUS. run.9 BJH 2 Philip M 69. run. run. ELSE IF DILUTION='C' THEN DL=4. output out=test max=maxht maxwght maxid(height(name) weight(name))=tallest heaviest.blogspot. var height weight. CARDS.com data gains. proc print data=test.9 AJH 2 William M 66. run. /*Example : 3*/ input name $ sex $ height weight school $ time. proc means data=gains.S for SAS-Post your comments at http://s4sas.0 118.0 BJH 1 William M 68. INPUT DILUTION $ COMPOUND $ TIME @@.0 BJH 1 Alicia F 60.5 112. */ .

9 Andrea F 28.6 27. PROC PRINT. slightly */ /* different from class. RUN. RUN.0 Sam M 27.0 36.3 27. class sex. CLASS COMPOUND. BY COMPOUND.0 29.6 34.6 Mike M 29.8 32. input name $ sex cards. RUN.6 32.2 Ellen F 27. PROC SORT. Comparison of MEANS and SUMMARY Output data relay. output out=newmeans min=.5 Jim M 26.7 Karen F 34.2 23. OUTPUT OUT=OUTA MEAN=M STD=S N=COUNT. RUN.9 proc means data=relay noprint. RUN. var back breast fly free.1 29.com 100 PROC SUMMARY PRINT.1 26. VAR TIME. PROC PRINT.2 33.2 31. PROC SUMMARY DATA=VIRUS.3 26.5 27.run.1 21. /* Use by variable to group data.9 27.2 . PROC SUMMARY PRINT N MEAN STD STDERR SUM VAR MIN MAX CV CSS USS RANGE NMISS.S for SAS-Post your comments at http://s4sas. VAR TIME. BY COMPOUND.2 30. CLASS COMPOUND.0 27. RUN.1 Carol F 32.0 22.9 32.8 Clayton M 27. 28. PROC SUMMARY DATA=VIRUS. . VAR TIME DL. RUN. $ back breast fly free.1 26.8 23.4 25. OUTPUT OUT=OUTA MEAN=M STD=S N=COUNT. Sue F 35.3 24.9 25.run. BY COMPOUND.blogspot.1 36. CLASS COMPOUND.4 24.3 33. */ PROC SUMMARY PRINT.6 Jan F 31. VAR TIME DL.0 24. RUN.

datalines. run.5 56. proc summary data=relay print min. /* Define title */ title 'Study of Height vs Weight'. title 'Using PROC SUMMARY with the PRINT option'.0 65. /* Generate scatter plot */ proc gplot data= stats.0 . proc print data=newsumm. var back breast fly free.3 98.blogspot.8 102.0 62.com/ctx/samples/index. PROC GPLOT A simple program to demonstrate the basic construct of GPLOT procedure. 69. /* Set the graphics environment */ goptions reset=all gunit=pct border cback=white colors=(black blue green red) ftext=swiss ftitle=swissb htitle=6 htext=4. run.8 128.0 57. class sex. plot height*weight. title 'Using PROC PRINT with PROC SUMMARY'. output out=newsumm min=.sas.5 112. /* Create the data set STATS */ data stats.0 67. input height weight.5 85.0 64.0 66.0 112.3 77.0 133. These Examples are mainly sourced from SAS Institute website.0 150. Please visit: http://support. run.S for SAS-Post your comments at http://s4sas.jsp .com 101 proc print data=newmeans. run.5 84. title 'Using PROC PRINT with PROC MEANS'.0 72.5 56.

49 I IF-ELSE Conditions · 37 IN statement . PROC SQL · 72 INPUT Statement · 14 C CARDS Statement · 15 CASE Condition · 70 COMPRESS Option · 79 Conditional Processing · 37 J Joins.com 102 Quick Index A Array Variables · 30 H HTML Output · 57 B BY Statement · 37. Table · 72 D Data Extraction · See Reading Data DATA Statement · 14 DATA Step · 8 Data Value · 10 Dataset · See SAS Dataset DATASETS Procedures · See Procedures DDE · See Reporting Delimited Data · See Reading Data DOWNLOAD Procedure · 48 K KEEP Statement · 29 L LABEL Statement · 31 LIBNAME Statement · 14 M Macro Language · 61 Macro Programs · 63 Macro Variables · 61 MEANS Procedure · 51 MERGE Statement · 36 E EBCDIC Data · See Reading Data EXPORT Procedure · See Reporting External Files · 18 F FIRSTOBS Option · 79 FORMAT · 32 Formatted INPUT · See Reading Data FREQ Procedure · 48 FTP Commands · 77 N NODATE Option · 79 O OBS Option · 79 Observation · 11 ODS · See Reporting OPTIONS · 79 ORACLE.blogspot. Connection · 22 G GPLOT Procedure · 52 GROUP Functions · 72 .S for SAS-Post your comments at http://s4sas. 45.

com SELECT statement · 68 SEMICOLON · 15 SORT Procedure · 44 SQL Procedure · 66 Sub Queries · 71 103 P PRINT Procedure · 43 PROC IMPORT · See Reading Data PROC SQL · 66 Procedures · 8.blogspot. 42 T R Reading Data · 17 Reporting · 56 RUN Statement · 16 TABLES statement · 49 TITLE Statement · 16 TRANSPOSE Procedure · 46 U S SAS Dataset · 11 SAS Functions · 34 SAS Language · 9.S for SAS-Post your comments at http://s4sas. See SAS Names · 11 SAS Statements · 12 Sas Variable UNIX Commands · 76 V Variable · 10 Creating · 25 SAS Variables · See Sas Language SASWORK · 75 W WHERE Condition · See .

error-free or that the web site or the server that hosts the web site are free from viruses or other forms of harmful computer code.S for SAS-Post your comments at http://s4sas. All such information is provided "as is”. The readers are free to distribute them in any form for educational purposes. service marks.blogspot.com 104 Copyright All materials on this web site are copyright free and/or collected from sources available to the public. However. Hypertext links to other web locations are for the convenience of users and do not constitute any endorsement or authorization by the author. Disclaimer The author makes no warranties or representations of any kind concerning the accuracy or suitability of the information contained on this web site for any purpose. names. there is no guarantee that the services provided by this web site will be uninterrupted . logos and other copyrighted information displayed in this website belong to their respective owners. The trademarks. a written permission is required for any commercial use of the material. .

Sign up to vote on this title
UsefulNot useful