The purpose of this handout is to introduce key components of the SAS System through the development of a simple SAS program and SAS basics for Windows, including:
• Overview of a SAS Program
• The DATA Step
• Points to Remember When Writing a SAS Program
• Writing a DATA Step
• The PROC Step
• Sample SAS Programs and SAS Basics for Windows
Part 1: Overview of A SAS Program
The SAS System helps you organize and analyze a collection of data items using SAS programming statements. A SAS program is a collection of SAS statements in a logical sequence. A SAS program generally has two major components:
a. The DATA step
b. The PROC step
Note that a SAS program may have several DATA steps or PROC steps. It may also consist entirely of PROC steps with no DATA step, or vice versa.
Part 2: The DATA Step
The DATA step has two major functions:
a. Create a SAS data set from ASCII data
b. Modify previously created SAS data sets
In this handout we concentrate on function (a). For discussion purposes, assume we have the following data set:
City      Dept   Revenue
LA        A100   5000
Chicago   B100   3000
Texas     C100   6000
Dallas    D100   4000
• The data set has three variables (columns).
• The data set has four observations (rows).
• There is only one record per observation.
• The first two variables are character variables.
• The third variable is a numeric variable.
Part 3: Points to Remember When Writing A SAS Program
1. Most SAS statements begin with a keyword (assignment statements are an exception), and every SAS statement ends with a semicolon.
2. Except within the data lines, SAS is not sensitive to spacing between words.
3. Comments can be entered in a SAS program in either of the following formats:
   a. /* text */ (use for large comment blocks)
   b. * text ; (use for single-line comments)
Part 4: Writing a Data Step
The DATA Statement
Purpose: Start a DATA step.
Form: DATA data_set_name;
Keyword: DATA
Naming rules: Limited to 32 characters. The first character must be a letter or an underscore '_'. The entire data set name may contain only letters, numbers, or underscores.

The INPUT Statement
Purpose: Describe the format of the ASCII data.
Form: INPUT variable_names variable_formats;
Keyword: INPUT
Naming rules: Limited to 32 characters. The first character must be a letter or an underscore '_'. The entire variable name may contain only letters, numbers, or underscores.
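The naming rules above can be expressed as a single pattern. The sketch below is written in Python rather than SAS purely for illustration. The helper name is_valid_sas_name is hypothetical, and the pattern assumes "letters" means the ASCII letters A-Z and a-z.

```python
import re

# Hypothetical helper modeling the naming rules listed above:
#   - 1 to 32 characters in total
#   - first character: a letter or an underscore
#   - remaining characters: letters, digits, or underscores
_SAS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_sas_name(name: str) -> bool:
    """Return True if `name` satisfies the data set / variable naming rules."""
    return _SAS_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Budget", "_temp1", "2ndYear", "data set", "x" * 33]:
        print(f"{candidate!r:>40} -> {is_valid_sas_name(candidate)}")
```

Using fullmatch rather than match ensures that the whole name, not just a prefix, has to follow the rules.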
By default, SAS assumes all variables are numeric. If a variable has character values, define it as character by placing a $ sign after the variable's name on the INPUT statement in LIST input. The default missing data value in LIST input is a dot '.'.
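To make list input concrete, here is a small Python model (not SAS code) of how a line such as LA A100 5000 is split into values. The function read_list_input and its (name, is_character) layout are hypothetical illustrations. The model assumes each data line holds exactly one value per variable and ignores SAS details such as the default 8-character length of $ variables.

```python
from __future__ import annotations

from typing import Optional, Union

Value = Optional[Union[str, float]]


def read_list_input(line: str, layout: list[tuple[str, bool]]) -> dict[str, Value]:
    """Model of LIST input for one data line.

    `layout` pairs each variable name with True for a character variable
    (the '$' case) or False for a numeric one. Fields are separated by
    blanks, and a lone '.' is read as a missing value, shown here as None.
    SAS would continue on the next line if values ran short; this sketch
    simply raises an error instead.
    """
    fields = line.split()
    if len(fields) != len(layout):
        raise ValueError(f"expected {len(layout)} values, got {len(fields)}")
    record: dict[str, Value] = {}
    for (name, is_char), raw in zip(layout, fields):
        if raw == ".":
            record[name] = None          # default missing value
        elif is_char:
            record[name] = raw           # '$' keeps the text as-is
        else:
            record[name] = float(raw)    # everything else is numeric
    return record


if __name__ == "__main__":
    budget_layout = [("Name", True), ("Dept", True), ("Revenue", False)]
    for line in ["LA A100 5000", "Chicago B100 3000"]:
        print(read_list_input(line, budget_layout))
```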
The INFILE Statement
Purpose: Direct SAS to read an external data file.
Form: INFILE fileref; or INFILE 'path_and_file_name';
Keyword: INFILE
The CARDS Statement
Purpose: Indicate the beginning of instream data.
Form: CARDS;
Keyword: CARDS
Notes: Do not use this statement if using an external data file.
Part 5: The PROC Step
Statistical analysis in SAS consists of using one or more procedures (PROCs). After the creation of a SAS data set, you can invoke any of the SAS procedures to analyze it. The PROC step indicates which SAS data set is to be processed, lists the variables to be included in the analysis, and specifies other options specific to the procedure. The commonly used PROCs are discussed in a later handout.
Part 6: Sample SAS Programs and SAS basics for Windows
I. A sample SAS program using instream data.
1. Create a sample SAS program: enter the following in the Editor window:
DATA Budget;
   INPUT Name $ Dept $ Revenue;
   CARDS;
LA      A100  5000
Chicago B100  3000
Texas   C100  6000
Dallas  D100  4000
;
PROC PRINT;
RUN;
2. Submit the SAS job by highlighting the code and selecting the menu command Run -> Submit (or clicking the 'Submit' icon on the toolbar). When the Output window becomes the active window, your job is complete. It may or may not have been successful.
3. Check the Log window to see whether your job was successful. If there is an error in your program, make sure that the Editor window is the active window before correcting it. Click the window buttons at the bottom of the SAS window to switch between windows.
4. Before resubmitting your program, clear the Log and Output windows (Edit -> Clear). This step is not required, but it may prove useful because SAS appends the information from each subsequent run.
II. A sample SAS program using data from an external file.
The main advantage of using an external data file is that the program code is easier to read and debug. The data may also come from other sources and can be used without being typed into the SAS code. Assume the raw data file 'revenue.dat' is saved in "C:\My documents\My SAS documents". It contains the following information:
LA      A100  5000
Chicago B100  3000
Texas   C100  6000
Dallas  D100  4000
1. Enter the following SAS program in the Editor window:
DATA Budget;
   INFILE 'c:\My documents\My SAS documents\revenue.dat';
   INPUT Name $ Dept $ Revenue;
PROC PRINT;
RUN;
2. Submit the job.
3. Check the Output and Log windows.
III. Creating and reading a permanent SAS data set.
All of the above SAS programs create temporary SAS data sets. Temporary data sets are stored in the WORK library and are deleted when the SAS session ends, so you cannot reuse them in later sessions unless you recreate them. However, you can save your SAS data set permanently on your disk. To do so, define a library with a LIBNAME statement and give the data set a two-level name (library.dataset) in the DATA statement. In the above examples, change the DATA statements to read as follows:
LIBNAME dept 'd:\';
DATA dept.budget;
   ...(insert your other statements here)...
PROC PRINT DATA=dept.budget;
RUN;
The above statements associate the library reference (libref) 'dept' with drive D and create a new permanent SAS data file named 'budget' in the library 'dept'. The information in this file can then be used in other SAS programs without having to recreate the entire data set.
* You can also create a new library from the SAS windows with the following steps:
1. Create a new folder named 'dept' on drive D.
2. Click the 'New Library' icon on the toolbar and type the new library reference 'dept' in the Name field. Use the Browse button to find the folder 'dept'. In addition, check the 'Enable at startup' box. Then click OK.
IV. Importing Data from a Spreadsheet.
SAS allows you to import data from a spreadsheet. The following example reads an Excel spreadsheet into a SAS data set.
a. SAS Programs: Assume the Excel data file 'revenue.xls' is saved in "C:\My documents\My SAS documents". The following PROC IMPORT step shows how to import the worksheet 'Sheet1' of revenue.xls and save it in the SAS WORK library under the name 'budget'. Enter the following SAS program in the Editor window:
PROC IMPORT OUT=work.budget
            DATAFILE="c:\My documents\My SAS documents\revenue.xls"
            DBMS=EXCEL2000 REPLACE;
   SHEET="Sheet1";
RUN;
b. SAS Windows:
1. Select the menu command File -> Import Data...
2. In the 'SAS Import Wizard' window, select Microsoft Excel 97, 2000 or 2002 as the data source, and click 'Next'.
3. In the 'Connect to MS' window, click 'Browse' to find the Excel file you wish to import. Then select your Excel worksheet in the next window.
4. Choose a SAS destination: confirm that WORK is selected as the Library, and name your SAS data file.
5. At the next screen (which prompts for a file in which to store the PROC IMPORT statements), click the 'Finish' button.
The imported data can be accessed through the Explorer window by double-clicking Libraries, then Work, then the member name you assigned.
INTRODUCTION TO SAS
The purpose of this handout is to present the following concepts:
* Data modifiers
* Forms of INPUT statements
* Reading Multiple Records Per Observation
Part 1: Commonly Used Modifiers
By default, SAS assumes all data values are numeric. Therefore, if some of the variables in the data set have non-numeric values, you will need to use one or more of the following three modifiers, in whatever combination matches the form of the underlying data stream. (See the section on Formatted Input for examples.)
$ indicates that a variable has character values, with a default length of eight (8) characters and no embedded blanks.
& indicates that a character value may contain one or more single embedded blanks. The first occurrence of two or more consecutive blanks marks the end of the value.
: indicates that the data value is to be read from the next non-blank column until the pointer reaches the next blank column or the end of the data line. Combined with an informat such as $14., this allows the user to read character values longer than eight (8) characters that have no embedded blanks.
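The difference between the & and : modifiers is easiest to see on one data line. The following is a rough Python model, not SAS code. The names read_ampersand_value and read_colon_value are hypothetical, and positions are 0-based Python indexes rather than SAS columns.

```python
from __future__ import annotations


def _skip_blanks(line: str, pos: int) -> int:
    """Advance past blanks starting at 0-based index `pos`."""
    while pos < len(line) and line[pos] == " ":
        pos += 1
    return pos


def read_ampersand_value(line: str, pos: int = 0) -> tuple[str, int]:
    """Model of the '&' modifier: the value may contain single embedded
    blanks and ends at the first run of two or more blanks (or at the end
    of the line). Returns (value, index where reading stopped)."""
    start = _skip_blanks(line, pos)
    end = line.find("  ", start)
    if end == -1:
        end = len(line)
    return line[start:end].rstrip(), end


def read_colon_value(line: str, pos: int = 0) -> tuple[str, int]:
    """Model of the ':' modifier: start at the next non-blank column and
    stop at the next blank (or at the end of the line)."""
    start = _skip_blanks(line, pos)
    end = line.find(" ", start)
    if end == -1:
        end = len(line)
    return line[start:end], end


if __name__ == "__main__":
    data = "North Carolina  Pins 5.082"
    state, pos = read_ampersand_value(data)     # 'North Carolina'
    product, pos = read_colon_value(data, pos)  # 'Pins'
    pop, pos = read_colon_value(data, pos)      # '5.082'
    print(state, "|", product, "|", pop)
```

Note that the two blanks after "North Carolina" are what end the & value; with a single blank, "Pins" would become part of the state name.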
Part 2: Forms Of INPUT Statement
List Input
Use the list input mode to read data recorded with at least one blank space separating each data field. Missing values are represented as a dot (period).
Form: INPUT variable-list <modifiers>;
Example:
DATA Census;
   INPUT State $ Pop;
   CARDS;
NC 5.082
SC 2.590
VA .
;
RUN;
Column Input
Use column input mode to read the following types of data:
o Standard character and numeric data
o Data values that are entered in fixed column positions
o Character values longer than eight characters
o Character values that contain embedded blanks
Form: INPUT variable <modifier> startcol-endcol;
Example:
DATA Census;
   INPUT State $ 1-2 Pop 3-7;
   CARDS;
NC5.082
SC .590
VA .
;
RUN;
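Column positions in SAS are 1-based and inclusive, so columns 3-7 correspond to the Python slice line[2:7]. The sketch below is a Python model of column input, not SAS code; read_columns and CENSUS_LAYOUT are hypothetical names, and the layout mirrors INPUT State $ 1-2 Pop 3-7; from the example above.

```python
from __future__ import annotations


def read_columns(line: str, layout: list[tuple[str, int, int, bool]]) -> dict:
    """Model of column input: each variable occupies fixed columns.

    `layout` entries are (name, startcol, endcol, is_character) with
    1-based, inclusive columns, so columns 3-7 become line[2:7].
    Blank fields, and numeric fields holding only '.', are missing (None).
    """
    record = {}
    for name, start, end, is_char in layout:
        raw = line[start - 1:end].strip()
        if raw == "" or (not is_char and raw == "."):
            record[name] = None
        else:
            record[name] = raw if is_char else float(raw)
    return record


CENSUS_LAYOUT = [("State", 1, 2, True), ("Pop", 3, 7, False)]

if __name__ == "__main__":
    for line in ["NC5.082", "SC .590", "VA ."]:
        print(read_columns(line, CENSUS_LAYOUT))
```

Because each variable is located by position, no blank is needed between NC and 5.082, which list input could not handle.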
Formatted Input
Use formatted input mode to read the following:
• Data in fixed column positions (column input is also a viable choice)
• Nonstandard numeric and character data
• Data whose location is determined by other data values
Form: INPUT pointer-control variable <modifiers> informat;
Pointer Controls:
@n    go to column n
+n    move the pointer n positions
#n    advance to the first column of the nth record
@     hold the current input line for another INPUT statement in the same iteration (for example, to re-read certain variables)
@@    hold the current input line across iterations; useful when each input line contains values for several observations
Informats:
w.    numeric width (also advances the pointer)
w.d   numeric width with d implied decimal places
$w.   character width
Example:
DATA Census;
   INPUT State & $14. Product : $8. Pop @@;
   CARDS;
North Carolina  Pins 5.082  South Carolina  Needles 0.590  Virginia  Cushions .
;
RUN;
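The w.d informat in the list above is the least obvious one: the d does not require a decimal point in the data. Below is a Python model of that rule (not SAS code). The function name read_w_d is hypothetical, and the sketch assumes the field has already been cut to width w and contains plain digits, not exponent notation.

```python
from __future__ import annotations

from typing import Optional


def read_w_d(field: str, d: int) -> Optional[float]:
    """Model of the w.d numeric informat for one field already cut to
    width w. If the field has no decimal point, its last d digits are
    treated as decimals; a decimal point typed in the data overrides d.
    Blank or '.' fields are missing (None)."""
    text = field.strip()
    if text in ("", "."):
        return None
    value = float(text)
    if "." not in text:
        value /= 10 ** d      # implied decimal places
    return value


if __name__ == "__main__":
    print(read_w_d("5082", 3))    # implied decimal: 5.082
    print(read_w_d("5.082", 1))   # explicit decimal point wins: 5.082
```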
Note: For further details on informats, refer to SAS Language Reference, Version 6.0.
Named Input
Use named input to read data lines containing variable names followed by an equal sign and a value for the variable.
Form: INPUT variable=informat.;
   -or- INPUT variable=modifier;
   -or- INPUT variable=startcol-endcol;
Example:
DATA Census;
   INPUT State =:$14. Pop=;
   CARDS;
STATE=North Carolina Pop=5.082
STATE=South Carolina Pop=0.590
STATE=Virginia Pop=.
;
General Notes on INPUT
All forms of input, except the named input, can be used in any combination. Once you start reading data using the named input all other variables must be read using the named input.
Part 3: Reading Multiple Records Per Observation
There are three techniques for reading multiple records of data to create one observation in a SAS data set:
1. Prepare one INPUT statement for every record. That is, if there are three records per observation, you would have three INPUT statements.
2. Use a slash (/) in a single INPUT statement to indicate that the next record is to be placed into the input buffer.
3. Use #n to advance to the first column of the nth record.
Example: Assume that the data consist of two records per observation. The variables on the second record are Y30-Y50. The following examples are equivalent.
1. INPUT State $ 1-2 Area 4-8 (Y10-Y20) (5.);
   INPUT (Y30-Y50) (5.);
2. INPUT State $ 1-2 Area 4-8 (Y10-Y20) (5.) / (Y30-Y50) (5.);
3. INPUT #1 State $ 1-2 Area 4-8 (Y10-Y20) (5.) #2 (Y30-Y50) (5.);
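All three techniques do the same thing: they treat a fixed number of consecutive data lines as one observation. A minimal Python model of that grouping (not SAS code) follows. The function group_records is hypothetical, and unlike SAS it simply raises an error when the last observation is incomplete.

```python
from __future__ import annotations


def group_records(lines: list[str], records_per_obs: int) -> list[tuple[str, ...]]:
    """Model of reading several records per observation (separate INPUT
    statements, '/', or #n): every `records_per_obs` consecutive data
    lines form one observation."""
    if records_per_obs < 1:
        raise ValueError("records_per_obs must be at least 1")
    if len(lines) % records_per_obs:
        raise ValueError("the last observation is incomplete")
    return [
        tuple(lines[i:i + records_per_obs])
        for i in range(0, len(lines), records_per_obs)
    ]


if __name__ == "__main__":
    data = ["obs 1, record 1", "obs 1, record 2",
            "obs 2, record 1", "obs 2, record 2"]
    for obs in group_records(data, 2):
        print(obs)
```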
INTRODUCTION TO SAS
This handout presents several optional SAS program statements that may be used to further define your data set. Special attention is given to statements that allow the user to create new variables, modify the values of existing variables, and control the manipulation of individual observations in the data set.
Part 1: Assignment Statements
Assignment statements are used to create new variables and to modify the values of existing variables. These statements must be used in a DATA step. If you need to create or modify variables for use in a procedure step, you must first add a new DATA step to the SAS program.
Form: variable = expression;
An expression is a sequence of variable names, constants, and possibly function names linked together by operators. When an assignment statement is executed, the expression is evaluated and its result is assigned to the variable.
Example 1: To create a new variable
DATA Census90;
   INPUT State $ Pop Area;
   PPrcnt = Pop * 100 / Area;
   CARDS;
NC 5.062 48985
;
Example 2: To modify an existing variable
DATA Census90;
   INPUT State $ Pop Area;
   Pop = Pop * 100 / Area;
   CARDS;
NC 5.062 48985
;
Result: Example 1 creates a new variable called PPRCNT representing population density. The data set has four variables: State, Pop, Area, and PPrcnt. Example 2 computes the same quantity (density) but replaces the original value of the population variable. The data set in the second example contains only three variables: State, Pop, and Area, and Pop now holds the modified values. The original POP data are no longer accessible in the data set.
Part 2: Arithmetic Operators
The arithmetic operators recognized by SAS are listed below from highest to lowest priority. Multiplication and division share the same priority, as do addition and subtraction; operators of equal priority are evaluated from left to right. The user can change the order of evaluation by enclosing part of an expression in parentheses.
Operator   Meaning          Example
**         Exponentiation   A = B**2
*          Multiplication   A = B*C
/          Division         A = B/C
+          Addition         A = B+C
-          Subtraction      A = B-C
Example: An expression that performs a multiplication before an exponentiation: A = B ** (2*C)
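Python uses the same priorities for these five operators, so the effect of the table above can be checked quickly in Python. This is not SAS code, and the values of B and C are made up for illustration.

```python
# Illustrative values only.
B, C = 3, 2

a1 = B ** 2 * C      # ** first: (3 ** 2) * 2 = 18
a2 = B ** (2 * C)    # parentheses force the multiplication first: 3 ** 4 = 81
a3 = 12 / 3 * 2      # * and / share a priority, left to right: (12 / 3) * 2 = 8.0
a4 = 10 - 4 + 1      # + and - share a priority, left to right: (10 - 4) + 1 = 7

print(a1, a2, a3, a4)   # 18 81 8.0 7
```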
Part 3: SAS Functions
Functions can also appear in an expression. A SAS function is an instruction to perform a given series of calculations on an expression. A large library of functions is available in several categories, including mathematics, statistics, character, and date and time.
Form: keyword(argument1, argument2, ...);
where keyword names the function, and the arguments (or expressions) are enclosed in parentheses and separated by commas.
Example: ANS = 2 * LOG(X+Y);
Part 4: IF/THEN/ELSE Statements
Conditional execution of DATA step program statements is implemented using the IF/THEN/ELSE statements.
Form:
IF expression THEN statement;
ELSE statement;
Observe that IF/THEN and ELSE are two separate SAS statements. Each time the IF statement is executed, the expression following the IF is evaluated. When the expression is true for the observation, the statement following the THEN is executed. The ELSE statement, which is optional, specifies the action to take when the IF condition is false. If no ELSE statement is specified, no action is taken when the expression is false.
Comparison Operators
The following comparison operators can be used to express a relationship between two quantities in the expression following the IF.
= or EQ     equal to
^= or NE    not equal to
> or GT     greater than
< or LT     less than
>= or GE    greater than or equal to
<= or LE    less than or equal to
^< or NL    not less than
^> or NG    not greater than
Note: In the DATA step, SAS interprets <> as the MAX operator rather than "not equal to"; use ^= or NE to test for inequality.
Example:
DATA Census90;
   INPUT State $ Pop Area;
   IF State = 'NC' THEN Pop = Pop * 1000;
Result: The POP values are modified only for the state of NC.
Logical Operators
Expressions containing comparison operators often include logical operators. The following logical operators are available:
OR          Execute the statement if either comparison is true.
& or AND    Execute the statement only if both comparisons are true.
^ or NOT    If the comparison is false, the result of the logical operator is true, and vice versa.
Example:
DATA Census90;
   INPUT State $ Pop Area;
   IF State = 'NC' OR State = 'SC' THEN Pop = Pop * 1000;
   ELSE Pop = Pop * 10000;
Result: POP values are multiplied by 1000 for the states of NC and SC. All other states have their POP values multiplied by 10000.
IN Operator
The IN operator makes it easy to compare a value to a list of items. For instance, the Logical Operators example can be rewritten as follows:
Example:
DATA Census90;
   INPUT State $ Pop Area;
   IF State IN ('NC', 'SC') THEN Pop = Pop * 1000;
   ELSE Pop = Pop * 10000;
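The IF/THEN/ELSE logic with IN maps directly onto a Python if/else with a tuple membership test. The sketch below is a Python counterpart, not SAS code. The function name scale_pop and the sample values are hypothetical, and SAS missing values are not modeled.

```python
def scale_pop(state: str, pop: float) -> float:
    """Python counterpart of
           IF State IN ('NC', 'SC') THEN Pop = Pop * 1000;
           ELSE Pop = Pop * 10000;
    As with SAS character comparisons, the test is case-sensitive."""
    if state in ("NC", "SC"):   # IN: true if State equals any list item
        return pop * 1000
    return pop * 10000          # ELSE branch


if __name__ == "__main__":
    for state, pop in [("NC", 5.0), ("SC", 2.0), ("VA", 1.0)]:   # illustrative values
        print(state, scale_pop(state, pop))
```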
Part 5: The Subsetting IF And IF/THEN DELETE Statements
These statements control which observations are written to a SAS data set. A subsetting IF statement controls which observations are included in the data set; an IF/THEN DELETE statement controls which observations are deleted from it.
Form:
The general form for a subsetting IF is: IF expression;
The general form for an IF/THEN DELETE statement is: IF expression THEN DELETE;
Example 1: Subsetting IF
DATA Se;
   INPUT State $ Pop Area Region $;
   IF Region = 'SE';
Example 2: IF/THEN DELETE
DATA NotSe;
   INPUT State $ Pop Area Region $;
   IF Region = 'SE' THEN DELETE;
Result: Example 1 selects only those states in the SE region. Example 2 selects only those states that are NOT in the SE region; that is, all observations with a region code of SE are deleted.
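The two statements are mirror images: the subsetting IF keeps the rows for which the test is true, and IF/THEN DELETE drops exactly those rows. A short Python illustration (not SAS code) using hypothetical observations:

```python
# Hypothetical observations used only for illustration.
rows = [
    {"State": "NC", "Region": "SE"},
    {"State": "NY", "Region": "NE"},
    {"State": "FL", "Region": "SE"},
]

# DATA Se;    ... IF Region = 'SE';             -> keep rows where the test is true
se = [r for r in rows if r["Region"] == "SE"]

# DATA NotSe; ... IF Region = 'SE' THEN DELETE; -> drop rows where the test is true
not_se = [r for r in rows if not (r["Region"] == "SE")]

print([r["State"] for r in se])      # ['NC', 'FL']
print([r["State"] for r in not_se])  # ['NY']
```

Together the two results account for every input row exactly once.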
Part 6: WHERE Processing
The WHERE statement is used to select a subset of observations from an existing SAS data set that satisfy one or more conditions. WHERE processing is very similar to IF statement processing.
Form: WHERE expression;
The expression is as defined in Parts 4 and 5. However, several special WHERE operators may also be used in the expression. These are listed and explained below.
Special WHERE Operators
The five new operators are:
BETWEEN-AND
CONTAINS or ?
LIKE
IS NULL or IS MISSING
=*
The following data set is used to illustrate the form and uses of the WHERE operators.
Obs  ID      NAME     MILES  CITY
1    WD2327  Masters  60000  Dallus
2    WD8734  Morris   27000  Boston
3    WD6743  Ashton   32000  Dallas
4    .       Cannon   18000  Seattle
5    WD0354  Cash     75000  Chicago
BETWEEN-AND
Allows the user to select observations based on a range of variable values.
Example: WHERE Miles BETWEEN 30000 AND 60000;
Result: Only observations 1 and 3 are selected (the range includes both endpoints, so 60000 qualifies).
CONTAINS or ?
Selects only those observations whose value contains a specified character string.
Example: WHERE UPCASE(Name) CONTAINS 'ASH';
Result: Only observations 3 and 5 are selected. (CONTAINS is case-sensitive, so UPCASE is used to match both 'Ashton' and 'Cash'.)
LIKE
Selects only those observations whose value matches a pattern. In the pattern, % matches any number of characters (including none) and _ matches exactly one character. Like CONTAINS, LIKE is case-sensitive.
Example: WHERE Name LIKE 'M%';
Result: Only observations 1 and 2 are selected.
Example: WHERE Name LIKE 'M_r%';
Result: Only observation 2 is selected.
IS NULL or IS MISSING
Selects observations for which the value of the variable is missing (null).
Example: WHERE ID IS MISSING;
Result: Only observation 4 is selected.
=* (Sounds-Like Operator)
Selects only those observations that contain a spelling variation of the word or words specified in the WHERE expression.
Example: WHERE City =* 'DALLAS';
Result: Observations 1 and 3 are selected.
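The LIKE wildcards can be modeled by translating % and _ into a regular expression. The following is a Python sketch (not SAS code) applied to the NAME values of the sample data set. The helpers like and selected are hypothetical. Because the match is case-sensitive, 'M_R%' selects nothing, while 'M_r%' selects observation 2.

```python
from __future__ import annotations

import re

# NAME values from the sample data set above, in observation order.
NAMES = ["Masters", "Morris", "Ashton", "Cannon", "Cash"]


def like(value: str, pattern: str) -> bool:
    """Model of WHERE ... LIKE: '%' matches any run of characters
    (including none), '_' matches exactly one character, and every
    other character must match literally (case-sensitive)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def selected(test) -> list[int]:
    """Return the 1-based observation numbers whose NAME passes `test`."""
    return [i for i, name in enumerate(NAMES, start=1) if test(name)]


if __name__ == "__main__":
    print(selected(lambda n: like(n, "M%")))       # [1, 2]
    print(selected(lambda n: like(n, "M_R%")))     # [] (case matters)
    print(selected(lambda n: like(n, "M_r%")))     # [2]
    print(selected(lambda n: "ASH" in n.upper()))  # [3, 5] (case-insensitive contains)
```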
INTRODUCTION TO SAS
This handout discusses the more frequently used data management techniques for working with and reshaping SAS data sets. Specifically, the following points are discussed:
* Conditional execution of groups of statements
* Variable selection and deletion
* Sorting SAS data sets
* Concatenation, interleaving, and merging of data sets
Part 1: The DO And END Statements
The DO and END statements have two purposes:
a) to execute several statements when certain conditions are met, and
b) to execute certain statements repeatedly using an index variable.
Form:
The general form for purpose (a) is:
IF expression THEN DO;
   << other SAS statements >>
END;
The general form for purpose (b) is:
DO index-variable = start TO stop BY increment;
   << other SAS statements >>
END;
Note: The BY increment phrase is optional; the default increment is 1.
Example 1:
IF State = 'NC' THEN DO;
   Adjpop = Pop / Area;
   Name = 'North Carolina';
END;
Example 2:
DO i = 1 TO 50;
   y = sqrt(x) + (i * 10);
   OUTPUT;
END;
Result: Example 1 assigns values to ADJPOP and NAME only when the value of STATE is NC for the current observation. Example 2 writes 50 observations each time it executes, one for each value of I, each with its own value of Y.
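The iterative DO with an OUTPUT statement inside it behaves like a Python for loop that appends one row per pass. Here is a Python counterpart of Example 2 (not SAS code). The function name do_loop_output and the value x = 4 are illustrative assumptions.

```python
from __future__ import annotations

import math


def do_loop_output(x: float, stop: int = 50) -> list[dict]:
    """Python counterpart of
           DO i = 1 TO 50;  y = sqrt(x) + (i * 10);  OUTPUT;  END;
    for one value of x. Each OUTPUT writes one observation, so the
    loop produces `stop` rows."""
    rows = []
    for i in range(1, stop + 1):               # DO i = 1 TO stop (BY 1 is the default)
        y = math.sqrt(x) + (i * 10)
        rows.append({"i": i, "x": x, "y": y})  # OUTPUT;
    return rows


if __name__ == "__main__":
    rows = do_loop_output(4.0)   # x = 4 is an illustrative value
    print(len(rows), rows[0], rows[-1])
```

Without the OUTPUT statement inside the loop, only the values from the last pass would be written when the DATA step iteration ends.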
Part 2: The DROP And KEEP Statements
The purpose of these statements is to indicate which variables are to be included in the SAS data set being created. The DROP statement identifies variables to be excluded from the data set; alternatively, the KEEP statement identifies variables to be included.
Form: DROP variable-list;
   -or- KEEP variable-list;
Notes: The DROP and KEEP statements are used in the DATA step, where they control which variables are stored in the data set being created. Use one or the other, not both, in the same step. In a PROC step, the equivalent tools are the DROP= and KEEP= data set options, for example PROC PRINT DATA=Census(KEEP=State PopSqm);.
Example 1: The DROP Statement
DATA Census;
   INPUT State $ Pop Area;
   PopSqm = Pop * 1000 / Area;
   DROP Pop Area;
Example 2: The KEEP Statement
DATA Hischool College;
   INPUT Name $ Age Yred;
   IF Yred LE 12 THEN OUTPUT Hischool;
   ELSE OUTPUT College;
   KEEP Name Age;
Result: In Example 1, four variables are created, but the variables POP and AREA are dropped; the data set retains only the variables STATE and POPSQM. In Example 2, two data sets are created simultaneously. Three variables are read, but each data set retains only NAME and AGE; YRED is dropped.
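Example 2 combines two ideas: OUTPUT sends each observation to one of two data sets, and KEEP limits which variables are stored. A Python counterpart (not SAS code) follows. The function name split_by_schooling and the two input people are hypothetical.

```python
from __future__ import annotations


def split_by_schooling(people: list[dict]) -> tuple[list[dict], list[dict]]:
    """Python counterpart of the Example 2 DATA step:
           IF Yred LE 12 THEN OUTPUT Hischool; ELSE OUTPUT College;
           KEEP Name Age;
    KEEP is modeled by copying only Name and Age into each output row;
    Yred is used for the test but not stored."""
    keep = ("Name", "Age")
    hischool: list[dict] = []
    college: list[dict] = []
    for person in people:
        row = {var: person[var] for var in keep}
        (hischool if person["Yred"] <= 12 else college).append(row)
    return hischool, college


if __name__ == "__main__":
    # Hypothetical input observations.
    people = [
        {"Name": "Ann", "Age": 30, "Yred": 12},
        {"Name": "Bob", "Age": 25, "Yred": 16},
    ]
    print(split_by_schooling(people))
```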
Part 3: Sample Data Set For The SET And MERGE Statements
The following data sets are defined for the discussion of the SET and MERGE statements.
DATA CensusA;
   INPUT State $ Pop;
   CARDS;
NC 5.08
FL 6.78
TN 3.92
;
DATA CensusB;
   INPUT State $ Area;
   CARDS;
FL 58560
TN 42244
;
RUN;
Remarks: Both data sets have two variables, only one of which (STATE) is common to both. CensusA has POPulation data for three states, whereas CensusB has AREA data for only two of the three states.
Goal: The programmer's goal is to join these two data sets into one in order to perform statistical analysis on the combined data. As shown in the examples below, the programmer should use match-merging. However, before we discuss match-merging, it is helpful to see the different forms of SET and MERGE and why they should not be used for this goal.
Part 4: The SET Statement
The SET statement instructs SAS to read data from one or more SAS data sets. Recall: SAS data sets are not ASCII data sets. With the SET statement, the programmer can perform any of the following data set manipulations:
* Reusing previously created data
* Concatenation
* Interleaving
The form of the SET statement follows. Form: SET sasdataset1 sasdataset2 ...;
Examples Of SET
Example 1: Reusing Data
In a previous handout we discussed how to create SAS data sets. The question now is, "How does one reuse these SAS data sets at a later stage?" The answer is simple: use the SET statement.
Form:
DATA Reuse;
   SET CensusA;        * temporary data set is CENSUSA;
-or-
LIBNAME sasd 'D:\';
DATA Reuse;
   SET sasd.censusa;   * permanent data set is censusa.sas7bdat on drive D;
Example 2: Concatenation
Two or more data sets are said to be concatenated when they are stacked together to form one large data set.
Form:
DATA Both;
   SET CensusA CensusB;
Result:
State  Pop   Area
NC     5.08  .
FL     6.78  .
TN     3.92  .
FL     .     58560
TN     .     42244
Remark: All observations from CensusA are stacked before the observations from CensusB. Use this method when reading data for the same set of variables but different observations.
Example 3: Interleaving
If a BY statement follows the SET statement, the resulting data set is interleaved by the values of the BY variable.
Note: The data sets to be interleaved must already be sorted by the variable(s) listed in the BY statement. CensusA is not in STATE order, so sort it first (PROC SORT DATA=CensusA; BY State; RUN;).
Form:
DATA Both;
   SET CensusA CensusB;
   BY State;
Result:
State  Pop   Area
FL     6.78  .
FL     .     58560
NC     5.08  .
TN     3.92  .
TN     .     42244
Remarks: The data set is sorted in ascending order by the values of STATE. Use this method when reading data for the same set of variables but different observations and when the resulting data set needs to be in sorted order.
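Concatenation and interleaving differ only in the order in which rows are taken from the input data sets. The following Python model (not SAS code) uses the CensusA and CensusB values defined above. The helper names are hypothetical, and None stands in for a SAS missing value. heapq.merge is used for interleaving because, like a BY statement, it expects already-sorted inputs and keeps rows with equal keys in data set order.

```python
from __future__ import annotations

import heapq

# The sample data sets CensusA and CensusB defined above.
CENSUS_A = [{"State": "NC", "Pop": 5.08}, {"State": "FL", "Pop": 6.78},
            {"State": "TN", "Pop": 3.92}]
CENSUS_B = [{"State": "FL", "Area": 58560}, {"State": "TN", "Area": 42244}]
VARIABLES = ["State", "Pop", "Area"]


def _fill(row: dict) -> dict:
    """Give every output row all variables; absent ones are missing (None)."""
    return {var: row.get(var) for var in VARIABLES}


def concatenate(*datasets: list[dict]) -> list[dict]:
    """SET a b; : all rows of a, then all rows of b."""
    return [_fill(row) for ds in datasets for row in ds]


def interleave(*datasets: list[dict], by: str = "State") -> list[dict]:
    """SET a b; BY var; : each input must already be sorted by `by`.
    heapq.merge keeps rows with equal keys in data set order."""
    return [_fill(row) for row in heapq.merge(*datasets, key=lambda r: r[by])]


if __name__ == "__main__":
    for row in concatenate(CENSUS_A, CENSUS_B):
        print(row)
    print()
    sorted_a = sorted(CENSUS_A, key=lambda r: r["State"])   # like PROC SORT ...; BY State;
    for row in interleave(sorted_a, CENSUS_B):
        print(row)
```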
Part 5: The MERGE Statement
The matching of observations across two or more data sets is referred to as merging. There are two types of merge operations:
* One-to-One Merging
* Match Merging
Examples Of MERGE
Example 1: One-to-One Merging
In a one-to-one merge, if the two data sets have a common variable with different values in each data set, the value read last (from the data set listed last in the MERGE statement) is the one that appears in the new data set. The number of variables in the new data set equals the number of distinct variables across the data sets being merged. The number of observations equals the number of observations in the largest data set.
Form:
DATA Both;
   MERGE CensusA CensusB;
Result:
State  Pop   Area
FL     5.08  58560
TN     6.78  42244
TN     3.92  .
Remarks: The new data set has 3 variables, since STATE is common to both data sets, and 3 observations, since CensusA has three observations. It has matched NC's population with FL's area and FL's population with TN's area. It has no observation for the state of NC but two observations for TN. In short, this usage of the MERGE statement is faulty for our goal. This method should be used only when the original data sets have different variables and preferably the same number of observations.
Example 2: Match Merging
Match merging controls which observations are matched. It requires that at least one variable be common to each data set and that the data sets be sorted by the common variable.
Form:
PROC SORT DATA=CensusA;
   BY State;
RUN;
DATA Both;
   MERGE CensusA CensusB;
   BY State;
RUN;
Result:
State  Pop   Area
FL     6.78  58560
NC     5.08  .
TN     3.92  42244
Remarks: The new data set has 3 variables and 3 observations. Because the BY statement was used, the observations are properly aligned by state.
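The difference between the two merges is whether rows are paired by position or by the BY value. This Python model (not SAS code) reproduces both result tables above from the CensusA and CensusB values. The function names are hypothetical. match_merge is simplified to data sets with at most one row per BY value, and it sorts the keys itself, whereas SAS requires sorted input.

```python
from __future__ import annotations

# The sample data sets CensusA and CensusB defined above.
CENSUS_A = [{"State": "NC", "Pop": 5.08}, {"State": "FL", "Pop": 6.78},
            {"State": "TN", "Pop": 3.92}]
CENSUS_B = [{"State": "FL", "Area": 58560}, {"State": "TN", "Area": 42244}]
VARIABLES = ["State", "Pop", "Area"]


def one_to_one_merge(a: list[dict], b: list[dict]) -> list[dict]:
    """MERGE a b; (no BY): the i-th row of a is joined with the i-th row
    of b. Values from b overwrite shared variables such as State; once a
    data set runs out, its variables are missing (None)."""
    merged = []
    for i in range(max(len(a), len(b))):
        row = dict.fromkeys(VARIABLES)
        if i < len(a):
            row.update(a[i])
        if i < len(b):
            row.update(b[i])
        merged.append(row)
    return merged


def match_merge(a: list[dict], b: list[dict], by: str = "State") -> list[dict]:
    """MERGE a b; BY by; (simplified): rows with the same BY value are
    combined; unmatched rows keep missing values for the other data set."""
    a_rows = {row[by]: row for row in a}
    b_rows = {row[by]: row for row in b}
    merged = []
    for key in sorted(a_rows.keys() | b_rows.keys()):
        row = dict.fromkeys(VARIABLES)
        row.update(a_rows.get(key, {}))
        row.update(b_rows.get(key, {}))
        merged.append(row)
    return merged


if __name__ == "__main__":
    print(one_to_one_merge(CENSUS_A, CENSUS_B))
    print(match_merge(CENSUS_A, CENSUS_B))
```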
Part 6: Creating ASCII Files From A SAS Data Set
Often it is necessary to export data from a SAS data set to an ASCII file that can be transported to other software systems. This is achieved using the FILE and PUT statements.
The FILE Statement
Purpose: Directs SAS to create an external data file.
Form: FILE 'path_and_file_name';
   or
   FILENAME fileref 'path_and_file_name'; *before the DATA statement;
   FILE fileref;
Keyword: FILE
The PUT Statement
Purpose: Describes to SAS the format of the ASCII data.
Form: PUT variable_names variable_formats;
Keyword: PUT
Naming rules: Limited to 32 characters. The first character must be a letter or an underscore '_'. The entire variable name may contain only letters, numbers, or underscores.
As a general rule, all variables in SAS are assumed to be numeric. If a variable has character values, you must place a $ sign after the variable's name in the PUT statement in LIST output. The default missing data value is a dot (period).
Two additional pointer controls available in the PUT statement are:
_PAGE_      prints the current page and begins a new page.
OVERPRINT   causes the current line to be printed over the previous line. This option is useful for underlining text by overprinting underscores.
Example: Observe that the FILE and PUT statements use the same syntax as INFILE and INPUT. For example, suppose you wish to create an ASCII data set from the BUDGET data set defined in handout 1. You would write the following SAS program:
DATA _NULL_;
   SET Budget;                  /* bring in budget */
   FILE 'a:\revenue.out';       /* create file revenue.out on drive A */
   PUT Name $ Dept $ Revenue;   /* write values of the variables */
RUN;
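List output is the reverse of list input: values are written one after another with a blank between them. The Python model below (not SAS code) shows the shape of the lines written to revenue.out. The function name put_list is hypothetical. The sketch prints to standard output instead of a file, and it writes whole-number values without a decimal part, as SAS does for values such as 5000.

```python
from __future__ import annotations

from typing import Optional, Union


def put_list(values: list[Optional[Union[str, float]]]) -> str:
    """Model of list output (PUT var1 var2 ...;): values are written one
    after another separated by a blank, and a missing numeric value
    (None here) is written as '.'."""
    def fmt(value):
        if value is None:
            return "."
        if isinstance(value, float) and value.is_integer():
            return str(int(value))
        return str(value)
    return " ".join(fmt(v) for v in values)


if __name__ == "__main__":
    budget = [("LA", "A100", 5000.0), ("Chicago", "B100", 3000.0)]
    # Printed here; in SAS the FILE statement directs these lines to a file.
    for obs in budget:
        print(put_list(list(obs)))
```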