
Proc Compare
Stat 480
James D. Abbey
January 19, 2006

1. Enter your SAS data sets for comparison. Ideally, the data sets will match entirely. In this exercise, data1 will be correct while data2 will have flaws.
   a. SAS Code for Data Entry:
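A minimal sketch of a data step of this kind, written with the datalines; alternative so that it runs without an external file. The three data rows are hypothetical placeholders, not the handout's sample data. To read the external file instead, an infile 'C:\Temp\data1.txt'; statement would take the place of the datalines; block.

```sas
/* Hypothetical values for illustration only. */
data data1;
    /* Column input: var1 is read from column 1, var2 from column 2. */
    input var1 1 var2 2;
    datalines;
12
34
56
;
run;

/* Print the data set to check that it was read as intended. */
proc print data=data1;
run;
```

Because column input reads fixed positions, the data lines must start in column 1 with no leading spaces.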

      i. Data data1; specifies that you are entering data into SAS and names the data set data1.
      ii. infile 'C:\Temp\data1.txt'; points to an external file on your hard drive (the path must be in quotes). You could also have used a datalines; statement to enter the data directly into SAS.
      iii. Input var1 1 var2 2; specifies which variables to input. The number following each variable name is a column position, which tells SAS to look for that variable's data in column 1, 2, etc.
         1. Other input options are space-, tab-, or comma-delimited data, etc.
            a. Space-delimited data requires no special syntax. Tab-delimited data generally needs an infile option such as dlm='09'x or expandtabs. Use of a comma will be covered later.
      iv. Run; tells SAS to execute the commands above.
   b. A view of our sample data:
      i. Data1: Data2:

      ii. As you can see, the errors occur in the following observations (lines):
         1. Observation 2 swaps the second and third columns
         2. Observation 3 swaps the first and second columns
         3. Observation 5 swaps the third and fourth columns
         4. Observation 7 is missing the first piece of data

2. Using PROC COMPARE
   a. SAS syntax:
      i. PROC COMPARE can compare data sets in many ways. In our example, we will compare them element-wise.
      ii. base = data1 compare = data2 informs SAS that we are comparing data2 against data1, the base data set.
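A minimal sketch of the PROC COMPARE call described here. The two small data sets are hypothetical stand-ins (two variables, three observations) rather than the handout's sample data; observation 2 of data2 has its values swapped to imitate the kind of entry error discussed above.

```sas
/* Hypothetical reference data set. */
data data1;
    input var1 var2;
    datalines;
1 2
3 4
5 6
;
run;

/* Hypothetical flawed copy: observation 2 has its two values swapped. */
data data2;
    input var1 var2;
    datalines;
1 2
4 3
5 6
;
run;

/* Compare data2 against the base data set data1.
   With no ID statement, observations are matched by position. */
proc compare base=data1 compare=data2;
run;
```

With these inputs, the report should flag unequal values for both variables in observation 2 and no differences elsewhere.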

   b. SAS output:
      i. First, we receive a variable error output summary:

      ii. Second, we receive a more detailed breakdown by variable and observation:

         1. Example of analysis: Observation 2 shows an error in the entry of variable 2. Upon analysis of the data sets, I found that variables two and three were swapped in the data2 file.
         2. Note: Often, you will not be able to determine which data set is flawed unless you revisit the raw data.
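When the printed report is long, one way to carry out this kind of observation-by-observation analysis is to save only the mismatched observations to a data set and inspect both versions side by side. The sketch below is an assumption about how one might do that, not part of the handout's example: the data values and the output data set name diffs are hypothetical, while out=, outnoequal, outbase, outcomp, and noprint are standard PROC COMPARE options.

```sas
/* Hypothetical reference data set. */
data data1;
    input var1 var2;
    datalines;
1 2
3 4
5 6
;
run;

/* Hypothetical flawed copy: observation 2 is swapped and
   observation 3 is missing its first value (read as a period). */
data data2;
    input var1 var2;
    datalines;
1 2
4 3
. 6
;
run;

/* noprint suppresses the printed report; out= saves observations instead.
   outnoequal keeps only observations with at least one unequal value;
   outbase and outcomp write the data1 and data2 versions of each one. */
proc compare base=data1 compare=data2
             out=diffs outnoequal outbase outcomp noprint;
run;

/* _TYPE_ marks each row as BASE or COMPARE; _OBS_ gives the observation number. */
proc print data=diffs;
run;
```

The saved rows show that the two data sets disagree, but deciding which one is correct still requires going back to the raw data.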

3. Summary SAS Code:
   a. Used in the above example:

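   b. Provided in class:
         data duckworth;
            /* Read the raw data from an external file, so no datalines; block is needed. */
            infile 'My Documents\hw2S06.txt';
            /* Column input: each variable is read from the listed column range. */
            input id 1-3 home 4-8 drink 9 gov 10 binge 11 hours 12-13 sex 14 age 15-16 height 17-19 live 20;
         run;
         proc print;
         run;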