BASIC
Version 11/12/2004
I was born in Hiroshima, Japan, went to college at Doshisha University in Kyoto, and worked for a few
years as a high school teacher in Osaka. In 2000 I received my doctorate in Sociology from the University of
Chicago. I learned SAS while working on a research project led by Charles Bidwell and Anthony Bryk,
both Sociology professors at the U of C. Currently I am a research analyst at a non-profit research
organization in Washington, DC. From my window, I can see the Washington Monument.
Table of Contents
I. Basic Operations
1. Ask questions to SAS by emailing support@sas.com
2. How do I start and what mini-windows do I look at?
3. How do I look at data sets?
4. Assigning library names and creating folders
5. How do we create SAS data?
A) Create a SAS data set using SAS syntax
B) Create a SAS data set via MS Excel sheets
C) Create a SAS data set via an external text file
6. Examples of data steps
7. Manipulating variables in data steps
8. Lots of manipulation techniques to be used in a data step
9. Using character functions to create new variables
10. Application: How do we restrict analytical samples using the NMISS function
II. Procedures
11. PROC CONTENTS: Description of Contents
12. PROC PRINT: See Data
13. PROC SORT: Sorting observations based on the value of a variable
14. PROC MEANS: Get Descriptive Statistics (Mean, STD, Min, Max)
Kaz SAS 3
15. PROC FREQ: Get Frequencies
16. PROC UNIVARIATE: Get elaborate statistics and a univariate plot
17. PROC PLOT: Plotting Two Variables
18. PROC TIMEPLOT: Time Plot
19. PROC CORR: Correlation
20. PROC REG: OLS Regression
21. PROC LOGISTIC: Logistic Regression
22. MAKE AN ASCII FILE
III. More Procedures
23. PROC STANDARD: Standardize Values
24. PROC RANK: Rank observations
25. PROC SQL: Creating group-level mean variables
26. PROC IMPORT
IV. Merging Data Sets
V. MACROs
27. Most common way of using a macro
28. Simple macro using the %LET statement
29. Macro can be specified from data (not directly by you)
I. Basic Operations
1. Ask questions to SAS by emailing support@sas.com
When you have a question about SAS, you can email SAS Institute's technical support team at
support@sas.com. At the beginning of your email, paste the information that appears at the top of
your log file. The log is the record SAS produces when you run a program. It looks like this:
NOTE: Copyright (c) 1999-2001 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 8.2 (TS2M0)
Licensed to UNIVERSITY OF XXXXX, Site XXXXX.
NOTE: This session is executing on the WIN_ME platform.
I often use Google to find answers to my questions. I find SAS's Help menu hard to use
because it doesn't always show the best examples.
Click Explorer to look at the data sets. The next part explains how.
Notes:
I look at the data sets to check whether there is any irregularity in the data. Look closely to see if
anything is wrong with it. You must close a data set before you run anything else if the syntax you wrote
affects that data set.
To get the view above, where you can examine the data, follow these steps:
1. Click Explorer.
2. Click Libraries.
3. Click Work or another folder.
4. Click the data set.
Running the statements above creates two folders, HERE and THERE, under Libraries in the Explorer view, as you
can see in the picture below (see the previous part for how to get to this view).
Imagine there is a data set called MYDATA and it is in C:\TEMP. You can
create it this way:
libname here "C:\TEMP";
data here.MYDATA;
X=1;
run;
This toy data set has one observation and one variable, X, whose value is 1. Because
you decided to call that folder by the nickname HERE, you refer to
the data set as "here.MYDATA." For example, to print the contents of that
data set, you would do this:
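Putting the pieces together, a minimal sketch of the print step, assuming C:\TEMP exists and HERE is the nickname (libref) assigned above:

```sas
libname here "C:\TEMP";   /* nickname HERE points to the folder C:\TEMP */

data here.MYDATA;         /* two-level name: stored as a file in C:\TEMP */
  X=1;
run;

/* Refer to the data set by its two-level name: libref.dataset */
proc print data=here.MYDATA;
run;
```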
What are the other folders? SASHELP holds lots of data sets that SAS Institute
ships with the SAS software for demonstration purposes. I have never opened
SASUSER or MAPS. WORK holds the temporary data sets that you create as you
program in SAS. Temporary data sets disappear when you close your SAS
session. Permanent data sets, on the other hand, are the data sets you create
to keep even after you quit SAS. The next part elaborates on these things.
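As a rough illustration of the difference, here is a sketch that writes one temporary and one permanent data set. The names TEMP_DEMO and PERM_DEMO are hypothetical, and C:\TEMP is assumed to exist:

```sas
libname here "C:\TEMP";

/* One-level name: stored in WORK (same as work.temp_demo);
   deleted when you close SAS */
data temp_demo;
  x=1;
run;

/* Two-level name using the HERE nickname: stored in C:\TEMP;
   still there after you quit SAS */
data here.perm_demo;
  x=1;
run;
```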
Here are some toy syntax examples to show what the folders do and
what temporary and permanent data sets are.
;
run;
proc print;
run;
After creating a data set, you want to look at the data to see if anything is wrong. Because this is a
small data set, you can use PROC PRINT to print it in your Output window. Another useful way is
to click on the actual SAS data set to see its contents, as I explained earlier.
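For reference, a minimal sketch of creating a small data set directly in SAS syntax with CARDS and then printing it. The variable names follow the text-file example later on; the values are made up for illustration:

```sas
/* Made-up values, for illustration only */
data kaz;
  input ID SEX $ height;   /* $ marks SEX as a character variable */
  cards;
1 Male 170
2 Female 160
3 Male 175
;
run;

proc print data=kaz;
run;
```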
Be sure to close the Excel sheet before you run the syntax to import it. Otherwise, you get this error message:
ERROR: File _IMEX_.'Sheet1$'n.DATA does not exist.
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
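If you prefer syntax to the Import Wizard, a PROC IMPORT step along these lines should work. This is a sketch that assumes a hypothetical workbook C:\TEMP\kaz.xls with column names in the first row of Sheet1, and assumes SAS/ACCESS Interface to PC Files is licensed (DBMS=EXCEL needs it):

```sas
/* Hypothetical file; close it in Excel before running */
proc import datafile="C:\TEMP\kaz.xls"
            out=kaz          /* new temporary data set WORK.KAZ */
            dbms=excel
            replace;         /* overwrite WORK.KAZ if it already exists */
  sheet="Sheet1";
  getnames=yes;              /* first row holds the variable names */
run;
```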
If you know exactly where (in which columns) the data values are located, you can indicate the locations in the following way.
data kaz;
infile "C:\TEMP\kaz.txt" ;
input ID 1 SEX $ 4-9 height 13-15 ;
run;
The $ indicates that SEX is a character variable. SAS always needs to know whether a variable
is character or numeric.
proc print;
run;
If a character value is just one word (e.g., Male), we don't need to tell SAS the exact locations. SAS will
treat each blank-separated block of words or numbers as one value. But you need to add the MISSOVER option, so that when
SAS does not find a value at an expected place (as in the third observation in this data set), it assigns a missing value
instead of moving on to the next line. If a character variable contains more than one word, use the method above instead of the one below.
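A self-contained sketch of list input with MISSOVER, reading in-stream lines instead of C:\TEMP\kaz.txt. The values are made up; the third line deliberately lacks a height:

```sas
data kaz_list;
  infile cards missover;   /* a short line gives missing values, not a read of the next line */
  input ID SEX $ height;   /* list input: values separated by blanks */
  cards;
1 Male 170
2 Female 160
3 Male
;
run;

proc print data=kaz_list;  /* observation 3 shows height as . */
run;
```

Without MISSOVER, SAS would look for the third observation's height on the next line, find none, and drop that observation.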
set abc;
/*here manipulation of data */
run;
I am creating a new temporary data set ABC (to be found in the WORK folder) based on an already existing
temporary data set called ABC (also found in the WORK folder). The old ABC will be overwritten by the new
ABC. This is perfectly okay.
data abc;
set abc;
/*here manipulation of data */
run;
I am creating a new temporary data set XYZ based on an already existing permanent data set called ABC
(found in the HERE folder, which is C:\TEMP).
data xyz;
set here.abc;
/*here manipulation of data */
run;
I am creating a new permanent data set ABC in the HERE folder (which is
C:\TEMP) based on an already existing temporary data set called XYZ.
data here.abc;
set xyz;
/*here manipulation of data */
run;
I am creating a new permanent data set ABC in the THERE folder (which is
C:\) based on an already existing permanent data set called ABC in the
HERE folder (which is C:\TEMP).
data there.abc;
set here.abc;
/*here manipulation of data */
run;
Reminder:
Temporary data sets: Found in the WORK folder. They disappear when a session ends.
Work folder: Click Explorer, then Libraries, then Work.
The HERE folder and THERE folder: HERE and THERE are arbitrary names (librefs) that I assigned with
LIBNAME statements. They refer to paths that I specified.
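The LIBNAME statements behind these examples might look like this. This is a sketch: HERE = C:\TEMP comes from the examples above, while THERE = C:\ is an assumption:

```sas
/* HERE: path taken from the examples above */
libname here "C:\TEMP";

/* THERE: assumed path, for illustration */
libname there "C:\";
```

After running these, both names appear under Libraries in the Explorer view, next to Work and Sashelp.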
#    Variable    Type    Len    Pos
-----------------------------------
3    Age         Num       8      0
4    Height      Num       8      8
1    Name        Char      8     24
2    Sex         Char      1     32
5    Weight      Num       8     16
Here is a sample of how you can work on this data set to create Body Mass Index (BMI), as well as other useful
variables. To create new variables, you always need to create a new data set in a DATA step.
data ABC;
set sashelp.Class;
/*Creating a character variable indicating a person's BMI status (Body
Mass Index)*/
weight_metric=weight*0.45359237;
height_metric=(height* 2.54)/100 ;
BMI=weight_metric/(height_metric**2);
/*Definitions: Underweight = below 18.5, Normal weight = 18.5-24.9,
Overweight = 25-29.9, Obesity = BMI of 30 or greater */
Without a LENGTH statement, SAS would set the length of the character variable from the first value assigned to it
in the DATA step, which would be "Underweight" in this case, so any longer value assigned later would be truncated.
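One way to write the status variable, as a sketch built on the cutoffs in the comment above. The variable name BMI_status and the labels are assumptions:

```sas
data ABC;
  set sashelp.Class;
  weight_metric=weight*0.45359237;        /* pounds to kilograms */
  height_metric=(height*2.54)/100;        /* inches to meters */
  BMI=weight_metric/(height_metric**2);

  /* LENGTH must come before the first assignment to BMI_status;
     13 characters is enough for "Normal weight" */
  length BMI_status $ 13;
  if BMI = . then BMI_status=" ";         /* missing sorts lowest, so test it first */
  else if BMI < 18.5 then BMI_status="Underweight";
  else if BMI < 25 then BMI_status="Normal weight";
  else if BMI < 30 then BMI_status="Overweight";
  else BMI_status="Obesity";
run;
```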
Functions such as MEAN(of …) or SUM(of …) compute statistics from the non-missing values. They still return a value even when some
of the variables in the parentheses are missing. For example, if X1 is missing:
X=mean(of X1 X2 X3); will return the average of X2 and X3.
In contrast,
X=(X1+X2+X3)/3; will return a missing value, namely, "."
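A small data step makes the contrast visible. The values are made up, with X1 set to missing:

```sas
data missing_demo;
  X1=.;
  X2=4;
  X3=6;
  avg_fn=mean(of X1 X2 X3);   /* 5: average of the non-missing X2 and X3 */
  sum_fn=sum(of X1 X2 X3);    /* 10: sum of the non-missing values */
  avg_ops=(X1+X2+X3)/3;       /* .: one missing operand makes the result missing */
run;

proc print data=missing_demo;
run;
```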
run;
proc print;
run;
II. Procedures
11. PROC CONTENTS: Description of Contents
Data ABC;set sashelp.Prdsale;
run;
/*1111111111111111111111111*/
/*simple way*/
proc contents data=ABC;
run;
/*I like the POSITION option because, in addition to the alphabetically
sorted table, it gives me a table that is sorted by the position of the
variables in the data*/
proc contents data=ABC position;
run;
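If you want the variable list as a data set rather than printed output, PROC CONTENTS can write it with OUT=. A sketch that sorts the list by position, similar to what the POSITION option prints. The output name PRDSALE_VARS is hypothetical:

```sas
/* NOPRINT suppresses the listing; OUT= saves one row per variable */
proc contents data=sashelp.prdsale
              out=prdsale_vars(keep=name type length varnum)
              noprint;
run;

/* VARNUM is the variable's position in the data set */
proc sort data=prdsale_vars;
  by varnum;
run;

proc print data=prdsale_vars;
run;
```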
input acro $ NATION $ 6-14 NAME $ 15-33 MAT7 MAT8 GNP14 PROP NATEXAM NATSYLB NATTEXT
      block $;
cards;
;
run;
proc print;
run;
Advanced topics:
proc sort data=kaz out=kaz2 nodupkey;
by block;
run;
proc print data=kaz2;run;
This keeps only the first observation of each block. Imagine that you have data with individual-level variables
(e.g., 100 students) and group-level variables (e.g., 10 schools), and you want to get school-level information from this
data. The procedure above keeps just the first observation of each school and gives you ten lines of data for the 10 schools.
Ignore the individual-level variables in the result, however, because they describe only the first student in each school.
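A self-contained sketch of the same idea with made-up data. SCHOOL and SCHOOL_TYPE are treated as school-level variables, and a KEEP= option drops the individual-level ones from the result:

```sas
/* Made-up student-level data */
data students;
  input school $ school_type $ student score;
  cards;
A public 1 80
A public 2 90
B private 3 70
B private 4 60
B private 5 75
;
run;

/* One line per school; individual-level variables are dropped */
proc sort data=students out=schools(keep=school school_type) nodupkey;
  by school;
run;

proc print data=schools;
run;
```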
14. PROC MEANS: Get Descriptive Statistics (Mean, STD, Min, Max)
PROC MEANS data=kaz;
VAR mat7 mat8;
run;
Advanced topic: Group means.
/*Report group means*/
proc sort data=kaz out=kaz2;by block;run;
proc means data=kaz2;
by block;
var mat7 mat8;
run;
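You can also use a CLASS statement instead of a BY statement. The CLASS statement is easier because you don't
need to sort the data by the grouping variable first. The trade-off is that with CLASS, PROC MEANS keeps all the groups
in memory at once, which can be a problem for very large data sets with many groups.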
proc means data=kaz2; /*now, kaz2 does not have to be sorted by block*/
class block;
var mat7 mat8;
run;
ods output summary=john; /*Output Delivery System Used. See SAS manual 2*/
run;
ods listing on; /*printing of results resumed*/
proc print data=john;
run;
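A complete version of the ODS approach, as a sketch on SASHELP.CLASS. "Summary" is the name of the PROC MEANS output table, and JOHN is just the data set name used above:

```sas
proc means data=sashelp.class;
  class sex;
  var height weight;
  ods output summary=john;   /* save the Summary table as data set JOHN */
run;

/* JOHN has one row per level of SEX */
proc print data=john;
run;
```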
I recommend reading a chapter on PROC MEANS in SAS CD-online. It is a very versatile procedure.
var mat8;
run;
Run;
Advanced Topic:
http://www.estat.us/sas/OLS%20tables%20for%20learning.txt
Research Tip:
Why do we use rank?
a. We can split the sample based on the rank, e.g., a high-SES student sample versus a low-SES student sample.
b. We can create dummy variables quickly by specifying GROUPS=2, e.g., high-SES students receive 1; all others receive
0. This splits the data at roughly the median of the variable, which may or may not be the best strategy.
An alternative is to assign 1 and 0 based on a meaningful threshold. For example, with temperature
data I might split at the median if that makes sense, but I might instead use 0 degrees Celsius (the freezing point)
as a meaningful cut point.
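Both options side by side, as a sketch on SASHELP.CLASS. The 60-inch cut point for TALL is an arbitrary assumption standing in for a "meaningful threshold":

```sas
/* GROUPS=2 codes the lower half of HEIGHT as 0 and the upper half as 1 */
proc rank data=sashelp.class out=ranked groups=2;
  var height;
  ranks height_hi;
run;

/* Alternative: a fixed cut point (60 inches is assumed) */
data ranked;
  set ranked;
  tall=(height >= 60);   /* a comparison returns 1 if true, 0 if false */
run;

proc print data=ranked;
  var name height height_hi tall;
run;
```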
data B;
set kaz;
keep nation mat8;
run;
data NEW;
merge A B;
by nation;
run;
/*Confirm*/
proc print data=NEW;
run;
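A MERGE with a BY statement needs both data sets sorted by the BY variable. A self-contained sketch with made-up values for A and B:

```sas
/* Made-up data */
data A;
  input nation $ mat7;
  cards;
USA 500
JPN 560
;
run;

data B;
  input nation $ mat8;
  cards;
JPN 570
USA 510
;
run;

/* Sort both by the matching variable before merging */
proc sort data=A; by nation; run;
proc sort data=B; by nation; run;

data NEW;
  merge A B;   /* one row per nation with MAT7 and MAT8 side by side */
  by nation;
run;

proc print data=NEW;
run;
```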
V. MACROs
%john(group=sex,var1=height weight);
%john(group=sex,var1=age height);
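The calls above require a macro named %john to have been defined and submitted first. Its definition is not shown here, so the following is a hypothetical version that fits the parameters GROUP and VAR1 and the SASHELP.CLASS variables:

```sas
/* Hypothetical definition of %john */
%macro john(group=, var1=);
  proc means data=sashelp.class;
    class &group;   /* e.g., sex */
    var &var1;      /* e.g., height weight */
  run;
%mend john;

%john(group=sex,var1=height weight);
%john(group=sex,var1=age height);
```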
data X;
x="John's phone Number";
y="312-234-3999";
run;
data x2;
set x;
call symput ("example", x);
run;
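After the DATA step with CALL SYMPUT finishes, the macro variable EXAMPLE can be used in later steps. A sketch that uses it in a title. The double quotes matter because the value contains an apostrophe:

```sas
data X;
  x="John's phone Number";
  y="312-234-3999";
run;

data x2;
  set x;
  call symput("example", x);   /* EXAMPLE now holds the value of X */
run;

/* &example resolves inside double quotes, not single quotes */
title "&example";
proc print data=x2;
run;
title;
```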