
Kaz SAS 1

Kaz’s SAS manual

BASIC

Version 11/12/2004

by Kazuaki Uekawa, Ph.D.


www.estat.us
kuekawa@alumni.uchicago.edu
Copyright © 2002 By Kazuaki Uekawa All rights reserved.
Profile:

Kazuaki (Kaz) Uekawa, Ph.D.

I was born in Hiroshima, Japan, went to college at Doshisha University in Kyoto, and worked for a few
years as a high school teacher in Osaka. In 2000 I got my doctorate in Sociology at the University of
Chicago. I learned SAS while working for a research project led by Charles Bidwell and Anthony Bryk,
both Sociology professors at the U of C. Currently I am a research analyst at a non-profit research
organization located in Washington DC. From my window, I can see the Washington Monument.

Table of Contents
I. Basic Operations
1. Ask questions to SAS by emailing support@sas.com
2. How do I start and what mini-windows do I look at?
3. How do I look at data sets?
4. Assigning library name and create folders
5. How do we create SAS data?
A) Create a SAS data using a SAS syntax
B) Create SAS data via MS Excel sheets
C) Create a SAS data set via an external text file
6. Examples of data steps
7. Manipulating variables in data steps
8. Lots of manipulation techniques to be used in a data step
9. Using Character Functions to create new variables
10. Application: How do we restrict analytical samples using NMISS function
II. Procedures
11. PROC CONTENTS: Description of Contents
12. PROC PRINT: See Data
13. PROC SORT: Sorting Observations based on a value of variable
14. PROC MEANS: Get Descriptive Statistics (Mean, STD, Min, Max)
15. PROC FREQ: Get Frequencies
16. PROC UNIVARIATE: Get elaborate statistics and a univariate plot
17. PROC PLOT: Plotting Two Variables
18. PROC TIMEPLOT: Time Plot
19. PROC CORR: Correlation
20. PROC REG: OLS Regression
21. PROC LOGISTIC: Logistic Regression
22. MAKE AN ASCII FILE
III. More Procedures
23. PROC STANDARD: Standardize Values
24. PROC RANK: Rank observations
25. PROC SQL: Creating group-level mean variables
26. PROC IMPORT
IV. Merging Data Sets
V. MACROs
27. Most common way of using a macro
28. Simple macro using LET statement
29. Macro can be specified from data (not directly by you)

I. Basic Operations
1. Ask questions to SAS by emailing support@sas.com
When you have a question about SAS, you can email SAS Institute’s technical support team. The address is
support@sas.com. At the beginning of your email, copy the information that appears at the head of
your log file. The log file is what you get when you run SAS. It looks like this:

NOTE: Copyright (c) 1999-2001 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 8.2 (TS2M0)
Licensed to UNIVERSITY OF XXXXX, Site XXXXX.
NOTE: This session is executing on the WIN_ME platform.

I developed my SAS skills mostly by communicating with the SAS tech team.

I often use Google to get answers to my questions. I think SAS’s help menu is not very easy to understand
because it doesn’t always show you the best examples.

2. How do I start and what mini-windows do I look at?


In Windows, you can start SAS by going to START > ALL PROGRAMS > The SAS System. Confirm that you get three
windows.
1. Editor file. This is where you write your syntax. Click the running-man icon to run your program.
2. Log file. This file shows your errors.
3. Output file. You get results in this window.

Click the ! mark to cancel while the program is running.
Click Explorer to look at the data sets. See the next section on this.

3. How do I look at data sets?


This syntax (which you type into the editor file) gets you an example data set to look at.
data abcd;
set sashelp.Prdsale;
run;
You can look at the data set by following these four steps:

1. Click Explorer.
2. Click Libraries.
3. Click Work or other folders.
4. Click the data set.

Notes:
I look at the data sets to check if there is any irregularity in the data. Look closely to see if there is anything wrong with it.
You must close the data set before you run anything else if the syntax you wrote affects the data set.

4. Assigning library name and create folders


You need libname statements at the head of your SAS programs. With these, you assign nicknames (library
names) to the folders that host your SAS data sets. For example:
libname here "C:\TEMP";
libname there "C:\";

Running the above creates two folders, “here” and “there,” under Libraries in the Explorer view
(see the previous section for how to get to this view).

Imagine you want a data set called MYDATA in C:\TEMP. You can
create it in this way:
libname here "C:\TEMP";
data here.MYDATA;
X=1;
run;

This silly data set has one observation and one variable, X, whose value is 1. Because
you decided to call that folder by the nickname HERE, you will be referring to
the data set as “here.MYDATA.” For example, to print the contents of that
data set, you will do this:

proc print data=here.MYDATA;
run;

To see what variables are in the data, do this:

proc contents data=here.MYDATA;
run;

What are the other folders? Sashelp hosts lots of data sets that SAS Institute
ships with the SAS software for demonstration’s sake. I have never opened
Sasuser or Maps. “Work” hosts temporary data sets that you create as you
program in SAS. Temporary data sets disappear when you close SAS.
Permanent data sets, on the other hand, are the data sets you create
to keep even after you quit SAS. The examples that follow elaborate on these things.
Here is some silly example syntax to show you what the folders do and
what temporary and permanent data sets are.

/*libname statements just need to occur at the beginning of the syntax file*/
libname here "C:\TEMP";
libname there "C:\";
/*this creates a data set called Wally in the WORK folder*/
data Wally;
x=1;
y=2;
z=3;
run;
/*this creates a data set called Wally in the HERE folder*/
data here.Wally;
x=4;
y=5;
z=6;
run;
/*this creates a data set called Wally in the THERE folder*/
data there.Wally;
x=7;
y=8;
z=9;
run;

Click on these folders in the Explorer to find the different “Wally” data sets.

/*Use proc print to see the content of the data sets*/
proc print data=work.Wally;
run;
proc print data=here.Wally;
run;
proc print data=there.Wally;
run;

The following prints the same data set as the first proc print above (“work.” can be omitted in this way. I always omit it.):
proc print data=Wally;
run;

When data is not specified at all, SAS uses the most recently created data set, which in this example is there.Wally:
proc print;
run;

5. How do we create SAS data?


A) Create a SAS data using a SAS syntax

Of course you can create data in your syntax.

libname here "C:\";
data kaz;
input ID 1 SEX $ 4-9 height 13-15 ;
cards;
1  Male     170
2  Female   165
3  Male
4  Male     168
5  Female   170
;
run;

When a value is missing, it is safe to enter a dot instead of leaving it empty as in the third line.
But empty is also okay here because the INPUT line explicitly tells SAS where to
find values for each variable (e.g., height 13-15).

proc print;
run;

After creating a data set, you want to look at the data to see if there is
anything wrong. Because this is a small data set, you can use PROC
PRINT to print it in your output window. The other useful way is
to click on the actual SAS data set to see its content. I explained it
earlier.

B) Create SAS data via MS Excel sheets


The first example uses the first row of the sheet for variable names. Use this
syntax to import the Excel sheet (C:\mary.xls) as a
SAS data set (JOHN):
PROC IMPORT OUT= JOHN
DATAFILE= "C:\mary.xls"
DBMS=EXCEL2000 REPLACE;
RUN;

/*This one ignores variable names.
It also specifies the sheet from
which to take data*/
PROC IMPORT OUT= JOHN
DATAFILE= "C:\mary.xls"
DBMS=EXCEL2000 REPLACE;
GETNAMES=NO;
SHEET="Sheet1";
RUN;

Be sure to close the Excel sheet before you run the syntax to import it. Otherwise, you get this error message:
ERROR: File _IMEX_.'Sheet1$'n.DATA does not exist.
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.

C) Create a SAS data set via an external text file


Imagine you have a text file (say, kaz.txt) in your C:\TEMP folder, laid out like the data lines in section A.

It’s okay for a value to be missing. A dot “.” is often used to indicate a missing value, though. It
is safer that way.

If you know exactly where the data points are in the file, you can indicate the locations in the following way.
data kaz;
infile "C:\TEMP\kaz.txt" ;
input ID 1 SEX $ 4-9 height 13-15 ;
run;
proc print;
run;
$ indicates that SEX is a character variable. SAS always needs to know if a variable
is a character variable or a numeric variable.

If a character variable is just one word (e.g., Male), then we don’t really need to tell SAS the exact locations. SAS will
consider each block of words or numbers as one value. But you need to say “missover,” so that when SAS does not encounter a
value at an expected place (as in the third observation in this data set), it will consider it a missing value. If a character
variable contains more than one word, then use the method above instead of the one below.

libname here "C:\TEMP";
data kaz;
infile "C:\TEMP\kaz.txt" missover;
input ID SEX $ height ;
run;
proc print;
run;
missover: when values are missing at the end of a line, SAS treats them as missing
values instead of reading on into the next line.
Data Steps and Creating New Variables
6. Examples of data steps
Any SAS program consists of two elements. One is DATA steps and the other is PROCs (such as proc print or proc
means). I discuss data steps in this chapter. I show you some variations of data steps, so you understand them by examples.
libname here "C:\TEMP";
libname there "C:\";

I am creating a new temporary data set XYZ (to be found in the WORK
folder) based on an already existing temporary data set called ABC (found in
the WORK folder).
data xyz;
set abc;
/*here manipulation of data */
run;

I am creating a new temporary data set ABC (to be found in the WORK folder)
based on an already existing temporary data set called ABC (found in the
WORK folder). The old ABC will be overwritten by the new ABC. This is
perfectly okay.
data abc;
set abc;
/*here manipulation of data */
run;

I am creating a new temporary data set XYZ based on an already existing
permanent data set called ABC (found in the HERE folder, which is
C:\TEMP).
data xyz;
set here.abc;
/*here manipulation of data */
run;

I am creating a new permanent data set ABC in the HERE folder (which is
C:\TEMP) based on an already existing temporary data set called XYZ.
data here.abc;
set xyz;
/*here manipulation of data */
run;

I am creating a new permanent data set ABC in the THERE folder (which is
C:\) based on an already existing permanent data set called ABC in the
HERE folder (which is C:\TEMP).
data there.abc;
set here.abc;
/*here manipulation of data */
run;

Reminder:
Temporary data sets: Found in the WORK folder. They disappear when a session ends.
Work folder: Click on Explorer > Click on LIBRARIES > Click on WORK.
The HERE folder and THERE folder: HERE and THERE are the arbitrary names that I assigned with
LIBNAME statements. They refer to the paths that I specified.

7. Manipulating variables in data steps


We use a SAS sample data set sashelp.Class (a data set called Class stored in
the SASHELP folder) to practice creating new variables. Do this to find out
what this data set has:
proc contents data=sashelp.Class;
run;
You get the information below, telling you that the data set has Age, Height,
Name, Sex, and Weight.

#    Variable    Type    Len    Pos
-----------------------------------
3    Age         Num       8      0
4    Height      Num       8      8
1    Name        Char      8     24
2    Sex         Char      1     32
5    Weight      Num       8     16

Here is a sample of how you can work on this data set to create Body Mass Index, as well as other useful
variables. You always need a data step (creating a new data set) to create new variables.
data ABC;
set sashelp.Class;
/*Convert pounds to kilograms and inches to meters, then compute BMI (Body Mass Index)*/
weight_metric=weight*0.45359237;
height_metric=(height*2.54)/100;
BMI=weight_metric/(height_metric**2);
/*Creating a character variable indicating a person's BMI status.
Definition: Underweight = below 18.5, Normal weight = 18.5-24.9,
Overweight = 25-29.9, Obesity = BMI of 30 or greater*/
/*Without a length statement, SAS would set the length of STATUS from the first value
assigned to it in the code, which would be "Underweight" in this case.*/
length status $ 15;
if BMI < 18.5 then status="Underweight";
if BMI >= 18.5 and BMI < 25 then status="Normal";
if BMI >= 25 and BMI < 30 then status="Overweight";
if BMI >= 30 then status="Obese";
run;

8. Lots of manipulation techniques to be used in a data step


data abc;
set sashelp.Class;
var1=height+weight;
var2=sum(of height weight);
var3=weight-height;
var4=height*weight;
var5=height/weight;
var6=1/(height+weight);
var7=mean(of height weight);
var7B=mean(height, weight);/*this way is okay too*/
var8=max(of height weight);
var9=min(of height weight);
var10=log(height);
var11=abs(var3); /*Absolute values: this takes out negative signs*/
var12=nmiss(of height weight);/*number of missing values among the listed variables*/
var13=n(of height weight); /*number of non-missing values among the listed variables*/
run;
proc print;
run;

How is Z=mean(of X1 X2 X3) different from Z=(X1+X2+X3)/3;?


How is Z=sum(of X1 X2 X3) different from Z=X1+X2+X3;?

Functions such as mean(of …) or sum(of …) compute statistics from the non-missing values. They return values even when some
of the variables in the brackets are missing. For example, if X1 is missing:
X=mean(of X1 X2 X3); will return the average of X2 and X3.
In contrast,
X=(X1+X2+X3)/3; will return a missing value, namely, “.”
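
To see the difference concretely, here is a minimal sketch using a made-up one-row data set. The data set name DEMO and the variable names are hypothetical, chosen only for illustration.

```sas
data demo;
x1=.; x2=4; x3=8;             /*x1 is missing on purpose*/
z_mean=mean(of x1 x2 x3);     /*average of the non-missing values 4 and 8, so 6*/
z_sum=sum(of x1 x2 x3);       /*sum of the non-missing values, so 12*/
z_arith=(x1+x2+x3)/3;         /*missing, because x1 is missing*/
run;

proc print data=demo;
run;
```

The log also prints a note saying that missing values were generated, which is a useful warning that ordinary arithmetic ran into a missing value.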

9. Using Character Functions to create new variables


data abc;
set sashelp.Class;
var1=name||sex;
var2=compress(name||sex);/*COMPRESS gets rid of space in between*/
var3=substr(name,1,3);/*take the first 3 letters starting from the first
letter*/
var4=upcase(name);/*upper case*/

run;

proc print;
run;

10. Application: How do we restrict analytical samples using NMISS function


Imagine we are running several procedures on our data. We want to always be using the same number of observations, but
depending on the pattern of missing values, each procedure may end up using a different set of observations. Here is a way to
force your sample to be the same, by making sure that you use only observations that have no missing values on a set of variables.
(This is listwise deletion, which is stricter than pairwise deletion: an observation is dropped if it is missing on any of the
variables.)

Use the NMISS function to create a new variable X.

data ABC;
set sashelp.class;
if name="Janet" then height=.; /*just imagine Janet was missing a value for her height*/
X=nmiss(of height weight);/*this returns the number of missing values*/
run;

proc means data=ABC;


where X=0; /*Run only when X=0, namely, number of missing cases is 0*/
var weight height;
run;

proc reg data=ABC;


where X=0; /*Run only when X=0, namely, number of missing cases is 0*/
model height=weight;
run;
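
As an alternative sketch (not the method shown above), you could drop the incomplete observations once, in a data step, and then run every procedure on the resulting data set without repeating the WHERE statement. The data set name COMPLETE is hypothetical.

```sas
data complete;
set sashelp.class;
if name="Janet" then height=.;   /*same pretend missing value as above*/
if nmiss(of height weight)=0;    /*subsetting IF: keep only rows with no missing values*/
run;

proc means data=complete;
var weight height;
run;

proc reg data=complete;
model height=weight;
run;
quit;
```

Both procedures now use exactly the same observations. The trade-off is that you carry an extra data set around, so the WHERE approach may be handier when the set of variables changes from one analysis to the next.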

II. Procedures
11. PROC CONTENTS: Description of Contents
Data ABC;set sashelp.Prdsale;
run;
/*1111111111111111111111111*/
/*simple way*/
proc contents data=ABC;
run;
/*I like "position option" because it gives me a table that is sorted by
the position of variables in the data, in addition to alphabetically
sorted table*/
proc contents data=ABC position;
run;

/*Easiest way to produce RTF or EXCEL documents off PROC CONTENTS*/


/*but I don't like this way because it comes with too many details*/
ods rtf file ="C:\TEMP\datadictionary1.rtf";
proc contents data=ABC position;
run;
ods rtf close;

ods html file ="C:\TEMP\datadictionary1.xls";


proc contents data=ABC position;
run;
ods html close;

/*Using ODS we get only the data we want.*/


proc contents data=ABC position;
ods output position=whatever_name_you_want ;
run;

ods rtf file ="C:\TEMP\datadictionary2.rtf";


proc print data=whatever_name_you_want noobs;
title "data dictionary in RTF";
var variable label ;
run;
ods rtf close;

12. PROC PRINT: See Data


proc print data=sashelp.class;
run;

proc print data=sashelp.class;


VAR name weight height sex;
run;

proc print data=sashelp.class round noobs;


where sex="M";
VAR name weight height sex;
run;

proc sort data=sashelp.class out=kaz;


by height;
run;

proc print data=kaz (obs=5);


title "Observations sorted by height. Also show only the five shortest people";
var name height sex;
run;

I have cleaned up this document up to here. I am still working on the rest.
The rest of this manual is based on this data set:
http://www.estat.us/sas/kazclass.txt
Download the digital version of this document and cut and paste the following data. The data comes from
TIMSS (Third International Mathematics and Science Study). MAT7 is 7th graders’ and MAT8 is 8th
graders’ nation-mean mathematics score. NATEXAM is 1 when a nation has a national examination system,
NATTEXT is 1 if a nation decides on textbooks at the national level, and NATSYLB is 1 when a nation
decides on the syllabus at the national level. Block is a geographical area. PROP is the proportion of kids in
middle school.
data kaz;
input acro $ NATION $ 6-14 NAME $ 15-33 MAT7 MAT8 GNP14 PROP NATEXAM NATSYLB NATTEXT block $;
cards;
aus  Australi Australia          498 529.63 -0.15526 84 0 1 0 ocea
aut  Austria  Austria            509 539.43 -0.29163 100 0 0 1 weuro
bfl  Belgi_FL Belgium (Fl)       558 565.18 -0.25157 100 1 1 0 weuro
bfr  Belgi_FR Belgium (Fr)       507 526.26 -0.25157 100 0 1 0 weuro
can  Canada   Canada             494 527.24 0.07184 88 0 0 0 namer
col  Colombia Colombia           369 384.76 -0.23699 62 0 1 0 samer
cyp  Cyprus   Cyprus             446 473.59 -0.41906 95 0 1 1 seuro
csk  Czech    Czech Republic     523 563.75 -0.34840 86 0 1 0 eeuro
dnk  Denmark  Denmark            465 502.29 -0.34057 100 1 0 0 weuro
fra  France   France             492 537.83 0.55791 100 0 1 0 weuro
deu  Germany  Germany            484 509.16 0.91992 100 0 0 0 weuro
grc  Greece   Greece             440 483.90 -0.32620 99 0 1 1 seuro
hkg  HongKong Hong Kong          564 588.02 -0.31638 98 1 1 1 seasia
hun  Hungary  Hungary            502 537.26 -0.37602 81 0 0 0 eeuro
isl  Iceland  Iceland            459 486.78 -0.42606 100 0 0 0 neuro
irn  Iran     Iran, Islamic Rep. 401 428.33 -0.17095 66 0 1 1 meast
irl  Ireland  Ireland            500 527.40 -0.38919 100 1 1 0 weuro
isr  Israel   Israel             . 521.59 -0.35464 87 0 1 0 meast
jpn  Japan    Japan              571 604.77 1.85543 96 0 1 0 seasia
kor  Korea    Korea              577 607.38 -0.01168 93 0 1 1 seasia
kwt  Kuwait   Kuwait             . 392.18 -0.40359 60 0 1 1 meast
lva  Latvia   Latvia (LSS)       462 493.36 -0.42319 87 0 0 0 eeuro
ltu  Lithuani Lithuania          428 477.23 -0.41785 78 1 1 1 eeuro
nld  Netherla Netherlands        516 540.99 -0.18184 93 1 0 0 weuro
nzl  NewZeala New Zealand        472 507.80 -0.38319 100 1 1 0 ocea
nor  Norway   Norway             461 503.29 -0.35450 100 0 1 1 neuro
prt  Portugal Portugal           423 454.45 -0.32588 81 0 1 0 weuro
rom  Romania  Romania            454 481.55 -0.35396 82 1 1 1 eeuro
rus  RussianF Russian Federation 501 535.47 0.12827 88 1 0 0 eeuro
sco  Scotland Scotland           463 498.46 0.48017 100 0 0 0 weuro
sgp  Singapor Singapore          601 643.30 -0.37279 84 1 1 1 seasia
slv  SlovakRe Slovak Republic    508 547.11 -0.40217 89 0 1 0 eeuro
svn  Slovenia Slovenia           498 540.80 -0.41310 85 0 1 1 eeuro
esp  Spain    Spain              448 487.35 0.03461 100 0 1 1 weuro
swe  Sweden   Sweden             477 518.64 -0.30049 99 0 1 0 neuro
che  Switzerl Switzerland        506 545.44 -0.27916 91 0 0 0 weuro
tha  Thailand Thailand           495 522.37 -0.14533 37 0 1 1 seasia
usa  USA      United States      476 499.76 5.37506 97 0 0 0 namer
;
run;
proc print;
run;

13. PROC SORT: Sorting Observations based on a value of variable


You will be using this procedure a lot, but be careful with large data sets. This procedure consumes lots of computation
time.
PROC SORT data=kaz out=kaz2;
/*If you don’t want to create a new data set, just write “out=kaz”*/
by mat8;
run;

Advanced topics:
proc sort data=kaz out=kaz2 nodupkey;
by block;
run;
proc print data=kaz2;run;
This keeps only the first observation of each block. Imagine that you have data with individual-level variables
(e.g., 100 students) and group-level variables (e.g., 10 schools), and you want to get school-level information from this
data. The above procedure would take just the first observation of each school and give you ten lines of data for the 10 schools.
Ignore the individual-level variables in the result, however, because they belong to whichever student happened to come first.
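
The school example above can be sketched with a tiny made-up data set. The data set names STUDENTS and SCHOOLS, the variable names, and the values are all hypothetical.

```sas
data students;
input school $ student $ score;
cards;
A s1 50
A s2 60
B s3 70
B s4 80
;
run;

/*keep one line per school: the first one found for each value of SCHOOL*/
proc sort data=students out=schools nodupkey;
by school;
run;

proc print data=schools;
run;
```

SCHOOLS ends up with two lines, one for school A and one for school B. The STUDENT and SCORE values on those lines are just those of the first student in each school, which is why they should not be read as school-level information.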

You can use more than one variable in by line.


proc sort data=kaz out=kaz2;
by natexam block;
run;
/*What would the new data look like?*/
proc print data=kaz2;run;

14. PROC MEANS: Get Descriptive Statistics (Mean, STD, Min, Max)
PROC MEANS data=kaz;
VAR mat7 mat8;
run;
Advanced topic: Group means.
/*Report group means*/
proc sort data=kaz out=kaz2;by block;run;
proc means data=kaz2;
by block;
var mat7 mat8;
run;

You can also use “class” statement instead of “by” statement. Class statement is easier because you don’t
need to sort the data by the by-variable before it. I forgot what the downside of it was.

proc means data=kaz2; /*now, kaz2 does not have to be sorted by block*/
class block;
var mat7 mat8;
run;

/*Save group means*/
ods listing close; /*printing of results suppressed*/
proc means data=kaz2; /*make sure kaz2 is already sorted by group ID*/
by block;
var mat7 mat8;
ods output summary=john; /*Output Delivery System used. See SAS manual 2*/
run;
ods listing; /*printing of results resumed*/
proc print data=john;
run;

/*Get standard errors by adding STDERR*/
/*But STDERR alone would print only the standard error, so you must add the other statistics you would like with it.
Specify MEAN, N, STD, MAX, and MIN*/
PROC MEANS data=kaz mean n std max min stderr;
VAR mat7 mat8;
run;

I recommend reading a chapter on PROC MEANS in SAS CD-online. It is a very versatile procedure.

15. PROC FREQ: Get Frequencies


PROC FREQ data=kaz;
Tables natexam ;
Run;
Advanced topics:
Get cross tabulation:
PROC FREQ data=kaz;
tables natexam*block;
run;

16. PROC UNIVARIATE: Get elaborate statistics and a univariate plot


PROC UNIVARIATE PLOT DATA=KAZ;
var mat7 mat8 gnp14;
run;
Advanced topic: Get a whisker plot by subgroups, so you can compare group values. But the output is text-
based and pretty ugly.
proc sort data=kaz out=kaz2;
by block;
run;
PROC UNIVARIATE data=kaz2 plot;
by block;
var mat8;
run;

17. PROC PLOT: Plotting Two Variables


This is a text-based graph. Use proc gplot for nicer graphics.
PROC PLOT data=KAZ;
Plot mat7*mat8;
run;

18. PROC TIMEPLOT: Time Plot


proc timeplot data=KAZ;
plot mat8= '*';
id NAME;
run;
Advanced topics:
/*Sort first by the variable of your interest and see it*/
/*you will be seeing a ranking of nations*/
proc sort data=kaz out=kaz2;
by mat8;
run;
proc timeplot data=KAZ2;
plot mat8= '*';
id NAME;
run;
Add bells and whistles. Below, I am asking, “Does GNP have anything to do with test scores?”
/*First sort by GNP*/
proc sort data=kaz out=kaz2;
by gnp14;
run;
proc timeplot data=KAZ2;
title "TIMSS countries sorted by GNP";
plot mat7 mat8/overlay hiloc npp ;
id NAME block gnp14 prop;
run;
19. PROC CORR: Correlation
PROC CORR DATA=KAZ;
VAR mat7 mat8 gnp14;
Run;

20. PROC REG: OLS Regression


PROC REG DATA=KAZ;
MODEL mat8=natexam gnp14;
Run;

Advanced Topic:
http://www.estat.us/sas/OLS%20tables%20for%20learning.txt

21. PROC LOGISTIC: Logistic Regression


/*I don’t know if natexam can be considered a dependent variable, but for the sake of demonstration*/
PROC logistic data=kaz descending;
Model natexam=gnp14;
run;
/*The DESCENDING option makes sure that PROC LOGISTIC is modeling the probability that the outcome=1.
Without this option, it would model the probability that the outcome=0*/

22. MAKE AN ASCII FILE

To use a stand-alone software program, you may have to create a simple ASCII file. But I rarely do this lately
because many software programs read SAS data directly.

data timss;set kaz;
file "ascii_example.txt";
put (nation) ($10.) (mat7 mat8) (8.0);
run;

III. More Procedures


23. PROC STANDARD: Standardize Values
Make Z-score with a mean of 0 and standard deviation of 1
proc standard data=kaz out=kaz2 mean=0 std=1;
var mat7 mat8;
run;

/*then see what you did*/


proc print data=kaz2;
run;

Advanced technique: Standardize within groups.


/*First sort by group ID*/
proc sort data=kaz out=kaz2;
by block;
run;
/*Use by statement*/
proc standard data=kaz2 out=kaz3 mean=0 std=1;
by block;
var mat7 mat8;
run;

24. PROC RANK: Rank observations


proc rank data=kaz out=kaz2 groups=3;
/*Creates 3 groups. The new values will be 0, 1, and 2. */
var mat7 mat8;
RANKS Rmat7 Rmat8;
/*give names to the new variables*/
Run;

/*see what happened*/


proc print data=kaz2;
var mat7 Rmat7 mat8 Rmat8;
RUN;

Research Tip:
Why do we use rank?
a. We can split the sample based on the rank, e.g., a high SES student sample versus a low SES student sample.
b. We can create dummy variables quickly by specifying groups=2, e.g., high SES students will receive 1; else
0. This grouping occurs at the median point of a variable, which may or may not be the best strategy.
An alternative way is to assign 1 and 0 based on some meaningful threshold. For example, if I have temperature
data, I may use the median point to split the data if it makes sense, but maybe I use 0 degrees Celsius (the freezing point)
as a meaningful point to split the data instead.
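
A threshold-based dummy variable can be sketched in a data step like this. It assumes the KAZ data set created earlier; the cutoff of 500 and the names KAZ3 and HIGH_MAT7 are hypothetical and used only for illustration.

```sas
data kaz3;
set kaz;
/*SAS treats a missing value as smaller than any number, so handle it first*/
if mat7=. then high_mat7=.;
else if mat7>=500 then high_mat7=1;
else high_mat7=0;
run;

proc freq data=kaz3;
tables high_mat7;
run;
```

Without the first IF line, the nations with a missing MAT7 would quietly receive 0, which is easy to overlook.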

25. PROC SQL: Creating group-level mean variables


One could use proc means to derive group-level means. I don’t recommend this since it involves the extra steps
of merging the mean data back to the main data set. Extra steps always create room for errors. PROC SQL
does it all at once.
proc sql;
create table kaz2 as
select *,
mean(mat7) as mean_mat7,
mean(mat8) as mean_mat8,
mean(gnp14) as mean_gnp
from kaz
group by block;
quit; /*proc sql ends with a quit statement rather than a run statement*/

proc print data=kaz2;


run;

26. PROC IMPORT


You can read Excel data into SAS with PROC IMPORT (also shown in section 5 B). As an experiment,
create an Excel sheet on the C drive and import it into SAS using the following code.

PROC IMPORT OUT= mine


DATAFILE= "C:\example.xls"
DBMS=EXCEL2000 REPLACE;
GETNAMES=YES;
RUN;

proc print data=mine;


run;

IV. Merging Data Sets


libname here "C:\";

/*Create two data sets A and B.*/


data A;
set kaz; /*I am assuming that you already have this data set “kaz” */
keep nation mat7;
run;

data B;
set kaz;
keep nation mat8;
run;

/*MERGE DATA SETS*/


/*First sort them by a common ID*/
/*Here they are already sorted, so the following two lines are not really necessary*/
proc sort data=A;by nation;run;
proc sort data=B;by nation;run;

data NEW;
merge A B;
by nation;
run;
/*Confirm*/
proc print data=NEW;
run;

V. MACROs

27. Most common way of using a macro


%macro john (group=,var1=,var2=);
proc means data=sashelp.class;
class &group;
var &var1;
run;
%mend john;

%john(group=sex,var1=height weight);
%john(group=sex,var1=age height);

28. Simple macro using LET statement


%let john=height weight;
proc reg data=sashelp.class;
title "&john ";
model weight=height;
run;

29. Macro can be specified from data (not directly by you)

data X;
x="John's phone Number";
y="312-234-3999";
run;

data x2;
set x;
call symput ("example", x);
run;

proc print data=x2;


title "&example";
var y;
run;
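
CALL SYMPUT is not the only way to fill a macro variable from data. As a hedged alternative sketch, PROC SQL can store a computed value with INTO. The macro variable name AVG_HEIGHT is hypothetical.

```sas
proc sql noprint;
/*store the mean height of sashelp.class in the macro variable avg_height*/
select mean(height) into :avg_height
from sashelp.class;
quit;

proc print data=sashelp.class;
title "Heights (class average: &avg_height)";
var name height;
run;
```

This is handy when the value you want in a title or a later step is a statistic rather than something already stored in one row of a data set.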
