
An Introduction to SAS

Part I
Department of Computing Services
Jie Chen Ph.D.
Jie.chen@umb.edu
March, 2003

3/13/2003 1

SAS (Statistical Analysis System)

• SAS, short for Statistical Analysis System,
is a software system designed for data
management and analysis.
• With base SAS software you can store data
values, retrieve them, and modify them.
• You can also obtain statistical analyses and
create reports.


Goal of This Workshop


Learn enough to use the SAS System to
input and output data, create a simple
SAS program file, run SAS programs, and
obtain data analyses using statistical
procedures and graphs.


Table of Contents
• Introduction
• To read and create data sets
• Submitting SAS programs
• Basic statistical procedures and graphs
– PROC PRINT
– PROC SORT
– PROC MEANS
– PROC FREQ

1. Introduction
• SAS data management
• SAS procedures
• SAS program
• Open the SAS system

SAS Data Management
• The SAS System works with numeric and
character data.
• The data must be in a SAS data set or in an
external file that a SAS program can read.
• Some external files can be imported into the
SAS System.

SAS Procedures
• SAS procedures use data values from SAS
data sets to produce preprogrammed reports,
requiring minimal effort from you.
• An example:
PROC PRINT data = example;
title 'This is a subset of data';
run;


A SAS Program
The statements in a SAS program are
divided into two kinds of steps:
• DATA steps:
– to create one or more new SAS data sets.
• PROC steps:
– To call a procedure from the SAS library and
execute it.

Open the SAS System
• Click Start/Programs.
• Select The SAS System, then
– The SAS System for Windows v 8
• The PROGRAM EDITOR - (Untitled)
window becomes active, ready for you to
enter SAS statements.


2. To Read and Create Data Sets


• Types of text data files (ASCII format)
– with delimiters
– without delimiters
• Reading data from external text files
• Writing data to external text files


Entering data in the Program Editor
window: the CARDS statement
data example;
input age educ $ race sex ;
cards;
84 8 1 1
65 bd 1 1
82 hg 1 0
;
run;

Using PROC PRINT to view data
data example;
input age educ $ race sex ;
cards;
84 8 1 1
65 bd 1 1
82 hg 1 0
;
run;
proc print data = example;
run;


Submitting the SAS program


• To execute the statements
– Press F3 or click the Run button.
– Or select Locals/Submit.
• To recall SAS statements you have
submitted
– Click Window/Program Editor.
– Click Locals/Recall text.

After Running A SAS Program


When you execute a SAS program, the output
generated by SAS is divided into two major
parts:
• SAS log : contains information about the
processing of the SAS program, including
warning and error messages.
• Output: contains reports generated by
SAS procedures and DATA steps.

Saving a SAS program file
containing data
1. Click on File…Save As
2. A dialog box will appear.
3. Verify that the desired folder and extension
name (.sas) are chosen in the dialog box.
4. Type a file name in the File Name text
box, for example ‘myfile’.
5. Click the Save button to save the program.


SAS Naming Conventions


In SAS version 6, names can be up to eight characters
long; version 8 allows data set and variable names of
up to 32 characters. The first character must be a
letter (A-Z) or an underscore. The remaining characters
can be letters, digits (0-9), or underscores. Blanks
cannot appear in SAS names, and special characters such
as $, @, %, &, and # are not allowed.


Clearing the Program Editor Window

• Click Edit.
• Click Clear All.
• The Program Editor window is now clear.

Reading Data from a Text File
with Delimiters
data sample;
infile 'a:\sample1.txt';
input age educ $ race sex ptotinc famsize
fincome region;
run;
proc print data = sample;
run;
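The INFILE statement above relies on the default delimiter, a blank. When the values are separated by another character, the DLM= option on INFILE names it. The following is a minimal sketch, assuming comma-separated values and reading in-stream lines instead of an external file so that it can be run as is; the data set name sample_csv and the data values are illustrative only.

```sas
data sample_csv;              /* hypothetical data set name */
infile cards dlm = ',';       /* read the in-stream lines, split on commas */
input age educ $ race sex;
cards;
84,8,1,1
65,bd,1,1
82,hg,1,0
;
run;
proc print data = sample_csv;
run;
```

To read an external comma-delimited file instead, the same DLM= option would go on an INFILE statement that names the file.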

Reading Data from a Text File
without Delimiters
data sample2;
infile 'a:\sample2.txt';
input name $ 1-20 age 21-22 educ 23-24
race 25 sex 26 ptoi92 27-31 famsize 32
fincome 33-37 region 48;
run;
proc print data = sample2;
run;
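Column input, as used above, tells SAS which columns hold each variable, so values need no separators and character values may contain blanks. The following is a small sketch with in-stream data so that it can be run without an external file; the data set name demo_cols, the column layout, and the values are assumptions made for illustration.

```sas
data demo_cols;                        /* hypothetical data set name */
input name $ 1-10 age 11-12 sex 13;    /* name in columns 1-10, age in 11-12, sex in 13 */
cards;
John Smith841
Mary Jones650
;
run;
proc print data = demo_cols;
run;
```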

To Create a Text Data File
with fewer variables
data _null_;
set sample2;
file 'a:\sub1.dat';
put name $ age region;
run;

To Create a Text Data File
with fewer observations
data _null_;
set sample2;
if age > 30;
file 'a:\sub2.dat';
put name $ age region;
run;


4. Basic Statistical Procedures
and Graphs
• PROC MEANS
– for the whole sample
– with BY statement
– with CLASS statement
• PROC SORT
• PROC FREQ
• PROC REG
• PROC PLOT
• PROC UNIVARIATE

PROC MEANS
data sample;
infile 'a:\sample1.txt';
input age educ $ race sex ptotinc famsize fincome
region;
run;
proc means data = sample;
var age ptotinc famsize ;
run;

PROC SORT
proc sort data = sample out = list;
by sex region;
run;
proc print data = list;
run;


PROC MEANS
with BY Statement
proc means data = list;
var ptotinc famsize fincome;
by sex;
run;


PROC MEANS
with CLASS Statement
proc means data = sample;
class sex;
var ptotinc famsize fincome;
run;
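The BY and CLASS statements both produce statistics per group, but BY expects the input data set to be sorted by the BY variable, while CLASS does not. The following sketch shows the two side by side on a small in-stream data set; the data set names demo and demo_sorted and all values are made up for illustration.

```sas
data demo;                          /* hypothetical data, not the workshop file */
input sex famsize fincome;
cards;
1 3 30000
0 4 42000
1 2 25000
0 5 50000
;
run;

/* CLASS: works on unsorted data */
proc means data = demo;
class sex;
var famsize fincome;
run;

/* BY: sort by the BY variable first */
proc sort data = demo out = demo_sorted;
by sex;
run;
proc means data = demo_sorted;
by sex;
var famsize fincome;
run;
```

CLASS prints one combined table, whereas BY prints a separate table for each value of sex.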

PROC FREQ (one-way)
proc freq data = sample;
table region race;
run;


PROC FREQ (two-way)
proc freq data = sample;
table region*race / nopercent chisq;
run;


An Example of Linear
Regression
y = a + b x + e
where
x: the regressor variable (famsize).
a, b: the unknown parameters.
y: the response variable (fincome).
e: the unknown error.

PROC REG
proc reg data = sample;
model fincome = famsize ;
output out = out1 p = pred1 r = resid1;
run;


The output of the regression
model
• R-square = .2507
• F Value = 9.37
• P Value = .0048 < .05
• Estimated intercept = 11617
• Estimated slope = 8165: when the family size
increases by one unit, the family income is
estimated to increase by 8165.

A Fitted Equation
y = 11617 + 8165 x

fincome = 11617 + 8165 famsize
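As a quick check of how the fitted equation is used, the sketch below computes the predicted family income for one family size. The family size of 4 and the data set name predict are assumptions chosen for illustration; only the coefficients come from the output above.

```sas
data predict;                     /* hypothetical data set name */
famsize = 4;                      /* assumed family size, for illustration */
pred = 11617 + 8165 * famsize;    /* fitted equation: intercept + slope * famsize */
put pred=;                        /* writes pred=44277 to the SAS log */
run;
```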


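Checking the Assumptions of
Normality and Randomness
/* compute normal scores of the residuals */
proc rank data = out1 normal = blom out = ranked;
var resid1;
ranks nscore;
run;
proc plot data = ranked;
plot resid1*pred1;    /* residuals vs. predicted values: randomness */
plot resid1*nscore;   /* residuals vs. normal scores: Q-Q plot */
run;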


Q-Q Plot

PROC UNIVARIATE
proc univariate data = out1 normal plot;
var resid1;
run;


The distribution of residuals

