An Introduction to SAS

Part I
Department of Computing Services
Jie Chen, Ph.D.
Jie.chen@umb.edu
March, 2003
3/13/2003 1

SAS (Statistical Analysis System)
• SAS, short for Statistical Analysis System, is a software system designed for data management and analysis.
• With base SAS software you can store, retrieve, and modify data values.
• You can also obtain statistical analyses and create reports.

Goal of This Workshop
Learn enough of the SAS System to input and output data, create a simple SAS program file, run SAS programs, and obtain data analysis using statistical procedures and graphs.


References
• SAS/STAT User's Guide, Volumes 1, 2, and 3 (version 8.0)
• SAS Language (version 6)
• SAS Language and Procedures (version 6)
• SAS Procedures Guide for Personal Computers (version 6.03)


Table of Contents
• Introduction
• To read and create data sets
• Submitting SAS programs
• Basic statistical procedures and graphs
– PROC PRINT
– PROC SORT
– PROC MEANS
– PROC FREQ

1. Introduction
• SAS data management
• SAS procedures
• SAS program
• Open the SAS system


SAS Data Management
• The SAS system works with numerical and character data.
• The data must be in a SAS data set or in an external file that a SAS program can read.
• Some external files can be imported into the SAS system.

SAS Procedures
• SAS procedures use data values from SAS data sets to produce preprogrammed reports with minimal effort from you.
• An example:
proc print data = example;
   title 'This is a subset of data';
run;

A SAS Program
The statements in a SAS program are divided into two kinds of steps:
• DATA steps:
– To create one or more new SAS data sets.
• PROC steps:
– To call a procedure from the SAS library and to execute that procedure.

Open The SAS System
• Click Start/Programs.
• Select The SAS System, and then
– The SAS System for Windows V8.
• The PROGRAM EDITOR - (Untitled) window becomes active, ready for you to type SAS statements.


2. To Read and Create Data Sets
• Types of text data files (ASCII format)
– with delimiters
– without delimiters
• Read data from external text files
• Output data to external text files


Entering data at the program editor window: CARDS statement
data example;
   input age educ $ race sex;
   cards;
84 8 1 1
65 bd 1 1
82 hg 1 0
;
run;

Using PROC PRINT to view data
data example;
   input age educ $ race sex;
   cards;
84 8 1 1
65 bd 1 1
82 hg 1 0
;
run;
proc print data = example;
run;

Submitting the SAS program
• To execute the statements
– Press F3 or click the Run button
– Or select Locals/Submit
• To recall SAS statements you have submitted
– Click Window/Program Editor
– Click Locals/Recall text

After Running A SAS Program
When you execute a SAS program, the output generated by SAS is divided into two major parts:
• SAS log: contains information about the processing of the SAS program, including warning and error messages.
• Output: contains reports generated by SAS procedures and DATA steps.
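As a small illustration of these two destinations, the sketch below writes one line to the SAS log with a PUT statement and sends one report to the Output window with PROC PRINT. The data set name logdemo and the variable x are made up for this example.

```sas
/* DATA step: the PUT statement writes to the SAS log */
data logdemo;
   x = 2 + 3;
   put 'Computed in the DATA step: ' x=;
run;

/* PROC step: the listing appears in the Output window */
proc print data = logdemo;
run;
```

After submitting, the log should also show the notes SAS writes for each step, such as how many observations the new data set contains.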

Saving a SAS program file containing data
1. Click File…Save As.
2. A dialog box will appear.
3. Verify that the desired folder and the extension (.sas) are chosen in the dialog box.
4. Type a file name in the File Name text box, for example 'myfile'.
5. Click the Save button to save the program.

SAS Naming Conventions
SAS file names can be up to eight characters long (Version 7 and later also accept longer data set and variable names, up to 32 characters). The first character must be a letter (A-Z) or an underscore. The following characters can be letters, numbers (0-9), or underscores. Blanks cannot appear in SAS file names, and special characters such as $, @, %, &, and # are not allowed.
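A minimal sketch of names that follow these rules is given below. The data set name survey1 and the variable names are invented for illustration; the example also uses an underscore inside a name, which SAS accepts.

```sas
/* Valid names: start with a letter, at most eight characters, */
/* and contain only letters, digits, or underscores.           */
data survey1;
   famsize = 4;
   inc_92  = 25000;
run;

/* Names like these would be rejected:          */
/*   1survey   (starts with a digit)            */
/*   fam size  (contains a blank)               */
/*   inc$92    (contains a special character)   */
```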

Clearing the Program Editor Window
• Click Edit.
• Click Clear All.
• The program editor window is now clear.

Reading Data from a Text File with Delimiters
data sample;
   infile 'a:\sample1.txt';
   input age educ $ race sex ptotinc famsize fincome region;
run;
proc print data = sample;
run;
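The program above relies on blanks as the delimiter, which is what list input expects by default. For a file that uses another delimiter, the DLM= option of the INFILE statement names it, and DSD treats two consecutive delimiters as a missing value. The sketch below assumes comma-separated values and reads them from in-stream data lines instead of an external file, so it can be run without a:\sample1.txt. The data set name sample_csv is illustrative.

```sas
data sample_csv;
   /* read the in-stream lines as comma-delimited records */
   infile datalines dlm = ',' dsd;
   input age educ $ race sex;
   datalines;
84,8,1,1
65,bd,1,1
82,hg,1,0
;
run;

proc print data = sample_csv;
run;
```

To read a real comma-delimited file, the word datalines in the INFILE statement would be replaced by the quoted file path and the in-stream lines dropped.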

Reading Data from a Text File without Delimiters
data sample2;
   infile 'a:\sample2.txt';
   input name $ 1-20 age 21-22 educ 23-24 race 25 sex 26
         ptoi92 27-31 famsize 32 fincome 33-37 region 48;
run;
proc print data = sample2;
run;
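Column input, as used above, tells SAS which columns hold each variable, so values need no delimiter and a character value may contain blanks. A minimal self-contained sketch follows; the data set name people, the column positions, and the two records are invented for illustration and do not describe the layout of sample2.txt.

```sas
data people;
   /* name occupies columns 1-10, age columns 12-13, sex column 15 */
   input name $ 1-10 age 12-13 sex 15;
   datalines;
Ann Lee    34 0
Bob Stone  58 1
;
run;

proc print data = people;
run;
```

With list input, the blank inside each name would be taken as a delimiter; column input keeps the full name in one variable.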

To Create a Text Data File with Fewer Variables
data _null_;
   set sample2;
   file 'a:\sub1.dat';
   put name $ age region;
run;

To Create a Text Data File with Fewer Observations
data _null_;
   set sample2;
   if age > 30;
   file 'a:\sub2.dat';
   put name $ age region;
run;

4. Basic Statistical Procedures and Graphs
• PROC MEANS
– for the whole sample
– with BY statement
– with CLASS statement
• PROC SORT
• PROC FREQ
• PROC REG
• PROC PLOT
• PROC UNIVARIATE

PROC MEANS
data sample;
   infile 'a:\sample1.txt';
   input age educ $ race sex ptotinc famsize fincome region;
run;
proc means data = sample;
   var age ptotinc famsize;
run;

PROC SORT
proc sort data = sample out = list;
   by sex region;
run;
proc print data = list;
run;

PROC MEANS with BY Statement
proc means data = list;
   var ptotinc famsize fincome;
   by sex;
run;

PROC MEANS with CLASS Statement
proc means data = sample;
   class sex;
   var ptotinc famsize fincome;
run;
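The practical difference between the two forms is that a BY statement expects the data to be sorted by the BY variable, while a CLASS statement does not. The sketch below shows both on a tiny made-up data set; the names demo and demo_sorted and the income values are assumptions for illustration only.

```sas
data demo;
   input sex ptotinc;
   datalines;
1 20000
0 15000
1 30000
0 25000
;
run;

/* CLASS: works on unsorted data, one table with a row per group */
proc means data = demo;
   class sex;
   var ptotinc;
run;

/* BY: sort first, then one separate table per group */
proc sort data = demo out = demo_sorted;
   by sex;
run;
proc means data = demo_sorted;
   by sex;
   var ptotinc;
run;
```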

PROC FREQ (one way)
proc freq data = sample;
   table region race;
run;

PROC FREQ (two ways)
proc freq data = sample;
   table region*race / nopercent chisq;
run;

An Example of Linear Regression
y = a + bx + e

where
x: is the regressor variable (famsize).
y: is the response variable (fincome).
a, b: are the unknown parameters.
e: is the unknown error.

PROC REG
proc reg data = sample;
   model fincome = famsize;
   output out = out1 p = pred1 r = resid1;
run;

The output of the regression model
• R-square = .2507
• F Value = 9.37
• P Value = .0048 < .05
• Estimated intercept = 11617
• Estimated slope = 8165: when the family size increases by one unit, the estimated family income increases by 8165.

A Fitted Equation
y = 11617 + 8165 x
fincome = 11617 + 8165 famsize
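To see what the fitted equation implies, it can be evaluated for a few family sizes. The sketch below does this in a DATA step; the data set name predict, the variable pred_inc, and the chosen family sizes are assumptions for illustration. By hand, family sizes of 1, 2, and 4 give predicted incomes of 19782, 27947, and 44277.

```sas
data predict;
   input famsize;
   /* fitted equation: fincome = 11617 + 8165 famsize */
   pred_inc = 11617 + 8165 * famsize;
   datalines;
1
2
4
;
run;

proc print data = predict;
run;
```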

Checking the Assumptions of normality and randomness
proc rank data = out1 normal = blom;
   var resid1;
   ranks nscore;
run;
proc plot;
   plot resid1*pred1;
   plot resid1*nscore;
run;

Q-Q Plot


PROC UNIVARIATE
proc univariate data = out1 normal plot;
   var resid1;
run;

The distribution of residuals
