Uploaded by sarath.annapareddy

Proc tabulate, Gplot, Glimmix, Proc Reg, Proc Anova, Proc Mixed, Proc catmod, Proc Genmod

Anil Kumar

PROC tabulate

• Summarizes data in the form of a well-organized table

• Syntax:

PROC TABULATE <options>;

CLASS class-variables;

VAR analysis-variables;

TABLE page-expression, row-expression, column-expression / options;

RUN;
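As a minimal sketch of how the three dimensions of the TABLE statement map onto the output, the following uses sashelp.class, the data set used elsewhere in this deck. The choice of variables and the f=6.1 format are illustrative assumptions. When three comma-separated expressions are given, the first is the page dimension, the second the row dimension, and the third the column dimension.

```sas
/* Sketch: page, row, and column dimensions in one TABLE statement. */
proc tabulate data=sashelp.class;
   class age sex;                 /* classification (grouping) variables */
   var height;                    /* analysis variable                   */
   table age,                     /* page dimension: one table per age   */
         sex all,                 /* row dimension: each sex plus a total */
         height*(n mean*f=6.1);   /* column dimension: N and mean        */
run;
```

Dropping the first expression gives a single two-dimensional table; dropping the first two leaves only a column dimension.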

PROC tabulate – example (1)

proc tabulate data=sashelp.Class;

class sex;

var height weight;

table sex, height weight;

run;

Result:

PROC tabulate – example (2)

proc tabulate data=sashelp.Class;

class sex;

var height weight age;

table sex all, (age height weight)*(std mean sum);

run;

Result:

Gplot – A simple example

• The SAS/GRAPH module features the flexible PROC GPLOT

• A simple example:

proc gplot data=sashelp.class;

symbol i=none v=star;

plot height*weight;

run;

quit;

Resulting graph

Gplot – further example

• The following example shows more of the procedure's flexibility

goptions reset=all;

proc gplot data=sashelp.Class;

symbol1 color=green i=join v=diamond line=1 w=2 h=2;

symbol2 color=red i=join v=star line=2 w=2 h=2;

legend1 down=1 across=1 position=(top center inside) cshadow=blue frame value=(f=duplex) label=(font=duplex h=1.5);

title f=zapf color=blue h=5pct 'Testing the graph';

plot Height*Weight=Sex / hminor=0 legend=legend1;

run;

quit;

General Outline of Model Choices

| Model | Output Variable | Types of Inputs | Assumptions |
|---|---|---|---|
| ANOVA | Interval | Categorical, Fixed Effects Only | Normality |
| REG | Interval | Interval, Fixed Effects Only | Normality |
| GLM | Interval | Categorical, Interval, Fixed Effects Only | Normality |
| GENMOD | Categorical, Interval | Categorical, Interval, Fixed Effects Only | Exponential Family |
| MIXED | Interval | Categorical, Interval, Random Effects | Normality |
| GLIMMIX | Categorical, Interval, Random Effects | Categorical, Interval, Random Effects | Exponential Family |

Source: Cerrito (2005)

PROC REG

• Inputs and output are interval

• Ordinal data may be included

• Assumptions on the error term ε

– Normally distributed

– Mean zero and constant variance

– Independent across observations

• Residual analysis should be a routine part of the analysis

Residuals

• The studentized residual, the RSTUDENT statistic, is similar to the standardized residual except that the mean square error is calculated omitting the observation.

• Observations whose studentized residuals exceed 2 in absolute value are potential outliers.
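The residual check described above can be sketched as follows. This is an illustrative example, not the regression shown in the original slides: the data set (sashelp.class), the model (weight on height), and the output data set name work.reg_diag are all assumptions.

```sas
/* Sketch: fit a simple regression and keep residual diagnostics. */
proc reg data=sashelp.class;
   model weight = height / r influence;   /* R and INFLUENCE print residual diagnostics */
   output out=work.reg_diag               /* hypothetical output data set name          */
          p=predicted
          r=residual
          rstudent=rstudent;              /* studentized residual, observation omitted  */
run;
quit;

/* List the observations flagged by the |RSTUDENT| > 2 rule of thumb. */
proc print data=work.reg_diag;
   where abs(rstudent) > 2;
   var name height weight predicted residual rstudent;
run;
```

Flagged observations are candidates for a closer look, not automatic deletions.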

Regression Example

Output

Scatterplot With Regression Line

Residuals

PROC ANOVA

• PROC ANOVA assumes balanced data: each treatment should have exactly the same number of observations, that is, every level of a categorical input has the same number of observations.

• Caution: If you use PROC ANOVA for analysis of unbalanced data, you must assume responsibility for the validity of the results.

• For unbalanced data, use PROC GLM instead.
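A minimal sketch of that workflow, assuming sashelp.class as the data and sex as the only classification effect (both choices are illustrative): first count the observations per group, then fit the one-way model with PROC GLM, which does not require equal group sizes.

```sas
/* Step 1: check whether the design is balanced. */
proc freq data=sashelp.class;
   tables sex;
run;

/* Step 2: fit the one-way model with PROC GLM, which handles unequal group sizes. */
proc glm data=sashelp.class;
   class sex;
   model height = sex;
   means sex;          /* group means for the classification effect */
run;
quit;
```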

Categorical Procedures

Model Output Variable Types of Inputs Assumptions

Effects Only

contingency table. Input can be raw data, cell counts, or direct

input of a covariance matrix

Effects Only Family

Random Effects Random Effects Family

PROC CATMOD

• PROC CATMOD provides a wide variety of categorical data analyses.

• Now that PROC LOGISTIC handles classification variables, there is less of a need to use PROC CATMOD for regression.

• PROC CATMOD should not be used when a continuous input variable has many distinct values.

Output

Logistic Regression

• Binary outcomes

• Allows for any combination of nominal, ordinal, or continuous explanatory variables

• Computes predicted values, the receiver operating characteristic (ROC) curve, an approximation to the area beneath the curve (c), and a number of regression diagnostics

• If the occurrence is rare, use the Poisson distribution in PROC GENMOD.
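As a hedged illustration of the PROC LOGISTIC features listed above, the sketch below models a binary outcome from two continuous inputs. Using sex in sashelp.class as the outcome is purely a convenience for demonstration, and the output data set names work.roc and work.pred are hypothetical.

```sas
/* Sketch: binary logistic regression with ROC data and predicted probabilities. */
proc logistic data=sashelp.class;
   model sex(event='F') = height weight
         / outroc=work.roc;          /* points of the ROC curve */
   output out=work.pred p=phat;      /* predicted probability of the event */
run;
```

The c statistic appears in the association table that PROC LOGISTIC prints for the fitted model.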

Generalized Linear Models

In generalized linear models the response is assumed to possess a probability distribution of exponential form. That is, the probability density of the response Y for continuous response variables, or the probability function for discrete responses, can be expressed as

$$f(y) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$$

for some functions a, b, and c that determine the specific distribution (omitting some requirements for these functions).

Expressions for the mean and variance are

$$E(Y) = b'(\theta), \qquad \mathrm{Var}(Y) = b''(\theta)\, a(\phi)$$

Exponential-family distributions constitute a broad class of probability density functions. Don’t confuse this broad family with the exponential pdf.

Distributions and Associated Default Link Functions Available in PROC GENMOD
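As a sketch of how a distribution and link are requested in PROC GENMOD, the example below fits a Poisson regression with a log link and an exposure offset. The data set work.insure and every number in it are hypothetical placeholders made up for illustration; they are not from the original presentation.

```sas
/* Hypothetical data: claim counts per policy group (placeholder numbers). */
data work.insure;
   input policies claims size $ agegrp;
   logpol = log(policies);          /* offset: log of the exposure */
   datalines;
500 40 small 1
1000 35 medium 1
150 3 large 1
400 95 small 2
600 80 medium 2
300 15 large 2
;
run;

/* Poisson response with log link; LINK=LOG is spelled out for clarity. */
proc genmod data=work.insure;
   class size agegrp;
   model claims = size agegrp / dist=poisson link=log offset=logpol;
run;
```

Changing DIST= (for example to BINOMIAL or GAMMA) switches the assumed member of the exponential family; if LINK= is omitted, the procedure applies that distribution's default link.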

Interval (Quantitative) Procedures

| Model | Output Variable | Types of Inputs | Assumptions |
|---|---|---|---|
| ANOVA | Interval | Categorical, Fixed Effects Only | Normality |
| REG | Interval | Interval, Fixed Effects Only | Normality |
| GENMOD | Categorical, Interval | Categorical, Interval, Fixed Effects Only | Exponential Family |
| MIXED | Interval | Categorical, Interval, Random Effects | Normality |
| GLIMMIX | Categorical, Interval, Random Effects | Categorical, Interval, Random Effects | Exponential Family |

Assessing Goodness of Fit - Akaike’s Information Criterion (AIC)

• Information criteria use the covariance matrix and the number of parameters in a model to calculate a statistic that summarizes the information represented by the model, balancing a lack-of-fit term against a penalty term.

• SAS calculates Akaike’s Information Criterion (AIC) for each of the 2^p possible models for p ≤ 10 independent variables.

• AIC estimates a measure of the difference between a given model and the “true” model. The model with the smallest AIC among all competing models is deemed the best model.

• Beal’s example provides SAS code that can be used to simultaneously evaluate up to 1024 (2^10) models to determine the subset of variables that minimizes the information criterion among all possible subsets.

Minimum AIC

• The AIC statistic is widely used to select the best model among alternative parametric models.

• AIC = -2 (maximum log-likelihood) + 2 (number of free parameters)

• The magnitude of AIC is not meaningful by itself; only comparisons between models are.

• The difference between two AIC values is considered insignificant if it is far less than 1.
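A small sketch of comparing candidate regression models by AIC, assuming sashelp.class and two candidate regressors (this is not Beal's code, which is not reproduced in this deck). With SELECTION=RSQUARE, PROC REG lists the candidate subsets, and the AIC and SBC options add those criteria to each listed model. For least-squares fits PROC REG reports AIC in the form n ln(SSE/n) + 2p, which for a fixed data set differs from the likelihood form above only by a constant, so model rankings are unaffected.

```sas
/* Sketch: list subset models with their AIC and SBC values. */
proc reg data=sashelp.class;
   model weight = height age
         / selection=rsquare aic sbc;   /* criteria reported per subset */
run;
quit;
```

The subset with the smallest AIC is the one this criterion prefers.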

Beal’s Simulation

• Implements five common statistical techniques to determine the best linear model

– minimizing the RMSE

– maximizing R²

– forward selection

– backward elimination

– stepwise regression

• The RMSE is a function of the sum of squared errors (SSE), the number of observations n, and the number of parameters p:

RMSE = sqrt(SSE/(n - p))
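The five techniques can be sketched with PROC REG as below. This is an illustrative sketch on sashelp.class, not Beal's simulation code; the statement labels (allsub, fwd, bwd, step) and the 0.15 significance levels are assumptions chosen for the example.

```sas
/* Sketch: subset listing by R-square/RMSE plus three stepwise-type searches. */
proc reg data=sashelp.class;
   allsub: model weight = height age / selection=rsquare rmse;   /* R-square and root MSE per subset */
   fwd:    model weight = height age / selection=forward  slentry=0.15;
   bwd:    model weight = height age / selection=backward slstay=0.15;
   step:   model weight = height age / selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;
```

The root MSE reported for each subset corresponds to sqrt(SSE/(n - p)), with p counting the intercept.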

Generate the Data

Partial Code for Regressions

Simulation Results: n=1000

Simulation Result: n=10000

AIC Selected Coefficients for Five Runs

Generalized Linear Mixed Models

PROC MIXED

• The mixed model generalizes the standard linear model:

$$y = X\beta + Z\gamma + \varepsilon$$

where γ is an unknown vector of random-effects parameters with known design matrix Z, and ε is an unknown random error vector whose elements are no longer required to be independent and homogeneous.

• PROC MIXED is a generalization of the GLM procedure in the sense that PROC GLM fits standard linear models, and PROC MIXED fits the wider class of mixed linear models.

• Both procedures have similar CLASS, MODEL, CONTRAST, ESTIMATE, and LSMEANS statements.

• But their RANDOM and REPEATED statements differ.

RANDOM and REPEATED Statements in PROC GLM and PROC MIXED

• The RANDOM statement in PROC MIXED incorporates random effects constituting the γ vector in the mixed model.

• However, in PROC GLM, effects specified in the RANDOM statement are still treated as fixed as far as the model fit is concerned, and they serve only to produce corresponding expected mean squares.

• The REPEATED statement in PROC MIXED is used to specify covariance structures for repeated measurements on subjects.

• The REPEATED statement in PROC GLM is used to specify various transformations with which to conduct the traditional univariate or multivariate tests.

• In repeated measures situations, the mixed model approach used in PROC MIXED is more flexible and more widely applicable than either the univariate or multivariate approaches.
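The two PROC MIXED statements can be contrasted with a short sketch. The data set work.growth, its variable names, and all of its values are hypothetical and exist only to make the example runnable; the AR(1) structure is likewise just one possible choice.

```sas
/* Hypothetical repeated-measures data: 6 subjects, 2 treatments, 3 visits. */
data work.growth;
   input subject $ trt $ visit y;
   datalines;
s1 A 1 10.2
s1 A 2 11.1
s1 A 3 12.4
s2 A 1 9.8
s2 A 2 10.9
s2 A 3 11.7
s3 A 1 10.9
s3 A 2 11.8
s3 A 3 13.0
s4 B 1 10.1
s4 B 2 12.0
s4 B 3 13.9
s5 B 1 9.5
s5 B 2 11.6
s5 B 3 13.2
s6 B 1 10.6
s6 B 2 12.4
s6 B 3 14.5
;
run;

/* RANDOM: a random intercept per subject, i.e. a random effect in the Z part of the model. */
proc mixed data=work.growth;
   class subject trt visit;
   model y = trt visit trt*visit / solution;
   random intercept / subject=subject;
run;

/* REPEATED: specify the within-subject covariance structure of the errors. */
proc mixed data=work.growth;
   class subject trt visit;
   model y = trt visit trt*visit / solution;
   repeated visit / subject=subject type=ar(1);
run;
```

Comparing the fit statistics of the two runs is one way to choose between covariance specifications.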

PROC GLIMMIX

• The GLIMMIX procedure fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed.

• These models are known as generalized linear mixed models (GLMM).

• November 2005: Production level version can now be downloaded from http://support.sas.com/rnd/app/da/glimmix.html

PROC GLIMMIX (continued)

• The GLMMs, like linear mixed models, assume normal (Gaussian) random effects.

• Conditional on these random effects, data can have any distribution in the exponential family.

• The binary, binomial, Poisson, and negative binomial distributions, for example, are discrete members of this family.

• The normal, beta, gamma, and chi-square distributions are representatives of the continuous distributions in this family.
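A minimal GLMM sketch along these lines: a binomial response with a logit link and a normally distributed random intercept for each clinic. The data set work.trials and its counts are hypothetical placeholders, and the variable names are assumptions made for the example.

```sas
/* Hypothetical multi-clinic data: events out of n patients per treatment arm. */
data work.trials;
   input clinic $ trt $ events n;
   datalines;
c1 A 12 40
c1 B 9 40
c2 A 15 30
c2 B 10 30
c3 A 8 25
c3 B 5 25
c4 A 20 50
c4 B 14 50
;
run;

/* Binomial GLMM: fixed treatment effect, Gaussian random clinic intercepts. */
proc glimmix data=work.trials;
   class clinic trt;
   model events/n = trt / dist=binomial link=logit solution;
   random intercept / subject=clinic;
run;
```

Replacing DIST=BINOMIAL with another exponential-family member such as POISSON changes the conditional distribution while keeping the same random-effects structure.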

Summary

• Know what your assumptions are and check them.

• Theory, methods and techniques evolve.

• Consider using

– PROC GLIMMIX

– Enterprise Guide

• Fit the model to the data!

References

• Akaike, H. (1973), “Information Theory and an Extension of the Maximum Likelihood Principle,” in Petrov and Csaki, eds., Proceedings of the Second International Symposium on Information Theory, 267-281.

• Beal, Dennis J. (2005), “SAS Code to Select the Best Multiple Linear Regression Model for Multivariate Data Using Information Criteria,” Proceedings, Southeast SAS Users Group Conference.

• Bickel, Peter J. and Doksum, Kjell A. (2001), Mathematical Statistics, Prentice-Hall, Inc., Upper Saddle River, NJ.

• Cerrito, Patricia B. (2005), “From GLM to GLIMMIX-Which Model to Choose?” Workshop, Southeast SAS Users Group Conference.

• Long, J. Scott (1997), Regression Models for Categorical and Limited Dependent Variables, Thousand Oaks, CA: Sage Publications, Inc.

• McCullagh, P. and Nelder, J. A. (1989), Generalized Linear Models, Second Edition, London: Chapman and Hall.

• Seber, G.A.F. (1984), Multivariate Observations, John Wiley & Sons, New York.

• Stokes, M.E., Davis, C.S., and Koch, G.G. (2000), Categorical Data Analysis Using the SAS System, Second Edition, Cary, NC: SAS Institute Inc.

• SAS Online Documentation, http://www.sas.com

• GLIMMIX Procedure Documentation, “The GLIMMIX Procedure, Nov. 2005”, SAS Institute.

Note: EG project files, programs and other SAS source used in the original presentation are available by request, but they are not contained in this online version - kmg
