
# Logistic


10/09/2012


# Introduction Probability, Odds, and Odds Ratios Logistic Regression ROC Curves

SAS Logistic Regression
Jason Brinkley - Department of Biostatistics

February 2, 2009


In traditional multiple regression, the intent is to study what effect different covariates have on a quantitative response. There are many scenarios where the main question of interest involves a dichotomous response: Yes/No, Success/Fail, Sick/Well, etc. Traditional methods fail to adequately model this kind of data. Let's look at an example.


Accident Data

(From Cody) Let's say we want to see if age, vision status, driver education, and gender can be used to predict whether a person had an accident in the past year. Consider a sample of such individuals (see the website for related files). We can import the data from Excel using Proc Import.


SAS> PROC IMPORT OUT= WORK.ACCIDENT
SAS>      DATAFILE= "U:\SAS Workshop\Lecture 3 - Linear and Logistic Regression\Cody..."
SAS>      DBMS=EXCEL REPLACE;
SAS>      RANGE="accident$";
SAS>      GETNAMES=YES;
SAS>      MIXED=NO;
SAS>      SCANTEXT=YES;
SAS>      USEDATE=YES;
SAS>      SCANTIME=YES;
SAS> RUN;
SAS>
SAS> PROC FORMAT;
SAS>      VALUE VISION
SAS>           0 = 'No Problem'
SAS>           1 = 'Some Problem';
SAS>      VALUE YES_NO
SAS>           0 = 'No'
SAS>           1 = 'Yes';
SAS> RUN;

SAS> DATA LOGISTIC;
SAS>      SET Accident;
SAS>      LABEL
SAS>           ACCIDENT = 'Accident in Last Year?'
SAS>           AGE = 'Age of Driver'
SAS>           VISION = 'Vision Problem?'
SAS>           DRIVER_ED = 'Driver Education?';
SAS>      FORMAT ACCIDENT DRIVER_ED YES_NO.
SAS>           VISION VISION.;
SAS> RUN;

SAS> PROC PRINT DATA=LOGISTIC(OBS=5);
SAS> RUN;

| Obs | Accident | Age | Vision | Driver_Ed | Gender |
|---|---|---|---|---|---|
| 1 | Yes | 16 | Some Problem | No | M |
| 2 | Yes | 17 | Some Problem | Yes | M |
| 3 | Yes | 17 | No Problem | No | M |
| 4 | No | 17 | No Problem | No | M |
| 5 | Yes | 18 | Some Problem | No | M |

Graph Age Versus Accident

SAS> PROC GPLOT;
SAS>      PLOT ACCIDENT*AGE;
SAS> RUN;


Just by examining the graph we can see why traditional models will fail here. What we are really interested in is whether age, vision status, driver education, and gender have an impact on the PROBABILITY of an accident. Since what we are really interested in is modeling probabilities, different techniques need to be used.

Probability and Odds

Logistic regression is a good way to model this type of data. In order to understand this type of regression we should first talk about odds and odds ratios. Let's say P is the probability of an accident; then the odds of having an accident are given by

$$\text{Odds} = \frac{P}{1 - P}$$

Sometimes we will find it important to go from odds back to probabilities, so also note

$$P = \frac{\text{Odds}}{1 + \text{Odds}}$$
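The formula for going from odds back to a probability follows from the definition of odds by solving for P. A short derivation, followed by a round trip with an illustrative value (P = 0.25 is an assumed number, not taken from the accident data):

$$\text{Odds} = \frac{P}{1 - P} \;\Longrightarrow\; \text{Odds}\,(1 - P) = P \;\Longrightarrow\; \text{Odds} = P\,(1 + \text{Odds}) \;\Longrightarrow\; P = \frac{\text{Odds}}{1 + \text{Odds}}$$

$$P = 0.25: \qquad \text{Odds} = \frac{0.25}{0.75} = \frac{1}{3}, \qquad \frac{1/3}{1 + 1/3} = \frac{1/3}{4/3} = 0.25$$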

Odds Ratios

To compare the odds of having an accident between different groups (e.g. different genders or different vision problems) we often use odds ratios. Odds and odds ratios are easier to work with mathematically than pure probabilities, especially in scenarios where you have a rare event. Say the chance of having a wreck among people with poor vision is 30% and among people with good vision it's 10%. The corresponding odds are 0.43 and 0.11, and the ratio of the odds is about 3.86 (3.91 if computed from the rounded odds). So the odds of an accident have almost quadrupled for people with poor vision.
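The arithmetic behind the 30% versus 10% example can be written out, alongside a rare-event variant (3% versus 1%, numbers assumed here purely for illustration). The variant shows one reason odds ratios are convenient for rare events: the odds ratio then nearly equals the ratio of the probabilities themselves.

$$\frac{0.30/0.70}{0.10/0.90} = \frac{0.4286}{0.1111} \approx 3.86, \qquad \frac{0.30}{0.10} = 3$$

$$\frac{0.03/0.97}{0.01/0.99} = \frac{0.0309}{0.0101} \approx 3.06, \qquad \frac{0.03}{0.01} = 3$$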

Logistic Regression

Logistic regression fits a model like

$$\log(\text{Odds}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots$$

where $X_1, X_2, \ldots$ are the covariates of interest. We do logistic regression modeling in SAS with Proc Logistic, and it has many parallels with both Proc Reg and Proc GLM in terms of code format. Since we have 1 response and 4 potential predictors, let's do a logistic regression with all of our data.
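Solving the model equation for the probability ties it back to the odds formulas. In the sketch below, $\eta$ is a symbol introduced here for the linear predictor; it does not appear in the original slides.

$$\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots, \qquad \text{Odds} = e^{\eta}, \qquad P = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}}$$

Raising one covariate $X_j$ by one unit while holding the others fixed adds $\beta_j$ to $\eta$, so it multiplies the odds by $e^{\beta_j}$:

$$\frac{\text{Odds}(X_j + 1)}{\text{Odds}(X_j)} = \frac{e^{\eta + \beta_j}}{e^{\eta}} = e^{\beta_j}$$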

Example

SAS> PROC LOGISTIC DATA=LOGISTIC DESCENDING;
SAS>      *Always use the DESCENDING option;
SAS>      TITLE "Predicting Accidents Using Logistic Regression";
SAS>      CLASS GENDER;
SAS>      MODEL ACCIDENT = AGE VISION DRIVER_ED GENDER;
SAS> RUN;
SAS> QUIT;

With DESCENDING, SAS models the probability of the higher-ordered response level, here Accident='Yes'. Without it, the model would be for Accident='No'.

Predicting Accidents Using Logistic Regression

The LOGISTIC Procedure

| Model Information | |
|---|---|
| Data Set | WORK.LOGISTIC |
| Response Variable | Accident (Accident in Last Year?) |
| Number of Response Levels | 2 |
| Model | binary logit |
| Optimization Technique | Fisher's scoring |
| Number of Observations Read | 45 |
| Number of Observations Used | 45 |

Response Profile

| Ordered Value | Accident | Total Frequency |
|---|---|---|
| 1 | Yes | 25 |
| 2 | No | 20 |

Probability modeled is Accident='Yes'.

Class Level Information

| Class | Value | Design Variables |
|---|---|---|
| Gender | F | 1 |
| | M | -1 |

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 63.827 | 59.094 |
| SC | 65.633 | 68.128 |
| -2 Log L | 61.827 | 49.094 |

Testing Global Null Hypothesis: BETA=0

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 12.7321 | 4 | 0.0127 |
| Score | 11.4432 | 4 | 0.0220 |
| Wald | 9.0132 | 4 | 0.0608 |

Type 3 Analysis of Effects

| Effect | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Age | 1 | 0.0170 | 0.8962 |
| Vision | 1 | 5.4891 | 0.0191 |
| Driver_Ed | 1 | 5.0776 | 0.0242 |
| Gender | 1 | 1.0270 | 0.3109 |

Analysis of Maximum Likelihood Estimates

| Parameter | | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | | 1 | 0.1373 | 1.0548 | 0.0169 | 0.8965 |
| Age | | 1 | 0.00247 | 0.0190 | 0.0170 | 0.8962 |
| Vision | | 1 | 1.6758 | 0.7153 | 5.4891 | 0.0191 |
| Driver_Ed | | 1 | -1.7160 | 0.7615 | 5.0776 | 0.0242 |
| Gender | F | 1 | 0.3847 | 0.3796 | 1.0270 | 0.3109 |

Odds Ratio Estimates

| Effect | Point Estimate | 95% Wald Lower Limit | 95% Wald Upper Limit |
|---|---|---|---|
| Age | 1.002 | 0.966 | 1.040 |
| Vision | 5.343 | 1.315 | 21.709 |
| Driver_Ed | 0.180 | 0.040 | 0.800 |
| Gender F vs M | 2.158 | 0.487 | 9.559 |

Association of Predicted Probabilities and Observed Responses

| | | | |
|---|---|---|---|
| Percent Concordant | 77.6 | Somers' D | 0.566 |
| Percent Discordant | 21.0 | Gamma | 0.574 |
| Percent Tied | 1.4 | Tau-a | 0.286 |
| Pairs | 500 | c | 0.783 |
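The odds ratio estimates in this output are the exponentiated parameter estimates, which can be checked by hand. Gender is the exception: with the CLASS coding shown in the Class Level Information (design variable 1 for F and -1 for M), F and M differ by two units of the design variable, so the estimate is doubled before exponentiating. The confidence limits come from exponentiating the estimate plus or minus about 1.96 standard errors; the hand-computed values below are rounded and so only approximately match the printed output.

$$\begin{aligned}
\text{Age:} \quad & e^{0.00247} \approx 1.002 \\
\text{Vision:} \quad & e^{1.6758} \approx 5.343 \\
\text{Driver\_Ed:} \quad & e^{-1.7160} \approx 0.180 \\
\text{Gender F vs M:} \quad & e^{2 \times 0.3847} = e^{0.7694} \approx 2.158 \\
\text{Vision, 95\% limits:} \quad & e^{1.6758 \pm 1.96 \times 0.7153} \approx (1.32,\ 21.7)
\end{aligned}$$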

Model Selection

SAS> PROC LOGISTIC DATA=LOGISTIC DESCENDING;
SAS>      TITLE "Predicting Accidents Using Logistic Regression";
SAS>      CLASS GENDER;
SAS>      MODEL ACCIDENT = AGE VISION DRIVER_ED GENDER/
SAS>           SELECTION = BACKWARD;
SAS> RUN;
SAS> QUIT;

Predicting Accidents Using Logistic Regression

The LOGISTIC Procedure

Summary of Backward Elimination

| Step | Effect Removed | DF | Number In | Wald Chi-Square | Pr > ChiSq | Variable Label |
|---|---|---|---|---|---|---|
| 1 | Age | 1 | 3 | 0.0170 | 0.8962 | Age of Driver |
| 2 | Gender | 1 | 2 | 1.1313 | 0.2875 | Gender |

Type 3 Analysis of Effects

| Effect | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Vision | 1 | 5.9113 | 0.0150 |
| Driver_Ed | 1 | 4.5440 | 0.0330 |

Analysis of Maximum Likelihood Estimates

| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | 0.1110 | 0.5457 | 0.0414 | 0.8389 |
| Vision | 1 | 1.7137 | 0.7049 | 5.9113 | 0.0150 |
| Driver_Ed | 1 | -1.5000 | 0.7037 | 4.5440 | 0.0330 |

Odds Ratio Estimates

| Effect | Point Estimate | 95% Wald Lower Limit | 95% Wald Upper Limit |
|---|---|---|---|
| Vision | 5.550 | 1.394 | 22.093 |
| Driver_Ed | 0.223 | 0.056 | 0.886 |

Association of Predicted Probabilities and Observed Responses

| | | | |
|---|---|---|---|
| Percent Concordant | 67.2 | Somers' D | 0.532 |
| Percent Discordant | 14.0 | Gamma | 0.655 |
| Percent Tied | 18.8 | Tau-a | 0.269 |
| Pairs | 500 | c | 0.766 |

Age is not significant?

Our regression models seem to indicate that age is not a significant covariate. This seems counterintuitive. Let's explore the data.

SAS> OPTIONS PS=24;
SAS> PATTERN COLOR=BLACK VALUE=EMPTY;
SAS> PROC GCHART DATA=LOGISTIC;
SAS>      TITLE "Distribution of Ages by Accident Status";
SAS>      VBAR AGE / MIDPOINTS=10 TO 90 BY 10
SAS>           GROUP=ACCIDENT;
SAS> RUN;

[Figure: bar chart of FREQUENCY (0 to 8) against Age of Driver (midpoints 10 to 90), in two panels grouped by Accident in Last Year? (No, Yes).]

Spike in Young/Old People

There seems to be a spike in accidents in the young and old groups. Let's focus on those people by making a new age group variable that indicates whether a person is between 20 and 65 or not.

SAS> DATA LOGISTIC;
SAS>      SET LOGISTIC;
SAS>      *CREATE AGE GROUPS;
SAS>      IF AGE GE 20 AND AGE LE 65 THEN AGEGROUP = 0;
SAS>      ELSE AGEGROUP = 1;
SAS> RUN;
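Before refitting the model it can help to confirm that the new indicator splits the sample as intended. A minimal sketch, assuming the DATA step above has been run so that WORK.LOGISTIC contains AGEGROUP; this check and its title text are illustrative additions, not part of the original slides:

```sas
/* Illustrative check (not from the original slides):            */
/* cross-tabulate the new age-group indicator against ACCIDENT.  */
PROC FREQ DATA=LOGISTIC;
     TABLES AGEGROUP*ACCIDENT / NOCOL NOPERCENT;
     TITLE "Accidents by Age Group";
RUN;
```

The row percentages give the share of drivers with an accident inside (AGEGROUP = 0) and outside (AGEGROUP = 1) the 20 to 65 range.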

Model Selection

SAS> PROC LOGISTIC DATA=LOGISTIC DESCENDING;
SAS>      TITLE "Predicting Accidents Using Logistic Regression";
SAS>      *THIS NEXT LINE CHANGES THE REFERENCE GROUP;
SAS>      CLASS GENDER (PARAM=REF REF='F');
SAS>      MODEL ACCIDENT = AGEGROUP VISION DRIVER_ED GENDER /
SAS>           SELECTION=BACKWARD;
SAS> RUN;
SAS> QUIT;

Predicting Accidents Using Logistic Regression

The LOGISTIC Procedure

NOTE: No (additional) effects met the 0.05 significance level for removal from the model.

Summary of Backward Elimination

| Step | Effect Removed | DF | Number In | Wald Chi-Square | Pr > ChiSq | Variable Label |
|---|---|---|---|---|---|---|
| 1 | Gender | 1 | 3 | 0.8548 | 0.3552 | Gender |
| 2 | Driver_Ed | 1 | 2 | 2.0307 | 0.1541 | Driver Education? |

Type 3 Analysis of Effects

| Effect | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| AGEGROUP | 1 | 7.2711 | 0.0070 |
| Vision | 1 | 4.9265 | 0.0264 |

Analysis of Maximum Likelihood Estimates

| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | -1.3334 | 0.5854 | 5.1886 | 0.0227 |
| AGEGROUP | 1 | 2.1611 | 0.8014 | 7.2711 | 0.0070 |
| Vision | 1 | 1.6258 | 0.7325 | 4.9265 | 0.0264 |

Odds Ratio Estimates

| Effect | Point Estimate | 95% Wald Lower Limit | 95% Wald Upper Limit |
|---|---|---|---|
| AGEGROUP | 8.680 | 1.805 | 41.756 |
| Vision | 5.083 | 1.209 | 21.359 |

Final Model

SAS> ods graphics on;
SAS> PROC LOGISTIC DATA=LOGISTIC DESCENDING;
SAS>      TITLE "Predicting Accidents Using Logistic Regression";
SAS>      MODEL ACCIDENT =VISION AGEGROUP/
SAS>           CTABLE PPROB =(0 to 1 by .10)
SAS>           OUTROC=ROC;
SAS>      OUTPUT OUT=PREDICTED P=PHAT LOWER=LCL UPPER=UCL;
SAS> RUN;
SAS> QUIT;
SAS> ods graphics off;

Predicted Probabilities

From our final model we can output a new dataset that will have our original data and our predicted probabilities.

SAS> PROC PRINT DATA=PREDICTED(OBS=5);
SAS>      TITLE 'Predicted Probabilities and 95% Confidence Limits';
SAS> RUN;

| Obs | Accident | Age | Vision | Driver_Ed | Gender | AGEGROUP | _LEVEL_ | PHAT | LCL | UCL |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Yes | 16 | Some Problem | No | M | 1 | Yes | 0.92082 | 0.70197 | 0.98288 |
| 2 | Yes | 17 | Some Problem | Yes | M | 1 | Yes | 0.92082 | 0.70197 | 0.98288 |
| 3 | Yes | 17 | No Problem | No | M | 1 | Yes | 0.69586 | 0.36264 | 0.90197 |
| 4 | No | 17 | No Problem | No | M | 1 | Yes | 0.69586 | 0.36264 | 0.90197 |
| 5 | Yes | 18 | Some Problem | No | M | 1 | Yes | 0.92082 | 0.70197 | 0.98288 |
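The PHAT values can be reproduced by hand from the final-model estimates in the backward-selection output (intercept -1.3334, AGEGROUP 2.1611, Vision 1.6258). The check below is a sketch using those rounded estimates, so the last digit may differ from the printed values. All five drivers shown are under 20, so AGEGROUP = 1.

$$\text{Vision} = 1: \quad \log(\text{Odds}) = -1.3334 + 2.1611 + 1.6258 = 2.4535, \qquad P = \frac{1}{1 + e^{-2.4535}} \approx 0.921$$

$$\text{Vision} = 0: \quad \log(\text{Odds}) = -1.3334 + 2.1611 = 0.8277, \qquad P = \frac{1}{1 + e^{-0.8277}} \approx 0.696$$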

Assessing your model's fit and prediction ability

While there does exist a generalized R-square measure for these types of models, it is not used in general practice. One alternative for assessing model fit is to look at the percent concordant and percent discordant between predicted probabilities and observed responses. Another, more popular option is the receiver operating characteristic curve (ROC curve).

Classification Table

Note that in our last block of code we requested a classification table. This table shows us how well our model performs under various predicted probability cutoffs. You might assume from our model that anyone with a predicted probability of 0.50 or greater is likely to be in an accident. Choosing 0.50 is somewhat arbitrary, and perhaps we want to look at cases where the cutoff is larger or smaller than 0.50. By lowering that cutoff we should be able to predict more accidents, but we will increase our false positives.

Predicting Accidents Using Logistic Regression

The LOGISTIC Procedure

Classification Table

| Prob Level | Correct Event | Correct Non-Event | Incorrect Event | Incorrect Non-Event | Correct (%) | Sensitivity (%) | Specificity (%) | False POS (%) | False NEG (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0.000 | 25 | 0 | 20 | 0 | 55.6 | 100.0 | 0.0 | 44.4 | . |
| 0.100 | 25 | 0 | 20 | 0 | 55.6 | 100.0 | 0.0 | 44.4 | . |
| 0.200 | 21 | 0 | 20 | 4 | 46.7 | 84.0 | 0.0 | 48.8 | 100.0 |
| 0.300 | 21 | 11 | 9 | 4 | 71.1 | 84.0 | 55.0 | 30.0 | 26.7 |
| 0.400 | 21 | 11 | 9 | 4 | 71.1 | 84.0 | 55.0 | 30.0 | 26.7 |
| 0.500 | 21 | 11 | 9 | 4 | 71.1 | 84.0 | 55.0 | 30.0 | 26.7 |
| 0.600 | 15 | 11 | 9 | 10 | 57.8 | 60.0 | 55.0 | 37.5 | 47.6 |
| 0.700 | 11 | 17 | 3 | 14 | 62.2 | 44.0 | 85.0 | 21.4 | 45.2 |
| 0.800 | 11 | 20 | 0 | 14 | 68.9 | 44.0 | 100.0 | 0.0 | 41.2 |
| 0.900 | 11 | 20 | 0 | 14 | 68.9 | 44.0 | 100.0 | 0.0 | 41.2 |
| 1.000 | 0 | 20 | 0 | 25 | 44.4 | 0.0 | 100.0 | . | 55.6 |

Sensitivity and Specificity

In this table, correct measures the total percentage classified correctly, sensitivity is the percentage of events (accidents) that were successfully predicted, and specificity is the percentage of non-events (non-accidents) that were successfully predicted. An ROC curve plots Sensitivity against 1-Specificity (the false positive rate) across different cutoffs.
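As a worked check, the percentages in the 0.500 row of the classification table follow from its four counts: 21 correct events, 11 correct non-events, 9 incorrect events, and 4 incorrect non-events. Note that SAS's False POS and False NEG columns are shares of the predicted events and predicted non-events that are wrong, so False POS is not the same quantity as 1-Specificity.

$$\begin{aligned}
\text{Correct} &= \frac{21 + 11}{45} \approx 71.1\% \\
\text{Sensitivity} &= \frac{21}{21 + 4} = 84.0\% \\
\text{Specificity} &= \frac{11}{11 + 9} = 55.0\% \\
\text{False POS} &= \frac{9}{21 + 9} = 30.0\% \\
\text{False NEG} &= \frac{4}{11 + 4} \approx 26.7\%
\end{aligned}$$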


SAS does the ROC curve for you easily with ODS graphics and the "OUTROC=ROC" option on the MODEL statement. Note that in all ROC curve outputs of this type, SAS will tell you the area under the ROC curve. We use this measurement to determine how good a fit this model is to the data, by determining how well the fitted model makes predictions. An area of 0.50 corresponds to guessing at random, so we want an area well above 0.50; the closer to 1, the better. ROC curves are not limited to logistic regression and aren't always used in the analysis, but they are an easy diagnostic to run in SAS.
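The data set written by OUTROC=ROC can also be plotted directly, which is one way to get the curve when ODS Graphics is not in use. A minimal sketch, assuming the final-model code has been run so that WORK.ROC exists; _SENSIT_ and _1MSPEC_ are the sensitivity and 1-specificity variables that PROC LOGISTIC writes to an OUTROC= data set, and the title text is made up for illustration:

```sas
/* Sketch only: plot sensitivity against 1-specificity from the  */
/* OUTROC= data set, joining the points into a curve.            */
SYMBOL1 INTERPOL=JOIN VALUE=NONE;
PROC GPLOT DATA=ROC;
     PLOT _SENSIT_*_1MSPEC_;
     TITLE "ROC Curve from the OUTROC= Data Set";
RUN;
QUIT;
```

Each row of the ROC data set corresponds to one probability cutoff, so the plotted curve traces the same trade-off that the classification table lists at a coarser grid of cutoffs.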
