
Lecture Notes on Logistic Regression

Outline:
1. Review and some uses & examples.
2. Interpreting logistic regression models.

3. Inference for logistic regression.
4. Model checking.
5. Logit models for qualitative explanatory variables.
6. Multiple logistic regression.
7. Sample size & power.
(Logit models for ordinal (polytomous) responses covered later.)

Additional references:
Collett, D. (1991). Modelling Binary Data.
Hosmer, D.W., & Lemeshow, S. (1989). Applied Logistic Regression.
McCullagh, P., & Nelder, J.A. (1989). Generalized Linear Models, 2nd Edition.
SAS Institute (1995). Logistic Regression Examples Using the SAS System, Version 6.

Example data sets from the SAS book are available via anonymous ftp to ftp.sas.com, or on the world wide web at http://www.sas.com

Review of Logistic Regression

The logistic regression model is a generalized linear model with:

Random component: The response variable is binary, Yi = 1 or 0 (an event occurs or it doesn't). We are interested in the probability that Yi = 1, π(xi). The distribution of Yi is Binomial.

Systematic component: A linear predictor, such as
    α + β1 x1i + . . . + βj xji
The explanatory or predictor variables may be quantitative (continuous), qualitative (discrete), or both (mixed).

Link function: The log of the odds that an event occurs, otherwise known as the logit:
    logit(π) = log( π / (1 − π) )

Putting this all together, the logistic regression model is
    logit(π(xi)) = log( π(xi) / (1 − π(xi)) ) = α + β1 x1i + . . . + βj xji

Some Uses of Logistic Regression and Some Example Data Sets

To get an idea of the range of problems for which we can use logistic regression. . .

1. To model the probabilities of certain conditions or states as a function of some explanatory variables; that is, to identify risk factors for certain conditions (e.g., divorce, being well adjusted, disease, etc.).

Diabetes example (these data come from SAS Logistic Regression Examples, which got them from Friendly (1991), who got them from Reaven & Miller, 1979): In a study of the relationship between various blood chemistry measures and diabetic status, data were collected from 145 nonobese adults who were diagnosed as subclinical diabetics, overt diabetics, or normal.

The possible explanatory variables:
- Relative weight (person's weight / expected weight given height).
- Fasting plasma glucose.
- Test plasma glucose intolerance.
- Plasma insulin during test (a measure of insulin response to oral glucose).
- Steady state glucose (a measure of insulin resistance).

2. To describe differences between individuals from separate groups as a function of some explanatory variables — a descriptive discriminant analysis.

High School and Beyond data: The response variable is whether a student attended an academic program or a non-academic program (i.e., general or vocational/technical). Possible explanatory variables include:
- Achievement test scores (continuous): reading, writing, math, science, and/or civics.
- Desired occupation (discrete, nominal): 17 of them.
- Socio-economic status (discrete, ordinal): low, middle, high.
Goal/purpose: Describe differences between those who attended academic versus non-academic programs.

3. To adjust for bias in comparing 2 groups in observational studies (Rosenbaum & Rubin, 1983):
    Propensity = Prob(one group given explanatory variables)

4. To predict probabilities that individuals fall into one of 2 categories on a dichotomous response variable as a function of some set of explanatory variables. This covers lots of studies (from epidemiological to educational measurement).

Example: ESR data from Collett (1991). A healthy individual should have an erythrocyte sedimentation rate (ESR) less than 20 mm/hour. The exact value of ESR isn't that important, so the response variable is just
    Yi = 1 if ESR < 20 ("healthy")
       = 0 if ESR ≥ 20 ("unhealthy")

The possible explanatory variables were:
- Level of plasma fibrinogen (gm/liter).
- Level of gamma-globulin (gm/liter).

5. To classify individuals into one of 2 categories on the basis of the explanatory variables. Efron (1975), Press & Wilson (1978), and Amemiya & Powell (1980) compared logistic regression to discriminant analysis (which assumes the explanatory variables are multivariate normal at each level of the response variable).

6. To analyze responses from discrete choice studies (estimate choice probabilities). From SAS Logistic Regression Examples (hypothetical).

Chocolate candy: 10 subjects were each presented 8 different chocolates and chose which one of the 8 they liked best. The 8 chocolates consisted of the 2³ combinations of:
- type of chocolate (milk or dark),
- center (hard or soft),
- whether it had nuts or not.
The response is which chocolate was most preferred.

The different names for this particular logit model are:
- the multinomial logit model,
- McFadden's model,
- the conditional logit model.
This model is related to the Bradley-Terry-Luce choice model. This model is used:
- to analyze choice data, using characteristics of the objects or attributes of the subjects as predictors of choice behavior;
- in marketing research to predict consumer behavior;
- as an alternative to conjoint analysis.

7. In social network analysis, data often consist of individuals (people, organizations, countries, etc.) within a group or network upon which relations are recorded (e.g., is friends with, talks to, does business with, trades, etc.). The relations can be:
- undirected (e.g., is biologically related to), or
- directed (e.g., asks advice from, gives money to).

Example: Data from Parker & Asher (1993). Children in 35 different classes were asked who they were friends with (in the class). Other measures were also taken, including gender, race, a loneliness or connectedness measure, and others.

This sort of data is often organized in a "sociomatrix", which consists of a matrix of binary random variables:
    Xij = 1 if i chooses j
        = 0 otherwise

A family of models used to analyze such data is loglinear (Wasserman & Faust, 1995); however, these models make the assumption that the pairs of individuals (dyads) are independent, which has been the major criticism of these models. Solution: logit/logistic regression, where you model the odds of the existence of a relation between two actors conditional on the presence/absence of ties between actors in the rest of the network (see Wasserman & Pattison, 1996; Anderson, Wasserman & Crouch, 1999).

8. and lots of other possibilities and examples . . . .

Interpreting the Logistic Regression Model

The model
    logit(π(x)) = log( π(x) / (1 − π(x)) ) = α + βx
or alternatively, in terms of π(x),
    π(x) = exp{α + βx} / (1 + exp{α + βx})

In considering the various interpretations of logistic regression, we'll use the High School and Beyond data (for now). The variables:

Response: Yi = 1 if the student attended an academic program, and 0 if the student attended a non-academic program.

Explanatory: xi = student's mean of five achievement test scores (reading, writing, math, science, civics), where each test is on a T score scale (i.e., mean = 50 and standard deviation = 10).

HSB example: The simplest model is the linear probability model (i.e., link = identity, and the distribution is Binomial):
    π̂(xi) = −.9386 + .0281xi
Problems:
- We get some fitted values that are negative and some that are greater than 1.
- The rate of change in the probability of attending an academic program is unlikely to be constant across all possible values of the mean achievement T score, as this model assumes.

Logistic regression (i.e., logit link and Binomial distribution). The estimated model is
    logit(π̂i) = log( π̂i / (1 − π̂i) ) = α̂ + β̂xi = −7.0548 + .1369xi
Note: the ASE for α̂ equals .6948 and the ASE for β̂ equals .0133.
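As a quick check of what the fitted equation implies, here is a minimal SAS sketch (the data set and variable names are made up; only the two coefficient values come from the fit above) that evaluates π̂(x) over a range of scores:

data fitted_probs;
  alpha = -7.0548;                      /* estimated intercept          */
  beta  =  0.1369;                      /* estimated slope (ASE=.0133)  */
  do x = 40 to 70 by 10;
    logit = alpha + beta*x;             /* linear predictor             */
    pi = exp(logit)/(1 + exp(logit));   /* inverse logit                */
    output;
  end;
run;
proc print data=fitted_probs; run;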


To help see how well this model does, the mean scores were grouped into 11 categories (far fewer than the 531 unique mean achievement scores).

              # attend            Observed     Predicted prob.   Predicted prob.   Predicted
Group         academic   # cases  proportion   (Sum)             (Equation)        # academic
x < 40            8        46       .17          .12               .13                5.64
40 ≤ x < 45      18        87       .20          .19               .23               16.75
45 ≤ x < 47      14        44       .31          .27               .32               12.82
47 ≤ x < 49      17        43       .39          .36               .38               15.46
49 ≤ x < 51      18        50       .36          .40               .45               20.00
51 ≤ x < 53      22        50       .44          .49               .52               24.98
53 ≤ x < 55      34        58       .58          .49               .58               28.52
55 ≤ x < 57      23        44       .52          .56               .65               24.70
57 ≤ x < 60      35        68       .51          .58               .72               39.52
60 ≤ x < 65      56        78       .71          .75               .82               58.62
65 ≤ x           26        32       .81          .80               .89               25.68

where
    Predicted # academic = Σi π̂(xi), where the sum is over those observations i with xi within the score category;
    Predicted probability (Sum) = Σi π̂(xi) / (# cases);
    Predicted probability (Equation) = exp(−7.0548 + .1369 x̄) / (1 + exp(−7.0548 + .1369 x̄)), evaluated at the category's mean achievement score x̄.

How I did this in SAS. . .

data group1; set preds;
  if achieve<40 then grp=1;
  else if achieve>=40 and achieve<45 then grp=2;
  else if achieve>=45 and achieve<47 then grp=3;
  else if achieve>=47 and achieve<49 then grp=4;
  else if achieve>=49 and achieve<51 then grp=5;
  else if achieve>=51 and achieve<53 then grp=6;
  else if achieve>=53 and achieve<55 then grp=7;
  else if achieve>=55 and achieve<57 then grp=8;
  else if achieve>=57 and achieve<60 then grp=9;
  else if achieve>=60 and achieve<65 then grp=10;
  else if achieve>=65 then grp=11;

proc sort data=group1; by grp;

proc means data=group1 sum; by grp;
  var academic count fitted;
  output out=grpfit sum=num_aca num_cases fit2;

data done; set grpfit;
  p  = num_aca/num_cases;   /* observed proportion            */
  pp = fit2/num_cases;      /* predicted probability (sum)    */
run;

Recall that β determines the rate of change of the curve of π(x) (plotted with values of x along the horizontal axis):
- If β > 0, then the curve increases with x.
- If β < 0, then the curve decreases with x.
- If β = 0, then the curve is flat (horizontal).

To see how the curve changes as β changes:
    Curve on the left:  logit(π(x)) = −7.0548 + .2000x
    Curve on the right: logit(π(x)) = −7.0548 + .1369x


To see how curves change with different α's:
    Curve on the left:  logit(π(x)) = −4.0000 + .1369x
    Curve on the right: logit(π(x)) = −7.0548 + .1369x


Comparing three curves:
    Curve 1: logit(π(x)) = −7.0548 + .1369x
    Curve 2: logit(π(x)) = −4.0000 + .1369x
    Curve 3: logit(π(x)) = −7.0548 + .2000x


Linear Approximation Interpretation

To illustrate this, we'll use the model estimated for the High School and Beyond data:
    π̂(xi) = exp{−7.0548 + .1369xi} / (1 + exp{−7.0548 + .1369xi})

Since the function is curved, the change in π(x) for a unit change in x is not constant but varies with x.


At any given value of x, the rate of change corresponds to the slope of the curve: draw a line tangent to the curve at some value of x, and the slope (rate of change) equals
    β π(x)(1 − π(x))

For example, when the mean achievement score x equals 70,
    π̂(70) = exp{−7.0548 + .1369(70)} / (1 + exp{−7.0548 + .1369(70)}) = .9261
    1 − π̂(70) = 1 − .9261 = .0739
and the slope equals (.1369)(.9261)(.0739) = .009.

The slope is greatest when π(x) = 1 − π(x) = .5; that is, when
    x = −α/β = 7.0548/.1369 = 51.53
where π̂(51.53) = 1 − π̂(51.53) = .5, and the slope at x = 51.53 is (.1369)(.5)(.5) = .034.

The value x = −α/β is called the median effective level, or EL50 for short, because it is the point at which each event is equally likely.
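A small sketch that reproduces these tangent-slope calculations (only the coefficient values come from the fit; everything else is illustrative):

data slopes;
  alpha = -7.0548; beta = 0.1369;
  el50 = -alpha/beta;                    /* median effective level: 51.53   */
  do x = 70, 60, 52, 51.5325, 43.065;    /* the x values in the table below */
    pi = exp(alpha + beta*x)/(1 + exp(alpha + beta*x));
    slope = beta*pi*(1 - pi);            /* slope of the tangent at x       */
    output;
  end;
run;
proc print data=slopes; run;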


Some other values:

    xi        π̂i      1 − π̂i    Slope at xi
    70       .9261    .0739     .009
    60       .7612    .2388     .025
    52       .5160    .4840     .03419
    51.5325  .5000    .5000     .03423
    43.065   .2388    .7612     .025


Odds Ratio Interpretation

As we'll see, this is somewhat simpler than the previous interpretation. The logit/logistic regression model
    logit(π(x)) = log( π(x) / (1 − π(x)) ) = α + βx
can be written as a multiplicative model by taking the exponential of both sides:
    π(x) / (1 − π(x)) = exp{α + βx} = e^α (e^β)^x

This is a model for odds. Odds increase multiplicatively with x: a 1 unit increase in x leads to an increase in the odds by a factor of e^β. So the odds ratio for a 1 unit increase in x equals
    [ π(x + 1)/(1 − π(x + 1)) ] / [ π(x)/(1 − π(x)) ] = e^β

When β = 0, e^0 = 1, so the odds do not change with x.

The logarithm of the odds changes linearly with x; however, the logarithm of odds is not an intuitively easy or natural scale to interpret.

HSB example of the odds ratio interpretation:

Since β̂ = .1369, a 1 unit increase in mean achievement test scores leads to an estimated odds ratio of
    e^.1369 = 1.147
In other words, the odds of having attended an academic program given a mean achievement score of x + 1 is 1.147 times the odds given a mean achievement score of x.

A change in x of 10 (1 standard deviation on the T score scale) leads to an odds ratio of
    [ e^α e^{β(x+10)} ] / [ e^α e^{βx} ] = e^{10β}
For our example, e^{.1369(10)} = 3.931; i.e., the odds of having attended an academic program given a mean achievement score of (say) 50 is 3.931 times the odds given a mean achievement score of 40.

Unlike the interpretation in terms of probabilities (where the rate of change in π(x) is not constant for equal changes in x), the odds ratio interpretation yields a constant rate of change.

Logistic Models when the Explanatory Variable is Random and the Response is Fixed

This happens in retrospective studies (e.g., case-control studies). Example from Hosmer & Lemeshow (1989): In a study investigating the risk factors for low birth weight babies, the risk factors that were considered are:
- race,
- smoking status of the mother,
- history of high blood pressure,
- history of premature labor,
- presence of uterine irritability,
- mother's pre-pregnancy weight.
The 56 women who gave birth to low weight babies in this study were each matched on the basis of age with a randomly selected control mother (i.e., each control gave birth to a normal weight baby and was the same age as the case mother). If the distribution of the explanatory variables (i.e., risk factors) is different for the case and control moms, then this is evidence of an association between low birth weight and the explanatory variables.


The estimated coefficients of an explanatory variable can be used to estimate the odds ratio (note: this only works for the logit/logistic regression model for binary data; it does not work for the linear and probit models for binary data). You'll have to wait until later in the semester for the results (or check the references).


Whether a logistic regression model is a good description of a data set is an empirical question, except for one particular case. . .

The logistic regression model will necessarily hold when:
1. The distribution of X for all those with Y = 1 is N(μ1, σ²).
2. The distribution of X for all those with Y = 0 is N(μ0, σ²).
Do these assumptions sound familiar? (They are the assumptions of discriminant analysis.)

If these 2 conditions hold, then π(x) follows a logistic regression curve. The sign of β is the same as the sign of μ1 − μ0.

If the variances are quite different, then a logistic regression model for π(x) that also contains a quadratic term is likely to fit the data well:
    logit(π(x)) = log( π(x) / (1 − π(x)) ) = α + β1 x + β2 x²
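This result can be checked by simulation; a minimal sketch under assumed values μ1 = 55, μ0 = 50, σ = 8 (all hypothetical, not from the notes). With equal group variances, theory says the fitted logistic slope should be near (μ1 − μ0)/σ² = 5/64 ≈ .078:

data sim;
  call streaminit(4321);
  do i = 1 to 5000;
    y = rand('BERNOULLI', 0.5);
    if y = 1 then x = rand('NORMAL', 55, 8);   /* X | Y=1 ~ N(55, 8**2) */
    else          x = rand('NORMAL', 50, 8);   /* X | Y=0 ~ N(50, 8**2) */
    output;
  end;
run;

proc logistic data=sim descending;
  model y = x;   /* fitted slope should be close to (55-50)/64 = .078 */
run;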


Inference for Logistic Regression

. . . or, the significance and size of effects:
1. Confidence intervals for parameters.
2. Hypothesis testing.
3. Distribution of probability estimates.
(1) and (2) will follow much like what we did for Poisson regression. (3) will be a bit different.

Confidence Intervals in Logistic Regression

Since we use maximum likelihood estimation, for large samples the distribution of parameter estimates is approximately normal. A large sample (1 − α)100% confidence interval for β is
    β̂ ± z_{α/2} (ASE)
where α here refers to the significance level (and not the intercept of the model).

Example (High School and Beyond): A 95% confidence interval for β, the coefficient for mean achievement test scores, is
    .1369 ± 1.96(.0133) → (.1109, .1629)

To get an interval for the effect of mean achievement score on the odds, that is, for e^β, we simply take the exponential of the endpoints of the confidence interval for β:
    (e^.1109, e^.1629) = (1.1173, 1.1770)

To get an interval for the linear approximation β π(x)(1 − π(x)), multiply the endpoints of the interval for β by π(x)(1 − π(x)). For π(x) = .5, so that π(x)(1 − π(x)) = .25, a 95% confidence interval for β π(x)(1 − π(x)), the slope when X = x, is
    ( (.25)(.1109), (.25)(.1629) ) = (.0277, .0407)
So the incremental rate of change of π(x) at x = −α̂/β̂ = 51.5325 is an increase in probability of .0277 to .0407.
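The interval arithmetic can be scripted directly (a sketch; the estimate and ASE come from the fit above, and PROBIT(.975) = 1.96 is the standard normal quantile):

data ci;
  beta = 0.1369; ase = 0.0133;
  lo = beta - probit(0.975)*ase;     /* lower endpoint for beta      */
  hi = beta + probit(0.975)*ase;     /* upper endpoint for beta      */
  or_lo = exp(lo);  or_hi = exp(hi); /* CI for the odds ratio e**beta */
  sl_lo = 0.25*lo;  sl_hi = 0.25*hi; /* CI for the slope at pi = .5  */
run;
proc print data=ci; run;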


Hypothesis Testing

H0: β = 0 (X is not related to the response)
1. Wald test
2. Likelihood ratio test

Wald test: For large samples,
    z = β̂ / ASE
is approximately N(0, 1) when H0: β = 0 is true. So for 1-tailed tests, just refer to the standard normal distribution. The Wald statistic,
    (β̂ / ASE)²
is, if the null is true, approximately chi-square distributed with df = 1.

HSB example: H0: β = 0 (i.e., mean achievement is not related to whether a student attended an academic or non-academic program) versus HA: β ≠ 0.
    Wald statistic = (.1369/.0133)² = (10.29)² = 106
which with df = 1 has a very small p-value.


Likelihood ratio test statistic (the more powerful alternative to the Wald test):
    test statistic = −2(L0 − L1)
where L0 is the log of the maximum likelihood for the model logit(π(x)) = α, and L1 is the log of the maximum likelihood for the model logit(π(x)) = α + βx. If the null is true, then the likelihood ratio test statistic is approximately chi-square distributed with df = 1.

Example:
    Likelihood ratio test statistic = −2(L0 − L1) = −2(−415.6749 − (−346.1340)) = 139.08
which with df = 1 (1 less parameter in the null model) has a very, very small p-value. (These are GENMOD "type3" results.)

Why is the Wald statistic only 106, while the likelihood ratio statistic is 139.08, even though both have the same df? The likelihood ratio test is more powerful.
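A sketch that reproduces both test statistics and their p-values from the quantities quoted above (PROBCHI is SAS's chi-square CDF function):

data tests;
  wald = (0.1369/0.0133)**2;              /* Wald chi-square, df = 1      */
  lrt  = -2*(-415.6749 - (-346.1340));    /* likelihood ratio statistic   */
  p_wald = 1 - probchi(wald, 1);          /* upper-tail p-values, df = 1  */
  p_lrt  = 1 - probchi(lrt, 1);
run;
proc print data=tests; run;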

Confidence Intervals for Probability Estimates

Our estimated probability for X = x is
    π̂(x) = exp{α̂ + β̂x} / (1 + exp{α̂ + β̂x})
We can get confidence intervals for π(x) using the estimated model.

HSB example with mean achievement score as the explanatory variable: Suppose we are interested in the probability when the mean achievement score = 51.1. The estimated probability (or propensity) that a student attended an academic program equals
    π̂(51.1) = exp{−7.0548 + .1369(51.1)} / (1 + exp{−7.0548 + .1369(51.1)})
             = e^{−.05842} / (1 + e^{−.05842}) = .4854
and from PROC GENMOD, a 95% confidence interval for the true probability when x = 51.1 is (.440, .531).

Where does this all come from?


If you use SAS/GENMOD with the "obstats" option, the table created by obstats contains:

Column Label   Translation                           HSB example (for x = 51.1)
Pred           π̂(x)                                   0.4853992
Xbeta          logit(π̂(x)) = α̂ + β̂x                  −0.05842
Std            sqrt( Var(logit(π̂(x))) )               0.0927311
Lower          lower value of 95% CI for π(x)         0.4402446
Upper          upper value of 95% CI for π(x)         0.5307934

. . . and how SAS got Std, Lower, and Upper: To get a confidence interval for the true probability π(x), we find a confidence interval for logit(π(x)) and then transform it back to probabilities. To compute a confidence interval for the logit, α + βx, we need an estimate of the variance of logit(π̂(x)), that is,
    Var(α̂ + β̂x) = Var(α̂) + x² Var(β̂) + 2x Cov(α̂, β̂)


The estimated variances and covariances are a by-product of the estimation procedure that SAS uses. The "covb" option in the MODEL statement requests that the estimated variance/covariance matrix of the parameter estimates be printed (in the listing file or output window):

Estimated Covariance Matrix
               Prm1         Prm2
Prm1        0.48271    -0.009140
Prm2      -0.009140    0.0001762

Note: the ASE of β̂ = sqrt(.0001762) = .0133, as previously given.


So for x = 51.1,
    Var(α̂ + β̂x) = Var(α̂) + x² Var(β̂) + 2x Cov(α̂, β̂)
                 = .48271 + (51.1)²(.0001762) + 2(51.1)(−.009140)
                 = .008697
and sqrt( Var(α̂ + β̂x) ) = sqrt(.008697) = .0933.

A 95% confidence interval for the true logit at x = 51.1 is
    (α̂ + β̂x) ± 1.96 sqrt( Var(α̂ + β̂x) )
    −0.0584 ± 1.96(.0933) → (−.2413, .1245)
and finally, to get the 95% confidence interval for the true probability when x = 51.1, transform the endpoints of the interval for the logit to probabilities:
    ( exp(−.2413)/(1 + exp(−.2413)), exp(.1245)/(1 + exp(.1245)) ) = (.44, .53)


Model vs non-model based estimates: The model-based confidence intervals are better than the non-model-based ones. Consider the case where mean achievement = 58.84, n = 2, # academic = 1, and p = 1/2 = .5, whereas the model-based estimate equals π̂(58.84) = .73. Could we even compute a 95% confidence interval for π using the (non-model-based) sample proportion? With the logistic regression model, the model-based interval is (.678, .779).

The model confidence interval will tend to be much narrower than ones based on the sample proportion p, because the model uses all 600 observations, while the sample proportion only uses 2 of the 600 observations. Note that the estimated standard error of p is
    sqrt( p(1 − p)/n ) = sqrt( .5(.5)/2 ) = .354
while the estimated standard error of π̂(58.84) is .131.


Comments regarding model-based estimates:
1. Models do not represent the exact relationship between π(x) and x.
2. As the sample size increases (i.e., ni → ∞ for each xi), π̂(xi) does not converge to the true probabilities; however, p does.
3. To the extent that the model is a good approximation of the true probabilities, the model-based estimates are closer to the true values than p, and the model-based estimates have lower variance.
4. Models smooth the data. See the figure of observed proportions p versus mean scores, with the model estimates π̂(x) versus mean scores overlaid.
5. For the HSB example, most of the sample pi's are based on ni's of 0 or 1 (the largest is 5).

The above results provide additional incentive to investigate whether our model is a good one; that is, does the model approximate the true relationship between π(x) and x?


Model Checking

Outline:
1. Goodness-of-fit tests for continuous x.
   (a) Group observed counts & fitted counts from the estimated model.
   (b) Group observed counts and then re-fit the model.
   (c) Hosmer & Lemeshow goodness-of-fit test.
2. Likelihood ratio model comparison tests.
3. Residuals.
4. Measures of influence.

Goodness-of-fit tests

If most of the estimated counts are ≥ 5, then G² and X² are approximately chi-squared distributed with df = residual df, where
    residual df = (# sample logits) − (# model parameters)
If the p-value is small, then we have evidence that the model does not fit the data. However, with continuous data it is likely that the assumptions required for this test of model fit will be violated.


Goodness of Fit with Continuous Predictors

HSB example: There are 531 different values of the mean achievement scores, and we have small numbers of observations at each mean achievement score: all observations have ni ≤ 5. If we considered the data in the form of a (531 × 2) contingency table of mean score × high school program type, many of the (531)(2) = 1062 cells of this table would be empty or contain very small cell counts. There are only 600 observations/students.

Large sample theory for the goodness-of-model-fit tests is violated in two ways:
1. Most of the fitted cell counts are very small.
2. As the number of students increases, the number of possible scores would also increase, which means that the sample size affects the number of cells in the table.


Goodness-of-fit test: Grouping data and fitted values

First the model was fit to the data (i.e., the 531 different mean achievement scores), and then the observed and fitted counts were grouped. In this case, the grouping was chosen to get approximately equal numbers of cases in each category.

              Mean      Observed data                 # cases   Fitted values
Group         score   # academic  # non-academic     in group  # academic  # non-academic
x < 40        37.77        8           38                46        6.48        39.52
40 ≤ x < 45   42.60       18           69                87       16.86        70.14
45 ≤ x < 47   46.03       14           30                44       11.98        32.02
47 ≤ x < 49   47.85       17           26                43       13.97        29.03
49 ≤ x < 51   50.12       18           32                50       17.41        32.59
51 ≤ x < 53   52.12       22           28                50       21.44        28.56
53 ≤ x < 55   53.95       34           24                58       24.21        33.79
55 ≤ x < 57   56.05       23           21                44       20.86        23.15
57 ≤ x < 60   58.39       35           33                68       33.45        34.55
60 ≤ x < 65   62.41       56           22                78       50.52        27.48
65 ≤ x        66.56       26            6                32       22.73         9.27


Test statistics for goodness of fit:
    X² = Σ (observed − fitted)² / fitted
    G² = 2 Σ observed log(observed / fitted)
where the sums are over groups and program types, and
    df = (# groups) − (# parameters)

Using the above values, we get:
    Statistic   df             Value   p-value
    X²          (11 − 2) = 9   12.67   ≈ .18
    G²          (11 − 2) = 9   12.68   ≈ .18

How to do this in SAS. Group the data and fitted values:

data group1; set preds;
  if achieve<40 then grp=1;
  else if achieve>=40 and achieve<45 then grp=2;
  else if achieve>=45 and achieve<47 then grp=3;
  else if achieve>=47 and achieve<49 then grp=4;
  else if achieve>=49 and achieve<51 then grp=5;
  else if achieve>=51 and achieve<53 then grp=6;
  else if achieve>=53 and achieve<55 then grp=7;
  else if achieve>=55 and achieve<57 then grp=8;
  else if achieve>=57 and achieve<60 then grp=9;
  else if achieve>=60 and achieve<65 then grp=10;
  else if achieve>=65 then grp=11;

proc sort data=group1; by grp;

proc means data=group1 sum mean; by grp;
  var academic count fitted achieve;
  output out=grpfit sum=num_aca num_cases fit2
                    mean=dum1 dum2 dum3 meanach;

Create the components of X² and G², and sum these:

data done; set grpfit;
  num_non = num_cases-num_aca;      /* observed # non-academic  */
  p  = num_aca/num_cases;           /* observed proportion      */
  pp = fit2/num_cases;              /* fitted proportion        */
  fit0 = num_cases-fit2;            /* fitted # non-academic    */
  x2_no  = (num_non-fit0)*(num_non-fit0)/fit0;
  x2_yes = (num_aca-fit2)*(num_aca-fit2)/fit2;
  g2_no  = 2*num_non*log(num_non/fit0);
  g2_yes = 2*num_aca*log(num_aca/fit2);
run;
proc means data=done sum;
  var x2_no x2_yes g2_no g2_yes;
run;

The output from proc means is:

The MEANS Procedure
Variable            Sum
-------------------------
x2_no        12.6208891
x2_yes       11.5296816
g2_no       -55.9755998
g2_yes       81.1612996
-------------------------

Last step: add x2_no + x2_yes to get X², and g2_no + g2_yes to get G².


Goodness-of-fit test: Grouping data and re-fitting the model

Using the same groups as before, the counts are summed and the model re-fit to the collapsed data. The mean achievement score within each group was used as the numerical value of the explanatory variable. The estimated model was
    logit(π̂(xi)) = −6.9158 + .1342xi
Note that for the non-grouped data the estimated model was
    logit(π̂(xi)) = −7.0548 + .1369xi
so they are quite similar.

              Mean     # attend   # cases    Observed     Fitted
Group         score    academic   in group   proportion   probability
x < 40        37.77        8         46         0.17        .13
40 ≤ x < 45   42.60       18         87         0.20        .23
45 ≤ x < 47   46.03       14         44         0.31        .32
47 ≤ x < 49   47.85       17         43         0.39        .38
49 ≤ x < 51   50.12       18         50         0.36        .45
51 ≤ x < 53   52.12       22         50         0.44        .52
53 ≤ x < 55   53.95       34         58         0.58        .58
55 ≤ x < 57   56.05       23         44         0.52        .65
57 ≤ x < 60   58.39       35         68         0.51        .72
60 ≤ x < 65   62.41       56         78         0.71        .81
65 ≤ x        66.56       26         32         0.81        .88

Test statistics for goodness of fit:
    X² = Σ (observed − fitted)² / fitted
    G² = 2 Σ observed log(observed / fitted)
where the sums are over groups and program types, and df = (# groups) − (# parameters).

Using the above values, we get:
    Statistic   df             Value   p-value
    X²          (11 − 2) = 9    9.45   ≈ .40
    G²          (11 − 2) = 9    9.79   ≈ .37

Both methods yield the same conclusion: the model's lack of fit is not significantly large (i.e., we don't reject H0 that the model fits).



Where these numbers came from: SAS/INPUT:


data group; set hsbtab; if achieve<40 then grp=1; else if achieve>=40 and achieve<45 then grp=2; else if achieve>=45 and achieve<47 then grp=3; else if achieve>=47 and achieve<49 then grp=4; else if achieve>=49 and achieve<51 then grp=5; else if achieve>=51 and achieve<53 then grp=6; else if achieve>=53 and achieve<55 then grp=7; else if achieve>=55 and achieve<57 then grp=8; else if achieve>=57 and achieve<60 then grp=9; else if achieve>=60 and achieve<65 then grp=10; else if achieve>=65 then grp=11; proc sort data=group; by grp; proc means data=group mean sum; by grp; var achieve academic count; output out=grpfit mean=ach_bar sum=dum1 academic ncases; run; proc genmod data=grpfit; model academic/ncases = ach_bar /link=logit dist=binomial obstats; output out=pred2 pred=predgrp lower=lgrp upper=ugrp; run;


(Edited) SAS/OUTPUT:

The GENMOD Procedure

Model Information
Data Set                     WORK.GRPFIT
Distribution                 Binomial
Link Function                Logit
Response Variable (Events)   academic
Response Variable (Trials)   ncases
Observations Used            11
Number Of Events             308
Number Of Trials             600

Criteria For Assessing Goodness Of Fit
Criterion            DF    Value       Value/DF
Deviance              9    9.7867      1.0874
Scaled Deviance       9    9.7867      1.0874
Pearson Chi-Square    9    9.4496      1.0500
Scaled Pearson X2     9    9.4496      1.0500
Log Likelihood             -348.3910

Algorithm converged.


Analysis Of Parameter Estimates
                          Standard   Wald 95%                Chi-
Parameter  DF  Estimate   Error      Confidence Limits       Square    Pr > ChiSq
Intercept   1   -6.9317   0.6877     -8.2796    -5.5838      101.59    <.0001
ach_bar     1    0.1345   0.0131      0.1088     0.1603      104.84    <.0001
Scale       0    1.0000   0.0000      1.0000     1.0000

NOTE: The scale parameter was held fixed.

Observation Statistics (edited; selected columns from the obstats table):

Obs  academic  ncases   ach_bar    Pred      Lower     Upper     Reschi
  1      8       46     37.7652   0.1357    0.0948    0.1907    0.7557
  2     18       87     42.5959   0.2313    0.1824    0.2886   -0.5391
  3     14       44     46.0315   0.3232    0.2742    0.3765   -0.0716
  4     17       43     47.8527   0.3790    0.3315    0.4289    0.2215
  5     20       50     50.1186   0.4529    0.4074    0.4991   -0.7509
  6     22       50     52.1221   0.5201    0.4751    0.5647   -1.1336
  7     43       58     53.9490   0.5808    0.5346    0.6257    2.4780
  8     28       44     56.0511   0.6477    0.5985    0.6939   -0.1576
  9     47       68     58.3902   0.7158    0.6632    0.7631   -0.4500
 10     62       78     62.4058   0.8121    0.7574    0.8569   -0.3902
 11     29       32     66.5621   0.8832    0.8326    0.9200    0.4060

(The full obstats output also includes Xbeta, Std, HessWgt, and the raw, deviance, likelihood, and standardized residuals for each observation.)

An Alternative Method to Group/Partition the Data

Problem: When there is more than one explanatory variable, grouping observations becomes more difficult.

Solution: Group the values according to the predicted probabilities. Groups are formed so that they have about the same number of observations in each of them.

1. Order the observations according to π̂(x), from smallest to largest. In the HSB example, since there is only 1 explanatory variable and probabilities increase as achievement scores go up (i.e., β̂ > 0), we could just order the achievement scores. When there is more than 1 explanatory variable, the ordering must be done using the π̂(x)'s.

2. Depending on the number of groups desired, it is common to partition the observations so that they have the same number of observations per group. For example, if we wanted 10 groups in the HSB example, then we would try to put n/10 = 600/10 = 60 students per group:
    The first 60 observations → group 1
    The next 60 observations → group 2
    etc.

Since some of the observations have the same value on the explanatory variable(s), it may not be possible to have exactly equal numbers in the groups, especially since observations with the exact same value on the explanatory variable(s) should be placed in the same group.

3. A Pearson-like X² computed on data grouped this way is known as the Hosmer & Lemeshow statistic. It doesn't have a chi-squared distribution, but simulation studies have shown that its distribution is approximately chi-squared with df = g − 2 (where g = the number of groups).

HSB example: Hosmer-Lemeshow statistic = 4.7476, df = 8, p = .7842. Conclusion: this model fits pretty well.

PROC LOGISTIC will compute the Hosmer-Lemeshow statistic, as well as print out the partitioning. Example using PROC LOGISTIC:

proc logistic data=acd descending;
  model academic = achieve / lackfit;
run;


Edited output from PROC LOGISTIC:

The LOGISTIC Procedure

Model Information
Data Set                    WORK.ACD
Response Variable           academic
Number of Response Levels   2
Number of Observations      600
Link Function               Logit
Optimization Technique      Fisher's scoring

Response Profile
Ordered                   Total
Value      academic   Frequency
1          1              308
2          0              292

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.


Model Fit Statistics
                          Intercept
            Intercept     and
Criterion   Only          Covariates
AIC         833.350       696.268
SC          837.747       705.062
-2 Log L    831.350       692.268
Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      139.0819     1        <.0001
Score                 127.4700     1        <.0001
Wald                  106.4038     1        <.0001

Analysis of Maximum Likelihood Estimates
                          Standard
Parameter  DF  Estimate   Error       Chi-Square    Pr > ChiSq
Intercept   1   -7.0543   0.6948       103.0970        <.0001
achieve     1    0.1369   0.0133       106.4038        <.0001

Odds Ratio Estimates
           Point       95% Wald
Effect     Estimate    Confidence Limits
achieve    1.147       1.117     1.177

Partition for the Hosmer and Lemeshow Test
                    academic = 1             academic = 0
Group   Total   Observed   Expected     Observed   Expected
  1      60         9        8.67          51       51.33
  2      60        13       13.66          47       46.34
  3      60        19       18.80          41       41.20
  4      60        24       23.74          36       36.26
  5      60        26       29.29          34       30.71
  6      60        36       33.75          24       26.25
  7      60        42       38.06          18       21.94
  8      62        41       44.23          21       17.77
  9      60        45       47.47          15       12.53
 10      58        53       50.34           5        7.66

Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square    DF    Pr > ChiSq
4.7476         8        0.7842

If the data are in tabular form:

proc logistic data=hsbtab descending;
  model academic/count = achieve / lackfit;
run;


Likelihood Ratio Model Comparison Tests as goodness-of-fit tests for models with continuous predictors

More complex models can be fit, e.g., with
- additional explanatory variables,
- non-linear terms (e.g., x²),
- interactions,
etc., and a likelihood ratio test used to compare the fitted model with the more complex ones. If the more complex models do not fit significantly better than the model we fit, then this indicates that the fitted model is reasonable.

Global goodness-of-fit statistics only indicate that the model does not fit perfectly (i.e., that there is some lack of fit). Comparing a model's fit with that of more complex models provides tests for particular types of lack of fit.


Goodness of Fit and Likelihood-Ratio Model Comparison Tests

The likelihood-ratio statistic equals
    Likelihood-ratio statistic = −2(L0 − L1)
where
    L1 = the maximized log of the likelihood function for a complex model, say M1;
    L0 = the maximized log of the likelihood function for a simpler (nested) model, say M0.

The goodness-of-model-fit statistic G² is a special case of the likelihood ratio test statistic, where
    M0 = M, the model we're testing;
    M1 = MS, the most complex model possible, or the "saturated" model. This model has as many parameters as there are logits (observed data).
For Poisson and logistic regression, G² is also equal to the deviance of the model.

Let
    LS = maximized log of the likelihood function for the saturated model;
    L0 = maximized log of the likelihood function for the simpler model M0;
    L1 = maximized log of the likelihood function for the complex model M1;
where we want to compare the fit of models M0 and M1. Then
    deviance for M0 = G²(M0) = −2(L0 − LS)
    deviance for M1 = G²(M1) = −2(L1 − LS)
The likelihood ratio statistic is
    G²(M0|M1) = −2(L0 − L1)
              = −2[ (L0 − LS) − (L1 − LS) ]
              = G²(M0) − G²(M1)
with df = df0 − df1.

Assuming that M1 holds, this statistic tests:
- whether the lack of fit of M0 is significantly larger than that of M1;
- whether the parameters in M1 that are not in M0 equal zero.

HSB example using the grouped data:

M0 = model with only an intercept: logit(π(xi)) = α
    G²(M0) = 144.3546 with df0 = (11 − 1) = 10
M1 = model with an intercept and mean achievement scores: logit(π(xi)) = α + βxi
    G²(M1) = 9.7867 with df1 = (11 − 2) = 9

G²(M0|M1) = 144.3546 − 9.7867 = 134.57, with df = 10 − 9 = 1 and p-value < .0001.

Or, using the type3 option in the GENMOD MODEL statement:

LR Statistics For Type 3 Analysis
Source     DF    Chi-Square    Pr > ChiSq
ach_bar     1        134.57        <.0001

In this case, we have 3 equivalent null hypotheses:
- The lack of fit of M0 is not significantly different from the lack of fit of M1.
- Mean achievement scores are independent of high school program type.
- β = 0.

Residuals

Goodness-of-fit statistics are global, summary measures of lack of fit. We should also:
- describe the lack of fit, and
- look for patterns in the lack of fit.

Let
    yi = observed number of events/successes;
    ni = number of observations with explanatory variable equal to xi;
    π̂i = the estimated (fitted) probability at xi;
    ni π̂i = the estimated (fitted) number of events.

Pearson residuals are
    ei = (yi − ni π̂i) / sqrt( ni π̂i (1 − π̂i) )
and X² = Σi ei²; that is, the ei's are the components of X².

When the model holds, the Pearson residuals are approximately normal with mean 0 and variance slightly less than 1. Absolute values larger than 2 are "large".


Just as X² and G² are not valid when fitted values are small, Pearson residuals aren't that useful in that case (i.e., they have limited meaning). If ni = 1 at many values, then the possible values for yi are 1 and 0, so ei can assume only two values.

HSB example, using the grouped data and the model fit to it (note: computing residuals on un-grouped data with continuous explanatory variables is not always useful, due to small n per cell):

         Mean     # attend   Number     Observed     Predicted     Pearson
Group    achieve  academic   of cases   proportion   probability   residual
  1      37.67        8         46        .17          .13           .78
  2      42.60       18         87        .21          .23          −.55
  3      46.01       14         44        .32          .32          −.07
  4      47.87       17         43        .40          .38           .21
  5      50.11       20         50        .40          .45          −.75
  6      52.17       22         50        .44          .52         −1.15
  7      53.98       43         58        .74          .58          2.47
  8      56.98       28         44        .63          .65          −.15
  9      58.42       47         68        .69          .72          −.45
 10      62.43       62         78        .79          .81          −.38
 11      66.56       29         32        .90          .88           .42

And of course we can use graphical displays:
- π̂i versus yi/ni: for a good-fitting model, the points should lie on a 45 degree line.
- yi/ni versus xi, with the fitted curve π̂(x) versus x overlaid.
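One way to draw the first of these plots: a sketch using PROC SGPLOT (a newer procedure than these notes, which would have used PROC GPLOT or SAS/INSIGHT), with the variable names from the grouped data set done built earlier:

proc sgplot data=done;
  scatter x=pp y=p;            /* fitted vs observed proportions */
  lineparm x=0 y=0 slope=1;    /* 45 degree reference line       */
  xaxis label="Fitted probability";
  yaxis label="Observed proportion";
run;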

. . . And a Q-Q plot of the Pearson residuals (and their cumulative distribution) using SAS/INSIGHT . . .

