LOGISTIC REGRESSION
A Basic Introduction

Mike Bailey

02/09/08

Course at statistics.com

BASICS

Response Y: dichotomous

    Y = 0 (negative), Y = 1 (positive)

Predictors X:

    real-valued
    categorical (usually recoded as dichotomous design variables)

PREDICTING PROBABILITIES

E[Y | X] = P[Y = 1 | X] ≡ π(X)


LOGISTIC MODEL

π(x) = e^(β0 + β1x1 + β2x2 + ...) / (1 + e^(β0 + β1x1 + β2x2 + ...))
     = e^(xb) / (1 + e^(xb))
     = 1 / (1 + e^(-xb))
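The three algebraic forms of the logistic function above are equal, which is easy to check numerically. A minimal Python sketch (function name is illustrative, not from the slides):

```python
import math

def pi_of(xb):
    """Logistic function: pi = e^(xb)/(1 + e^(xb)) = 1/(1 + e^(-xb))."""
    return 1.0 / (1.0 + math.exp(-xb))

# The two algebraic forms agree:
xb = 0.7
form1 = math.exp(xb) / (1.0 + math.exp(xb))
print(abs(pi_of(xb) - form1) < 1e-12)  # → True
```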

LOGIT FUNCTION

π(x)(1 + e^(-xb)) = 1

e^(-xb) = (1 - π(x)) / π(x)

xb = ln( π(x) / (1 - π(x)) ) = g(x)
So, if we can estimate π(x) and take the logit, we have a linear function of the x's. We can use regression to estimate the β's.

ODDS

• π(x) / (1 - π(x)) is the ODDS that Y = 1 given x

ln( (1 - π(x)) / π(x) ) = -ln( π(x) / (1 - π(x)) ) = -ln(odds(x)) = -xb

CASE 1: DICHOTOMOUS x

data:

    Y: 1 0 0 0 1 0 1 0 0
    X: 0 0 1 0 0 1 0 0 1

contingency table:

           Y=0   Y=1
    X=0     a     d
    X=1     c     b

ODDS

What are the odds of Y = 1 when X = 1?

odds(x = 1) = π(1) / (1 - π(1))

ODDS RATIO

           Y=0   Y=1
    X=0     a     d
    X=1     c     b

ratio of odds for Y = 1 when X = 1 vs. when X = 0:

[ π(1) / (1 - π(1)) ] / [ π(0) / (1 - π(0)) ] = (b/c) / (d/a) = ab / (cd) = odds ratio

odds ratios have an easily-understood interpretation
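The ab/(cd) formula for a 2x2 table can be sketched directly in Python (the counts below are illustrative, not from the slides):

```python
def odds_ratio(a, c, d, b):
    """Odds ratio from the 2x2 table:
             Y=0   Y=1
       X=0    a     d
       X=1    c     b
    odds(X=1) = b/c, odds(X=0) = d/a, so OR = (b/c) / (d/a) = ab / (cd).
    """
    return (a * b) / (c * d)

# Illustrative counts (not from the slides):
print(odds_ratio(a=20, c=5, d=10, b=15))  # → 6.0
```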

EXAMPLE

• Y = 1 if the baby has low birth weight
• X = 1 if the mother has frequent prenatal care
• ODDS RATIO: the multiplicative change in the odds of Y = 1 when X = 1
• "Low birth weight occurs half as often (O.R. = 1/2) when the mother has adequate prenatal care."

Substituting the model, the odds are

π(x) / (1 - π(x)) = [ e^(β0 + β1x) / (1 + e^(β0 + β1x)) ] / [ 1 / (1 + e^(β0 + β1x)) ] = e^(β0 + β1x)

so for dichotomous x the odds ratio is

[ π(1) / (1 - π(1)) ] / [ π(0) / (1 - π(0)) ] = e^(β0 + β1) / e^(β0) = e^(β1)

THE MAGIC CONTINUES...

• β1 = ln(O.R.)
• the logit is linear in x
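The identity β1 = ln(O.R.) can be verified directly: for a single dichotomous x the logistic fit reproduces the observed cell proportions, so the coefficients can be read off the 2x2 table. A Python sketch with illustrative counts (not from the slides):

```python
import math

# 2x2 table counts (illustrative, not from the slides):
a, d = 20, 10   # X=0 row: Y=0, Y=1
c, b = 5, 15    # X=1 row: Y=0, Y=1

beta0 = math.log(d / a)               # logit of pi(0), since odds(0) = d/a
beta1 = math.log((a * b) / (c * d))   # ln(odds ratio)

def pi_of(x):
    """Fitted logistic model pi(x) = 1 / (1 + e^-(beta0 + beta1*x))."""
    return 1 / (1 + math.exp(-(beta0 + beta1 * x)))

# The fitted model reproduces the observed cell proportions:
assert abs(pi_of(0) - d / (a + d)) < 1e-12
assert abs(pi_of(1) - b / (b + c)) < 1e-12
```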

USING R
G <- glm(formula = weight ~ prenatal, family = binomial(link = logit) )

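Under the hood, a call like the glm above estimates the β's by Newton-Raphson (equivalently IRLS) on the log-likelihood. A from-scratch Python sketch of that idea for one predictor plus an intercept, with illustrative data — this is a teaching sketch, not R's actual implementation:

```python
import math

def fit_logistic(x, y, iters=25):
    """Newton-Raphson for the logit model pi = 1/(1 + e^-(b0 + b1*x))."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Gradient of the log-likelihood and the 2x2 Fisher information
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system H * delta = g
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Illustrative data (not the slides' dataset):
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0,   0,   0,   1,   0,   1,   1,   1  ]
b0, b1 = fit_logistic(x, y)
```

The fitted b1 is positive here, since Y = 1 becomes more common as x grows.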

DATA

    y  x1   x2   x3  x4  x5  marine  army  navy  iraqi
    0  200  200   1   1   1       1     0     0      0
    0  100   90   1   1   1       1     0     0      0
    0  300   50   1   0   0       1     0     0      0
    0  200  150   1   0   0       0     0     1      0

• Save out of Excel as a .csv file
• > eof2 <- read.csv(file = "e:datafile2.csv", header = TRUE)

RESULTS

> g2 <- glm(formula = y ~ x1 + x2 + x3 + x4 + x5, family = binomial(link = logit), data = eof2)
> g2

Call:  glm(formula = y ~ x1 + x2 + x3 + x4 + x5, family = binomial(link = logit), data = eof2)

Coefficients:
(Intercept)       x1       x2    x3       x4       x5
   -950.506    3.714   -3.716    NA  951.613  -75.118

Degrees of Freedom: 36 Total (i.e. Null);  32 Residual
Null Deviance:     29.31
Residual Deviance: 3.802    AIC: 13.8

ARMY vs. USMC

> SERV2 <- glm(formula = y ~ marine + army, family = binomial(link = logit), data = eof2)
> SERV2

Call:  glm(formula = y ~ marine + army, family = binomial(link = logit), data = eof2)

Coefficients:
(Intercept)      marine        army
 -1.757e+01   1.577e+01   2.312e-09

Degrees of Freedom: 36 Total (i.e. Null);  34 Residual
Null Deviance:     29.31
Residual Deviance: 28.71    AIC: 34.71

AOR

> region2 <- glm(formula = y ~ raleigh + topeka + denver + mobile + oshkosh + eagle, family = binomial(link = logit), data = eof2)
> region2

Call:  glm(formula = y ~ raleigh + topeka + denver + mobile + oshkosh + eagle, family = binomial(link = logit), data = eof2)

Coefficients:
(Intercept)     raleigh      topeka      denver      mobile     oshkosh   eagle
 -1.957e+01   1.693e+01  -2.086e-08   1.847e+01   3.913e+01  -2.086e-08      NA

Degrees of Freedom: 36 Total (i.e. Null);  31 Residual
Null Deviance:     29.31
Residual Deviance: 20.84    AIC: 32.84
