very basic description of logistic regression


LOGISTIC REGRESSION

Mike Bailey

02/09/08 1

Course at

statistics.com


BASICS

Response Y = 0 (negative) or 1 (positive)

Predictors X may be:

• dichotomous

• categorical (usually recoded as dichotomous design variables)

• real-valued


PREDICTING PROBABILITIES

Because Y takes only the values 0 and 1, its conditional mean is the probability that Y = 1:

E[Y | X] = P[Y = 1 | X] ≡ π(X)


LOGISTIC MODEL

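\[
\pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots}}
       = \frac{e^{xb}}{1 + e^{xb}}
       = \frac{1}{1 + e^{-xb}}
\]

Here xb is shorthand for the linear predictor β0 + β1 x1 + β2 x2 + ...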
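
As a minimal sketch of the logistic model, the R snippet below evaluates π(x) in the two algebraic forms given on this slide and compares them with base R's `plogis()`. The coefficient and predictor values are made up for illustration and do not come from the course data.

```r
# Hypothetical coefficients (beta0, beta1, beta2); not taken from the slides.
beta <- c(-1.0, 0.8, -0.5)

# One observation; the leading 1 multiplies the intercept beta0.
x <- c(1, 2.0, 1.0)

# Linear predictor: beta0 + beta1 * x1 + beta2 * x2
xb <- sum(beta * x)

p_ratio   <- exp(xb) / (1 + exp(xb))  # e^(xb) / (1 + e^(xb))
p_inverse <- 1 / (1 + exp(-xb))       # 1 / (1 + e^(-xb))
p_builtin <- plogis(xb)               # base R logistic function

# All three values should agree.
print(c(p_ratio = p_ratio, p_inverse = p_inverse, p_builtin = p_builtin))
```

Whatever the value of the linear predictor, the result stays strictly between 0 and 1, which is what makes it usable as a probability.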


LOGIT FUNCTION

\[
\pi(x)\,\left(1 + e^{-xb}\right) = 1
\]

\[
e^{-xb} = \frac{1 - \pi(x)}{\pi(x)}
\]

\[
xb = \ln\!\left(\frac{\pi(x)}{1 - \pi(x)}\right) = g(x)
\]

So, if we can estimate π(x) and take the logit, we have a linear function of the x’s, and we can use regression to estimate the β’s.
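
The following R sketch illustrates the point numerically: applying the logit, ln(π/(1 − π)), to probabilities produced by the logistic model recovers the linear predictor. The values of `xb` are hypothetical, and the helper `logit()` is defined here only for illustration; base R provides the same function as `qlogis()`.

```r
# Logit: maps a probability in (0, 1) onto the whole real line.
logit <- function(p) log(p / (1 - p))

# Hypothetical values of the linear predictor xb.
xb <- c(-2, 0, 1.5)

# Logistic model: the probability implied by each value of xb.
p <- 1 / (1 + exp(-xb))

# Taking the logit of pi(x) returns the linear predictor.
recovered <- logit(p)
print(cbind(xb, p, recovered))

# Base R equivalent of the helper above.
print(qlogis(p))
```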


ODDS

• π(x)/(1 − π(x)) is the ODDS that Y = 1 given x

\[
\ln\!\left(\frac{1 - \pi(x)}{\pi(x)}\right)
  = -\ln\!\left(\frac{\pi(x)}{1 - \pi(x)}\right)
  = -\ln(\mathrm{odds}(x))
  = -xb
\]

Equivalently, ln(odds(x)) = xb.


CASE 1: DICHOTOMOUS x

Data:

Y  X
1  0
0  0
0  1
0  0
1  0
0  1
1  0
0  0
0  1

Contingency table:

       Y=0  Y=1
X=0     a    d
X=1     c    b


ODDS

\[
\frac{\pi(1)}{1 - \pi(1)} = \mathrm{odds}(x = 1)
\]


ODDS RATIO

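       Y=0  Y=1
X=0     a    d
X=1     c    b

The odds ratio is the ratio of the odds for Y = 1 when X=1 to the odds for Y = 1 when X=0:

\[
\frac{\pi(1) / (1 - \pi(1))}{\pi(0) / (1 - \pi(0))}
  = \frac{b / c}{d / a}
  = \frac{ab}{cd}
  = \text{odds ratio}
\]

Odds ratios have an easily understood interpretation.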
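
A small R sketch of this calculation follows. The cell counts are hypothetical (chosen so the odds ratio comes out to ½); only the table layout, with a, d in the X=0 row and c, b in the X=1 row, follows the slide.

```r
# Hypothetical counts in the slide's layout:
#        Y=0  Y=1
#  X=0    a    d
#  X=1    c    b
tab <- matrix(c(60, 40,
                75, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(X = c("0", "1"), Y = c("0", "1")))

n_a <- tab["0", "0"]; n_d <- tab["0", "1"]
n_c <- tab["1", "0"]; n_b <- tab["1", "1"]

odds_x1 <- n_b / n_c   # odds of Y = 1 when X = 1
odds_x0 <- n_d / n_a   # odds of Y = 1 when X = 0

odds_ratio    <- odds_x1 / odds_x0
cross_product <- (n_a * n_b) / (n_c * n_d)   # ab / (cd)

# The ratio of odds and the cross-product form give the same number.
print(c(odds_ratio = odds_ratio, cross_product = cross_product))
```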


EXAMPLE

• Y = 1 if the baby has low birth weight

• X = 1 if the mother has frequent prenatal care

• “Low birth weight occurs half as often (O.R. = ½) when the mother has adequate prenatal care (X = 1).”

Strictly, O.R. = ½ means that the odds of low birth weight are halved when X = 1.


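For a single dichotomous x, substituting the logistic model into the odds ratio gives

\[
\frac{\pi(1) / (1 - \pi(1))}{\pi(0) / (1 - \pi(0))}
  = \frac{\dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}} \Big/ \dfrac{1}{1 + e^{\beta_0 + \beta_1}}}
         {\dfrac{e^{\beta_0}}{1 + e^{\beta_0}} \Big/ \dfrac{1}{1 + e^{\beta_0}}}
  = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}}
  = e^{\beta_1}
\]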


THE MAGIC CONTINUES...

• β1 = ln(O.R.)

• the logit is linear in x
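
The relationship β1 = ln(O.R.) can be checked numerically. The sketch below fits a logistic regression with `glm()` to hypothetical counts (not the course data) and compares exp(β1) with the odds ratio computed directly from the 2×2 table as ab/(cd).

```r
# Hypothetical cell counts in the slides' contingency-table layout.
n_a <- 60; n_d <- 40   # X = 0: counts for Y = 0 (a) and Y = 1 (d)
n_c <- 75; n_b <- 25   # X = 1: counts for Y = 0 (c) and Y = 1 (b)

# Expand the counts into one row per observation.
counts <- c(n_a, n_d, n_c, n_b)
x <- rep(c(0, 0, 1, 1), times = counts)
y <- rep(c(0, 1, 0, 1), times = counts)

# Logistic regression of y on the single dichotomous predictor x.
fit <- glm(y ~ x, family = binomial(link = "logit"))

beta1    <- unname(coef(fit)["x"])
table_or <- (n_a * n_b) / (n_c * n_d)   # odds ratio from the table

# The fitted slope should match the log of the table odds ratio.
print(c(beta1 = beta1, log_table_or = log(table_or)))
print(c(exp_beta1 = exp(beta1), table_or = table_or))
```

With one dichotomous predictor the model has as many parameters as the table has rows, so the fitted probabilities reproduce the observed proportions and the agreement holds up to the numerical tolerance of the fitting algorithm.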


USING R

# Logistic regression of birth weight (0/1) on prenatal care (0/1)
G <- glm(formula = weight ~ prenatal,
         family = binomial(link = logit))


DATA

0  200  200  1  1  1  1  0  0  0
0  100   90  1  1  1  1  0  0  0
0  300   50  1  0  0  1  0  0  0
0  200  150  1  0  0  0  0  1  0

> eof2 <- read.csv(file = "e:datafile2.csv", header = TRUE)


RESULTS

> g2 <- glm(formula = y ~ x11+x21+x3+x4+x5, family = binomial(link=logit), data = eof2)

> g2

data = eof)

Coefficients:

(Intercept)        x1        x2        x3        x4        x5
   -950.506     3.714    -3.716        NA   951.613   -75.118

Residual Deviance: 3.802    AIC: 13.8


ARMY vs. USMC

> SERV2 <- glm(formula = y ~ marine + army, family = binomial(link=logit), data = eof2)

> SERV2

data = eof2)

Coefficients:

(Intercept)      marine        army
 -1.757e+01   1.577e+01   2.312e-09

Null Deviance: 29.31

Residual Deviance: 28.71    AIC: 34.71


AOR

> region2 <- glm(formula = y ~ raleigh + topeka + denver + mobile + oshkosh + eagle, family = binomial(link=logit), data = eof2)

> region2

eagle, family = binomial(link = logit), data = eof2)

Coefficients:

(Intercept)     raleigh      topeka      denver      mobile     oshkosh       eagle
 -1.957e+01   1.693e+01  -2.086e-08   1.847e+01   3.913e+01  -2.086e-08          NA

Null Deviance: 29.31

Residual Deviance: 20.84    AIC: 32.84


