
Basic Introduction

LOGISTIC REGRESSION
Mike Bailey

02/09/08 1
Course at statistics.com

BASICS

Response: Y = 0 (negative) or Y = 1 (positive)

Predictors X:
• dichotomous
• categorical (usually recoded as dichotomous design variables)
• real-valued
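Categorical predictors enter the model as 0/1 design variables, like the marine/army/navy/iraqi columns on the DATA slide. A minimal R sketch of how R builds them, assuming a hypothetical factor `service` (its values are borrowed from those column names): `model.matrix()` uses treatment coding, so one level is the baseline and each other level gets its own dichotomous column.

```r
# Hypothetical service labels, one per observation
# (level names borrowed from the DATA slide's columns)
service <- factor(c("marine", "army", "navy", "iraqi", "army"))

# model.matrix() applies treatment coding: the first level ("army",
# alphabetically) is the baseline and each other level gets a 0/1 column
X <- model.matrix(~ service)
print(X)
```

Dropping one level as the baseline is deliberate: with an intercept in the model, a full set of dummies that always sums to 1 would be perfectly collinear with the intercept.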

PREDICTING PROBABILITIES

Because Y is either 0 or 1, its conditional mean is the probability that Y = 1:

$$E[Y \mid X] = P[Y = 1 \mid X] \equiv \pi(X)$$

LOGISTIC MODEL
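$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots}} = \frac{e^{xb}}{1 + e^{xb}} = \frac{1}{1 + e^{-xb}}$$

where xb = β0 + β1x1 + β2x2 + ... is the linear predictor.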
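The forms of π(x) on this slide are algebraically identical. A small R check, with hypothetical coefficients and predictor values, compares them with base R's `plogis()` (the logistic distribution function):

```r
# Two algebraically equivalent forms of pi(x) for a linear predictor xb
pi_form1 <- function(xb) exp(xb) / (1 + exp(xb))
pi_form2 <- function(xb) 1 / (1 + exp(-xb))

# Hypothetical coefficients (beta0, beta1, beta2) and three observations;
# the leading column of 1s carries the intercept beta0
b <- c(-1, 0.5, 2)
x <- cbind(1, c(0, 1, 3), c(1, 0, -1))
xb <- drop(x %*% b)          # linear predictor for each row

p1 <- pi_form1(xb)
p2 <- pi_form2(xb)
p3 <- plogis(xb)             # base R's logistic CDF, same function
print(rbind(p1, p2, p3))
```

Numerically the last form is the safer one: for a very large linear predictor, e^{xb} overflows to Inf and the first form returns NaN (Inf/Inf), while 1/(1 + e^{-xb}) returns 1.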
LOGIT FUNCTION
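$$\pi(x)\,\bigl(1 + e^{-xb}\bigr) = 1$$

$$e^{-xb} = \frac{1 - \pi(x)}{\pi(x)}$$

$$xb = \ln\!\left(\frac{\pi(x)}{1 - \pi(x)}\right) = g(x)$$

So the logit g(x) of π(x) is a linear function of the x’s, which is what lets us estimate the β’s with regression methods (for logistic regression the fit is by maximum likelihood, e.g. R’s glm with a binomial family).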
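Inverting the logistic function is what makes the logit linear. A quick R sketch with arbitrary illustrative values of xb: applying the logit to π(x) gives back xb, and base R's `qlogis()` computes the same logit.

```r
# The logit g = ln(pi / (1 - pi)) undoes the logistic transformation
logit <- function(p) log(p / (1 - p))

xb <- c(-2, -0.5, 0, 1.3)   # arbitrary linear-predictor values
p <- plogis(xb)             # pi(x) under the logistic model
g <- logit(p)               # recovers xb
print(cbind(xb, p, g))
```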
ODDS
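• π(x)/(1 − π(x)) is the ODDS that Y = 1 given x

$$\ln\!\left(\frac{1 - \pi(x)}{\pi(x)}\right) = -\ln\!\left(\frac{\pi(x)}{1 - \pi(x)}\right) = -\ln(\text{odds}(x)) = -xb$$

so the log odds, ln(odds(x)) = xb, are linear in the x’s.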
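A worked example of the odds and the sign identity, as a short R sketch with an arbitrary probability π = 0.75: the odds are 0.75/0.25 = 3, ln((1 − π)/π) is the negative of the log odds, and the log odds of plogis(xb) give back xb.

```r
odds <- function(p) p / (1 - p)

p <- 0.75                      # arbitrary illustrative probability
o <- odds(p)                   # 3: Y = 1 is three times as likely as Y = 0
lhs <- log((1 - p) / p)        # ln((1 - pi) / pi)
rhs <- -log(o)                 # -ln(odds)

xb <- 0.4                      # arbitrary linear predictor
log_odds <- log(odds(plogis(xb)))   # equals xb
```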

CASE 1: DICHOTOMOUS x
Data:
Y  X
1  0
0  0
0  1
0  0
1  0
0  1
1  0
0  0
0  1

Contingency table (cell counts):
       Y=0   Y=1
X=0     a     d
X=1     c     b
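Counting the cells a, b, c, d is a cross-tabulation. A minimal R sketch using the nine (Y, X) pairs listed on this slide: `table()` builds the 2×2 table, including the empty X = 1, Y = 1 cell (so b = 0 in this tiny sample).

```r
# The nine (Y, X) pairs from the slide
y <- c(1, 0, 0, 0, 1, 0, 1, 0, 0)
x <- c(0, 0, 1, 0, 0, 1, 0, 0, 1)

tab <- table(X = x, Y = y)     # rows: X = 0, 1; columns: Y = 0, 1
print(tab)

# Cell labels as on the slide: a = (X=0, Y=0), d = (X=0, Y=1),
# c = (X=1, Y=0), b = (X=1, Y=1)
counts <- c(a = tab[["0", "0"]], d = tab[["0", "1"]],
            c = tab[["1", "0"]], b = tab[["1", "1"]])
print(counts)
```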
ODDS

What are the odds of Y = 1 when X = 1?

$$\frac{\pi(1)}{1 - \pi(1)} = \text{odds}(x = 1)$$

ODDS RATIO
Ratio of the odds for Y = 1 when X = 1 to the odds for Y = 1 when X = 0:

       Y=0   Y=1
X=0     a     d
X=1     c     b

$$\text{odds ratio} = \frac{\pi(1)/(1 - \pi(1))}{\pi(0)/(1 - \pi(0))} = \frac{b/c}{d/a} = \frac{ab}{cd}$$

Odds ratios have an easily understood interpretation.
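To make ab/(cd) concrete, here is an R sketch with hypothetical counts (a = 40, d = 10, c = 30, b = 20, not from the course data). It computes the odds ratio both as a ratio of odds and as the cross-product ratio, and checks that b/c really is π(1)/(1 − π(1)) with π(1) = b/(b + c).

```r
# Hypothetical cell counts, labelled as on the slide:
#        Y=0  Y=1
#  X=0    a    d
#  X=1    c    b
counts <- list(a = 40, d = 10, c = 30, b = 20)

odds_x1 <- with(counts, b / c)           # odds of Y = 1 when X = 1
odds_x0 <- with(counts, d / a)           # odds of Y = 1 when X = 0
or_ratio <- odds_x1 / odds_x0            # ratio of the two odds
or_cross <- with(counts, (a * b) / (c * d))   # cross-product form

pi1 <- with(counts, b / (b + c))         # estimated P[Y = 1 | X = 1]
```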
EXAMPLE
• Y = 1 if the baby has low birth weight
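• X = 1 if the mother has frequent prenatal care
• ODDS RATIO: the factor by which the odds of Y = 1 are multiplied when X = 1 rather than X = 0
• “Low birth weight occurs half as often (O.R. = ½) when the mother has adequate prenatal care.” Strictly, O.R. = ½ means the odds are halved; this matches “half as often” only approximately, and only when low birth weight is rare.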
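How close "half the odds" is to "half as often" depends on the baseline risk. A small R sketch with hypothetical baseline risks of low birth weight (2%, 10%, 50%): for a rare outcome halving the odds nearly halves the probability, but for a common one it does not.

```r
odds <- function(p) p / (1 - p)
prob_from_odds <- function(o) o / (1 + o)

# Hypothetical risks without frequent prenatal care
p0 <- c(0.02, 0.10, 0.50)
# Risks with frequent care if the odds ratio is exactly 1/2
p1 <- prob_from_odds(0.5 * odds(p0))
ratio <- p1 / p0               # about 0.505, 0.526, 0.667
print(rbind(p0, p1, ratio))
```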
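With a single dichotomous x, substituting x = 1 and x = 0 into the logistic model gives

$$\frac{\pi(1)/(1 - \pi(1))}{\pi(0)/(1 - \pi(0))} = \frac{\dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}} \Big/ \dfrac{1}{1 + e^{\beta_0 + \beta_1}}}{\dfrac{e^{\beta_0}}{1 + e^{\beta_0}} \Big/ \dfrac{1}{1 + e^{\beta_0}}} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$$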
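A numerical check of this derivation, as an R sketch with a hypothetical β1 = 0.7 and several hypothetical values of β0: whatever β0 is, the ratio of the model's odds at x = 1 and x = 0 equals e^{β1}.

```r
odds <- function(p) p / (1 - p)

beta1 <- 0.7                       # hypothetical slope
beta0 <- c(-3, -1.2, 0, 2)         # several hypothetical intercepts

pi1 <- plogis(beta0 + beta1)       # pi(1)
pi0 <- plogis(beta0)               # pi(0)
or_model <- odds(pi1) / odds(pi0)  # same value for every beta0
print(or_model)
```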
THE MAGIC CONTINUES...
• β1 = ln(O.R.)
• the logit is linear in x
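The same result can be seen from a fitted model. An R sketch with hypothetical 2×2 counts (a = 40, d = 10, c = 30, b = 20) expanded to one row per observation: the slope fitted by `glm()` matches ln(ab/(cd)), and the intercept matches the log odds at x = 0, ln(d/a).

```r
# Hypothetical counts expanded to individual 0/1 records:
# (x=0,y=0) a = 40, (x=0,y=1) d = 10, (x=1,y=0) c = 30, (x=1,y=1) b = 20
x <- rep(c(0, 0, 1, 1), times = c(40, 10, 30, 20))
y <- rep(c(0, 1, 0, 1), times = c(40, 10, 30, 20))

fit <- glm(y ~ x, family = binomial(link = logit))
beta1 <- coef(fit)[["x"]]
or_table <- (40 * 20) / (30 * 10)   # ab / (cd)

c(beta1 = beta1, log_or = log(or_table))
```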

USING R
# weight: 1 = low birth weight; prenatal: 1 = frequent prenatal care (0/1 variables)
G <- glm(formula = weight ~ prenatal,
         family = binomial(link = logit))
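Once G is fitted, `predict()` returns either the linear predictor (`type = "link"`, the logit) or the fitted probability π̂(x) (`type = "response"`), and exp() of the slope is the estimated odds ratio. A sketch assuming a hypothetical data frame `births`, whose counts were chosen so the odds ratio comes out to ½ as in the example quote:

```r
# Hypothetical data: without frequent care 20 of 80 babies have low weight,
# with frequent care 15 of 105 do
births <- data.frame(
  prenatal = rep(c(0, 0, 1, 1), times = c(60, 20, 90, 15)),
  weight   = rep(c(0, 1, 0, 1), times = c(60, 20, 90, 15))
)
G <- glm(formula = weight ~ prenatal,
         family = binomial(link = logit), data = births)

new_mothers <- data.frame(prenatal = c(0, 1))
p_hat   <- predict(G, newdata = new_mothers, type = "response")  # pi-hat(x)
eta_hat <- predict(G, newdata = new_mothers, type = "link")      # logit
or_hat  <- exp(coef(G)[["prenatal"]])                            # about 0.5
```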

DATA

y   x1   x2   x3  x4  x5  marine  army  navy  iraqi
0   200  200  1   1   1   1       0     0     0
0   100  90   1   1   1   1       0     0     0
0   300  50   1   0   0   1       0     0     0
0   200  150  1   0   0   0       0     1     0

• Save out of Excel as a .csv file
• > eof2 <- read.csv(file="e:datafile2.csv", header = TRUE)
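To try `read.csv()` without the original file, a sketch that writes the four rows shown above to a temporary CSV (the full data set has more rows, and the temporary path stands in for e:datafile2.csv) and reads it back with `header = TRUE`, which turns the first line into column names:

```r
# Temporary stand-in for the CSV saved from Excel (only the rows shown)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "y,x1,x2,x3,x4,x5,marine,army,navy,iraqi",
  "0,200,200,1,1,1,1,0,0,0",
  "0,100,90,1,1,1,1,0,0,0",
  "0,300,50,1,0,0,1,0,0,0",
  "0,200,150,1,0,0,0,0,1,0"
), path)

eof2 <- read.csv(file = path, header = TRUE)
str(eof2)   # one numeric column per header field
```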

RESULTS
> g2 <- glm(formula = y ~ x1 + x2 + x3 + x4 + x5,
            family = binomial(link = logit), data = eof2)
> g2

Call: glm(formula = y ~ x1 + x2 + x3 + x4 + x5, family = binomial(link = logit),
    data = eof)

Coefficients:
(Intercept)        x1        x2        x3        x4        x5
   -950.506     3.714    -3.716        NA   951.613   -75.118

Degrees of Freedom: 36 Total (i.e. Null); 32 Residual
Null Deviance: 29.31
Residual Deviance: 3.802    AIC: 13.8
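An NA coefficient, as for x3 here, means glm dropped that column because it is a linear combination of the other columns (aliasing). One way this happens is a constant predictor, which duplicates the intercept; x3 equals 1 in all four rows shown on the DATA slide, though whether that holds for every row can't be confirmed from the slides. A minimal R sketch with simulated, hypothetical data:

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x3 <- rep(1, n)     # constant column, duplicates the intercept
y  <- rbinom(n, size = 1, prob = plogis(0.5 * x1))

fit <- glm(y ~ x1 + x3, family = binomial(link = logit))
coef(fit)           # x3 is reported as NA (aliased with the intercept)
```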

ARMY vs. USMC
> SERV2 <- glm(formula = y ~ marine + army, family = binomial(link=logit),
               data = eof2)
> SERV2

Call: glm(formula = y ~ marine + army, family = binomial(link = logit),
    data = eof2)

Coefficients:
(Intercept)      marine        army
 -1.757e+01   1.577e+01   2.312e-09

Degrees of Freedom: 36 Total (i.e. Null); 34 Residual
Null Deviance: 29.31
Residual Deviance: 28.71    AIC: 34.71
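Coefficients of this size are a warning sign. From the printed values, the fitted P(y = 1) for records with marine = 0 is plogis(−17.57), about 2 × 10⁻⁸, which usually means no y = 1 occurs in that group (separation): the likelihood keeps improving as the intercept heads to −∞, so glm stops wherever its convergence tolerance is met, with a huge standard error. An R sketch with hypothetical data (37 records, y = 1 only among marines) reproduces the pattern:

```r
# Hypothetical data: 20 non-marine records (all y = 0), 17 marines (3 with y = 1)
marine <- rep(c(0, 1), times = c(20, 17))
y <- c(rep(0, 20), rep(c(0, 1), times = c(14, 3)))

fit <- glm(y ~ marine, family = binomial(link = logit))
coef(fit)                   # intercept drifts to a large negative value
summary(fit)$coefficients   # with an enormous standard error
plogis(sum(coef(fit)))      # fitted P(y = 1) for marines: 3/17
```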

AOR
> region2 <- glm(formula = y ~ raleigh + topeka + denver + mobile + oshkosh
                 + eagle, family = binomial(link=logit), data = eof2)
> region2

Call: glm(formula = y ~ raleigh + topeka + denver + mobile + oshkosh +
    eagle, family = binomial(link = logit), data = eof2)

Coefficients:
(Intercept)     raleigh      topeka      denver      mobile     oshkosh       eagle
 -1.957e+01   1.693e+01  -2.086e-08   1.847e+01   3.913e+01  -2.086e-08          NA

Degrees of Freedom: 36 Total (i.e. Null); 31 Residual
Null Deviance: 29.31
Residual Deviance: 20.84    AIC: 32.84
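The printed deviances support a quick likelihood-ratio test of each model against the intercept-only model: the drop in deviance is compared with a chi-squared distribution whose degrees of freedom are the drop in residual degrees of freedom. An R sketch using only the numbers printed on the slides (with the separation seen in these fits, treat the p-values as rough). It also checks that, for a 0/1 response, AIC equals the residual deviance plus twice the number of estimated (non-NA) coefficients:

```r
# Likelihood-ratio test against the intercept-only model:
# (null deviance - residual deviance) ~ chi-squared on (null df - residual df)
lr_pvalue <- function(null_dev, resid_dev, null_df, resid_df) {
  pchisq(null_dev - resid_dev, df = null_df - resid_df, lower.tail = FALSE)
}

# Deviances and degrees of freedom copied from the printed output
p_g2     <- lr_pvalue(29.31, 3.802, 36, 32)  # y ~ x1 + ... + x5 (x3 is NA)
p_serv2  <- lr_pvalue(29.31, 28.71, 36, 34)  # y ~ marine + army
p_region <- lr_pvalue(29.31, 20.84, 36, 31)  # y ~ raleigh + ... + eagle (eagle is NA)

# For a 0/1 response, AIC = residual deviance + 2 * (non-NA coefficients)
aic_g2     <- 3.802 + 2 * 5
aic_serv2  <- 28.71 + 2 * 3
aic_region <- 20.84 + 2 * 6

round(c(g2 = p_g2, serv2 = p_serv2, region = p_region), 4)
```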
