# Basic Introduction

LOGISTIC REGRESSION
Mike Bailey

BASICS

Response
$$Y = \begin{cases} 0 & \text{negative} \\ 1 & \text{positive} \end{cases}$$

Predictors X
• dichotomous
• categorical (usually recoded as dichotomous design variables)
• real-valued

PREDICTING PROBABILITIES

$$E[Y \mid X] = P[Y = 1 \mid X] \equiv \pi(X)$$

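LOGISTIC MODEL

$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots}} = \frac{e^{xb}}{1 + e^{xb}} = \frac{1}{1 + e^{-xb}}$$

where $xb = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots$
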
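As a quick numerical check of the logistic model, the sketch below evaluates π(x) in both algebraic forms and compares them with base R's `plogis`. The coefficients `b0`, `b1` and the inputs `x` are made-up values for illustration only.

```r
# Hypothetical coefficients and inputs, chosen only for illustration
b0 <- -1.5
b1 <- 0.8
x  <- c(-2, 0, 1, 3)

xb <- b0 + b1 * x                  # linear predictor

pi_a <- exp(xb) / (1 + exp(xb))    # first form of the logistic model
pi_b <- 1 / (1 + exp(-xb))         # equivalent second form
pi_c <- plogis(xb)                 # base R's logistic distribution function

print(cbind(x, xb, pi_a, pi_b, pi_c))
```

All three columns should agree, and every value lies strictly between 0 and 1.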
LOGIT FUNCTION

$$\pi(x)\,(1 + e^{-xb}) = 1$$

$$e^{-xb} = \frac{1 - \pi(x)}{\pi(x)}$$

$$xb = \ln\!\left(\frac{\pi(x)}{1 - \pi(x)}\right) = g(x)$$

So, if we can estimate π(x) and take the logit, we have a linear function of the x's. We can use regression to estimate the β's.
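A minimal sketch of that round trip, assuming the logit is the log-odds ln(π/(1−π)): starting from some linear-predictor values, the logistic model gives probabilities, and the logit maps them back onto the original line. The `xb` values are hypothetical.

```r
# Hypothetical linear-predictor values, for illustration only
xb <- c(-3, -0.5, 0, 2)

p <- 1 / (1 + exp(-xb))    # logistic model: probability from xb
g <- log(p / (1 - p))      # logit: log-odds from probability

print(cbind(xb, p, g))     # the g column reproduces xb
```

Base R provides the same transform as `qlogis(p)`.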
ODDS
• π(x)/(1−π(x)) is the ODDS that Y=1 given x

$$\ln\!\left(\frac{1 - \pi(x)}{\pi(x)}\right) = -\ln\!\left(\frac{\pi(x)}{1 - \pi(x)}\right) = -\ln(\text{odds}(x)) = -xb$$

so $\ln(\text{odds}(x)) = xb$.
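For a sense of scale, take an illustrative value π(x) = 0.8 (an assumed number, not taken from any data here):

$$\text{odds}(x) = \frac{0.8}{1 - 0.8} = 4, \qquad \ln(\text{odds}(x)) = \ln 4 \approx 1.386$$

That is, Y=1 is four times as likely as Y=0, and the log-odds of about 1.386 is the value the linear predictor would take at that x.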

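CASE 1: DICHOTOMOUS x

data

| Y | X |
|---|---|
| 1 | 0 |
| 0 | 0 |
| 0 | 1 |
| 0 | 0 |
| 1 | 0 |
| 0 | 1 |
| 1 | 0 |
| 0 | 0 |
| 0 | 1 |

contingency table

|     | Y=0 | Y=1 |
|-----|-----|-----|
| X=0 | a   | d   |
| X=1 | c   | b   |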
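The step from raw data to a contingency table can be sketched with `table()`. The nine (Y, X) pairs below are read off the slide's data column; the cell names follow the slide's a, d, c, b layout, with `c_` used so that base R's `c()` is not masked.

```r
# The nine (Y, X) pairs listed on the slide
dat <- data.frame(
  Y = c(1, 0, 0, 0, 1, 0, 1, 0, 0),
  X = c(0, 0, 1, 0, 0, 1, 0, 0, 1)
)

# Rows are X, columns are Y, matching the layout of the contingency table
tab <- table(X = dat$X, Y = dat$Y)
print(tab)

# Cell labels as used on the slide
a  <- tab["0", "0"]   # X=0, Y=0
d  <- tab["0", "1"]   # X=0, Y=1
c_ <- tab["1", "0"]   # X=1, Y=0
b  <- tab["1", "1"]   # X=1, Y=1
```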
ODDS

What are the odds of Y=1 when X=1?

$$\frac{\pi(1)}{1 - \pi(1)} = \text{odds}(x = 1)$$

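ODDS RATIO

|     | Y=0 | Y=1 |
|-----|-----|-----|
| X=0 | a   | d   |
| X=1 | c   | b   |

The odds ratio is the ratio of the odds for Y=1 when X=1 to the odds for Y=1 when X=0:

$$\frac{\pi(1)\,/\,(1 - \pi(1))}{\pi(0)\,/\,(1 - \pi(0))} = \frac{b/c}{d/a} = \frac{ab}{cd} = \text{odds ratio}$$

Odds ratios have an easily understood interpretation.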
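The odds-ratio formula can be checked numerically. This is a minimal sketch with hypothetical cell counts; only the a, d, c, b layout comes from the slide.

```r
# Hypothetical cell counts, laid out as on the slide:
#        Y=0  Y=1
#  X=0    a    d
#  X=1    c    b
a  <- 40; d <- 20
c_ <- 45; b <- 10     # c_ avoids masking base R's c()

odds_x1 <- b / c_     # odds of Y=1 when X=1
odds_x0 <- d / a      # odds of Y=1 when X=0

odds_ratio <- odds_x1 / odds_x0
print(odds_ratio)

# Same value from the cross-product form ab / (cd)
print((a * b) / (c_ * d))
```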
EXAMPLE
• Y = 1 if the baby has low birth weight
• X = 1 if the mother has frequent prenatal care
• ODDS RATIO: the factor by which the odds of Y=1 change when X=1 rather than X=0
• “The odds of low birth weight are halved (O.R. = ½) when the mother has adequate prenatal care.”
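An odds ratio of ½ halves the odds, which is close to, but not exactly, halving the probability. The sketch below converts between the two under an assumed baseline probability of 0.20; that baseline is illustrative and not part of the example.

```r
# Assumed baseline: P[Y=1 | X=0] = 0.20 (illustrative, not from the slides)
p0         <- 0.20
odds_ratio <- 0.5               # O.R. quoted in the example

odds0 <- p0 / (1 - p0)          # odds of low birth weight when X=0
odds1 <- odds_ratio * odds0     # odds of low birth weight when X=1
p1    <- odds1 / (1 + odds1)    # back to a probability

print(c(p0 = p0, p1 = p1, ratio_of_probabilities = p1 / p0))
```

The rarer the outcome, the closer the ratio of probabilities gets to the odds ratio.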
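Under the logistic model with a single dichotomous x, setting x = 1 in the numerator and x = 0 in the denominator:

$$\frac{\pi(1)\,/\,(1 - \pi(1))}{\pi(0)\,/\,(1 - \pi(0))} = \frac{\dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}} \Big/ \dfrac{1}{1 + e^{\beta_0 + \beta_1}}}{\dfrac{e^{\beta_0}}{1 + e^{\beta_0}} \Big/ \dfrac{1}{1 + e^{\beta_0}}} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$$
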
THE MAGIC CONTINUES...
• β1 = ln(O.R.)
• the logit is linear in x

USING R
G <- glm(formula = weight ~ prenatal,
family = binomial(link = logit) )
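A runnable sketch of this call on hypothetical data, showing that exp(β1) from the fit matches the odds ratio computed directly from the 2×2 counts. The data frame `births` and its counts are invented for illustration; only the variable names `weight` and `prenatal` come from the call above.

```r
# Hypothetical counts; variable names follow the slide's formula
#   prenatal = 0: 20 low-weight births, 40 normal
#   prenatal = 1: 10 low-weight births, 45 normal
births <- data.frame(
  weight   = c(rep(1, 20), rep(0, 40), rep(1, 10), rep(0, 45)),
  prenatal = c(rep(0, 60), rep(1, 55))
)

G <- glm(weight ~ prenatal, family = binomial(link = logit), data = births)

print(coef(G))
print(exp(coef(G)["prenatal"]))   # estimated odds ratio

# Odds ratio straight from the 2x2 counts: (10/45) / (20/40)
print((10 / 45) / (20 / 40))
```

Passing the variables through `data =` keeps the fit independent of whatever happens to be in the workspace.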

DATA

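| y | x1  | x2  | x3 | x4 | x5 | marine | army | navy | iraqi |
|---|-----|-----|----|----|----|--------|------|------|-------|
| 0 | 200 | 200 | 1  | 1  | 1  | 1      | 0    | 0    | 0     |
| 0 | 100 | 90  | 1  | 1  | 1  | 1      | 0    | 0    | 0     |
| 0 | 300 | 50  | 1  | 0  | 0  | 1      | 0    | 0    | 0     |
| 0 | 200 | 150 | 1  | 0  | 0  | 0      | 0    | 1    | 0     |

• Save out of Excel as a .csv file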
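Loading such a file into R might look like the sketch below. To stay self-contained it first writes the four rows shown above to a temporary file; the file path is hypothetical, and the data frame name `eof` is borrowed from the `data =` argument in the fitted calls.

```r
# Write the four rows shown above to a temporary .csv so the sketch is
# self-contained; in practice this file would come from Excel.
csv_path <- tempfile(fileext = ".csv")   # hypothetical file location
writeLines(c(
  "y,x1,x2,x3,x4,x5,marine,army,navy,iraqi",
  "0,200,200,1,1,1,1,0,0,0",
  "0,100,90,1,1,1,1,0,0,0",
  "0,300,50,1,0,0,1,0,0,0",
  "0,200,150,1,0,0,0,0,1,0"
), csv_path)

eof <- read.csv(csv_path, header = TRUE)
str(eof)   # one numeric column per header name
```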

RESULTS
> g2 <- glm(formula = y ~ x1 + x2 + x3 + x4 + x5, family = binomial(link = logit), data = eof)
> g2

Call: glm(formula = y ~ x1 + x2 + x3 + x4 + x5, family = binomial(link = logit), data = eof)

Coefficients:
(Intercept) x1 x2 x3 x4 x5
-950.506 3.714 -3.716 NA 951.613 -75.118

Null Deviance: 29.31
Residual Deviance: 3.802 AIC: 13.8
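The printed summary numbers can be tied together with a little arithmetic. For an ungrouped 0/1 response the residual deviance equals −2 × log-likelihood, so AIC = residual deviance + 2k, where k is the number of estimated coefficients. The sketch assumes k = 5 here because x3 is reported as NA.

```r
# Numbers copied from the printed fit above
null_deviance     <- 29.31
residual_deviance <- 3.802
k                 <- 5      # (Intercept), x1, x2, x4, x5; x3 is NA

aic <- residual_deviance + 2 * k
print(aic)                  # compare with the printed AIC

# Drop in deviance relative to the intercept-only model
print(null_deviance - residual_deviance)
```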

ARMY vs. USMC
> SERV2 <- glm(formula = y ~ marine + army, family = binomial(link=logit),
data = eof2)
> SERV2

Call: glm(formula = y ~ marine + army, family = binomial(link = logit), data = eof2)

Coefficients:
(Intercept) marine army
-1.757e+01 1.577e+01 2.312e-09

Degrees of Freedom: 36 Total (i.e. Null); 34 Residual
Null Deviance: 29.31
Residual Deviance: 28.71 AIC: 34.71
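To read these coefficients on the probability scale, push each group's linear predictor through the logistic function. This sketch uses only the printed coefficients; treating the rows with marine = 0 and army = 0 as a single reference group is an assumption about how the indicators were coded.

```r
# Coefficients copied from the printed SERV2 fit
b <- c(intercept = -1.757e+01, marine = 1.577e+01, army = 2.312e-09)

# Linear predictor for each group; the reference group has both
# indicators equal to 0
xb <- c(
  reference = b[["intercept"]],
  marine    = b[["intercept"]] + b[["marine"]],
  army      = b[["intercept"]] + b[["army"]]
)

p_hat <- plogis(xb)    # 1 / (1 + exp(-xb))
print(signif(p_hat, 3))
```

Fitted probabilities that are essentially zero often indicate a group with no positive responses in the sample, in which case the very large coefficients are best read with caution.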

AOR
> region2 <- glm(formula = y ~ raleigh + topeka + denver + mobile + oshkosh
+ eagle, family = binomial(link=logit), data = eof2)
> region2

Call: glm(formula = y ~ raleigh + topeka + denver + mobile + oshkosh + eagle, family = binomial(link = logit), data = eof2)

Coefficients:
(Intercept) raleigh topeka denver mobile oshkosh eagle
-1.957e+01 1.693e+01 -2.086e-08 1.847e+01 3.913e+01 -2.086e-08 NA

Degrees of Freedom: 36 Total (i.e. Null); 31 Residual
Null Deviance: 29.31
Residual Deviance: 20.84 AIC: 32.84
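One plausible reason for the NA on `eagle`, offered as an assumption rather than something the output confirms: if the region indicators are mutually exclusive and cover every row, they sum to 1 and duplicate the intercept, so the last one cannot be estimated. The printed degrees of freedom (36 total, 31 residual) are at least consistent with six estimated coefficients rather than seven. The sketch reproduces the effect on hypothetical data with made-up indicator names `r1`, `r2`, `r3`.

```r
# Hypothetical data: three mutually exclusive regions, coded as one
# indicator column each (names r1, r2, r3 are made up)
region <- rep(c("r1", "r2", "r3"), each = 10)
toy <- data.frame(
  y  = rep(c(0, 1, 0, 1, 1, 0, 0, 1, 0, 0), times = 3),
  r1 = as.numeric(region == "r1"),
  r2 = as.numeric(region == "r2"),
  r3 = as.numeric(region == "r3")
)

# r1 + r2 + r3 equals 1 on every row, duplicating the intercept column
fit <- glm(y ~ r1 + r2 + r3, family = binomial(link = logit), data = toy)
print(coef(fit))       # the redundant indicator comes back as NA

# Dropping one indicator makes it the reference level
fit_ok <- glm(y ~ r1 + r2, family = binomial(link = logit), data = toy)
print(coef(fit_ok))
```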

