Introduction to Logit Regression
Introduction and Description

• Why use logistic regression?
• Estimation by maximum likelihood
• Interpreting coefficients
• Hypothesis testing
• Evaluating the performance of the model
Why use logistic regression?
• Many important research topics have a "limited" dependent variable.
• For example, data on voting, mortality, buying decisions and participation are neither continuous nor normally distributed.
• Binary logistic regression is a type of regression analysis in which the dependent variable is a dummy variable, coded 0 (did not vote) or 1 (did vote), or coded 0 = no buying intention and 1 = buying intention.
The Linear Probability Model

In OLS regression:

$$Y = \alpha + \beta X + e, \qquad Y \in \{0, 1\}$$

• The error terms are heteroskedastic: Var(e | X) = p(1 - p), which changes with X.
• e is not normally distributed because Y takes only two values.
• The predicted probabilities can be greater than 1 or less than 0.
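To make the last point concrete, here is a minimal Python sketch that fits the linear probability model by OLS to a small binary dataset. The values of `x` and `y` are hypothetical, chosen only for illustration, and `ols_fit` is a helper written for this example.

```python
# Linear probability model: OLS of a 0/1 outcome on one predictor.
# The data below are hypothetical, chosen only for illustration.

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # binary dependent variable

a, b = ols_fit(x, y)
fitted = [a + b * xi for xi in x]

# Fitted "probabilities" at the extremes fall outside [0, 1].
print(f"fitted at x=1: {fitted[0]:.3f}")   # prints -0.167
print(f"fitted at x=8: {fitted[-1]:.3f}")  # prints 1.167
```

A straight line has no upper or lower bound, so as X moves away from its mean the fitted values eventually leave the [0, 1] range that a probability requires.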
Data Example
No.  Buying intention   Education   Working   Income
     (1=yes; 0=no)      (years)     (years)   ($/month)
1    1                  16          9         2000
2    0                  18          4         1000
3    1                  16          8         1700
4    0                  18          3         900
5    0                  12          5         1400
...  ...                ...         ...       ...


Plain old regression

• Y = a binary response (dependent variable, DV)
  - 1 = positive response (Coca-buying), with probability P
  - 0 = negative response (Pepsi-buying), with probability Q = 1 - P
• MEAN(Y) = P, the observed proportion of Coca-buying
• VAR(Y) = PQ, which is maximized at P = 0.50; the variance depends on the mean (P)
• X_j = any type of predictor: continuous, dichotomous (two categories) or polytomous (more than two categories)
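Applying MEAN(Y) and VAR(Y) to the buying-intention column of the data example gives a quick check of these formulas. This sketch uses only the five rows shown; the remaining rows are elided in the source.

```python
# Mean and variance of a binary (0/1) dependent variable.
# Only the five rows shown in the data example are used; the rest are elided.
buying = [1, 0, 1, 0, 0]

n = len(buying)
P = sum(buying) / n                                 # MEAN(Y) = P
Q = 1 - P
var_y = sum((yi - P) ** 2 for yi in buying) / n     # VAR(Y), dividing by n
print(f"P = {P:.2f}, VAR(Y) = {var_y:.2f}, PQ = {P * Q:.2f}")
# prints P = 0.40, VAR(Y) = 0.24, PQ = 0.24

# PQ as a function of P: largest at P = 0.50
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
pq = {p: p * (1 - p) for p in grid}
print(max(pq, key=pq.get))
```

The computed variance equals PQ, and scanning PQ over a few values of P shows it peaks at P = 0.50, which is why the variance of Y cannot be separated from its mean.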
The Logistic Regression Model
The "logit" model solves these problems:

$$\ln\left[\frac{p}{1-p}\right] = \ln\left[\frac{P(Y=1 \mid X)}{1 - P(Y=1 \mid X)}\right] = \alpha + \beta X + e$$

• p is the probability that the event Y occurs, p = P(Y = 1 | X)
• p/(1-p) is the "odds"
• ln[p/(1-p)] is the log odds, or "logit"
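A small numerical sketch of the odds and the logit for a few probabilities (the helper names `odds` and `logit` are chosen for this example):

```python
import math

def odds(p):
    """Odds of the event: p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Log odds: ln[p / (1 - p)]."""
    return math.log(odds(p))

for p in (0.2, 0.5, 0.8):
    print(f"p={p}: odds={odds(p):.3f}, logit={logit(p):.3f}")
```

For p = 0.8 the odds are 4 (4 to 1) and the logit is about 1.386; p = 0.2 is the mirror image, with odds of 0.25 and a logit of about -1.386. At p = 0.5 the odds are 1 and the logit is 0. The logit therefore ranges over the whole real line, which is what allows it to be modelled as a linear function of X.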
The logistic function
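The logistic function is the inverse of the logit: solving ln[p/(1-p)] = z for p gives p = 1 / (1 + e^(-z)), which maps any real z into (0, 1). A minimal sketch, in which the coefficients `a` and `b` are hypothetical:

```python
import math

def logistic(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for ln[p/(1-p)] = a + b*X
a, b = -4.0, 1.0
for x in range(0, 9, 2):
    p = logistic(a + b * x)
    print(f"X={x}: p={p:.3f}")
```

The printed probabilities rise along an S-shaped curve, pass 0.5 where a + bX = 0 (here X = 4), and never reach 0 or 1, unlike the fitted line of the linear probability model.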
The Logit – Multiple predictors

$$\ln\left[\frac{P(Y=1 \mid X)}{1 - P(Y=1 \mid X)}\right] = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + e$$
Menu path (SPSS): Analyze/Regression/Binary Logistic
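The coefficients are estimated by maximum likelihood. The sketch below is not the software's own implementation; it is a plain Newton-Raphson fit for a single predictor, written in Python on hypothetical data, to show what the estimation step involves. If a predictor perfectly separates the 0s from the 1s, the maximum-likelihood estimate does not exist and these iterations diverge.

```python
import math

def fit_logit(x, y, iters=25):
    """Maximum-likelihood fit of ln[p/(1-p)] = a + b*x by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Score (gradient of the log-likelihood) and information matrix.
        ga = gb = 0.0
        haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            ga += yi - p
            gb += (yi - p) * xi
            haa += w
            hab += w * xi
            hbb += w * xi * xi
        # Newton step: solve [haa hab; hab hbb] * step = [ga; gb].
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

# Hypothetical data: 0s and 1s overlap, so the ML estimate exists.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_logit(x, y)
print(f"intercept = {a:.3f}, slope = {b:.3f}")
print(f"odds ratio for a one-unit increase in x = {math.exp(b):.3f}")
```

At convergence the score equations hold: the residuals y - p sum to zero, and so does their product with x. A fixed iteration count keeps the sketch short; a production routine would instead stop when the change in the log-likelihood or coefficients falls below a tolerance.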
Discussion
• The score statistic shows that the coefficient of Size is significantly different from zero, so there is a relationship between the dependent variable (Coca-buying vs. Pepsi-buying) and the independent variable (Size).
• The pseudo-R² values of Cox & Snell and Nagelkerke are 0.474 and 0.632 respectively. Unlike R² in OLS, they are not proportions of explained variance; loosely, the predictors account for about 47.4% (Cox & Snell) to 63.2% (Nagelkerke) of the variation in the dependent variable.
• Therefore, other parameters also need to be considered.
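Both pseudo-R² statistics are computed from the log-likelihood of the fitted model, ln L_M, and of the intercept-only model, ln L_0, for n cases. Cox & Snell's R² is 1 - (L_0/L_M)^(2/n); because it cannot reach 1, Nagelkerke divides it by its maximum possible value, 1 - L_0^(2/n). A minimal sketch in which the log-likelihood values are hypothetical, not taken from the source output:

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """Cox & Snell and Nagelkerke R^2 from log-likelihoods.

    ll_null:  log-likelihood of the intercept-only model, ln L0
    ll_model: log-likelihood of the fitted model, ln LM
    n:        number of cases
    """
    # Cox & Snell: 1 - (L0 / LM)^(2/n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest value Cox & Snell can take: 1 - L0^(2/n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke

# Hypothetical example: n = 100 cases split 50/50, so ln L0 = 100 * ln(0.5).
n = 100
ll_null = n * math.log(0.5)
ll_model = -40.0
cs, nk = pseudo_r2(ll_null, ll_model, n)
print(f"Cox & Snell = {cs:.3f}, Nagelkerke = {nk:.3f}")
```

Because Nagelkerke's value is Cox & Snell's divided by a number below 1, it is always the larger of the two, as with 0.632 vs. 0.474 above. With an evenly split outcome the divisor is exactly 0.75; the reported pair (0.474 / 0.632 ≈ 0.75) is consistent with a roughly even split, although the source does not state this.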
