University of Abomey-Calavi
(UAC)
(FAS)
Group 6
Members: Boris BEHINGAN, Auric DJENONTIN, Elisé TOHO
Lecturer: Dr. Ir. Epiphane SODJINOU, Agricultural Economist, Biostatistician
July 2016
Outline
Introduction
1- Logit model
1-1- Principles
1-2- Estimation of the Logit Model
1-3- Steps in estimating Logit Regression
2- Probit model
2-1- Assumption of the model
2-2- Steps involved in estimation of Probit Model
3- Logit versus probit
4- Application in R
Conclusion
References
Introduction
There are certain types of regression models in which the dependent (response) variable is
dichotomous in nature, taking the value 1 or 0. Special estimation methods are associated with
such models. The most commonly used approaches to estimating them are the linear
probability model, the logit model and the probit model. Here we develop only the logit
and probit models. We first explain the theoretical aspects of logit and probit regression,
then show their application in R.
1- Logit model
1-1- Principles
Logit regression (logit) analysis is a univariate or multivariate technique for estimating the
probability that an event occurs, by predicting a binary dependent outcome from a set
of independent variables. In an example of home ownership, where the dependent variable is
owning a house or not in relation to income, the linear probability model can be written as:
$$P_i = E(Y = 1 \mid X_i) = \beta_1 + \beta_2 X_i$$
where $X_i$ is the income and $Y = 1$ means that the family owns a house.
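The logit model instead represents this probability as:
$$P_i = E(Y = 1 \mid X_i) = \frac{1}{1 + \exp[-(\beta_1 + \beta_2 X_i)]} = \frac{1}{1 + \exp(-Z_i)} \quad (1)$$
where $Z_i = \beta_1 + \beta_2 X_i$.
Equation (1) is known as the (cumulative) logistic distribution function. As $Z_i$ ranges from
$-\infty$ to $+\infty$, $P_i$ ranges between 0 and 1.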
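As a rough numerical illustration of equation (1), the sketch below evaluates the logistic function on a grid of incomes. The coefficient values `beta1` and `beta2` and the income grid are assumptions chosen only for illustration; they are not estimates from this report.

```r
# Logistic CDF from equation (1): P = 1 / (1 + exp(-Z)), with Z = beta1 + beta2 * X.
# beta1, beta2 and the income grid are illustrative assumptions, not estimates.
beta1 <- -4
beta2 <- 0.1
income <- seq(0, 100, by = 10)

z <- beta1 + beta2 * income
p_manual <- 1 / (1 + exp(-z))   # equation (1) written out
p_builtin <- plogis(z)          # the same function, provided by base R

# P stays strictly between 0 and 1 and rises with income
print(data.frame(income = income, z = z, p = round(p_manual, 4)))
```

`plogis` is the logistic CDF in base R, so the manual formula and the built-in should agree up to rounding.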
$P_i$, the probability of owning a house, is given by $\frac{1}{1 + \exp(-Z_i)}$. The probability of
not owning a house is then $1 - P_i = \frac{1}{1 + \exp(Z_i)}$.
We can therefore define the odds ratio in favour of owning a house as:
$$\frac{P_i}{1 - P_i} = \frac{1 + \exp(Z_i)}{1 + \exp(-Z_i)} = \exp(Z_i) \quad (2)$$
Taking the natural log of (2), we obtain the logit $L_i$:
$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i = \beta_1 + \beta_2 X_i \quad (3)$$
- As P goes from 0 to 1, the logit L goes from $-\infty$ to $+\infty$. That is, although the
probabilities lie between 0 and 1, the logits are not bounded.
- Although L is linear in X, the probabilities themselves are not.
- The interpretation of the logit model is as follows: the slope $\beta_2$ measures the change
in L for a unit change in X. It tells how the log odds in favour of owning a house
change as income changes by one unit. The intercept $\beta_1$ is the value of the log odds in
favour of owning a house if income is zero.
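The points above can be checked numerically with a small sketch. The coefficients and income values are illustrative assumptions only. It shows that the odds equal exp(Z), that the logit equals Z, and that a one-unit change in income shifts the logit by the slope at any income level while the change in probability depends on the level.

```r
# Illustrative (assumed) coefficients and two pairs of adjacent income levels
beta1 <- -4
beta2 <- 0.1
income <- c(20, 21, 40, 41)

z <- beta1 + beta2 * income
p <- plogis(z)              # probability of owning a house
odds <- p / (1 - p)         # odds in favour of owning a house
logit <- log(odds)          # log odds

# Effect of one extra unit of income at a low and at a higher income level
diff_logit <- c(logit[2] - logit[1], logit[4] - logit[3])
diff_p <- c(p[2] - p[1], p[4] - p[3])

print(diff_logit)   # both equal to beta2: L is linear in X
print(diff_p)       # not equal: P is not linear in X
```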
$$\hat{L}_i = \ln\left[\frac{\hat{P}_i}{1 - \hat{P}_i}\right] = Z_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$$
Step 2
For each $X_i$, obtain the logit as $\hat{L}_i = \ln[\hat{P}_i / (1 - \hat{P}_i)]$.
Step 3
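Transform the logit regression as follows:
$$\sqrt{W_i}\, L_i = \beta_1 \sqrt{W_i} + \beta_2 \sqrt{W_i}\, X_i + \sqrt{W_i}\, U_i \quad (4)$$
where $W_i = N_i \hat{P}_i (1 - \hat{P}_i)$ is the weight and $U_i$ is the disturbance term, which is
heteroscedastic before the transformation.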
Step 4
Estimate (4) by OLS
Step 5
Establish confidence intervals and/or test hypotheses in the usual OLS framework.
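The steps above can be sketched in R on grouped data. The data below are invented purely for illustration, and the weight is assumed to be the usual grouped-logit weight N_i * P_i * (1 - P_i). This is a minimal sketch under those assumptions, not the computation used in this report.

```r
# Hypothetical grouped data (invented for illustration, not from this report)
X <- c(5, 10, 15, 20, 25, 30)     # income level
N <- c(50, 60, 80, 70, 60, 40)    # families observed at each income level
n <- c(6, 12, 28, 35, 40, 32)     # families owning a house at each level

p_hat <- n / N                        # estimated probability at each level
L_hat <- log(p_hat / (1 - p_hat))     # Step 2: estimated logit
w <- N * p_hat * (1 - p_hat)          # Step 3: assumed weights
sw <- sqrt(w)

# Step 4: OLS on the transformed variables; sqrt(w) replaces the intercept column
y_star <- sw * L_hat
x_star <- sw * X
fit_ols <- lm(y_star ~ 0 + sw + x_star)

# The same estimates obtained directly as weighted least squares
fit_wls <- lm(L_hat ~ X, weights = w)

# Step 5: confidence intervals in the usual OLS framework
print(coef(fit_ols))
print(confint(fit_ols))
```

Running OLS on the transformed variables and running `lm` with `weights = w` are two ways of writing the same weighted least squares fit, so the coefficients should coincide.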
2- Probit model
In order to explain the behavior of a dichotomous dependent variable we have to use a
suitably chosen cumulative distribution function (CDF). The logit model uses the
cumulative logistic function, but this is not the only CDF that one can use. In some
applications the normal CDF has been found useful. The estimating model that emerges
from the normal CDF is known as the probit model.
Let us assume that in the home ownership example, the decision of the ith family to own a
house or not depends on an unobservable utility index $I_i$ that is determined by the
explanatory variables in such a way that the larger the value of the index $I_i$, the greater the
probability of the family owning a house. The index can be expressed as $I_i = \beta_1 + \beta_2 X_i$,
where $X_i$ is the income of the ith family.
In probit analysis, the unobservable utility index $I_i$ is known as the normal equivalent deviate
(n.e.d.) or simply normit. Since the n.e.d. $I_i$ is negative whenever $P_i < 0.5$, in practice
the number 5 is added to the n.e.d., and the result is called the probit:
Probit = n.e.d. + 5 = $I_i + 5$
$$I_i = \beta_1 + \beta_2 X_i + U_i \quad (6)$$
2-2- Steps involved in estimation of Probit Model
Step 1
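Compute the estimated probability of owning a house for each income level $X_i$, as in the case of
the logit model: $\hat{P}_i = n_i / N_i$, where $N_i$ is the number of families at income level $X_i$
and $n_i$ is the number among them owning a house.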
Step 2
Given $\hat{P}_i$, obtain the n.e.d. $I_i$ from the standard normal CDF, with $I_i = \beta_1 + \beta_2 X_i + U_i$.
Step 3
Add 5 to the estimated $I_i$ to convert them into probits, and use the probits thus obtained as the
dependent variable in (6).
Step 4
The disturbance term is heteroscedastic, as in the logit model. In order to get efficient
estimates, one has to transform the model.
Step 5
Estimate (6) by OLS
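The probit steps can be sketched in the same way. The grouped data are again invented for illustration. The weight used for the transformation, N_i * f_i^2 / (P_i * (1 - P_i)) with f_i the standard normal density at the n.e.d., is an assumption taken from the standard grouped-probit setup; the report itself does not state the weight.

```r
# Hypothetical grouped data (invented for illustration, not from this report)
X <- c(5, 10, 15, 20, 25, 30)     # income level
N <- c(50, 60, 80, 70, 60, 40)    # families observed at each income level
n <- c(6, 12, 28, 35, 40, 32)     # families owning a house at each level

p_hat <- n / N                    # Step 1: estimated probabilities
ned <- qnorm(p_hat)               # Step 2: n.e.d. from the inverse standard normal CDF
probit <- ned + 5                 # Step 3: add 5 to obtain the probits

# Step 4: assumed weights to correct for heteroscedasticity
f <- dnorm(ned)
w <- N * f^2 / (p_hat * (1 - p_hat))

# Step 5: weighted least squares on the probits and, for comparison, on the n.e.d.
fit_probit <- lm(probit ~ X, weights = w)
fit_ned <- lm(ned ~ X, weights = w)

print(coef(fit_probit))
print(coef(fit_ned))
```

Adding 5 only shifts the intercept: the slope is the same whether the probits or the n.e.d. values are used as the dependent variable.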
3- Logit versus probit
Figure 1: Logit and probit trend
Is logit better than probit, or vice versa? Both methods yield similar results. Preference
for probit or logit tends to vary by discipline: logit is more popular in health sciences
such as epidemiology, while probit is popular in econometrics and is used by economists and
political scientists.
Although logit and probit models give qualitatively similar results, the parameter estimates
of the two models are not directly comparable. To make the β of the two models comparable,
there is an approximate relationship: multiply the probit β by about 1.81 and it will be
approximately equal to the logit β.
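The comparison can be explored with a small simulation. Everything in the sketch below (sample size, true coefficients, seed) is an assumption made for illustration. It fits both models to the same simulated data, then compares the slope ratio and the fitted probabilities. The ratio obtained this way is only expected to fall in the neighbourhood of the rule of thumb, not to match it exactly.

```r
# Simulated data (assumed design, for illustration only)
set.seed(123)
n_obs <- 2000
x <- rnorm(n_obs)
y <- rbinom(n_obs, 1, plogis(0.5 + 1.0 * x))

# Same data, two link functions
logit_fit <- glm(y ~ x, family = binomial(link = "logit"))
probit_fit <- glm(y ~ x, family = binomial(link = "probit"))

# Ratio of the logit slope to the probit slope
ratio <- coef(logit_fit)[["x"]] / coef(probit_fit)[["x"]]

# Largest gap between the fitted probabilities of the two models
max_gap <- max(abs(fitted(logit_fit) - fitted(probit_fit)))

print(ratio)
print(max_gap)
```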
4- Application in R
The command used to perform logit or probit analysis is the function glm, available in R. The
following syntax shows how to run it.
mydata<-read.table("Poids.txt",header=TRUE)
mydata is the name of the data set containing y, x1, x2 and x3, where y is the dependent
variable, which is dichotomous and takes the values 0 and 1, and x1, x2 and x3 are the
explanatory variables.
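The model call itself is not reproduced in the text, so the following is only a plausible sketch of it. It assumes the formula y ~ x1 + x2 + x3 and uses a simulated stand-in data frame, here named `mydata_sim` (a hypothetical name), so that the example runs without the file Poids.txt. With the real data, `mydata_sim` would be replaced by `mydata`.

```r
# Simulated stand-in for the real data (assumption: 70 rows, columns y, x1, x2, x3)
set.seed(1)
n_obs <- 70
mydata_sim <- data.frame(
  x1 = rnorm(n_obs),
  x2 = rnorm(n_obs),
  x3 = rbinom(n_obs, 1, 0.5)
)
mydata_sim$y <- rbinom(n_obs, 1, plogis(-0.5 + 0.3 * mydata_sim$x1 + 0.7 * mydata_sim$x3))

# Logit model: binomial family with the logit link
logit_fit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = mydata_sim)

# Probit model: same call with the probit link
probit_fit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "probit"), data = mydata_sim)

# summary() prints the coefficient table, deviances and AIC
print(summary(logit_fit))
```

Because the stand-in data are simulated, the printed estimates will not match those reported in this section.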
# Model
Call:
Deviance Residuals:
Coefficients:
.
x3 0.7512 0.4548 1.652 0.0986
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 70.056 on 69 degrees of freedom
Residual deviance: 65.512 on 66 degrees of freedom
AIC: 73.512
Number of Fisher Scoring iterations: 5
- The Pr(>|z|) column shows the two-tailed p-values testing the null hypothesis that the
coefficient is equal to zero (no effect). The usual significance level is 0.05; by this
measure none of the coefficients has a significant effect on the log odds of the
dependent variable. The coefficient for x3 is significant at the 10% level (p = 0.0986 < 0.10).
- The z value is the test statistic for the same null hypothesis that the coefficient is equal to zero.
- The Estimate column shows the coefficients. When x3 increases by one unit, the
expected change in the log odds is 0.7512. What you get from this column is whether
the effect of each predictor is positive or negative.
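The figures reported for x3 can be cross-checked by hand. The sketch below uses only numbers printed in the output above. The parameter count of 4 is an assumption inferred from the degrees of freedom (69 - 66 = 3 predictors, plus the intercept).

```r
# Values copied from the printed output for x3
estimate <- 0.7512
std_error <- 0.4548

# z value: estimate divided by its standard error
z_value <- estimate / std_error

# Two-tailed p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_value))

# AIC = residual deviance + 2 * number of estimated parameters (assumed to be 4)
aic <- 65.512 + 2 * 4

# Likelihood-ratio statistic: drop in deviance from the null model, on 69 - 66 = 3 df
lr_stat <- 70.056 - 65.512
lr_p_value <- pchisq(lr_stat, df = 3, lower.tail = FALSE)

print(round(c(z = z_value, p = p_value, AIC = aic), 4))
print(c(LR = lr_stat, p = lr_p_value))
```

The recomputed z value, p-value and AIC should agree with the printed 1.652, 0.0986 and 73.512 up to rounding.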
# Here it is the sign of the coefficients that matters: it shows whether y and x move in the
same direction. We also need to check the significance of each coefficient. In this example
only x3 is significant at the 10% level.
# With the package mfx we can get the odds ratios by using the following command
library(mfx)
And we get
Call:
Odds Ratio:
.
x3 2.11957 0.96405 1.6516 0.09861
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# We have seen that only x3 is significant at the 10% level, so we focus the interpretation of
the odds ratio on x3. When x3 increases by one unit, the odds of y = 1 increase by 112%, that is
(2.12 − 1) × 100. Equivalently, the odds of y = 1 are multiplied by 2.12 when x3 increases by
one unit (keeping all other predictors constant).
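The odds ratio for x3 follows directly from the logit coefficient, as the short check below shows using only the numbers printed above. The standard error line assumes a delta-method approximation (odds ratio times the coefficient's standard error), which is consistent with the printed value but is not confirmed by the text.

```r
# Logit coefficient and standard error for x3, copied from the glm output
estimate <- 0.7512
std_error <- 0.4548

# Odds ratio: exponential of the logit coefficient
odds_ratio <- exp(estimate)

# Percentage change in the odds of y = 1 for a one-unit increase in x3
pct_change <- (odds_ratio - 1) * 100

# Assumed delta-method standard error of the odds ratio
or_std_error <- odds_ratio * std_error

print(round(c(OR = odds_ratio, pct = pct_change, SE = or_std_error), 3))

# With a fitted glm object, exp(coef(fit)) would give the odds ratios for all predictors.
```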
Conclusion
Binary models are used when the dependent (response) variable is dichotomous. Logit and
probit are the models used in this case. They are similar, and the choice between them
depends on the discipline.
References
Wooldridge, J. M., 2002. Econometric Analysis of Cross Section and Panel Data. pp. 453-460.