
Republic of Benin

University of Abomey-Calavi

(UAC)

Faculty of Agronomic Sciences

(FAS)

MASTER IN STATISTICS, ORIENTATION: BIOSTATISTICS

BINARY MODEL: LOGIT AND PROBIT

Group 6

Members: Boris BEHINGAN, Auric DJENONTIN, Elisé TOHO
Lecturer: Dr. Ir. Epiphane SODJINOU (Agricultural Economist, Biostatistician)

July 2016
Outline
Introduction
1- Logit model
1-1- Principles
1-2- Estimation of the Logit Model
1-3- Steps in estimating Logit Regression
2- Probit model
2-1- Assumption of the model
2-2- Steps involved in estimation of Probit Model
3- Logit versus probit
4- Application in R
Conclusion
References

Introduction
There are certain types of regression models in which the dependent or response variable is
dichotomous in nature, taking a value of 1 or 0. Special estimation methods are associated with
such models. The most commonly used approaches to estimating them are the linear
probability model, the logit model and the probit model. Here we develop only the logit
and probit models: we first explain the theory behind logit and probit regression, and then
show their application in R.

1- Logit model
1-1- Principles
Logit regression (logit) analysis is a univariate or multivariate technique for estimating the
probability that an event occurs, by predicting a binary dependent outcome from a set
of independent variables. In an example of home ownership, where the dependent variable is
owning a house or not in relation to income, the linear probability model can be written as:

$$P_i = E(Y = 1 \mid X_i) = \beta_1 + \beta_2 X_i$$

Where X is the income and Y=1 means that the family owns a house.

Let us consider the following representation of home ownership:

$$P_i = E(Y = 1 \mid X_i) = \frac{1}{1 + \exp[-(\beta_1 + \beta_2 X_i)]} = \frac{1}{1 + \exp(-Z_i)} \qquad (1)$$

where $Z_i = \beta_1 + \beta_2 X_i$.

Equation (1) is known as the (cumulative) logistic distribution function. Here $Z_i$ ranges from
$-\infty$ to $+\infty$, while $P_i$ ranges between 0 and 1.

$P_i$, the probability of owning a house, is given by $P_i = \frac{1}{1 + \exp(-Z_i)}$. The probability of
not owning a house is then $1 - P_i = \frac{1}{1 + \exp(Z_i)}$.

We can therefore define the odds ratio in favour of owning a house as

$$\frac{P_i}{1 - P_i} = \frac{1 + \exp(Z_i)}{1 + \exp(-Z_i)} = \exp(Z_i) \qquad (2)$$

Taking the natural log of (2) we can obtain the Logit L which is:

$$L_i = \ln\left[\frac{P_i}{1 - P_i}\right] = Z_i = \beta_1 + \beta_2 X_i \qquad (3)$$

- As $P$ goes from 0 to 1, the logit $L$ goes from $-\infty$ to $+\infty$. That is, although the
probabilities lie between 0 and 1, the logits are not bounded.
- Although $L$ is linear in $X$, the probabilities themselves are not.
- The interpretation of the logit model is as follows: $\beta_2$, the slope, measures the change
in $L$ for a unit change in $X$; it tells how the log odds in favour of owning a house
change as income changes by one unit. The intercept $\beta_1$ is the value of the log odds in
favour of owning a house if income is zero.
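
Equations (1) to (3) can be checked numerically. The following is a minimal sketch in R; the coefficient values and income levels are hypothetical and chosen only for illustration.

```r
# Hypothetical coefficients and incomes (illustration only, not estimates)
beta1  <- -3
beta2  <- 0.1
income <- c(0, 30, 60)

z     <- beta1 + beta2 * income   # Z_i = beta1 + beta2 * X_i
p     <- 1 / (1 + exp(-z))        # equation (1); identical to plogis(z)
odds  <- p / (1 - p)              # equation (2); equals exp(z)
logit <- log(odds)                # equation (3); equals z, same as qlogis(p)

print(cbind(income, z, p, odds, logit))
```

The logit column reproduces z exactly, while p changes by unequal amounts for equal steps in income, which is the sense in which L is linear in X but the probabilities are not.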

1-2- Estimation of the Logit Model


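In order to estimate the logit model we need, apart from $X_i$, the values of the logit $L_i$. With
grouped data, we compute the estimated relative frequency $\hat{P}_i = n_i / N_i$, where $N_i$ is the
number of families at income level $X_i$ and $n_i$ is the number among them that own a house. This
relative frequency is an estimate of the true $P_i$ corresponding to each $X_i$. Using the estimated
$P_i$, we can obtain the estimated logit as:

$$\hat{L}_i = \ln\left[\frac{\hat{P}_i}{1 - \hat{P}_i}\right] = \hat{\beta}_1 + \hat{\beta}_2 X_i$$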

1-3- Steps in estimating Logit Regression


Step 1
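Compute the estimated probability of owning a house for each income level $X_i$, as $\hat{P}_i = n_i / N_i$.

Step 2
For each $X_i$, obtain the logit as $\hat{L}_i = \ln[\hat{P}_i / (1 - \hat{P}_i)]$.

Step 3
Transform the logit regression as follows:

$$\sqrt{W_i}\, L_i = \beta_1 \sqrt{W_i} + \beta_2 \sqrt{W_i}\, X_i + \sqrt{W_i}\, U_i \qquad (4)$$

where $W_i = N_i \hat{P}_i (1 - \hat{P}_i)$ and $U_i$ is the disturbance term, which is heteroscedastic with
variance approximately $1 / [N_i P_i (1 - P_i)]$; weighting by $\sqrt{W_i}$ makes the transformed
disturbance homoscedastic.

Step 4
Estimate (4) by OLS (a regression through the origin on the transformed variables).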

Step 5
Establish confidence intervals and/or test hypotheses in the usual OLS framework.
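
The five steps can be sketched in R as follows. The grouped data below are invented purely for illustration, and the sketch assumes the usual grouped-logit weights W_i = N_i * P_i * (1 - P_i); the object names are hypothetical.

```r
# Hypothetical grouped data: income level X, families N, home owners n
X <- c(5, 10, 15, 20, 25, 30)
N <- c(50, 60, 80, 70, 50, 40)
n <- c(6, 15, 32, 42, 38, 35)

p_hat <- n / N                        # Step 1: relative frequencies
L_hat <- log(p_hat / (1 - p_hat))     # Step 2: estimated logits
w     <- N * p_hat * (1 - p_hat)      # Step 3: weights (assumed form, see lead-in)

# Steps 3-4: OLS on the transformed variables, with no extra intercept
sw  <- sqrt(w)
y_t <- sw * L_hat
x_t <- sw * X
fit_ols <- lm(y_t ~ 0 + sw + x_t)

# The same estimates are obtained directly by weighted least squares
fit_wls <- lm(L_hat ~ X, weights = w)

# Step 5: confidence intervals in the usual OLS framework
print(coef(fit_ols))
print(confint(fit_wls))
```

Running OLS on the transformed variables and running weighted least squares on the original ones are two ways of writing the same estimator, so both fits return the same intercept and slope.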

2- Probit model
In order to explain the behavior of a dichotomous dependent variable, we have to use a
suitably chosen cumulative distribution function (CDF). The logit model uses the
cumulative logistic function, but this is not the only CDF that one can use. In some
applications the normal CDF has been found useful. The estimating model that emerges
from the normal CDF is known as the probit model.

Let us assume that, in the home ownership example, the decision of the ith family to own a
house or not depends on an unobservable utility index $I_i$, which is determined by the
explanatory variables in such a way that the larger the value of the index $I_i$, the greater the
probability of the family owning a house. The index $I_i$ can be expressed as

$$I_i = \beta_1 + \beta_2 X_i \qquad (5)$$

where $X_i$ is the income of the ith family.

2-1- Assumption of the model


For each family there is a critical or threshold level of the index, $I_i^*$, such that if $I_i$
exceeds $I_i^*$ the family will own a house, otherwise not. The threshold level $I_i^*$, like $I_i$, is not
observable. If it is assumed to be normally distributed with the same mean and
variance for all families, it is possible to estimate the parameters of equation (5) and thus get some
information about the unobservable index itself.
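
Under this assumption the probability of owning a house can be written out explicitly. The following is a sketch of the standard derivation, assuming for illustration that the threshold $I_i^*$ is standard normal (mean 0, variance 1) and writing $F$ for the standard normal CDF; this normalisation and the symbol $F$ are assumptions added here.

```latex
\begin{align*}
P_i = P(Y = 1 \mid X_i) &= P(I_i^* \le I_i) = F(I_i) = F(\beta_1 + \beta_2 X_i) \\
F(I_i) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{I_i} e^{-t^2/2}\, dt \\
I_i &= F^{-1}(P_i) = \beta_1 + \beta_2 X_i
\end{align*}
```

The last line is what makes estimation possible: once $P_i$ is estimated, the index is recovered by inverting the normal CDF.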

In probit analysis, the unobservable utility index $I_i$ is known as the normal equivalent deviate
(n.e.d.) or simply the normit. Since the n.e.d. $I_i$ is negative whenever $P_i < 0.5$, in practice
the number 5 is added to the n.e.d., and the result so obtained is called the probit:

Probit = n.e.d. + 5 = $I_i + 5$

In order to estimate $\beta_1$ and $\beta_2$, (5) can be written as

$$I_i = \beta_1 + \beta_2 X_i + U_i \qquad (6)$$

2-2- Steps involved in estimation of Probit Model
Step 1
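Compute the estimated probability of owning a house for each income level $X_i$, as in the case of
the logit model: $\hat{P}_i = n_i / N_i$

Step 2
Given $\hat{P}_i$, obtain the n.e.d. from the standard normal CDF, that is $\hat{I}_i = F^{-1}(\hat{P}_i)$, where $F$
is the standard normal CDF. This estimated index is the dependent variable in $I_i = \beta_1 + \beta_2 X_i + U_i$.

Step 3
Add 5 to the estimated $I_i$ to convert them into probits, and use the probits thus obtained as the
dependent variable in (6).

Step 4
The disturbance term is heteroscedastic, as in the logit model. In order to get efficient
estimates, one has to transform the model (weight the observations) before estimation.

Step 5
Estimate the transformed model (6) by OLS.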
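
A minimal R sketch of these steps is given below. The grouped data are invented for illustration. The weights in Step 4 are an assumption not stated in the text: they come from the usual delta-method approximation Var(U_i) = P_i(1 - P_i) / (N_i f_i^2), where f_i is the standard normal density evaluated at the n.e.d.

```r
# Hypothetical grouped data: income level X, families N, home owners n
X <- c(5, 10, 15, 20, 25, 30)
N <- c(50, 60, 80, 70, 50, 40)
n <- c(6, 15, 32, 42, 38, 35)

p_hat  <- n / N              # Step 1: relative frequencies
ned    <- qnorm(p_hat)       # Step 2: n.e.d. = inverse standard normal CDF
probit <- ned + 5            # Step 3: probit = n.e.d. + 5

# Step 4: weights correcting heteroscedasticity (assumed delta-method form)
w <- N * dnorm(ned)^2 / (p_hat * (1 - p_hat))

# Step 5: weighted least squares, with either dependent variable
fit_ned    <- lm(ned ~ X, weights = w)
fit_probit <- lm(probit ~ X, weights = w)

print(coef(fit_ned))
print(coef(fit_probit))
```

Adding 5 only shifts the intercept by 5; the slope on X is the same whether the n.e.d. or the probit is used as the dependent variable.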

3- Logit versus probit


- The difference between the logit and probit models lies in the assumption about the
distribution of the error term. For the logit model, the errors are assumed to
follow the standard logistic distribution, while for the probit model the errors are assumed to
follow a standard normal distribution.
- The logistic distribution is similar in shape to the normal distribution but has slightly
heavier (fatter) tails, so the logit curve approaches 0 and 1 more slowly than the probit curve.

Figure 1: Logit and probit trend

Source: Harari-Kermadec, 2009

- Is logit better than probit, or vice versa? Both methods yield similar results. Preference
for probit or logit tends to vary by discipline: logit is more popular in health sciences
such as epidemiology, while probit is popular in econometrics and is used by economists and
political scientists.
- Qualitatively, logit and probit models give similar results, but the parameter estimates
of the two models are not directly comparable. To make the β comparable across the two
models, there is an approximate relationship: multiply the probit β by about 1.81
(approximately π/√3, the standard deviation of the standard logistic distribution) and it
will be approximately the same as the logit β.
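
The tail comparison and the factor 1.81 can be checked numerically. The sketch below rescales the standard logistic distribution to unit variance so that it can be compared with the standard normal; the grid of z values is an arbitrary choice for illustration.

```r
# Standard deviation of the standard logistic distribution
s <- pi / sqrt(3)
print(round(s, 2))   # the factor used to rescale probit coefficients

# CDFs on a common (unit-variance) scale
z <- c(-4, -2, -1, 0, 1, 2, 4)
print(cbind(z, normal = pnorm(z), logistic = plogis(z * s)))

# Upper-tail probabilities beyond 4 standard deviations
tail_normal   <- pnorm(4, lower.tail = FALSE)
tail_logistic <- plogis(4 * s, lower.tail = FALSE)
print(c(normal = tail_normal, logistic = tail_logistic))
```

In this sketch the two curves are close near the centre, while the rescaled logistic leaves more probability beyond 4 standard deviations than the normal does.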

4- Application in R
The command used to perform logit or probit analysis is the function glm, available in R. The
following syntax shows how to run it.

# Import the data

mydata<-read.table("Poids.txt",header=TRUE)

mydata is the name of the data frame containing y, x1, x2 and x3, where y is the dependent
variable taking the values 0 and 1 (it is therefore dichotomous) and x1, x2 and x3 are the
explanatory variables.

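# Model: set link = "logit" for the logit model or link = "probit" for the probit model

logit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = mydata)
probit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "probit"), data = mydata)

summary(logit)   # or summary(probit)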

# Use summary to get the result

Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = mydata)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0277   0.2347   0.5542   0.7016   1.0839

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.4262     0.6390   0.667   0.5048
x1            0.8618     0.7840   1.099   0.2717
x2            0.3665     0.3082   1.189   0.2343
x3            0.7512     0.4548   1.652   0.0986 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 70.056  on 69  degrees of freedom
Residual deviance: 65.512  on 66  degrees of freedom
AIC: 73.512

Number of Fisher Scoring iterations: 5

- The Pr(>|z|) column shows the two-tailed p-values testing the null hypothesis that the
coefficient is equal to zero (no effect). The usual significance level is 0.05; by this
measure, none of the coefficients has a significant effect on the log odds of the
dependent variable. The coefficient for x3 is significant only at the 10% level (p = 0.0986 < 0.10).
- The z value is the test statistic for the same null hypothesis: the estimate divided by its
standard error.
- The Estimate column shows the coefficients. When x3 increases by one unit, the
expected change in the log odds is 0.7512, holding the other predictors constant. What you get
directly from this column is whether the effect of each predictor is positive or negative.
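
The relationship between the Estimate, Std. Error, z value and Pr(>|z|) columns can be reproduced by hand. The sketch below uses simulated data, because the file Poids.txt is not available here; the data-generating coefficients, the seed and the object names are hypothetical.

```r
# Simulated data (hypothetical; stands in for the Poids.txt data)
set.seed(123)
n  <- 200
x1 <- rbinom(n, 1, 0.5)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.4 + 0.8 * x1 + 0.4 * x2 + 0.7 * x3))
simdata <- data.frame(y, x1, x2, x3)

fit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = simdata)
tab <- coef(summary(fit))                    # the coefficient table as a matrix

z <- tab[, "Estimate"] / tab[, "Std. Error"] # z value = estimate / standard error
p <- 2 * pnorm(-abs(z))                      # two-tailed p-value from the normal

print(cbind(tab, z_by_hand = z, p_by_hand = p))
```

Extracting the table with coef(summary(fit)) is convenient when the p-values need to be compared with a chosen significance level in later code.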

# Here it is the sign of the coefficients that matters: it shows whether y and x move in the
same direction. We also need to look at the significance of each coefficient. In this example,
only x3 is significant, and only at the 10% level.

# With the package mfx we can get the odds ratios by using the following command

library(mfx)

logitor(y ~ x1 + x2 + x3, data=mydata)

And we get

Call:

logitor(formula = y ~ x1 + x2 + x3, data = mydata)

Odds Ratio:
   OddsRatio Std. Err.      z   P>|z|
x1   2.36735   1.85600 1.0992 0.27168
x2   1.44273   0.44459 1.1894 0.23427
x3   2.11957   0.96405 1.6516 0.09861 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# We have seen that only x3 is significant at 10%, so we focus the interpretation of the
odds ratio on x3. When x3 increases by one unit, the odds of y = 1 increase by 112%
((2.12 - 1) × 100). Equivalently, the odds of y = 1 are 2.12 times higher when x3 increases
by one unit (keeping all other predictors constant).
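
The odds ratios are simply the exponentials of the logit coefficients, so they can also be obtained without the mfx package. The sketch below copies the estimates and standard errors from the glm summary shown earlier; the delta-method standard error (odds ratio times the coefficient's standard error) is an assumption about how such a standard error is typically computed, not something stated in the text.

```r
# Estimates and standard errors copied from the glm summary output
b  <- c(x1 = 0.8618, x2 = 0.3665, x3 = 0.7512)
se <- c(x1 = 0.7840, x2 = 0.3082, x3 = 0.4548)

or     <- exp(b)          # odds ratios
se_or  <- or * se         # delta-method standard errors (assumed formula)
pct    <- (or - 1) * 100  # percentage change in the odds per unit increase

print(round(cbind(OddsRatio = or, StdErr = se_or, PctChange = pct), 3))
```

With a fitted model object, the same odds ratios would be obtained with exp(coef(logit)), dropping the intercept.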

Conclusion
Binary models are used when the dependent (response) variable is dichotomous.
Logit and probit are the models used in this case. They are similar, and the choice between
them depends largely on the discipline.

References

Torres-Reyna O., 2004. Logit/Probit models in R. Princeton University, 12 p.

Harari-Kermadec H., 2009. Econométrie 2 : données qualitatives, probit et logit. 7 p.

Wooldridge J. M., 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press, pp. 453-460.
