By
TANIMA BANERJEE
UNIVERSITY OF CALCUTTA
In a given group of individuals, we may be interested in the probability that an individual
within the group enters the labour market, given a set of other factors that act as
explanatory variables in determining that probability. In other words, let yi equal 1 if the ith
individual enters the labour market and 0 otherwise; we intend to estimate the probability, pi,
that yi equals 1 given a set of other factors, such as age, gender, education level, etc.
To measure pi, the simplest idea would be to let pi be a linear function of a set of
covariates, where xi represents the vector of covariates for the ith individual and β the
vector of coefficients. Then, we can write:
$$p_i = x_i'\beta \qquad (1)$$
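The problem with equation (1) is that pi must lie between 0 and 1, whereas the linear
predictor xi′β can take any real value. We can generally solve this problem, and avoid
imposing any complex set of restrictions on the coefficients, by transforming the probability pi
and making the transformation a linear function of the covariates. First, we can convert pi into
the odds, where
$$\text{odds}_i = \frac{p_i}{1 - p_i}$$
Then we take the logarithm of the odds to obtain the logit, and model it as a linear function of
the covariates:
$$\text{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = x_i'\beta$$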
Through this transformation, we remove the restriction that the dependent variable has to lie
between 0 and 1. The logit can take any value from -∞ to +∞. As pi goes down to zero, the odds
move to zero and the logit approaches -∞. On the other hand, as pi approaches 1, the odds move
to +∞ and the logit also approaches +∞. Negative logits represent probabilities below 0.5 and
positive logits represent probabilities above 0.5. At probability 0.5, the logit is zero.
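As a quick numerical check of these statements, taking the logit to be the log of the odds, log(pi/(1 - pi)), and using illustrative probabilities that are not taken from the data:

```latex
\begin{aligned}
\text{logit}(0.25) &= \log\frac{0.25}{0.75} = \log\tfrac{1}{3} \approx -1.099 \\
\text{logit}(0.50) &= \log\frac{0.50}{0.50} = \log 1 = 0 \\
\text{logit}(0.75) &= \log\frac{0.75}{0.25} = \log 3 \approx 1.099
\end{aligned}
```

The values are symmetric around zero, which is why probabilities below and above 0.5 map to negative and positive logits respectively.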
This model can be described as a generalized linear model with a binomial response variable and
a logit link. In this model, the regression coefficient βj represents the change in the logit of
the probability associated with a unit change in the jth predictor, holding all other explanatory
factors constant.
Now, to get the value of estimated pi , we need to do the following steps:
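A standard way to carry out these steps is to invert the logit. The following is a sketch that assumes the logit (the log odds) is set equal to xi′β; the hat notation for estimated quantities is introduced here for illustration:

```latex
\begin{aligned}
\log\left(\frac{\hat p_i}{1 - \hat p_i}\right) &= x_i'\hat\beta \\
\frac{\hat p_i}{1 - \hat p_i} &= \exp(x_i'\hat\beta) \\
\hat p_i &= \frac{\exp(x_i'\hat\beta)}{1 + \exp(x_i'\hat\beta)}
\end{aligned}
```

The last line always lies strictly between 0 and 1, whatever value the linear predictor takes.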
Hence, the marginal change in pi caused by a small change in the jth predictor, xij, holding
other predictors constant, can be given as follows:
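Under the logit specification, differentiating pi = exp(xi′β)/(1 + exp(xi′β)) with respect to the jth predictor gives the usual expression (a sketch; x_{ij} is notation introduced here for the jth predictor of individual i):

```latex
\frac{\partial p_i}{\partial x_{ij}} = \beta_j \, p_i \, (1 - p_i)
```

The effect therefore depends on the level of pi itself: it is largest at pi = 0.5, where it equals βj/4, and shrinks towards zero as pi approaches 0 or 1.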
A logit model is generally estimated by maximum likelihood, as the dependent variable is binary
and each observation enters the likelihood through its probability.
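For reference, the log-likelihood that is maximised can be sketched as follows, assuming n independent observations with yi coded 0 or 1 (n is notation introduced here):

```latex
\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right],
\qquad p_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}
```

There is no closed-form solution for β, so statistical software maximises this function iteratively.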
Example 1: Let us consider a case with two independent variables, age and sex, where pi stands
for the probability of entering the labour market as a wage labourer for the ith individual.
In this case, we should write (we are using the emp.dta data file):
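A minimal sketch of what the command might look like. The dependent variable name d_entry is an assumption, inferred from the d_entry_hat prediction that appears in this document; entering sex as a factor variable (i.sex) is also an assumption, chosen because it is consistent with the 2.sex rows in the margins output.

```stata
* Load the data file named in the text (assumed to be in the working directory)
use emp.dta, clear

* Logit of labour market entry on age and sex
* d_entry is an assumed name for the 0/1 dependent variable
* i.sex treats sex as a categorical (factor) variable
logit d_entry age i.sex
```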
Prob > chi2: this is the p-value of the chi-square test of the null hypothesis that all the
slope coefficients in the model are jointly equal to zero. If Prob > chi2 is below 0.05, we
reject that null hypothesis and the model as a whole can be said to be statistically
significant. In this case, Prob > chi2 is 0.0003, i.e. less than 0.05, so our regression
model fits significantly better than a model with no predictors.
P>|z|: this presents the two-tailed p-value for the test of whether the corresponding
coefficient is different from 0. The p-value has to be lower than 0.05 (the 5 percent level
of significance, corresponding to a 95% confidence interval) if we want to reject the null
hypothesis that the coefficient is zero. If the p-value of an explanatory variable is less
than 0.05, then we can say that the variable has a significant influence on the dependent
variable.
Coef: this presents the logit coefficients, which are in log-odds units and cannot be read as
regular OLS coefficients. To interpret them we need to estimate the predicted probabilities
of y=1.
Now, to get the predicted probability of entering the labour market for each
observation/individual in the data set we are using, we need to type the following just after
running the regression:
predict d_entry_hat
Now, for example, if we want the predicted probability of entering the labour market
for a male aged 30, then we should type:
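One way to obtain this in recent versions of Stata is the margins command with the at() option. This is a sketch rather than necessarily the command the author used; it assumes the dependent variable is named d_entry and that sex is coded 1 for male, as the text states elsewhere.

```stata
* Assumes emp.dta is already loaded; d_entry is an assumed variable name
quietly logit d_entry age i.sex

* Predicted probability of d_entry = 1 at age 30 with sex = 1 (male)
margins, at(age=30 sex=1)
```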
While running a logit regression, we can request odds ratios rather than logit coefficients by
adding the option or (after the comma).
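For instance, a sketch of the earlier regression with this option added (d_entry is an assumed name for the dependent variable):

```stata
* Same model, reported as odds ratios instead of log-odds coefficients
* Assumes emp.dta is already loaded
logit d_entry age i.sex, or
```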
Here, the odds ratio represents the factor by which the odds of Y=1 are multiplied when X
increases by 1 unit. The odds ratios are exp(logit coeff).
If the odds ratio > 1, then the odds of Y=1 increase with an increase in X. Since the odds ratio
is the exponential of the logit coefficient, an odds ratio greater than 1 corresponds to a
positive logit coefficient.
If the odds ratio < 1, then the odds of Y=1 decrease with an increase in X. Likewise, an odds
ratio lower than 1 corresponds to a negative logit coefficient.
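To make the link between the sign of the coefficient and the odds ratio concrete, here is a small worked example. The coefficient values 0.4 and -0.4 are hypothetical and are not estimates from emp.dta.

```latex
\begin{aligned}
\text{OR}_j &= \exp(\beta_j) \\
\beta_j = 0.4 &\;\Rightarrow\; \text{OR}_j = e^{0.4} \approx 1.49
  \quad (\text{odds rise by about } 49\% \text{ per unit of } x_j) \\
\beta_j = -0.4 &\;\Rightarrow\; \text{OR}_j = e^{-0.4} \approx 0.67
  \quad (\text{odds fall by about } 33\% \text{ per unit of } x_j) \\
\beta_j = 0 &\;\Rightarrow\; \text{OR}_j = 1
  \quad (\text{odds unchanged})
\end{aligned}
```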
Marginal effects at mean:
Now, if we want to estimate the marginal effect of a predictor on the dependent variable, i.e. the
probability of entering the labour market in the present case, while holding other factors
constant, then we need to use the margins command. Generally we estimate marginal effects at
the means of the predictors.
After running the logit regression, type:
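A sketch of the likely command, assuming the two-predictor model with a dependent variable named d_entry (an assumed name):

```stata
* Assumes emp.dta is already loaded; d_entry is an assumed variable name
quietly logit d_entry age i.sex

* Marginal effects of all predictors, evaluated at the sample means
margins, dydx(*) atmeans
```

Here dydx(*) requests the effect of every predictor, and atmeans fixes the other covariates at their means; omitting atmeans would instead give average marginal effects over the sample.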
[Output table: dy/dx, delta-method Std. Err., z, P>|z| and 95% Conf. Interval for each predictor]
Note: dy/dx for factor levels is the discrete change from the base level.
Here, dy/dx for age is 0.0005, which represents the change in the probability of entering the
labour market for a one-year increase in age. The effect here is significant at the 5 percent
level.
Now, dy/dx for sex is -0.023, which represents the difference in the probability of entering the
labour market for a female as compared to a male. It implies that a female has a probability of
entering the labour market that is approximately 2.3 percentage points lower than that of a
male. The effect here is significant at the 5 percent level.
If we want to estimate the marginal effect of a categorical independent variable, say sex, at
some given values of a continuous independent variable, say age, we need to type:
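A sketch of such a command, using ages 20 and 30 as the given values and assuming the dependent variable is named d_entry:

```stata
* Assumes emp.dta is already loaded; d_entry is an assumed variable name
quietly logit d_entry age i.sex

* Effect of sex (discrete change from the base level) at age 20 and at age 30
margins, dydx(sex) at(age=(20 30))
```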
Here, we are trying to find the marginal effect of sex on the probability of entering the
labour market for individuals who are 20 and 30 years old.
                          Delta-method
                dy/dx     Std. Err.       z     P>|z|     [95% Conf. Interval]
2.sex
  _at
    1       -.0227117     .0076011    -2.99    0.003     -.0376096   -.0078139
    2       -.0231899     .0077571    -2.99    0.003     -.0383935   -.0079863
Note: dy/dx for factor levels is the discrete change from the base level.
Example 2: Let us now consider a case with three independent variables: age, sex and
household group (hhd_gr). Here, the household group variable is a categorical variable with four
groups: Scheduled Tribes (STs) (Code 1), Scheduled Castes (SCs) (Code 2), Other backward
castes (OBCs) (Code 3) and Others (Code 4). So, we should first run a logit regression with one
additional independent variable. Type:
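A sketch of the extended regression, assuming the dependent variable is named d_entry and that hhd_gr is entered as a factor variable so that Code 1 serves as the reference group:

```stata
* Assumes emp.dta is already loaded; d_entry is an assumed variable name
* i.hhd_gr adds one indicator per household group, with the lowest code as base
logit d_entry age i.sex i.hhd_gr
```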
                Coef.     Std. Err.       z     P>|z|     [95% Conf. Interval]
hhd_gr
    2       -.2981956     .1705197    -1.75    0.080     -.6324081    .0360169
    3       -.6790007      .185865    -3.65    0.000     -1.043289    -.314712
    9       -.5292953     .1649289    -3.21    0.001       -.85255   -.2060406
Now, if we want to find the marginal effect of sex on the probability of entering the labour
market given that age is 20 years and household group is Code 2, then type:
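A sketch of the corresponding command (d_entry is an assumed name for the dependent variable):

```stata
* Assumes emp.dta is already loaded; d_entry is an assumed variable name
quietly logit d_entry age i.sex i.hhd_gr

* Effect of sex with age fixed at 20 and household group fixed at Code 2
margins, dydx(sex) at(age=20 hhd_gr=2)
```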
If we want to find the marginal effect of sex on the probability of entering the labour
market given that age takes the values 20 and 30 and household group (hhd_gr) takes the values 2
and 3, then type:
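A sketch of the corresponding command (d_entry is an assumed name for the dependent variable). Listing several values inside at() makes margins evaluate every combination, which is why the output has four _at rows.

```stata
* Assumes emp.dta is already loaded; d_entry is an assumed variable name
quietly logit d_entry age i.sex i.hhd_gr

* Effect of sex at each combination of age (20, 30) and hhd_gr (2, 3)
margins, dydx(sex) at(age=(20 30) hhd_gr=(2 3))
```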
                          Delta-method
                dy/dx     Std. Err.       z     P>|z|     [95% Conf. Interval]
2.sex
  _at
    1       -.0253775     .0084874    -2.99    0.003     -.0420124   -.0087426
    2       -.0199791     .0067653    -2.95    0.003     -.0332389   -.0067193
    3       -.0259198     .0086651    -2.99    0.003     -.0429032   -.0089364
    4       -.0204956     .0069333    -2.96    0.003     -.0340846   -.0069065
Note: dy/dx for factor levels is the discrete change from the base level.
If we want to find the marginal effect of household group (hhd_gr) (Code 1 of hhd_gr acts as
the reference group here) on the probability of entering the labour market given that age
takes the values 20 and 30 and sex is 1 (male), then type:
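A sketch of the corresponding command (d_entry is an assumed name for the dependent variable):

```stata
* Assumes emp.dta is already loaded; d_entry is an assumed variable name
quietly logit d_entry age i.sex i.hhd_gr

* Effect of each household group relative to Code 1, for males at ages 20 and 30
margins, dydx(hhd_gr) at(age=(20 30) sex=1)
```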
                          Delta-method
                dy/dx     Std. Err.       z     P>|z|     [95% Conf. Interval]
2.hhd_gr
  _at
    1       -.0567479     .0343602    -1.65    0.099     -.1240926    .0105969
    2       -.0577799     .0349123    -1.66    0.098     -.1262068    .0106469
3.hhd_gr
  _at
    1       -.1169387     .0354702    -3.30    0.001      -.186459   -.0474185
    2       -.1193385     .0360628    -3.31    0.001     -.1900204   -.0486567
9.hhd_gr
  _at
    1       -.0949275     .0334047    -2.84    0.004     -.1603996   -.0294554
    2       -.0967944     .0339414    -2.85    0.004     -.1633183   -.0302705
Note: dy/dx for factor levels is the discrete change from the base level.
In probit regression, the remaining steps for calculating predicted probabilities and marginal
effects of the independent variables on the dependent variable are the same as in logit
regression. However, it is necessary to remember that probit estimation does not yield odds
ratios, because the probit model uses the standard normal transformation (z score) of the
probability pi rather than the log odds. Hence, the probit regression coefficients indicate the
influence of the predictors on the z score of pi. Thus, to find the marginal effects of the
predictors on pi, we need to compute them separately after running the probit regression, using
the 'margins' command in the same way as for logit regression.
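A sketch of the probit counterpart of the two-predictor model, assuming the dependent variable is named d_entry:

```stata
* Assumes emp.dta is already loaded; d_entry is an assumed variable name
* Coefficients are on the z-score scale, so the or option is not available
probit d_entry age i.sex

* Marginal effects on the probability, evaluated at the sample means
margins, dydx(*) atmeans
```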
How to avoid heteroskedasticity problem in Logit and Probit regression
To deal with possible heteroskedasticity in the sample data while running a logit or probit
regression, we can follow the same process as for OLS regression: use robust standard errors.
Robust standard errors leave the estimated regression coefficients unchanged; they adjust the
standard errors, and hence the z statistics and p-values, for a heteroskedastic error, if any.
Hence, we just need to add the robust option while running the logit or probit regression, as
follows:
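A sketch of both regressions with the option added, assuming the dependent variable is named d_entry:

```stata
* Assumes emp.dta is already loaded; d_entry is an assumed variable name
* robust is shorthand for vce(robust)
logit d_entry age i.sex, robust
probit d_entry age i.sex, robust
```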