
Advanced Quantitative Techniques

Logistic regressions
Difference between linear and logistic regression

Linear (OLS) Regression:
• For an interval-ratio dependent variable
• Predicts the value of the dependent variable given values of the independent variables

Logistic Regression:
• For a categorical (usually binary)* dependent variable
• Predicts the probability that the dependent variable will show membership in a category, given values of the independent variables

*For this class, we are only using interval-ratio or binary variables. Categorical variables with more than two outcomes require a more advanced regression (multinomial logistic regression), as do count variables (Poisson regression).
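As a reminder, logistic regression models the probability with the standard logistic function:

Pr(y = 1) = 1 / (1 + e^−(b0 + b1x1 + … + bkxk))

so the logit, ln[p / (1 − p)], is a linear function of the predictors.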
Logistic / logit
• Open divorce.dta
• list divorce positives
• scatter divorce positives
Logistic / logit
• logistic divorce positives
• predict preddiv
• scatter preddiv positives
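The same steps, annotated (a sketch, assuming divorce.dta contains the binary variable divorce and the predictor positives used above):

logistic divorce positives   // fit the model; coefficients are reported as odds ratios
predict preddiv              // default after logistic: predicted probability Pr(divorce = 1)
scatter preddiv positives    // the fitted probabilities trace out an S-shaped logistic curve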
Logistic / logit
Logit = ln(odds) = ln[p / (1 − p)]

• In Stata, there are two commands for logistic regression: logit and logistic.
• The logit command gives the regression coefficients used to estimate the logit score.
• The logistic command gives us the odds ratios we need to interpret the effect size of the predictors.
• The logit is a function of the logistic regression: it is just a different way of presenting the same relationship between independent and dependent variables, since an odds ratio is simply an exponentiated logit coefficient (see Acock, section 11.2)
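For example, using the age97 results shown later: logit reports a coefficient of 0.15635, and exp(0.15635) ≈ 1.169 is exactly the odds ratio that logistic reports.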
Logistic / logit
• Open nlsy97_chapter11.dta
• We want to test the impact of some variables on the
likelihood that a young person will drink alcohol
• summarize drank30 age97 pdrink97 dinner97 male if !missing(drank30, age97, pdrink97, dinner97, male)
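The output interpreted on the next slide comes from fitting the full model (the same command that appears later in these slides):

logistic drank30 age97 male dinner97 pdrink97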
Logistic

• Interpretation:
• The odds of drinking are multiplied by 1.169 for each additional year of age.
• The odds of drinking are multiplied by 1.329 for each additional peer who drinks.
• The odds of drinking are multiplied by 0.942 for each additional day the person has dinner with their family.
• LR chi2(4) = 78.01, p < 0.0001, means the model as a whole is statistically significant
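An odds ratio below 1 indicates a decrease: 0.942 corresponds to a (1 − 0.942) ≈ 5.8% drop in the odds for each additional family dinner.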
Logit
Coefficients tell the amount of increase in the predicted log odds of drank30 = 1 that would be predicted by a 1-unit increase in the predictor, holding all other predictors constant.
Comparing effects of variables
• It is hard to compare the effects of two independent variables using odds ratios when they are measured on different scales.
• For example, the variable male is binary (0 to 1), so it is simple to observe its effect in odds ratio terms.
• But it is hard to compare the effect of "male" with the effect of the variable dinner97 (number of days the person has dinner with his or her family), which goes from 0 to 7.
• While the odds ratio of "male" tells us how much more likely a male is to drink compared to a female, the odds ratio of dinner97 tells us the change in the odds for each additional day.
• Beta coefficients standardize the effects, allowing a comparison based on standard deviations.
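The e^bStdX column in the listcoef output below is exactly this standardized effect: for dinner97, exp(−0.05966 × 2.3494) ≈ 0.869, the factor change in the odds for a one-standard-deviation increase in dinner97.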
Comparing effect of variables
• listcoef, help
• If listcoef does not work, use findit listcoef to install the command.

. listcoef, help

logit (N=1654): Factor Change in Odds

Odds of: 1 vs 0

------------------------------------------------------------------
 drank30 |        b        z    P>|z|      e^b  e^bStdX    SDofX
---------+--------------------------------------------------------
   age97 |  0.15635    2.672    0.008   1.1692   1.1578   0.9371
    male | -0.02072   -0.194    0.846   0.9795   0.9897   0.4985
pdrink97 |  0.28463    6.325    0.000   1.3293   1.4131   1.2149
dinner97 | -0.05966   -2.693    0.007   0.9421   0.8692   2.3494
------------------------------------------------------------------
b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
e^b = exp(b) = factor change in odds for unit increase in X
e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
SDofX = standard deviation of X
Comparing effect of variables
• listcoef, help percent
. listcoef, help percent

logit (N=1654): Percentage Change in Odds

Odds of: 1 vs 0

------------------------------------------------------------------
drank30 | b z P>|z| % %StdX SDofX
---------+--------------------------------------------------------
age97 | 0.15635 2.672 0.008 16.9 15.8 0.9371
male | -0.02072 -0.194 0.846 -2.1 -1.0 0.4985
pdrink97 | 0.28463 6.325 0.000 32.9 41.3 1.2149
dinner97 | -0.05966 -2.693 0.007 -5.8 -13.1 2.3494
------------------------------------------------------------------
b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
% = percent change in odds for unit increase in X
%StdX = percent change in odds for SD increase in X
SDofX = standard deviation of X
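The percent columns are the factor-change columns re-expressed as % = 100 × (e^b − 1); for age97, 100 × (e^0.15635 − 1) ≈ 16.9, matching the table.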
Hypothesis testing
• 1. Wald chi-squared test: the z statistic reported by Stata in the logistic regression output.
• 2. Likelihood-ratio chi-squared test.
• Compare the LR chi2 of the model with and without the variable you want to test.
• To test variable “age97”:
logistic drank30 male dinner97 pdrink97
estimates store a
logistic drank30 age97 male dinner97 pdrink97
lrtest a
Hypothesis testing
. logistic drank30 male dinner97 pdrink97

Logistic regression Number of obs = 1654


LR chi2(3) = 70.83
Prob > chi2 = 0.0000
Log likelihood = -1064.6372 Pseudo R2 = 0.0322

drank30 Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

male .9798341 .1045179 -0.19 0.849 .7949792 1.207673


dinner97 .9418512 .020823 -2.71 0.007 .9019105 .9835606
pdrink97 1.376461 .0594022 7.40 0.000 1.264823 1.497953
_cons .4153532 .0673672 -5.42 0.000 .3022449 .5707898

. estimates store a

. logistic drank30 age97 male dinner97 pdrink97

Logistic regression Number of obs = 1654


LR chi2(4) = 78.01
Prob > chi2 = 0.0000
Log likelihood = -1061.0474 Pseudo R2 = 0.0355

drank30 Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

age97 1.169241 .0684191 2.67 0.008 1.042546 1.311332


male .9794922 .1046935 -0.19 0.846 .7943646 1.207764
dinner97 .942086 .0208682 -2.69 0.007 .9020603 .9838878
pdrink97 1.329275 .0598174 6.33 0.000 1.217056 1.451841
_cons .0524677 .0415938 -3.72 0.000 .0110944 .2481314

. lrtest a

Likelihood-ratio test LR chi2(1) = 7.18


(Assumption: a nested in .) Prob > chi2 = 0.0074
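The LR statistic is twice the difference in the log likelihoods of the two models: 2 × (−1061.0474 − (−1064.6372)) ≈ 7.18, with 1 degree of freedom for the single dropped variable.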
Hypothesis testing
• Same process, but repeated for each of the variables
• lrdrop1
• (install the command with ssc install lrdrop1)
. lrdrop1
Likelihood Ratio Tests: drop 1 term
logistic regression
number of obs = 1654
------------------------------------------------------------------------
drank30 Df Chi2 P>Chi2 -2*log ll Res. Df AIC
------------------------------------------------------------------------
Original Model 2122.09 1649 2132.09
-age97 1 7.18 0.0074 2129.27 1648 2137.27
-male 1 0.04 0.8463 2122.13 1648 2130.13
-dinner97 1 7.23 0.0072 2129.33 1648 2137.33
-pdrink97 1 40.62 0.0000 2162.72 1648 2170.72
------------------------------------------------------------------------
Terms dropped one at a time in turn.
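Note that the −age97 row reproduces the lrtest result from the previous slide (Chi2 = 7.18, P>Chi2 = 0.0074).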
Marginal effects
• We will use the variable race97, dropping the variable male.
• We want to test the effect of a person being Black compared to being White. Thus, we will drop observations where the person has another racial background.
generate black = race97 - 1
replace black = . if race97 > 2
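A quick sanity check on the recode (a sketch; it assumes race97 is coded 1 = White, 2 = Black, with higher values for other groups, as the recode implies):

tabulate race97 black, missing   // should show 1 -> 0, 2 -> 1, and values > 2 -> missing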
Marginal effects
label define black 0 "White" 1 "Black"
label define drank30 0 "No" 1 "Yes"
label values drank30 drank30
label values black black
logit drank30 age97 i.black pdrink97 dinner97
Marginal effects
. logit drank30 age97 i.black pdrink97 dinner97

Iteration 0: log likelihood = -935.86755


Iteration 1: log likelihood = -901.48553
Iteration 2: log likelihood = -901.37312
Iteration 3: log likelihood = -901.37311

Logistic regression Number of obs = 1413


LR chi2(4) = 68.99
Prob > chi2 = 0.0000
Log likelihood = -901.37311 Pseudo R2 = 0.0369

drank30 Coef. Std. Err. z P>|z| [95% Conf. Interval]

age97 .138153 .0635579 2.17 0.030 .0135818 .2627241

black
Black -.3804608 .1352133 -2.81 0.005 -.645474 -.1154476
pdrink97 .2822417 .048233 5.85 0.000 .1877067 .3767767
dinner97 -.069024 .0246204 -2.80 0.005 -.1172791 -.0207689
_cons -2.590308 .8609411 -3.01 0.003 -4.277722 -.9028946
Marginal effects
• The margins command tells us the difference in the probability of having drunk in the last 30 days if an individual is Black compared with an individual who is White.
• Initially, we are setting the covariates at their means.
• So the command will tell us the difference between Black and White individuals who are average on the other covariates.
Marginal effects
• margins, dydx(black) atmeans
. margins, dydx(black) atmeans

Conditional marginal effects Number of obs = 1413


Model VCE : OIM

Expression : Pr(drank30), predict()


dy/dx w.r.t. : 1.black
at : age97 = 13.67445 (mean)
0.black = .7523001 (mean)
1.black = .2476999 (mean)
pdrink97 = 2.112527 (mean)
dinner97 = 4.760793 (mean)

Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]

black
Black -.0862436 .0296054 -2.91 0.004 -.144269 -.0282181

Note: dy/dx for factor levels is the discrete change from the base level.

• dy/dx: the derivative at the point selected (where all other variables are at their means); for a factor variable like black, it is the discrete change from the base level
• Interpretation: a Black individual who is 13.67 years old, etc., is 8.6 percentage points less likely to drink than a White individual who is 13.67 years old, etc.
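A common alternative (not used in these slides) is the average marginal effect, which averages the effect over the observations as they actually are rather than fixing them at their means:

margins, dydx(black)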
Marginal effects
• We can also test marginal effects at points
other than the mean using the at( ) option.
• margins, at(pdrink97=(1 2 3 4 5)) atmeans
Marginal effects

For an individual with pdrink97 coded 2, we estimate a 36% probability that he or she drank in the last 30 days.
Marginal effects

Estimated probability that an adolescent drank in the last month, adjusted for age, race, and frequency of family meals (holding all of those at their means).
Marginal effects

For an individual who has dinner with his or her family 3 times a week, we estimate a 39% probability that he or she drank in the last 30 days.
Example 1
• Use severity.dta
. logit severity liberal female

Iteration 0: log likelihood = -331.35938


Iteration 1: log likelihood = -217.1336
Iteration 2: log likelihood = -216.94677
Iteration 3: log likelihood = -216.94661
Iteration 4: log likelihood = -216.94661

Logistic regression Number of obs = 480


LR chi2(2) = 228.83
Prob > chi2 = 0.0000
Log likelihood = -216.94661 Pseudo R2 = 0.3453

severity Coef. Std. Err. z P>|z| [95% Conf. Interval]

liberal 1.055704 .090975 11.60 0.000 .8773958 1.234011


female .6526588 .2423695 2.69 0.007 .1776233 1.127694
_cons -3.547764 .3393913 -10.45 0.000 -4.212959 -2.882569
Example 1
• Use severity.dta
• We are trying to see what predicts whether an individual
thinks that prison sentences are too severe
. logistic severity liberal female

Logistic regression Number of obs = 480


LR chi2(2) = 228.83
Prob > chi2 = 0.0000
Log likelihood = -216.94661 Pseudo R2 = 0.3453

severity Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

liberal 2.873997 .2614619 11.60 0.000 2.404629 3.434981


female 1.920641 .4655047 2.69 0.007 1.194375 3.088527
_cons .0287889 .0097707 -10.45 0.000 .0148025 .0559907
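As before, each odds ratio is the exponentiated coefficient from the logit output on the previous slide: exp(1.055704) ≈ 2.874 for liberal.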
Example 1
. margins, at(liberal=(1 3 5))

Predictive margins Number of obs = 480


Model VCE : OIM

Expression : Pr(severity), predict()

1._at : liberal = 1

2._at : liberal = 3

3._at : liberal = 5

Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]

_at
1 .1075282 .0224531 4.79 0.000 .063521 .1515354
2 .4887975 .0297587 16.43 0.000 .4304716 .5471234
3 .8833568 .0211126 41.84 0.000 .8419769 .9247367
Example 1
. margins, at(liberal=(1 3 5)) atmeans

Adjusted predictions Number of obs = 480


Model VCE : OIM

Expression : Pr(severity), predict()

1._at : liberal = 1
female = .5125 (mean)

2._at : liberal = 3
female = .5125 (mean)

3._at : liberal = 5
female = .5125 (mean)

Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]

_at
1 .1036257 .0217902 4.76 0.000 .0609177 .1463337
2 .4884607 .0305729 15.98 0.000 .428539 .5483824
3 .8874787 .0202504 43.83 0.000 .8477885 .9271688
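The two sets of margins differ slightly because they answer different questions: without atmeans, Stata averages the predicted probabilities over the observed values of female (predictive margins); with atmeans, it makes a single prediction with female set to its mean (adjusted predictions). As a check on the latter, invlogit(−3.547764 + 1.055704 × 1 + 0.6526588 × 0.5125) ≈ .1036, matching the liberal = 1 row.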
