Logistic regression
Difference between linear and logistic regression
[Slide: side-by-side comparison of Linear (OLS) Regression and Logistic Regression]
*For this class, we are only using interval-ratio or binary variables. Count
variables (outcomes that take many discrete values) require a more advanced
regression (Poisson regression).
Logistic / logit
• Open divorce.dta
• list divorce positives
• scatter divorce positives
Logistic / logit
• logistic divorce positives
• predict preddiv
• scatter preddiv positives
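What `predict preddiv` computes after a logistic fit is the inverse logit of the linear predictor. A minimal Python sketch of that calculation (the intercept and slope here are hypothetical values for illustration, not estimates from divorce.dta):

```python
import math

def predicted_prob(b0, b1, x):
    """Inverse logit of the linear predictor: p = 1 / (1 + exp(-(b0 + b1*x))).
    This is the fitted probability that -predict- stores after -logistic-."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients for illustration:
b0, b1 = -2.0, 0.5
probs = [round(predicted_prob(b0, b1, x), 3) for x in range(0, 9)]
print(probs)  # an S-shaped curve rising from about 0.12 toward about 0.88
```

Plotting these fitted probabilities against the predictor (the `scatter preddiv positives` step) traces out the characteristic S-shaped logistic curve.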
Logistic / logit
Logit = ln(odds)
• Interpretation:
• The odds of drinking are multiplied by 1.169 for each additional year of age.
• The odds of drinking are multiplied by 1.329 for each additional peer who drinks.
• The odds of drinking are multiplied by 0.942 for every day the person has dinner with their family.
• The LR chi2(4) = 78.01, p < 0.0001, means the model as a whole is statistically significant.
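These multipliers are exponentiated logit coefficients. A quick Python check, using the age97 coefficient (b = 0.15635) that appears in the `listcoef` output later in these notes:

```python
import math

# An odds ratio is exp(coefficient): the 1.169 multiplier for age
# corresponds to the age97 coefficient reported by listcoef.
or_age = math.exp(0.15635)
print(round(or_age, 3))  # 1.169

# An odds ratio below 1 shrinks the odds: the 0.942 multiplier per family
# dinner corresponds to a negative coefficient, ln(0.942):
print(round(math.log(0.942), 2))  # -0.06
```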
Logit
Coefficients tell the amount of increase in the predicted log odds of the
outcome (= 1) that would be predicted by a 1-unit increase in the predictor,
holding all other predictors constant.
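The equivalence between the two interpretations can be checked numerically: adding 1 to a predictor adds b to the log odds, which is the same as multiplying the odds by exp(b). A sketch with hypothetical intercept and coefficient values:

```python
import math

def odds(p):
    return p / (1 - p)

b0, b = -1.0, 0.4  # hypothetical logit intercept and coefficient

def p_at(x):
    return 1 / (1 + math.exp(-(b0 + b * x)))

# A one-unit increase in x adds b to the log odds...
delta_logit = math.log(odds(p_at(3))) - math.log(odds(p_at(2)))
print(round(delta_logit, 6))  # 0.4

# ...which is equivalent to multiplying the odds by exp(b):
ratio = odds(p_at(3)) / odds(p_at(2))
print(round(ratio, 6), round(math.exp(b), 6))  # both equal exp(0.4)
```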
Comparing effects of variables
• It is hard to compare the effect of two independent variables
using odds ratio when they are measured in different scales.
• For example, the variable male is binary (0 or 1), so it is simple to
observe its effect in odds-ratio terms.
• But it is hard to compare the effect of “male” with the effect of
variable dinner97 (number of days the person has dinner with
his or her family), which goes from 0 to 7.
• If the odds ratio of “male” tells us how much more likely it is that a male
will drink compared to a female, dinner97 tells us the change in the odds
for each additional day.
• Beta coefficients standardize the effects, allowing a comparison
based on standard deviations.
Comparing effect of variables
• listcoef, help

. listcoef, help
Odds of: 1 vs 0
------------------------------------------------------------------
drank30 | b z P>|z| % %StdX SDofX
---------+--------------------------------------------------------
age97 | 0.15635 2.672 0.008 16.9 15.8 0.9371
male | -0.02072 -0.194 0.846 -2.1 -1.0 0.4985
pdrink97 | 0.28463 6.325 0.000 32.9 41.3 1.2149
dinner97 | -0.05966 -2.693 0.007 -5.8 -13.1 2.3494
------------------------------------------------------------------
b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
% = percent change in odds for unit increase in X
%StdX = percent change in odds for SD increase in X
SDofX = standard deviation of X
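The % and %StdX columns follow directly from b and SDofX: % is the percent change in the odds for a 1-unit increase in X, and %StdX is the percent change for a one-standard-deviation increase. Reproducing the age97 row in Python:

```python
import math

# age97 row from the listcoef table above:
b, sd = 0.15635, 0.9371

pct = 100 * (math.exp(b) - 1)            # percent change in odds per unit of X
pct_stdx = 100 * (math.exp(b * sd) - 1)  # percent change per SD of X

print(round(pct, 1), round(pct_stdx, 1))  # 16.9 15.8, matching the table
```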
Hypothesis testing
• 1. Wald chi-squared test: the z statistic reported by Stata in logistic
regression.
• 2. Likelihood-ratio chi-squared test:
• Compare the LR chi2 with and without the variable you want to test.
• To test variable “age97”:
logistic drank30 male dinner97 pdrink97
estimates store a
logistic drank30 age97 male dinner97 pdrink97
lrtest a
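The LR statistic that `lrtest` reports is twice the gap in log likelihoods between the full and restricted models, referred to a chi-squared distribution with one degree of freedom (one restricted coefficient). A Python sketch with hypothetical log likelihoods for illustration:

```python
import math

# Hypothetical log likelihoods (not from these data):
ll_restricted = -520.3  # model without age97
ll_full = -516.6        # model with age97

# LR chi2 = 2 * (llU - llR)
lr = 2 * (ll_full - ll_restricted)

# Upper-tail chi2(1) probability, via P(X > x) = erfc(sqrt(x/2)):
p_value = math.erfc(math.sqrt(lr / 2))
print(round(lr, 2), round(p_value, 4))  # a significant result at the 1% level here
```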
Hypothesis testing
. logistic drank30 male dinner97 pdrink97
. estimates store a
. logistic drank30 age97 male dinner97 pdrink97
. lrtest a
------------------------------------------------------------------------------
     drank30 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -.3804608   .1352133    -2.81   0.005     -.645474   -.1154476
    pdrink97 |   .2822417    .048233     5.85   0.000     .1877067    .3767767
    dinner97 |   -.069024   .0246204    -2.80   0.005    -.1172791   -.0207689
       _cons |  -2.590308   .8609411    -3.01   0.003    -4.277722   -.9028946
------------------------------------------------------------------------------
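This table is on the coefficient (log-odds) scale. Exponentiating a coefficient and its confidence limits moves everything to the odds-ratio scale. A check for the black row:

```python
import math

# black row from the coefficient table above:
b, lo, hi = -0.3804608, -0.645474, -0.1154476

# All three land below 1 on the odds-ratio scale: being black is associated
# with lower odds of having drunk in the last 30 days.
print(round(math.exp(b), 3), round(math.exp(lo), 3), round(math.exp(hi), 3))
```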
Marginal effects
• The margins command tells us the difference in the probability of having
drunk in the last 30 days if an individual is black compared with an
individual who is white.
• Initially, we are setting the covariates at the
mean.
• So the command will tell us what is the
difference between blacks and whites who are
average on the other covariates.
Marginal effects
• margins, dydx(black) atmeans

. margins, dydx(black) atmeans
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -.0862436   .0296054    -2.91   0.004     -.144269   -.0282181
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
• dy/dx: derivative at the selected point (where all other variables are at their means)
• Interpretation: a black individual who is 13.67 years old, etc., will be 8.6
percentage points less likely to drink than a white individual who is 13.67
years old, etc.
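As the note in the output says, for a factor variable dy/dx is the discrete change in predicted probability between the two levels, with the other covariates fixed. A Python sketch using the coefficients from the table above (the covariate means here are hypothetical placeholders, so the resulting number is illustrative only):

```python
import math

# Coefficients from the logit output above:
coefs = {"black": -0.3804608, "pdrink97": 0.2822417,
         "dinner97": -0.069024, "_cons": -2.590308}
# Hypothetical sample means for the other covariates:
means = {"pdrink97": 1.0, "dinner97": 4.0}

def prob(black):
    """Predicted probability at black = 0 or 1, others at their means."""
    xb = (coefs["_cons"] + coefs["black"] * black
          + sum(coefs[k] * means[k] for k in means))
    return 1 / (1 + math.exp(-xb))

effect = prob(1) - prob(0)
print(round(effect, 3))  # negative: lower probability for black = 1
```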
Marginal effects
• We can also test marginal effects at points
other than the mean using the at( ) option.
• margins, at(pdrink97=(1 2 3 4 5)) atmeans
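The at( ) option sweeps one covariate across chosen values while `atmeans` holds the rest at their means. A sketch of that calculation using the pdrink97 coefficient from the output above (the value for the rest of the linear predictor is a hypothetical placeholder):

```python
import math

b_pdrink = 0.2822417  # pdrink97 coefficient from the logit output above
xb_rest = -1.6        # hypothetical: _cons plus other covariates at their means

# Predicted probability at pdrink97 = 1, 2, 3, 4, 5:
probs = [1 / (1 + math.exp(-(xb_rest + b_pdrink * v))) for v in (1, 2, 3, 4, 5)]
print([round(p, 3) for p in probs])  # rises as the number of drinking peers grows
```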
Marginal effects
For an individual with pdrink97 coded 2, we estimate a 36% probability that he
or she drank in the last 30 days.
Marginal effects
Estimated probability that an adolescent drank in the last month, adjusted for
age, race, and frequency of family meals (setting all of those at their means).
Marginal effects
For an individual who has dinner with his or her family 3 times a week, we
estimate a 39% probability that he or she drank in the last 30 days.
Example 1
• Use severity.dta
. logit severity liberal female
1._at : liberal = 1
2._at : liberal = 3
3._at : liberal = 5
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
           1 |   .1075282   .0224531     4.79   0.000      .063521    .1515354
           2 |   .4887975   .0297587    16.43   0.000     .4304716    .5471234
           3 |   .8833568   .0211126    41.84   0.000     .8419769    .9247367
------------------------------------------------------------------------------
Example 1
. margins, at(liberal=(1 3 5)) atmeans
1._at : liberal = 1
         female = .5125 (mean)
2._at : liberal = 3
         female = .5125 (mean)
3._at : liberal = 5
         female = .5125 (mean)
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
           1 |   .1036257   .0217902     4.76   0.000     .0609177    .1463337
           2 |   .4884607   .0305729    15.98   0.000      .428539    .5483824
           3 |   .8874787   .0202504    43.83   0.000     .8477885    .9271688
------------------------------------------------------------------------------