
interpretation

October 14, 2022

1 Interpretation of the linear regression model


We will use the housing dataset and go through several specifications. The aim is to see how to
interpret the coefficients in each specification. We will ignore whether or not each specification is
appropriate and whether the marginal effects are statistically significant; the aim is simply to
show how to interpret the coefficients properly. We will also show how to use Stata's margins
command and how to take advantage of Stata's factor-variable notation.
We start by reading a dataset.
[1]: use housing2, clear

(Sample of single-family houses sold in CHS, SC 2003-2007)


Next we report descriptive statistics for the variables that we will use in the regressions:
[2]: summarize price unemp bedrooms heatsqft built01plus

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |      5,065    296301.6    327033.6      50000    5250000
       unemp |      5,065    5.143139    .5423721        4.4        6.3
    bedrooms |      5,065    3.408292    .6935228          1          6
    heatsqft |      5,065    1981.833    732.0524        609       6915
 built01plus |      5,065    .4471866     .497252          0          1

Consider the following specification:
[3]: regress price bedrooms heatsqft unemp i.built01plus

      Source |       SS           df       MS      Number of obs   =     5,065
-------------+----------------------------------   F(4, 5060)      =    956.70
       Model |  2.3322e+14         4  5.8305e+13   Prob > F        =    0.0000
    Residual |  3.0838e+14     5,060  6.0944e+10   R-squared       =    0.4306
-------------+----------------------------------   Adj R-squared   =    0.4302
       Total |  5.4160e+14     5,064  1.0695e+11   Root MSE        =   2.5e+05

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     bedrooms |   -52357.9   6557.047    -7.98   0.000    -65212.55   -39503.25
     heatsqft |   322.2198   6.214718    51.85   0.000     310.0363    334.4033
        unemp |  -56232.35   6460.177    -8.70   0.000    -68897.09    -43567.6
1.built01plus |  -89030.65   7139.838   -12.47   0.000    -103027.8   -75033.47
        _cons |   165190.9   38456.95     4.30   0.000     89798.61    240583.2
-------------------------------------------------------------------------------
How shall we interpret the estimated coefficients of this regression?
Everything else constant (ceteris paribus), we can conclude that, on average:
• each additional bedroom leads to a price decrease of 52357.9 dollars
• each additional square foot leads to a price increase of 322.22 dollars
• a one percentage point increase in the unemployment rate leads to a price decrease of
56232.35 dollars
• houses built after 2000 have a price that is 89030.65 dollars lower than houses built earlier
Since our model is of the type

$$E(y_i \mid \mathbf{x}_i) = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_k x_{ki}$$

the estimated coefficients $\hat\beta_2, \dots, \hat\beta_k$ have a direct interpretation as partial effects.
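In symbols, each coefficient is the derivative of the conditional mean with respect to its regressor. For a dummy such as built01plus, the coefficient is a discrete difference between the two groups rather than a derivative. The worked numbers below use the estimates from cell [3]. The 100-square-foot change is only an illustrative choice.

```latex
\[
\frac{\partial E(y_i \mid \mathbf{x}_i)}{\partial x_{ji}} = \beta_j
\quad\Longrightarrow\quad
\Delta E(y_i \mid \mathbf{x}_i) = \beta_j \,\Delta x_{ji}
\]
% Illustrative: a house 100 square feet larger, everything else constant
\[
\Delta E(\text{price}) = 100 \times 322.2198 = 32{,}221.98 \text{ dollars}
\]
% Dummy regressor: difference between the two groups, other regressors fixed
\[
E(\text{price} \mid \text{built01plus}=1,\cdot) - E(\text{price} \mid \text{built01plus}=0,\cdot)
= \hat\beta_{\text{built01plus}} = -89{,}030.65
\]
```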


We can use the margins command to compute marginal effects. One advantage is that margins also
reports standard errors (computed with the delta method), test statistics, p-values and confidence
intervals for each effect.
[4]: margins, dydx(*)

Average marginal effects                        Number of obs     =      5,065
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : bedrooms heatsqft unemp 1.built01plus

-------------------------------------------------------------------------------
              |            Delta-method
              |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     bedrooms |   -52357.9   6557.047    -7.98   0.000    -65212.55   -39503.25
     heatsqft |   322.2198   6.214718    51.85   0.000     310.0363    334.4033
        unemp |  -56232.35   6460.177    -8.70   0.000    -68897.09    -43567.6
1.built01plus |  -89030.65   7139.838   -12.47   0.000    -103027.8   -75033.47
-------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
As expected, the results are identical to those obtained with regress. The model is linear in every
regressor, so the marginal effect ΔY/ΔX of each variable is constant and equal to its coefficient
(for simplicity we omit the expectation operator on Y). But what if we want to look at the effect of
ΔX on ΔY/Y (a semi-elasticity)? With margins that is quite easy.
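For reference, the four derivative options of margins rescale the derivative of the prediction as shown below. The right-hand expressions hold for a model that is linear in $x_j$, like the one estimated above, where the derivative is simply $\hat\beta_j$. By default margins evaluates each expression at every observation's own values and reports the average. With the atmean option, it evaluates them at the regressor means instead.

```latex
\begin{aligned}
\texttt{dydx}:&\quad \frac{\partial \hat y}{\partial x_j} = \hat\beta_j \\
\texttt{eydx}:&\quad \frac{\partial \hat y}{\partial x_j}\,\frac{1}{\hat y} = \frac{\hat\beta_j}{\hat y} \\
\texttt{dyex}:&\quad \frac{\partial \hat y}{\partial x_j}\,x_j = \hat\beta_j\,x_j \\
\texttt{eyex}:&\quad \frac{\partial \hat y}{\partial x_j}\,\frac{x_j}{\hat y} = \hat\beta_j\,\frac{x_j}{\hat y}
\end{aligned}
```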

[5]: margins, eydx(heatsqft unemp)

Average marginal effects                        Number of obs     =      5,065
Model VCE    : OLS

Expression   : Linear prediction, predict()
ey/dx w.r.t. : heatsqft unemp

------------------------------------------------------------------------------
             |            Delta-method
             |      ey/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    heatsqft |  -.0374337   228.2588    -0.00   1.000    -447.5235    447.4486
       unemp |   6.532762   39834.55     0.00   1.000    -78086.42    78099.49
------------------------------------------------------------------------------
Recall that in this case the estimated semi-elasticity is given by $\hat\beta_j / Y$. The question is which value of $Y$
to use. margins uses the predicted value $\hat y_i$: it does this computation for each observation and then
averages all values, $\frac{1}{N}\sum_i \hat\beta_j / \hat y_i$. We can replicate the result:
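[6]: * Re-estimate the model and store the fitted values
qui regress price bedrooms heatsqft unemp i.built01plus
qui predict double yhat
* Observation-level 1/yhat; summarize stores its mean in r(mean)
gen double invy = 1/yhat
qui sum invy
* Semi-elasticity = coefficient times the average of 1/yhat
di "The semi-elasticity is -> " _b[heatsqft]*r(mean)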

The semi-elasticity is -> -.03743371
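The semi-elasticity for heatsqft is negative, although its coefficient is positive. From cell [6], the estimate equals _b[heatsqft] times the mean of 1/yhat, so that mean must be negative. That can only happen if some fitted prices are negative. Fitted values close to zero also make 1/yhat very large in absolute value, which is consistent with the huge standard errors in cell [5]. The sketch below checks this. It assumes the variable yhat from cell [6] is still in memory.

```stata
* Sketch: count fitted prices from the linear model that are negative.
* Assumes the variable yhat created in cell [6] is still in memory.
count if yhat < 0
* Distribution of the fitted values; the minimum should be negative
summarize yhat
```

Negative fitted prices are a known weakness of a linear model for a strictly positive outcome. They are one motivation for the log specifications used later.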


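We could instead have instructed margins to evaluate the formula at the means of the regressors, using
the atmean option. In an OLS regression with an intercept, the prediction at the means equals the
sample mean of price, so this amounts to using the average value of Y: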

[7]: margins, eydx(heatsqft) atmean

Conditional marginal effects                    Number of obs     =      5,065
Model VCE    : OLS

Expression   : Linear prediction, predict()
ey/dx w.r.t. : heatsqft
at           : bedrooms        =    3.408292 (mean)
               heatsqft        =    1981.833 (mean)
               unemp           =    5.143139 (mean)
               0.built01p~s    =    .5528134 (mean)
               1.built01p~s    =    .4471866 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |      ey/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    heatsqft |   .0010875   .0000245    44.32   0.000     .0010394    .0011356
------------------------------------------------------------------------------
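The estimate is very different from the previous one. The previous estimate was unreliable: its sign is
opposite to that of the coefficient and its standard error is enormous. The interpretation, however, is
the same. Everything else constant, an increase of one square foot in a house would on average lead to
a price increase of about 0.109 percent. Note that the change in Y is relative (a proportion), so it
must be multiplied by 100 to be read as a percentage.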
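The arithmetic behind cell [7] can be checked by hand. atmean sets every regressor, including the built01plus indicators, to its sample mean. With OLS and an intercept, the prediction at those means equals the sample mean of price from cell [2]. The semi-elasticity is therefore the coefficient divided by that mean.

```latex
% Prediction at the regressor means equals the sample mean of price
\[
\hat y(\bar{\mathbf{x}}) = \bar{\mathbf{x}}'\hat{\boldsymbol\beta} = \bar y = 296301.6
\]
% Semi-elasticity at the means, then converted to a percentage
\[
\frac{\hat\beta_{\text{heatsqft}}}{\bar y} = \frac{322.2198}{296301.6} \approx 0.0010875
\quad\Rightarrow\quad
100 \times 0.0010875 \approx 0.109 \text{ percent per square foot}
\]
```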
Finally, what if we wanted to compute the elasticity?
[8]: margins, eyex(heatsqft) atmean

Conditional marginal effects                    Number of obs     =      5,065
Model VCE    : OLS

Expression   : Linear prediction, predict()
ey/ex w.r.t. : heatsqft
at           : bedrooms        =    3.408292 (mean)
               heatsqft        =    1981.833 (mean)
               unemp           =    5.143139 (mean)
               0.built01p~s    =    .5528134 (mean)
               1.built01p~s    =    .4471866 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |      ey/ex   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    heatsqft |   2.155189   .0486256    44.32   0.000     2.059862    2.250516
------------------------------------------------------------------------------
and we can conclude that, ceteris paribus, a 1 percent increase in square footage will lead to a 2.16
percent increase in price.
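The elasticity at the means is the semi-elasticity at the means multiplied by the mean of square footage. The means are those reported by summarize in cell [2].

```latex
\[
\frac{\partial \hat y}{\partial x}\,\frac{\bar x}{\bar y}
= \frac{322.2198 \times 1981.833}{296301.6}
= 0.0010875 \times 1981.833 \approx 2.155
\]
```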
Suppose now that we decide to add a quadratic term in heatsqft to the specification above. We
could create a new variable and add it to the regression:
[9]: gen heat2=heatsqft*heatsqft
regress price bedrooms heatsqft heat2 unemp i.built01plus

      Source |       SS           df       MS      Number of obs   =     5,065
-------------+----------------------------------   F(5, 5059)      =    993.56
       Model |  2.6834e+14         5  5.3667e+13   Prob > F        =    0.0000
    Residual |  2.7326e+14     5,059  5.4015e+10   R-squared       =    0.4955
-------------+----------------------------------   Adj R-squared   =    0.4950
       Total |  5.4160e+14     5,064  1.0695e+11   Root MSE        =   2.3e+05

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     bedrooms |   -27927.4   6246.967    -4.47   0.000    -40174.16   -15680.64
     heatsqft |  -119.9291   18.30153    -6.55   0.000     -155.808   -84.05014
        heat2 |   .0855498   .0033553    25.50   0.000      .078972    .0921275
        unemp |  -50821.73   6085.551    -8.35   0.000    -62752.05   -38891.41
1.built01plus |  -61857.04   6805.673    -9.09   0.000     -75199.1   -48514.97
        _cons |   536362.9   39021.86    13.75   0.000     459863.1    612862.6
-------------------------------------------------------------------------------
Since heat2 is the square of heatsqft, the partial effect of heatsqft has to take both terms into account.
This is a model of the type

$$E(y_i \mid \mathbf{x}_i) = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{2i}^2 + \dots + \beta_k x_{ki}$$

and the marginal effect of $x_2$ (heatsqft) is

$$\frac{\partial E(y_i \mid \mathbf{x}_i)}{\partial x_{2i}} = \beta_2 + 2\beta_3 x_{2i}.$$
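With the manually created heat2, the formula above can still be applied by hand. The sketch below computes the marginal effect for each house and then averages it. It assumes heat2 from cell [9] is still in memory. The variable name me_heat is a hypothetical name chosen only for this illustration.

```stata
* Sketch: average marginal effect of heatsqft computed by hand from the
* model with the manually created square (heat2 comes from cell [9]).
quietly regress price bedrooms heatsqft heat2 unemp i.built01plus
capture drop me_heat
* Observation-level marginal effect: b_heatsqft + 2*b_heat2*heatsqft
* (me_heat is a hypothetical variable name used only for this illustration)
generate double me_heat = _b[heatsqft] + 2*_b[heat2]*heatsqft if e(sample)
* The mean of me_heat is the average marginal effect
summarize me_heat
```

The marginal effect is linear in heatsqft, so its average equals $\hat\beta_2 + 2\hat\beta_3 \bar x_2 \approx -119.9291 + 2 \times 0.0855498 \times 1981.833 \approx 219.16$.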
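But with Stata you don't need to create heat2. Instead, we can use the factor-variable syntax
c.heatsqft##c.heatsqft to let Stata construct the squared term. An advantage of doing this is that
margins now “knows” how the two terms are linked and can compute the marginal effect of heatsqft
correctly. With the manually created heat2, margins would treat heat2 as an unrelated variable, and
dydx(heatsqft) would simply return the coefficient on heatsqft.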
[10]: regress price bedrooms c.heatsqft##c.heatsqft unemp i.built01plus
margins, dydx(heatsqft)

      Source |       SS           df       MS      Number of obs   =     5,065
-------------+----------------------------------   F(5, 5059)      =    993.56
       Model |  2.6834e+14         5  5.3667e+13   Prob > F        =    0.0000
    Residual |  2.7326e+14     5,059  5.4015e+10   R-squared       =    0.4955
-------------+----------------------------------   Adj R-squared   =    0.4950
       Total |  5.4160e+14     5,064  1.0695e+11   Root MSE        =   2.3e+05

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     bedrooms |   -27927.4   6246.967    -4.47   0.000    -40174.16   -15680.64
     heatsqft |  -119.9291   18.30153    -6.55   0.000     -155.808   -84.05013
              |
   c.heatsqft#|
   c.heatsqft |   .0855498   .0033553    25.50   0.000      .078972    .0921275
              |
        unemp |  -50821.73   6085.551    -8.35   0.000    -62752.05   -38891.41
1.built01plus |  -61857.04   6805.673    -9.09   0.000     -75199.1   -48514.97
        _cons |   536362.9   39021.86    13.75   0.000     459863.1    612862.6
-------------------------------------------------------------------------------

Average marginal effects                        Number of obs     =      5,065
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : heatsqft

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    heatsqft |   219.1616   7.111178    30.82   0.000     205.2206    233.1026
------------------------------------------------------------------------------
and we can conclude that, everything else constant, on average an increase of one square foot leads to a
price increase of 219 dollars.
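Because the marginal effect now depends on house size, a single average hides how it changes along the size distribution. The sketch below asks margins for the effect at a few sizes. The values 1000, 2000 and 3000 square feet are arbitrary illustrative choices.

```stata
* Sketch: marginal effect of heatsqft at a few illustrative house sizes.
* The sizes 1000, 2000 and 3000 square feet are arbitrary choices.
quietly regress price bedrooms c.heatsqft##c.heatsqft unemp i.built01plus
margins, dydx(heatsqft) at(heatsqft=(1000 2000 3000))
```

From the formula $\hat\beta_2 + 2\hat\beta_3 x_2$, these effects should be roughly 51, 222 and 393 dollars per square foot. The estimated effect turns negative below $119.9291 / (2 \times 0.0855498) \approx 701$ square feet, a region that contains only the smallest houses in the sample (the minimum is 609).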
Similarly, if we want to include an interaction between variables, we should use Stata's factor-variable
syntax. For example, suppose that for some reason we want to add the interaction between unemp and
heatsqft as a regressor. We could simply do
[11]: regress price bedrooms c.heatsqft##c.unemp i.built01plus

      Source |       SS           df       MS      Number of obs   =     5,065
-------------+----------------------------------   F(5, 5059)      =    799.61
       Model |  2.3908e+14         5  4.7816e+13   Prob > F        =    0.0000
    Residual |  3.0252e+14     5,059  5.9799e+10   R-squared       =    0.4414
-------------+----------------------------------   Adj R-squared   =    0.4409
       Total |  5.4160e+14     5,064  1.0695e+11   Root MSE        =   2.4e+05

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     bedrooms |  -53765.35   6496.683    -8.28   0.000    -66501.66   -41029.04
     heatsqft |   779.3125   46.59624    16.72   0.000     687.9636    870.6613
        unemp |   117899.9   18723.03     6.30   0.000     81194.64    154605.1
              |
   c.heatsqft#|
      c.unemp |    -89.233   9.016719    -9.90   0.000    -106.9097   -71.55633
              |
1.built01plus |  -92614.85   7081.681   -13.08   0.000      -106498   -78731.69
        _cons |  -721585.9   97367.25    -7.41   0.000    -912467.9   -530703.9
-------------------------------------------------------------------------------
and because we “told” Stata how the interaction term was constructed, margins would know how to
compute the partial effects correctly.
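With an interaction, each partial effect depends on the value of the other variable. The labels $\beta_h$, $\beta_u$ and $\beta_{hu}$ below are notation introduced here for the coefficients on heatsqft, unemp and their product. The numbers evaluate the formulas at the sample means from cell [2] using the rounded coefficients of cell [11]. Each derivative is linear in the other variable, so the average marginal effects coincide with these values. `margins, dydx(heatsqft unemp)` should therefore report approximately the same numbers. That output is not shown here.

```latex
\[
E(y_i \mid \mathbf{x}_i) = \dots + \beta_h\,\text{heatsqft}_i + \beta_u\,\text{unemp}_i
+ \beta_{hu}\,\text{heatsqft}_i \times \text{unemp}_i + \dots
\]
\[
\frac{\partial E(y_i \mid \mathbf{x}_i)}{\partial\,\text{heatsqft}_i} = \beta_h + \beta_{hu}\,\text{unemp}_i
\qquad
\frac{\partial E(y_i \mid \mathbf{x}_i)}{\partial\,\text{unemp}_i} = \beta_u + \beta_{hu}\,\text{heatsqft}_i
\]
% Evaluated at the sample means (unemp = 5.143139, heatsqft = 1981.833)
\[
779.3125 - 89.233 \times 5.143139 \approx 320.37
\qquad
117899.9 - 89.233 \times 1981.833 \approx -58{,}945
\]
```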

2 The log-linear model
Suppose that instead of the above regression we use the log of price as the dependent variable.
[12]: gen lprice=log(price)
regress lprice bedrooms heatsqft unemp i.built01plus

      Source |       SS           df       MS      Number of obs   =     5,065
-------------+----------------------------------   F(4, 5060)      =   1880.51
       Model |  1185.87182         4  296.467956   Prob > F        =    0.0000
    Residual |  797.724331     5,060   .15765303   R-squared       =    0.5978
-------------+----------------------------------   Adj R-squared   =    0.5975
       Total |  1983.59615     5,064  .391705402   Root MSE        =    .39706

-------------------------------------------------------------------------------
       lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     bedrooms |  -.1117128   .0105461   -10.59   0.000    -.1323878   -.0910378
     heatsqft |   .0007175   1.00e-05    71.79   0.000      .000698    .0007371
        unemp |  -.1607762   .0103903   -15.47   0.000    -.1811458   -.1404067
1.built01plus |  -.0918013   .0114835    -7.99   0.000    -.1143138   -.0692887
        _cons |   12.16847   .0618528   196.73   0.000     12.04721    12.28973
-------------------------------------------------------------------------------
This is a model of the type

$$E(\log y_i \mid \mathbf{x}_i) = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_k x_{ki}$$

whose coefficients have a direct interpretation as semi-elasticities: multiplied by 100, each coefficient
is approximately the percentage change in price for a one-unit change in the regressor. Based on this
model we would conclude that, everything else constant, and on average
• an increase of one bedroom leads to a price decrease of about 11.17 percent
• an increase of one square foot leads to an increase of about 0.07 percent in the price of a house
• an increase of one percentage point in the unemployment rate leads to a decrease of about 16.1 percent
in price
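Reading $100\,\beta_j$ as a percentage relies on the first-order approximation $e^{z} - 1 \approx z$. The derivation below holds the error term fixed and shows the exact change alongside the approximation. The 100-square-foot example is an illustrative choice.

```latex
\[
\Delta \log y = \beta_j \,\Delta x_j
\;\Longrightarrow\;
\frac{y^{\text{new}}}{y^{\text{old}}} = e^{\beta_j \Delta x_j}
\;\Longrightarrow\;
\%\Delta y = 100\,\bigl(e^{\beta_j \Delta x_j} - 1\bigr) \approx 100\,\beta_j \,\Delta x_j
\]
% Illustrative: 100 extra square feet, so beta * delta x = 0.0007175 * 100 = 0.07175
\[
\text{approximation: } 7.175 \text{ percent}
\qquad
\text{exact: } 100\,\bigl(e^{0.07175} - 1\bigr) \approx 7.44 \text{ percent}
\]
```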
What about the coefficient on “1.built01plus”? We can conclude that, ceteris paribus, houses built after
2000 have prices that are $e^{-0.0918013} = 0.9122864$ times those of houses built earlier or, in other
words, 8.77 percent lower ($0.9122864 - 1 = -0.0877136$). When working with dummies, if the dependent
variable is in logs, the partial effect is usually calculated as $(e^{\beta} - 1) \times 100\%$. A simpler
alternative is to express the change in log points. In this case we could simply say that houses built
after 2000 sell for 9.2 log points less (we are reading the coefficient directly, multiplied by 100).
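The exact transformation $100\,(e^{\beta} - 1)$ can be applied to any coefficient in the log-linear model. nlcom also gives a delta-method standard error for it. The sketch below is a minimal illustration. It assumes lprice from cell [12] is still in memory.

```stata
* Sketch: exact percentage effects 100*(exp(b) - 1) with delta-method
* standard errors, using the log-linear model of cell [12].
quietly regress lprice bedrooms heatsqft unemp i.built01plus
nlcom 100*(exp(_b[bedrooms]) - 1)
nlcom 100*(exp(_b[unemp]) - 1)
nlcom 100*(exp(_b[1.built01plus]) - 1)
```

The point estimates should be about -10.57, -14.85 and -8.77 percent. The approximate readings of the coefficients give -11.17, -16.08 and -9.18, respectively. The gap grows with the size of the coefficient.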

3 The log-log regression model


What if we wanted to estimate a constant-elasticity model? In this case both the dependent variable and
the explanatory variables need to be in logs. Of course, that is only possible if the variables take
strictly positive values.

[13]: gen lbedrooms=log(bedrooms)
gen lheatsqft=log(heatsqft)
gen lunemp=log(unemp)
regress lprice lbedrooms lheatsqft lunemp i.built01plus

      Source |       SS           df       MS      Number of obs   =     5,065
-------------+----------------------------------   F(4, 5060)      =   1703.19
       Model |  1138.21509         4  284.553772   Prob > F        =    0.0000
    Residual |  845.381067     5,060  .167071357   R-squared       =    0.5738
-------------+----------------------------------   Adj R-squared   =    0.5735
       Total |  1983.59615     5,064  .391705402   Root MSE        =    .40874

-------------------------------------------------------------------------------
       lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    lbedrooms |  -.4608738   .0375345   -12.28   0.000    -.5344576   -.3872899
    lheatsqft |    1.52488   .0219449    69.49   0.000     1.481858    1.567901
       lunemp |  -.8749995    .055821   -15.68   0.000    -.9844328   -.7655662
1.built01plus |  -.1287332   .0118935   -10.82   0.000    -.1520496   -.1054168
        _cons |   2.900198    .167044    17.36   0.000     2.572719    3.227676
-------------------------------------------------------------------------------
Now we have a model of the type (ignoring the dummy variable)

$$E(\log y_i \mid \mathbf{x}_i) = \beta_1 + \beta_2 \log x_{2i} + \beta_3 \log x_{3i} + \dots + \beta_k \log x_{ki}$$

and the coefficients have a direct interpretation as elasticities. In this particular case it does not
make much sense to take the log of bedrooms or unemp, so we will only interpret the coefficient on
heatsqft. Based on this model we would conclude that, everything else constant, on average a 1 percent
increase in square footage leads to a 1.52 percent increase in price.
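The elasticity reading is exact only for small percentage changes. The worked example below uses a 10 percent increase in square footage, an illustrative choice, to compare the exact effect with the approximation.

```latex
\[
\beta_j = \frac{\partial \log y}{\partial \log x_j} \approx \frac{\%\Delta y}{\%\Delta x_j}
\]
% Exact effect of a 10 percent increase in square footage
\[
100\,\bigl(1.10^{1.52488} - 1\bigr) \approx 15.64 \text{ percent}
\qquad\text{versus}\qquad
10 \times 1.52488 = 15.25 \text{ percent}
\]
```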
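Note that Stata does not “know” that the variables are in logs. So with the margins command it is the
dydx option, applied to lheatsqft, that returns the elasticity: it simply reproduces the coefficient.
Options such as eyex would rescale by the values of lprice and lheatsqft themselves, and would not give
the elasticity of price with respect to square footage.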
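Because margins treats logged variables as ordinary variables, the right option depends on which variables are in logs. In the log-linear model of Section 2, only the dependent variable is logged. The dyex option multiplies the derivative by x, so there it yields the elasticity of price with respect to square footage. The sketch below assumes lprice from cell [12] is still in memory.

```stata
* Sketch: elasticity of price with respect to square footage implied by the
* log-linear model of Section 2. The dependent variable is already log(price),
* so dyex returns d log(price)/d log(heatsqft) = _b[heatsqft]*heatsqft,
* averaged over the observations.
quietly regress lprice bedrooms heatsqft unemp i.built01plus
margins, dyex(heatsqft)
```

The result should be about $0.0007175 \times 1981.833 \approx 1.42$. That is close to, but not the same as, the 1.52 from the log-log model. The two specifications impose different functional forms, so their elasticities need not agree.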

4 Remarks
In general we do not include among the explanatory variables the log of variables that are already
expressed as percentages (such as unemp). It is also not common to take logs of variables that take only
a small number of discrete values (such as bedrooms). We also have to be careful with interactions,
because their coefficients can become difficult to interpret meaningfully. When adding interactions it is
also a good idea to include the original variables by themselves (the main effects).
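In Stata's factor-variable notation, `#` enters only the product of two variables, while `##` also enters each variable by itself. Using `##` is therefore a convenient way to follow the advice above. The sketch below uses the interaction from cell [11] to illustrate the difference.

```stata
* Sketch: ## expands to main effects plus the interaction, so these two
* specifications should produce identical estimates.
regress price bedrooms c.heatsqft##c.unemp i.built01plus
regress price bedrooms c.heatsqft c.unemp c.heatsqft#c.unemp i.built01plus
* By contrast, # alone enters only the product and omits the main effects
* (shown for comparison; generally not advisable).
regress price bedrooms c.heatsqft#c.unemp i.built01plus
```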
