
Logarithms in Regression Analysis with Asiaphoria and Dummy Variables

Lecture 6

Reading: “Logarithms in Regression Analysis with Asiaphoria”

Overview of this Lecture


• Using natural logarithm transformations to
straighten scatter plots
– Interpreting coefficients when x and/or y are
logged
– Asiaphoria and Penn World Tables data
• The power and convenience of natural logs
• Regression when the x-variable is a dummy
– Combine everything: ln(y) and x is a dummy

Lecture 6 Slides, ECO220Y1Y, 1


2017 World Happiness Report (WHR)
• “Happiness is increasingly considered the
proper measure of social progress and the
goal of public policy. In June 2016, the OECD
committed itself ‘to redefine the growth
narrative to put people’s well-being at the
centre of governments’ efforts.’ In a recent
speech, the head of the UN Development
Program (UNDP) spoke against what she called
the ‘tyranny of GDP,’ arguing that what
matters is the quality of growth.” p. 3
https://worldhappiness.report/ed/2017/

Worldwide Happiness Survey

• Huge annual survey in 150+ countries, each
w/ ~1,000 respondents, measuring happiness
– Cantril ladder: “Please imagine a ladder,
with steps numbered from 0 at the bottom
to 10 at the top. The top of the ladder
represents the best possible life for you and
the bottom of the ladder represents the
worst possible life for you. On which step
of the ladder would you say you personally
feel you stand at this time?”


Straightening a Scatter Plot
[Figure: two scatter plots of Cantril Ladder (mean reply), 2016, n = 133 countries. Left panel: against GDP per capita. Right panel: against ln(GDP per capita). Labeled countries: Costa Rica, Luxembourg, Singapore, Canada, South Sudan & Tanzania.]

Variable             Mean     S.D.     Median   Min.   Max.     25th Per.   75th Per.
Cantril Ladder       5.41     1.13     5.43     2.89   7.66     4.52        6.12
GDP per capita       18,950   18,521   13,178   760    94,774   4,731       27,453
Ln(GDP per capita)   9.28     1.19     9.49     6.63   11.46    8.46        10.22
“GDP per capita is in terms of Purchasing Power Parity (PPP) adjusted to constant 2011
international dollars, taken from the World Development Indicators (WDI) released by
the World Bank in August 2016.” p. 17 of 2017 WHR

What to Log?
• Given a curved relationship, how do we know
whether to take the natural log of x, the
natural log of y, or the natural log of both?
– A quick rule of thumb: apply the natural log to the
positively skewed variable
– You can think about which variable would have
diminishing returns and apply the natural log to it
– Of course, there’s always trial and error, so long as
you know how to use software
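One crude way to apply the skewness rule of thumb without plotting is to compare each variable's mean with its median. The sketch below uses only the summary statistics reported for the 2016 WHR data. The `skew_gap` helper and the (mean - median)/S.D. indicator are illustrative assumptions, not something the lecture prescribes.

```python
# Summary statistics copied from the 2016 WHR slide (n = 133 countries).
summary = {
    "Cantril Ladder": {"mean": 5.41, "sd": 1.13, "median": 5.43},
    "GDP per capita": {"mean": 18950, "sd": 18521, "median": 13178},
    "Ln(GDP per capita)": {"mean": 9.28, "sd": 1.19, "median": 9.49},
}


def skew_gap(stats):
    """Hypothetical helper: (mean - median) / S.D., positive for a long right tail."""
    return (stats["mean"] - stats["median"]) / stats["sd"]


gaps = {name: skew_gap(stats) for name, stats in summary.items()}
for name, gap in gaps.items():
    print(f"{name:20s} (mean - median)/S.D. = {gap:+.2f}")
```

GDP per capita shows a clearly positive gap (mean well above median), which points to logging x rather than y in the happiness example.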



[Figure: CASE 1: How to straighten? Let’s try everything. Four scatter plots: y1 against x1, y1 against ln(x1), ln(y1) against x1, and ln(y1) against ln(x1).]

[Figure: CASE 2: How to straighten? Let’s try everything. Four scatter plots: y2 against x2, y2 against ln(x2), ln(y2) against x2, and ln(y2) against ln(x2).]

[Figure: CASE 3: How to straighten? Let’s try everything. Four scatter plots: y3 against x3, y3 against ln(x3), ln(y3) against x3, and ln(y3) against ln(x3).]

Functional Form
• Some common functional forms in economics:
– Linear: 𝑌 = 𝑎 + 𝑏𝑋
– Log-log (constant elasticity): ln 𝑌 = 𝑎 + 𝑏 ln(𝑋)
– Semi-log (log-lin): ln 𝑌 = 𝑎 + 𝑏𝑋
– Lin-log: 𝑌 = 𝑎 + 𝑏 ln(𝑋)
• Next, review first two in a demand context
and then summarize all four
• “Logarithms in Regression Analysis with Asiaphoria”



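Elasticity of Demand
• Recall ECO101/102: $\varepsilon = \eta = \frac{\%\Delta Q}{\%\Delta P}$
• With calculus: $\frac{\%\Delta Q}{\%\Delta P} = \frac{dQ/Q}{dP/P} = \frac{dQ}{dP} \cdot \frac{P}{Q}$
• Recall the Prerequisite Review.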

Table III. Own and Cross Price Elasticities

Elasticity of the    with respect to the price of
demand for           Cottonelle   Charmin   Angel Soft
Cottonelle           -3.304       0.737     0.621
Charmin              0.242        -2.292    0.262
Angel Soft           0.765        1.132     -4.066

Hausman and Leonard (2002) “The Competitive Effects of a New Product Introduction:
A Case Study” p. 251 https://onlinelibrary.wiley.com/doi/epdf/10.1111/1467-6451.00176

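Demand Specifications
• Linear specification of demand: $Q = \frac{a}{b} - \frac{1}{b} P$
– Graph: $P = a - bQ$
– Elasticity: $\varepsilon = -\frac{1}{b} \cdot \frac{P}{Q}$
– Demand is elastic ($\varepsilon < -1$) above the midpoint, unit elastic ($\varepsilon = -1$) at the midpoint $(Q, P) = (\frac{a}{2b}, \frac{a}{2})$, and inelastic ($\varepsilon > -1$) below it
• Constant elasticity functional form: $\ln(Q) = a - b \ln(P)$
– Elasticity: $\varepsilon = -b$
– Elasticity is the same all along the demand curve
[Figure: linear demand curve with P on the vertical axis (intercept $a$, midpoint $\frac{a}{2}$) and Q on the horizontal axis (midpoint $\frac{a}{2b}$, intercept $\frac{a}{b}$), marking the elastic, unit elastic, and inelastic regions.]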
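To see how the elasticity changes along a linear demand curve, here is a minimal sketch that evaluates the point elasticity at three prices. The parameter values a = 10 and b = 2 and the function name `linear_demand_elasticity` are hypothetical, chosen only for illustration.

```python
def linear_demand_elasticity(price, a, b):
    """Point elasticity on the inverse demand curve P = a - b*Q."""
    quantity = (a - price) / b          # invert P = a - b*Q
    return (-1.0 / b) * (price / quantity)


a, b = 10.0, 2.0                        # hypothetical parameters
prices = (7.5, 5.0, 2.5)                # above, at, and below the midpoint a/2
elasticities = {p: linear_demand_elasticity(p, a, b) for p in prices}
for p, e in elasticities.items():
    print(f"P = {p:4.1f}: elasticity = {e:.3f}")
```

With these values, demand is elastic at P = 7.5, unit elastic at the midpoint P = a/2 = 5, and inelastic at P = 2.5.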



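Constant Elasticity (log-log; log-linear)
$\ln(Q) = a - b \ln(P)$
$\exp(\ln(Q)) = \exp(a - b \ln(P))$
$Q = \exp(a) \exp(-b \ln(P))$
$Q = \exp(a) \exp(\ln(P^{-b}))$
$Q = e^{a} P^{-b}$

$\text{elasticity} = \frac{\%\Delta Q}{\%\Delta P} = \frac{dQ/Q}{dP/P} = \frac{dQ}{dP} \cdot \frac{P}{Q}$
$\text{elasticity} = \frac{dQ}{dP} \cdot \frac{P}{Q} = e^{a} (-b) P^{-b-1} \cdot \frac{P}{e^{a} P^{-b}} = -b$

[Figure: demand curve for ln(Q) = 7 - 3.6ln(P), with P (2 to 10) on the vertical axis and Q (0 to 100) on the horizontal axis.]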
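As a numerical check on the derivation, the sketch below evaluates the point elasticity of the plotted curve ln(Q) = 7 - 3.6ln(P) at several prices using a central finite difference. The function names and the step size `h` are assumptions made for this illustration.

```python
import math


def quantity(price, a=7.0, b=3.6):
    """Q = e^a * P^(-b), the levels form of ln(Q) = a - b*ln(P)."""
    return math.exp(a) * price ** (-b)


def point_elasticity(price, h=1e-6):
    """(dQ/dP) * (P/Q), with dQ/dP approximated by a central difference."""
    dq_dp = (quantity(price + h) - quantity(price - h)) / (2 * h)
    return dq_dp * price / quantity(price)


elasticities = [point_elasticity(p) for p in (2.0, 4.0, 6.0, 8.0, 10.0)]
for p, e in zip((2.0, 4.0, 6.0, 8.0, 10.0), elasticities):
    print(f"P = {p:4.1f}: elasticity = {e:.4f}")
```

Every price should give an elasticity of about -3.6, matching the coefficient on ln(P).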

Regression Result: What is $b_1$?
• $\hat{y} = b_0 + b_1 x$
– Slope: a one unit increase in $x$ is associated with a $b_1$ unit increase in $y$ on average
• $\widehat{\ln(y)} = b_0 + b_1 \ln(x)$
– Elasticity: a one percent increase in $x$ is associated with approximately a $b_1$ percent increase in $y$ on average
• $\widehat{\ln(y)} = b_0 + b_1 x$
– “Semi-elasticity”: a one unit increase in $x$ is associated with approximately a $100 \cdot b_1$ percent increase in $y$ on average
• $\hat{y} = b_0 + b_1 \ln(x)$
– (no name): a one percent increase in $x$ is associated with approximately a $b_1/100$ unit increase in $y$ on average
Note well: A full interpretation must specify the context, give
variable descriptions, be clear on causality, and provide the
specific units of measurement of any non-logged variables.
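The word "approximately" matters in three of these interpretations. The sketch below compares each rule of thumb with the exact change implied by the fitted equation. The function names and the coefficient values 0.05 and 0.5 are hypothetical, chosen to show that the approximation is good for small changes and deteriorates for large ones.

```python
import math


def loglog_exact_pct(b1, pct_change_x):
    """Exact % change in y for a given % change in x when ln(y) = b0 + b1*ln(x)."""
    return 100 * ((1 + pct_change_x / 100) ** b1 - 1)


def semilog_exact_pct(b1, delta_x=1.0):
    """Exact % change in y for a delta_x unit change in x when ln(y) = b0 + b1*x."""
    return 100 * (math.exp(b1 * delta_x) - 1)


def linlog_exact_units(b1, pct_change_x):
    """Exact unit change in y for a given % change in x when y = b0 + b1*ln(x)."""
    return b1 * math.log(1 + pct_change_x / 100)


for b1 in (0.05, 0.5):  # hypothetical coefficients
    print(f"b1 = {b1}")
    print(f"  log-log : rule {b1:.3f}%, exact {loglog_exact_pct(b1, 1):.3f}%")
    print(f"  semi-log: rule {100 * b1:.1f}%, exact {semilog_exact_pct(b1):.1f}%")
    print(f"  lin-log : rule {b1 / 100:.5f}, exact {linlog_exact_units(b1, 1):.5f}")
```

The semi-log rule is the one to watch: with a coefficient as large as 0.5, the exact change is noticeably bigger than 100 times the coefficient.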



Interpret the Results
2016, n = 133 countries, R-squared = 0.645
cantril_hat = -1.679 + 0.764*ln(gdp)

[Figure: scatter plot of Cantril Ladder (mean reply) against ln(GDP per capita) with the OLS line.]

Do these data have summary values (Section 19.4)?

For countries with GDP per capita that is 10% higher, we observe
mean happiness, which is measured on a 10-point Cantril ladder
scale, that is approximately 0.08 points higher on average.
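The arithmetic behind the 0.08 figure can be sketched as follows, using the fitted coefficients above and the mean of ln(GDP per capita) from the summary table. The variable names are illustrative assumptions.

```python
import math

b0, b1 = -1.679, 0.764                  # fitted lin-log coefficients

# Rule of thumb: a 1% rise in x goes with b1/100 units, so 10% goes with 10*b1/100.
approx_change = 10 * b1 / 100
# Exact change implied by the fitted line for a 10% rise in GDP per capita.
exact_change = b1 * math.log(1.10)

# The OLS line passes through the sample means: check against the summary table
# (mean ln(GDP per capita) = 9.28, mean Cantril Ladder = 5.41).
predicted_at_mean = b0 + b1 * 9.28

print(f"approximate change: {approx_change:.4f}")
print(f"exact change:       {exact_change:.4f}")
print(f"prediction at mean ln(GDP): {predicted_at_mean:.2f}")
```

Both versions of the change are below 0.1 of a ladder step, and the prediction at the mean of ln(GDP per capita) lands on the reported mean Cantril score.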

Fortune 500 Companies, 2013
[Figure: two scatter plots of Revenues ($b) against Assets ($b).
Top panel (assets 0 to 3000, revenues 0 to 500): Y-hat = 20.4 + 0.1X, R2 = 0.11. Labeled points: Wal-Mart & Exxon Mobil; Fannie Mae, J.P. Morgan Chase, Bank of America, Freddie Mac, Citigroup, Wells Fargo.
Bottom panel (assets 0 to 1000, revenues 0 to 250): Y-hat = 16.1 + 0.1X, R2 = 0.17. Labeled points: Chevron; Goldman Sachs, MetLife, Morgan Stanley, Prudential Financial, GE, AIG, TIAA-CREF, Berkshire Hathaway.]

http://money.cnn.com/magazines/fortune/fortune500/2013/full_list/



Fortune 500 Companies, 2013
Y-hat = 13.7 + 0.2X, R2 = 0.15

[Figure: top panel, scatter plot of Revenues ($b) against Assets ($b), with assets from 0 to 400. Bottom panel, residual against Y-hat.]

Clearly the problem is not outliers as it may have first appeared.
Instead, the problem is that the functional form is not linear so
linear descriptive techniques (OLS, R2) are NOT appropriate.

[Figure: four scatter plots.
Fortune 500 Companies, 2013: Revenues ($b) against Assets ($b), Y-hat = 20.4 + 0.1X, R2 = 0.11.
Fortune 500 Companies, 2013: Ln Revenues ($b) against Ln Assets ($b), Y-hat = 1.4 + 0.4X, R2 = 0.42.
Fortune 500 Companies, FAKE DATA: Revenues ($b) against Assets ($b), Y-hat = 16.6 + 0.1X, R2 = 0.22.
Fortune 500 Companies, FAKE DATA: Ln Revenues ($b) against Ln Assets ($b), Y-hat = 1.4 + 0.4X, R2 = 0.42.]

How to interpret 1.4?
Because ln(1) = 0, the intercept is the predicted Ln Revenues when assets are 1 billion.
Firms w/ 1 billion in assets have revenues of ≈4 billion ($e^{1.4}$).
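The intercept interpretation can be checked by converting the fitted log-log line back to levels. This is a minimal sketch using the slide's coefficients (1.4 and 0.4); the function name `predicted_revenue` is a hypothetical label, and simply exponentiating the fitted log value is a simplification.

```python
import math

b0, b1 = 1.4, 0.4                       # fitted log-log coefficients


def predicted_revenue(assets_billion):
    """Back-transform ln(revenues) = b0 + b1*ln(assets) to revenues in $ billions."""
    return math.exp(b0 + b1 * math.log(assets_billion))


at_one_billion = predicted_revenue(1.0)            # ln(1) = 0, so this is e^1.4
doubling_ratio = predicted_revenue(2.0) / predicted_revenue(1.0)

print(f"predicted revenues at 1 billion in assets: {at_one_billion:.2f} billion")
print(f"revenue ratio when assets double: {doubling_ratio:.3f}")
```

The doubling ratio equals 2 raised to the power 0.4, which is the elasticity interpretation applied to a large (100%) change rather than a 1% change.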



Penn World Tables (PWT) 10.0
• High-quality publicly available data, which are
periodically updated (10.0 released Jan 2021)
– Pritchett and Summers (2014) “Asiaphoria Meets
Regression to the Mean” use PWT 8.0
– It contains country-level GDP measures – more
than one variable measuring GDP depending on
researcher’s purpose – for each year
• Are these data cross-sectional, time series or panel?
– Can use these data to compute growth rates


Getting from GDP levels to growth

[Figure: two time-series plots for China, n = 35 years (1980 to 2014). Left panel: Real GDP/capita (2011US$) against Year. Right panel: ln(Real GDP/capita) against Year with the OLS line.]
China, R-squared = 0.983
ln_gdp_hat = -115.730 + 0.062*year

How to interpret 0.062? What does the $R^2$ value of 0.983 mean?

Over the period from 1980 through 2014 in China, real GDP per capita
has risen at an impressive rate of about 6.2% annually on average.
“Real GDP/capita” is rgdpna/pop from PWT 9.0 (DACM): rgdpna is “Real GDP at
constant 2011 national prices (in mil. 2011US$)” and pop is “Population (in millions).”
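A short sketch of what the 0.062 slope implies, assuming the fitted trend held exactly in every year. The variable names are illustrative; the only inputs are the slope and the 1980 to 2014 span from the slide.

```python
import math

slope = 0.062                            # coefficient on year in the semi-log fit
span_years = 2014 - 1980                 # 34 annual steps

approx_growth = 100 * slope                        # rule of thumb: 100*b1 percent
exact_growth = 100 * (math.exp(slope) - 1)         # exact annual percent change
cumulative_factor = math.exp(slope * span_years)   # fitted level in 2014 / fitted level in 1980
doubling_time = math.log(2) / slope                # years for fitted GDP per capita to double

print(f"approximate annual growth: {approx_growth:.1f}%")
print(f"exact annual growth:       {exact_growth:.2f}%")
print(f"cumulative factor over {span_years} years: {cumulative_factor:.2f}")
print(f"doubling time: {doubling_time:.1f} years")
```

The exact annual rate is slightly above the 6.2% rule-of-thumb figure, which is why the slide says "about".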



Sometimes ln() is just convenient

[Figure: two time-series plots for Bangladesh, n = 11 years (2000 to 2010), each with the OLS line.]
Left panel, Real GDP per capita against Year: R-squared = 0.983
gdp_hat = -182922.8 + 92.3*year
Right panel, ln(Real GDP per capita) against Year: R-squared = 0.992
ln_gdp_hat = -78.796 + 0.043*year

How to interpret -182922.8?

Pritchett and Summers (2014)


• One goal of the paper is to assess how well
past growth rates predict future growth rates
using a cross-section of countries
• To obtain the data on GDP growth rates, they
run many regressions and retrieve the OLS
coefficients: these populate the variables that
measure growth rates in each decade
– Growth rates allow cross-country comparisons
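The "many regressions" step can be sketched as a loop that regresses ln(GDP per capita) on year for each country and decade and keeps only the slope. Everything below is an illustration under stated assumptions: the two countries and their series are synthetic (built to grow at exactly 2% and 5% per year), and `ols_slope` is a hand-written helper rather than the authors' code.

```python
import math


def ols_slope(x, y):
    """OLS slope of y on x: sum of cross-deviations over sum of squared x-deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


years = list(range(2000, 2011))          # 11 years, as in the decade panels

# Synthetic (made-up) GDP per capita series, NOT Penn World Tables data.
synthetic_gdp = {
    "Country A": [1000 * math.exp(0.02 * (t - 2000)) for t in years],
    "Country B": [5000 * math.exp(0.05 * (t - 2000)) for t in years],
}

# One regression per country: the slope on year is the decade's growth rate.
growth = {
    name: ols_slope(years, [math.log(g) for g in series])
    for name, series in synthetic_gdp.items()
}
for name, rate in growth.items():
    print(f"{name}: estimated annual growth = {100 * rate:.1f}%")
```

Because the slopes are in log points per year, they are comparable across countries with very different income levels, unlike slopes from regressions in levels.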



[Figure: four panels, each plotting ln(Real GDP/capita) against Year for China over n = 11 years, with the OLS line.]
• 1970 to 1980: R-squared = 0.83, ln_gdp_hat = -40.102 + 0.024*year
• 1980 to 1990: R-squared = 0.93, ln_gdp_hat = -91.889 + 0.050*year
• 1990 to 2000: R-squared = 0.97, ln_gdp_hat = -113.762 + 0.061*year
• 2000 to 2010: R-squared = 1.00, ln_gdp_hat = -162.308 + 0.085*year

Repeat for each of 141 other countries to get growth rates?

[Figure: four panels, each plotting Real GDP/capita (2011US$) against Year over n = 11 years, with the OLS line.]
• Germany, 2000 to 2010: R-squared = 0.77, gdp_hat = -887715 + 462*year
• Costa Rica, 1990 to 2000: R-squared = 0.94, gdp_hat = -424851 + 217*year
• Canada, 1980 to 1990: R-squared = 0.89, gdp_hat = -1027284 + 533*year
• Egypt, 1970 to 1980: R-squared = 0.85, gdp_hat = -247075 + 127*year

Are growth levels comparable across countries? Over time?



Regression when x is a dummy
• Recall that a dummy variable (also called an
indicator variable or a fixed effect) takes only
two possible values: 0 and 1
• Codes categorical information so we can use
methods (e.g. OLS, correlation, mean, s.d.,
etc.) usually reserved for interval variables
– For example, a dummy variable named “emp” is 1
if a person is employed and 0 otherwise


Happier in the OECD?
2016, n = 141 countries, R-squared = 0.383
cantril_hat = 5.002 + 1.650*OECD

[Figure: scatter plot of Cantril Ladder (mean reply) against OECD (=1 if OECD member, =0 otherwise) with the OLS line.]

Interpret 5.002?
In 2016 the mean happiness in non-OECD countries is only 5 on the
Cantril Ladder, which is on a 10-point scale.

Interpret 1.650? (Are these observational or experimental data?)
In 2016, on average people living in OECD countries are a
whopping 1.7 units happier – on a 10-point Cantril Ladder scale –
compared to those living in non-OECD countries.
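With a dummy on the right-hand side, the OLS intercept is the mean of y in the group coded 0 and the slope is the difference between the two group means. The sketch below checks this on a tiny made-up data set; the seven ladder scores and the helper `ols` are hypothetical and are not WHR data.

```python
def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: 4 non-member countries (dummy = 0) and 3 members (dummy = 1).
dummy = [0, 0, 0, 0, 1, 1, 1]
ladder = [4.0, 5.0, 5.5, 5.5, 6.0, 7.0, 6.8]

b0, b1 = ols(dummy, ladder)

mean_group0 = sum(v for d, v in zip(dummy, ladder) if d == 0) / dummy.count(0)
mean_group1 = sum(v for d, v in zip(dummy, ladder) if d == 1) / dummy.count(1)

print(f"intercept = {b0:.3f}, mean of group 0 = {mean_group0:.3f}")
print(f"slope = {b1:.3f}, difference in means = {mean_group1 - mean_group0:.3f}")
```

This is why 5.002 can be read directly as the non-OECD mean and 1.650 as the OECD minus non-OECD gap.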



What if reverse definition of dummy?
[Figure: two scatter plots of Cantril Ladder (mean reply), 2016, n = 141 countries, each with the OLS line.]
Left panel, against OECD (=1 if OECD member, =0 otherwise): R-squared = 0.383
cantril_hat = 5.002 + 1.650*OECD
Right panel, against non-OECD (=1 if non-OECD member, =0 otherwise): R-squared = 0.383
cantril_hat = 6.652 + -1.650*non-OECD
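Reversing the dummy does not require re-running the regression: the new intercept is the old intercept plus the old slope, and the slope flips sign. A minimal sketch, with `reverse_dummy` as a hypothetical helper name:

```python
def reverse_dummy(intercept, slope):
    """Coefficients after recoding the dummy as (1 - dummy)."""
    # The old "1" group becomes the baseline, so its mean is the new intercept.
    return intercept + slope, -slope


new_intercept, new_slope = reverse_dummy(5.002, 1.650)
print(f"cantril_hat = {new_intercept:.3f} + {new_slope:.3f}*non-OECD")
```

The fitted values for every country are unchanged, which is consistent with the R-squared staying at 0.383 in both panels.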

Natural Logs and Dummies

[Figure: two scatter plots of ln(Cantril Ladder), 2016, n = 141 countries, drawn over the earlier plots of Cantril Ladder (mean reply).]
Left panel, against OECD (=1 if OECD member, =0 otherwise):
cantril_hat = 5.002 + 1.650*OECD, R-squared = 0.383
ln(cantril)_hat = 1.59 + 0.30*OECD, R-squared = 0.327
(6.652 – 5.002)/5.002 = 0.33
Right panel, against non-OECD (nOECD) (=1 if non-OECD member, =0 otherwise):
cantril_hat = 6.652 + -1.650*non-OECD, R-squared = 0.383
ln(cantril)_hat = 1.89 + -0.30*nOECD, R-squared = 0.327
(5.002 – 6.652)/6.652 = –0.25

Remember: Interpretations with logs are approximate.

Cantril Ladder           Mean    S.D.    Median   Min.    Max.
34 OECD countries        6.652   0.730   6.824    5.303   7.660
107 non-OECD countries   5.002   0.950   5.100    2.693   7.136
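The sketch below lines up three numbers for each direction of the comparison: the rule-of-thumb reading of the log coefficient (100 times 0.30), the exact back-transformation exp(b1) - 1, and the percent difference in the raw group means from the table. The variable names are illustrative. Note that the log regression compares means of ln(Cantril), so even the exact back-transformation need not equal the ratio of the raw means.

```python
import math

b1_oecd, b1_noecd = 0.30, -0.30          # coefficients from the ln(cantril) regressions
mean_oecd, mean_non_oecd = 6.652, 5.002  # raw group means from the table

rule_of_thumb = {"OECD": 100 * b1_oecd, "nOECD": 100 * b1_noecd}
exact_from_logs = {
    "OECD": 100 * (math.exp(b1_oecd) - 1),
    "nOECD": 100 * (math.exp(b1_noecd) - 1),
}
from_raw_means = {
    "OECD": 100 * (mean_oecd - mean_non_oecd) / mean_non_oecd,
    "nOECD": 100 * (mean_non_oecd - mean_oecd) / mean_oecd,
}

for key in ("OECD", "nOECD"):
    print(
        f"{key:5s} rule of thumb {rule_of_thumb[key]:+.0f}%, "
        f"exp(b1) - 1 {exact_from_logs[key]:+.1f}%, "
        f"raw means {from_raw_means[key]:+.1f}%"
    )
```

The symmetric +30% and -30% readings sit between the asymmetric percent differences of roughly +33% and -25%, which is the sense in which interpretations with logs are approximate.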



Recap
• Used natural log transformations – on x, on y,
or both – to straighten scatter plots, deal with
seeming outliers, and ease interpretation
– How these affect interpretation of OLS coefficients
– PWT data and Asiaphoria case study
• Interpretations when x variable is a dummy
– When y is a regular interval (quantitative) variable
– When y has been logged
