14.451 Introduction to Economic Growth
Daron Acemoglu
MIT Department of Economics
January 2006
Contents

I Introduction
1 Stylized Facts of Economic Growth and Development
  1.1 A Quick Look at the Facts
  1.2 Interpretation
  1.3 The Agenda
2 The Solow Growth Model
  2.1 The Basic Model in Discrete Time
    2.1.1 The Production Structure
    2.1.2 Endowments
    2.1.3 Fundamental Law of Motion of the Solow Model
    2.1.4 Definition of Equilibrium
    2.1.5 Equilibrium Without Population Growth and Technological Progress
    2.1.6 Transitional Dynamics in the Solow Model
  2.2 The Solow Model in Continuous Time
    2.2.1 From Difference to Differential Equations
    2.2.2 The Fundamental Equation of the Solow Model in Continuous Time
    2.2.3 A First Look at Sustained Growth
  2.3 Solow Model with Technological Progress
    2.3.1 Balanced Growth
    2.3.2 Neutral Technological Progress
    2.3.3 The Steady-State Technological Progress Theorem
    2.3.4 The Solow Growth Model with Technological Progress: Continuous Time
3 The Solow Model and the Data
  3.1 Growth Accounting
  3.2 Solow Model and Cross-Country Income Differences
    3.2.1 Solow Model with Human Capital
    3.2.2 Problems with the Mankiw, Romer and Weil Approach
    3.2.3 The Macro Mincer Approach (Bils-Klenow-Rodriguez-Hall-Jones)
  3.3 An Alternative Approach to Estimating Productivity Differences (Trefler)
4 Fundamental Determinants of Differences in Income
  4.1 From Proximate to Fundamental Causes
  4.2 Hypotheses
  4.3 Europe's Expansion and Colonial Origins of Institutions

II Neoclassical Growth
5 Towards Neoclassical Growth
  5.1 Representative Consumer
  5.2 Problem Formulation
  5.3 Welfare Theorems
  5.4 Optimal Growth in Discrete Time
  5.5 Optimal Growth in Continuous Time
6 Dynamic Programming and Optimal Growth
  6.1 Brief Review of Dynamic Programming
  6.2 Digression: Technical Details
    6.2.1 Contraction Mappings
    6.2.2 Application of Contraction Mappings to Dynamic Programming
  6.3 Back to the Fundamentals of Dynamic Programming
    6.3.1 Basic Equations
    6.3.2 Dynamic Programming Versus the Sequence Problem
  6.4 Optimal Growth in Discrete Time
  6.5 Competitive Equilibrium Growth
7 Brief Review of Optimal Control
  7.1 Finite-Horizon Optimal Control
    7.1.1 The Fundamental Problem
    7.1.2 Variational Arguments
    7.1.3 Simplified Maximum Principle
    7.1.4 Generalizations
    7.1.5 Limitations
  7.2 Infinite-Horizon Optimal Control
    7.2.1 The Basic Problem: Necessary and Sufficient Conditions
    7.2.2 Lack of Transversality Conditions
    7.2.3 Discounted Infinite-Horizon Optimal Control
8 The Neoclassical Growth Model
  8.1 Preferences, Technology and Demographics
  8.2 Characterization of Equilibrium
    8.2.1 Definition of Equilibrium
    8.2.2 The Consumer Problem
    8.2.3 Equilibrium Prices
  8.3 Optimal Growth
  8.4 Steady-State Equilibrium
  8.5 Transitional Dynamics
  8.6 Technological Change and the Canonical Neoclassical Model
  8.7 The Role of Policy
  8.8 Quantitative Evaluations
    8.8.1 Policy Differences
    8.8.2 Extensions
  8.9 Variants of the Neoclassical Model
9 Growth with Overlapping Generations
  9.1 Problems of Infinity
  9.2 Overlapping Generations and Overaccumulation
    9.2.1 Demographics, Preferences and Technology
    9.2.2 Consumption Decisions
    9.2.3 Equilibrium
    9.2.4 More Specific Utility Functions
    9.2.5 Pareto Optimality
  9.3 Role of Social Security in Capital Accumulation
    9.3.1 Fully Funded Social Security
    9.3.2 Unfunded Social Security
10 Recitation Material: Stochastic Growth
  10.1 The Brock-Mirman Model
  10.2 Application: Risk, Diversification and Growth
    10.2.1 The Environment
    10.2.2 Equilibrium
    10.2.3 Dynamics
    10.2.4 Efficiency
    10.2.5 Implications
    10.2.6 Inefficiency with Alternative Market Structures

III Endogenous Growth
11 First-Generation Models of Endogenous Growth
  11.1 AK Model Revisited
    11.1.1 Demographics, Preferences and Technology
    11.1.2 Equilibrium
    11.1.3 Transitional Dynamics
    11.1.4 The Role of Policy
  11.2 The Extended AK Model
  11.3 Growth with Externalities
    11.3.1 Preferences and Technology
    11.3.2 Equilibrium
    11.3.3 Pareto Optimal Allocations
12 Multiple Equilibria and the Process of Development
  12.1 Multiple Equilibria From Aggregate Demand Externalities
    12.1.1 Preferences and Technology
    12.1.2 Equilibrium
  12.2 Human Capital Accumulation with Imperfect Capital Markets
    12.2.1 A Simple Case With No Borrowing
    12.2.2 The Galor and Zeira Model
  12.3 Learning-by-Doing, Structural Change and Non-Balanced Growth
    12.3.2 Equilibrium
13 Interdependence and Growth in the Open Economy
  13.1 Human Capital and Technology (Nelson-Phelps)
  13.2 Trade and Technology Diffusion
    13.2.1 The Basic Krugman Model
    13.2.2 Understanding the Effects of Trade
  13.3 Trade, Specialization and the World Income Distribution
    13.3.1 The Model
    13.3.2 Equilibrium
    13.3.3 Implications
  13.4 Growth with Factor Price Equalization

IV Endogenous Technological Change
14 Expanding Variety Models
  14.1 The Lab-Equipment Model of Growth with Product Varieties
    14.1.2 Digression on Continuous Time Value Functions
    14.1.3 Characterization of Equilibrium
    14.1.4 Definition of Equilibrium
    14.1.5 Steady State
    14.1.7 Pareto Optimal Allocations
    14.1.8 Policy in the Endogenous Technology Model
  14.2 Growth with Knowledge Spillovers
    14.2.1 The Role of Competition Policy
  14.3 Growth without Scale Effects
15 Models of Quality Competition
  15.1 Baseline Model
  15.2 Pareto Optimality
16 Directed Technical Change
  16.1 Basics and Definitions
    16.1.1 Definitions
    16.1.2 Basic Model
    16.1.3 Implications
  16.2 Equilibrium Technology Bias: Some More General Results
  16.3 Endogenous Labor-Augmenting Technological Change
    16.3.2 Consumer and Firm Decisions
    16.3.3 Asymptotic and Balanced Growth Paths
    16.3.4 The Balanced Growth Path
    16.3.6 Policy Implications
17 Recitation Material: Appropriate Technology
  17.1 Differences in Capital-Labor Ratios (Atkinson-Stiglitz)
  17.2 The Role of Human Capital (Acemoglu-Zilibotti)
    17.2.1 A Model
    17.2.2 Implications
    17.2.3 Calibration
18 Epilogue: Political Economy of Growth
  18.1 Thinking of Institutions and Growth
    18.1.1 The Impact of Institutions
    18.1.2 Modeling Institutional Differences
    18.1.3 Institutions in Action
  18.2 A Simple Model of Non-Growth Enhancing Institutions
    18.2.1 Baseline Model
    18.2.2 Economic Equilibrium
    18.2.3 Inefficient Policies
    18.2.4 Revenue Extraction
    18.2.5 Factor Price Manipulation
    18.2.6 Revenue Extraction and Factor Price Manipulation Combined
    18.2.7 Political Consolidation
    18.2.8 Subgame Perfect Versus Markov Perfect Equilibria
    18.2.9 Lack of Commitment: Holdup
    18.2.10 Technology Adoption and Holdup
    18.2.11 Inefficient Economic Institutions
  18.3 Modeling Political Institutions
    18.3.1 Dictatorship of the Middle Class
    18.3.2 Democracy
    18.3.3 Inefficiency of Political Institutions and Inappropriate Institutions
    18.3.4 Institutional Change and Persistence

Part I
Introduction

We start with a quick look at the stylized facts of economic growth and the most basic
model of growth, the Solow growth model. The purpose is to both prepare us for the analysis
of more modern models of economic growth with forward-looking behavior and explicit capital accumulation and technological progress, and also give us a way of mapping the simplest
model to the data. I will also discuss differences between proximate and fundamental causes
of economic growth and development.
Chapter 1
Stylized Facts of Economic Growth and Development
1.1 A Quick Look at the Facts
There are very large differences in income per capita or output per worker across countries
today. Countries at the top of the world income distribution are thirty times as rich as
countries at the bottom in PPP-adjusted dollars. For example, in 2000, GDP per capita
in the United States was $32,500 (valued at 1995 dollar prices). In contrast, income per capita
is much lower in many other countries: $9,000 in Mexico, $4,000 in China, $2,500 in India,
$1,000 in Nigeria, and much lower still in some other sub-Saharan African countries such
as Chad, Ethiopia and Mali (all figures adjusted for purchasing power parity). The gap is larger
when there is no PPP adjustment. The next figure shows a cross-sectional look at these
income-level differences in the year 2000.
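The income figures quoted above can be turned into ratios directly. The following is a minimal Python sketch that uses only the five GDP per capita numbers mentioned in the text; the dictionary and variable names are illustrative and not part of the original notes.

```python
# GDP per capita in 2000, PPP adjusted, at 1995 prices, as quoted in the text.
gdp_per_capita = {
    "United States": 32500,
    "Mexico": 9000,
    "China": 4000,
    "India": 2500,
    "Nigeria": 1000,
}

us_income = gdp_per_capita["United States"]

# Ratio of U.S. income per capita to each country's income per capita.
ratio_to_us = {country: us_income / income
               for country, income in gdp_per_capita.items()}

for country, ratio in ratio_to_us.items():
    print(f"{country}: U.S. income per capita is {ratio:.1f} times as large")
```

On these numbers the United States is 32.5 times as rich as Nigeria, which is the order of magnitude behind the "thirty times" statement.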
Should we care about cross-country income dierences? The answer is a big yes. High
income levels reflect high standards of living. It is true that together with economic growth,
pollution increases and individual aspirations may also increase so that the same bundle
of consumption may no longer make an individual as happy. But at the end of the day,
when one compares an advanced, rich country with a less-developed one, there are striking
differences in the quality of life, standards of living and health. In fact, it is even difficult for
us to imagine the burden of poverty at the levels experienced by countries in sub-Saharan
Africa. There is little doubt that the consumption level, living standards and health level of
richer countries are appreciably higher than those of countries with lower income per capita. These gaps
represent big welfare differences.
Understanding how some countries can be so rich while some others are so poor is one
of the most important, perhaps the most important, challenges facing social science.

How could a country be 30 times or so richer than another? The answer lies in differences
in growth rates. Take two countries, A and B, with the same level of income to start with.
Imagine that country A has 0% growth per capita, so its income per capita remains constant,
while country B grows at 2% per capita. In 200 years' time country B will be more than 52
times richer than country A. Therefore, the United States is considerably richer than Nigeria
because it has grown steadily over an extended period of time, while Nigeria has not. In
fact, even in the historically brief postwar era, we see tremendous differences in growth rates
across countries. This is shown in the next picture for the postwar era:
This picture shows how East Asian tigers have grown at much higher rates than the rest
of the world over the past 40 years, while a number of countries in sub-Saharan Africa and
Central America have experienced negative growth.
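The two-country example with 0% and 2% growth is a compounding calculation, and it can be checked in a few lines. The sketch below assumes annual compounding from equal starting incomes; the function name `income_ratio` is hypothetical.

```python
def income_ratio(growth_b, growth_a, years):
    """Income of B relative to A after `years`, starting from equal incomes
    and compounding each country's growth rate once a year."""
    return ((1 + growth_b) / (1 + growth_a)) ** years

# Country A does not grow, country B grows at 2% per year, for 200 years.
ratio = income_ratio(growth_b=0.02, growth_a=0.0, years=200)
print(f"After 200 years B is {ratio:.1f} times as rich as A")
```

The result is a little above 52, matching the statement in the text, and shows how a seemingly modest growth gap sustained over two centuries produces a very large level gap.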

However, the substantial growth differences in the postwar era do not mean that these
growth differences are responsible for the current differences in income levels. For one thing,
it may be precisely the poor countries that are growing faster. For instance, Hong Kong,
South Korea, Singapore and Taiwan were substantially poorer than the United States and
Western Europe in 1960. For another, these growth differences may be small relative to those
necessary to cause large per capita income level differences. The next question is therefore
when this growth gap opened up. The answer is that much of the divergence happened
during the 19th century and early 20th century. There are striking growth differences during
the postwar era, but the world income distribution has been more or less stable, with a
slight tendency towards becoming more unequal.
For example, despite some big growth successes and disasters, countries that were rich
in 1960 are very likely to be rich today. A regression of log income per worker in 1990
on log income per worker in 1960 gives the following relationship:
\[
\ln y_{1990} = \underset{(0.48)}{0.56} + \underset{(0.06)}{1.00} \, \ln y_{1960} \qquad (1.1)
\]
with $R^2 = 0.78$. The next figure shows this relationship diagrammatically.
If we look at output or income per worker, the overall shape of the world income distribution has been relatively stable in the postwar period. There is certainly no narrowing of
income gaps. Instead, there is a small but notable increase in the dispersion of incomes. This
is shown in the next figure which depicts the standard deviation of log income per capita in
the world and the ratio of the income of the five richest to the five poorest countries in the
world.
Moreover, there is also a pattern of stratification, whereby some of the middle-income countries of the 1960s appear to have joined either the low-income or the high-income club. This
is shown in the next figure:
The above statements refer to the unconditional distribution, that is, they refer to
whether the income gap between two countries increases or decreases irrespective of these
countries' characteristics. Alternatively, we can look at the conditional distribution (e.g.,
Barro and Sala-i-Martin, 1992). Here the picture is one of conditional convergence: in
the postwar period, the income gap between countries that share the same characteristics
typically closes over time (though it does so quite slowly).
How do we capture conditional convergence? Consider a typical Barro growth regression:
\[
g_{t,t-1} = \beta \ln y_{t-1} + X_{t-1}' \alpha + \varepsilon_t \qquad (1.2)
\]
where $g_{t,t-1}$ is the annual growth rate between dates $t-1$ and $t$, $y_{t-1}$ is per capita income
at date $t-1$ and $X_{t-1}$ is a set of variables that the regression is conditioning on (in theory, the
determinants of steady state income and/or growth). When no covariates are included, this
regression leads to a positive or zero estimate of $\beta$, reiterating the absence of unconditional
convergence as shown in the estimation of equation (1.1) above. In fact, without covariates,
this is really identical to the regression equation (1.1), since
\[
g_{t,t-1} \approx \ln y_t - \ln y_{t-1},
\]
so equation (1.2) can be written as
\[
\ln y_t \approx (1 + \beta) \ln y_{t-1} + \varepsilon_t,
\]
which is identical to (1.1) above. The estimate of $(1 + \beta)$ in (1.1) equal to 1 implies that
$\beta \approx 0$, thus no unconditional convergence.
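The equivalence between the levels regression (1.1) and the growth regression (1.2) without covariates can be illustrated numerically. The following is a minimal sketch on synthetic data: the sample size, the intercept of 0.5, the noise level and the helper `ols_intercept_slope` are all assumptions made for illustration, and none of the numbers are estimates from actual country data. The coefficient on lagged log income in (1.2) is called `beta_hat` here.

```python
import random

def ols_intercept_slope(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

random.seed(0)
# Hypothetical log incomes with no convergence built in (true slope of 1).
log_y_prev = [random.uniform(6.0, 10.0) for _ in range(100)]
log_y_now = [0.5 + 1.0 * v + random.gauss(0.0, 0.4) for v in log_y_prev]

# Levels regression, as in (1.1): log income today on log income before.
_, slope_levels = ols_intercept_slope(log_y_prev, log_y_now)

# Growth regression, as in (1.2) with no covariates: growth on lagged log income.
growth = [now - prev for now, prev in zip(log_y_now, log_y_prev)]
_, beta_hat = ols_intercept_slope(log_y_prev, growth)

print(f"levels slope = {slope_levels:.4f}, beta_hat = {beta_hat:.4f}")
```

The levels slope equals one plus `beta_hat` up to floating-point error, which is the algebraic point made in the text: a slope of 1 in (1.1) and a coefficient of 0 in (1.2) are the same finding.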
But when $X_{t-1}$ includes some human capital-related variables such as years of schooling
or life expectancy, $\beta$ is estimated to be approximately -0.02, indicating that the income
gap between countries that have the same human capital endowment has been typically
narrowing over the postwar period, roughly at the rate of 2 percent a year.
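To get a feel for what a convergence coefficient of about -0.02 means, one can compute how long it takes for an income gap to halve. The sketch below assumes two countries with identical covariates and ignores the error term, so that the log income gap is multiplied by one plus the coefficient each year; the function names are hypothetical.

```python
import math

def gap_half_life(beta):
    """Years needed for a log income gap to halve when the gap is
    multiplied by (1 + beta) each year, with -1 < beta < 0."""
    return math.log(0.5) / math.log(1 + beta)

def remaining_gap(beta, years):
    """Fraction of the initial log income gap still present after `years`."""
    return (1 + beta) ** years

beta = -0.02  # conditional convergence estimate quoted in the text
print(f"Half-life of the gap: {gap_half_life(beta):.1f} years")
print(f"Share of the gap left after 30 years: {remaining_gap(beta, 30):.2f}")
```

A half-life of roughly 34 years is why the text describes conditional convergence as quite slow: more than half of the initial gap is still present after three decades.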
If we look at a longer period, for example, from 1870 to today, the pattern is quite
different, however. Here, there is divergence. The income gap between countries was much
smaller during the 19th century than today.
Pritchett illustrates this point using data from Angus Maddison and deriving an absolute
lower bound on country incomes due to subsistence. He argues that $250 in terms of 1985
purchasing power parity is a practical lower bound below which the death rate would be
extremely high. This suggests that in 1870, the U.S. was at most eight times as rich as the
poorest country in the world, while it is over 30 times as rich today. Therefore there has
been significant divergence over the past 130 years. This is illustrated in the next figure:
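The bounds in Pritchett's argument imply a minimum average growth gap between the United States and the poorest country. The following sketch is a back-of-the-envelope calculation under the assumption of continuous compounding, using only the ratios of 8 and 30 and the 130-year span quoted above; the variable names are illustrative.

```python
import math

ratio_1870 = 8.0    # upper bound on U.S. income relative to the poorest country in 1870
ratio_today = 30.0  # lower bound on the same ratio today
years = 130

# Average annual growth differential (continuously compounded) needed to
# widen the income ratio from 8 to 30 over 130 years.
growth_gap = math.log(ratio_today / ratio_1870) / years
print(f"Implied average growth gap: {100 * growth_gap:.2f}% per year")
```

Because the 1870 ratio is an upper bound and today's ratio is a lower bound, the resulting figure of about one percentage point a year is a lower bound on the average differential.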
If we go even further back, the pattern may be one of reversal: Acemoglu, Johnson and
Robinson (2002) show that in 1500, among the societies that were later to be colonized by
European powers, those that were relatively prosperous are today relatively poor.
How do we measure/proxy economic prosperity in 1500? It turns out that urbanization
rates and population density are good proxies for prosperity during preindustrial periods
(and urbanization rates are also good proxies even today).
A variety of evidence shows that in 1500 the Mughals, Aztecs and Incas were much
more urbanized and densely settled than the civilizations in North America, New Zealand
and Australia. Today the U.S., Canada, New Zealand and Australia are orders of magnitude richer than the countries now occupying the territories of the Mughal, Aztec and Inca
Empires, such as India, Ecuador or Peru. Therefore, among this set of countries there was
a pattern of reversal, whereby those that were relatively prosperous in 1500 have become
relatively poor today. The reversal is not confined to this set of countries, and is more widespread among the former European colonies. This is shown in the next two figures, the first
using urbanization, the second population density as proxies for prosperity in 1500:
[Figure: Log GDP per capita, PPP, 1995 (vertical axis) against urbanization in 1500 (horizontal axis), with former European colonies labeled by country code.]
[Figure: the same relationship with log population density in 1500 on the horizontal axis, with former European colonies labeled by country code.]
When did this reversal take place? Consistent with the discussion from Pritchett's paper
above, the evidence suggests that the reversal among the former European colonies took
place during the 19th century as well. Up to the late 18th century, previously prosperous
places continued to be somewhat more prosperous. It was the age of industrialization, the
19th century, when previously less-prosperous former colonies became rapidly urbanized,
industrialized and increased their GDP per capita. The next two pictures give a sense of
these processes:
[Figure: Timing of the Reversal. Urbanization in ex-colonies with low and high urbanization in 1500 (averages weighted within each group by population in 1500), 800 to 1920.]
[Figure: Reversal, Industrialization and Divergence. Industrial production per capita, UK in 1900 = 100 (from Bairoch), 1750 to 1953, for the US, Australia, Canada, New Zealand, Brazil, Mexico and India.]
1.2 Interpretation
This discussion points to the following set of facts and questions that are central to an
investigation of the determinants of long-run differences in income levels and growth:
1. The major pattern to be explained is why there are such large differences in income per
capita and worker productivity across countries. This immediately takes us to questions
of why some countries grow (or have grown) while other countries have failed to grow
and stagnated.
2. The relative stability of the postwar income distribution has suggested to many economists that we should look for differences across countries leading to very large permanent differences in income, but not necessarily large permanent differences in
growth rates in the recent decades. This is based on the following reasoning: with substantially different long-run growth rates (as in models of endogenous growth, where
countries that invest at different rates grow at different rates), we should expect significant divergence. We saw above that despite some widening between the top and the
bottom, the cross-country distribution of income across the world is relatively stable.
So this reasoning might have some merit. Furthermore, economists with this view
argue that the finding of conditional convergence suggests the presence of transitional
dynamics taking countries towards their steady state values as in the basic Solow
and neoclassical models.
3. Nevertheless, we have seen that there is still some notable (though perhaps not so
large) divergence in the world income distribution. Clearly, countries have not settled
into a stationary world income distribution. It is important to understand why even
in this age of free flow of technology some countries are growing faster than others.
Equally puzzling is how the very large income differences we observe today can persist
in this age of free flow of technology, trade and financial integration.
4. Moreover, the divergence from the 19th century to today suggests that we might want
to look for a set of theories where the large differences in income per capita, at least to
some extent, reflect technological or institutional changes that took place during the
19th and early 20th centuries. For example, some countries may have taken advantage
of industrialization opportunities, while other societies have failed to do so, or may
have only started adopting technologies very late. We therefore need theories which
can shed light on why certain societies may fail to take advantage of better technologies.
5. The reversal (among the former European colonies) suggests that theories that emphasize differences in (economic and perhaps political) institutions or social organization
or more generally man-made factors as key determinants of economic performance
may be more promising than theories emphasizing fixed environmental factors such as
geography or climate. (With such environmental factors as the main determinants of
income differences, we should expect countries that were relatively rich 200 or 500 years
ago to be also relatively rich today, i.e., persistence, not a reversal.) More ambitiously,
we may want to investigate whether and why certain characteristics that make countries richer at some point contribute to their relative poverty during other episodes.
Alternatively, we may want to see what type of shocks could cause a reversal in the
relative incomes of countries over long periods.
1.3 The Agenda
In the rest of the class, we will look at models that can help us understand the mechanics
of economic growth. This means understanding a variety of models that underpin the way
economists think about the process of capital accumulation, technological progress, and
productivity growth. Only by understanding these mechanics can we have a framework for
thinking about the causes of why some countries are growing and some others are not, and
why some countries are rich and some others are not.
Therefore, the approach will be two-pronged: on the one hand, we want to understand
the mathematical structure of these models as well as possible; on the other, we want to
understand what these models and others have to say about which key parameters or key
economic processes are different across countries and why.
Chapter 2
The Solow Growth Model
2.1 The Basic Model in Discrete Time
2.1.1 The Production Structure
We start with the simplest growth model, sometimes referred to as the Solow-Swan model
after two economists who developed versions of it, or simply as the Solow growth model after
our own Bob Solow, who was awarded the Nobel prize for his contributions to growth theory.
This is a closed economy, with a unique final good. The economy is in discrete time
running to infinite horizon, so that time is indexed by t = 0, 1, 2, .... Time periods here can
correspond to days, weeks, or years. So far we do not need to take a position on this.
The economy is inhabited by a large number of households, and for now we are going
to make relatively few assumptions on the households because in this baseline model, they
will not be optimizing. To fix ideas, you may want to assume that all households are
identical, so that the economy admits a representative consumer. We return to what this
assumption of the representative consumer involves below. As an aside, you should know
from basic general equilibrium theory that most economies do not admit a representative
consumer; in fact, the celebrated Debreu-Mantel-Sonnenschein theorem states that we can
say relatively little about the preferences of a consumer obtained by aggregating a number of
well-behaved neoclassical consumers. But much of macroeconomics (unfortunately) ignores
this basic theorem, and works with representative consumers. In many situations this can
be justified on the basis of parsimony. Here I will adopt the same defense and for much of
this course I will limit myself to models with representative consumers. Heterogeneity of
preferences, abilities and income is in fact quite important for understanding the process of
economic growth, but many of these topics are beyond the scope of this class.
The key assumption of the Solow model will be that each household saves an exogenous
fraction s of their income. Much of the neoclassical growth theory is about understanding
exactly how much individuals save and how capital accumulates. In the basic model this is
taken as exogenous.
The other key agents in the economy are firms. Let us assume that the economy also
admits an aggregate production function for the unique final good
Y (t) = F [K (t) , L (t) , A (t)]
(2.1)
where Y (t) is the total amount of production of the final good, K (t) is the capital stock,
L (t) is total employment and A (t) is technology. The capital stock here denotes the quantity
of machines used in production. Both the capital stock and technology are taken to be
single indices, and at some level, they are treated as black boxes; we will later discuss how
such models can be extended to think of multiple types of technologies and capital goods.
For now, the important assumption is that technology is free: it is publicly available as a
non-excludable, non-rival good. Thus the firm does not have to pay for it.
As an aside, you might want to note that some authors use $x_t$ or $K_t$ when working with
discrete time and reserve the notation x (t) or K (t) for continuous time. Since I will go back
and forth between continuous time and discrete time, I use the latter notation all throughout,
except when discussing dynamic programming where the subscripts are the usual notation.
Throughout, I will drop time dependence when this causes no confusion, but include it
when there is any chance of such confusion.
The production function $F : \mathbb{R}^3 \to \mathbb{R}$ is, for simplicity, assumed to be twice continuously
differentiable and increasing in all of its arguments, and to be strictly concave in $K$ and $L$.
In particular, we have:
Assumption 1 (Continuity, Differentiability, Positive Marginal Products, Concavity and Constant Returns to Scale) $F$ is twice continuously differentiable in $K$ and $L$, and satisfies
$$F_K(K, L, A) \equiv \frac{\partial F(K, L, A)}{\partial K} > 0, \qquad F_L(K, L, A) \equiv \frac{\partial F(K, L, A)}{\partial L} > 0,$$
$$F_{KK}(K, L, A) \equiv \frac{\partial^2 F(K, L, A)}{\partial K^2} < 0, \qquad F_{LL}(K, L, A) \equiv \frac{\partial^2 F(K, L, A)}{\partial L^2} < 0.$$
Moreover, $F$ exhibits constant returns to scale in $K$ and $L$.

All of the components of Assumption 1 are important. It specifies that marginal products
are positive (thus ruling out some production functions), but more importantly that there
are diminishing returns both to capital and labor, i.e., $F_{KK} < 0$ and $F_{LL} < 0$. We will see
below that the degree of diminishing returns to capital will play a very important role in
many of the results of the basic growth model.
The other important assumption is that of constant returns to scale. Recall that F
exhibits constant returns to scale in K and L if it is linearly homogeneous (homogeneous of
degree 1) in these two variables. More specifically:
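Definition 1 Let $z \in \mathbb{R}^K$ for some $K \geq 1$. The function $g(x, y, z)$ is homogeneous of degree
$m$ in $x \in \mathbb{R}$ and $y \in \mathbb{R}$ if and only if
$$g(\lambda x, \lambda y, z) = \lambda^m g(x, y, z) \quad \text{for all } \lambda \in \mathbb{R}_+ \text{ and } z \in \mathbb{R}^K.$$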
Linearly homogeneous (constant returns to scale) production functions are particularly
useful because of the following theorem:
Theorem 1 (Euler's theorem) Suppose that $g : \mathbb{R}^{K+2} \to \mathbb{R}$ is continuously differentiable
in $x \in \mathbb{R}$ and $y \in \mathbb{R}$, with partial derivatives denoted by $g_x$ and $g_y$, and is homogeneous of
degree $m$ in $x$ and $y$. Then
$$m g(x, y, z) = g_x(x, y, z) x + g_y(x, y, z) y \quad \text{for all } x \in \mathbb{R},\ y \in \mathbb{R} \text{ and } z \in \mathbb{R}^K.$$
Moreover, $g_x(x, y, z)$ and $g_y(x, y, z)$ are themselves homogeneous of degree $m - 1$ in $x$ and
$y$.
Proof. We have that $g$ is continuously differentiable and
$$g(\lambda x, \lambda y, z) = \lambda^m g(x, y, z). \tag{2.2}$$
Differentiate both sides of equation (2.2) with respect to $\lambda$, which gives
$$m \lambda^{m-1} g(x, y, z) = g_x(\lambda x, \lambda y, z) x + g_y(\lambda x, \lambda y, z) y$$
for any $\lambda$. Setting $\lambda = 1$ yields the first result. To obtain the second result, differentiate
both sides of equation (2.2) with respect to $x$:
$$\lambda g_x(\lambda x, \lambda y, z) = \lambda^m g_x(x, y, z).$$
Dividing both sides by $\lambda$ establishes the desired result.
Throughout this course we are going to assume that all factor markets are competitive.
Until we come to models of endogenous technological change, we will further assume that
product markets are also competitive, so ours will be a prototypical competitive general
equilibrium model.
Moreover, as noted above, we will work with aggregate production functions as a representation of the underlying production structure of the economy. This would be the case, for
example, when the economy consists of a large number of firms all having access to the same
constant returns to scale production function, for example F above. In that case, there is
no difference between assuming an aggregate production function or working with a large
number of firms competing for factors of production. Notice, however, that the assumption of an aggregate production function could be quite restrictive. In particular it rules
out heterogeneity of productivity among firms, and it also creates problems when there are
non-constant returns to scale (can you see what would go wrong with decreasing returns to
scale?).
2.1.2 Endowments
Let us imagine that all factors of production are owned by households. In particular, households own all of the labor, which they supply inelastically. If there is population growth, this
can be thought of as existing households becoming larger, or new households being born.
For our purposes here this does not matter. The households also own the capital stock of
the economy, and we take their initial holdings of capital, K (0), as given (as part of the description of the environment), and this will determine the initial condition of the dynamical
system we will be analyzing. For now how this initial capital stock is distributed among the
households is not important.
The more important point is that the households will rent their capital to firms. Let the
rental price of capital be denoted by R (t) and the rental price of labor by w (t). Then in
competitive markets a representative firm is solving the problem of profit maximizing.
Another important set of issues involves how to think of capital. There are many
different ways of conceptualizing capital, and some of them are beyond the scope of this
course. Loosely speaking, we want to think of capital as corresponding to machines. But
for now let us make the rather heroic assumption that capital is essentially the same as the
final good. So the economy consists of corn, and it can use some amount of this corn
as input into producing further corn. Then K (0) is the amount of corn that individual
households have at the beginning of period t = 0, which they can eat or rent to firms to
enable them to produce further corn. [...These types of models are sometimes referred to
as putty-putty, since capital is totally malleable both before and after it is designated as
capital. Alternatives include putty-clay models where corn can be used as capital, but
once it is in place, it becomes fixed and it cannot be turned back into consumption goods,
and certain features of it, for example, at which capital-labor ratio it can be used, cannot
be changed...]
Given this structure, there is a natural choice of numeraire in this economy which is to
normalize the price of the final good in each period to 1. Recall that we always have to
choose a numeraire, but here we are making a normalization in each period. But this is
without loss of any generality, because the interest-rate between periods will play the role of
relative prices.
This discussion should already alert you to a central fact: you should think of all of the
models we are going to be talking about as general equilibrium economies, where different
commodities correspond to the same good at different dates. Recall from basic general
equilibrium theory that the same good at different dates (or in different states or in different
localities) is a different commodity. Therefore, in almost all of the models that we will
study in this course, there will be an infinite number of commodities (because time runs to
infinity). This raises a number of special issues in the theory of general equilibrium which
we will touch on as we go along.
Now returning to our treatment of the basic model, the next important assumption is
that capital depreciates. We assume that this depreciation takes an exponential form. This
means that capital depreciates (exponentially) at the rate $\delta$, so that out of 1 unit of capital
this period, only $1 - \delta$ is left for next period. This depreciation in general stands for the
wear and tear of the machinery, as well as, in more realistic models, the replacement of old
machines by new machines. For now it is treated as a black box.
The importance of this for a household is that, combined with the normalization of the
price of the final goods to 1, it implies that the rate of return faced by the household will be
$$r(t) = R(t) + 1 - \delta.$$
Recall that every unit of capital can be eaten now or rented to firms. In the latter case, the
household will receive $R(t)$ units of good as the rental price, but will get back only $1 - \delta$
units of the capital, since the rest has depreciated. This implies that the individual has given
up one unit of commodity dated $t - 1$ for $r(t)$ units of commodity dated $t$.
Now let us consider the problem of a representative firm. This firm will maximize profits,
which implies
$$\max_{L(t), K(t)} F[K(t), L(t), A(t)] - w(t) L(t) - R(t) K(t). \tag{2.3}$$
A couple of features are worth noting:
1. I set up the problem in terms of aggregate variables. This is without loss of any generality given the representative firm (or the existence of an aggregate production function).
2. There is nothing multiplying the F term, since the price of the final good has been
normalized to 1.
3. This way of writing the problem already imposes competitive factor markets, since the
firm is taking the prices of labor and capital, w (t) and R (t) , as given.
4. This is a concave problem, since F is concave (though not necessarily strictly so).
The first-order necessary conditions of the firm's problem (combined with differentiability
of $F$) imply that the competitive factor returns are equal to their marginal products:
$$w(t) = F_L[K(t), L(t), A(t)] \tag{2.4}$$
and
$$R(t) = F_K[K(t), L(t), A(t)]. \tag{2.5}$$
An immediate corollary of Theorem 1 combined with competitive factor markets is:

Proposition 1 In equilibrium, firms make no profits, and in particular,
Y (t) = w (t) L (t) + R (t) K (t) .
Proof. This follows immediately from Theorem 1 for the case of m = 1, i.e., constant
returns to scale.
This result is convenient, since it implies that firms make no profits, so, in contrast to
the basic general equilibrium theory, the ownership of firms does not need to be specified.
All we need to know is that firms are profit-maximizing entities.
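As a quick numerical check of Proposition 1, the sketch below assumes a Cobb-Douglas production function $F(K, L, A) = A K^{\alpha} L^{1-\alpha}$. This functional form, the parameter values and the function names are illustrative assumptions, not something the text imposes. It pays each factor its marginal product, as in (2.4) and (2.5), and computes what is left over.

```python
# Illustrative assumption: Cobb-Douglas technology with constant returns in K and L.

def output(K, L, A=1.0, alpha=0.3):
    """Aggregate output F(K, L, A) = A * K^alpha * L^(1 - alpha)."""
    return A * K**alpha * L**(1 - alpha)

def rental_rate(K, L, A=1.0, alpha=0.3):
    """Marginal product of capital F_K, the competitive rental rate in (2.5)."""
    return alpha * A * K**(alpha - 1) * L**(1 - alpha)

def wage(K, L, A=1.0, alpha=0.3):
    """Marginal product of labor F_L, the competitive wage in (2.4)."""
    return (1 - alpha) * A * K**alpha * L**(-alpha)

K, L = 4.0, 2.0                      # hypothetical factor supplies
Y = output(K, L)
factor_payments = wage(K, L) * L + rental_rate(K, L) * K
profits = Y - factor_payments        # zero up to rounding, by Euler's theorem with m = 1
print(f"Y = {Y:.6f}, wL + RK = {factor_payments:.6f}, profits = {profits:.2e}")
```

With decreasing returns (for example, exponents summing to less than one) the same calculation would leave strictly positive profits, which is why the ownership of firms would then need to be specified.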
In addition to these standard assumptions on the production function, in growth theory
we often impose the following additional boundary conditions, referred to as Inada conditions.
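Assumption 2 (Inada conditions) $F$ satisfies the Inada conditions
$$\lim_{K \to 0} F_K(K, L, A) = \infty \text{ and } \lim_{K \to \infty} F_K(K, L, A) = 0 \text{ for all } L > 0 \text{ and all } A,$$
$$\lim_{L \to 0} F_L(K, L, A) = \infty \text{ and } \lim_{L \to \infty} F_L(K, L, A) = 0 \text{ for all } K > 0 \text{ and all } A.$$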
2.1.3 Fundamental Law of Motion of the Solow Model
Finally, we can write the law of motion of the capital stock of the economy. Recall that K
depreciates exponentially at the rate $\delta$, so that the law of motion of the capital stock is given
by
$$K(t+1) = (1 - \delta) K(t) + I(t), \tag{2.6}$$
where I (t) is investment at time t. From national income accounting for a closed economy,
we have
$$Y(t) = C(t) + I(t) + G(t), \tag{2.7}$$
where $C(t)$ is consumption and $G(t)$ is government spending. For now, we take $G(t) \equiv 0$,
so that national income is divided between consumption and investment. Therefore, using
(2.1), (2.6) and (2.7), feasible dynamic allocations in this economy would have to satisfy
$$K(t+1) \leq F[K(t), L(t), A(t)] + (1 - \delta) K(t) - C(t).$$
The question is to determine the equilibrium dynamic allocation among the set of feasible
dynamic allocations. Here the behavioral rule of the constant savings rate simplifies the
structure of equilibrium considerably. It is important to note that the constant savings rate is a
behavioral rule; it is not derived from a well-defined utility function. This means that any
welfare comparisons based on the Solow model have to be taken with a grain of salt. We
have no idea what the utility function of the individuals is.
First note that given $G(t) \equiv 0$ (and the closed economy assumption), aggregate investment is equal to savings,
$$S(t) = I(t) = Y(t) - C(t).$$
Now recall that individuals are assumed to save a constant fraction s of their income,
i.e.,
$$S(t) = sY(t), \tag{2.8}$$
so that they consume the remaining $1 - s$ fraction of their income:
$$C(t) = (1 - s) Y(t). \tag{2.9}$$
Thus combining (2.1), (2.6) and (2.8), we have the key dynamic (difference) equation
of the Solow growth model:
$$K(t+1) = sF[K(t), L(t), A(t)] + (1 - \delta) K(t). \tag{2.10}$$
In the Solow growth model, the equilibrium is essentially described by this equation
together with laws of motion for L (t) and A (t).
2.1.4 Definition of Equilibrium
The Solow model is a mixture of an old-style Keynesian model and a modern dynamic macroeconomic model. Households do not optimize when it comes to their savings/consumption
decisions. Instead, their behavior is captured by a behavioral rule. But firms maximize and
factor markets clear. Thus it is useful to start defining equilibria in the way that is customary
in modern dynamic macro models.
Definition 2 In the basic Solow model for a given sequence of $\{L(t), A(t)\}_{t=0}^{\infty}$ and an initial
capital stock $K(0)$, an equilibrium path is a sequence of capital stocks, output levels, consumption levels, wages and rental rates $\{K(t), Y(t), C(t), w(t), R(t)\}_{t=0}^{\infty}$ such that $K(t)$
satisfies (2.10), $Y(t)$ is given by (2.1), $C(t)$ is given by (2.9), and $w(t)$ and $R(t)$ are given
by (2.4) and (2.5).
2.1.5 Equilibrium Without Population Growth and Technological Progress
We can make more progress by exploiting the constant returns to scale nature of the production function. To do this, let us make some further assumptions:
1. Let us assume that population is constant and individuals supply labor inelastically,
so that L (t) = L.
2. Let us also assume that there is no technological progress, so that A (t) = A.
We will relax these assumptions later. For now, let us define the capital-labor ratio of
the economy as
$$k(t) \equiv \frac{K(t)}{L}.$$
Then using the constant returns to scale assumption we have that income per capita, $y(t) \equiv Y(t)/L$, is given by
$$y(t) = F\left[\frac{K(t)}{L}, 1, A\right] \equiv f(k(t)). \tag{2.11}$$

In other words, with constant returns to scale, income per capita is simply a function of the
capital-labor ratio. Given Theorem 1, we also have
$$R(t) = f'(k(t)) > 0 \quad \text{and} \quad w(t) = f(k(t)) - k(t) f'(k(t)) > 0. \tag{2.12}$$
The fact that both of these factor prices are positive follows from Assumption 1, which
imposed that the first derivatives of F with respect to capital and labor are always positive
(with more general production functions, zero factor prices are possible over certain ranges).
Given this, we can divide both sides of (2.10) by $L$ and obtain a simpler difference
equation
$$k(t+1) = s f(k(t)) + (1 - \delta) k(t). \tag{2.13}$$
Since this difference equation is derived from (2.10), it also can be referred to as the equilibrium difference equation of the Solow model, in that it describes the equilibrium behavior
of the key object of the model, the capital-labor ratio, and the other equilibrium quantities
can be obtained from the capital-labor ratio $k(t)$.
At this point, we can also define a steady-state equilibrium for this model without technological progress and population growth.
Definition 3 A steady-state equilibrium without technological progress and population growth
is an equilibrium path in which $k(t) = k^*$ for all $t$.
In other words, in the steady-state equilibrium the capital-labor ratio remains constant.
Most of the models we will analyze in this course will admit a steady state equilibrium, and
typically the economy will tend to this steady state equilibrium over time (but often never
reach it in finite time). This is also the case for this simple model.
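To see this convergence concretely, the following sketch iterates the difference equation (2.13) from two different starting points. It assumes the per capita production function $f(k) = a k^{\alpha}$ and the parameter values shown; both are illustrative choices rather than part of the model's assumptions, and the function names are hypothetical. Under this functional form the steady state has the closed form $k^* = (sa/\delta)^{1/(1-\alpha)}$, which the simulated paths can be compared against.

```python
# Illustrative assumption: f(k) = a * k^alpha with alpha in (0, 1).

def f(k, a=1.0, alpha=0.3):
    """Per capita production function (assumed Cobb-Douglas form)."""
    return a * k**alpha

def next_k(k, s=0.2, delta=0.1, a=1.0, alpha=0.3):
    """One step of (2.13): k(t+1) = s f(k(t)) + (1 - delta) k(t)."""
    return s * f(k, a, alpha) + (1 - delta) * k

def simulate(k0, periods, **params):
    """Return the path [k(0), k(1), ..., k(periods)]."""
    path = [k0]
    for _ in range(periods):
        path.append(next_k(path[-1], **params))
    return path

def steady_state(s=0.2, delta=0.1, a=1.0, alpha=0.3):
    """Closed-form solution of f(k*)/k* = delta/s for the assumed f."""
    return (s * a / delta) ** (1 / (1 - alpha))

k_star = steady_state()
low_start = simulate(0.5, 500)     # starts below the steady state
high_start = simulate(10.0, 500)   # starts above the steady state
print(k_star, low_start[-1], high_start[-1])
```

Both paths approach the same value, one from below and one from above, without ever reaching it exactly in finite time (up to floating-point rounding).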
This can be seen by plotting the difference equation which governs the equilibrium behavior of this economy, (2.13). The intersection of the right hand side with the 45° line gives
the steady-state value of the capital-labor ratio $k^*$, which satisfies
$$\frac{f(k^*)}{k^*} = \frac{\delta}{s}. \tag{2.14}$$
An alternative visual representation of the steady state is to view it as the intersection between a ray through the origin with slope $\delta$ (representing the function $\delta k$) and the function
$sf(k)$. The next figure shows this picture, which is also useful in seeing the level of consumption and investment in a single figure.
This establishes:
Proposition 2 Consider the basic Solow growth model and suppose that Assumptions 1 and
2 hold. Then there exists a unique steady state where the capital-labor ratio is equal to
$k^* \in (0, \infty)$ and is given by (2.14), per capita output is given by
$$y^* = f(k^*) \tag{2.15}$$
and per capita consumption is given by
$$c^* = (1 - s) f(k^*). \tag{2.16}$$
Proof. The preceding argument establishes that any $k^*$ satisfying (2.14) is a steady state, i.e., a zero
of the difference equation (2.13). To establish existence, note that from Assumption 2,
$\lim_{k \to 0} f(k)/k = \infty$ and $\lim_{k \to \infty} f(k)/k = 0$. Moreover, $f(k)/k$ is continuous from Assumption 1, so there exists $k^*$ such that (2.14) is satisfied. To see uniqueness, differentiate
$f(k)/k$ with respect to $k$, which gives
$$\frac{\partial [f(k)/k]}{\partial k} = \frac{f'(k) k - f(k)}{k^2} = -\frac{w}{k^2} < 0, \tag{2.17}$$
where the last equality uses (2.12). Since $f(k)/k$ is everywhere decreasing, there can only
exist a unique value $k^*$ that satisfies (2.14).
Equations (2.15) and (2.16) then follow by definition.
So far the model is very parsimonious, and does not have many parameters. But what we
are most interested in is to understand how cross-country differences in certain parameters
translate into differences in growth rates or income levels. This will be done in the next
proposition. But before doing so, let us generalize the production function in one simple
way, and assume that
$$f(k) = a \tilde{f}(k)$$
so that $a$ is a shift parameter, with greater values corresponding to greater productivity of
factors. This type of productivity is referred to as Hicks-neutral as we will see below, but
for now it is just a convenient way of looking at the impact of productivity differences across
countries. Since $f(k)$ satisfies the regularity conditions imposed above, so does $\tilde{f}(k)$.
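Proposition 3 Suppose Assumptions 1 and 2 hold and $f(k) = a \tilde{f}(k)$. Denote the steady-state level of the capital-labor ratio by $k^*(a, s, \delta)$ and the steady-state level of output by
$y^*(a, s, \delta)$ when the underlying parameters are given by $a$, $s$ and $\delta$. Then we have
$$\frac{\partial k^*(a, s, \delta)}{\partial a} > 0, \quad \frac{\partial k^*(a, s, \delta)}{\partial s} > 0 \quad \text{and} \quad \frac{\partial k^*(a, s, \delta)}{\partial \delta} < 0,$$
$$\frac{\partial y^*(a, s, \delta)}{\partial a} > 0, \quad \frac{\partial y^*(a, s, \delta)}{\partial s} > 0 \quad \text{and} \quad \frac{\partial y^*(a, s, \delta)}{\partial \delta} < 0.$$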
Proof. The proof follows immediately by writing
$$\frac{\tilde{f}(k^*)}{k^*} = \frac{\delta}{as},$$
which holds for an open set of values of $k^*$. Now apply the implicit function theorem to
obtain the results. For example,
$$\frac{\partial k^*}{\partial s} = \frac{\delta (k^*)^2}{s^2 w^*} > 0$$
where $w^* = f(k^*) - k^* f'(k^*) > 0$. The other results follow similarly.
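The implicit-function-theorem expression for $\partial k^*/\partial s$ can be checked numerically. The sketch below assumes $f(k) = a k^{\alpha}$, for which the steady state has a closed form; this functional form, the parameter values and the function names are assumptions made only for illustration. It compares the formula $\delta (k^*)^2 / (s^2 w^*)$ with a central finite difference of $k^*(s)$.

```python
# Illustrative assumption: f(k) = a * k^alpha, so k*(s) = (s a / delta)^(1 / (1 - alpha)).

def k_star(s, a=1.0, delta=0.1, alpha=0.3):
    """Steady-state capital-labor ratio solving s f(k) = delta k."""
    return (s * a / delta) ** (1 / (1 - alpha))

def wage_star(s, a=1.0, delta=0.1, alpha=0.3):
    """Steady-state wage w* = f(k*) - k* f'(k*) = (1 - alpha) a (k*)^alpha."""
    k = k_star(s, a, delta, alpha)
    return (1 - alpha) * a * k**alpha

def dk_ds_formula(s, a=1.0, delta=0.1, alpha=0.3):
    """Derivative from the implicit function theorem: delta k*^2 / (s^2 w*)."""
    k = k_star(s, a, delta, alpha)
    return delta * k**2 / (s**2 * wage_star(s, a, delta, alpha))

def dk_ds_numeric(s, h=1e-6, **params):
    """Central finite-difference approximation of dk*/ds."""
    return (k_star(s + h, **params) - k_star(s - h, **params)) / (2 * h)

s = 0.2
analytic = dk_ds_formula(s)
numeric = dk_ds_numeric(s)
print(analytic, numeric)
```

The two numbers agree to several decimal places, and both are positive, in line with Proposition 3.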
Therefore, countries with higher savings rates and better technologies will have higher
capital-labor ratios and will be richer. Those with greater (technological) depreciation, will
tend to have lower capital-labor ratios and will be poorer. All of the results in Proposition 3
are intuitive, and start giving us a sense of some important determinants of the capital-labor
ratios and income levels across countries.
The same comparative statics with respect to $a$ and $\delta$ immediately apply to $c^*$ as well.
However, it is straightforward to see that $c^*$ will not be monotonic in the savings rate (think,
for example, of the case where $s = 1$!). To obtain the steady state relationship between $c^*$
and $s$, let us suppress the other parameters and write
$$c^*(s) = (1 - s) f(k^*(s)) = f(k^*(s)) - \delta k^*(s).$$
Now differentiating this expression with respect to $s$ (again using the implicit function theorem), we have
$$\frac{\partial c^*(s)}{\partial s} = [f'(k^*(s)) - \delta] \frac{\partial k^*}{\partial s}.$$
Since from Proposition 3 we have $\partial k^*/\partial s > 0$, consumption can only be maximized when
$f'(k^*(s)) = \delta$. Moreover, when $f'(k^*(s)) = \delta$, it can be verified that $\partial^2 c^*(s)/\partial s^2 < 0$, so
$f'(k^*(s)) = \delta$ is indeed a local maximum. Let $s_{gold}$ denote the savings rate at which this condition holds. That $f'(k^*(s)) = \delta$ is also the global maximum
follows from the following observations: for all $s \in [0, 1]$, we have $\partial k^*/\partial s > 0$ and moreover,
when $s < s_{gold}$, $f'(k^*(s)) - \delta > 0$ by the concavity of $f$, so $\partial c^*(s)/\partial s > 0$ for all $s < s_{gold}$,
and by the converse argument, $\partial c^*(s)/\partial s < 0$ for all $s > s_{gold}$. Therefore, only $s_{gold}$
satisfies $f'(k^*(s)) = \delta$ and gives the unique global maximum of consumption per capita.
The relationship between consumption and the savings rate takes the form plotted in the
next figure.
Consequently, we have established:
Proposition 4 In the basic Solow growth model, the highest level of consumption is reached
for $s_{gold}$, with the corresponding steady state capital level $k^*_{gold}$ such that
$$f'(k^*_{gold}) = \delta.$$
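A worked example may help fix ideas. Suppose, purely for illustration, that $f(k) = a k^{\alpha}$ with $\alpha \in (0, 1)$; this functional form is an assumption and is not required by Proposition 4. The steady-state condition $s f(k^*) = \delta k^*$ then gives a closed form for $k^*(s)$, and the marginal product of capital at the steady state is
$$k^*(s) = \left(\frac{sa}{\delta}\right)^{\frac{1}{1-\alpha}}, \qquad f'(k^*(s)) = \alpha a \, (k^*(s))^{\alpha - 1} = \alpha a \cdot \frac{\delta}{sa} = \frac{\alpha \delta}{s}.$$
Setting $f'(k^*(s)) = \delta$ yields
$$s_{gold} = \alpha, \qquad k^*_{gold} = \left(\frac{\alpha a}{\delta}\right)^{\frac{1}{1-\alpha}}.$$
Under this assumed technology, the consumption-maximizing savings rate equals the exponent on capital, and $f'(k^*(s)) - \delta = \delta(\alpha - s)/s$ is positive for $s < \alpha$ and negative for $s > \alpha$, which matches the sign pattern used in the argument above.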
In other words, there exists a unique savings rate and the corresponding capital-labor
ratio which will maximize steady-state consumption. This is shown in the next figure with
the consumption-maximizing savings rate denoted by $s_{gold}$ and the corresponding consumption per capita by $c^*_{gold}$:
Below this savings rate, the society has too low a capital-labor ratio to maximize consumption, and above this rate, the capital-labor ratio is too high, i.e., individuals are investing too much and not consuming enough. This is the essence of what people refer to as
dynamic inefficiency, which we will encounter in greater detail in models of overlapping generations. However, recall that there is no explicit utility function here, so statements about
inefficiency have to be considered with caution and skepticism. In fact, the reason why
such dynamic inefficiency will not arise once we endogenize consumption-saving decisions of
individuals will be apparent to many of you already.
2.1.6 Transitional Dynamics in the Solow Model
Proposition 2 establishes a unique steady state equilibrium. Recall, however, that an equilibrium path does not refer simply to the steady state but to the entire path of capital stock,
output, consumption and factor prices. To determine what this equilibrium path looks like
we need to study the transitional dynamics of the equilibrium difference equation (2.13)
starting from an arbitrary capital-labor ratio, $k(0)$. Of special interest is the answer to the
question of whether the economy will tend to this steady state starting from such an arbitrary capital-labor ratio, and how it will behave along the transition path. It is important to
consider an arbitrary capital-labor ratio, since, as noted above, the total amount of capital
at the beginning of the economy, $K(0)$, is taken as a state variable, while for now, the supply
of labor $L$ is fixed. Therefore, at time $t = 0$, the economy starts with $k(0) = K(0)/L$ as
its initial value and then follows the law of motion given by the difference equation (2.13).
Thus the question is whether the difference equation (2.13) will take us to the unique steady
state.
Before doing this, recall some definitions and key results from the theory of dynamical
systems. Consider the nonlinear system of autonomous difference equations,
$$x(t+1) = F(x(t)), \tag{2.18}$$
where $x(t) \in \mathbb{R}^n$ and $F : \mathbb{R}^n \to \mathbb{R}^n$. Let $x^*$ be a zero (equilibrium) of this system, which
means a fixed point of the mapping $F(\cdot)$, i.e., $x^* = F(x^*)$.
Definition 4 An equilibrium point $x^*$ is (locally) asymptotically stable if there exists an
open set $B(x^*) \ni x^*$ such that for any solution $\{x(t)\}_{t=0}^{\infty}$ to (2.18) with $x(0) \in B(x^*)$, we
have $x(t) \to x^*$. Moreover, $x^*$ is globally asymptotically stable if for all $x(0) \in \mathbb{R}^n$, for
any solution $\{x(t)\}_{t=0}^{\infty}$, we have $x(t) \to x^*$.
Theorem 2 Consider the following linear difference equation system
$$x(t+1) = Ax(t) \tag{2.19}$$
with initial value $x(0)$, where $x(t) \in \mathbb{R}^n$ for all $t$ and $A$ is an $n \times n$ matrix. Suppose that
all of the eigenvalues of $A$ are strictly inside the unit circle (i.e., the modulus of each
eigenvalue is strictly less than 1). Then the difference equation (2.19) is
globally asymptotically stable, in the sense that starting from any $x(0) \in \mathbb{R}^n$, the unique
solution $\{x(t)\}_{t=0}^{\infty}$ satisfies $x(t) \to x^*$ where $x^*$ is the steady state (zero) of the difference
equation given by $Ax^* = x^*$.
The proof of this theorem can be found in any textbook on dynamical systems, for example, David Luenberger, Introduction to Dynamic Systems: Theory, Models and Applications,
John Wiley & Sons, 1979, and a version of it for differential equations is in Carl Simon and
Lawrence Blume, Mathematics for Economists, Norton, 1994.
Next let us return to the nonlinear autonomous system (2.18). Unfortunately, much
less can be said about nonlinear systems, but the following is a standard local stability result.
Theorem 3 Consider the following nonlinear autonomous system
$$x(t+1) = F[x(t)] \tag{2.20}$$
with initial value $x(0)$, where $F : \mathbb{R}^n \to \mathbb{R}^n$, and suppose that $F$ is continuously differentiable.
Let $x^*$ be a zero of this system, i.e., $F(x^*) = x^*$. Define
$$A = \nabla F(x^*),$$
the matrix of first derivatives of $F$ evaluated at $x^*$, and suppose that all of the eigenvalues of $A$ are strictly inside the unit circle. Then the
difference equation (2.20) is locally asymptotically stable, in the sense that there exists an
open neighborhood of $x^*$, $B(x^*) \subset \mathbb{R}^n$ such that starting from any $x(0) \in B(x^*)$, we have
$x(t) \to x^*$.
Therefore, for nonlinear systems, we can have local stability results. An immediate
corollary of these results is:
Corollary 1 Let $x(t) \in \mathbb{R}$, then the linear difference equation $x(t+1) = ax(t) + b$ is
asymptotically stable (in the sense that $x(t) \to x^* = b/(1 - a)$) if $|a| < 1$. Moreover, let $g : \mathbb{R} \to \mathbb{R}$ be a continuous function, differentiable at $x^*$ where $g(x^*) = x^*$. Then, the nonlinear
difference equation $x(t+1) = g(x(t))$ is locally asymptotically stable if $|g'(x^*)| < 1$.
Now let us apply this result to (2.13):
Proposition 5 Suppose that Assumptions 1 and 2 hold, then the equilibrium of the Solow
growth model described by the difference equation (2.13) is asymptotically stable, and starting
from any $k(0) > 0$, $k(t) \to k^*$.
Proof. From (2.13), we have
$$k(t+1) = s f(k(t)) + (1 - \delta) k(t), \tag{2.21}$$
with a unique zero at $k^*$. Now recall that $f(\cdot)$ is concave from Assumption 1 and satisfies
$f(0) = 0$ from Assumption 2. For any strictly concave function, we have that
$$f(k) > f(0) + k f'(k) = k f'(k), \tag{2.22}$$
where the equality uses the fact that $f(0) = 0$. Now linearizing (2.21) around $k^*$, we have
$$k(t+1) - k^* \simeq [s f'(k^*) + (1 - \delta)] (k(t) - k^*).$$
Since from (2.14), $\delta k^* = s f(k^*)$, (2.22) implies that $\delta = s f(k^*)/k^* > s f'(k^*)$, and thus
$[s f'(k^*) + (1 - \delta)] \in (0, 1)$, establishing local asymptotic stability for the Solow model from
Corollary 1.
Moreover, (2.21) also implies that for all $k(t) \in (0, k^*)$, we have $k(t+1) - k(t) > 0$
and for all $k(t) > k^*$, we have $k(t+1) - k(t) < 0$. Consequently, the solution to (2.21),
$\{k(t)\}_{t=0}^{\infty}$, always approaches $k^*$, which must therefore be globally stable.
This stability result is easier to see diagrammatically, which is shown in the next figure.
The following corollary is then immediate:
Corollary 2 Suppose that Assumptions 1 and 2 hold, and $k(0) < k^*$, then $\{w(t)\}_{t=0}^{\infty}$ is
an increasing sequence and $\{R(t)\}_{t=0}^{\infty}$ is a decreasing sequence. If $k(0) > k^*$, the opposite
results apply.
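Proposition 5 and Corollary 2 can be illustrated with a short simulation. The sketch assumes $f(k) = k^{\alpha}$ and the parameter values shown, which are illustrative choices and not part of the model's assumptions; the function names are hypothetical. It computes the linearization coefficient $s f'(k^*) + (1 - \delta)$ from the proof and tracks the factor prices from (2.12) along a path that starts below the steady state.

```python
# Illustrative parameter values (assumptions, not taken from the text).
ALPHA, S, DELTA = 0.3, 0.2, 0.1

def f(k):
    """Assumed per capita production function f(k) = k^ALPHA."""
    return k**ALPHA

def f_prime(k):
    """Marginal product of capital f'(k)."""
    return ALPHA * k**(ALPHA - 1)

def factor_prices(k):
    """Wage and rental rate from (2.12): w = f(k) - k f'(k), R = f'(k)."""
    return f(k) - k * f_prime(k), f_prime(k)

def capital_path(k0, periods):
    """Iterate (2.13) and return [k(0), ..., k(periods)]."""
    path = [k0]
    for _ in range(periods):
        k = path[-1]
        path.append(S * f(k) + (1 - DELTA) * k)
    return path

k_star = (S / DELTA) ** (1 / (1 - ALPHA))
slope = S * f_prime(k_star) + (1 - DELTA)   # coefficient in the linearization around k*

path = capital_path(0.5, 50)                # k(0) < k*
wages = [factor_prices(k)[0] for k in path]
rentals = [factor_prices(k)[1] for k in path]
print(f"slope at k*: {slope:.4f}; w: {wages[0]:.3f} -> {wages[-1]:.3f}; "
      f"R: {rentals[0]:.3f} -> {rentals[-1]:.3f}")
```

The slope lies strictly between 0 and 1, so Corollary 1 applies, and along the path the wage rises while the rental rate falls, as capital deepening runs into diminishing returns.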
Intuitively, if the economy starts with too little capital relative to its labor supply, there
will be capital deepening (capital accumulation relative to labor), and as a result the marginal product of capital will fall given the diminishing returns to capital feature embedded in
Assumption 1, and the wage rate will increase. Conversely, if it starts with too much capital,
it will decumulate capital, and in the process the wage rate will decline and the rate of return
to capital will increase. The next figure shows this process diagrammatically, emphasizing
that the trade-off is between the replacement of the capital stock per effective labor due to
depreciation (and perhaps population growth and technological change) and the capital to
effective labor ratio:
Therefore, the Solow growth model has a number of nice properties: unique steady state,
asymptotic stability, and simple and intuitive comparative statics.
So far, it has no growth however. The steady state is the point at which there is no
growth in the capital-labor ratio, no more capital deepening, and no growth in income per
capita. The Solow model typically incorporates economic growth by allowing technological
change. Before doing this, however, it is useful to look at the mapping between discrete time
and continuous time.
2.2 The Solow Model in Continuous Time
2.2.1 From Difference to Differential Equations
Recall from the discussion above that the time periods could refer to days, weeks, months or
years. In some sense, the time unit is not important. This suggests that perhaps it may be
more convenient to look at dynamics by making the time unit as small as possible, i.e., by
going to continuous time. The continuous time setup in general has a number of advantages,
since some pathological results of discrete time disappear in continuous time (see Problem
Set 1). Moreover, especially in the presence of uncertainty, continuous time models have
more flexibility both in doing dynamics and for providing explicit form solutions. For us,
they are useful particularly because a lot of growth theory is cast in continuous time.
Let us start with a simple difference equation
$$x(t+1) - x(t) = g(x(t)). \tag{2.23}$$
This equation states that between time $t$ and $t+1$, the absolute growth in $x$ is given by
$g(x(t))$. Let us now consider the following approximation
$$x(t + \Delta t) - x(t) \simeq \Delta t \cdot g(x(t)),$$
for any $\Delta t \in [0, 1]$. When $\Delta t = 0$, this equation is just an identity. When $\Delta t = 1$, it gives
(2.23). In-between it is a linear approximation, which should not be too bad if the distance
between $t$ and $t+1$ is not very large, so that $g(x) \simeq g(x(t))$ for all $x \in [x(t), x(t+1)]$
(however, you should also convince yourself that this approximation could in fact be quite
bad if you take a very nonlinear function $g$, for which the behavior changes significantly
between $x(t)$ and $x(t+1)$). Now divide both sides of this equation by $\Delta t$, and take limits
to obtain
$$\lim_{\Delta t \to 0} \frac{x(t + \Delta t) - x(t)}{\Delta t} = \dot{x}(t) \simeq g(x(t)),$$
as a differential equation representing the same dynamics as the difference equation (2.23)
for the case in which the distance between $t$ and $t+1$ is small. Recall that here $\dot{x}(t)$ denotes
the time derivative $\partial x(t)/\partial t$.
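The quality of this approximation as the time step shrinks can be seen in a small experiment. The sketch below assumes the linear function $g(x) = -0.5x$, chosen only because the corresponding differential equation has the known solution $x(t) = x(0) \exp(-0.5t)$; the function names and numbers are hypothetical. It applies the step $x(t + \Delta t) = x(t) + \Delta t \cdot g(x(t))$ repeatedly and compares the result with the continuous-time solution.

```python
import math

def g(x, rate=0.5):
    """Assumed growth function g(x) = -rate * x (illustrative choice)."""
    return -rate * x

def stepped_value(x0, horizon, dt):
    """Apply x(t + dt) = x(t) + dt * g(x(t)) from time 0 up to `horizon`."""
    steps = round(horizon / dt)
    x = x0
    for _ in range(steps):
        x = x + dt * g(x)
    return x

x0, horizon = 1.0, 4.0
exact = x0 * math.exp(-0.5 * horizon)   # solution of dx/dt = g(x)

# Gap between the stepped path and the continuous-time solution for each step size.
errors = {dt: abs(stepped_value(x0, horizon, dt) - exact) for dt in (1.0, 0.1, 0.01)}
for dt, err in errors.items():
    print(f"dt = {dt:<5} error = {err:.5f}")
```

With $\Delta t = 1$ the stepped path is exactly the difference equation (2.23); the gap to the continuous-time solution shrinks as $\Delta t$ falls, which is the sense in which the differential equation represents the same dynamics when the time unit is small.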
2.2.2 The Fundamental Equation of the Solow Model in Continuous Time
We can now repeat all of the analysis so far using the continuous time representation. Nothing
has changed on the production side, so we continue to have (2.4) and (2.5) as the factor prices,
but now these refer to instantaneous rental rates (i.e., w (t) is the flow of wages that the
worker receives for an instant etc.).
Savings are again given by
S (t) = sY (t) ,
while consumption is given by (2.9) above.
Also, let us now introduce population growth into this model, and assume that the labor
force L (t) grows proportionally, i.e.,
$$L(t) = \exp(nt) L(0). \tag{2.24}$$
The purpose of doing so is that in many of the classical analyses of economic growth, population growth plays an important role, so it is useful to see how it affects things here. We
are not introducing technological progress yet, which will be done below.
Recall that
$$k(t) \equiv \frac{K(t)}{L(t)},$$
which implies that
$$\frac{\dot{k}(t)}{k(t)} = \frac{\dot{K}(t)}{K(t)} - n.$$
The law of motion of the capital stock, from the limiting argument in the previous subsection,
is given by:
$$\dot{K}(t) = sF[K(t), L(t), A(t)] - \delta K(t).$$
Now using the definition of $k(t)$ as the capital-labor ratio and the constant returns to scale
properties of the production function, we obtain the fundamental law of motion of the Solow
model in continuous time for the capital-labor ratio as
$$\dot{k}(t) = s f(k(t)) - (n + \delta) k(t). \tag{2.25}$$
Therefore we have:
Definition 5 In the basic Solow model in continuous time with population growth at the
rate $n$, no technological progress and an initial capital stock $K(0)$, an equilibrium path
is a sequence of capital stocks, labor, output levels, consumption levels, wages and rental
rates $[K(t), L(t), Y(t), C(t), w(t), R(t)]_{t=0}^{\infty}$ such that $k(t) \equiv K(t)/L(t)$ satisfies (2.25), $L(t)$ satisfies
(2.24), $Y(t)$ is given by (2.1), $C(t)$ is given by (2.9), and $w(t)$ and $R(t)$ are given by (2.4)
and (2.5).
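For completeness, the step from the aggregate law of motion of capital to (2.25) can be spelled out. With $A(t) = A$ constant, constant returns to scale imply $F[K(t), L(t), A]/K(t) = f(k(t))/k(t)$, so that
$$\frac{\dot{k}(t)}{k(t)} = \frac{\dot{K}(t)}{K(t)} - n = \frac{sF[K(t), L(t), A] - \delta K(t)}{K(t)} - n = \frac{s f(k(t))}{k(t)} - \delta - n.$$
Multiplying both sides by $k(t)$ gives
$$\dot{k}(t) = s f(k(t)) - (n + \delta) k(t),$$
which is (2.25). Population growth enters exactly like additional depreciation of the capital-labor ratio.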
As before, a steady-state equilibrium involves k (t) remaining constant. As before, we
will refer to the steady-state equilibrium capital-labor ratio as $k^*$.

It is easy to verify that the equilibrium differential equation (2.25) has a unique zero at
$k^*$, which is given by a slight modification of (2.14) above to incorporate population growth:
\[
\frac{f(k^*)}{k^*} = \frac{n + \delta}{s}. \tag{2.26}
\]
In other words, going from discrete to continuous time has not changed any of the basic
economic features of the model, and again the steady state can be plotted in the familiar
figure used above (now with the population growth rate featuring in there as well):
We immediately obtain:
Proposition 6 Consider the basic Solow growth model in continuous time and suppose that
Assumptions 1 and 2 hold. Then there exists a unique steady state equilibrium where the
capital-labor ratio is equal to $k^* \in (0, \infty)$ and is given by (2.26), per capita output is given
by
\[
y^* = f(k^*)
\]
and per capita consumption is given by
\[
c^* = (1 - s) f(k^*).
\]
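Moreover, again let
\[
f(k) = a \tilde{f}(k).
\]
Then we have
Proposition 7 Suppose Assumptions 1 and 2 hold and $f(k) = a\tilde{f}(k)$. Denote the steady-state equilibrium level of the capital-labor ratio by $k^*(a, s, \delta, n)$ and the steady-state level of
output by $y^*(a, s, \delta, n)$ when the underlying parameters are given by $a$, $s$, $\delta$ and $n$. Then we
have
\[
\frac{\partial k^*(a, s, \delta, n)}{\partial a} > 0, \quad
\frac{\partial k^*(a, s, \delta, n)}{\partial s} > 0, \quad
\frac{\partial k^*(a, s, \delta, n)}{\partial \delta} < 0 \quad \text{and} \quad
\frac{\partial k^*(a, s, \delta, n)}{\partial n} < 0,
\]
\[
\frac{\partial y^*(a, s, \delta, n)}{\partial a} > 0, \quad
\frac{\partial y^*(a, s, \delta, n)}{\partial s} > 0, \quad
\frac{\partial y^*(a, s, \delta, n)}{\partial \delta} < 0 \quad \text{and} \quad
\frac{\partial y^*(a, s, \delta, n)}{\partial n} < 0.
\]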
The new result relative to the earlier comparative static proposition is that now a higher
population growth rate, n, also reduces the capital-labor ratio and income per capita. The
reason for this is simple. A higher population growth rate means there is more labor to
use the existing amount of capital, which only accumulates slowly, and consequently the
equilibrium capital-labor ratio ends up lower. This result implies that countries with higher
population growth rates will have lower incomes per person (or per worker).
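The comparative statics can be checked numerically. The sketch below solves the steady-state condition (2.26) by bisection for a user-supplied $f$ and compares steady states across parameter values. The function name `steady_state_k`, the bracketing interval, and the example technology $f(k) = a k^{1/3}$ are illustrative assumptions, not part of the text.

```python
def steady_state_k(f, s, delta, n, lo=1e-9, hi=1e9, tol=1e-12):
    """Solve s*f(k) = (n + delta)*k for k > 0, i.e. equation (2.26)."""
    def excess_saving(k):
        return s * f(k) - (n + delta) * k

    # Under Assumptions 1 and 2 the excess is positive for small k and
    # negative for large k, so bisection (on a log scale) brackets k*.
    for _ in range(200):
        mid = (lo * hi) ** 0.5
        if excess_saving(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi / lo - 1.0 < tol:
            break
    return (lo * hi) ** 0.5


# Hypothetical example technology, chosen only for illustration.
a = 1.0
f = lambda k: a * k ** (1.0 / 3.0)

k_low_n = steady_state_k(f, s=0.2, delta=0.05, n=0.01)
k_high_n = steady_state_k(f, s=0.2, delta=0.05, n=0.03)
k_high_s = steady_state_k(f, s=0.3, delta=0.05, n=0.01)
```

With these made-up parameters, `k_high_n` should come out below `k_low_n` and `k_high_s` above it, in line with the signs in Proposition 7.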
The stability analysis is also unchanged. To do this in detail, we simply need to remember
the equivalents of the above theorems for differential equations. In particular we have:

Theorem 4 Consider the following linear differential equation system
\[
\dot{x}(t) = A x(t) \tag{2.27}
\]
and suppose that all of the eigenvalues of $A$ have negative real parts. Then the differential equation (2.27) is
asymptotically stable, in the sense that starting from any $x(0) \in \mathbb{R}^n$, $x(t) \to x^*$ where $x^*$ is
the steady state (zero) of the system given by $A x^* = 0$.
Theorem 5 Consider the following nonlinear autonomous differential equation
\[
\dot{x}(t) = F[x(t)] \tag{2.28}
\]
where $F : \mathbb{R}^n \to \mathbb{R}^n$ and suppose that $F$ is continuously differentiable, with initial value
$x(0)$. Let $x^*$ be a zero of this system, i.e., $F(x^*) = 0$. Define
\[
A \equiv \nabla F(x^*),
\]
and suppose that all of the eigenvalues of $A$ have negative real parts. Then the differential
equation (2.28) is locally asymptotically stable, in the sense that there exists an open neighborhood of $x^*$, $B(x^*) \subset \mathbb{R}^n$ such that starting from any $x(0) \in B(x^*)$, we have $x(t) \to x^*$.
Corollary 3 Let $x(t) \in \mathbb{R}$, then the linear differential equation $\dot{x}(t) = a x(t)$ is asymptotically
stable (in the sense that $x(t) \to 0$) if $a < 0$. Moreover, let $g : \mathbb{R} \to \mathbb{R}$ be continuous and
differentiable at $x^*$ where $g(x^*) = 0$. Then, the nonlinear differential equation $\dot{x}(t) =
g(x(t))$ is locally asymptotically stable if $g'(x^*) < 0$.
Finally, with continuous time, we also have another useful theorem:

Theorem 6 Let $g : \mathbb{R} \to \mathbb{R}$ be a continuous function, and suppose that there exists a unique
$x^*$ such that $g(x^*) = 0$. Moreover, suppose $g(x) < 0$ for all $x > x^*$ and $g(x) > 0$ for all
$x < x^*$. Then the nonlinear differential equation $\dot{x}(t) = g(x(t))$ is (globally) asymptotically
stable, and starting with any $x(0)$, $x(t) \to x^*$.
Notice that the equivalent of Theorem 6 is not true in discrete time, and this will be
illustrated by one of the problems in Problem Set 1.
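One way to see why the discrete-time analogue can fail is a small numerical experiment. The sketch below is an illustration of my own choosing, not the problem-set example: it takes $g(x) = -2.5x$, which satisfies the sign conditions of Theorem 6 with $x^* = 0$, and compares the difference equation $x(t+1) = x(t) + g(x(t))$ with a fine Euler approximation of $\dot{x}(t) = g(x(t))$.

```python
def g(x):
    # Satisfies the hypotheses of Theorem 6 with x* = 0:
    # g(x) < 0 for x > 0 and g(x) > 0 for x < 0.
    return -2.5 * x


def simulate_discrete(x0, steps):
    """Iterate x(t+1) = x(t) + g(x(t)), the discrete-time analogue."""
    path = [x0]
    for _ in range(steps):
        path.append(path[-1] + g(path[-1]))
    return path


def simulate_continuous(x0, horizon, dt=1e-3):
    """Euler approximation of dx/dt = g(x) with a small step dt."""
    x = x0
    for _ in range(int(round(horizon / dt))):
        x += dt * g(x)
    return x


discrete_path = simulate_discrete(1.0, 10)
continuous_end = simulate_continuous(1.0, 10.0)
```

In discrete time each step overshoots the steady state, since $x(t+1) = -1.5\,x(t)$, so the path oscillates with growing amplitude, while the continuous-time path shrinks monotonically towards zero.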
In view of these results, Proposition 5 immediately generalizes:
Proposition 8 Suppose that Assumptions 1 and 2 hold, then the basic Solow growth model
in continuous time with constant population growth and no technological change is asymptotically
stable, and starting from any $k(0) > 0$, $k(t) \to k^*$.
Proof. The proof of stability is now simpler and follows immediately from Theorem 6 by
noting that whenever $k < k^*$, $sf(k) - (n + \delta)k > 0$ and whenever $k > k^*$, $sf(k) - (n + \delta)k < 0$.
It is also useful at this point to look at one of the most common examples of the production function used in macroeconomics, the Cobb-Douglas production function:
Example 1 Suppose the aggregate production function is given by
\[
F[K, L] = A K^{\alpha} L^{1-\alpha} \quad \text{with } 0 < \alpha < 1.
\]
You should remember from basic micro theory that the Cobb-Douglas production function is
extremely special, in particular because it has an elasticity of substitution equal to 1 between
capital and labor. This production function is very easy to work with, but it also has many
special features that are far from general. It is a good vehicle to illustrate issues, but you
should not think that all production functions are Cobb-Douglas!

One very important feature of the Cobb-Douglas production function is that factor shares
are constant. It can be immediately calculated that, with competitive factor markets,
the share of capital is constant irrespective of the capital-labor ratio:
\[
\begin{aligned}
\alpha_K(t) &= \frac{R(t) K(t)}{Y(t)} \\
&= \frac{F_K(K(t), L(t))\, K(t)}{Y(t)} \\
&= \frac{\alpha A [K(t)]^{\alpha-1} [L(t)]^{1-\alpha} K(t)}{A [K(t)]^{\alpha} [L(t)]^{1-\alpha}} \\
&= \alpha.
\end{aligned}
\]
Similarly, the share of labor is $\alpha_L(t) = 1 - \alpha$.

With this production function, we have that
\[
f(k) = A k^{\alpha},
\]
so the steady state is given again from (2.26) (with population growth at the rate n) as
\[
A (k^*)^{\alpha - 1} = \frac{n + \delta}{s}
\]
or
\[
k^* = \left( \frac{sA}{n + \delta} \right)^{\frac{1}{1-\alpha}},
\]
which is a very nice and simple interpretable form for the steady-state capital-labor ratio.
Transitional dynamics are also straightforward in this case. In particular, we have:
\[
\dot{k}(t) = sA [k(t)]^{\alpha} - (n + \delta)\, k(t)
\]
with initial condition $k(0)$. To solve this equation, let $x(t) \equiv k(t)^{1-\alpha}$, so the equilibrium law
of motion of the capital-labor ratio can be written in terms of $x(t)$ as
\[
\dot{x}(t) = (1 - \alpha) sA - (1 - \alpha)(n + \delta)\, x(t),
\]

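which is a linear differential equation, with a general solution
\[
x(t) = \frac{sA}{n + \delta} + \left[ x(0) - \frac{sA}{n + \delta} \right] \exp\left( -(1 - \alpha)(n + \delta)\, t \right)
\]
or in terms of the capital-labor ratio
\[
k(t) = \left\{ \frac{sA}{n + \delta} + \left[ [k(0)]^{1-\alpha} - \frac{sA}{n + \delta} \right] \exp\left( -(1 - \alpha)(n + \delta)\, t \right) \right\}^{\frac{1}{1-\alpha}}.
\]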
This solution illustrates that starting from any $k(0)$, the equilibrium $k(t) \to k^* = (sA/(n + \delta))^{1/(1-\alpha)}$,
and in fact, the rate of adjustment is related to $(1 - \alpha)(n + \delta)$. This is intuitive: a higher
$\alpha$ implies less diminishing returns to capital, which slows down dynamics. Similarly a smaller
$\delta$ means less replacement of depreciated capital and a smaller $n$ means slower population
growth, both of those slowing down the adjustment of capital per worker and thus transitional dynamics.
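The closed-form path can be checked against a direct numerical integration of the law of motion. The sketch below is a minimal illustration: the function names and parameter values are assumptions made up for the example, and forward Euler is used only because it is the simplest scheme, not because the text prescribes it.

```python
import math


def k_star(s, A, alpha, n, delta):
    """Steady-state capital-labor ratio for f(k) = A * k**alpha."""
    return (s * A / (n + delta)) ** (1.0 / (1.0 - alpha))


def k_closed_form(t, k0, s, A, alpha, n, delta):
    """Closed-form path obtained from the linear equation in x = k**(1 - alpha)."""
    x_bar = s * A / (n + delta)
    decay = math.exp(-(1.0 - alpha) * (n + delta) * t)
    x_t = x_bar + (k0 ** (1.0 - alpha) - x_bar) * decay
    return x_t ** (1.0 / (1.0 - alpha))


def k_euler(t, k0, s, A, alpha, n, delta, dt=1e-3):
    """Forward-Euler integration of dk/dt = s*A*k**alpha - (n + delta)*k."""
    k = k0
    for _ in range(int(round(t / dt))):
        k += dt * (s * A * k ** alpha - (n + delta) * k)
    return k


# Hypothetical parameter values, for illustration only.
params = dict(s=0.2, A=1.0, alpha=1.0 / 3.0, n=0.01, delta=0.05)
k0 = 1.0

gap_at_50 = abs(k_closed_form(50.0, k0, **params) - k_euler(50.0, k0, **params))
```

Raising `alpha` towards 1 in `params` lowers the decay rate $(1-\alpha)(n+\delta)$ in the exponent, which is the slowdown in adjustment described above.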
2.2.3 A First Look at Sustained Growth
Before discussing technological progress, it is useful to see how the model we have developed
so far can generate sustained growth (without technological progress). The Cobb-Douglas
example above already shows that when $\alpha$ is close to 1, adjustment of the capital-labor
ratio back to its steady-state level can be very slow. A very slow adjustment towards a
steady-state has the flavor of sustained growth rather than the system settling down to a
stationary point quickly.
In fact, the simplest model of sustained growth essentially takes $\alpha = 1$ in terms of the
Cobb-Douglas production function above. To do this, let us relax Assumptions 1 and 2
(which do not allow $\alpha = 1$), and suppose that
\[
F[K(t), L(t), A(t)] = A K(t), \tag{2.29}
\]

where A > 0 is a constant. This is the so-called AK model, and in its simplest form
output does not even depend on labor. The results I would like to highlight apply with a
more general constant returns to scale production function, for example,
F [K (t) , L (t) , A (t)] = AK (t) + BL (t) ,
(2.30)
but it is simpler to illustrate the main insights with (2.29), leaving the analysis of the richer
production function (2.30) to Problem Set 1.
With this production function, the fundamental law of motion of the capital stock is
given by (again with population growth given by (2.24)):
\[
\frac{\dot{k}(t)}{k(t)} = sA - \delta - n.
\]
Therefore, if $sA - \delta - n > 0$, there is sustained growth in the capital-labor ratio, and given
(2.29), there is sustained growth in income per capita. This immediately establishes the
following proposition:
Proposition 9 Consider the Solow growth model with the production function (2.29) and
suppose that $sA - \delta - n > 0$. Then in equilibrium, there is sustained growth of income per
capita at the rate $sA - \delta - n$. In particular, starting with a capital-labor ratio $k(0) > 0$, the
economy has
\[
k(t) = \exp\left( (sA - \delta - n)\, t \right) k(0)
\]
and
\[
y(t) = \exp\left( (sA - \delta - n)\, t \right) A k(0).
\]
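For completeness, the growth equation behind this result can be obtained in one line. This is a sketch using only (2.24), (2.29) and the constant saving rate:
\[
\frac{\dot{k}(t)}{k(t)} = \frac{\dot{K}(t)}{K(t)} - n = \frac{sAK(t) - \delta K(t)}{K(t)} - n = sA - \delta - n .
\]
Since the right-hand side is a constant, the capital-labor ratio grows exponentially at that rate from any starting point, and $y(t) = A k(t)$ inherits the same growth rate.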
This proposition not only establishes the possibility of endogenous growth, but also shows
that in this simplest form, there are no transitional dynamics. The economy always grows
at a constant rate $sA - \delta - n$, irrespective of what level of capital-labor ratio it starts from.
The next figure shows this equilibrium diagrammatically, denoting the growth rate of the
economy (and the capital-labor ratio by K ):
2.3 Solow Model with Technological Progress

2.3.1 Balanced Growth
The models analyzed so far did not feature technological progress. We now introduce changes
in A (t) to capture improvements in the technological know-how of the economy. There is
little doubt that what human societies know how to produce, and how efficiently they can produce
them, has progressed tremendously over the past 200 years, and even more tremendously
over the past 1000 or 10,000 years. An attractive way of introducing economic growth is
to allow technological progress. The question is how to do this. At some level we will see
that the production function F [K (t) , L (t) , A (t)] is too general to achieve our objective. In
particular, with this general structure, we may not have balanced growth.
By balanced growth, we mean a path of the economy in which, while income per capita
increases, the capital-output ratio and the distribution of income between capital and labor
are roughly constant. These are sometimes referred to as the Kaldor facts. The next picture,
for example, shows the evolution of the share of capital in national income in the United
States.
[Figure: Capital and Labor Share in the U.S. GDP. Labor and capital share in total value added (vertical axis 0% to 100%), 1929 to 1994.]
Despite fairly large fluctuations, there is no trend. This and the relative constancy of
capital-output ratios until the 1970s have made many economists prefer models with balanced
growth to those without. (Since the 1970s capital-output ratios may or may not be constant
depending on how you measure them). Also for future reference, note that the capital share
in national income is about 1/3, while the labor share is about 2/3. We are ignoring the share
of land here as we did in the analysis so far: land is not a major factor of production. This
is clearly not the case for the poor countries, and we should think about how incorporating
land into this picture changes the patterns. In any case, this pattern of factor distribution
of income, combined with economists' desire to work with simple models, often makes them
choose an aggregate production function of the form $A K^{1/3} L^{2/3}$ as an approximation to
reality (especially since it ensures that factor shares are constant by construction). This
production function does a good job in certain circumstances, but of course it is very special.
For us, the most important characteristic of balanced growth is that it is much easier to
handle than non-balanced growth. So it is an advantage to have models featuring balanced
growth. In reality, growth has many non-balanced features. For example, the share of
different sectors changes systematically over the growth process, with agriculture shrinking,
manufacturing first increasing and then shrinking. Ultimately, we would like to have models
that combine certain quasi-balanced features with these types of structural transformations
embedded in them. These are interesting frontiers of research, but for this course, we will
largely focus on models with balanced growth.
2.3.2 Neutral Technological Progress
What are some convenient special forms of the general production function F [K (t) , L (t) , A (t)]?
First we could have
F [K (t) , L (t) , A (t)] = A (t) F [K (t) , L (t)] ,
so that technological progress simply multiplies output. This is known as Hicks-neutral
technological progress. Intuitively, in this case if we think of the isoquants in the L-K
space, technological progress simply corresponds to a relabeling of the isoquants (without
any change in their shape).

Another alternative is to have capital-augmenting or Solow-neutral technological progress,
in the form
F [K (t) , L (t) , A (t)] = F [A (t) K (t) , L (t)] .
This is referred to as capital-augmenting progress, because a higher A (t) is equivalent to
the economy having more capital. This type of technological progress corresponds to the
isoquants shifting with technological progress in a way that they have constant slope at a
given labor-output ratio.
Finally, we can have labor-augmenting or Harrod-neutral technological progress
F [K (t) , L (t) , A (t)] = F [K (t) , A (t) L (t)] ,
whereby an increase in technology increases output as if the economy had more labor. Equivalently, the slope of the isoquants is constant along rays with constant capital-output ratio.
Of course, in practice technological change can be a mixture of these, so we could have
a vector-valued index of technology and a production function that looks like
\[
F[K(t), L(t), A(t)] = A_H(t)\, F[A_K(t) K(t), A_L(t) L(t)].
\]
It turns out that, although all of these forms of technological progress look equally
plausible ex ante, balanced growth forces us to one of these types of neutral technological
progress. In particular, balanced growth necessitates that all technological progress be labor
augmenting or Harrod-neutral. This is a very surprising result, and it is also somewhat
troubling, since we have no idea why technological progress should take this form. We now
state and prove the relevant theorem here.
2.3.3 The Steady-State Technological Progress Theorem
A version of the following theorem was first proved by Uzawa in 1961. For simplicity and
without loss of any generality, let us focus on continuous time models. The key elements of
balanced growth, as suggested by the discussion above, are the constancy of factor shares
and the constancy of the capital-output ratio, K (t) /Y (t). Since there is only labor and
capital in this model, by factor shares, we mean
\[
\alpha_L(t) \equiv \frac{w(t) L(t)}{Y(t)} \quad \text{and} \quad \alpha_K(t) \equiv \frac{R(t) K(t)}{Y(t)}.
\]
By Assumption 1 and Theorem 1, we have that $\alpha_L(t) + \alpha_K(t) = 1$.

The following theorem was first stated and proved by Uzawa. Here I present a version
of Uzawa's proof along the lines of the more recent paper by Jones and Scrimgeour (2005),
and then also give a more heuristic proof.
Theorem 7 (Uzawa) Consider a growth model with a constant returns to scale aggregate
production function $F[K(t), L(t), A(t)]$ and capital accumulation equation
\[
\dot{K}(t) = F[K(t), L(t), A(t)] - C(t) - \delta K(t).
\]
Suppose also that there is a constant growth rate of population, i.e., $L(t) = \exp(nt) L(0)$. If a
balanced growth path exists with constant capital-output ratio and per capita growth rate, i.e.,
$\dot{y}(t)/y(t) = g > 0$, and factor shares are nonzero and constant, i.e., $\alpha_K(t) = \alpha^*(x^*) \in (0, 1)$
as $t \to \infty$, then asymptotically, the production function can be represented as:
\[
Y^*(t) = \tilde{F}[K^*(t), A(t) L(t)],
\]
where *s denote asymptotic steady-state values, and
\[
\frac{\dot{A}(t)}{A(t)} = g.
\]

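Proof. Let us look at the following derivative
\[
\begin{aligned}
v &\equiv \frac{\partial \log y(t)}{\partial \log \left( k(t)/y(t) \right)} \\
&= \frac{1}{\dfrac{\partial \log K(t)}{\partial \log Y(t)} - 1} \\
&= \frac{1}{\left( \dfrac{\partial \log F[K(t), L(t), A(t)]}{\partial \log K(t)} \right)^{-1} - 1} \\
&= \frac{1}{\left( \dfrac{F_K[K(t), L(t), A(t)]\, K(t)}{F[K(t), L(t), A(t)]} \right)^{-1} - 1} \\
&= \frac{\alpha_K(t)}{1 - \alpha_K(t)},
\end{aligned}
\]
where the last line uses the definition of $\alpha_K(t)$.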

Now let $x(t) \equiv K(t)/Y(t)$, and by hypothesis asymptotically $\alpha_K(t) = \alpha^*(x^*)$ where
$x^*$ refers to the steady state value of $K/Y$, and the share of capital in national income is
potentially a function of this capital-output ratio. Therefore, asymptotically, we have the
following partial differential equation:
\[
\frac{\partial \log y(t)}{\partial \log x(t)} = \frac{\alpha^*(x^*)}{1 - \alpha^*(x^*)}.
\]
Integrating both sides and noting that the right hand side does not depend on time, we have
\[
\log y(t) = a(t) + \int \frac{\alpha^*(x^*)}{1 - \alpha^*(x^*)} \frac{dx^*}{x^*}
\]
for some function $a(t)$, which only depends on time. Taking exponents, we have
\[
y(t) = A(t)\, \phi(x^*),
\]
where $A(t) \equiv \exp(a(t))$ and $\phi(x^*) \equiv \exp\left( \int \frac{\alpha^*(x^*)}{1 - \alpha^*(x^*)} \frac{dx^*}{x^*} \right)$. Notice, also, for future use that
from the inverse function theorem, $\phi(x^*)$ is invertible in the neighborhood of $x^*$, with inverse
denoted by $\phi^{-1}(y/A)$.

Since $\phi(x^*)$ is constant and $\dot{y}(t)/y(t) = g$, we must have $A(t) = \exp(gt) A(0)$. Finally,
note that, by definition, $k(t) = x(t)\, y(t)$, which implies asymptotically (in steady state)
that
\[
\frac{k(t)}{A(t)} = \frac{y(t)}{A(t)}\, \phi^{-1}\left( \frac{y(t)}{A(t)} \right) \equiv f^{-1}\left( \frac{y(t)}{A(t)} \right)
\]
or
\[
\frac{y(t)}{A(t)} = f\left( \frac{k(t)}{A(t)} \right),
\]
and thus
\[
Y(t) = A(t) L(t)\, f\left( \frac{K(t)}{A(t) L(t)} \right),
\]
which, under constant returns to scale, is another way of writing
\[
Y^*(t) = \tilde{F}[K^*(t), A(t) L(t)],
\]
completing the proof.
For a more heuristic reasoning, consider a production function of the form $F[A_K(t) K(t), A_L(t) L(t)]$.
Balanced growth requires factor shares to be constant, which can only be the case when total
capital inputs, $A_K(t) K(t)$, and total labor inputs, $A_L(t) L(t)$, grow at the same rate; otherwise, the share of either capital or labor will be increasing over time. Capital accumulation
implies that $K(t)$ will grow at the same rate as $A_L(t) L(t)$. Thus balanced growth can only
be possible if $A_K(t)$ is asymptotically constant.
There is one exception to this, which is the Cobb-Douglas production function, where
we can have
\[
Y(t) = [A_K(t) K(t)]^{\alpha} [A_L(t) L(t)]^{1-\alpha}
\]
and both $A_K(t)$ and $A_L(t)$ could grow asymptotically, while maintaining balanced growth.
However, notice that Theorem 7 does not require that $Y^*(t) = F[K^*(t), A(t) L(t)]$, but that
it should have a representation of the form $Y^*(t) = \tilde{F}[K^*(t), A(t) L(t)]$. It is quite straightforward to see that in this Cobb-Douglas example we can define $A(t) = [A_K(t)]^{\alpha/(1-\alpha)} A_L(t)$,
and the production function can be represented as
\[
Y(t) = [K(t)]^{\alpha} [A(t) L(t)]^{1-\alpha},
\]
in other words, technological change can be represented as purely labor augmenting, which
is what Theorem 7 requires.
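The equivalence of the two Cobb-Douglas representations is easy to confirm numerically. The following sketch evaluates both forms at arbitrary input values; the function names and the numbers are made up for illustration.

```python
def output_two_factor(K, L, A_K, A_L, alpha):
    """Y = (A_K*K)**alpha * (A_L*L)**(1 - alpha)."""
    return (A_K * K) ** alpha * (A_L * L) ** (1.0 - alpha)


def output_labor_augmenting(K, L, A_K, A_L, alpha):
    """Same technology with A = A_K**(alpha/(1 - alpha)) * A_L."""
    A = A_K ** (alpha / (1.0 - alpha)) * A_L
    return K ** alpha * (A * L) ** (1.0 - alpha)


# Arbitrary illustrative values.
y_1 = output_two_factor(K=7.0, L=3.0, A_K=1.8, A_L=2.5, alpha=0.4)
y_2 = output_labor_augmenting(K=7.0, L=3.0, A_K=1.8, A_L=2.5, alpha=0.4)
```

The two values agree up to floating-point error because $\left( A_K^{\alpha/(1-\alpha)} \right)^{1-\alpha} = A_K^{\alpha}$.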
Notice finally that this theorem does not state that technological change has to be labor
augmenting all the time. But it requires that it has to be labor augmenting asymptotically,
i.e., along the balanced growth path.
Based on these ideas, it is possible to give a more heuristic proof of Theorem 7.
Alternative Proof of Theorem 7: Suppose that $Y(t) = A_H(t)\, F[A_K(t) K(t), A_L(t) L(t)]$,
and since we are interested in asymptotic states, suppose that $A_H(t)$, $A_K(t)$ and $A_L(t)$ are
growing asymptotically at the rates $g_H$, $g_K$ and $g_L$. Normalize $A_H(0)$, $A_K(0)$ and $A_L(0)$
to 1. Then we can write that asymptotically
\[
\frac{Y(t)}{K(t)} = \exp\left( (g_H + g_K)\, t \right) F\left[ 1, \frac{A_L(t) L(t)}{A_K(t) K(t)} \right]
\equiv \exp\left( (g_H + g_K)\, t \right) f\left( \exp\left( (g_L - g_K)\, t \right) \frac{L(t)}{K(t)} \right).
\]
Now we also have
\[
\frac{\dot{K}(t)}{K(t)} = s \frac{Y(t)}{K(t)} - \delta,
\]
and in steady state, according to the hypotheses of the theorem, we have $Y(t)/K(t)$ constant, so $\dot{K}(t)/K(t) = g$, i.e., capital grows at the same rate as total output. Combined
with the hypothesis that $L(t) = \exp(nt) L(0)$, this then implies (for $L(0)$ normalized to 1),
\[
\frac{Y(t)}{K(t)} = \exp\left( (g_H + g_K)\, t \right) f\left( \exp\left( (g_L - g_K + n - g)\, t \right) \right).
\]
But from this equation $Y(t)/K(t)$ can remain constant only under one of the two
following circumstances:
1. $\exp((g_H + g_K)\, t)$ is constant and $\exp((g_L - g_K + n - g)\, t)$ is constant, i.e., $g_H = g_K =
0$, and $g = g_L + n$.
2. $\exp((g_H + g_K)\, t)$ increases exactly at the same rate as $f(\exp((g_L - g_K + n - g)\, t))$
decreases, which is only possible when $f(x) = x^{\alpha}$ for some $\alpha$. Then, if we impose
Assumption 1 (or just CRS and positive marginal products) then we get $\alpha \in (0, 1)$.
This completes the alternative proof of Theorem 7.
2.3.4 The Solow Growth Model with Technological Progress: Continuous Time
Now we are ready to analyze the Solow growth model with technological progress. I will
only present the analysis for continuous time (the discrete time case is equivalent). From
Theorem 7, we know that the production function must take the form
F [K (t) , A (t) L (t)] ,
with purely labor-augmenting technological progress asymptotically. For simplicity, let us
assume that it takes this form throughout. Moreover, suppose that there is technological
progress at the rate g, i.e.,
\[
\frac{\dot{A}(t)}{A(t)} = g, \tag{2.31}
\]
and population growth at the rate n,
\[
\frac{\dot{L}(t)}{L(t)} = n.
\]
Again using the constant savings rate we have
\[
\dot{K}(t) = sF[K(t), A(t) L(t)] - \delta K(t). \tag{2.32}
\]
The simplest way of analyzing this economy is again to express everything in terms of
a normalized variable. Since effective units of labor are given by $A(t) L(t)$, and $F$ exhibits
constant returns to scale in its two arguments (by virtue of exhibiting constant returns to
scale in capital and labor), we can define
\[
k(t) \equiv \frac{K(t)}{A(t) L(t)}. \tag{2.33}
\]
Now differentiating this expression with respect to time, we obtain
\[
\frac{\dot{k}(t)}{k(t)} = \frac{\dot{K}(t)}{K(t)} - g - n.
\]
The quantity of output per unit of effective labor can be written as
\[
\hat{y}(t) \equiv \frac{Y(t)}{A(t) L(t)} = F\left[ \frac{K(t)}{A(t) L(t)}, 1 \right] \equiv f(k(t)).
\]
Income per capita is $y(t) \equiv Y(t)/L(t)$, i.e.,
\[
y(t) = A(t)\, \hat{y}(t). \tag{2.34}
\]

Now substituting for $\dot{K}(t)$ from (2.32) into the expression for $\dot{k}(t)/k(t)$ above, we have
\[
\frac{\dot{k}(t)}{k(t)} = \frac{sF[K(t), A(t) L(t)]}{K(t)} - \delta - g - n.
\]
Now using (2.33),
\[
\frac{\dot{k}(t)}{k(t)} = \frac{s f(k(t))}{k(t)} - \delta - g - n, \tag{2.35}
\]
which is very similar to the law of motion of the capital-labor ratio in the continuous time
model, (2.25).
An equilibrium in this model is defined similarly to before. Consequently, we have:
Proposition 10 Consider the basic Solow growth model in continuous time, with Harrod-neutral technological progress at the rate g and population growth at the rate n. Suppose that
Assumptions 1 and 2 hold, and define the effective capital-labor ratio as in (2.33). Then
there exists a unique steady state equilibrium where the effective capital-labor ratio is equal
to $k^* \in (0, \infty)$ and is given by
\[
\frac{f(k^*)}{k^*} = \frac{\delta + g + n}{s}.
\]
Per capita output and consumption grow at the rate g.

The comparative static results are also similar to before, with the additional comparative
static with respect to the initial level of the labor-augmenting technology, A (0) (since the
level of technology later, A (t), is completely determined by A (0) given the assumption in
(2.31)).
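Proposition 11 Suppose Assumptions 1 and 2 hold and let $A(0)$ be the initial level of technology. Denote the balanced growth path level of effective capital-labor ratio by $k^*(A(0), s, \delta, n)$
and the level of income per capita by $y^*(A(0), s, \delta, n, t)$ (the latter is a function of time since
it is growing over time). Then we have
\[
\frac{\partial k^*(A(0), s, \delta, n)}{\partial A(0)} = 0, \quad
\frac{\partial k^*(A(0), s, \delta, n)}{\partial s} > 0, \quad
\frac{\partial k^*(A(0), s, \delta, n)}{\partial n} < 0 \quad \text{and} \quad
\frac{\partial k^*(A(0), s, \delta, n)}{\partial \delta} < 0,
\]
and also
\[
\frac{\partial y^*(A(0), s, \delta, n, t)}{\partial A(0)} > 0, \quad
\frac{\partial y^*(A(0), s, \delta, n, t)}{\partial s} > 0, \quad
\frac{\partial y^*(A(0), s, \delta, n, t)}{\partial n} < 0 \quad \text{and} \quad
\frac{\partial y^*(A(0), s, \delta, n, t)}{\partial \delta} < 0.
\]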
Finally, we also have very similar transitional dynamics.

Proposition 12 Suppose that Assumptions 1 and 2 hold, then the Solow growth model with
Harrod-neutral technological progress and population growth in continuous time is asymptotically stable, and starting from any $k(0) > 0$, the effective capital-labor ratio converges to a
steady-state value $k^*$, i.e., $k(t) \to k^*$.
Therefore, the comparative statics and dynamics are very similar to the model without
technological progress (and without population growth). The major difference, of course, is
that now the model generates growth in income per capita, so it can be mapped to the data
much better. However the disadvantage is that this growth is driven entirely exogenously.
The growth rate is exactly the same as the exogenous growth rate of the technology stock.
The model does not specify where this technology stock comes from and how fast it grows.
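A short simulation makes the point concrete. The sketch below assumes a labor-augmenting Cobb-Douglas technology, so that $f(k) = k^{\alpha}$ in effective units, and integrates (2.35) by forward Euler; the function names, the parameter values and the choice of integrator are all illustrative assumptions. It tracks income per capita $y(t) = A(t) f(k(t))$ and measures its growth rate late in the simulation.

```python
import math


def effective_k_star(s, alpha, n, g, delta):
    """Steady state of (2.35) when f(k) = k**alpha."""
    return (s / (delta + g + n)) ** (1.0 / (1.0 - alpha))


def income_per_capita_path(k0, A0, s, alpha, n, g, delta, horizon, dt=0.01):
    """Euler path of dk/dt = s*k**alpha - (delta + g + n)*k, with y = A(t)*k**alpha."""
    k, path = k0, []
    steps = int(round(horizon / dt))
    for i in range(steps + 1):
        t = i * dt
        path.append((t, A0 * math.exp(g * t) * k ** alpha))
        k += dt * (s * k ** alpha - (delta + g + n) * k)
    return path, k


# Hypothetical parameter values, for illustration only.
path, k_end = income_per_capita_path(
    k0=1.0, A0=1.0, s=0.2, alpha=1.0 / 3.0, n=0.01, g=0.02, delta=0.05, horizon=400.0
)

# Average growth rate of income per capita over the last unit of time.
(t_a, y_a), (t_b, y_b) = path[-101], path[-1]
late_growth = math.log(y_b / y_a) / (t_b - t_a)
```

Once the effective capital-labor ratio has settled at its steady state, `late_growth` is essentially the exogenous rate `g`, regardless of the saving rate chosen.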
Chapter 3
The Solow Model and the Data
One of the important uses of the aggregate production function approach and the basic
Solow model is that they provide us with a simple vehicle to look at the data, both at
growth over time and income-level differences (and growth rate differences) across countries.
I start here with over-time changes, i.e., growth accounting, and then will move to the more
important application for the purposes of this course, which involves looking at cross-country
differences.
3.1 Growth Accounting
Let us go back to the most general form of the aggregate production function given by (2.1),
whereby
Y (t) = F [K (t) , L (t) , A (t)] .
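Differentiate this function with respect to time on both sides to obtain (dropping time dependence)
\[
\frac{\dot{Y}}{Y} = \frac{F_A A}{Y} \frac{\dot{A}}{A} + \frac{F_K K}{Y} \frac{\dot{K}}{K} + \frac{F_L L}{Y} \frac{\dot{L}}{L}.
\]
Recalling the definition of factor shares above, and denoting $g \equiv \dot{Y}/Y$, $g_K \equiv \dot{K}/K$ and
$g_L \equiv \dot{L}/L$, and also defining
\[
x \equiv \frac{F_A A}{Y} \frac{\dot{A}}{A}
\]
as the contribution of technology to growth, we have
\[
x = g - \alpha_K g_K - \alpha_L g_L.
\]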
This is the fundamental growth accounting equation. This equation lets us estimate the contribution of technological progress to economic growth from factor shares, output growth,
labor force growth and capital stock growth. This contribution from technological progress
is also referred to as Total Factor Productivity (TFP) or sometimes as Multi Factor Productivity.
In particular, denoting an estimate by ^, we have the estimate of TFP growth as:
\[
\hat{x} = g - \alpha_K g_K - \alpha_L g_L.
\]
If we are interested in $\dot{A}/A$ rather than $x$, we need to make further assumptions. For
example, if we assume that the production function takes the standard labor-augmenting
form
\[
Y(t) = F[K(t), A(t) L(t)],
\]
then we have
\[
\frac{\dot{A}}{A} = \frac{1}{\alpha_L} \left[ g - \alpha_K g_K - \alpha_L g_L \right],
\]
but this equation is not particularly useful, since $\dot{A}/A$ is not something we are inherently
interested in. Much more interesting is precisely $x$.
In continuous time, this equation is exact. In practice, of course, instead of instantaneous
changes, we look at changes over discrete time periods, for example over a year (or sometimes
with the better data, perhaps over a quarter or a month). In this case, there is a problem,
since over the time horizon in question, factor shares can change. It can be shown that this
could lead to serious biases. The most common way of dealing with this is to use factor
shares calculated as the average of the two points in time. Therefore in discrete time, for a
change between times t and t + 1, we have
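\[
x_{t,t+1} = g_{t,t+1} - \bar{\alpha}_{K,t,t+1}\, g_{K,t,t+1} - \bar{\alpha}_{L,t,t+1}\, g_{L,t,t+1},
\]
where
\[
\bar{\alpha}_{K,t,t+1} \equiv \frac{\alpha_{K,t} + \alpha_{K,t+1}}{2}
\]
and $\bar{\alpha}_{L,t,t+1}$ is defined similarly.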
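The discrete-time calculation can be sketched in a few lines. The code below is an illustration under stated assumptions: growth rates are measured as log differences, the labor share is taken to be one minus the capital share, and the function name `tfp_growth` and all numbers are made up for the example rather than taken from any data set.

```python
import math


def tfp_growth(Y0, Y1, K0, K1, L0, L1, alpha_K0, alpha_K1):
    """Residual x_{t,t+1} using factor shares averaged over the two dates."""
    g = math.log(Y1 / Y0)      # output growth (log difference, an assumption)
    g_K = math.log(K1 / K0)    # capital stock growth
    g_L = math.log(L1 / L0)    # labor force growth
    alpha_K = 0.5 * (alpha_K0 + alpha_K1)
    alpha_L = 1.0 - alpha_K    # shares sum to one under constant returns
    return g - alpha_K * g_K - alpha_L * g_L


# Synthetic Cobb-Douglas economy with a known technology term.
alpha = 1.0 / 3.0
A0, K0, L0 = 1.0, 100.0, 50.0
A1, K1, L1 = A0 * math.exp(0.01), K0 * math.exp(0.03), L0 * math.exp(0.01)
Y0 = A0 * K0 ** alpha * L0 ** (1.0 - alpha)
Y1 = A1 * K1 ** alpha * L1 ** (1.0 - alpha)

x_hat = tfp_growth(Y0, Y1, K0, K1, L0, L1, alpha, alpha)
```

In this synthetic example the residual recovers the 1 percent growth in the Hicks-neutral term because the factor shares are constant by construction; with real data the averaged shares only approximate the continuous-time formula.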
Applying this method, Solow found that much of economic growth over the 20th century
was due to technological progress. This has been a landmark finding, focusing the attention
of economists on sources of technology differences over time, across nations, across industries
and across firms.
Since then, many economists, most notably Dale Jorgenson, have attempted to reduce
the amount due to the residual technology by adjusting for the quality of labor and capital
inputs. This is still an active research area, partly because there are conceptual issues about
how far one should go in adjusting the quality of inputs. For example, better computers
can translate into more capital, reducing the TFP residual, but at the end of the day better
computers are a result of better technology. We will return to these issues again below.
3.2 Solow Model and Cross-Country Income Differences
We are now in a position to take the basic Solow model to the data. The simplest way
of doing this is to follow the approach of Mankiw, Romer and Weil (1992). These authors
basically estimated a cross country regression inspired by the above model. However, a
basic estimation which does not take human capital into account proved to be inadequate.
Therefore, Mankiw, Romer and Weil (1992) used an augmented Solow model also incorporating
human capital. I first develop this model briefly, and then look at the empirical evidence.
Since our purpose here is to look at cross-country income differences, from the beginning, I
present the model for a cross-section of countries.
Here already there is a major (and at some level very problematic) assumption, adopted
by many authors, among them Mankiw, Romer and Weil (1992), Barro (1991) and much of
Barro and Sala-i-Martin (2004), which is that the world consists of a cross-section of countries
which do not interact. In other words, these countries do not trade financial assets or goods,
and there is no slow diffusion of technology across these countries. These countries inhabit the
world, but they are all islands unto themselves. I start with this case of no interdependence,
but interdependences arising from technology flows and international trade will be discussed
below.
3.2.1 Solow Model with Human Capital
Suppose that output in country j is given by
\[
Y_j = K_j^{\alpha} H_j^{\beta} (A_j L_j)^{1-\alpha-\beta}, \tag{3.1}
\]
where I have dropped time to simplify notation. We have $\alpha, \beta \geq 0$, $\alpha + \beta \leq 1$, j denotes
country, Y is total output, H is human capital, L is labor, A is labor-augmenting technical
change.
The important assumption here is that human capital is taken to be a different factor
of production rather than simply augmenting labor (i.e., equation (3.1) rather than $Y_j =
K_j^{1-\alpha} (A_j H_j)^{\alpha}$ with $H_j$ interpreted as efficiency units of labor, see (3.6) below). In fact, this
latter approach is much more in line with the Becker model of human capital, and writing
the model in this way is not without loss of any generality (as we will see below). But before
seeing why this is, we should solve the model.
First, we can use the usual trick of the neoclassical growth model of transforming variables to per capita effective units:
\[
k_j \equiv \frac{K_j}{A_j L_j} \quad \text{and} \quad h_j \equiv \frac{H_j}{A_j L_j},
\]
and define $y_j \equiv Y_j / L_j$ as output per worker. Then
\[
y_j = A_j k_j^{\alpha} h_j^{\beta}. \tag{3.2}
\]
Suppose also that population grows at a constant rate $n_j$ in country j.

This model cannot be easily taken to the data because we have no idea what Aj is.
A key assumption of Mankiw, Romer and Weil (1992), which enables them to take the
augmented-Solow model to the data is the following:
Common technology advances assumption: $A_j(t) = \bar{A}_j \exp(gt)$.
That is, countries may differ according to their technology level, but they share the same
common technology growth rate, g. This is in part motivated by the relative stability of
the world income distribution discussed earlier. In the absence of this assumption, countries
would grow at different rates, and the world income distribution would become more and
more dispersed.
Next, consider constant savings rates for human and physical capital, as a direct generalization of the standard Solow model:
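\[
\dot{K}_j = s_{kj} Y_j - \delta_k K_j \quad \text{and} \quad \dot{H}_j = s_{hj} Y_j - \delta_h H_j
\]
where $\delta$s denote constant depreciation rates. Then
\[
\dot{k}_j = s_{kj} \frac{y_j}{A_j} - (n_j + g + \delta_k)\, k_j \tag{3.3}
\]
\[
\dot{h}_j = s_{hj} \frac{y_j}{A_j} - (n_j + g + \delta_h)\, h_j. \tag{3.4}
\]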
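As in our baseline models, in steady state, both $k_j$ and $h_j$ have to be constant. Thus
setting $\dot{k}_j = 0$ and $\dot{h}_j = 0$ in (3.3) and (3.4) and solving yields the following steady-state
values of physical capital and human capital ratios to effective labor:
\[
k_j^* = \left[ \left( \frac{s_{kj}}{n_j + g + \delta_k} \right)^{1-\beta} \left( \frac{s_{hj}}{n_j + g + \delta_h} \right)^{\beta} \right]^{\frac{1}{1-\alpha-\beta}}
\]
\[
h_j^* = \left[ \left( \frac{s_{kj}}{n_j + g + \delta_k} \right)^{\alpha} \left( \frac{s_{hj}}{n_j + g + \delta_h} \right)^{1-\alpha} \right]^{\frac{1}{1-\alpha-\beta}}.
\]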
Now substituting back into (3.2) and taking logs, we obtain
\[
\ln y_j^* = \ln \bar{A}_j + gt + \frac{\alpha}{1-\alpha-\beta} \ln\left( \frac{s_{kj}}{n_j + g + \delta_k} \right) + \frac{\beta}{1-\alpha-\beta} \ln\left( \frac{s_{hj}}{n_j + g + \delta_h} \right). \tag{3.5}
\]
This is an equation which can be estimated using cross-country data if we have measures of
$s_{hj}$. In addition, we can use investment rates (investments/GDP) for $s_{kj}$, population growth
rates $n_j$, and the standard depreciation rates for $\delta_k$. This is what Mankiw, Romer and Weil
do (or they estimate a version of this with $\delta_h = \delta_k$). They approximate $s_{hj}$ using the fraction
of the working age population enrolled in school [... is this a good proxy for investment in
human capital?...].
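The algebra leading to (3.5) can be verified numerically. The sketch below computes the steady-state ratios, checks them against the steady-state conditions, and compares the log of (3.2) with the right-hand side of (3.5). Function names and parameter values are hypothetical and chosen only for illustration; they are not estimates from Mankiw, Romer and Weil.

```python
import math


def steady_state_ratios(s_k, s_h, n, g, delta_k, delta_h, alpha, beta):
    """Steady-state k* and h* per unit of effective labor."""
    a = s_k / (n + g + delta_k)
    b = s_h / (n + g + delta_h)
    power = 1.0 / (1.0 - alpha - beta)
    k = (a ** (1.0 - beta) * b ** beta) ** power
    h = (a ** alpha * b ** (1.0 - alpha)) ** power
    return k, h


def log_income_3_5(A_bar, t, s_k, s_h, n, g, delta_k, delta_h, alpha, beta):
    """Right-hand side of equation (3.5)."""
    denom = 1.0 - alpha - beta
    return (math.log(A_bar) + g * t
            + alpha / denom * math.log(s_k / (n + g + delta_k))
            + beta / denom * math.log(s_h / (n + g + delta_h)))


# Hypothetical country, for illustration only.
p = dict(s_k=0.2, s_h=0.1, n=0.01, g=0.02, delta_k=0.03, delta_h=0.03,
         alpha=1.0 / 3.0, beta=1.0 / 3.0)
A_bar, t = 2.0, 10.0

k_ss, h_ss = steady_state_ratios(**p)
direct = math.log(A_bar * math.exp(p["g"] * t) * k_ss ** p["alpha"] * h_ss ** p["beta"])
via_3_5 = log_income_3_5(A_bar, t, **p)
```

The two numbers `direct` and `via_3_5` coincide up to rounding, which confirms that the exponents in the steady-state expressions and the coefficients in (3.5) are mutually consistent.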
However, with all of these assumptions, equation (3.5) can still not be estimated, because
the term ln Aj is unobserved to the econometrician, and could be correlated with all of
the other right hand side variables. Therefore implicitly, Mankiw, Romer and Weil make
another crucial assumption, considerably stronger than the common technology advances
assumption:
Orthogonal technology assumption: $\bar{A}_j = \varepsilon_j A$, where $\varepsilon_j$ is orthogonal to all other country
variables.
With these assumptions, Mankiw, Romer and Weil estimate equation (3.5). The estimation is a success for the augmented-Solow model. If human capital is not included, the
fit is not very good and the estimates are not reasonable. This is shown in the next table.
Without human capital, the coefficient in front of the investment/GDP ratio should be
$\alpha/(1-\alpha)$, thus the estimate suggests $\alpha \approx 0.6$, which is far too high bearing in mind that
given the factor distribution of income we expect the exponent of capital in the production
function to be closer to 1/3.
But for the augmented model with human capital, the fit is very good as shown in the
next table. Now the parameter estimates imply $\alpha \approx 1/3$, $\beta \approx 1/3$ and $R^2 \approx 0.78$.
At face value, these results provide strong support for the augmented Solow model. The
estimate of $\alpha$ is consistent with a capital share of one-third in national income, and the $R^2$
implies that almost 80 percent of the differences in income per capita can be explained by
investment decisions (human and physical capital differences).
3.2.2 Problems with the Mankiw, Romer and Weil Approach
But there are two major (and related) problems with this approach:
1. The orthogonal technology assumption is too strong. When $\bar{A}_j$ varies across countries,
it will plausibly be correlated with our measures of $s_{hj}$ and $s_{kj}$, so there will be an
omitted variable bias leading to overestimates of $\alpha$ and $\beta$ as well as an exaggeration of
the $R^2$.
2. The coefficient on $s_{hj}$ is too large. To see this, recall that Mankiw, Romer and Weil
use the fraction of the working age population enrolled in school. This variable ranges
from 0.4 to over 12 in the sample of countries used for this regression. Their estimates
therefore imply that a country with approximately 12 for this variable should have
income per capita about 9 times that of a country with $s_{hj} = 0.4$! (This is holding all
other variables constant.)
More explicitly, the predicted log difference in incomes between these two countries is
$$\frac{\beta}{1-\alpha-\beta}\left(\ln 12 - \ln 0.4\right) \approx 2.24,$$
and $\exp(2.24) \approx 9.4$, that is, an income ratio of about 9. In practice, the difference in average years of schooling
between any two countries over this time period is less than 12. The labor literature
suggests that an additional year of schooling is associated with a 6 to 10 percent increase
in individual earnings. For example, consider the individual-level Mincer regression
$$\ln w_i = X_i'\gamma + \phi E_i,$$
where $w_i$ is wage income, $X_i$ is a set of demographic controls, and $E_i$ is years of schooling.
Here $\phi$ is estimated to be between 0.06 and 0.1. This implies that a worker with one
more year of schooling is typically about 6 to 10 percent more productive. So in
the absence of human capital externalities, a country with 12 more years of average
schooling should be roughly two to three times as rich ($\exp(0.06 \times 12) \approx 2.1$ and
$\exp(0.10 \times 12) \approx 3.3$), instead of 9 times as rich! Even allowing for
human capital externalities, one would need very large human capital externalities
in order to get this type of result (existing estimates of human capital externalities,
for example, Acemoglu and Angrist, 2000, show that they are rather small).
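The arithmetic in the second point can be checked with a few lines of code. The sketch below uses only numbers quoted in the text (the range 0.4 to 12 of the schooling proxy, the log gap of 2.24, and Mincer returns of 6 to 10 percent); the variable names are hypothetical, and the coefficient is backed out from the quoted log gap rather than taken from the regression table.

```python
import math

# Numbers quoted in the text.
enroll_high, enroll_low = 12.0, 0.4   # range of the schooling proxy in the sample
predicted_log_gap = 2.24              # predicted log income gap between the two countries

# ln 12 - ln 0.4 = ln 30
log_proxy_gap = math.log(enroll_high) - math.log(enroll_low)

# Coefficient on the schooling proxy implied by the quoted log gap.
implied_coefficient = predicted_log_gap / log_proxy_gap

# Income ratio implied by the cross-country regression.
macro_income_ratio = math.exp(predicted_log_gap)

# Micro (Mincer) benchmark: 6 to 10 percent per year over 12 extra years of schooling,
# with no human capital externalities.
extra_years = 12
micro_ratio_low = math.exp(0.06 * extra_years)
micro_ratio_high = math.exp(0.10 * extra_years)

print(f"implied coefficient on schooling proxy: {implied_coefficient:.3f}")
print(f"income ratio implied by the regression: {macro_income_ratio:.2f}")
print(f"income ratio implied by micro returns:  {micro_ratio_low:.2f} to {micro_ratio_high:.2f}")
```

Even at the upper end of the micro estimates, the implied income ratio stays well below the ratio implied by the cross-country regression.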
To understand this last point, consider a simple competitive economy. Suppose that each
firm has a production function
$$y = k^{1-\alpha} (Ah)^{\alpha}.$$
Firms face cost of capital $r$, and human capital is a function of schooling, with the standard
exponential form $h_i = \exp(\phi E_i)$. The first-order condition from firm maximization gives $r =
(1-\alpha)(Ah/k)^{\alpha}$. In other words, all workers, irrespective of their level of schooling, will work
at exactly the same physical to human capital ratio. Wages are equal to marginal product,
so
$$w(h) = \alpha (1-\alpha)^{(1-\alpha)/\alpha} A\, r^{-(1-\alpha)/\alpha}\, h.$$
So wages are linear in human capital due to constant returns to scale. Taking logs of this
equation, we end up with the standard log-linear wage equation
$$\ln w_i = \text{cst} + \phi E_i,$$
with the slope coefficient on education measuring the relationship between education and
human capital.
Now consider two economies with the same technology and the same interest rate (for example, because of open capital accounts), but in one economy all workers have $E_1$
years of schooling, while in the other they have $E_2 > E_1$ years of schooling. How large should the
income gap between these two countries be?
Using the fact that with the same interest rate, both economies will function at the same
physical to human capital ratio, we immediately obtain
$$Y_i = A (1-\alpha)^{(1-\alpha)/\alpha}\, r^{-(1-\alpha)/\alpha} \exp(\phi E_i).$$
Or, taking logs, we obtain that $\log Y_2 - \log Y_1 = \phi (E_2 - E_1)$. So if one economy has on
average one more year of schooling, and $\phi$ is about 6 percent, its income should be about 6 percent
higher.
In the data, there are much larger differences. For example, a cross-country regression of
income per capita on average years of schooling $E$ in 1985 gives (standard error in parentheses)
$$\log Y = \text{cst} + \underset{(0.027)}{0.313}\, E.$$
In other words, the correlation between income and schooling is too strong relative to
what we should expect on the basis of micro evidence. In particular, the effect of schooling
on income is much larger than the 6-10 percent difference expected.
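The comparison between the micro prediction and the cross-country coefficient can be sketched numerically. The code below evaluates the income expression of the competitive economy, writing `phi` for the Mincer return to schooling and `alpha` for the exponent on human capital; the values of `A`, `alpha` and `r` are hypothetical and cancel out of the income gap, which is the point of the exercise.

```python
import math

def output_per_worker(A, alpha, r, phi, E):
    """Y = A * ((1 - alpha) / r) ** ((1 - alpha) / alpha) * exp(phi * E)."""
    capital_term = ((1 - alpha) / r) ** ((1 - alpha) / alpha)
    return A * capital_term * math.exp(phi * E)

# Hypothetical common technology, factor exponent and interest rate.
A, alpha, r = 1.0, 2 / 3, 0.08
E1, E2 = 5.0, 6.0   # one extra year of schooling in the second economy

# Log income gap predicted by the model for Mincer returns of 6 and 10 percent.
micro_gaps = []
for phi in (0.06, 0.10):
    gap = (math.log(output_per_worker(A, alpha, r, phi, E2))
           - math.log(output_per_worker(A, alpha, r, phi, E1)))
    micro_gaps.append(gap)

# Coefficient on schooling from the cross-country regression quoted in the text.
macro_gap = 0.313

print("model-implied log gaps per year of schooling:", micro_gaps)
print("cross-country log gap per year of schooling: ", macro_gap)
print("ratio of macro to largest micro gap:", macro_gap / max(micro_gaps))
```

Whatever common values of `A`, `alpha` and `r` are used, the model-implied gap equals `phi` per year of schooling, so the cross-country coefficient is roughly three to five times larger than the micro evidence suggests.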
This result is not simply explained by the fact that interest rates vary across countries.
Notice that we can write $r = (1-\alpha) Y/K$, so including the (log) capital-output ratio would
be one way to control for interest rate differences. In this regression, the log capital-output ratio
should have a coefficient of $(1-\alpha)/\alpha$, approximately 0.5 taking $\alpha$ as 2/3. Running this
regression with 1985 data, we obtain (standard errors in parentheses)
$$\log Y = \text{cst} + \underset{(0.033)}{0.266}\, E + \underset{(0.178)}{0.408}\, \log\frac{K}{Y}.$$
So, there is still a very large effect of education on income, and the quantitative effect of
capital (as a proxy for interest rates) is plausible.
This relationship between education and income may reflect human capital externalities.
For example, we might have the productivity term, $A$, as a function of average human capital
in the economy. In this case, the rate of return to human capital in the Mincer regressions
would only reflect the private return, that is, the increase in the individual's wage as a
function of his own human capital, holding average human capital constant. But regressions
using aggregate data would capture the total effect of an increase in human capital on
income, that is, the private plus the external effect of schooling.
Therefore, one possibility is that there are large human capital externalities. However,
as noted above, existing evidence indicates that human capital externalities are limited. The
alternative interpretation of the patterns is that there are differences in technologies, the $A_j$'s,
and these are correlated with human capital differences. Such a pattern of correlation may
arise because human capital responds to technology, or because some third factor affects
both human capital and technology.
3.2.3 The Macro Mincer Approach (Bils-Klenow-Rodriguez-Hall-Jones)
A related approach is to use calibration/levels accounting rather than regression analysis and
make use of the findings of Mincer (micro wage) regressions. This is the approach first taken
by Bils and Klenow, and then by Klenow and Rodriguez and Hall and Jones. The advantage
of the calibration approach is that the omitted variable bias underlying the estimates of
Mankiw, Romer and Weil will be less important (since microlevel evidence is being used to
anchor the contribution of human capital). The disadvantage is that certain assumptions on
functional forms have to be taken much more seriously, and we explicitly have to assume no
human capital externalities.
Here let me follow Hall and Jones. Consider the following production function
$$Y_j = K_j^{1-\alpha} (A_j H_j)^{\alpha} \tag{3.6}$$
with $H_j$ interpreted as efficiency units of labor. Assume the following Mincer-type relationship between human capital and education
$$H_j = \sum_{E} \exp\{\phi(E)\}\, L_j(E),$$
where $\phi(E)$ is the rate of return to $E$ years of schooling and $L_j(E)$ is the number of individuals in country $j$ with $E$ years of schooling. We can use different values for $\phi(E)$ and
construct alternative estimates of $H_j$. Hall and Jones (1999) use a piecewise linear specification for $\phi(E)$ based on work by Psacharopoulos from less developed countries (showing
returns to earlier years of schooling that are greater than to higher education). Once we have
a series for $H_j$ and one for $K_j$, which can be constructed using standard perpetual inventory
methods, we can construct predicted incomes, for example, as
$$\hat{Y}_j = K_j^{1/3} \left(A_{US} H_j\right)^{2/3}$$
and compare these predicted incomes with actual incomes.
Alternatively, we could back out country-specific technology terms (relative to the U.S.)
as
$$\frac{A_j}{A_{US}} = \left(\frac{Y_j}{Y_{US}}\right)^{3/2} \left(\frac{K_{US}}{K_j}\right)^{1/2} \left(\frac{H_{US}}{H_j}\right).$$
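The calibration steps above can be sketched in code: build a human capital stock from a schooling distribution, then back out relative technology. This is a minimal sketch, not the procedure Hall and Jones actually implement; the piecewise linear rates, the schooling distributions and the capital stocks below are hypothetical placeholders, and the exponents 1/3 and 2/3 follow the predicted-income formula in the text.

```python
import math

def phi(E, segments=((4, 0.13), (4, 0.10), (math.inf, 0.07))):
    """Piecewise linear phi(E); each segment is (length in years, hypothetical rate)."""
    total, remaining = 0.0, E
    for length, rate in segments:
        years = min(remaining, length)
        total += rate * years
        remaining -= years
        if remaining <= 0:
            break
    return total

def human_capital(schooling_counts):
    """H_j = sum over E of exp(phi(E)) * L_j(E); the dict maps E to number of people."""
    return sum(math.exp(phi(E)) * L for E, L in schooling_counts.items())

def relative_technology(Y, K, H, Y_us, K_us, H_us):
    """A_j / A_US implied by Y = K^(1/3) (A H)^(2/3)."""
    return (Y / Y_us) ** 1.5 * (K_us / K) ** 0.5 * (H_us / H)

# Hypothetical schooling distributions (years of schooling -> number of workers).
H_us = human_capital({8: 20.0, 12: 50.0, 16: 30.0})
H_j = human_capital({0: 40.0, 4: 40.0, 8: 20.0})

# Hypothetical capital stocks and technology levels used to generate "actual" output.
K_us, K_j = 1000.0, 100.0
A_us, A_j = 1.0, 0.5
Y_us = K_us ** (1 / 3) * (A_us * H_us) ** (2 / 3)
Y_j = K_j ** (1 / 3) * (A_j * H_j) ** (2 / 3)

# Backing out relative technology should recover A_j / A_us.
recovered = relative_technology(Y_j, K_j, H_j, Y_us, K_us, H_us)
print(recovered)
```

Because output here is generated from the same production function that is used to back out technology, the exercise recovers the assumed ratio exactly; with real data the residual also absorbs any misspecification of the functional form.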
Hall and Jones perform this exercise using output per worker rather than income per
capita. They find:
1. Differences in physical and human capital still matter a lot, accounting for as much as
50 percent of the actual differences in output per worker.
2. But there are also significant productivity differences.
The next figure and the table show a summary of their results:
The conclusion of this calibration exercise is therefore very similar to the one that followed
from the regression analysis presented in the previous section.
Naturally, some of the assumptions of this calibration exercise can be relaxed. For
example, instead of assuming a Cobb-Douglas production function, one could do levels
accounting. Essentially, one ranks the countries according to their capital-labor ratio (or
capital-output ratio), and then uses the equivalent of the growth accounting equation above.
In particular, we can write
$$x_{j,j+1} = g_{j,j+1} - \alpha_{K,j,j+1}\, g_{K,j,j+1} - \alpha_{L,j,j+1}\, g_{L,j,j+1},$$
where $j$ stands for country, thus $g_{j,j+1}$ is the proportional difference in output between
countries $j$ and $j+1$, $g_{K,j,j+1}$ is the proportional difference in capital stock between
countries $j$ and $j+1$, $g_{L,j,j+1}$ is the proportional difference in labor supply between the two
countries, $\alpha_{K,j,j+1}$ and $\alpha_{L,j,j+1}$ are the shares of capital and labor in income, and $x_{j,j+1}$ is the TFP difference. With this method, and taking one of the countries,
for example the United States, as the base country, we can calculate relative technology
differences across countries. Of course, for this we need to have good measures of factor
shares in different countries, which are not always available.
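A minimal sketch of this levels accounting calculation follows. The country data and factor shares are hypothetical, the proportional difference is measured relative to the lower-ranked country, and the same factor shares are used for every pair; all three are assumptions of this sketch rather than choices made in the text.

```python
def proportional_difference(low, high):
    """Proportional difference between two adjacent countries, (high - low) / low."""
    return (high - low) / low

def tfp_difference(g_Y, g_K, g_L, share_K, share_L):
    """x = g_Y - share_K * g_K - share_L * g_L."""
    return g_Y - share_K * g_K - share_L * g_L

# Hypothetical countries: (name, output Y, capital K, labor L).
countries = [("A", 100.0, 300.0, 10.0),
             ("B", 150.0, 500.0, 11.0),
             ("C", 240.0, 900.0, 12.0)]

# Rank countries by their capital-labor ratio.
countries.sort(key=lambda c: c[2] / c[3])

# Hypothetical factor shares, assumed common to all country pairs.
share_K, share_L = 1 / 3, 2 / 3

# TFP difference between each pair of adjacent countries in the ranking.
x = []
for (_, Y0, K0, L0), (_, Y1, K1, L1) in zip(countries, countries[1:]):
    x.append(tfp_difference(proportional_difference(Y0, Y1),
                            proportional_difference(K0, K1),
                            proportional_difference(L0, L1),
                            share_K, share_L))

for (low, *_), (high, *_), diff in zip(countries, countries[1:], x):
    print(f"TFP difference between {low} and {high}: {diff:.3f}")
```

Chaining these pairwise differences from a base country gives relative technology levels without imposing a single functional form on all countries.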
3.3 An Alternative Approach to Estimating Productivity Differences (Trefler)
In the above approach, productivity/technology differences are obtained as residuals from
a calibration exercise, so we have to trust the functional form assumptions used in this
exercise. An alternative is to use additional data. This is what Trefler does to test an
augmented version of the Heckscher-Ohlin approach to international trade. Although Trefler
does not emphasize the implications of his findings for productivity differences, a byproduct
of his analysis is a series of estimates for differences in factor productivities across countries.
Trefler starts from the standard Heckscher-Ohlin model of international trade, but allows
for factor-specific productivity differences across countries. Other than these factor-specific
differences, all countries share the same technology (i.e., there are no differences in industry technologies) and share the same homothetic preferences (in particular, they allocate
consumption expenditures across goods in the same manner).
It is important that technology differences take the form of factor productivity differences. In particular, one unit of labor (or one college graduate) in the U.S. could be more
productive than one unit of labor (or one college graduate) in Nigeria. The same applies to
capital. This specification of course is more general than the production function in (3.1),
since capital-augmenting technology differences are allowed and the elasticity of substitution
between different factors is not assumed to be equal to 1.
A standard equation in international trade is that, in the absence of any trading frictions
and with identical (or homothetic) preferences, the net export of factor $f$ embedded in the
exports of country $j$, $X_j^f$, is
$$X_j^f = \pi_j^f V_j^f - s_j \sum_{i=1}^{N} \pi_i^f V_i^f \tag{3.7}$$
where $V_j^f$ is the endowment of factor $f$ in country $j$, $\pi_j^f$ is the productivity of factor $f$
in country $j$, and $s_j$ is the share of country $j$ in world consumption (this uses the assumption
that all countries have the same homothetic preferences). $N$ is the total number of countries.
Given estimates of the net export of factor contents, the $X_j^f$'s, equation (3.7) solves for
a unique sequence of $\pi_j^f$'s taking one of the countries as the base. So from this equation we
can obtain an estimate of the differences in factor productivities. At this level, this may be
viewed simply as an untested strong hypothesis.
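To see why (3.7) pins down factor productivities once a base country is chosen, here is a small sketch for a single factor. Factor productivity is called `pi` in the code. Let `T` be the world effective endowment (the sum in (3.7)); normalizing the base country's productivity to 1 gives `T = (V[base] - X[base]) / s[base]`, and every other productivity then follows from its own equation. All numbers are made up for illustration and are not Trefler's estimates.

```python
def solve_factor_productivities(X, V, s, base=0):
    """Solve X_j = pi_j * V_j - s_j * sum_i(pi_i * V_i) for pi, with pi[base] = 1."""
    # World effective endowment, from the base country's equation with pi[base] = 1.
    T = (V[base] - X[base]) / s[base]
    # Each remaining equation gives pi_j * V_j = X_j + s_j * T.
    return [(X[j] + s[j] * T) / V[j] for j in range(len(V))]

# Hypothetical "true" productivities, endowments and consumption shares (shares sum to 1).
true_pi = [1.0, 0.5, 0.04]
V = [100.0, 80.0, 200.0]
s = [0.6, 0.3, 0.1]

# Net factor exports generated from equation (3.7).
T_true = sum(p * v for p, v in zip(true_pi, V))
X = [p * v - share * T_true for p, v, share in zip(true_pi, V, s)]

# Recover the productivities from X, V and s alone.
recovered_pi = solve_factor_productivities(X, V, s, base=0)
print(recovered_pi)
```

Net factor exports sum to zero across countries when the consumption shares sum to one, which is why one normalization is needed and why the remaining productivities are then unique.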
The major contribution of Trefler's paper is to note that if there is factor price equalization, we should also have
$$\frac{w_j^f}{\pi_j^f} = \frac{w_{j'}^f}{\pi_{j'}^f} \tag{3.8}$$
for any pair of countries, $j$ and $j'$, where $w_j^f$ is the price of factor $f$ in country $j$. With data
on factor prices, we can therefore construct alternative series for the $\pi_j^f$'s. It turns out that the
series implied by (3.7) and (3.8) are very similar, so there appears to be some validity to this
approach. The following figure shows his estimates:
Given this validation, we can presume that there is some information in the numbers
that Trefler obtains. These numbers imply that there are very large differences in labor
productivity, and some substantial, but much smaller differences in capital productivity. For
example, labor in Pakistan is 1/25th as productive as labor in the United States. In contrast,
capital productivity differences are much more limited than labor productivity differences.
For example, capital in Pakistan is only half as productive as capital in the United States.
Chapter 4
Fundamental Determinants of Differences in Income
4.1 From Proximate to Fundamental Causes
The use of the Solow model and the production function approach illustrated how cross-country
income differences can be understood as resulting from physical capital differences,
human capital differences and technology differences. These technology differences, themselves, may represent actual differences in the technologies used by countries, or other efficiency differences in the use of the factors. At this level, the framework we have does a very
good job of helping us understand the proximate causes of income differences. The same
procedure also helps us understand the proximate causes of the process of economic growth.
However, the observation that a country is poorer than another because it has worse
technology, less physical capital and less human capital immediately poses the next question:
why does it have worse technology, less physical capital, less human capital? This question
is, in some sense, about the fundamental causes of income per capita (and growth) differences
across countries.
Growth theory is useful in highlighting the proximate causes, in providing us with a
framework for thinking about the fundamental causes, and also in clarifying the mechanics of the process of growth, so that we can more carefully evaluate different theories and
approaches. But we have to take this additional step of looking for fundamental causes,
otherwise what we have learned will be only partial.
4.2 Hypotheses
Why do some countries invest more in physical and human capital and possess better technologies? There are four sets of broad hypotheses:
1. Luck: some countries just turned out to be lucky. It is difficult to operationalize this
approach, and at some level, it is quite similar to the other hypotheses, but less specific
(one way of operationalizing it may be by using the multiple equilibrium models we
will discuss below).
A version of this hypothesis where such differences are transitory is clearly not supported by the evidence presented so far, which points to very persistent differences
over long periods.
A version of this hypothesis where a small difference caused by luck may lead to large
persistent differences is also difficult to reconcile with the data given the reversal documented above. So I will place less emphasis on the importance of luck. Nevertheless,
some of the theories presented below will show how small differences in initial conditions can lead to large ultimate differences.
2. Geography: This view has become very popular recently. It claims that differences
in economic performance reflect, to a large extent, differences in geographic, climatic
and ecological characteristics across countries. The most common version is the view that
climate has a direct effect on income through its influence on work effort. This idea
dates back to Machiavelli and Montesquieu. Alfred Marshall (1890) similarly wrote:
"vigor depends partly on race qualities: but these, so far as they can be explained
at all, seem to be chiefly due to climate." Gunnar Myrdal (1968) similarly held that climate exerts
everywhere a powerful influence on all forms of life, and that "serious study of the
problems of underdevelopment... should take into account the climate and its impacts
on soil, vegetation, animals, humans and physical assets – in short, on living conditions
in economic development."
The recent bestseller by Jared Diamond, Guns, Germs and Steel, suggests that the
timing of the Neolithic revolution has had a long-lasting effect by determining which
societies were the first ones to develop strong armies and technology. For example,
he states that: "...proximate factors behind Europe's conquest of the Americas were
the differences in all aspects of technology. These differences stemmed ultimately from
Eurasia's much longer history of densely populated... societies dependent on food production" (1997, p. 358). Diamond argues that differences in the nature and history of
food production, in turn, are due to the types of crops, domesticated animals, and the
axis of agricultural technology diffusion in different continents, all of which are geographically determined characteristics. In economics circles, Jeff Sachs has been
pushing for this view. He argues that "Certain parts of the world are geographically
favored. Geographical advantages might include access to key natural resources, access
to the coastline and sea-navigable rivers, proximity to other successful economies,
advantageous conditions for agriculture, advantageous conditions for human health"
(2000, p. 30). He further suggests that "Tropical agriculture faces several problems
that lead to reduced productivity of perennial crops in general and of staple food crops
in particular" (2000, p. 32), and that "The burden of infectious disease is similarly
higher in the tropics than in the temperate zones" (2000, p. 32). Finally, Sachs argues
that the greater population in the temperate areas over the past centuries led to more
rapid advances in technologies appropriate for these areas relative to technologies necessary for development in the tropics (2001, p. 3 and 2000, pp. 33-34). The following
figure shows the geographical distribution of income per capita, which is consistent
with some geographic factors (such as climate) having an effect on the long-run distribution of income across countries:
3. Institutions: according to this view, differences in economic performance largely
reflect differences in the organization of society. Societies that provide incentives and
opportunities for investment will be richer than those that fail to do so. There are many
versions of this hypothesis, some of them suggesting that institutions that support
property rights and rule of law are important, others suggesting that limited government,
or equal opportunity, or specific government policies are important for investment and
efficiency (of course, whether these policies are adopted is in turn determined by other
factors).
4. Culture and social capital: this view instead emphasizes whether societies are able
to engender the values conducive to entrepreneurship or cooperation among agents.
Popular versions of this story include the thesis by Max Weber on the importance
of religion for capitalism, and the recent work by Robert Putnam on social capital
and cooperation (which is in turn related to some early work by Banfield on lack of
cooperation in the South of Italy).
There are three major differences between the institutions view and the culture view.
First, in the institutions view, it is the social organization of the society, which, at least
in theory, is changeable, that is responsible for prosperity. Instead, in the culture view,
culture or social capital, to a first approximation, cannot be changed. Second, the
institutions view emphasizes much more the importance of conflict between different
groups or individuals as a determinant of social outcomes, whereas there is a more cooperative undertone to the culture view (especially in the social capital versions of this
view). Finally, many versions of the culture view, such as those of Max Weber or David
Landes, emphasize religion or other predetermined factors as crucial determinants of
individuals' approach to life and economic success.
Can we say anything about the relative importance of geography, institutions and culture? Measures of each are strongly correlated with income per capita or other determinants
of income. This is borne out both by growth regressions and by level regressions.
For example, returning to growth regressions of the type (1.2), the variables in X that
enter significantly can be interpreted as determinants of cross-country differences in growth.
There is a very large literature on regressions of this sort. These regression analyses find a
variety of variables to be important in explaining growth. First, investment rates in physical
and human capital are found to be important. But this does not inform us much about
the ultimate sources of differences in economic performance, since differences in physical and
human capital investments must be in turn caused by other factors.
Among these other factors, openness, the role of government, institutions, geography,
political instability, share of natural resources, financial development, and demographics are
typically included in these types of empirical analyses and found to be important. The
big problem with all this literature is that there is very little attempt to formally establish
causality. Much of the correlation may be no more than just that: spurious correlation,
reflecting the importance of other omitted factors.
This lack of causality could be important especially when we think about the broad
hypotheses outlined above. In particular, institutions and culture are endogenous, so one
might be tempted to think that the correlation between these variables and income is more
likely to be spurious, and therefore give more importance to geography. This reasoning is invalid,
since the particular historical development of the world economy may have brought about a
correlation between institutional development and geography.
4.3 Europe's Expansion and Colonial Origins of Institutions
As discussed above, in Acemoglu, Johnson and Robinson (2002), we looked at the horserace
between geography and institutions. The geography explanation predicts persistence in
income, since the geographic, ecological and climatic factors that should matter change
only little over periods as long as 500 years. Although the institutions view also suggests
persistence, a major shock could disrupt persistence, or even create a reversal.
In this context, the expansion of European overseas empires provides a natural experiment. Europeans affected the institutions of many societies through their colonization.
More strikingly, it turns out that Europeans introduced worse institutions (in the sense of
institutions discouraging investment) in previously prosperous places. Therefore, while the
geography view predicts persistence between 1500 and today among the former European
colonies, the institutions view suggests the possibility of a reversal. The data, as discussed
above, strongly suggest that there was a reversal in relative rankings across this set of countries.
This discussion suggests that geographic or climatic differences across countries are not
of first-order importance in shaping the differences in income we observe today. At the very
least, these geographic factors appear to be less important than other factors.
Nevertheless, this observation does not give us a direct estimate of the effect of institutions/social organization on economic performance.
To go beyond a simple horserace of geography versus institutions, and to estimate the
impact of institutions on economic performance, we need a source of exogenous variation
in institutions. In Acemoglu, Johnson and Robinson (2001), we proposed a theory of institutional differences among countries colonized by Europeans, and exploited this theory to
derive a possible source of exogenous variation. Our theory rests on three premises:
1. There were different types of colonization policies which created different sets of institutions. At one extreme, European powers set up extractive states, exemplified
by the Belgian colonization of the Congo. These institutions did not introduce much
protection for private property, nor did they provide checks and balances against government expropriation. At the other extreme, many Europeans migrated and settled
in a number of colonies. The settlers in many areas tried to replicate European institutions, with strong emphasis on private property and checks against government
power. Primary examples of this include Australia, New Zealand, Canada, and the
United States.
2. The colonization strategy was influenced by the feasibility of settlements. In places
where the disease environment was not favorable to European settlement, extractive
policies were more likely.
3. The colonial state and institutions, at least to some extent, persisted.
Based on these three premises, we use the mortality rates expected by the first European
settlers in the colonies as an instrument for current institutions in these countries.
Summarizing this schematically:
$$\text{(potential) settler mortality} \Rightarrow \text{settlements} \Rightarrow \text{early institutions} \Rightarrow \text{current institutions} \Rightarrow \text{current performance}$$
The results show a large effect of institutions on income, and generate no evidence that
geography matters. The following three figures summarize most of the findings. The first
shows the cross-sectional relationship between income per capita and a measure of economic
institutions, protection against expropriation risk. This is one of many potential variables
capturing the institutional features of a country that can be used. Its advantage is that it is
directly about protection of property rights, thus intimately related to economic incentives
that are highlighted by the institutions approach.
[Figure: country scatter plot of income per capita against average expropriation risk, 1985-95.]
The second shows the first-stage relationship between log (potential) settler mortality
and protection against expropriation risk (so that higher scores correspond to better protection against expropriation by government or elites, or generally to better property rights
protection), and the third shows the reduced form between income per capita and settler
mortality. The latter two figures together give the two-stage least squares estimate of the
effect of broad economic institutions on long-run income per capita differences.
[Figure: first stage, country scatter plot of average expropriation risk, 1985-95, against log of settler mortality.]
[Figure: reduced form, country scatter plot of income per capita against log of settler mortality.]
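With a single instrument, the two-stage least squares estimate equals the reduced-form slope divided by the first-stage slope, which is why the last two figures together deliver the estimate. The sketch below illustrates this on synthetic data only: the data-generating process, coefficient values and variable names are hypothetical and are not the Acemoglu, Johnson and Robinson data. An unobserved factor `u` affects both institutions and income, so OLS is biased, while the instrument is constructed to be unrelated to `u`.

```python
import random

def slope(x, y):
    """OLS slope of y on x (regression with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
n = 5000
true_effect = 0.9   # hypothetical causal effect of institutions on log income

z, inst, income = [], [], []
for _ in range(n):
    log_mortality = random.uniform(2.0, 8.0)   # instrument
    u = random.gauss(0.0, 1.0)                 # unobserved factor (omitted variable)
    institutions = 10.0 - 0.6 * log_mortality + 0.8 * u + random.gauss(0.0, 0.5)
    log_income = 2.0 + true_effect * institutions + 1.0 * u + random.gauss(0.0, 0.5)
    z.append(log_mortality)
    inst.append(institutions)
    income.append(log_income)

first_stage = slope(z, inst)       # institutions on log settler mortality
reduced_form = slope(z, income)    # log income on log settler mortality
iv_estimate = reduced_form / first_stage
ols_estimate = slope(inst, income)

print("first stage: ", first_stage)
print("reduced form:", reduced_form)
print("IV estimate: ", iv_estimate)
print("OLS estimate:", ols_estimate)
```

In this synthetic example the OLS slope overstates the assumed effect because of the omitted factor, while the ratio of the two slopes comes out close to it. The estimate is only as good as the exclusion restriction, which is imposed by construction here and cannot be tested in the same way with real data.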
Acemoglu, Johnson and Robinson (2001) conduct a variety of checks to show that this
relationship is robust, and likely due to the institutional channel (but like all instrumental
variable strategies, there is always the possibility that the instrument is not excludable).
Even taking this evidence at face value, can we distinguish between culture and institutions? Some of the results in Acemoglu, Johnson and Robinson (2001), which control for
proxies of cultural differences, religion and the identity of colonial power, suggest that it is
political and economic institutions, not culture, that matter more. But this is not conclusive.
Future and smarter work is needed to make progress in distinguishing between culture-based and institutional explanations. For the rest of the course, we will look deeper into
a range of models in order to understand how differences in various policies, technologies,
preferences and institutions translate into growth rates and cross-country differences. Thus
the rest of the course will be about the mechanics of economic growth. But throughout,
you may want to bear in mind how these mechanics may relate to fundamental causes as
discussed in this chapter.
Part II
Neoclassical Growth
In this part, we discuss the basic neoclassical approaches to economic growth, focusing on models with exogenous technological progress. The most important advance over
what we have seen so far is that these models explicitly incorporate consumer preferences
and consumer behavior, so we can make meaningful statements about savings rates being
endogenous and also think of welfare of consumers.
Chapter 5
Towards Neoclassical Growth
At this point, let us take a step back. The entire Solow growth model was predicated on a
constant savings rate. Instead, it would be much more satisfactory to specify the preference
orderings of individuals as in standard general equilibrium theory and go from there. To
prepare for this, let us consider an economy consisting of a unit measure of infinitely-lived
households. These households can be truly infinitely lived, or could consist of overlapping
generations with full (or partial) altruism linking generations within the household. Then
the problem would be one in which each household i has an instantaneous utility function
given by
$$u_i(c_i(t)),$$
where $u_i : \mathbb{R}_+ \to \mathbb{R}$ is increasing and concave and $c_i(t)$ is the consumption of household $i$.
This means that the household does not derive any utility from the consumption of other
households, so consumption externalities are ruled out. Throughout, we will assume that
individuals discount the future proportionally (also referred to as exponentially), so that
in discrete time and ignoring uncertainty, their preferences at time $t = 0$ are given by
$$\sum_{t=0}^{\infty} \beta_i^t\, u_i(c_i(t)),$$
where $\beta_i \in (0, 1)$ is the discount factor of household $i$. In addition, we can have differences in
households' income processes, for example, for each household we could have effective labor
endowments of $\{h_i(t)\}_{t=0}^{\infty}$, thus a sequence of labor income of $\{w(t) h_i(t)\}_{t=0}^{\infty}$, where $w(t)$ is
the equilibrium wage rate per unit of effective labor.
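As a small illustration of proportional (exponential) discounting, the sketch below evaluates a finite-horizon truncation of the discounted utility sum for two households with different discount factors. The truncation, the choice of logarithmic and square-root utility, and the consumption paths are assumptions made here for illustration; the text works with an infinite horizon and a general increasing, concave utility function.

```python
import math

def discounted_utility(consumption, beta, u=math.log):
    """Finite-horizon truncation of sum over t of beta**t * u(c(t))."""
    if not 0 < beta < 1:
        raise ValueError("discount factor must lie in (0, 1)")
    return sum(beta ** t * u(c) for t, c in enumerate(consumption))

T = 50
flat_path = [1.0] * T                                   # constant consumption
rising_path = [0.5 + 0.03 * t for t in range(T)]        # low now, high later
falling_path = list(reversed(rising_path))              # high now, low later

# With constant consumption and u(1) = 1, the sum is a geometric series.
flat_value = discounted_utility(flat_path, 0.95, u=math.sqrt)
geometric_sum = (1 - 0.95 ** T) / (1 - 0.95)

# A patient and an impatient household rank the same two paths differently.
for beta in (0.99, 0.80):
    rising = discounted_utility(rising_path, beta)
    falling = discounted_utility(falling_path, beta)
    print(f"beta = {beta}: rising path {rising:.3f}, falling path {falling:.3f}")
```

Both households prefer the front-loaded path here, but the impatient household values it by a much larger margin, which is one simple way heterogeneity in discount factors shows up in choices.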

Unfortunately, at this level of generality, this problem is too hard. Even though we may
be able to establish some existence results, it would be impossible to go beyond that. To
avoid the complexities involved in this general formulation, the standard approach in macroeconomics and economic growth is to assume the existence of a representative consumer.
5.1 Representative Consumer
Instead of the more general framework mentioned above, we will look at economies that admit
a representative consumer. What this means is that we will think that the preference side of
the economy can be represented as if there were a single consumer making the consumption
and saving decisions (and labor supply decisions when these are endogenized).
One way of having a representative consumer is to assume that each household has the
same utility function
$$u(c_i(t))$$
where $u : \mathbb{R}_+ \to \mathbb{R}$ is increasing and concave and $c_i(t)$ is the consumption of household $i$, and also the same discount factor $\beta$, and the same sequence of effective labor endowments $\{h(t)\}_{t=0}^{\infty}$. The advantage of this approach is that the economy indeed has a representative
consumer, so the representative consumer has a normative meaning as well as a positive
meaning. In other words, we can represent the savings and consumption decisions as if they
are coming from a representative consumer, and we can use the same preferences to evaluate
aggregate welfare.
Yet alternatively, we could assume that there is heterogeneity among households, but
the aggregate behavior can be represented as if it were the outcome of the maximization
of a representative consumer. In this case, the representative consumer will have positive
meaning, but no normative meaning.
In any case, with the representative consumer assumption in discrete time, we have that
the preference side can be represented as the following maximization problem starting at
time t = 0:
$$\max \sum_{t=0}^{\infty} \beta^t u(c(t)),$$
where $\beta \in (0, 1)$ is the common discount factor of all the households, and $c(t)$ is the consumption level of the representative household.
This is an extremely convenient assumption, though as the next theorem shows, most
models do not admit representative consumers:
Theorem 8 (Debreu-Mantel-Sonnenschein) Consider an exchange economy with a finite number $N < \infty$ of commodities and $H < \infty$ households, each with potentially different preferences. Let $p$ be the vector of prices and $x(p)$ be the vector of aggregate excess demands for these commodities at the price vector $p$. For $\varepsilon > 0$, let $P_\varepsilon = \{p \in \mathbb{R}^N_+ : p_j / p_{j'} \ge \varepsilon \text{ for all } j \text{ and } j'\}$. Then for any $\varepsilon > 0$, any continuous function $x : P_\varepsilon \to \mathbb{R}^N_+$ that satisfies Walras' Law and homogeneity of degree 0 can be an aggregate excess demand function.
Proof. See Debreu (1974) or Mas-Colell, Whinston and Green (1995), Proposition 17.E.3.
Essentially, this theorem states that in general, the fact that there are optimizing individuals in the background imposes no restriction (such as being downward sloping, satisfying
the weak axiom of revealed preference, or possessing a negative-semi-definite Jacobian) for
aggregate (market) excess demand functions. This is therefore a negative result warning us
against the use of models with representative consumers.
Nevertheless, this result is partly an outcome of very strong income effects. Special but approximately realistic preference functions, as well as restrictions on the distribution of income across individuals, enable us to rule out arbitrary aggregate excess demand functions. Here the following aggregation theorem is particularly useful. To state this theorem, recall that an indirect utility function for household $i$ is $v_i(p, y_i)$, which specifies the household's (ordinal) utility as a function of the price vector $p$ and the household's income (wealth) $y_i$.
Theorem 9 (Gorman) Consider an economy with a finite number $N < \infty$ of commodities and $H < \infty$ households. Suppose that the preferences of household $i$ lead to an indirect utility function of the form $v_i(p, y_i) = a_i(p) + b(p) y_i$ for $i = 1, \ldots, H$. Then these preferences can be aggregated to be represented by those of a representative consumer, with indirect utility
$$v(p, y) = \sum_{i=1}^{H} a_i(p) + b(p) y,$$
where $y \equiv \sum_{i=1}^{H} y_i$ is aggregate income.
Proof. The proof follows from basic micro theory, and is left to you as an exercise.
Therefore, when there is a special form of quasi-linearity in the preferences, aggregating
them to have representation for a representative consumer is possible.
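The key step behind Theorem 9 can be sketched with Roy's identity. The following is only an outline under added assumptions (differentiability of $a_i$ and $b$, and $b(p) > 0$), not the full proof; $x_i^j$ is notation introduced here for household $i$'s demand for commodity $j$.

```latex
% Roy's identity applied to v_i(p, y_i) = a_i(p) + b(p) y_i
\[
x_i^j(p, y_i)
  = -\frac{\partial v_i(p, y_i)/\partial p_j}{\partial v_i(p, y_i)/\partial y_i}
  = -\frac{1}{b(p)}\left[\frac{\partial a_i(p)}{\partial p_j}
      + \frac{\partial b(p)}{\partial p_j}\, y_i\right].
\]
% Summing over households, only aggregate income y = \sum_i y_i matters:
\[
\sum_{i=1}^{H} x_i^j(p, y_i)
  = -\frac{1}{b(p)}\left[\sum_{i=1}^{H}\frac{\partial a_i(p)}{\partial p_j}
      + \frac{\partial b(p)}{\partial p_j}\, y\right],
\]
% which is exactly Roy's identity applied to v(p, y) = \sum_i a_i(p) + b(p) y.
```

Because each household's demand is linear in its own income with a common slope, redistributing income across households leaves aggregate demand unchanged, which is why the income distribution drops out.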
In this context, it is interesting to consider the CRRA (constant relative risk aversion) utility function in the infinite-horizon economy
$$U = \begin{cases} \sum_{t=0}^{\infty} \beta^t \dfrac{c(t)^{1-\theta} - 1}{1 - \theta} & \text{if } \theta \neq 1 \text{ and } \theta \ge 0 \\ \sum_{t=0}^{\infty} \beta^t \ln c(t) & \text{if } \theta = 1 \end{cases},$$
where $\theta$ is the coefficient of relative risk aversion and also the inverse of the intertemporal elasticity of substitution, which regulates how willing individuals are to substitute consumption over time.
This class of utility functions satisfies the conditions of Theorem 9. We will see below that
CRRA preferences have a special role in models of economic growth, because they are the
unique class of utility functions that are consistent with balanced growth. Therefore, if we
wish to impose balanced growth, the assumption that the economy admits a representative
consumer is not as restrictive as in models in which we wish to analyze growth without
making the balanced growth assumption.
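As a small numerical illustration of the CRRA family above, the sketch below evaluates the per-period utility and a truncated lifetime sum. The function names and the finite truncation are assumptions made for the example; the infinite sum in the text is only approximated.

```python
import math

def crra(c: float, theta: float) -> float:
    """Per-period CRRA utility; the theta = 1 case is the log limit."""
    if c <= 0:
        raise ValueError("consumption must be positive")
    if abs(theta - 1.0) < 1e-12:
        return math.log(c)
    return (c ** (1.0 - theta) - 1.0) / (1.0 - theta)

def lifetime_utility(consumption_path, beta: float, theta: float) -> float:
    """Truncated version of U: sum of beta**t * u(c(t)) over a finite path."""
    return sum(beta ** t * crra(c, theta) for t, c in enumerate(consumption_path))

if __name__ == "__main__":
    # As theta approaches 1, the CRRA value approaches ln(c).
    for theta in (2.0, 1.1, 1.01, 1.001):
        print(theta, crra(2.0, theta), math.log(2.0))
```

Subtracting 1 in the numerator is what makes the $\theta \to 1$ limit well defined and equal to $\ln c$.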
5.2 Problem Formulation
Let us now make the representative consumer assumption. Suppose that each household's utility function in discrete time starting at time $t = 0$ is (ignoring uncertainty)
$$\sum_{t=0}^{\infty} \beta^t u(c(t)), \qquad (5.1)$$
where $\beta \in (0, 1)$ is the discount factor of the households.

In continuous time, this utility function becomes
$$\int_0^{\infty} \exp(-\rho t)\, u(c(t))\, dt \qquad (5.2)$$
where $\rho > 0$ is now the discount rate of the individuals.

Where does the exponential form of the discounting in (5.2) come from? At some level,
we called discounting in the discrete time case also exponential, so the link should be
apparent.
But to see it more precisely, imagine we are trying to calculate the value of \$1 in $T$ periods, and divide the interval $[0, T]$ into $T / \Delta t$ equally-sized subintervals. Let the interest rate in each subinterval be equal to $\Delta t \cdot r$. It is important that the quantity $r$ is multiplied by $\Delta t$, otherwise as we vary $\Delta t$, we would be changing the interest rate. Clearly the value of \$1 in $T$ periods at this interest rate is given by
$$v(T \mid \Delta t) \equiv (1 + \Delta t \cdot r)^{T / \Delta t}.$$
Now we want to take the continuous time limit by letting $\Delta t \to 0$, i.e., we wish to calculate
$$v(T) \equiv \lim_{\Delta t \to 0} v(T \mid \Delta t) \equiv \lim_{\Delta t \to 0} (1 + \Delta t \cdot r)^{T / \Delta t}.$$
Since the exponential function is continuous, we can move the limit inside it and write
$$v(T) \equiv \exp\left[\lim_{\Delta t \to 0} \ln (1 + \Delta t \cdot r)^{T / \Delta t}\right] = \exp\left[\lim_{\Delta t \to 0} \frac{T}{\Delta t} \ln (1 + \Delta t \cdot r)\right].$$
However, the term in square brackets has a limit of the form 0/0. Let us next write this as
$$\lim_{\Delta t \to 0} \frac{\ln (1 + \Delta t \cdot r)}{\Delta t / T} = \lim_{\Delta t \to 0} \frac{r / (1 + \Delta t \cdot r)}{1 / T} = rT,$$
where I used l'Hôpital's rule to obtain the first equality, and then took the limits in the numerator and denominator to obtain the second equality. Therefore,
$$v(T) = \exp(rT).$$
With the same reasoning, \$1 in $T$ periods is worth $\exp(-rT)$ today. The same reasoning applies to discounting utility, so discounting in continuous time takes the exponential form, with $\rho$ as the discount rate.
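The limit argument above can be checked numerically. The sketch below is only an illustration: the function name and the values of $r$ and $T$ are arbitrary choices, not taken from the text.

```python
import math

def compound_value(r: float, T: float, dt: float) -> float:
    """Value at T of $1 compounded once per subinterval of length dt at rate dt * r."""
    n_subintervals = T / dt
    return (1.0 + dt * r) ** n_subintervals

if __name__ == "__main__":
    r, T = 0.05, 10.0            # illustrative values only
    limit = math.exp(r * T)      # the continuous-time value exp(rT)
    for dt in (1.0, 0.1, 0.01, 0.001):
        approx = compound_value(r, T, dt)
        print(f"dt = {dt:<6} v(T|dt) = {approx:.6f}   exp(rT) - v(T|dt) = {limit - approx:.6f}")
```

As the subinterval length shrinks, the compounded value rises toward $\exp(rT)$ from below.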
5.3 Welfare Theorems
Ultimately, we are interested in equilibrium growth. But in competitive economies such as those analyzed so far, we know that there should be an intimate connection between Pareto optima and competitive equilibria (so far we were not able to exploit these connections, since utility functions were not specified, so we could not talk of preferences explicitly). To recall these theorems, denote the vector of prices for a finite dimensional commodity vector by $p$, the vector of production across commodities and firms by $q$ and the vector of consumption across commodities and households by $x$. Also denote the vector of endowments across households by $\omega$ and the vector of utilities by $u$. Denote the set of households by $I$, each household denoted by $i$, and use $x_i$ as the vector of consumption of household $i$, and $p \cdot x_i$ as the inner product of the vector of prices and the vector of consumption of household $i$, which is, by definition, equal to the total expenditure of household $i$. Other inner products and subvectors are defined similarly.
Recall also that by a competitive economy, we refer to an environment without any externalities and where all commodities are traded competitively (recall that here goods at different dates are different commodities). Then we have the following two important theorems. Both of these theorems are proved in the most elegant fashion in Debreu's Theory of Value for finite commodity spaces, and easier versions of the proofs are contained in Mas-Colell, Whinston and Green (1995). Here I give sketch proofs.
Theorem 10 (First Welfare Theorem) Consider a competitive economy with a finite number of individuals with preferences satisfying non-satiation and a finite number of commodities and an endowment vector $\omega$. Suppose a competitive equilibrium $(p^*, q^*, x^*)$ exists. Then it is Pareto optimal.
Proof. First recall that $(p^*, q^*, x^*)$ being a competitive equilibrium implies that each household maximizes its utility by choosing $x_i^*$ at the price vector $p^*$ and income level
$$y_i(p^*) \equiv p^* \cdot \omega_i + \sum_f \theta_{if}\, p^* \cdot q_f^*,$$
where $\theta_{if}$ is the share of profits of firm $f$ held by household $i$, and $q_f^*$ is the competitive equilibrium production vector of firm $f$. We have that $\sum_i \theta_{if} = 1$ by virtue of being shares.
Suppose to obtain a contradiction that there exists $(p, q, x)$ which Pareto dominates $(p^*, q^*, x^*)$. Then it must be the case that for all households $i \in I$, $x_i$ is weakly preferred to $x_i^*$, i.e.,
$$x_i \succsim_i x_i^*$$
and for at least one $i' \in I$, the new allocation is strictly preferred to $x_{i'}^*$, i.e.,
$$x_{i'} \succ_{i'} x_{i'}^*.$$
Since $(p^*, q^*, x^*)$ is a competitive equilibrium, it must be the case that for all $i \in I$,
$$p^* \cdot x_i \ge y_i(p^*) \qquad (5.3)$$
where $y_i(p^*)$ is the income of household $i$ at price vector $p^*$ defined above. Suppose not. We know that by non-satiation $p^* \cdot x_i^* = y_i(p^*)$; then if $p^* \cdot x_i < y_i(p^*)$, household $i$ could choose more of each commodity, i.e., $x_i + \varepsilon$ for $\varepsilon$ small enough, and again by non-satiation reach higher utility than that given by $x_i$. This would contradict the hypothesis that $x_i^*$ is utility maximizing at the price vector $p^*$.
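Moreover, for $i' \in I$, it must be that
$$p^* \cdot x_{i'} > y_{i'}(p^*). \qquad (5.4)$$
Now summing (5.3) and (5.4) over $I$, we have
$$\sum_{i \in I} p^* \cdot x_i > \sum_{i \in I} y_i(p^*) = \sum_{i \in I} \left( p^* \cdot \omega_i + \sum_f \theta_{if}\, p^* \cdot q_f^* \right) = \sum_{i \in I} p^* \cdot \omega_i + \sum_f p^* \cdot q_f^*,$$
since $\sum_i \theta_{if} = 1$. Moreover, by the fact that $q_f^*$ is a profit-maximizing vector at prices $p^*$, we have that
$$\sum_f p^* \cdot q_f^* \ge \sum_f p^* \cdot q_f$$
for all feasible $q_f$, thus
$$\sum_{i \in I} p^* \cdot x_i > \sum_{i \in I} p^* \cdot \omega_i + \sum_f p^* \cdot q_f \qquad (5.5)$$
for all feasible $q_f$. However, by feasibility of an allocation, we must have
$$\sum_{i \in I} x_i = \sum_{i \in I} \omega_i + \sum_f q_f,$$
and taking the inner product of both sides with $p^*$ contradicts (5.5). Consequently, the competitive equilibrium allocation $(p^*, q^*, x^*)$ is not Pareto dominated by any other feasible allocation, and is thus Pareto optimal.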
Notice that the proof of the first welfare theorem only uses the summation of the values of
commodities at a given price vector. No convexity assumption is necessary, but the fact that
the sums above exist is essential for the proof. Finiteness of the number of commodities and
number of individuals was sufficient to guarantee the existence of the sums. Naturally, the sums may exist under other conditions, but with an infinite number of commodities, they may fail to exist, in which case the proof, and perhaps even the First Welfare Theorem, may not apply.
Theorem 11 (Second Welfare Theorem) Consider a Pareto optimal allocation yielding utility vector $u^*$ to households. Then provided that all production sets and preferences are convex, there exists an endowment vector $\omega$, such that the resulting competitive equilibrium $(p^*, q^*, x^*)$ yields exactly the utility vector $u^*$.
Proof. (idea) Given convexity of preference and production sets, $(p^*, q^*, x^*)$ is a point of tangency between the aggregate production possibilities set and the aggregate preference set, both of which are convex. Then by the standard separating hyperplane theorem, there exists a hyperplane separating these two sets. This hyperplane gives relative prices that can decentralize the competitive equilibrium at an appropriately chosen endowment vector.
The second welfare theorem is the harder theorem because of the convexity requirement.
In many ways, it is also the more important one. It states that any Pareto optimal allocation
can be decentralized as a competitive allocation. This motivates many macroeconomists to
look for the set of Pareto optimal allocations instead of explicitly characterizing competitive
equilibria. This is especially useful in dynamic models where sometimes competitive equilibria can be quite difficult to characterize or even to specify, while social welfare maximizing
allocations are more straightforward.
Motivated by this, we could start by looking at optimal growth, that is, a capital accumulation, saving and consumption path that is Pareto optimal given the preferences of a
representative household. Although this is standard practice, there is a technical problem here, since the classical welfare theorems apply when there is a finite number of commodities, whereas in growth models there is an infinite number of commodities. The welfare theorems can be extended to an infinite number of commodities under certain circumstances, and the exact conditions will be discussed below.
For now, let us suppose that the Second Welfare Theorem applies in this environment.
In fact, having an infinite-dimensional commodity space does not create a problem for the
Second Welfare Theorem, as long as convexity continues to hold. The problem arises for the
First Welfare Theorem because of the issue of existence of sums as discussed above. Given
this, we can start on the analysis of economic growth with optimizing agents by looking at the social planner's choice of an allocation that maximizes the representative household's lifetime discounted utility. This is the optimal growth approach.
5.4 Optimal Growth in Discrete Time
Let us continue to consider an economy characterized by an aggregate production function, and a representative consumer (household). The optimal growth problem in discrete time with no population growth or technological progress can be written as follows:
$$\max_{\{c(t), k(t)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u(c(t))$$
subject to
$$k(t+1) = f(k(t)) - c(t) + (1 - \delta) k(t), \qquad (5.6)$$
$k(t) \ge 0$ and given $k(0) > 0$.
In other words, in this optimal growth problem, the social planner chooses an entire
sequence of consumption levels and capital stocks in order to maximize the discounted sum
of the utility of the representative consumer. The constraint (5.6) embeds the capital accumulation equation together with the production function.
We have also specified that the initial level of capital stock is k (0), but this gives a single
initial condition. We will see later that we need another boundary condition but not in the
form of an initial condition. Instead, this will come from the optimality of a dynamic plan
in the form of a transversality condition.
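To make the planner's choice set concrete, the sketch below simulates a few feasible plans under constraint (5.6) and evaluates a truncated version of the objective. Everything specific here is an assumption for illustration: the Cobb-Douglas technology $f(k) = k^{\alpha}$, log utility, full depreciation ($\delta = 1$), the parameter values, and the restriction to constant-saving-rate plans. It is not a solution method for the general problem.

```python
import math

def simulate_constant_saving_plan(s, k0, alpha=0.3, delta=1.0, periods=500):
    """Return (consumption, capital) paths for the plan c(t) = (1 - s) * f(k(t)).

    f(k) = k**alpha is an assumed technology; capital follows constraint (5.6):
    k(t+1) = f(k(t)) - c(t) + (1 - delta) * k(t).
    """
    capital = [k0]
    consumption = []
    for _ in range(periods):
        k = capital[-1]
        output = k ** alpha
        c = (1.0 - s) * output
        consumption.append(c)
        capital.append(output - c + (1.0 - delta) * k)
    return consumption, capital

def discounted_utility(consumption, beta=0.95):
    """Truncated objective with assumed log utility: sum of beta**t * ln c(t)."""
    return sum(beta ** t * math.log(c) for t, c in enumerate(consumption))

if __name__ == "__main__":
    # Each saving rate defines one feasible sequence; the planner ranks them
    # by discounted utility.
    for s in (0.10, 0.285, 0.60):
        c_path, _ = simulate_constant_saving_plan(s, k0=0.1)
        print(f"s = {s:.3f}: discounted utility = {discounted_utility(c_path):.4f}")
```

In this special log/Cobb-Douglas/full-depreciation case, the saving rate $s = \alpha\beta$ is known to be optimal, which is why 0.285 is included among the candidates.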
This maximization problem can be solved in a number of different ways, for example, by
setting up an infinite dimensional Lagrangian. But the most convenient and common way
of approaching it is by using dynamic programming.
Even if our purpose were not to characterize the Pareto optimal allocations, but to find
equilibrium, we would have to solve a problem similar to this. In particular, each household
would be solving the following problem:
$$\max_{\{c(t), a(t)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u(c(t))$$
subject to
$$a(t+1) = r(t)\, a(t) - c(t) + w(t), \qquad (5.7)$$
given $a(0)$, where $a(t)$ denotes the assets of the household at time $t$, $r(t)$ is the (gross) rate of return on assets, and $w(t)$ is wage income. The constraint (5.7) is the flow budget constraint, meaning that it links tomorrow's assets to today's assets. Here we need an
additional condition so that this flow budget constraint eventually converges (i.e., so that
a (t) should not go to negative infinity). This can be ensured by imposing a lifetime budget
constraint, but the flow budget constraint is often more convenient to work with, so we need
to augment it with another condition as we will see later.
5.5 Optimal Growth in Continuous Time
The formulation of the optimal growth problem in continuous time is very similar. In particular, we have
$$\max_{[c(t), k(t)]_{t=0}^{\infty}} \int_0^{\infty} \exp(-\rho t)\, u(c(t))\, dt$$
subject to
$$\dot{k}(t) = f(k(t)) - c(t) - \delta k(t) \qquad (5.8)$$
$k(t) \ge 0$ and given $k(0)$. Once again, this problem lacks one boundary condition which will come from the transversality condition.
The most convenient way of characterizing the solution to this problem is via optimal
control.
We next discuss dynamic programming and optimal control briefly.
Chapter 6
Dynamic Programming and Optimal Growth
Here I provide a very brief overview of infinite horizon optimization in discrete time, in
particular of stationary dynamic programming. I also include some technical details, which
are not essential for the purposes of this course, but may be useful for those of you who want
to understand some of the tools better.
6.1 Brief Review of Dynamic Programming
Using abstract but simple notation, the canonical dynamic optimization program in discrete time can be written as
Problem A1
$$v^*(x_0) = \sup_{\{x_{t+1}\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t F(x_t, x_{t+1})$$
subject to
$$x_{t+1} \in \Gamma(x_t) \ \text{ for all } t \ge 0, \qquad x_0 \ \text{given},$$
where $x_t \in X \subset \mathbb{R}^K$ for some $K \ge 1$. In many economic applications, we will have $K = 1$, so that $x_t \in \mathbb{R}$. Here I used sup rather than max, since there is no guarantee that the maximal value is attained by any feasible plan.
Here $F$ is the payoff function, depending on $x_t$, which is the state variable, and $x_{t+1}$, which corresponds to the control variable. In this simple formulation, $x_{t+1}$ will also directly become the state variable in the next time period.
The constraint on the problem is written as
$$x_{t+1} \in \Gamma(x_t)$$
where
$$\Gamma : X \rightrightarrows X$$
is a correspondence determining what type of $x_{t+1}$ is allowed given the state variable $x_t$.
Notice that this problem is stationary in the sense that the payoff function $F$ is not time-dependent. It only depends on $x_t$ and $x_{t+1}$.
Of particular importance is the function $v^*(x_0)$, which can be thought of as the value function, meaning the value of pursuing the optimal strategy starting with initial condition $x_0$.
Notice also that I have already simplified life by writing the objective function as a discounted sum. This is the class of problems in which dynamic programming will be most useful. If instead we had a much more general problem, for example,
$$\sup_{\{x_{t+1}\}_{t=0}^{\infty}} F(x_0, x_1, \ldots),$$
then because there is no discounted structure, dynamic programming could not be used (at
least in its simplest form). Moreover, it can be noted that problems that do not have an
exponential discounted structure pose another problem for us: they are not time-consistent,
in the sense that the original plan that maximizes the initial objective function is not necessarily what an individual would like to stick to if he or she is carrying out the optimization
period by period. Time consistency is both a very natural property and one that makes
the mathematical analysis much simpler. In many ways, it is also the essence of dynamic
programming.
For concreteness, let us recall the optimal growth problem from above:
$$\max_{\{c(t), k(t)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u(c(t))$$
subject to
$$k(t+1) = f(k(t)) - c(t) + (1 - \delta) k(t),$$
$k(t) \ge 0$ and given $k(0)$. To map this problem into the form here, let $x_t = k(t)$ and $x_{t+1} = k(t+1)$. Then use the constraint to write:
$$c(t) = f(k(t)) - k(t+1) + (1 - \delta) k(t),$$
and substitute into the objective function to obtain:
$$\max_{\{k(t+1)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u\big(f(k(t)) - k(t+1) + (1 - \delta) k(t)\big)$$
subject to $k(t) \ge 0$ (which is the simplest form of a constraint correspondence $\Gamma$). The payoff function is then $F(x_t, x_{t+1}) = u(f(x_t) - x_{t+1} + (1 - \delta) x_t)$.

Problem A1, also referred to as the sequence problem, is one of choosing an infinite sequence $\{x_t\}_{t=0}^{\infty}$ from some (vector) space of infinite sequences (for example, $\{x_t\}_{t=0}^{\infty} \in L^{\infty}$, where $L^{\infty}$ is the vector space of infinite sequences that are bounded with the $\|\cdot\|_{\infty}$ norm, which I will denote throughout by the simpler notation $\|\cdot\|$). Such problems sometimes have nice features, but often are difficult to characterize both analytically and numerically.
The basic idea of dynamic programming is to turn the sequence problem into a functional
equation, i.e., one of finding a function rather than a sequence. This often gives better
economic insights, similar to the logic of comparing today to tomorrow. It is also often easier
to characterize analytically or numerically. In this particular case, the relevant functional
equation can be written as
Problem A2
$$v(x) = \sup_{y \in \Gamma(x)} \left[ F(x, y) + \beta v(y) \right], \ \text{ for all } x \in X. \qquad (6.1)$$
In fact, this form of the problem suggests itself naturally from the formulation Problem A1. Suppose Problem A1 has a maximum with optimal sequence denoted by $\{x_t^*\}_{t=0}^{\infty}$ starting with $x_0^* = x_0$. Then by definition,
$$\begin{aligned} v^*(x_0) &= \sum_{t=0}^{\infty} \beta^t F(x_t^*, x_{t+1}^*) \\ &= F(x_0, x_1^*) + \beta \sum_{j=0}^{\infty} \beta^j F(x_{j+1}^*, x_{j+2}^*) \\ &= F(x_0, x_1^*) + \beta v^*(x_1^*). \end{aligned}$$
This equation encapsulates the basic idea of dynamic programming: the principle of optimality.
Essentially, an optimal plan can be broken into two parts, what is optimal to do today,
and the optimal continuation path. Dynamic programming exploits this principle and provides us with a set of powerful tools to analyze optimization in discrete time infinite horizon
problems.
Part of the theory of dynamic programming is about specifying the conditions under
which Problems A1 and A2 are equivalent. These are not central for us to focus upon
here, but I will return to some of these issues below. Problem A2 is commonly referred to
as the Bellman equation, after Richard Bellman, who introduced dynamic programming
to operations research and engineering applications (though identical tools and reasonings,
including the contraction mapping theorem were earlier used by Lloyd Shapley in his work
on stochastic games).
A couple of points are immediately worth noting. First, $v(x)$ is a function, more formally,
$$v : X \to \mathbb{R}.$$
Differently from other maximization problems, here maximization itself defines the function $v$, as the notation makes clear with the sup (or max) defining the function. Therefore, instead of finding a sequence $\{x_t\}_{t=0}^{\infty} \in L^{\infty}$, we will try to find a function $v$ that satisfies (6.1). Second, because the function $v$ is defined recursively, in the sense that it is on the right hand side of (6.1) as well, this is often referred to as the recursive formulation.
What makes this formulation useful is that the solution will often be a time invariant policy function, $g : X \to X$, determining what value of $x_{t+1}$ to choose for a given value of the state variable $x_t$. [In general, there are two complications: first, a control reaching the optimal value may not exist, which was the reason why we originally used the notation sup; second, we may not have a policy function, but a policy correspondence $g : X \rightrightarrows X$, because there may be more than one maximizer for a given state variable. Let me avoid these complications for now, and assume that $g(\cdot)$ is single valued, thus a function (conditions to guarantee this are provided below).] Moreover, as we will see, once the value function $v$ is determined, the policy function is given straightforwardly. In particular, by definition it must be the case that
$$v(x) = F(x, g(x)) + \beta v(g(x)), \ \text{ for all } x \in X,$$
which is one way of determining the policy function. This equation simply follows from the fact that $g(x)$ is the optimal policy, so it reaches the maximal value $v(x)$.
The usefulness of the recursive formulation as in (6.1) comes from the fact that there
are some powerful tools which not only establish existence of the solution, but also some of
its properties. These are not essential for understanding the application of these tools to
economic growth models, but they are useful for working with these tools in general and
with growth models in particular.
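The recursive formulation (6.1) suggests a simple numerical procedure: start from a guess for $v$, apply the right-hand side of (6.1) repeatedly, and read off the policy function at the end. The sketch below does this on a grid for one special case. All specifics are assumptions made for illustration: log utility, $f(k) = k^{\alpha}$, full depreciation, the grid bounds, and the parameter values. The case is chosen because its policy function is known in closed form, $g(k) = \alpha \beta k^{\alpha}$, which gives something to compare against.

```python
import math

def value_function_iteration(alpha=0.3, beta=0.9, n_grid=120, tol=1e-6, max_iter=1000):
    """Iterate v <- max_y [F(x, y) + beta * v(y)] on a capital grid.

    Assumed payoff: F(k, k') = ln(k**alpha - k'), i.e. log utility with
    f(k) = k**alpha and full depreciation. Returns (grid, v, policy).
    """
    k_min, k_max = 0.02, 0.5
    step = (k_max - k_min) / (n_grid - 1)
    grid = [k_min + i * step for i in range(n_grid)]

    # payoff[i][j] = F(grid[i], grid[j]); -inf marks choices outside Gamma(k).
    payoff = []
    for k in grid:
        row = []
        for k_next in grid:
            c = k ** alpha - k_next
            row.append(math.log(c) if c > 0 else -math.inf)
        payoff.append(row)

    v = [0.0] * n_grid
    for _ in range(max_iter):
        v_new = [max(p + beta * w for p, w in zip(row, v)) for row in payoff]
        gap = max(abs(a - b) for a, b in zip(v_new, v))
        v = v_new
        if gap < tol:
            break

    # Policy: the grid point attaining the maximum for each state.
    policy = []
    for row in payoff:
        values = [p + beta * w for p, w in zip(row, v)]
        policy.append(grid[values.index(max(values))])
    return grid, v, policy

if __name__ == "__main__":
    grid, v, policy = value_function_iteration()
    for i in range(0, len(grid), 30):
        k = grid[i]
        print(f"k = {k:.3f}  g(k) on grid = {policy[i]:.3f}  closed form = {0.3 * 0.9 * k ** 0.3:.3f}")
```

The grid restricts choices, so the computed policy only approximates the closed form, with an error on the order of the grid spacing.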
6.2 Digression: Technical Details
6.2.1 Contraction Mappings
We say that $(S, \rho)$ is a metric space, if $S$ is a space and $\rho$ is a metric defined over this space with the usual properties (loosely corresponding to distance between elements of $S$).
Definition 6 Let $(S, \rho)$ be a metric space and $T : S \to S$ be an operator mapping $S$ into itself. $T$ is a contraction mapping (with modulus $\beta$) if for some $\beta \in (0, 1)$,
$$\rho(Tx, Ty) \le \beta \rho(x, y), \ \text{ for all } x, y \in S.$$
In other words, a contraction mapping brings elements of the space $S$ closer to each other.
For example, let us take a simple interval of the real line as our space, $S = [a, b]$, with usual metric of this space $\rho(x, y) = |x - y|$. Then $T : S \to S$ is a contraction if for some $\beta \in (0, 1)$,
$$\frac{|Tx - Ty|}{|x - y|} \le \beta < 1, \ \text{ all } x, y \in S \text{ with } x \neq y.$$
Definition 7 A fixed point of $T$ is any element of $S$ satisfying $Tx = x$.
Recall also that a metric space $(S, \rho)$ is complete if every Cauchy sequence in $S$ converges to an element in $S$.
Theorem 12 (Contraction Mapping Theorem) Let $(S, \rho)$ be a complete metric space, and $T : S \to S$ be a contraction. Then there exists a unique $\hat{v} \in S$ such that
$$T\hat{v} = \hat{v},$$
i.e., a unique fixed point.
Proof. Note $T^n x = T(T^{n-1} x)$ for any $n = 1, 2, \ldots$. Now take $v_0 \in S$, and a sequence $\{v_n\}_{n=0}^{\infty}$ with each element in $S$, such that $v_{n+1} = T v_n$ so that
$$v_n = T^n v_0.$$
This implies that
$$\rho(v_2, v_1) = \rho(T v_1, T v_0) \le \beta \rho(v_1, v_0),$$
where the last inequality uses the contraction property of $T$. Moreover, by induction, we have
$$\rho(v_{n+1}, v_n) \le \beta^n \rho(v_1, v_0), \qquad n = 1, 2, \ldots \qquad (6.2)$$
Hence, for any $m > n$,
$$\begin{aligned} \rho(v_m, v_n) &\le \rho(v_m, v_{m-1}) + \cdots + \rho(v_{n+2}, v_{n+1}) + \rho(v_{n+1}, v_n) \\ &\le \left( \beta^{m-1} + \cdots + \beta^{n+1} + \beta^n \right) \rho(v_1, v_0) \\ &= \beta^n \left( \beta^{m-n-1} + \cdots + \beta + 1 \right) \rho(v_1, v_0) \\ &\le \frac{\beta^n}{1 - \beta} \rho(v_1, v_0), \end{aligned}$$
where the first line uses the triangle inequality (which is true by definition for any metric), and the second line uses (6.2).
The last line implies that as $n, m \to \infty$, $v_m$ and $v_n$ are getting closer, so $\{v_n\}_{n=0}^{\infty}$ is a Cauchy sequence. Since $S$ is complete, this establishes that
$$v_n \to \hat{v} \in S.$$
Now note that for any $v_0 \in S$ and any $n \in \mathbb{N}$, we have
$$\begin{aligned} \rho(T\hat{v}, \hat{v}) &\le \rho(T\hat{v}, T^n v_0) + \rho(T^n v_0, \hat{v}) \\ &\le \beta \rho(\hat{v}, T^{n-1} v_0) + \rho(T^n v_0, \hat{v}), \end{aligned}$$
where the first line again uses the triangle inequality, and the second line the definition of the contraction. The above argument shows that both of the terms on the right tend to zero as $n \to \infty$, which implies that $\rho(T\hat{v}, \hat{v}) = 0$, establishing that $T\hat{v} = \hat{v}$, thus a fixed point exists.
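Uniqueness is proved by contradiction. Suppose that there exist $\hat{v}, \tilde{v} \in S$, such that $T\hat{v} = \hat{v}$ and $T\tilde{v} = \tilde{v}$ with $\hat{v} \neq \tilde{v}$. This implies
$$0 < a = \rho(\hat{v}, \tilde{v}) = \rho(T\hat{v}, T\tilde{v}) \le \beta \rho(\hat{v}, \tilde{v}) = \beta a.$$
Since $\beta < 1$, this yields a contradiction, proving uniqueness.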
The use of the contraction mapping theorem is that it can be applied to any complete metric space, so in particular to suitable spaces of functions. Applying it to equation (6.1) will establish
the existence of a unique value function v, greatly facilitating the analysis of such dynamic
models. Naturally, for this we have to prove that the recursion in (6.1) defines a contraction
mapping. We will see below that this is often straightforward.
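The constructive nature of the proof of Theorem 12 can be seen in a one-dimensional example. The sketch below iterates a map on $S = [0, 1]$ with the usual metric. The particular map $T(x) = \tfrac{1}{2}\cos x$ and the helper name are choices made for this illustration; $T$ maps $[0, 1]$ into itself and has $|T'(x)| \le \tfrac{1}{2}$, so it is a contraction with modulus $\beta = 1/2$.

```python
import math

def iterate_operator(T, x0, n_steps):
    """Return the sequence x0, T x0, T^2 x0, ..., T^n x0 used in the proof."""
    path = [x0]
    for _ in range(n_steps):
        path.append(T(path[-1]))
    return path

def T(x: float) -> float:
    """Illustrative contraction on [0, 1] with modulus 1/2."""
    return 0.5 * math.cos(x)

beta = 0.5
path_a = iterate_operator(T, 0.0, 60)   # start at the left end of S
path_b = iterate_operator(T, 1.0, 60)   # start at the right end of S
fixed_point = path_a[-1]

if __name__ == "__main__":
    print("limit from x0 = 0:", path_a[-1])
    print("limit from x0 = 1:", path_b[-1])
    # Bound from the proof (letting m go to infinity):
    # |x_n - fixed point| <= beta**n / (1 - beta) * |x_1 - x_0|
    for n in (0, 5, 10):
        bound = beta ** n / (1.0 - beta) * abs(path_a[1] - path_a[0])
        print(n, abs(path_a[n] - fixed_point), bound)
```

Both starting points lead to the same limit, as uniqueness requires, and the distance to the limit stays within the geometric bound derived in the proof.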
Before doing this, let us consider another useful result.
First, recall that if $(S, \rho)$ is a complete metric space and $S'$ is a closed subset of $S$, then $(S', \rho)$ is also a complete metric space.
Theorem 13 Let $(S, \rho)$ be a complete metric space, $T : S \to S$ be a contraction mapping with $T\hat{v} = \hat{v}$. If $S'$ is a closed subset of $S$, and $T(S') \subset S'$, then $\hat{v} \in S'$. Moreover, if $T(S') \subset S'' \subset S'$, then $\hat{v} \in S''$.
Proof. Take an arbitrary $v_0 \in S'$, and consider the sequence $\{T^n v_0\}_{n=0}^{\infty}$. Each element of this sequence is in $S'$ by the fact that $T(S') \subset S'$. Moreover, $T^n v_0 \to \hat{v}$ from Theorem 12. Since $S'$ is closed, $\hat{v} \in S'$, proving the first claim in the theorem. If in addition we have that $T(S') \subset S''$, then by virtue of the fact that $\hat{v} \in S'$, $\hat{v} = T\hat{v} \in S''$, proving the second part.
The second part of this theorem is very important to prove results such as strict concavity
or that a function is strictly increasing. This is because the set of strictly concave functions
or the strictly increasing functions are not closed. The second part of the theorem enables
us to avoid this complication.
How do we check that a mapping is a contraction? Here, the following theorem is useful,
especially in the context of dynamic programming. Let us use the notation $(f + a)(x) = f(x) + a$ for some $a \in \mathbb{R}$. Then:
Theorem 14 (Blackwell's sufficient conditions for a contraction) Let $X \subseteq \mathbb{R}^K$, and $B(X)$ be the space of bounded functions $f : X \to \mathbb{R}$ defined on $X$. Suppose that $T : B(X) \to B(X)$ is an operator satisfying the following two conditions:
1. (monotonicity) For any $f, g \in B(X)$, $f(x) \le g(x)$ for all $x \in X$ implies $(Tf)(x) \le (Tg)(x)$ for all $x \in X$.
2. (discounting) There exists $\beta \in (0, 1)$ such that
$$[T(f + a)](x) \le (Tf)(x) + \beta a, \ \text{ for all } f \in B(X),\ a \ge 0,\ x \in X.$$
Then, $T$ is a contraction with modulus $\beta$.
Proof. Let $f \le g$ stand for $f(x) \le g(x)$ for all $x \in X$. By definition
$$\text{for any } f, g \in B(X), \quad f \le g + \|f - g\|,$$
where again $\|\cdot\|$ is the sup norm. Now applying the operator $T$ on both sides, we have
$$Tf \le T(g + \|f - g\|) \le Tg + \beta \|f - g\|,$$
where the first inequality uses monotonicity and the second discounting. Applying the same argument in reverse establishes
$$Tg \le Tf + \beta \|f - g\|.$$
Combining these two inequalities yields
$$\|Tf - Tg\| \le \beta \|f - g\|,$$
proving that $T$ is a contraction.
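As a worked check, Blackwell's two conditions can be verified for the operator defined by the right-hand side of (6.1). This sketch assumes that $F$ is bounded and $\Gamma(x)$ is nonempty, so that the operator maps $B(X)$ into itself; those assumptions are added here and are not established in the text at this point.

```latex
% Operator associated with (6.1), for f in B(X):
\[
(Tf)(x) = \sup_{y \in \Gamma(x)} \left[ F(x, y) + \beta f(y) \right].
\]
% Monotonicity: if f(y) \le g(y) for all y, then for every y \in \Gamma(x)
\[
F(x, y) + \beta f(y) \le F(x, y) + \beta g(y)
\quad\Longrightarrow\quad (Tf)(x) \le (Tg)(x).
\]
% Discounting: for any constant a \ge 0,
\[
[T(f + a)](x) = \sup_{y \in \Gamma(x)} \left[ F(x, y) + \beta f(y) + \beta a \right]
              = (Tf)(x) + \beta a .
\]
```

Both conditions hold (discounting with equality), so under the stated assumptions $T$ is a contraction with modulus $\beta$ on $B(X)$ with the sup norm.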
6.2.2 Application of Contraction Mappings to Dynamic Programming
Let us now apply the above tools to the problem of dynamic programming, outlined at the
beginning. Consider a sequence $\{x_{t+1}^*\}_{t=0}^{\infty}$ which attains the supremum of Problem A1. We will now show that this sequence will satisfy the recursive equation of dynamic programming
$$v(x_t^*) = F(x_t^*, x_{t+1}^*) + \beta v(x_{t+1}^*), \ \text{ for all } t = 0, 1, 2, \ldots, \qquad (6.3)$$
and moreover, under some boundedness conditions, any sequence that is a solution to (6.3)
is a solution to Problem A1, in the sense that it attains its supremum. In other words, we
will establish some equivalence results between the solutions to Problem A1 and Problem
A2.
To prepare for these results, let us define the set of feasible sequences or plans starting with initial value $x_0$:
$$\Pi(x_0) = \left\{ \{x_{t+1}\}_{t=0}^{\infty} : x_{t+1} \in \Gamma(x_t),\ t = 0, 1, \ldots \right\}.$$
Let us denote a typical element of the set by $\mathbf{x} = (x_0, x_1, \ldots) \in \Pi(x_0)$, and assume:
Assumption 3 $\Gamma(x)$ is nonempty for all $x \in X$; and for all $x_0 \in X$ and $\mathbf{x} \in \Pi(x_0)$, $\lim_{n \to \infty} \sum_{t=0}^{n} \beta^t F(x_t, x_{t+1})$ exists.
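Next define the supremum function $v^* : X \to \bar{\mathbb{R}}$, where $\bar{\mathbb{R}}$ is the extended real line ($\bar{\mathbb{R}} = \mathbb{R} \cup \{+\infty, -\infty\}$), as:
$$v^*(x_0) = \sup_{\mathbf{x} \in \Pi(x_0)} u(\mathbf{x}),$$
where $u(\mathbf{x}) \equiv \lim_{n \to \infty} u_n(\mathbf{x})$ and $u_n(\mathbf{x}) \equiv \sum_{t=0}^{n} \beta^t F(x_t, x_{t+1})$ is the partial sum, whose limit exists by Assumption 3.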
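Thus $v^*(x_0)$ is the supremum in Problem A1 (i.e., the value of the program in Problem A1). Note that it follows by definition that $v^*$ is the unique function satisfying the following three conditions for Problem A1, or the sequence problem, SP:
1. if $|v^*(x_0)| < \infty$, then
$$v^*(x_0) \ge u(\mathbf{x}), \ \text{ all } \mathbf{x} \in \Pi(x_0); \qquad (6.4)$$
and for any $\varepsilon > 0$,
$$v^*(x_0) \le u(\mathbf{x}) + \varepsilon, \ \text{ some } \mathbf{x} \in \Pi(x_0); \qquad (6.5)$$
2. if $v^*(x_0) = +\infty$, then there exists a sequence $\{\mathbf{x}^k\}$ in $\Pi(x_0)$ such that $\lim_{k \to \infty} u(\mathbf{x}^k) = +\infty$; and
3. if $v^*(x_0) = -\infty$, then $u(\mathbf{x}) = -\infty$, for all $\mathbf{x} \in \Pi(x_0)$.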
Conversely, we will say that $v$ is a solution to Problem A2 (and thus satisfies the functional equation (6.1)), if the following three conditions for FE hold:
1. If $|v(x_0)| < \infty$, then
$$v(x_0) \ge F(x_0, y) + \beta v(y), \ \text{ all } y \in \Gamma(x_0), \qquad (6.6)$$
and for any $\varepsilon > 0$,
$$v(x_0) \le F(x_0, y) + \beta v(y) + \varepsilon, \ \text{ some } y \in \Gamma(x_0); \qquad (6.7)$$
2. if $v(x_0) = +\infty$, then there exists a sequence $\{y^k\}$ in $\Gamma(x_0)$ such that
$$\lim_{k \to \infty} \left[ F(x_0, y^k) + \beta v(y^k) \right] = +\infty; \qquad (6.8)$$
3. if $v(x_0) = -\infty$, then
$$F(x_0, y) + \beta v(y) = -\infty, \ \text{ all } y \in \Gamma(x_0). \qquad (6.9)$$
We now have the following simple lemma:
Lemma 1 Let $X$, $\Gamma$, $F$, and $\beta$ satisfy Assumption 3. Then for any $x_0 \in X$ and any $\mathbf{x} = (x_0, x_1, \ldots) \in \Pi(x_0)$,
$$u(\mathbf{x}) = F(x_0, x_1) + \beta u(\mathbf{x}')$$
with $\mathbf{x}' = (x_1, x_2, \ldots)$.
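Proof. Under Assumption 3, for any $x_0 \in X$ and any $\mathbf{x} \in \Pi(x_0)$,
$$\begin{aligned} u(\mathbf{x}) &= \lim_{n \to \infty} \sum_{t=0}^{n} \beta^t F(x_t, x_{t+1}) \\ &= F(x_0, x_1) + \beta \lim_{n \to \infty} \sum_{t=0}^{n} \beta^t F(x_{t+1}, x_{t+2}) \\ &= F(x_0, x_1) + \beta u(\mathbf{x}'). \end{aligned}$$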
This lemma basically says that the utility from any feasible plan can be decomposed into
two parts, the current return and continuation value. It therefore formalizes the principle of
optimality introduced more informally above.
Theorem 15 Let $X$, $\Gamma$, $F$, and $\beta$ satisfy Assumption 3. Then the function $v^*$ is a solution to Problem A2.
Proof. If $\beta = 0$, the result is trivial. Suppose that $\beta > 0$, and choose $x_0 \in X$.
Suppose $v^*(x_0)$ is finite. Then SP conditions (6.4) and (6.5) hold, and it is sufficient to show that this implies that the FE conditions (6.6) and (6.7) hold. To establish (6.6), let $x_1 \in \Gamma(x_0)$ and $\varepsilon > 0$ be given. Then by SP (6.5) there exists $\mathbf{x}' = (x_1, x_2, \ldots) \in \Pi(x_1)$ such that $u(\mathbf{x}') \ge v^*(x_1) - \varepsilon$. Note also that $\mathbf{x} = (x_0, x_1, x_2, \ldots) \in \Pi(x_0)$. Hence it follows from SP (6.4) and Lemma 1 that
$$v^*(x_0) \ge u(\mathbf{x}) = F(x_0, x_1) + \beta u(\mathbf{x}') \ge F(x_0, x_1) + \beta v^*(x_1) - \beta \varepsilon$$
for any $\varepsilon > 0$, establishing FE (6.6).
To establish FE (6.7), choose $x_0 \in X$ and $\varepsilon > 0$. From SP (6.5) and Lemma 1, it follows that one can choose $\mathbf{x} = (x_0, x_1, \ldots) \in \Pi(x_0)$, so that
$$v^*(x_0) \le u(\mathbf{x}) + \varepsilon = F(x_0, x_1) + \beta u(\mathbf{x}') + \varepsilon,$$
where $\mathbf{x}' = (x_1, x_2, \ldots)$. It then follows from SP (6.4) that
$$v^*(x_0) \le F(x_0, x_1) + \beta v^*(x_1) + \varepsilon.$$
Since $x_1 \in \Gamma(x_0)$, this establishes FE (6.7).
If $v^*(x_0) = +\infty$, then there exists a sequence $\{\mathbf{x}^k\}$ in $\Pi(x_0)$ such that $\lim_{k \to \infty} u(\mathbf{x}^k) = +\infty$. Since $x_1^k \in \Gamma(x_0)$, all $k$, and
$$u(\mathbf{x}^k) = F(x_0, x_1^k) + \beta u(\mathbf{x}'^k) \le F(x_0, x_1^k) + \beta v^*(x_1^k), \ \text{ all } k,$$
it follows that FE (6.8) holds for the sequence $\{y^k = x_1^k\}$ in $\Gamma(x_0)$. If $v^*(x_0) = -\infty$, then
$$u(\mathbf{x}) = F(x_0, x_1) + \beta u(\mathbf{x}') = -\infty, \ \text{ all } (x_0, x_1, x_2, \ldots) = \mathbf{x} \in \Pi(x_0),$$
where $\mathbf{x}' = (x_1, x_2, \ldots)$. Since $F$ is real-valued (thus does not take the values $-\infty$ or $+\infty$), it follows that
$$u(\mathbf{x}') = -\infty, \ \text{ all } x_1 \in \Gamma(x_0), \text{ all } \mathbf{x}' \in \Pi(x_1).$$
Hence $v^*(x_1) = -\infty$, all $x_1 \in \Gamma(x_0)$. Since $F$ is real-valued and $\beta > 0$, (6.9) follows immediately.
Under the additional boundedness condition, we have the following converse to this
theorem:
Theorem 16 Let $X$, $\Gamma$, $F$, and $\beta$ satisfy Assumption 3. If $v$ is a solution to (FE) and satisfies
\[ \lim_{n \to \infty} \beta^n v(x_n) = 0, \quad \text{all } (x_0, x_1, \ldots) \in \Pi(x_0), \text{ all } x_0 \in X, \tag{6.10} \]
then $v = v^*$.
Proof. (sketch) Condition (6.10) implies that $v$ cannot take on the values $+\infty$ or $-\infty$. Hence $v$ satisfies (6.6) and (6.7), and it is sufficient to show that this implies $v$ satisfies (6.4) and (6.5).
Since $v$ is a solution to Problem A2, (6.6) implies that for all $x_0 \in X$ and $\mathbf{x} \in \Pi(x_0)$
\[
\begin{aligned}
v(x_0) &\ge F(x_0, x_1) + \beta v(x_1) \\
&\ge F(x_0, x_1) + \beta F(x_1, x_2) + \beta^2 v(x_2) \\
&\;\;\vdots \\
&\ge u_n(\mathbf{x}) + \beta^{n+1} v(x_{n+1}).
\end{aligned}
\]
Now taking the limit as $n \to \infty$ and using the convergence property from (6.10), we obtain (6.4) for any $\mathbf{x} \in \Pi(x_0)$.

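Now for a given $x_0 \in X$ and $\varepsilon > 0$, choose an arbitrary sequence $\{\delta_n\}_{n=1}^{\infty}$ in $\mathbb{R}_+$ such that $\sum_{n=1}^{\infty} \beta^{n-1} \delta_n \le \varepsilon/2$. Since (6.7) holds, we can choose $x_{t+1} \in \Gamma(x_t)$ so that
\[ v(x_t) \le F(x_t, x_{t+1}) + \beta v(x_{t+1}) + \delta_{t+1}. \]
Using these inequalities, we obtain that for the plan $\mathbf{x} = (x_0, x_1, x_2, \ldots) \in \Pi(x_0)$ constructed in this way, we have
\[
\begin{aligned}
v(x_0) &\le u_n(\mathbf{x}) + \beta^{n+1} v(x_{n+1}) + (\delta_1 + \beta \delta_2 + \ldots + \beta^n \delta_{n+1}) \\
&\le u_n(\mathbf{x}) + \beta^{n+1} v(x_{n+1}) + \varepsilon/2, \quad n = 1, 2, \ldots
\end{aligned}
\]
Since (6.10) implies that for $n$ sufficiently large the second term is also less than $\varepsilon/2$, it follows that as $n \to \infty$,
\[ v(x_0) \le u(\mathbf{x}) + \varepsilon, \]
completing the proof.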
An important implication is that although Problem A2 may have many solutions, only one of those will satisfy the convergence condition (6.10). In general, we can make a lot of progress by studying solutions to Problem A2, but sometimes we need to impose (6.10) in order to pick the right solution (this is similar to working with necessary conditions for optimization, where we then also need to impose the sufficiency conditions).
Naturally, our interest is mainly with optimal plans. For this we have:
Theorem 17 Let $X$, $\Gamma$, $F$, and $\beta$ satisfy Assumption 3. Let $\mathbf{x}^* \in \Pi(x_0)$ be a feasible plan that attains the supremum in Problem A1 starting with initial state $x_0$. Then
\[ v^*(x_t^*) = F(x_t^*, x_{t+1}^*) + \beta v^*(x_{t+1}^*), \quad t = 0, 1, 2, \ldots \tag{6.11} \]

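Proof. Since $\mathbf{x}^*$ attains the supremum,
\[
\begin{aligned}
v^*(x_0) = u(\mathbf{x}^*) &= F(x_0, x_1^*) + \beta u(\mathbf{x}^{*\prime}) \\
&\ge u(\mathbf{x}) = F(x_0, x_1) + \beta u(\mathbf{x}'), \quad \text{all } \mathbf{x} \in \Pi(x_0).
\end{aligned}
\tag{6.12}
\]
Now choose $x_1 = x_1^*$; (6.12) still holds. Since $(x_1^*, x_2, x_3, \ldots) \in \Pi(x_1^*)$ implies that $(x_0, x_1^*, x_2, x_3, \ldots) \in \Pi(x_0)$, it follows that
\[ u(\mathbf{x}^{*\prime}) \ge u(\mathbf{x}'), \quad \text{all } \mathbf{x}' \in \Pi(x_1^*). \]
Therefore $u(\mathbf{x}^{*\prime}) = v^*(x_1^*)$. Substituting this into (6.12) yields (6.11) for $t = 0$. Continuing by induction establishes (6.11) for all $t$.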
Finally, the converse to this theorem is:
Theorem 18 Let $X$, $\Gamma$, $F$, and $\beta$ satisfy Assumption 3. Let $\mathbf{x}^* \in \Pi(x_0)$ be a feasible plan from $x_0$ satisfying (6.11), and with
\[ \limsup_{t \to \infty} \beta^t v^*(x_t^*) \le 0. \tag{6.13} \]
Then $\mathbf{x}^*$ attains the supremum in Problem A1 for initial state $x_0$.

Proof. Suppose that $\mathbf{x}^* \in \Pi(x_0)$ satisfies (6.11) and (6.13). Then it follows by induction on (6.11) that
\[ v^*(x_0) = u_n(\mathbf{x}^*) + \beta^{n+1} v^*(x_{n+1}^*), \quad n = 1, 2, \ldots \]
Then using (6.13), we find that $v^*(x_0) \le u(\mathbf{x}^*)$. Since $\mathbf{x}^* \in \Pi(x_0)$, the reverse inequality holds, establishing the result.
The above theorems are useful in showing the equivalence of Problem A1 and Problem A2. Now the usefulness of the dynamic programming formulation in Problem A2, and hence of the contraction mapping theorem, comes from the fact that its solution is often easy to characterize. So for this purpose, take the following version of the dynamic programming problem (Problem A2)
\[ v(x) = \max_{y \in \Gamma(x)} \left[ F(x, y) + \beta v(y) \right], \tag{6.14} \]
where $\beta < 1$. As before, $X$ is the possible set of values for the state variable and $\Gamma : X \rightrightarrows X$ is the correspondence describing the constraints on the problem. We now make an additional assumption, which is not necessary, but greatly simplifies the analysis.
Assumption 4 $X$ is a compact subset of $\mathbb{R}^K$, $\Gamma$ is nonempty, compact-valued and continuous. Moreover, let $A = \{(x, y) \in X \times X : y \in \Gamma(x)\}$ and $F : A \to \mathbb{R}$ be bounded and continuous.
The importance of Assumption 4 is that it will allow us to focus on the space of bounded functions. Most importantly, since $F$ is bounded over its effective domain, there exists some $B < \infty$, such that $|F(x, y)| < B$ for all $(x, y) \in A$. This immediately implies that $|v^*(x)| \le B/(1 - \beta)$, all $x \in X$. Consequently, we can focus our attention on value functions in the space $C(X)$ of continuous bounded functions defined on $X$, with the natural norm on this space, the sup norm, $\|f\| = \sup_{x \in X} |f(x)|$.
In particular, to see the usefulness of the contraction mapping theorem, now define the operator $T$ such that
\[ (Tf)(x) = \max_{y \in \Gamma(x)} \left[ F(x, y) + \beta f(y) \right]. \tag{6.15} \]
A fixed point of this operator, $v = Tv$, will be a solution to (6.14), establishing the desired results. Then we can derive the policy functions from the value function.

Theorem 19 Let $X$, $\Gamma$, $F$, and $\beta$ satisfy Assumption 4 and let $C(X)$ be the space of bounded continuous functions $f : X \to \mathbb{R}$, with the sup norm. Then the operator $T$ maps $C(X)$ into itself, i.e., $T : C(X) \to C(X)$, and has a unique fixed point, $v \in C(X)$ satisfying (6.14).
Proof. Formulated in this way, it is immediate that $T$ is a contraction. Since the maximization problem on the right hand side of (6.15) is one of maximizing a continuous function over a compact set, it has a solution. Consequently, $T$ is well defined and is easily seen to satisfy the sufficient conditions for a contraction in Theorem 14. Therefore, applying Theorem 12, a unique $v \in C(X)$ satisfying (6.14) exists.
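The contraction property behind Theorem 19 can be seen numerically. The sketch below applies a grid version of the operator in (6.15) to two arbitrary functions and checks that their sup-norm distance shrinks by at least the factor $\beta$, then iterates on $T$ until successive iterates are close. The return function $F(x, y) = \log(x^{\alpha} - y)$, the parameter values and the grid are assumptions made for illustration; the grid is chosen so that every $y$ is feasible from every $x$, which keeps $F$ bounded as in Assumption 4.

```python
import math

alpha, beta = 0.3, 0.95  # assumed technology and discount parameters
n = 40
grid = [0.05 + i * 0.35 / (n - 1) for i in range(n)]  # state grid on [0.05, 0.40]

# Return matrix F[i][j] = log(x_i**alpha - y_j). Every y on the grid is feasible
# because 0.05**alpha > 0.40, so Gamma(x) is the whole grid for each x.
F = [[math.log(x**alpha - y) for y in grid] for x in grid]

def T(f):
    """Grid version of the operator in (6.15)."""
    return [max(F[i][j] + beta * f[j] for j in range(n)) for i in range(n)]

def sup_dist(f, g):
    """Sup-norm distance between two functions stored on the grid."""
    return max(abs(a - b) for a, b in zip(f, g))

# Contraction check on two arbitrary starting functions.
v = [0.0] * n
w = [math.sin(10 * x) for x in grid]
ratio = sup_dist(T(v), T(w)) / sup_dist(v, w)

# Iterate on T; the change between successive iterates shrinks geometrically.
gap = float("inf")
for _ in range(300):
    v_next = T(v)
    gap = sup_dist(v_next, v)
    v = v_next

print(f"distance ratio after one application of T: {ratio:.4f} (beta = {beta})")
print(f"sup-norm change at the last iteration: {gap:.2e}")
```

Iterating on $T$ from any bounded starting function is the usual way the fixed point is computed in practice; the starting guess only affects how many iterations are needed.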
Corollary 4 Let $G : X \rightrightarrows X$ defined as
\[ G(x) = \{ y \in \Gamma(x) : v(x) = F(x, y) + \beta v(y) \}, \tag{6.16} \]
be the policy function (correspondence). Under the assumptions of Theorem 19, $G$ is compact valued and upper hemi-continuous.
Proof. This follows immediately from Berge's maximum theorem.
We can next see how Theorem 13 enables us to establish more properties of the value
function and the policy correspondence. In particular, for example, let us assume
Assumption 5 For each $y$, $F(\cdot, y)$ is strictly increasing in each of its first $K$ arguments, and $\Gamma$ is monotone in the sense that $x \le x'$ implies $\Gamma(x) \subseteq \Gamma(x')$.
Theorem 20 Let $X$, $\Gamma$, $F$, and $\beta$ satisfy Assumptions 4 and 5, and let $v$ be the unique solution to (6.14). Then $v$ is strictly increasing.

Proof. Let $C'(X) \subset C(X)$ be the set of bounded, continuous, nondecreasing functions on $X$, and let $C''(X) \subset C'(X)$ be the set of strictly increasing functions. Since $C'(X)$ is a closed subset of the complete metric space $C(X)$, by Theorem 13, it is sufficient to show that $T[C'(X)] \subseteq C''(X)$. Assumption 5 immediately implies that for any nondecreasing $f$, $Tf$ is strictly increasing, establishing the result.
Furthermore, let us impose
Assumption 6 $F$ is strictly concave, i.e.,
\[ F[\theta (x, y) + (1 - \theta)(x', y')] \ge \theta F(x, y) + (1 - \theta) F(x', y'), \quad \text{all } (x, y), (x', y') \in A, \text{ and all } \theta \in (0, 1). \]
In addition, the inequality is strict if $x \ne x'$.
Moreover, $\Gamma$ is convex in the sense that for any $0 \le \theta \le 1$, and $x, x' \in X$, $y \in \Gamma(x)$ and $y' \in \Gamma(x')$ implies
\[ \theta y + (1 - \theta) y' \in \Gamma[\theta x + (1 - \theta) x']. \]
This assumption imposes enough concavity on the problem; in particular, it rules out increasing returns of any form.
Theorem 21 Let $X$, $\Gamma$, $F$, and $\beta$ satisfy Assumptions 4, 5 and 6, and let $v$ satisfy (6.14); and let $G$ satisfy (6.16). Then $v$ is strictly concave and $G$ is a continuous, single-valued function.
Proof. The proof again follows from Theorem 13. Let $C'(X) \subset C(X)$ be the set of bounded, continuous, (weakly) concave functions on $X$, and let $C''(X) \subset C'(X)$ be the set of strictly concave functions. Since $C'(X)$ is a closed subset of the complete metric space $C(X)$, by Theorem 13, $T[C'(X)] \subseteq C''(X)$ would establish the results. To see this, let $f \in C'(X)$ and let
\[ x_0 \ne x_1, \quad \theta \in (0, 1), \quad \text{and } x_\theta = \theta x_0 + (1 - \theta) x_1. \]
Let $y_i \in \Gamma(x_i)$ attain $(Tf)(x_i)$, for $i = 0, 1$. Then Assumption 6 implies that $y_\theta = \theta y_0 + (1 - \theta) y_1 \in \Gamma(x_\theta)$, so that
\[
\begin{aligned}
(Tf)(x_\theta) &\ge F(x_\theta, y_\theta) + \beta f(y_\theta) \\
&> \theta [F(x_0, y_0) + \beta f(y_0)] + (1 - \theta)[F(x_1, y_1) + \beta f(y_1)] \\
&= \theta (Tf)(x_0) + (1 - \theta)(Tf)(x_1),
\end{aligned}
\]
where the first line is a simple implication of (6.15) and the fact that $y_\theta \in \Gamma(x_\theta)$; the second line uses the hypothesis that $f$ is concave and the concavity restriction on $F$ from Assumption 6. Since these relationships are true for any $f \in C'(X)$, they establish $T[C'(X)] \subseteq C''(X)$, so that the unique fixed point $v$ is strictly concave. Since, from Assumption 6, $F$ is also concave and for each $x \in X$, $\Gamma(x)$ is convex, it follows that the maximum in (6.15) is attained at a unique $y$ value. Hence $G$ is a single-valued function, and its continuity follows from the fact that it is upper hemi-continuous.
Finally, by also assuming differentiability, we can prove that the value function is differentiable.
Assumption 7 $F$ is continuously differentiable on the interior of its domain $A$.
Theorem 22 Let $X$, $\Gamma$, $F$, and $\beta$ satisfy Assumptions 4, 5, 6 and 7. Furthermore, let $v$ satisfy (6.14) and $G$ satisfy (6.16). Suppose also that $x_0 \in \mathrm{Int}\, X$ and $G(x_0) \in \mathrm{Int}\, \Gamma(x_0)$; then $v$ is continuously differentiable at $x_0$.

Proof. From Theorem 21, $G$ is a function (i.e., single valued). Moreover, since $G(x_0) \in \mathrm{Int}\, \Gamma(x_0)$ and $\Gamma$ is continuous, it follows that $G(x_0) \in \mathrm{Int}\, \Gamma(x)$, for all $x$ in some neighborhood $D$ of $x_0$. Define $W(\cdot)$ on $D$ by
\[ W(x) = F[x, G(x_0)] + \beta v[G(x_0)]. \]
Since $F$ is concave (Assumption 6) and differentiable (Assumption 7), it follows that $W(\cdot)$ is concave and differentiable. Moreover, since $G(x_0) \in \Gamma(x)$ for all $x \in D$, it follows that
\[ W(x) \le \max_{y \in \Gamma(x)} \left[ F(x, y) + \beta v(y) \right] = v(x), \quad \text{for all } x \in D \tag{6.17} \]
with equality at $x_0$. Now, we show that (6.17) implies that $v(\cdot)$ is differentiable. For this note that $v(\cdot)$ is concave, thus $-v(\cdot)$ is convex, and by a standard result in convex analysis, it possesses subgradients. Moreover, if $-p$ is a subgradient of $-v$ at $x_0$, then $p$ must satisfy
\[ p \cdot (x - x_0) \ge v(x) - v(x_0) \ge W(x) - W(x_0), \quad \text{for all } x \in D, \]
where the first inequality uses the definition of a subgradient and the second uses the fact that $W(x) \le v(x)$, with equality at $x_0$ as established in (6.17). Since $W$ is differentiable at $x_0$, $p$ is unique, and again by a standard result in convex analysis, any convex function with a unique subgradient at an interior point $x_0$ is differentiable at $x_0$. This establishes that $-v(\cdot)$, thus $v(\cdot)$, is differentiable as desired.
6.3 Back to the Fundamentals of Dynamic Programming
6.3.1 Basic Equations
Now consider the functional equation
\[ v(x) = \max_{y \in \Gamma(x)} \left[ F(x, y) + \beta v(y) \right], \quad \text{for all } x \in X. \tag{6.18} \]
We know that the solution to our problem has to satisfy this functional equation. Moreover, let us assume (as proved under some conditions above) that the value function $v$ is differentiable (we take the payoff function $F$ to be differentiable everywhere). Moreover, consider $y^* \in \mathrm{Int}\, \Gamma(x)$, in other words, the constraints on the problem are not binding. Then we can write a convenient Euler equation for this problem (again using *'s to denote optimal values) as
\[ \nabla_y F(x^*, y^*) + \beta \nabla_y v(y^*) = 0. \]
Let us first focus on the case where both $x$ and $y$ are real numbers. Then, we have the simpler condition:
\[ \frac{\partial F(x^*, y^*)}{\partial y} + \beta v'(y^*) = 0. \tag{6.19} \]
This is very intuitive; it requires the sum of the marginal gain today from increasing $y$ and the discounted marginal gain from increasing $y$ on the value of all future returns to be equal to zero. For example, we can think of $F$ as being decreasing in $y$ and increasing in $x$ (recall for example the representation of the basic growth model with $F(x, y)$ corresponding to $u(f(x) - y + (1 - \delta)x)$ or $u(f(k(t)) - k(t+1) + (1 - \delta)k(t))$). In this case, equation (6.19) requires the current cost of increasing $y$ to be compensated by higher values tomorrow. In the context of growth, this corresponds to the current cost of reducing consumption being compensated by higher consumption tomorrow.
This is a very nice condition, but it involves $v'(y)$, i.e., the derivative of the value function, which we do not know. Here we can use the equivalent of the Envelope Theorem for dynamic programming, and differentiate (6.18):
\[ \nabla_x v(x) = \nabla_x F(x, y^*), \]
where $\nabla_z f$ denotes the gradient vector of function $f$ with respect to the vector $z$. In the case of one-dimensional variables, we have the more intuitive equation:
\[ v'(x) = \frac{\partial F(x, y^*)}{\partial x}. \tag{6.20} \]
These equations follow from the fact that $x$ does not appear directly anywhere else (and its effects through $y$, i.e., $\nabla_x y$ or $\partial y/\partial x$, can be ignored, given the optimality condition (6.19)).
Now in the one-dimensional case, combining (6.20) together with (6.19), we have the following very useful condition:
\[ \frac{\partial F(x^*, y^*)}{\partial y} + \beta \frac{\partial F(y^*, g(y^*))}{\partial x} = 0 \]
where $\partial/\partial x$ denotes the derivative with respect to the first argument and $\partial/\partial y$ with respect to the second argument, and $g(x)$ is the optimal policy given state variable $x$.
Alternatively, we could write this with the time subscripts as
\[ \frac{\partial F(x_t^*, x_{t+1}^*)}{\partial x_{t+1}} + \beta \frac{\partial F(x_{t+1}^*, x_{t+2}^*)}{\partial x_{t+1}} = 0. \tag{6.21} \]
However, this Euler equation is not sufficient for optimality. In addition we need the transversality condition. In the more general case this is equivalent to:
\[ \lim_{t \to \infty} \beta^t \nabla_{x_t} F(x_t^*, x_{t+1}^*) \cdot x_t^* = 0 \]
where $\cdot$ denotes the inner product operator. In the one-dimensional case, we have the simpler transversality condition:
\[ \lim_{t \to \infty} \beta^t \frac{\partial F(x_t^*, x_{t+1}^*)}{\partial x_t} x_t^* = 0. \tag{6.22} \]
In words, this condition requires that the product of the marginal return from the state variable $x$ times the value of this state variable does not increase asymptotically at a rate faster than $1/\beta$.
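As a worked check of (6.21) and (6.22), consider a standard special case that is an assumption of this example rather than part of the text: $F(x, y) = \log(x^{\alpha} - y)$ with $0 < \alpha < 1$, together with the candidate policy $x_{t+1} = \alpha\beta x_t^{\alpha}$. This $F$ is unbounded near zero, so Assumption 4 holds only on a compact set of strictly positive states; the computation is meant as a sketch of how the two conditions are verified.

```latex
% Partial derivatives of F(x,y) = \log(x^\alpha - y)
\frac{\partial F(x,y)}{\partial y} = -\frac{1}{x^{\alpha}-y}, \qquad
\frac{\partial F(x,y)}{\partial x} = \frac{\alpha x^{\alpha-1}}{x^{\alpha}-y}.

% Under x_{t+1} = \alpha\beta x_t^\alpha we have x_t^\alpha - x_{t+1} = (1-\alpha\beta) x_t^\alpha.
% Euler equation (6.21):
-\frac{1}{(1-\alpha\beta)\,x_t^{\alpha}}
+ \beta\,\frac{\alpha x_{t+1}^{\alpha-1}}{(1-\alpha\beta)\,x_{t+1}^{\alpha}}
= -\frac{1}{(1-\alpha\beta)\,x_t^{\alpha}}
+ \frac{\alpha\beta}{(1-\alpha\beta)\,\alpha\beta\,x_t^{\alpha}} = 0.

% Transversality condition (6.22):
\lim_{t\to\infty}\beta^{t}\,\frac{\alpha x_t^{\alpha-1}}{(1-\alpha\beta)\,x_t^{\alpha}}\,x_t
= \lim_{t\to\infty}\beta^{t}\,\frac{\alpha}{1-\alpha\beta} = 0.
```

In this example the product of the marginal return and the state is constant over time, so discounting alone drives the transversality term to zero.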
We will see why this transversality condition makes sense shortly. But for now, we can
note the following theorem:
Theorem 23 Let $X \subset \mathbb{R}^K_+$, and suppose that $X$, $\Gamma$, $F$, and $\beta$ satisfy Assumptions 4, 5, 6 and 7. Then the sequence $\{x_{t+1}^*\}_{t=0}^{\infty}$, with $x_{t+1}^* \in \mathrm{Int}\, \Gamma(x_t^*)$, $t = 0, 1, \ldots$, is optimal for Problem A1 given $x_0$, if it satisfies (6.21) and (6.22).
Proof. Let $x_0$ be given; let $\{x_t^*\}$ be a feasible (nonnegative) sequence satisfying (6.21) and (6.22) and $\{x_t\}$ another feasible (nonnegative) sequence. Assumptions 4, 6 and 7 imply that $F$ is continuous, concave, and differentiable, so let us define
\[ \Delta \equiv \lim_{T \to \infty} \sum_{t=0}^{T} \beta^t \left[ F(x_t^*, x_{t+1}^*) - F(x_t, x_{t+1}) \right] \]
as the difference of the objective function between the feasible sequences $\{x_t^*\}$ and $\{x_t\}$. If we establish that $\Delta$ is nonnegative for any feasible nonnegative sequence $\{x_t\}$, then we will have established that $\{x_t^*\}$ yields no lower utility than any feasible $\{x_t\}$, thus it must be optimal.
Now by definition of a concave function, we have
\[ \Delta \ge \lim_{T \to \infty} \sum_{t=0}^{T} \beta^t \left[ F_x(x_t^*, x_{t+1}^*) \cdot (x_t^* - x_t) + F_y(x_t^*, x_{t+1}^*) \cdot (x_{t+1}^* - x_{t+1}) \right]. \]
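Since $x_0^* - x_0 = 0$, rearranging terms gives
\[
\Delta \ge \lim_{T \to \infty} \left\{ \sum_{t=0}^{T-1} \beta^t \left[ F_y(x_t^*, x_{t+1}^*) + \beta F_x(x_{t+1}^*, x_{t+2}^*) \right] \cdot (x_{t+1}^* - x_{t+1}) + \beta^T F_y(x_T^*, x_{T+1}^*) \cdot (x_{T+1}^* - x_{T+1}) \right\}.
\]
Since $\{x_t^*\}$ satisfies (6.21), the terms in the summation are all zero. Therefore, substituting from (6.21) into the last term and then using (6.22) gives
\[
\begin{aligned}
\Delta &\ge -\lim_{T \to \infty} \beta^T F_x(x_T^*, x_{T+1}^*) \cdot (x_T^* - x_T) \\
&\ge -\lim_{T \to \infty} \beta^T F_x(x_T^*, x_{T+1}^*) \cdot x_T^*,
\end{aligned}
\]
where the last line uses the fact that from Assumption 5, $F$ is increasing in $x$, i.e., $F_x \ge 0$, and $x_t \ge 0$, all $t$. $\Delta \ge 0$ then immediately follows from (6.22), establishing the desired result.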
6.3.2 Dynamic Programming Versus the Sequence Problem
To get more insights into dynamic programming, let us return to the sequence problem. Also, let us suppose that $x_t$ is one dimensional and that there is a finite horizon $T$. Then the problem becomes
\[ \max_{\{x_{t+1}\}_{t=0}^{T}} \sum_{t=0}^{T} \beta^t F(x_t, x_{t+1}) \]
subject to $x_{t+1} \ge 0$ with $x_0$ as given. Moreover, let $F(x_T, x_{T+1})$ be the last period's utility, with $x_{T+1}$ as the state variable left after the last period (this utility could be thought of as the salvage value for example), since the world ends after date $T$.
In this case, we have a finite-dimensional optimization problem and we can simply look at first-order conditions. Moreover, let us again assume that the optimal solution lies in the interior of the constraint set, i.e., $x_t^* > 0$, so that we do not have to worry about boundary conditions and complementary-slackness type conditions. Given these, the first-order conditions of this finite-dimensional problem are exactly the same as the above Euler equation. In particular, we have
\[ \text{for any } 0 \le t \le T - 1, \quad \beta^t \frac{\partial F(x_t^*, x_{t+1}^*)}{\partial x_{t+1}} + \beta^{t+1} \frac{\partial F(x_{t+1}^*, x_{t+2}^*)}{\partial x_{t+1}} = 0, \]
or
\[ \text{for any } 0 \le t \le T - 1, \quad \frac{\partial F(x_t^*, x_{t+1}^*)}{\partial x_{t+1}} + \beta \frac{\partial F(x_{t+1}^*, x_{t+2}^*)}{\partial x_{t+1}} = 0, \]
which are identical to the Euler equations for the infinite-horizon case. In addition, for $x_{T+1}$, we have the following boundary condition
\[ x_{T+1}^* \ge 0, \quad \text{and} \quad \beta^T \frac{\partial F(x_T^*, x_{T+1}^*)}{\partial x_{T+1}} x_{T+1}^* = 0. \tag{6.23} \]
Intuitively, this boundary condition requires that $x_{T+1}^*$ should be positive only if an interior value of it maximizes the salvage value at the end.
Again, returning to the growth example for a second, recall that
\[ F(x, y) = u(f(x) + (1 - \delta)x - y), \]
with the mapping $x = k_t$ and $y = k_{t+1}$.
Now in this case at the last date $T$, we have
\[ \frac{\partial F(x_T^*, x_{T+1}^*)}{\partial x_{T+1}} = -u'(c_T^*) < 0. \]
Therefore, we must have $k_{T+1}^* = 0$, i.e., there will be no capital left at the end of the world. This is very intuitive. If any of it were left, utility could be improved by consuming that capital either at the last date or at some earlier date.
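The finite-horizon logic can be checked in a case with a known closed form. The sketch below assumes log utility, $f(k) = k^{\alpha}$ and full depreciation ($\delta = 1$), none of which is imposed in the text. Under these assumptions the value function with $j$ periods left has the form $A_j + B_j \log k$ with $B_0 = 0$ and $B_{j+1} = \alpha(1 + \beta B_j)$, and the optimal savings rate out of output is $\beta B_j/(1 + \beta B_j)$ when $j$ periods remain after the current one. The code builds the path and verifies both the Euler equations and the boundary condition $k_{T+1} = 0$.

```python
alpha, beta, T = 0.3, 0.95, 10  # assumed parameters and horizon
k0 = 0.2                        # assumed initial capital

# B[j]: coefficient on log(k) in the value function when j periods are left.
B = [0.0]
for _ in range(T + 1):
    B.append(alpha * (1 + beta * B[-1]))

k = [k0]  # capital path k_0, ..., k_{T+1}
c = []    # consumption path c_0, ..., c_T
for t in range(T + 1):
    j = T - t                                   # periods remaining after date t
    s_rate = beta * B[j] / (1 + beta * B[j])    # savings rate out of output
    k_next = s_rate * k[t] ** alpha
    c.append(k[t] ** alpha - k_next)
    k.append(k_next)

# Euler equation u'(c_t) = beta * f'(k_{t+1}) * u'(c_{t+1}) for t = 0, ..., T-1.
euler_gaps = [
    abs(1 / c[t] - beta * alpha * k[t + 1] ** (alpha - 1) / c[t + 1])
    for t in range(T)
]

print("largest Euler residual:", max(euler_gaps))
print("capital left after the last period:", k[-1])
```

The savings rate falls as the horizon approaches and is exactly zero at date $T$, which is the discrete counterpart of the boundary condition (6.23).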
Now, heuristically we can derive the transversality condition as an extension of condition (6.23) to $T \to \infty$. Take this limit, which implies
\[ \lim_{T \to \infty} \beta^T \frac{\partial F(x_T^*, x_{T+1}^*)}{\partial x_{T+1}} x_{T+1}^* = 0. \]
Moreover, as $T \to \infty$, we have the Euler equation
\[ \frac{\partial F(x_T^*, x_{T+1}^*)}{\partial x_{T+1}} + \beta \frac{\partial F(x_{T+1}^*, x_{T+2}^*)}{\partial x_{T+1}} = 0, \]
thus substituting into the previous equation, we have
\[ -\lim_{T \to \infty} \beta^{T+1} \frac{\partial F(x_{T+1}^*, x_{T+2}^*)}{\partial x_{T+1}} x_{T+1}^* = 0, \]
or canceling the negative sign, and without loss of any generality, changing the timing:
\[ \lim_{T \to \infty} \beta^T \frac{\partial F(x_T^*, x_{T+1}^*)}{\partial x_T} x_T^* = 0, \]
which is exactly the transversality condition (6.22). This derivation also emphasizes that alternatively we could have had the transversality condition as
\[ \lim_{T \to \infty} \beta^T \frac{\partial F(x_T^*, x_{T+1}^*)}{\partial x_{T+1}} x_{T+1}^* = 0, \]
which emphasizes that there is no unique transversality condition, but we generally need a boundary condition at infinity, which would be one of multiple potential conditions. This issue will return when we look at optimal control in continuous time.
Therefore, a slightly different (and more heuristic) way of obtaining Theorem 23 is to consider the above sequence problem with $T \to \infty$, i.e.,
\[ \max_{\{x_{t+1}\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t F(x_t, x_{t+1}). \]
By taking the limit of the above finite-dimensional conditions, we obtain the Euler equation:
\[ \frac{\partial F(x_t^*, x_{t+1}^*)}{\partial x_{t+1}} + \beta \frac{\partial F(x_{t+1}^*, x_{t+2}^*)}{\partial x_{t+1}} = 0 \quad \text{for all } t \ge 0, \]
and now the transversality condition (6.22) is also necessary, which can be established by using a variational argument, or heuristically, as the limit of the boundary condition as derived above.
6.4 Optimal Growth in Discrete Time
We are now in a position to apply the methods developed so far to the problem of optimal
growth. In this section, I will limit myself to optimal growth.
Recall the optimal growth problem as
\[ \max_{\{c(t), k(t)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u(c(t)) \tag{6.24} \]
subject to
\[ k(t+1) = f(k(t)) + (1 - \delta) k(t) - c(t) \quad \text{and} \quad k(t) \ge 0, \tag{6.25} \]
with given $k(0)$.

We continue to make the standard assumptions on the production function as in Assumptions 1 and 2. In addition, we assume that:
Assumption 8 $u : [\underline{c}, \infty) \to \mathbb{R}$ is continuously differentiable and strictly concave.
This is considerably stronger than what we need. In fact, concavity or even continuity is enough for most of the results. But this assumption helps us avoid inessential technical details. The lower bound on consumption is imposed to have a compact set of consumption possibilities.
The first step is to write the optimal growth problem as a (stationary) dynamic programming problem. This follows immediately from what we have done so far:
\[ V(k) = \max_{c \in \Gamma(k)} \left\{ u(c) + \beta V[f(k) + (1 - \delta) k - c] \right\} \tag{6.26} \]
with $\Gamma(k)$ given by the interval $[\underline{c}, f(k) + (1 - \delta) k]$ given the nonnegativity of the capital stock.
Given the above theorems, in particular Theorems 15-22, the following proposition immediately follows:
Proposition 13 Given Assumptions 1, 2 and 8, the optimal growth model as specified in (6.24) and (6.25) has a stationary solution characterized by the value function $V(k)$ and consumption function $c(k)$. The amount $s(k)$ is the capital stock of the next period, where $s(k) = f(k) + (1 - \delta) k - c(k)$. Moreover, $V(k)$ is strictly increasing and concave in $k$ and $s(k)$ is nondecreasing.
Proof. Optimality of the solution to the value function (6.26) for the problem (6.24) and (6.25) follows from Theorems 15-18. That $V(k)$ exists follows from Theorem 19, and the fact that it is increasing and strictly concave, with the policy correspondence being a policy function, follows from Theorems 20 and 21.
Thus we only have to show that $s(k)$ is nondecreasing. This can be proved by contradiction. Suppose, to arrive at a contradiction, that $s(k)$ is decreasing somewhere, i.e., there exist $k$ and $k' > k$ such that $s(k) > s(k')$. Since $k' > k$, $s(k)$ is feasible when the capital stock is $k'$. Moreover, since, by hypothesis, $s(k) > s(k')$, $s(k')$ is feasible at capital stock $k$.

By optimality and feasibility, we must have:
\[
\begin{aligned}
V(k) &= u(f(k) + (1 - \delta) k - s(k)) + \beta V(s(k)) \\
&\ge u(f(k) + (1 - \delta) k - s(k')) + \beta V(s(k')) \\
V(k') &= u(f(k') + (1 - \delta) k' - s(k')) + \beta V(s(k')) \\
&\ge u(f(k') + (1 - \delta) k' - s(k)) + \beta V(s(k)).
\end{aligned}
\]
Combining and rearranging these, we have
\[
\begin{aligned}
u(f(k) + (1 - \delta) k - s(k)) - u(f(k) + (1 - \delta) k - s(k')) &\ge \beta \left[ V(s(k')) - V(s(k)) \right] \\
&\ge u(f(k') + (1 - \delta) k' - s(k)) - u(f(k') + (1 - \delta) k' - s(k')).
\end{aligned}
\]
Or denoting $z \equiv f(k) + (1 - \delta) k$ and $x \equiv s(k)$ and similarly for $z'$ and $x'$, we have
\[ u(z - x') - u(z - x) \le u(z' - x') - u(z' - x). \tag{6.27} \]
But clearly,
\[ (z - x') - (z - x) = (z' - x') - (z' - x), \]
which combined with the fact that $z' > z$ and that $u$ is strictly concave and increasing implies that
\[ u(z - x') - u(z - x) > u(z' - x') - u(z' - x), \]
contradicting (6.27). This establishes that $s(k)$ must be nondecreasing everywhere.
In addition, Assumption 2 (the Inada conditions) implies that savings and consumption levels have to be interior, thus Theorem 22 applies and immediately establishes:
Proposition 14 Given Assumptions 1, 2 and 8, the value function $V(k)$ defined above is differentiable.
Consequently, from Theorem 23, we can look at the Euler equations. To do this, let us write the recursive formulation as
\[ V(k) = \max_{s \in \Gamma(k)} \left\{ u(f(k) + (1 - \delta) k - s) + \beta V[s] \right\}, \]
where $\Gamma(k)$ now denotes the set of feasible values for next period's capital stock. In this case the Euler equation takes the simple form:
\[ u'(c) = \beta V'(s) \]
where $s$ denotes the next date's capital stock. Applying the envelope condition, we have
\[ V'(k) = [f'(k) + (1 - \delta)] u'(c). \]
Consequently, we have the familiar-looking condition
\[ u'(c_t) = \beta [f'(k_{t+1}) + (1 - \delta)] u'(c_{t+1}). \]
A steady state is defined as usual as an allocation in which the capital-labor ratio and consumption do not depend on time, so again denoting this by *, we have the steady state capital-labor ratio as
\[ \beta [f'(k^*) + (1 - \delta)] = 1, \tag{6.28} \]
which is a remarkable result, because it shows that the steady state capital-labor ratio does not depend on the form of the utility function, but simply on technology, depreciation and the discount factor. We will obtain an analogue of this result in the continuous-time neoclassical model as well.
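For a concrete sense of (6.28), the following sketch solves it in closed form under an assumed Cobb-Douglas technology $f(k) = k^{\alpha}$. The functional form and the parameter values are illustrative assumptions. With this technology, (6.28) gives $k^* = \left[\alpha\beta / (1 - \beta(1 - \delta))\right]^{1/(1 - \alpha)}$, and no property of $u$ enters the calculation.

```python
alpha, beta, delta = 0.3, 0.95, 0.1  # assumed technology, discount and depreciation values

def f(k):
    # Assumed Cobb-Douglas production function in per capita terms.
    return k ** alpha

def f_prime(k):
    return alpha * k ** (alpha - 1)

# Closed-form solution of beta * [f'(k) + 1 - delta] = 1 for this technology.
k_star = (alpha * beta / (1 - beta * (1 - delta))) ** (1 / (1 - alpha))

# Residual of (6.28) at k_star; it should be zero up to rounding.
residual = beta * (f_prime(k_star) + 1 - delta) - 1

# Steady-state consumption from the resource constraint (6.25) with k constant.
c_star = f(k_star) - delta * k_star

print("k* =", k_star, " c* =", c_star, " residual of (6.28) =", residual)
```

Changing the curvature of the utility function would alter the transition path but leave `k_star` unchanged, since only $\alpha$, $\beta$ and $\delta$ appear in the formula.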
Moreover, since $f(\cdot)$ is strictly concave, $k^*$ is uniquely defined. Thus we have
Proposition 15 In the neoclassical optimal growth model specified in (6.24) and (6.25) with Assumptions 1, 2 and 8, there exists a unique steady-state capital-labor ratio $k^*$ given by (6.28), and starting from any initial $k_0$, the economy monotonically converges to this unique steady state, i.e., if $k_0 < k^*$, then the equilibrium capital stock sequence $k_t \uparrow k^*$ and if $k_0 > k^*$, then the equilibrium capital stock sequence $k_t \downarrow k^*$.
Proof. Uniqueness and existence were established above. To establish monotonic convergence, use the fact that $k_{t+1} = s(k_t)$, with $s(\cdot)$ defined in Proposition 13 and shown there to be nondecreasing. Consider the case $k_0 < k^*$. Since the steady state is unique, $s(k_t) \ne k_t$ whenever $k_t \ne k^*$, thus the sequence $\{k_t\}$ must be increasing. Next, note that $s(k_t)$ is nonnegative and can never exceed the maximum sustainable capital stock $\bar{k}$, which solves $\delta \bar{k} = f(\bar{k})$ and exists, is unique and finite by Assumption 2. Consequently, $\{s(k_t)\}$ is an increasing sequence in a compact set. A monotonically increasing sequence in a compact set necessarily converges, thus $s(k_t) \to s_\infty$ for some $s_\infty$. However, any limit point of $s(k_t)$ must be a steady state and hence equal to $k^*$, since this is the unique steady state, thus $s(k_t) \to k^*$. The case $k_0 > k^*$ is analogous, completing the proof.
Consequently, in the optimal growth model there exists a unique steady state and the
economy monotonically converges to the unique steady state, for example by accumulating
more and more capital (if it starts with a too low capital-labor ratio).
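Monotonic convergence can be illustrated in the one textbook case where the policy function is known in closed form. The sketch assumes log utility, $f(k) = k^{\alpha}$ and full depreciation ($\delta = 1$), in which case $s(k) = \alpha\beta k^{\alpha}$ and (6.28) gives $k^* = (\alpha\beta)^{1/(1 - \alpha)}$. These functional forms and parameter values are assumptions for illustration and are not imposed in the text.

```python
alpha, beta = 0.3, 0.95  # assumed parameters; delta = 1 (full depreciation) is also assumed

k_star = (alpha * beta) ** (1 / (1 - alpha))  # solves beta * alpha * k**(alpha - 1) = 1

def s(k):
    # Closed-form optimal savings rule under log utility and full depreciation.
    return alpha * beta * k ** alpha

def path(k0, periods=60):
    """Capital sequence k_0, k_1, ..., generated by k_{t+1} = s(k_t)."""
    ks = [k0]
    for _ in range(periods):
        ks.append(s(ks[-1]))
    return ks

low = path(0.1 * k_star)   # starts below the steady state
high = path(3.0 * k_star)  # starts above the steady state

print("from below:", [round(x, 4) for x in low[:6]], "->", round(low[-1], 6))
print("from above:", [round(x, 4) for x in high[:6]], "->", round(high[-1], 6))
print("k* =", round(k_star, 6))
```

Because $s$ is increasing and crosses the 45-degree line only at $k^*$, the path from below rises toward $k^*$ and the path from above falls toward it, as Proposition 15 states.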
Finally, we can also show that consumption monotonically increases (or decreases) along the path of adjustment to the unique steady state:
Proposition 16 $c(k)$ defined in Proposition 13 is nondecreasing. Moreover, if $k_0 < k^*$, then the equilibrium consumption sequence $c_t \uparrow c^*$ and if $k_0 > k^*$, then $c_t \downarrow c^*$, where $c^*$ is given by
\[ c^* = f(k^*) - \delta k^*. \]
The proof of Proposition 16 is left as an exercise to you.
This treatment shows that the optimal growth model is very tractable, and we can do the usual exercises we performed with the Solow growth model, including incorporating population growth and technological change. There is no immediate counterpart of a savings rate, since this depends on the utility function. But interestingly and very differently from the Solow growth model, the steady state capital-labor ratio and steady state income level do not depend on the savings rate anyway.
We will return to all of these issues, and provide a more detailed discussion of the
equilibrium growth in the context of the continuous time model. But for now, it is also
useful to see how this optimal growth allocation can be decentralized, i.e., in this particular
case we can use the second welfare theorem to show that the optimal growth allocation is
also a competitive equilibrium.
6.5 Competitive Equilibrium Growth
To show that the Pareto optimal growth allocation can be decentralized is very straightforward. Suppose that all households are identical, with utility function given by $u(c)$ as above, and normalize their measure to 1. Suppose they all start with capital stock $k_0$. The other side of the economy consists of competitive firms. Households rent their capital to firms. It is straightforward to see that households will receive a rental price of $R_t = f'(k_t)$ because of competitive market prices. They will therefore face a gross rate of return equal to
\[ r_t = [f'(k_t) + (1 - \delta)] \tag{6.29} \]
for renting one unit of capital at time $t$ in terms of date $t + 1$ goods. In addition, they will receive the wage rate of $w_t = f(k_t) - k_t f'(k_t)$.
Now consider the maximization problem of the representative household:
\[ \max_{\{c_t, a_t\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u(c_t) \]
subject to the flow budget constraint
\[ a_{t+1} = r_t a_t + w_t - c_t, \]
where $a_t$ denotes asset holdings at time $t$, and also subject to a no-Ponzi constraint which requires the individual asset holdings not to go to minus infinity. I will discuss this in greater detail below.
For now, it suffices to see that by exactly the same Euler equation type arguments, we have
\[ u'(c_t) = \beta r_{t+1} u'(c_{t+1}). \]
Imposing steady state implies that $c_t = c_{t+1}$, therefore, we must have
\[ \beta r_{t+1} = 1. \]
Next, market clearing immediately implies that $r_{t+1}$ is given by (6.29), so in steady state the capital-labor ratio of the competitive equilibrium satisfies
\[ \beta [f'(k_{t+1}) + (1 - \delta)] = 1, \]
that is, the steady state is given by
\[ \beta [f'(k^*) + (1 - \delta)] = 1. \]
Both the Euler equation and the steady-state condition are exactly as in the optimal growth problem; in particular, the steady state coincides with (6.28). In fact, a similar argument establishes that the whole competitive equilibrium path is identical to the optimal growth path. This is, of course, not surprising in view of the second (and first) welfare theorems we saw above.
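The decentralization can be illustrated by computing the competitive prices that support the steady state. The sketch assumes $f(k) = k^{\alpha}$ and specific parameter values, which are not taken from the text. It evaluates the rental price, the gross return in (6.29) and the wage at $k^*$, and checks that $\beta r = 1$ and that factor payments exhaust output.

```python
alpha, beta, delta = 0.3, 0.95, 0.1  # assumed parameter values

def f(k):
    # Assumed Cobb-Douglas production function in per capita terms.
    return k ** alpha

def f_prime(k):
    return alpha * k ** (alpha - 1)

# Steady-state capital-labor ratio from beta * [f'(k) + 1 - delta] = 1.
k_star = (alpha * beta / (1 - beta * (1 - delta))) ** (1 / (1 - alpha))

R = f_prime(k_star)         # competitive rental price of capital
r = R + (1 - delta)         # gross rate of return, as in (6.29)
w = f(k_star) - k_star * R  # competitive wage rate

print("R =", R, " r =", r, " w =", w)
print("beta * r =", beta * r)
print("w + R * k* - f(k*) =", w + R * k_star - f(k_star))
```

At these prices a household facing $r$ and $w$ has no incentive to change its consumption or asset holdings, which is why the planner's steady state is also a competitive equilibrium.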
We will discuss many of the implications of competitive economic growth in the neoclassical model once we go through the continuous time version as well.
Chapter 7
Brief Review of Optimal Control
The continuous time problem brings a number of new issues. The main reason is that even with a finite horizon, the maximization is with respect to an infinite-dimensional object (in fact an entire function, $y : [t_0, t_1] \to \mathbb{R}$). This requires us to review some basic ideas from the calculus of variations and from optimal control, but most of the tools and ideas that are necessary for this course are very straightforward.
I will start with the finite-horizon problem and the simplest treatment (which is much more similar to calculus of variations than optimal control), to give you the basic idea, and then provide the more powerful theorems from optimal control.
7.1 Finite-Horizon Optimal Control
7.1.1 The Fundamental Problem
Consider the following finite-horizon continuous time problem
\[ \max_{x(t), y(t), x_1} J(x(t), y(t)) \equiv \int_0^{t_1} f(t, x(t), y(t)) \, dt \tag{7.1} \]
subject to
\[ \dot{x}(t) = g(t, x(t), y(t)) \tag{7.2} \]
and
\[ y(t) \in Y(t) \text{ for all } t, \quad x(0) = x_0 \text{ and } x(t_1) = x_1. \tag{7.3} \]
Here $x(t) \in \mathbb{R}$ is the state variable, whose behavior is governed by the differential equation (7.2). $y(t) \in Y(t) \subset \mathbb{R}$ is the control variable. In addition, we assume that $f$ and $g$ are continuously differentiable functions.
This is the simplest optimal control problem because it has boundary conditions that regulate when the planning horizon ends (more generally, $t_1$ can be a choice variable as well, or it could extend to infinity as we will see later).
The difficulty of this problem arises from two features:
1. We are choosing a function $y : [0, t_1] \to Y$ rather than a vector or a finite dimensional object.
2. The constraint takes an unusual form of a differential equation.
These features make it difficult for us to know what type of optimal policy to look for. For example, $y$ may be a very discontinuous function. It may often hit the boundary of the feasible set etc.
7.1.2 Variational Arguments
Before going into greater detail, let us try to understand the essence of the problem, which can be done by using the variational principle of the calculus of variations.
For this purpose, let us suppose that there exists a continuous function $\hat{y}(\cdot)$ defined over $[0, t_1]$ with $\hat{y}(t) \in \mathrm{Int}\, Y(t)$ which achieves the optimum in this problem. Therefore, we are ruling out both the boundary conditions and discontinuities.
Now consider the following variation
\[ y(t, \varepsilon) \equiv \hat{y}(t) + \varepsilon \eta(t), \]
where $\eta(t)$ is an arbitrary fixed continuous function. We refer to this as a variation, because given $\eta(t)$, by varying $\varepsilon$, we obtain different paths of controls. The problem, of course, is that some of these may be infeasible, i.e., $y(t, \varepsilon) \notin Y(t)$ for some $t$. However, since $\hat{y}(t) \in \mathrm{Int}\, Y(t)$, and a continuous function over a compact set $[0, t_1]$ is bounded, for any given function $\eta(\cdot)$ we can always find $\bar{\varepsilon} > 0$ such that
\[ \hat{y}(t) + \varepsilon \eta(t) \in \mathrm{Int}\, Y(t) \]
for all $|\varepsilon| < \bar{\varepsilon}$. Thus we can conduct variational arguments for small $\varepsilon$'s. But, in analogy with regular calculus, the argument that there is no gain from a variation for small $\varepsilon$'s is essentially what we need.
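To prepare for these arguments, let us fix an arbitrary $\eta(\cdot)$, and define $x(t, \varepsilon)$ as the path of the state variable corresponding to the path of control variable $y(t, \varepsilon)$. This implies that $x(t, \varepsilon)$ is given by:
\[ \dot{x}(t, \varepsilon) \equiv g(t, x(t, \varepsilon), y(t, \varepsilon)) \text{ for all } t \in [0, t_1] \text{ and with } x(0, \varepsilon) = x_0. \tag{7.4} \]
Now define for $|\varepsilon| < \bar{\varepsilon}$:
\[ \mathcal{J}(\varepsilon) \equiv \int_0^{t_1} f(t, x(t, \varepsilon), y(t, \varepsilon)) \, dt. \tag{7.5} \]
By the fact that $\hat{y}(t)$ is optimal, and that for $|\varepsilon| < \bar{\varepsilon}$, $y(t, \varepsilon)$ (and thus $x(t, \varepsilon)$) is feasible, we have
\[ \mathcal{J}(\varepsilon) \le \mathcal{J}(0) \text{ for all } |\varepsilon| < \bar{\varepsilon}. \]
Next, rewrite the equation (7.4), so for all $t \in [0, t_1]$:
\[ g(t, x(t, \varepsilon), y(t, \varepsilon)) - \dot{x}(t, \varepsilon) \equiv 0. \]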
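Now for any continuously differentiable function $\lambda : [0, t_1] \to \mathbb{R}$, it must be the case that
\[ \int_0^{t_1} \lambda(t) \left[ g(t, x(t, \varepsilon), y(t, \varepsilon)) - \dot{x}(t, \varepsilon) \right] dt = 0. \tag{7.6} \]
The function $\lambda(\cdot)$, chosen suitably, will be the costate variable, with a similar interpretation to the Lagrange multipliers in regular (constrained) optimization. Now add (7.6) to (7.5) to obtain:
\[ \mathcal{J}(\varepsilon) \equiv \int_0^{t_1} \left[ f(t, x(t, \varepsilon), y(t, \varepsilon)) + \lambda(t) \left[ g(t, x(t, \varepsilon), y(t, \varepsilon)) - \dot{x}(t, \varepsilon) \right] \right] dt. \]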
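Now we want to evaluate this term. Start by considering the integral $\int_0^{t_1} \lambda(t) \dot{x}(t, \varepsilon) \, dt$. Integrating this by parts, we have
\[ \int_0^{t_1} \lambda(t) \dot{x}(t, \varepsilon) \, dt = \lambda(t_1) x(t_1, \varepsilon) - \lambda(0) x_0 - \int_0^{t_1} \dot{\lambda}(t) x(t, \varepsilon) \, dt. \]
Substituting this back, we obtain:
\[
\begin{aligned}
\mathcal{J}(\varepsilon) \equiv {} & \int_0^{t_1} \left[ f(t, x(t, \varepsilon), y(t, \varepsilon)) + \lambda(t) g(t, x(t, \varepsilon), y(t, \varepsilon)) + \dot{\lambda}(t) x(t, \varepsilon) \right] dt \\
& - \lambda(t_1) x(t_1, \varepsilon) + \lambda(0) x_0.
\end{aligned}
\]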

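Now differentiate this term with respect to $\varepsilon$. This is feasible by Leibniz's rule, since $f$ and $g$ are continuously differentiable, and so is $y(t, \varepsilon)$ in $\varepsilon$ by construction. Denoting the derivatives of $x$ and $y$ with respect to $\varepsilon$ by $x_\varepsilon$ and $y_\varepsilon$, and the derivatives of $f$ and $g$ by $f_t$, $f_x$, $f_y$ etc., differentiation gives
\[
\begin{aligned}
\mathcal{J}'(\varepsilon) \equiv {} & \int_0^{t_1} \left[ f_x(t, x(t, \varepsilon), y(t, \varepsilon)) + \lambda(t) g_x(t, x(t, \varepsilon), y(t, \varepsilon)) + \dot{\lambda}(t) \right] x_\varepsilon(t, \varepsilon) \, dt \\
& + \int_0^{t_1} \left[ f_y(t, x(t, \varepsilon), y(t, \varepsilon)) + \lambda(t) g_y(t, x(t, \varepsilon), y(t, \varepsilon)) \right] \eta(t) \, dt \\
& - \lambda(t_1) x_\varepsilon(t_1, \varepsilon).
\end{aligned}
\]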
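Now evaluating this expression at $\varepsilon = 0$, we have
$$\begin{aligned}
\mathcal{J}'(0) \equiv{} & \int_0^{t_1} \Big[ f_x(t, x(t), y(t)) + \lambda(t)\,g_x(t, x(t), y(t)) + \dot\lambda(t) \Big]\, x_\varepsilon(t,0)\,dt \\
& + \int_0^{t_1} \Big[ f_y(t, x(t), y(t)) + \lambda(t)\,g_y(t, x(t), y(t)) \Big]\, \eta(t)\,dt \\
& - \lambda(t_1)\,x_\varepsilon(t_1,0),
\end{aligned}$$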
where $x(t)$ denotes the path of the state variable corresponding to the optimal plan, $y(t)$. As with standard finite-dimensional optimization, if there exists some function $\eta(t)$ for which $\mathcal{J}'(0) \neq 0$, this means that the value of the program can be improved. Therefore, we need to have
$$\mathcal{J}'(0) \equiv 0 \text{ for all } \eta(t).$$
This can only be possible if the second integral is equal to zero for all $\eta(t)$, i.e., only if
$$\int_0^{t_1} \Big[ f_y(t, x(t), y(t)) + \lambda(t)\,g_y(t, x(t), y(t)) \Big]\, \eta(t)\,dt = 0 \text{ for all } \eta(t),$$
which is only possible if
$$f_y(t, x(t), y(t)) + \lambda(t)\,g_y(t, x(t), y(t)) \equiv 0 \text{ for all } t \in [0,t_1]. \tag{7.7}$$
By the same reasoning, $x_\varepsilon$ is also arbitrary, so we need to have the first integral identically equal to zero, or
$$\dot\lambda(t) = -\left[ f_x(t, x(t), y(t)) + \lambda(t)\,g_x(t, x(t), y(t)) \right] \tag{7.8}$$
and therefore $\lambda(t_1) = 0$.
This derivation (from the calculus of variations) has therefore established the following theorem:
Theorem 24 (Necessary Conditions) Consider the problem of maximizing (7.1) subject to (7.2) and (7.3), with $f$ and $g$ continuously differentiable. Suppose that this problem has an interior solution $y(t) \in \mathrm{Int}\,Y(t)$ with corresponding path of state variable $x(t)$. Then there exists a continuously differentiable costate function $\lambda(\cdot)$ defined over $t \in [0,t_1]$ such that (7.2), (7.7) and (7.8) hold.
7.1.3 Simplified Maximum Principle
The conditions (7.7) and (7.8) should remind you of a Lagrangian maximization. By analogy with the Lagrangian, a much more economical way of expressing Theorem 24 is to construct the equivalent of the Lagrangian in this case, the Hamiltonian:
$$H(t, x, y, \lambda) \equiv f(t, x(t), y(t)) + \lambda(t)\,g(t, x(t), y(t)). \tag{7.9}$$
Then we have
Theorem 25 (Simplified Maximum Principle) Consider the problem of maximizing (7.1) subject to (7.2) and (7.3), with $f$ and $g$ continuously differentiable. Suppose that this problem has an interior solution $y(t) \in \mathrm{Int}\,Y(t)$ with corresponding path of state variable $x(t)$. Let $H(t, x, y, \lambda)$ be given by (7.9). Then the optimal control $y(t)$ and the corresponding path of the state variable $x(t)$ satisfy the following necessary conditions:
$$H_y(t, x(t), y(t), \lambda(t)) = 0 \text{ for all } t \in [0,t_1]. \tag{7.10}$$
$$\dot\lambda(t) = -H_x(t, x(t), y(t), \lambda(t)) \text{ for all } t \in [0,t_1], \text{ and } \lambda(t_1) = 0. \tag{7.11}$$
$$\dot x(t) = H_\lambda(t, x(t), y(t), \lambda(t)) \text{ for all } t \in [0,t_1], \text{ and } x(0) = x_0. \tag{7.12}$$
Theorem 25 is a simplified version of the celebrated Maximum Principle, and the more general version will be given below.
For now, a couple of features are worth noting:
1. As in the usual constrained maximization problems, we find the optimal solution by
looking jointly for a set of multipliers and the optimal path of the control and state
variables. Here the multipliers are referred to as the costate variables.
2. Again as in the usual constrained maximization problems, the costate variables are informative about the value of relaxing the constraint. Here $\lambda(t)$ is the value of an infinitesimal increase in $x(t)$ at time $t$.
3. With this interpretation, it makes sense that $\lambda(t_1) = 0$ is part of the necessary conditions. After the planning horizon, there is no value to having more $x$. This is therefore the finite-horizon equivalent of the transversality condition we encountered above.
While Theorem 25 gives necessary conditions, as in regular optimization problems, these may not be sufficient. Sufficiency is again guaranteed by imposing concavity. The following theorem provides conditions for the necessary conditions to also be sufficient to characterize the optimal plan.
Theorem 26 (Mangasarian Sufficient Conditions) Consider the problem of maximizing (7.1) subject to (7.2) and (7.3), with $f$ and $g$ continuously differentiable. Define $H(t, x, y, \lambda)$ as in (7.9), and suppose that an interior solution $y(t) \in \mathrm{Int}\,Y(t)$ and the corresponding path of state variable $x(t)$ satisfy (7.10)-(7.12). Suppose also that for the resulting costate variable $\lambda(t)$, $H(t, x, y, \lambda)$ is jointly concave in $(x, y)$ for all $t \in [0,t_1]$. Then $y(t)$ and the corresponding $x(t)$ achieve the global maximum of (7.1); the maximizer is unique if the concavity is strict.
An alternative set of sufficient conditions is provided by Arrow:
Theorem 27 (Arrow Sufficient Conditions) Consider the problem of maximizing (7.1) subject to (7.2) and (7.3), with $f$ and $g$ continuously differentiable. Define $H(t, x, y, \lambda)$ as in (7.9), and suppose that an interior solution $y(t) \in \mathrm{Int}\,Y(t)$ and the corresponding path of state variable $x(t)$ satisfy (7.10)-(7.12). Given the resulting costate variable $\lambda(t)$, define $M(t, x, \lambda) \equiv H(t, x, y(t), \lambda)$. If $M(t, x, \lambda)$ is concave in $x$ for all $t \in [0,t_1]$, then $y(t)$ and the corresponding $x(t)$ achieve the global maximum of (7.1); $x(t)$ is unique if the concavity is strict.
The proofs of these theorems are long and not necessary for what will follow, so they are
omitted.
As stated, Theorem 26, and even Theorem 27, is difficult to apply, since given the concavity/convexity of the $g(\cdot)$ function, the concavity of the Hamiltonian will depend on the sign of the costate variable $\lambda(t)$. The following lemma (again proof omitted) provides some information on the sign of $\lambda(t)$:
Lemma 2 Suppose that $y(t)$ and the corresponding $x(t)$ are the optimal solutions to maximizing (7.1) subject to (7.2) and (7.3), with corresponding costate variable $\lambda(t)$. Then we have that
1. If $f_x(t, x(t), y(t)) > 0$ for all $t \in [0,t_1]$, then $\lambda(t) > 0$ for all $t \in [0,t_1)$.
2. If $f_x(t, x(t), y(t)) < 0$ for all $t \in [0,t_1]$, then $\lambda(t) < 0$ for all $t \in [0,t_1)$.
3. If $f_x(t, x(t), y(t)) = 0$ for all $t \in [0,t_1]$, then $\lambda(t) = 0$ for all $t \in [0,t_1)$.
Therefore, as in standard maximization problems, there is an intimate relationship between the sign of the multiplier and the returns from increasing the stock of the state variable, but here we need the effect of the state variable to be positive everywhere in order for the multiplier to be positive, etc.
The usefulness of Lemma 2 comes from the fact that if $\lambda(t) > 0$ for all $t$ (which follows from $f_x > 0$ for all $t$), then the Hamiltonian given in (7.9) is a concave function of $x$ and $y$ for given $\lambda(t)$ when $f$ and $g$ are concave functions. Therefore, the sufficient conditions in Theorem 26 are very straightforward to check (though often quite restrictive).
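As a concrete check on Theorem 25 and Lemma 2, the following minimal sketch considers an illustrative problem that is not part of these notes: maximize $\int_0^1 [x(t) - y(t)^2/2]\,dt$ subject to $\dot x(t) = y(t)$ and $x(0) = 0$. Here $H = x - y^2/2 + \lambda y$, so (7.10) gives $y(t) = \lambda(t)$, and (7.11) gives $\dot\lambda(t) = -1$ with $\lambda(1) = 0$, i.e., $\lambda(t) = 1 - t$. Since $f_x = 1 > 0$, this agrees with Lemma 2, and $H$ is jointly concave in $(x, y)$. The function names, the perturbation $\eta(t) = \sin(2\pi t)$ and the midpoint discretization are all assumptions made for illustration.

```python
import math

T1 = 1.0  # planning horizon t1 (illustrative choice)

def candidate_control(t):
    """Control implied by (7.10)-(7.11) in this example: y(t) = lambda(t) = t1 - t."""
    return T1 - t

def eta(t):
    """An arbitrary smooth perturbation direction (assumption)."""
    return math.sin(2.0 * math.pi * t)

def objective(control, x0=0.0, n=4000):
    """Midpoint-rule approximation of J = int_0^t1 [x - y^2/2] dt with dx/dt = y."""
    dt = T1 / n
    x, total = x0, 0.0
    for i in range(n):
        y = control((i + 0.5) * dt)   # control at the midpoint of the step
        x_mid = x + 0.5 * dt * y      # approximate state at the midpoint
        total += (x_mid - 0.5 * y * y) * dt
        x += dt * y                   # advance the state over the full step
    return total

def varied_objective(eps):
    """J(eps): value of the program under the varied control y(t) + eps * eta(t)."""
    return objective(lambda t: candidate_control(t) + eps * eta(t))

if __name__ == "__main__":
    j0 = varied_objective(0.0)
    print("J(0) =", j0)
    for eps in (-0.2, -0.1, 0.1, 0.2):
        print("eps =", eps, " J(eps) - J(0) =", varied_objective(eps) - j0)
```

For this example the first-order term of $\mathcal{J}(\varepsilon) - \mathcal{J}(0)$ vanishes and the difference equals $-\varepsilon^2/4$, so the printed differences should be negative for both signs of $\varepsilon$, up to discretization error.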
7.1.4 Generalizations
The above theorems can be immediately generalized to the case in which the state variable and the controls are vectors rather than scalars, and also to the case in which there are constraints. The constrained case requires constraint qualification conditions as in the standard finite-dimensional optimization case. These are slightly more messy to express, and since we will make no use of the constrained maximization problems, I will not state these theorems.
The vector-valued theorems are direct generalizations of the ones presented above, and are useful in growth models with multiple capital goods. In particular, let
$$\max_{x(t),\,y(t),\,x_1} J(x(t), y(t)) \equiv \int_0^{t_1} f(t, x(t), y(t))\,dt \tag{7.13}$$
subject to
$$\dot x(t) = g(t, x(t), y(t)), \tag{7.14}$$
and
$$y(t) \in Y(t) \text{ for all } t,\ x(0) = x_0 \text{ and } x(t_1) = x_1. \tag{7.15}$$
Here $x(t) \in \mathbb{R}^K$ for some $K \ge 1$ is the state variable and again $y(t) \in Y(t) \subset \mathbb{R}^N$ for some $N \ge 1$ is the control variable. In addition, we again assume that $f$ and $g$ are continuously differentiable functions. We then have:
Theorem 28 (Maximum Principle) Consider the problem of maximizing (7.13) subject to (7.14) and (7.15), with $f$ and $g$ continuously differentiable. Suppose that this problem has an interior solution $y(t) \in \mathrm{Int}\,Y(t)$ with corresponding path of state variable $x(t)$. Let $H(t, x, y, \lambda)$ be given by
$$H(t, x, y, \lambda) \equiv f(t, x(t), y(t)) + \lambda(t) \cdot g(t, x(t), y(t)), \tag{7.16}$$
where $\lambda(t) \in \mathbb{R}^K$. Then the optimal control $y(t)$ and the corresponding path of the state variable $x(t)$ satisfy the following necessary conditions:
$$\nabla_y H(t, x(t), y(t), \lambda(t)) = 0 \text{ for all } t \in [0,t_1]. \tag{7.17}$$
$$\dot\lambda(t) = -\nabla_x H(t, x(t), y(t), \lambda(t)) \text{ for all } t \in [0,t_1] \text{ and } \lambda(t_1) = 0. \tag{7.18}$$
$$\dot x(t) = \nabla_\lambda H(t, x(t), y(t), \lambda(t)) \text{ for all } t \in [0,t_1] \text{ and } x(0) = x_0. \tag{7.19}$$
Moreover, we have straightforward generalizations of the sufficiency conditions:
Theorem 29 (Mangasarian Sufficient Conditions) Consider the problem of maximizing (7.13) subject to (7.14) and (7.15), with $f$ and $g$ continuously differentiable. Define $H(t, x, y, \lambda)$ as in (7.16), and suppose that an interior solution $y(t) \in \mathrm{Int}\,Y(t)$ and the corresponding path of state variable $x(t)$ satisfy (7.17)-(7.19). Suppose also that for the resulting costate variable $\lambda(t)$, $H(t, x, y, \lambda)$ is jointly concave in $(x, y)$ for all $t \in [0,t_1]$. Then $y(t)$ and the corresponding $x(t)$ achieve the global maximum of (7.13); the maximizer is unique if the concavity is strict.
Theorem 30 (Arrow Sufficient Conditions) Consider the problem of maximizing (7.13) subject to (7.14) and (7.15), with $f$ and $g$ continuously differentiable. Define $H(t, x, y, \lambda)$ as in (7.16), and suppose that an interior solution $y(t) \in \mathrm{Int}\,Y(t)$ and the corresponding path of state variable $x(t)$ satisfy (7.17)-(7.19). Given the resulting costate variable $\lambda(t)$, define $M(t, x, \lambda) \equiv H(t, x, y(t), \lambda)$. If $M(t, x, \lambda)$ is concave in $x$ for all $t \in [0,t_1]$, then $y(t)$ and the corresponding $x(t)$ achieve the global maximum of (7.13); $x(t)$ is unique if the concavity is strict.
7.1.5 Limitations
The limitations of what we have done so far are obvious. First, we have assumed that a
continuous and interior solution to the optimal control problem exists. This is in general
a very strong assumption. Second, and equally important for our purposes, we have so far
looked at the finite horizon case, whereas analysis of growth models requires us to solve
infinite horizon problems. To deal with both of these issues, we need to look at the more
modern theory of optimal control. This is done in the next section.
7.2 Infinite-Horizon Optimal Control
7.2.1 The Basic Problem: Necessary and Sufficient Conditions
Consider the following infinite-horizon version of the problem of maximizing (7.1) subject to (7.2) and (7.3):
$$\max_{x(t),\,y(t)} J(x(t), y(t)) \equiv \int_0^{\infty} f(t, x(t), y(t))\,dt \tag{7.20}$$
subject to
$$\dot x(t) = g(t, x(t), y(t)), \tag{7.21}$$
and
$$y(t) \in \mathbb{R} \text{ for all } t,\ x(0) = x_0 \text{ and } \lim_{t\to\infty} x(t) \ge x_1. \tag{7.22}$$
The main difference is that now time runs to infinity, and there is no choice of endpoint $x_1$. In addition, I have simplified the problem by removing the feasibility set on the control $y(t)$, simply requiring this function to be real-valued.
For this problem, we call a pair $(x(t), y(t))$ admissible if $y(t)$ is a piecewise continuous function of time and $x(t)$ is a piecewise smooth function of time satisfying (7.21) given $y(t)$ (since $x(t)$ is given by a continuous differential equation, the piecewise continuity of $y(t)$ ensures the piecewise smoothness of $x(t)$). Notice that this is a significant generalization of the above approach, since discontinuous controls are allowed as long as they are piecewise continuous.
There are a number of technical difficulties when dealing with the infinite-horizon case, which are similar to those in the discrete time analysis. Primary among those is the fact that the value of the functional in (7.20) may not be finite. We will deal with some of these issues below.
The main theorem for the infinite-horizon optimal control problem is the following maximum principle:
Theorem 31 Suppose that the problem of maximizing (7.20) subject to (7.21) and (7.22), with $f$ and $g$ continuously differentiable, has an interior solution $y(t)$ with corresponding path of state variable $x(t)$. Let $H(t, x, y, \lambda)$ be given by (7.9). Then the optimal control $y(t)$ and the corresponding path of the state variable $x(t)$ satisfy the following necessary conditions:
$$H_y(t, x(t), y(t), \lambda(t)) = 0 \text{ for all } t \in \mathbb{R}_+. \tag{7.23}$$
$$\dot\lambda(t) = -H_x(t, x(t), y(t), \lambda(t)) \text{ for all } t \in \mathbb{R}_+. \tag{7.24}$$
$$\dot x(t) = H_\lambda(t, x(t), y(t), \lambda(t)) \text{ for all } t \in \mathbb{R}_+,\ x(0) = x_0 \text{ and } \lim_{t\to\infty} x(t) \ge x_1. \tag{7.25}$$
Notice an important difference between Theorem 25 and the current theorem. There is no boundary condition in Theorem 31 corresponding to $\lambda(t_1) = 0$ of Theorem 25. Consequently, the necessary conditions in Theorem 31 will not uniquely pin down a solution path. To do this we need an infinite-horizon version of the transversality condition. One might be tempted to impose a condition of the form
$$\lim_{t\to\infty} \lambda(t) = 0$$
as the transversality condition, but this condition does not hold in general. We will soon see an example where it does not apply. A milder transversality condition of the form
$$\lim_{t\to\infty} H(t, x, y, \lambda) = 0$$
always applies, but is not easy to check. Stronger transversality conditions apply when we put more structure on the problem. Before we do this, there are immediate generalizations of the sufficiency theorems to this case.
Theorem 32 (Mangasarian Sufficient Conditions for Infinite Horizon) Consider the problem of maximizing (7.20) subject to (7.21) and (7.22), with $f$ and $g$ continuously differentiable. Define $H(t, x, y, \lambda)$ as in (7.9), and suppose that a solution $y(t)$ and the corresponding path of state variable $x(t)$ satisfy (7.23)-(7.25). Suppose also that for the resulting costate variable $\lambda(t)$, $H(t, x, y, \lambda)$ is jointly concave in $(x, y)$ for all $t \in \mathbb{R}_+$ and that $\lim_{t\to\infty} \lambda(t)\,(x(t) - \tilde x(t)) \le 0$ for all $\tilde x(t)$ implied by an admissible control path $\tilde y(t)$. Then $y(t)$ and the corresponding $x(t)$ achieve the global maximum of (7.20); the maximizer is unique if the concavity is strict.
Theorem 33 (Arrow Sufficient Conditions for Infinite Horizon) Consider the problem of maximizing (7.20) subject to (7.21) and (7.22), with $f$ and $g$ continuously differentiable. Define $H(t, x, y, \lambda)$ as in (7.9), and suppose that a solution $y(t)$ and the corresponding path of state variable $x(t)$ satisfy (7.23)-(7.25). Given the resulting costate variable $\lambda(t)$, define $M(t, x, \lambda) \equiv H(t, x, y(t), \lambda)$. If $M(t, x, \lambda)$ is concave in $x$ and $\lim_{t\to\infty} \lambda(t)\,(x(t) - \tilde x(t)) \le 0$ for all $\tilde x(t)$ implied by an admissible control path $\tilde y(t)$, then $y(t)$ and the corresponding $x(t)$ achieve the global maximum of (7.20); $x(t)$ is unique if the concavity is strict.
Notice that both of these sufficiency theorems have the difficult-to-check condition that $\lim_{t\to\infty} \lambda(t)\,(x(t) - \tilde x(t)) \le 0$ for all $\tilde x(t)$ implied by an admissible control path $\tilde y(t)$. This condition will disappear when we can impose a proper transversality condition.
7.2.2 Lack of Transversality Conditions
The following example, which is very close to the original Ramsey model, illustrates that there are in general no transversality conditions.
Example 2 Consider the following problem:
$$\max \int_0^{\infty} \left[\log(c(t)) - \log c^*\right] dt$$
subject to
$$\dot k(t) = [k(t)]^{\alpha} - c(t) - \delta k(t),$$
$$k(0) = 1$$
and
$$\lim_{t\to\infty} k(t) \ge 0,$$
where $c^* \equiv [k^*]^{\alpha} - \delta k^*$ and $k^* \equiv (\alpha/\delta)^{1/(1-\alpha)}$. In other words, $c^*$ is the maximum level of consumption that can be achieved in this model. This way of writing the objective function makes sure that the integral converges and takes a finite value (since $c(t)$ cannot exceed $c^*$ forever).
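The Hamiltonian is straightforward to construct and takes the form
$$H(k, c, \lambda) = [\log c - \log c^*] + \lambda\left[k^{\alpha} - c - \delta k\right],$$
and implies the following necessary conditions (dropping time dependence to simplify the notation):
$$H_c = \frac{1}{c} - \lambda = 0,$$
$$H_k = \lambda\left(\alpha k^{\alpha-1} - \delta\right) = -\dot\lambda.$$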
It can be verified that $c(t) \to c^*$ satisfies the necessary conditions, and must be part of any optimal path. This, however, implies that
$$\lim_{t\to\infty} \lambda(t) = \frac{1}{c^*} > 0 \text{ and } \lim_{t\to\infty} k(t) = k^*.$$
Therefore, the equivalent of the standard finite-horizon transversality condition does not hold. It can be verified, however, that along the optimal path
$$\lim_{t\to\infty} H(k(t), c(t), \lambda(t)) = 0,$$
so the weaker transversality condition holds.
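The limiting values in Example 2 can be checked numerically. The sketch below evaluates the example at its limit point for illustrative parameter values ($\alpha = 0.3$, $\delta = 0.1$, both assumptions) and confirms that the limiting costate $1/c^*$ is strictly positive while the Hamiltonian is zero there. The function names are hypothetical.

```python
import math

ALPHA, DELTA = 0.3, 0.1  # illustrative parameter values (assumptions)

def golden_rule():
    """Return (k*, c*) with k* = (alpha/delta)^(1/(1-alpha)) and c* = k*^alpha - delta k*."""
    k_star = (ALPHA / DELTA) ** (1.0 / (1.0 - ALPHA))
    c_star = k_star ** ALPHA - DELTA * k_star
    return k_star, c_star

def hamiltonian(k, c, lam, c_star):
    """H(k, c, lambda) = [log c - log c*] + lambda [k^alpha - c - delta k]."""
    return (math.log(c) - math.log(c_star)) + lam * (k ** ALPHA - c - DELTA * k)

if __name__ == "__main__":
    k_star, c_star = golden_rule()
    lam_star = 1.0 / c_star  # from H_c = 1/c - lambda = 0
    print("k* =", k_star, " c* =", c_star)
    print("limit of lambda =", lam_star)  # strictly positive
    # lambda_dot = -lambda (alpha k^(alpha-1) - delta) vanishes at k*
    print("lambda_dot at k* =", -lam_star * (ALPHA * k_star ** (ALPHA - 1.0) - DELTA))
    print("H at the limit =", hamiltonian(k_star, c_star, lam_star, c_star))
```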
7.2.3 Discounted Infinite-Horizon Optimal Control
Part of the difficulty, especially regarding the absence of a transversality condition, comes from the fact that we did not impose enough structure on the functions $f$ and $g$. As discussed above, our interest is with the growth models where the utility is discounted exponentially. Then the problem is a more special one, taking the form:
$$\max_{x(t),\,y(t)} J(x(t), y(t)) \equiv \int_0^{\infty} \exp(-\rho t)\,u(x(t), y(t))\,dt \text{ with } \rho > 0, \tag{7.26}$$
subject to
$$\dot x(t) = g(t, x(t), y(t)), \tag{7.27}$$
and
$$y(t) \in \mathbb{R} \text{ for all } t,\ x(0) = x_0 \text{ and } \lim_{t\to\infty} x(t) \ge x_1. \tag{7.28}$$
The special feature of this problem is that the payoff function, the equivalent of $f$, depends on time only through exponential discounting. The Hamiltonian in this case would be:
$$\begin{aligned}
H(t, x(t), y(t), \lambda(t)) &= \exp(-\rho t)\,u(x(t), y(t)) + \lambda(t)\,g(t, x(t), y(t)) \\
&= \exp(-\rho t)\left[u(x(t), y(t)) + \mu(t)\,g(t, x(t), y(t))\right],
\end{aligned}$$
where the second line defines $\mu(t) \equiv \exp(\rho t)\,\lambda(t)$. This equation makes it clear that the Hamiltonian depends on time explicitly only through the $\exp(-\rho t)$ term.
In fact, in this case, rather than working with the standard Hamiltonian, we can work with the current-value Hamiltonian, defined as
$$\hat H(x(t), y(t), \mu(t)) \equiv u(x(t), y(t)) + \mu(t)\,g(t, x(t), y(t)) \tag{7.29}$$
which is not explicitly a function of time.
We have the following result, which states not only the necessary conditions similar to Theorem 31, but also shows the necessity of a transversality condition:
Theorem 34 (Maximum Principle for Discounted Infinite-Horizon Problems) Suppose that the problem of maximizing (7.26) subject to (7.27) and (7.28), with $u$ and $g$ continuously differentiable, has a solution $y(t)$ with corresponding path of state variable $x(t)$. Let $\hat H(x, y, \mu)$ be the current-value Hamiltonian given by (7.29). Then the optimal control $y(t)$ and the corresponding path of the state variable $x(t)$ satisfy the following necessary conditions:
$$\hat H_y(x(t), y(t), \mu(t)) = 0 \text{ for all } t \in \mathbb{R}_+. \tag{7.30}$$
$$\rho\,\mu(t) - \dot\mu(t) = \hat H_x(x(t), y(t), \mu(t)) \text{ for all } t \in \mathbb{R}_+ \text{ and } \lim_{t\to\infty}\left[\exp(-\rho t)\,x(t)\,\mu(t)\right] = 0. \tag{7.31}$$
$$\dot x(t) = \hat H_\mu(x(t), y(t), \mu(t)) \text{ for all } t \in \mathbb{R}_+,\ x(0) = x_0 \text{ and } \lim_{t\to\infty} x(t) \ge x_1. \tag{7.32}$$
The important feature of Theorem 34, which is the most useful theorem for the rest of the course, is that it also shows the transversality condition
$$\lim_{t\to\infty}\left[\exp(-\rho t)\,x(t)\,\mu(t)\right] = 0$$
is necessary. Notice that compared to the transversality condition before, there is the additional term $\exp(-\rho t)$. This is because the transversality condition applies to the original costate variable $\lambda(t)$, i.e., $\lim_{t\to\infty}[x(t)\,\lambda(t)] = 0$, and as shown above the current-value costate variable $\mu(t)$ is given by $\mu(t) = \exp(\rho t)\,\lambda(t)$.
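The step from the costate equation of Theorem 31 to the current-value form (7.31) can be spelled out as follows. This is a sketch of the standard change of variables, using only the definitions $\mu(t) \equiv \exp(\rho t)\lambda(t)$ and $H = \exp(-\rho t)\hat H$ from above:

```latex
\begin{aligned}
\lambda(t) = e^{-\rho t}\mu(t)
  &\;\Longrightarrow\; \dot\lambda(t) = e^{-\rho t}\left[\dot\mu(t) - \rho\,\mu(t)\right], \\
H(t,x,y,\lambda) = e^{-\rho t}\,\hat H(x,y,\mu)
  &\;\Longrightarrow\; H_x = e^{-\rho t}\,\hat H_x, \\
\dot\lambda(t) = -H_x
  &\;\Longrightarrow\; \rho\,\mu(t) - \dot\mu(t) = \hat H_x(x(t),y(t),\mu(t)), \\
x(t)\,\lambda(t) &= e^{-\rho t}\,x(t)\,\mu(t).
\end{aligned}
```

The last line is why the discount factor appears in the transversality condition once it is written in terms of $\mu(t)$.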
The sufficiency theorems can also be strengthened now by incorporating the transversality condition and expressing the conditions in terms of the current-value Hamiltonian:
Theorem 35 (Mangasarian Sufficient Conditions for Discounted Infinite-Horizon Problems) Consider the problem of maximizing (7.26) subject to (7.27) and (7.28), with $u$ and $g$ continuously differentiable. Define $\hat H(x, y, \mu)$ as the current-value Hamiltonian as in (7.29), and suppose that a solution $y(t)$ and the corresponding path of state variable $x(t)$ satisfy (7.30)-(7.32). Suppose also that for the resulting current-value costate variable $\mu(t)$, $\hat H(x, y, \mu)$ is jointly concave in $(x, y)$ for all $t \in \mathbb{R}_+$. Then $y(t)$ and the corresponding $x(t)$ achieve the global maximum of (7.26); the maximizer is unique if the concavity is strict.
Theorem 36 (Arrow Sufficient Conditions for Discounted Infinite-Horizon Problems) Consider the problem of maximizing (7.26) subject to (7.27) and (7.28), with $u$ and $g$ continuously differentiable. Define $\hat H(x, y, \mu)$ as the current-value Hamiltonian as in (7.29), and suppose that a solution $y(t)$ and the corresponding path of state variable $x(t)$ satisfy (7.30)-(7.32). Given the resulting current-value costate variable $\mu(t)$, define $M(t, x, \mu) \equiv \hat H(x, y(t), \mu)$. If $M(t, x, \mu)$ is concave in $x$, then $y(t)$ and the corresponding $x(t)$ achieve the global maximum of (7.26); $x(t)$ is unique if the concavity is strict.
Chapter 8
The Neoclassical Growth Model
We are now ready to start our analysis of the standard neoclassical growth model (also known
as the Ramsey, or Cass-Koopmans model). This model differs from the Solow model only
in explicitly modeling the consumer side and endogenizing savings (i.e., allowing consumer
optimization). Beyond its use as a basic growth model, this model has become a workhorse
for many areas of macroeconomics, including the analysis of fiscal policy, taxation, business
cycles, and even monetary policy.
8.1 Preferences, Technology and Demographics
The economy is an infinite-horizon economy in continuous time (the discrete-time version was analyzed above). We assume that the economy admits a representative household with instantaneous utility function
$$u(c(t)), \tag{8.1}$$
and we make the following standard assumptions on this utility function:
Assumption 9 $u(c)$ is strictly increasing, twice continuously differentiable with derivatives $u'$ and $u''$, and concave, and satisfies the following Inada type assumptions:
$$\lim_{c\to 0} u'(c) = \infty \text{ and } \lim_{c\to\infty} u'(c) = 0.$$
Alternatively, we can think of the economy as consisting of a unit measure of identical households each with the instantaneous utility function given by (8.1). Population within each household grows at the rate $n$, starting with $L(0) = 1$, so that total population is
$$L(t) = \exp(nt). \tag{8.2}$$
All members of the household supply their labor inelastically.
Consequently, we assume that each household maximizes overall utility $U(0)$ at time $t = 0$ given by
$$U(0) \equiv \int_0^{\infty} \exp(-(\rho - n)\,t)\,u(c(t))\,dt, \tag{8.3}$$
where $c(t)$ is consumption per capita at time $t$, $\rho$ is the subjective discount rate, and the effective discount rate is $\rho - n$, since it is assumed that the household derives utility from the consumption of its additional members in the future as well. We assume throughout that
Assumption 10 $\rho > n$.
This assumption ensures that there is in fact discounting of future utility streams. Otherwise, (8.3) would have infinite value, and standard optimization techniques would not be useful in determining what an optimal plan is (we would need to use over-taking type criteria etc.). More generally, there is something somewhat strange about models in which utility is equal to infinity. Assumption 10 makes sure that in the model without growth, discounted utility is finite. When there is growth, we will strengthen this assumption.
We start with an economy without any technological progress. Factor and product
markets are competitive, and the production possibilities set of the economy is represented
by the aggregate production function
Y (t) = F [K (t) , L (t)] ,
which is a simplified version of the production function (2.1) used in the Solow growth model
above, where the simplification comes from the fact that there is no technology term. As in
the analysis there, we impose the standard constant returns to scale and Inada assumptions
embedded in Assumptions 1 and 2. The constant returns to scale feature enables us to work
with the per capita production function $f(\cdot)$ such that output per capita is given by
$$y(t) \equiv \frac{Y(t)}{L(t)} = F\left[\frac{K(t)}{L(t)}, 1\right] \equiv f(k(t)),$$
where, as before,
$$k(t) \equiv \frac{K(t)}{L(t)}. \tag{8.4}$$
Competitive factor markets then imply that, at all points in time, the rental rate of capital and the wage rate are given by:
$$R(t) = F_K[K(t), L(t)] = f'(k(t)) \tag{8.5}$$
and
$$w(t) = F_L[K(t), L(t)] = f(k(t)) - k(t)\,f'(k(t)). \tag{8.6}$$
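The factor-price formulas (8.5) and (8.6) can be checked against finite-difference marginal products. The sketch below assumes a Cobb-Douglas technology $F(K, L) = K^{\alpha}L^{1-\alpha}$ with $\alpha = 0.3$; this functional form and all function names are illustrative assumptions, not something imposed by the model above.

```python
ALPHA = 0.3  # capital share in the illustrative Cobb-Douglas technology (assumption)

def F(K, L):
    """Aggregate production function F(K, L) = K^alpha L^(1-alpha)."""
    return K ** ALPHA * L ** (1.0 - ALPHA)

def f(k):
    """Per capita production function f(k) = F(k, 1)."""
    return F(k, 1.0)

def f_prime(k):
    """Analytical derivative of f for the assumed technology."""
    return ALPHA * k ** (ALPHA - 1.0)

def rental_rate(k):
    """R = f'(k), equation (8.5)."""
    return f_prime(k)

def wage(k):
    """w = f(k) - k f'(k), equation (8.6)."""
    return f(k) - k * f_prime(k)

def marginal_products(K, L, h=1e-6):
    """Central finite-difference approximations to F_K and F_L."""
    F_K = (F(K + h, L) - F(K - h, L)) / (2.0 * h)
    F_L = (F(K, L + h) - F(K, L - h)) / (2.0 * h)
    return F_K, F_L

if __name__ == "__main__":
    K, L = 8.0, 2.0
    k = K / L
    F_K, F_L = marginal_products(K, L)
    print("F_K vs f'(k):", F_K, rental_rate(k))
    print("F_L vs f(k) - k f'(k):", F_L, wage(k))
    # With constant returns to scale, factor payments exhaust output per capita.
    print("R k + w - f(k) =", rental_rate(k) * k + wage(k) - f(k))
```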

The household optimization side is more complicated, since each household will solve a continuous-time optimization problem in deciding how to use its assets and allocate consumption over time. To prepare for this, let us denote the asset holdings of the representative household at time $t$ by $A(t)$. Then we have the following law of motion for the total assets of the household
$$\dot A(t) = r(t)\,A(t) + w(t)\,L(t) - c(t)\,L(t)$$
where $c(t)$ is consumption per capita of the household, $r(t)$ is the risk-free market rate of return on assets, and $w(t)\,L(t)$ is the total labor income of the household. Note that $r(t)$ is now a flow return, not a gross return as used before. Defining per capita assets as
$$a(t) \equiv \frac{A(t)}{L(t)},$$
we obtain:
$$\dot a(t) = (r(t) - n)\,a(t) + w(t) - c(t). \tag{8.7}$$
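The step from the law of motion of total assets to the per capita equation (8.7) uses only the quotient rule and $\dot L(t)/L(t) = n$ from (8.2). Written out:

```latex
\begin{aligned}
\dot a(t) &= \frac{d}{dt}\left(\frac{A(t)}{L(t)}\right)
           = \frac{\dot A(t)}{L(t)} - \frac{A(t)}{L(t)}\,\frac{\dot L(t)}{L(t)} \\
          &= \frac{r(t)A(t) + w(t)L(t) - c(t)L(t)}{L(t)} - n\,a(t) \\
          &= (r(t) - n)\,a(t) + w(t) - c(t).
\end{aligned}
```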
In practice, household assets can consist of capital stock, $K(t)$, which they rent to firms, and government bonds, $B(t)$. In models with uncertainty, households would have a portfolio choice between the capital stock of the corporate sector and riskless bonds. Government bonds play an important role in models with uncertainty and heterogeneity, allowing households to smooth idiosyncratic shocks. But in representative household models without government, their only use is in pricing assets (for example riskless bonds versus equity etc.), since they have to be in zero net supply, i.e., total supply of bonds has to be $B(t) = 0$. Consequently, we will have that assets per capita are equal to the capital stock per capita (or the capital-labor ratio in the economy), i.e.:
$$a(t) = k(t).$$
Moreover, since there is no uncertainty here and a depreciation rate of $\delta$, the market rate of return on assets will be given by
$$r(t) = R(t) - \delta. \tag{8.8}$$
The equation (8.7) is only a flow constraint, and it is not sufficient to act as a proper budget constraint on the individual. To see this, consider a finite-horizon economy, ending at the time $T$. In this case, we could express the entire set of constraints on the household as a single budget constraint of the form:
$$\int_0^T c(t)\,L(t)\exp\left(\int_t^T r(s)\,ds\right)dt + A(T) = \int_0^T w(t)\,L(t)\exp\left(\int_t^T r(s)\,ds\right)dt + A(0)\exp\left(\int_0^T r(s)\,ds\right), \tag{8.9}$$
which requires the household's discounted budget constraint to hold at time $T$ (hence all income and expenditures are carried forward to date $T$ units). Clearly, differentiating this expression with respect to $T$ and expanding $L(t)$ gives (8.7). And yet (8.7) by itself does not guarantee that the level of $A(T)$ is such that this lifetime budget constraint holds. Therefore, in the finite-horizon case, we would simply impose this lifetime budget constraint as a boundary condition.
In the infinite-horizon case, we need a similar boundary condition. This is generally referred to as the no-Ponzi-game condition, and takes the form
$$\lim_{t\to\infty} a(t)\exp\left(-\int_0^t (r(s) - n)\,ds\right) \ge 0. \tag{8.10}$$
This condition is stated as an inequality, to ensure that the individual does not asymptotically tend to a negative wealth. But we will see from the transversality condition of the individual problem that the individual would never want to have positive wealth asymptotically, so the no-Ponzi-game condition can be alternatively stated as:
$$\lim_{t\to\infty} a(t)\exp\left(-\int_0^t (r(s) - n)\,ds\right) = 0. \tag{8.11}$$
In what follows we will use (8.10), and then derive (8.11) using the transversality condition
explicitly.
The name no-Ponzi-game condition comes from the chain-letter schemes, which are sometimes called Ponzi games, where an individual can continuously borrow from a competitive
financial market (or more often, from unsuspecting souls that become part of the chain-letter
scheme) and pay his or her previous debts using current borrowings.
To understand where this form of the no-Ponzi-game condition comes from, multiply both sides of (8.9) by $\exp\left(-\int_0^T r(s)\,ds\right)$ to obtain
$$\int_0^T c(t)\,L(t)\exp\left(-\int_0^t r(s)\,ds\right)dt + \exp\left(-\int_0^T r(s)\,ds\right)A(T) = \int_0^T w(t)\,L(t)\exp\left(-\int_0^t r(s)\,ds\right)dt + A(0),$$
then divide everything by $L(0)$ and note that $L(t)$ grows at the rate $n$, to obtain
$$\int_0^T c(t)\exp\left(-\int_0^t (r(s) - n)\,ds\right)dt + \exp\left(-\int_0^T (r(s) - n)\,ds\right)a(T) = \int_0^T w(t)\exp\left(-\int_0^t (r(s) - n)\,ds\right)dt + a(0).$$
Now take the limit as $T \to \infty$ and use the no-Ponzi-game condition (8.11) to obtain
$$\int_0^{\infty} c(t)\exp\left(-\int_0^t (r(s) - n)\,ds\right)dt = a(0) + \int_0^{\infty} w(t)\exp\left(-\int_0^t (r(s) - n)\,ds\right)dt,$$
which essentially requires the discounted sum of expenditures to be equal to initial assets plus the discounted sum of labor income. Therefore this equation is a direct extension of (8.9) to infinite horizon. This derivation makes it clear that the no-Ponzi-game condition (8.11) essentially ensures that the individual's lifetime budget constraint holds in infinite horizon.
8.2 Characterization of Equilibrium
8.2.1
We are now in a position to define an equilibrium in this dynamic economy. I will provide
two definitions, the first somewhat less formal, and second more useful in characterizing the
equilibrium below.
A competitive equilibrium of the Ramsey economy consists of paths of consumption, capital stock, wage rates and rental rates of capital, $[C(t), K(t), w(t), R(t)]_{t=0}^{\infty}$, such that the representative household maximizes its utility given initial capital stock $K(0)$ and the time path of prices $[w(t), R(t)]_{t=0}^{\infty}$, and the time path of prices $[w(t), R(t)]_{t=0}^{\infty}$ is such that given the time path of capital stock and labor $[K(t), L(t)]_{t=0}^{\infty}$ all markets clear.
Notice that in equilibrium we need to determine the entire time path of real quantities
and the associated prices. This is a very important point. In dynamic models whenever we
talk of equilibrium, this refers to the entire path of quantities and prices. In some models,
we will focus on the steady-state equilibrium, but equilibrium always refers to the entire
path.
Since everything can be equivalently defined in terms of per capita variables, let me state the alternative definition in terms of those:
Definition 8 A competitive equilibrium of the Ramsey economy consists of paths of per capita consumption, capital-labor ratio, wage rates and rental rates of capital, $[c(t), k(t), w(t), R(t)]_{t=0}^{\infty}$, such that the representative household maximizes (8.3) subject to (8.7) and (8.10) given initial capital-labor ratio $k(0)$ and factor prices $[w(t), R(t)]_{t=0}^{\infty}$ with the rate of return on assets $r(t)$ given by (8.8), and factor prices $[w(t), R(t)]_{t=0}^{\infty}$ are given by (8.5) and (8.6).
8.2.2 The Consumer Problem
Let us start with the problem of the representative consumer. From the definition of equilibrium we know that this is to maximize (8.3) subject to (8.7) and (8.11). Let us ignore (8.11) first, and set up the current-value Hamiltonian:
$$\hat H(a, c, \mu) = u(c(t)) + \mu(t)\left[w(t) + (r(t) - n)\,a(t) - c(t)\right],$$
with state variable $a$, control variable $c$ and current-value costate variable $\mu$.
From Theorem 34, the following are necessary conditions:
$$\hat H_c(a, c, \mu) = u'(c(t)) - \mu(t) = 0,$$
$$\hat H_a(a, c, \mu) = \mu(t)\,(r(t) - n) = -\dot\mu(t) + (\rho - n)\,\mu(t),$$
$$\lim_{t\to\infty}\left[\exp(-(\rho - n)\,t)\,\mu(t)\,a(t)\right] = 0,$$
and the transition equation (8.7).

Notice that the transversality condition is written in terms of the current-value costate variable.
Moreover, for any $\mu(t)$, $\hat H(a, c, \mu)$ is a concave function of $(a, c)$, and thus from Theorem 35, these conditions are sufficient for a solution.
Rearranging the second condition, we have
$$\frac{\dot\mu(t)}{\mu(t)} = -(r(t) - \rho), \tag{8.12}$$
which states that the multiplier changes depending on whether the rate of return on assets is currently greater than or less than the discount rate of the household.
The first condition, on the other hand, implies
$$u'(c(t)) = \mu(t).$$
To make more progress, let us differentiate this with respect to time and divide by $\mu(t)$, which yields
$$\frac{u''(c(t))\,c(t)}{u'(c(t))}\,\frac{\dot c(t)}{c(t)} = \frac{\dot\mu(t)}{\mu(t)}.$$
Substituting this into (8.12), we have
$$\frac{\dot c(t)}{c(t)} = \frac{1}{\varepsilon_u(c(t))}\,(r(t) - \rho) \tag{8.13}$$
where
$$\varepsilon_u(c(t)) \equiv -\frac{u''(c(t))\,c(t)}{u'(c(t))} \tag{8.14}$$
is the elasticity of the marginal utility $u'(c(t))$. More importantly, $\varepsilon_u(c(t))$ is also the inverse of the intertemporal elasticity of substitution, which plays a crucial role in most macro models. The intertemporal elasticity of substitution regulates the willingness of individuals to substitute consumption (or labor or any other attribute that yields utility) over time. This elasticity for dates $t$ and $s > t$ is defined as
$$\sigma_u(t, s) = -\frac{d\log\left(c(s)/c(t)\right)}{d\log\left(u'(c(s))/u'(c(t))\right)}.$$
As $s \downarrow t$, we have
$$\sigma_u(t, s) \to \sigma_u(t) = -\frac{u'(c(t))}{u''(c(t))\,c(t)} = \frac{1}{\varepsilon_u(c(t))}.$$
This is not surprising, since the concavity of the utility function $u(\cdot)$, and thus the elasticity of marginal utility, determines how willing individuals are to substitute consumption over time.
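To make the elasticity in (8.14) concrete, the sketch below evaluates it by finite differences for a CRRA utility function $u(c) = (c^{1-\theta} - 1)/(1-\theta)$, for which the elasticity of marginal utility is the constant $\theta$ and the intertemporal elasticity of substitution is $1/\theta$. The CRRA form, the value $\theta = 2$, the step sizes and the function names are assumptions chosen for illustration.

```python
import math

THETA = 2.0  # curvature parameter of the illustrative CRRA utility (assumption)

def u(c):
    """CRRA utility u(c) = (c^(1-theta) - 1) / (1 - theta); log utility when theta = 1."""
    if THETA == 1.0:
        return math.log(c)
    return (c ** (1.0 - THETA) - 1.0) / (1.0 - THETA)

def u_prime(c, h=1e-5):
    """Central finite-difference approximation to u'(c)."""
    return (u(c + h) - u(c - h)) / (2.0 * h)

def u_double_prime(c, h=1e-4):
    """Central finite-difference approximation to u''(c)."""
    return (u(c + h) - 2.0 * u(c) + u(c - h)) / (h * h)

def elasticity_of_marginal_utility(c):
    """epsilon_u(c) = -u''(c) c / u'(c), as in (8.14)."""
    return -u_double_prime(c) * c / u_prime(c)

def consumption_growth(c, r, rho):
    """Right-hand side of (8.13): (r - rho) / epsilon_u(c)."""
    return (r - rho) / elasticity_of_marginal_utility(c)

if __name__ == "__main__":
    for c in (0.5, 1.5, 4.0):
        print("c =", c, " epsilon_u =", elasticity_of_marginal_utility(c))
    print("growth rate of c at r = 5%, rho = 3%:", consumption_growth(1.5, 0.05, 0.03))
```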
Next, note also that integrating (8.12), we have
$$\begin{aligned}
\mu(t) &= \mu(0)\exp\left(-\int_0^t (r(s) - \rho)\,ds\right) \\
&= u'(c(0))\exp\left(-\int_0^t (r(s) - \rho)\,ds\right),
\end{aligned}$$
where the second line uses the first optimality condition of the current-value Hamiltonian at time $t = 0$. Now substituting into the transversality condition, we have
$$\lim_{t\to\infty}\left[\exp(-(\rho - n)\,t)\,a(t)\,u'(c(0))\exp\left(-\int_0^t (r(s) - \rho)\,ds\right)\right] = 0,$$
$$\lim_{t\to\infty}\left[a(t)\exp\left(-\int_0^t (r(s) - n)\,ds\right)\right] = 0,$$
which implies that the strict no-Ponzi condition, (8.11), has to hold.
We can derive further results on the consumption behavior of households. In particular, notice that the term $\exp\left(-\int_0^t r(s)\,ds\right)$ is a present-value factor that converts a unit of income at time $t$ to a unit of income at time 0. In the special case where $r(s) = r$, this factor would be exactly equal to $\exp(-rt)$. But more generally, we can define an average interest rate between dates 0 and $t$ as
$$\bar r(t) = \frac{1}{t}\int_0^t r(s)\,ds.$$
In that case, we can express the conversion factor between dates 0 and $t$ as
$$\exp(-\bar r(t)\,t).$$
Now recalling that the solution to the differential equation
$$\dot{y}(t) = b(t)\,y(t)$$
is
$$y(t) = y(0) \exp\left(\int_0^t b(s)\,ds\right),$$
we can integrate (8.13) to obtain
$$c(t) = c(0) \exp\left(\int_0^t \frac{r(s) - \rho}{\varepsilon_u(c(s))}\,ds\right)$$
as the consumption function. Thus once we determine $c(0)$, the initial level of consumption,
the path of consumption can be exactly solved out. In the special case where $\varepsilon_u(c(s))$ is
constant, for example, $\varepsilon_u(c(s)) = \theta$, this equation simplifies to
$$c(t) = c(0) \exp\left(\left(\frac{\bar{r}(t) - \rho}{\theta}\right) t\right),$$
and moreover, the lifetime budget constraint simplifies to
$$\int_0^\infty c(t) \exp(-(\bar{r}(t) - n)t)\,dt = a(0) + \int_0^\infty w(t) \exp(-(\bar{r}(t) - n)t)\,dt,$$
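and substituting for $c(t)$ into this lifetime budget constraint in this iso-elastic case, we obtain
$$c(0) = \frac{a(0) + \int_0^\infty w(t) \exp(-(\bar{r}(t) - n)t)\,dt}{\int_0^\infty \exp\left(\left(\frac{(1-\theta)\,\bar{r}(t)}{\theta} - \frac{\rho}{\theta} + n\right) t\right) dt} \tag{8.15}$$
as the initial value of consumption.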
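The substitution behind the initial consumption level can be written out step by step. The sketch below assumes the iso-elastic case with constant elasticity $\theta$ and uses $W$ as a hypothetical shorthand (not in the original notes) for the right-hand side of the lifetime budget constraint:

```latex
\begin{align*}
W &\equiv a(0) + \int_0^\infty w(t)\, e^{-(\bar r(t) - n)t}\,dt, \\
\int_0^\infty c(t)\, e^{-(\bar r(t) - n)t}\,dt
  &= c(0)\int_0^\infty \exp\!\left(\left(\frac{\bar r(t)-\rho}{\theta} - \bar r(t) + n\right)t\right)dt \\
  &= c(0)\int_0^\infty \exp\!\left(\left(\frac{(1-\theta)\,\bar r(t)}{\theta} - \frac{\rho}{\theta} + n\right)t\right)dt = W, \\
\text{so}\quad c(0) &= W \left[\int_0^\infty \exp\!\left(\left(\frac{(1-\theta)\,\bar r(t)}{\theta} - \frac{\rho}{\theta} + n\right)t\right)dt\right]^{-1}. \\
\text{If } \bar r(t) = r \text{ for all } t:\quad
c(0) &= \left(r - n - \frac{r-\rho}{\theta}\right) W,
\qquad \text{provided } r - n - \frac{r-\rho}{\theta} > 0 .
\end{align*}
```

In the constant interest rate case, initial consumption is therefore a fixed fraction of lifetime wealth $W$, and that fraction depends on $\theta$ unless $r = \rho$.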
8.2.3 Equilibrium Prices
Equilibrium prices are straightforward and are given by (8.5) and (8.6). This implies that
the market rate of return for consumers, $r(t)$, is given by (8.8), i.e.,
$$r(t) = f'(k(t)) - \delta.$$
Substituting this into the consumer's problem, we have
$$\frac{\dot{c}(t)}{c(t)} = \frac{1}{\varepsilon_u(c(t))}\left(f'(k(t)) - \delta - \rho\right) \tag{8.16}$$
as the equilibrium version of the consumption growth equation, (8.13). Equation (8.15) in
the iso-elastic utility case also similarly generalizes.
8.3 Optimal Growth
Before characterizing the equilibrium further, it is useful to look at the optimal growth
problem, defined as the capital and consumption path chosen by a benevolent social planner
trying to achieve a Pareto optimal outcome. In particular, suppose that the social planner
gives exactly the same weights to people in different generations, so that it solves the problem
$$\max_{[k(t), c(t)]_{t=0}^{\infty}} \int_0^\infty \exp(-(\rho - n)t)\, u(c(t))\,dt,$$
subject to
$$\dot{k}(t) = f(k(t)) - (n + \delta)k(t) - c(t)$$
and $k(0) > 0$. To solve this problem, once again set up the current-value Hamiltonian, which
in this case takes the form
$$\hat{H}(k, c, \mu) = u(c(t)) + \mu(t)\left[f(k(t)) - (n + \delta)k(t) - c(t)\right],$$
with state variable $k$, control variable $c$ and current-value costate variable $\mu$.
From Theorem 34, the following are necessary conditions:
$$\hat{H}_c(k, c, \mu) = 0 = u'(c(t)) - \mu(t),$$
$$\hat{H}_k(k, c, \mu) = -\dot{\mu}(t) + (\rho - n)\mu(t) = \mu(t)\left(f'(k(t)) - \delta - n\right),$$
$$\lim_{t \to \infty} \left[\exp(-(\rho - n)t)\, \mu(t)\, k(t)\right] = 0.$$
Going exactly through the same steps as before, it is straightforward to see that these
optimality conditions imply
$$\frac{\dot{c}(t)}{c(t)} = \frac{1}{\varepsilon_u(c(t))}\left(f'(k(t)) - \delta - \rho\right),$$
which is identical to (8.16), and the transversality condition
$$\lim_{t \to \infty} \left[k(t) \exp\left(-\int_0^t \left(f'(k(s)) - \delta - n\right) ds\right)\right] = 0,$$
which is identical to (8.11).

This establishes that the competitive equilibrium is a Pareto optimum, and the natural
Pareto allocation can be decentralized as a competitive equilibrium with exactly the initial
endowments. This result is stated in the next proposition:
Proposition 17 In the neoclassical growth model described above, with Assumptions 1, 2,
9 and 10, the equilibrium is Pareto optimal and coincides with the optimal growth path
maximizing the utility of the representative household.
8.4 Steady-State Equilibrium
Now let us characterize the steady-state equilibrium (or equivalently the steady-state optimal
allocation). In steady state, consumption per capita will be constant, thus
$$\dot{c}(t) = 0.$$
From (8.16), this implies that irrespective of the exact utility function, we must have a
capital-labor ratio $k^*$ such that
$$f'(k^*) = \rho + \delta, \tag{8.17}$$
which is the equivalent of the steady-state relationship in the discrete-time optimal growth
model, and as is the case there, it pins down the steady-state capital-labor ratio only as
a function of the production function, the discount rate and the depreciation rate. This
also corresponds to the modified golden rule, rather than the golden rule we saw in the
Solow model. The capital stock is not chosen to maximize steady-state consumption because,
with discounting, the objective gives a higher weight to earlier consumption than to later
consumption.
Given $k^*$, the steady-state consumption level is straightforward to determine as:
$$c^* = f(k^*) - (n + \delta)k^*, \tag{8.18}$$
which is similar to the consumption level in the basic Solow model, but the steady-state
capital-labor ratio is determined differently. Moreover, given Assumption 10, a steady state
where the capital-labor ratio and thus output are constant necessarily satisfies the transversality condition.
This analysis therefore establishes:
Proposition 18 In the neoclassical growth model described above, with Assumptions 1, 2, 9
and 10, the steady-state equilibrium capital-labor ratio, $k^*$, is uniquely determined by (8.17),
and is independent of the utility function. The steady-state consumption per capita, $c^*$, is
given by (8.18).
8.5 Transitional Dynamics
Next, we can determine the transitional dynamics of this model. Recall that transitional
dynamics in the basic Solow model were given by a single differential equation with an initial
condition. This is no longer the case, since the equilibrium is determined by two differential
equations, repeated here for convenience:
$$\dot{k}(t) = f(k(t)) - (n + \delta)k(t) - c(t)$$
and
$$\frac{\dot{c}(t)}{c(t)} = \frac{1}{\varepsilon_u(c(t))}\left(f'(k(t)) - \delta - \rho\right).$$
Moreover, we have an initial condition $k(0) > 0$, but also a boundary condition at infinity,
of the form
$$\lim_{t \to \infty} \left[k(t) \exp\left(-\int_0^t \left(f'(k(s)) - \delta - n\right) ds\right)\right] = 0.$$
This combination of an initial condition and a transversality condition is quite typical for
optimal control problems where we are trying to pin down the behavior of both state and
control variables. This means that the notion of stability has to be different from that
in Theorems 4, 5 and 6. In particular, the consumption level (or equivalently the
costate variable $\mu$) is the control variable, and its initial value $c(0)$ (or equivalently $\mu(0)$) is
free. It has to adjust in a way to satisfy the transversality condition at infinity. Therefore,
rather than requiring all eigenvalues of the linear system or the linearized system to be
negative, what we want is saddle-path stability, which requires the number of negative
eigenvalues to be the same as the number of state variables. In particular, we have the
following straightforward generalizations of Theorems 4 and 5:
Theorem 37 Consider the following linear differential equation system
$$\dot{x}(t) = A\,x(t), \tag{8.19}$$
with $x(t) \in \mathbb{R}^n$, and suppose that $m \le n$ of the eigenvalues of $A$ have negative real parts.
Then there exists an $m$-dimensional manifold $M$ of $\mathbb{R}^n$ such that starting from any
$x(0) \in M$, the differential equation (8.19) has a unique solution with $x(t) \to x^*$, where $x^*$ is
the steady state (zero) of the system given by $Ax^* = 0$.
Theorem 38 Consider the following nonlinear autonomous differential equation
$$\dot{x}(t) = F[x(t)], \tag{8.20}$$
where $F : \mathbb{R}^n \to \mathbb{R}^n$, and suppose that $F$ is continuously differentiable, with initial value
$x(0)$. Let $x^*$ be a zero of this system, i.e., $F(x^*) = 0$. Define
$$A = \nabla F(x^*),$$
and suppose that $m \le n$ of the eigenvalues of $A$ have negative real parts and the rest have
positive real parts. Then there exists an open neighborhood of $x^*$, $B(x^*) \subset \mathbb{R}^n$, and an
$m$-dimensional manifold $M \subset B(x^*)$ such that starting from any $x(0) \in M$, the differential
equation (8.20) has a unique solution with $x(t) \to x^*$.
Put differently, these two theorems state that only a lower-dimensional subset of the
original space leads to stable solutions. However, in this context this is exactly what we
require, since $c(0)$ will adjust in order to place us on exactly such a lower-dimensional
subset of the original space.
There are two ways of seeing this. The first one is simply by analyzing the above system
diagrammatically. This is done in the next picture:
The inverse U-shaped curve is the locus of points where $\dot{k} = 0$. The vertical line, on the
other hand, is the locus of points where $\dot{c} = 0$. The shape of the first one can be understood
by analogy to the diagram where we saw the golden rule. If the capital stock is too low,
steady-state consumption is low, and if the capital stock is too high, then the steady-state
consumption is again low. There exists a unique level, $k_{gold}$, which maximizes the steady-state
consumption per capita. The reason why the $\dot{c} = 0$ locus is just a vertical line simply follows
from the fact that only the unique level of $k^*$ given by (8.17) can keep per capita consumption
constant. Once these two loci are drawn, the rest of the diagram can be completed by looking
at the direction of motion according to the differential equations. Given this direction of
movements, it is clear that there exists a unique stable arm, the lower-dimensional manifold
tending to the steady state. All points away from this stable arm diverge, and eventually
reach zero consumption or zero capital stock as shown in the figure.
The next important observation is that the initial consumption level c (0) has to adjust
to be on this stable arm. To see this note that if it were above it, in finite time, the capital
stock would reach 0 with positive consumption, violating feasibility. If it were below it, in
finite time, consumption would reach zero, thus capital would accumulate continuously, thus
violating the transversality condition. This establishes that the transitional dynamics in the
neoclassical growth model will take the following simple form: c (0) will jump to the stable
arm, and then (k, c) will monotonically travel along this arm towards the steady state. This
establishes:
Proposition 19 In the neoclassical growth model described above, with Assumptions 1, 2,
9 and 10, there exists a unique equilibrium path starting from any $k(0) > 0$ and converging
to the unique steady state $(k^*, c^*)$ with $k^*$ given by (8.17). Moreover, if $k(0) < k^*$, then
$k(t) \uparrow k^*$ and $c(t) \uparrow c^*$, whereas if $k(0) > k^*$, then $k(t) \downarrow k^*$ and $c(t) \downarrow c^*$.
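The argument that $c(0)$ must jump onto the stable arm can be illustrated numerically with a simple shooting procedure. The following is only a sketch under assumptions that are not in the notes: a Cobb-Douglas technology $f(k) = k^{\alpha}$, CRRA utility with constant elasticity $\theta$, illustrative parameter values, a starting capital stock below $k^*$, and a crude explicit Euler integration. The helper names (`classify`, `initial_consumption`) are hypothetical.

```python
"""Shooting sketch for the saddle path of the neoclassical growth model.

Assumptions (not from the text): f(k) = k**ALPHA, CRRA utility with
elasticity THETA, and the illustrative parameter values below.
"""

ALPHA, DELTA, RHO, N, THETA = 0.3, 0.05, 0.03, 0.01, 2.0


def f(k):
    """Per capita production function (assumed Cobb-Douglas)."""
    return k ** ALPHA


def f_prime(k):
    """Marginal product of capital."""
    return ALPHA * k ** (ALPHA - 1.0)


# Steady state from f'(k*) = rho + delta and c* = f(k*) - (n + delta) k*.
K_STAR = (ALPHA / (RHO + DELTA)) ** (1.0 / (1.0 - ALPHA))
C_STAR = f(K_STAR) - (N + DELTA) * K_STAR


def classify(k0, c0, dt=0.02, horizon=300.0):
    """Integrate forward (explicit Euler) from (k0, c0), with k0 < k*.

    Returns "high" if capital starts to fall (c0 lies above the stable arm)
    and "low" if capital reaches k* or the horizon ends first.
    """
    k, c = k0, c0
    for _ in range(int(horizon / dt)):
        k_dot = f(k) - (N + DELTA) * k - c
        c_dot = c * (f_prime(k) - DELTA - RHO) / THETA
        if k_dot < 0.0:
            return "high"
        k, c = k + dt * k_dot, c + dt * c_dot
        if k >= K_STAR:
            return "low"
    return "low"


def initial_consumption(k0, iterations=40):
    """Bisect on c(0) between 0 and the k-dot = 0 locus evaluated at k0."""
    lo, hi = 0.0, f(k0) - (N + DELTA) * k0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if classify(k0, mid) == "high":
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    k0 = 1.0  # must lie below K_STAR for this sketch
    c0 = initial_consumption(k0)
    print(f"k* = {K_STAR:.3f}, c* = {C_STAR:.3f}, c(0) near stable arm = {c0:.3f}")
```

Initial values above the computed $c(0)$ eventually make capital fall, while values below it let capital overshoot $k^*$, which mirrors the two divergent cases discussed in the text. The case $k(0) > k^*$ would need the mirror-image classification and is omitted here.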
An alternative way of establishing the same result is by linearizing the set of differential
equations, and looking at their eigenvalues. Recall the two differential equations determining
the equilibrium path:
$$\dot{k}(t) = f(k(t)) - (n + \delta)k(t) - c(t)$$
and
$$\frac{\dot{c}(t)}{c(t)} = \frac{1}{\varepsilon_u(c(t))}\left(f'(k(t)) - \delta - \rho\right).$$
Linearizing these equations around the steady state $(k^*, c^*)$, we have (suppressing time dependence)
$$\dot{k} = \text{constant} + \left(f'(k^*) - n - \delta\right)(k - k^*) - c,$$
$$\dot{c} = \text{constant} + \frac{c^* f''(k^*)}{\varepsilon_u(c^*)}\,(k - k^*).$$
Moreover, from (8.17), $f'(k^*) - \delta = \rho$, so the eigenvalues of this two-equation system are
given by the values of $\xi$ that solve the following quadratic equation:
$$\det \begin{pmatrix} \rho - n - \xi & -1 \\ \dfrac{c^* f''(k^*)}{\varepsilon_u(c^*)} & 0 - \xi \end{pmatrix} = 0.$$
It is straightforward to verify that, since $c^* f''(k^*)/\varepsilon_u(c^*) < 0$, there are two real eigenvalues,
one negative and one positive. This implies that there exists a one-dimensional stable
manifold converging to the steady state, exactly as the stable arm in the above figure.
Therefore, the local analysis also leads to the same conclusion. However, the local analysis
can only establish local stability, whereas the above analysis established global stability.
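The sign pattern of the eigenvalues can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a Cobb-Douglas technology $f(k) = k^{\alpha}$, a constant elasticity of marginal utility $\theta$, and illustrative parameter values, none of which come from the notes. It expands the determinant condition into $\xi^2 - (\rho - n)\xi + c^* f''(k^*)/\theta = 0$ and solves it with the quadratic formula.

```python
import math

# Illustrative parameters (assumptions for this sketch only).
ALPHA, DELTA, RHO, N, THETA = 0.3, 0.05, 0.03, 0.01, 2.0


def saddle_eigenvalues():
    """Eigenvalues of the linearized (k, c) system around the steady state."""
    k_star = (ALPHA / (RHO + DELTA)) ** (1.0 / (1.0 - ALPHA))
    c_star = k_star ** ALPHA - (N + DELTA) * k_star
    f_second = ALPHA * (ALPHA - 1.0) * k_star ** (ALPHA - 2.0)
    b = c_star * f_second / THETA  # lower-left matrix entry, negative

    # det [[rho - n - xi, -1], [b, -xi]] = xi**2 - (rho - n) * xi + b = 0
    trace = RHO - N
    root = math.sqrt(trace ** 2 - 4.0 * b)  # real because b < 0
    return (trace - root) / 2.0, (trace + root) / 2.0


if __name__ == "__main__":
    negative, positive = saddle_eigenvalues()
    print(f"stable eigenvalue: {negative:.4f}, unstable eigenvalue: {positive:.4f}")
```

Because the product of the two roots equals the negative lower-left entry, one root is always negative and the other positive, which is the saddle-path configuration with one state variable.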
8.6 Technological Change and the Canonical Neoclassical Model
The above analysis was for the neoclassical growth model without any technological change.
Let us now extend the production function to:
$$Y(t) = F[K(t), A(t)L(t)], \tag{8.21}$$
where
$$A(t) = \exp(gt)\,A(0).$$
Notice that the production function (8.21) imposes purely labor-augmenting, or Harrod-neutral,
technological change. This is a consequence of Theorem 7 above, which was proved
in the context of the constant savings rate model, but equally applies in this context. Only
purely labor-augmenting technological change is consistent with balanced growth.
We continue to adopt all the other assumptions, in particular Assumptions 1, 2 and 9.
Assumption 10 will be strengthened further in order to ensure finite discounted utility in the
presence of sustained economic growth.
The constant returns to scale feature again enables us to work with normalized variables.
Now let us define
$$\tilde{y}(t) \equiv \frac{Y(t)}{A(t)L(t)} = F\left[\frac{K(t)}{A(t)L(t)}, 1\right] \equiv f(\tilde{k}(t)),$$
where now
$$\tilde{k}(t) \equiv \frac{K(t)}{A(t)L(t)} \tag{8.22}$$
is the capital to effective labor ratio, taking into account that effective labor is increasing
because of labor-augmenting technological change.
In addition to the assumption on technology, we also need to impose a further assumption
on preferences in order to ensure balanced growth. We define balanced growth as growth
consistent with the Kaldor facts of constant capital-output ratio and capital share in national
income. These two observations together also imply that the rental rate of return on capital,
$R(t)$, has to be constant, which, from (8.8), implies that $r(t)$ has to be constant. In addition,
balanced growth requires that consumption and output grow at a constant rate. The Euler
equation implies that
$$\frac{\dot{c}(t)}{c(t)} = \frac{1}{\varepsilon_u(c(t))}\,(r(t) - \rho).$$
If $r(t) \to r^*$ in BGP (balanced growth path), then $\dot{c}(t)/c(t) \to g_c$ is only possible if
$\varepsilon_u(c(t)) \to \varepsilon_u$, i.e., if the elasticity of marginal utility of consumption is asymptotically
constant.
Therefore, balanced growth is only consistent with utility functions that have
asymptotically constant elasticity of marginal utility of consumption. Given this restriction,
we might as well start with a utility function that has this feature throughout. As noted
above, the unique utility function with this feature is the CRRA preferences, given by
$$u(c(t)) = \begin{cases} \dfrac{c(t)^{1-\theta} - 1}{1-\theta} & \text{if } \theta \neq 1 \text{ and } \theta \ge 0 \\ \ln c(t) & \text{if } \theta = 1 \end{cases},$$
where the elasticity of marginal utility of consumption, $\varepsilon_u$, is given by the constant $\theta$. When
$\theta = 0$, these represent linear (risk-neutral) preferences, whereas when $\theta = 1$, we have log
preferences. As $\theta \to \infty$, these preferences become infinitely risk-averse, and infinitely unwilling
to substitute consumption over time.
More specifically, we now assume that the economy admits a representative consumer
with CRRA preferences
$$\int_0^\infty \exp(-(\rho - n)t)\, \frac{c(t)^{1-\theta} - 1}{1-\theta}\,dt. \tag{8.23}$$
I refer to this model, with labor-augmenting technological change and CRRA preferences
as given by (8.23), as the canonical model, since it is the model used in almost all applications
with steady growth (unless non-balanced growth is the purpose, as will be discussed in some
of the structural change models below). Clearly, the Euler equation in this case takes the
simpler form:
$$\frac{\dot{c}(t)}{c(t)} = \frac{1}{\theta}\,(r(t) - \rho). \tag{8.24}$$
Let us first characterize the steady-state equilibrium in this model with technological
progress. Since with technological progress there will be growth in per capita income, $c(t)$
will grow. Instead, in analogy with $\tilde{k}(t)$, let us define
$$\tilde{c}(t) \equiv \frac{C(t)}{A(t)L(t)} \equiv \frac{c(t)}{A(t)}.$$
We will see that this normalized consumption level will remain constant along the BGP. In
particular, we have
$$\frac{\dot{\tilde{c}}(t)}{\tilde{c}(t)} \equiv \frac{\dot{c}(t)}{c(t)} - g = \frac{1}{\theta}\,(r(t) - \rho - \theta g).$$
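Moreover, for the accumulation of capital stock, we have
$$\dot{\tilde{k}}(t) = f(\tilde{k}(t)) - \tilde{c}(t) - (n + g + \delta)\,\tilde{k}(t).$$
The transversality condition, in turn, can be expressed as
$$\lim_{t \to \infty} \left\{\tilde{k}(t) \exp\left(-\int_0^t \left[f'(\tilde{k}(s)) - g - \delta - n\right] ds\right)\right\} = 0. \tag{8.25}$$
In addition, $r(t)$ is still given by (8.8), so
$$r(t) = f'(\tilde{k}(t)) - \delta.$$
Since in steady state $\tilde{c}(t)$ must remain constant, therefore
$$r(t) = \rho + \theta g$$
or
$$f'(\tilde{k}^*) = \rho + \delta + \theta g, \tag{8.26}$$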
which pins down the steady-state value of the normalized capital ratio $\tilde{k}^*$ uniquely, in a way
similar to the model without technological progress. The level of normalized consumption is
then given by
$$\tilde{c}^* = f(\tilde{k}^*) - (n + g + \delta)\,\tilde{k}^*, \tag{8.27}$$
while per capita consumption grows at the rate $g$.

The only additional condition in this case is that because there is growth, we have to
make sure that the transversality condition is in fact satisfied. Substituting (8.26) into (8.25),
we have
$$\lim_{t \to \infty} \left\{\tilde{k}(t) \exp\left(-\int_0^t \left[\rho - (1-\theta)g - n\right] ds\right)\right\} = 0,$$
which, since $\tilde{k}(t)$ converges to $\tilde{k}^* > 0$, can only be the case if the exponential term goes to
zero, i.e., if $\rho - (1-\theta)g - n > 0$, or alternatively if the following assumption is satisfied:
Assumption 11
$$\rho - n > (1-\theta)\,g.$$
Note that this assumption strengthens Assumption 10 when $\theta < 1$. Alternatively, recall
that in steady state we have $r = \rho + \theta g$ and the growth rate of output is $g + n$. Therefore,
Assumption 11 is equivalent to requiring that $r > g + n$. We will encounter conditions like
this throughout, and they will also be related to issues of dynamic efficiency as we will
see below.
For now, we have the following immediate generalization of Proposition 18:
Proposition 20 Consider the neoclassical growth model with labor-augmenting technological
progress at the rate $g$ and preferences given by (8.23). Suppose that Assumptions 1, 2, 9 and
11 hold. Then there exists a unique balanced growth path equilibrium with a normalized
capital to effective labor ratio of $\tilde{k}^*$, given by (8.26), and output per capita and consumption
per capita grow at the rate $g$.
Interestingly, the result that the steady-state capital-labor ratio is independent of
preferences no longer holds, since now $\tilde{k}^*$ given by (8.26) depends on the elasticity
of marginal utility (or the inverse of the intertemporal elasticity of substitution), $\theta$. The
reason for this is that there is now growth, so the willingness of individuals to substitute
consumption today for consumption tomorrow determines how much they will accumulate
and thus the equilibrium capital to effective labor ratio.
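The dependence of the balanced growth path on preferences can be made concrete with a small calculation. The sketch below assumes a Cobb-Douglas technology $f(\tilde{k}) = \tilde{k}^{\alpha}$ and illustrative parameter values (both are assumptions, not taken from the notes), solves the steady-state condition $f'(\tilde{k}^*) = \rho + \delta + \theta g$ in closed form, and checks Assumption 11 for each value of $\theta$.

```python
# Illustrative parameters (assumptions for this sketch only).
ALPHA, DELTA, RHO, N, G = 0.3, 0.05, 0.03, 0.01, 0.02


def assumption_11_holds(theta):
    """Check rho - n > (1 - theta) * g, which ensures finite discounted utility."""
    return RHO - N > (1.0 - theta) * G


def k_tilde_star(theta):
    """Solve alpha * k**(alpha - 1) = rho + delta + theta * g for k."""
    return (ALPHA / (RHO + DELTA + theta * G)) ** (1.0 / (1.0 - ALPHA))


def c_tilde_star(theta):
    """Normalized steady-state consumption, f(k*) - (n + g + delta) * k*."""
    k = k_tilde_star(theta)
    return k ** ALPHA - (N + G + DELTA) * k


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0, 3.0):
        print(theta, assumption_11_holds(theta),
              round(k_tilde_star(theta), 3), round(c_tilde_star(theta), 3))
```

With $g > 0$, a higher $\theta$ raises the required marginal product of capital and therefore lowers $\tilde{k}^*$. Setting `G = 0.0` removes this dependence, which recovers the earlier result without technological progress.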
A similar analysis to before also leads to an immediate generalization of Proposition 19,
which is stated here. The proof is left as a homework exercise, but the next figure gives
the sketch already.
Proposition 21 Consider the neoclassical growth model with labor-augmenting technological
progress at the rate $g$ and preferences given by (8.23). Suppose that Assumptions 1, 2, 9 and
11 hold. Then there exists a unique equilibrium path of normalized capital and consumption,
$(\tilde{k}(t), \tilde{c}(t))$, converging to the unique steady state $(\tilde{k}^*, \tilde{c}^*)$ with $\tilde{k}^*$ given by (8.26).
Moreover, if $\tilde{k}(0) < \tilde{k}^*$, then $\tilde{k}(t) \uparrow \tilde{k}^*$ and $\tilde{c}(t) \uparrow \tilde{c}^*$, whereas if $\tilde{k}(0) > \tilde{k}^*$, then
$\tilde{k}(t) \downarrow \tilde{k}^*$ and $\tilde{c}(t) \downarrow \tilde{c}^*$.
It is also useful to briefly look at an example with Cobb-Douglas technology.

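Example 3 Consider the model with CRRA utility and labor-augmenting technological progress
at the rate $g$. Assume that the production function is given by $F(K, AL) = K^{\alpha}(AL)^{1-\alpha}$, so
that
$$f(\tilde{k}) = \tilde{k}^{\alpha},$$
and thus $r = \alpha \tilde{k}^{\alpha-1} - \delta$. In this case, the Euler equation becomes:
$$\frac{\dot{\tilde{c}}}{\tilde{c}} = \frac{1}{\theta}\left(\alpha \tilde{k}^{\alpha-1} - \delta - \rho - \theta g\right),$$
and the accumulation equation can be written as
$$\frac{\dot{\tilde{k}}}{\tilde{k}} = \tilde{k}^{\alpha-1} - \delta - g - n - \frac{\tilde{c}}{\tilde{k}}.$$
Now define $z \equiv \tilde{c}/\tilde{k}$ and $x \equiv \tilde{k}^{\alpha-1}$, which implies that $\dot{x}/x = (\alpha - 1)\,\dot{\tilde{k}}/\tilde{k}$. Therefore, these
two equations can be written as
$$\frac{\dot{x}}{x} = -(1-\alpha)\,(x - \delta - g - n - z) \tag{8.28}$$
$$\frac{\dot{z}}{z} = \frac{\dot{\tilde{c}}}{\tilde{c}} - \frac{\dot{\tilde{k}}}{\tilde{k}},$$
thus
$$\begin{aligned} \frac{\dot{z}}{z} &= \frac{1}{\theta}\,(\alpha x - \delta - \rho - \theta g) - x + \delta + g + n + z \\ &= \frac{1}{\theta}\left((\alpha - \theta)x - (1-\theta)\delta - \rho + \theta n\right) + z. \end{aligned} \tag{8.29}$$
The two differential equations (8.28) and (8.29) together with the initial condition $x(0)$ and
the transversality condition completely determine the dynamics of the system. In Problem
Set 4, you will be asked to complete this example for the special case in which $\theta \to 1$ (i.e.,
log preferences).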
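As a consistency check on the transformed system of Example 3, one can verify that its rest point coincides with the balanced growth path characterized earlier. The sketch below is an illustration under assumed parameter values (they are not in the notes). It codes the right-hand sides of (8.28) and (8.29) and evaluates them at the candidate steady state $\alpha x^* = \rho + \delta + \theta g$, $z^* = x^* - \delta - g - n$.

```python
# Illustrative parameters (assumptions for this sketch only).
ALPHA, DELTA, RHO, N, G, THETA = 0.3, 0.05, 0.03, 0.01, 0.02, 2.0


def x_growth(x, z):
    """Right-hand side of (8.28): growth rate of x = k**(alpha - 1)."""
    return -(1.0 - ALPHA) * (x - DELTA - G - N - z)


def z_growth(x, z):
    """Right-hand side of (8.29): growth rate of z = c / k."""
    return ((ALPHA - THETA) * x - (1.0 - THETA) * DELTA - RHO + THETA * N) / THETA + z


# Candidate steady state implied by f'(k*) = rho + delta + theta * g.
X_STAR = (RHO + DELTA + THETA * G) / ALPHA
Z_STAR = X_STAR - DELTA - G - N

if __name__ == "__main__":
    print("x* =", X_STAR, "z* =", Z_STAR)
    print("growth rates at the candidate point:",
          x_growth(X_STAR, Z_STAR), z_growth(X_STAR, Z_STAR))
```

Both growth rates vanish (up to floating-point error) at this point, so the $(x, z)$ system has the same steady state as the original $(\tilde{k}, \tilde{c})$ system, as it should.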
8.7 The Role of Policy
In the above model, the growth rates of per capita consumption and output are determined
exogenously, by the growth rate of labor-augmenting technological progress. The level of
income, on the other hand, depends on preferences, in particular, on the intertemporal
elasticity of substitution, $1/\theta$, the discount rate, $\rho$, the depreciation rate, $\delta$, the population
growth rate, $n$, and naturally the form of the production function $f(\cdot)$.
If we were to go back to the proximate causes of differences in income per capita or
growth across countries, this model would give us a way of understanding those differences
only in terms of preference and technology parameters.
However, the model can be easily enriched to include policy variables, which, at least
according to the institutions view, play an equally important role in accounting for differences
in physical (and human) capital and technology across countries.
Let us do this in the simplest possible way here and suppose that returns on capital net
of depreciation are taxed at the rate $\tau$ and the proceeds of this are redistributed back to the
consumers. In that case, the capital accumulation equation, in terms of normalized capital,
still remains
$$\dot{\tilde{k}}(t) = f(\tilde{k}(t)) - \tilde{c}(t) - (n + g + \delta)\,\tilde{k}(t),$$
but the rental rate of return is now
$$r(t) = (1 - \tau)\left(f'(\tilde{k}(t)) - \delta\right).$$
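This implies
$$\begin{aligned} \frac{\dot{\tilde{c}}(t)}{\tilde{c}(t)} \equiv \frac{\dot{c}(t)}{c(t)} - g &= \frac{1}{\theta}\,(r(t) - \rho - \theta g) \\ &= \frac{1}{\theta}\left((1 - \tau)\left(f'(\tilde{k}(t)) - \delta\right) - \rho - \theta g\right), \end{aligned}$$
so that the steady-state capital to effective labor ratio is given by
$$f'(\tilde{k}^*) = \delta + \frac{\rho + \theta g}{1 - \tau}.$$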
A higher tax rate $\tau$ increases the right hand side, and since from Assumption 1, $f'(\cdot)$ is
decreasing, it reduces $\tilde{k}^*$. Therefore, higher taxes on capital have the effect of depressing
capital accumulation. In the next section, we will discuss how large these effects can be and
whether they could account for the differences in cross-country incomes.
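To see the comparative static at work, the steady-state condition with a capital income tax can be solved in closed form once a production function is specified. The following sketch assumes $f(\tilde{k}) = \tilde{k}^{\alpha}$ and illustrative parameter values, neither of which comes from the notes, and evaluates $\tilde{k}^*$ for a few tax rates.

```python
# Illustrative parameters (assumptions for this sketch only).
ALPHA, DELTA, RHO, G, THETA = 0.3, 0.05, 0.03, 0.02, 2.0


def k_tilde_star(tau):
    """Solve f'(k) = delta + (rho + theta * g) / (1 - tau) for f(k) = k**ALPHA."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("tax rate must lie in [0, 1)")
    required_mpk = DELTA + (RHO + THETA * G) / (1.0 - tau)
    return (ALPHA / required_mpk) ** (1.0 / (1.0 - ALPHA))


if __name__ == "__main__":
    for tau in (0.0, 0.2, 0.4, 0.6):
        k = k_tilde_star(tau)
        print(f"tau = {tau:.1f}: k* = {k:.3f}, output per effective worker = {k ** ALPHA:.3f}")
```

The required marginal product rises with $\tau$, so the steady-state capital to effective labor ratio falls monotonically as the tax rate increases.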
For now, we can also note that similar results would be obtained if instead of taxes being
imposed on returns from capital, they were imposed on the amount of investment. We will
see this in the next section.
8.8 Quantitative Evaluations
8.8.1 Policy Differences
For a quantitative evaluation of the effect of policy differences, let us follow Jones (1995)
and Chari, Kehoe and McGrattan (1997). Imagine that the main policy difference across
countries is in terms of tax structure that affects the relative price of capital goods.
Chad Jones uses data from the Summers-Heston data set on the price of investment goods
relative to consumption goods and shows that there are large differences in the relative price
of capital goods (compared to consumption goods); he also shows that a high relative price of
capital goods is associated with low growth over the postwar period. This has led a number
of economists, for example, Chari, Kehoe and McGrattan (1997) or Parente and Prescott
(1994), to argue that a major difference across countries is the extent of distortions arising
from taxes, corruption or other policy differences, which affect the relative price of capital.
Although this is a plausible starting point, once you look at the data, the differences in the
relative price come not from the fact that investment goods are much more expensive in
some countries, but from the fact that consumption goods are cheaper. We will discuss this
later below, but for now let us stick with the traditional approach and think of differential
policies affecting the relative price of capital.
Suppose that all countries admit a representative consumer with identical preferences,
given by
$$\int_0^\infty \exp(-\rho t)\, \frac{C_j^{1-\theta} - 1}{1-\theta}\,dt, \tag{8.30}$$
where $j \in J$ denotes country $j$. There is no population growth, so $C_j$ interchangeably refers
to total or per capita consumption. All countries also have access to the same production
technology given by the Cobb-Douglas production function
$$Y_j = K_j^{1-\alpha}(AH_j)^{\alpha}, \tag{8.31}$$
with $H_j$ representing the exogenously given stock of effective labor (human capital). The
accumulation equation is
$$\dot{K}_j = I_j - \delta K_j.$$
The only difference across countries is in the budget constraint for the representative
consumer, which takes the form
$$(1 + \tau_j)\,I_j + C_j \le Y_j, \tag{8.32}$$
where $\tau_j$ is the tax on investment. This tax varies across countries, for example because
of policies or differences in institutions/property rights enforcement. Notice that $1 + \tau_j$ is
also the relative price of investment goods (relative to consumption goods): one unit of
consumption goods can only be transformed into $1/(1 + \tau_j)$ units of investment goods.
Note that the right hand side variable of (8.32) is still $Y_j$, which implicitly assumes that
$\tau_j I_j$ is wasted, rather than simply redistributed to some other agents in the economy. This
is without any major consequence, since, as noted in Theorem 9 above, CRRA preferences
as in (8.30) have the nice feature that they can be exactly aggregated across individuals, so
we do not have to worry about the distribution of income in the economy.
The competitive equilibrium can be characterized as the solution to the maximization
of (8.30) subject to (8.32) and the capital accumulation equation.
With the same steps as above, the Euler equation of the representative consumer is
$$\frac{\dot{C}_j}{C_j} = \frac{1}{\theta}\left(\frac{1-\alpha}{1+\tau_j}\left(\frac{AH_j}{K_j}\right)^{\alpha} - \delta - \rho\right).$$
Consider the steady state. Because $A$ is assumed to be constant, the steady state corresponds
to $\dot{C}_j/C_j = 0$. (Alternatively, we could have $A$ growing at a constant rate and $\dot{C}_j/C_j$
equal to the growth rate of $A$.) This immediately implies that
$$K_j = \frac{(1-\alpha)^{1/\alpha}\,AH_j}{\left[(1+\tau_j)(\rho+\delta)\right]^{1/\alpha}}.$$
So countries with higher taxes on investment will have a lower capital stock in steady state.
Equivalently, they will also have lower capital per worker, or a lower capital output ratio
(using (8.31) the capital output ratio is simply $K/Y = (K/AH)^{\alpha}$).
Now substituting this into (8.31), and comparing two countries with different taxes (but
the same human capital), we obtain the relative incomes as
$$\frac{Y(\tau)}{Y(\tau')} = \left(\frac{1+\tau'}{1+\tau}\right)^{\frac{1-\alpha}{\alpha}}. \tag{8.33}$$
So countries that tax investment, either directly or indirectly, at a higher rate will be poorer.
The advantage of using the neoclassical growth model for quantitative evaluation relative
to the Solow growth model is that the extent to which different types of distortions (here
captured by the tax rates on investment) will affect income and capital accumulation is
determined endogenously. In contrast, in the Solow growth model, what matters is the
savings rate, so we would need other evidence to link taxes or distortions to savings (or to
other determinants of income per capita such as technology).
How large is this effect? Or can such policy differences have quantitatively large effects
generating income differences comparable to what we observe in practice?
Recall that a plausible value for $\alpha$ is 2/3, since this is the share of labor income in national
product which, with Cobb-Douglas production function, is equal to $\alpha$. The Summers-Heston
data suggest that there is a large amount of variation in the relative price of investment
goods. For example, countries with the highest relative price of investment goods have
almost eight times as high a value as countries with the lowest relative price. Then, using
$\alpha = 2/3$, equation (8.33) implies that the income gap between two such countries should be
approximately threefold:
$$\frac{Y(\tau)}{Y(\tau')} \approx (8)^{1/2} \approx 3.$$
Therefore, differences in capital-output ratios or capital-labor ratios caused by taxes or
tax-type distortions, even very large differences in taxes or distortions, are unlikely to account
for anywhere near as large differences in income per capita as we observe in practice.
This is not surprising. The discussion above showed that differences in income per
capita across countries cannot be accounted for by differences in capital per worker alone.
Instead, to explain such large differences in income per capita across countries, we need
sizable differences in the efficiency with which these factors are used. Such differences do
not feature in this model. Therefore, the simplest model does not provide a good starting
point. Nevertheless, many authors have tried to use this model to go further.
8.8.2 Extensions
Basically, these authors start from a one-sector model, and try to generate large responses
to distortions. But there is a constraint in this exercise: the share of capital in GDP is
1/3, and in the simple Cobb-Douglas production function as in (8.31), this also turns out to
determine the elasticity of output to distortions (this elasticity is simply $(1-\alpha)/\alpha$ as shown
by (8.33)). This is intuitive: if capital only has a small share, in this setup this means that
variations in capital will have only a small effect on income, so distortions that affect capital
can only have a small effect on income.
One line of attack is taken by Chari, Kehoe and McGrattan, who suggest that the
correct value for $1-\alpha$ is 2/3. They think that human capital is not exogenous, but accumulates
in exactly the same way as physical capital. In particular, they posit
$$(1 + \tau_j)(I_j + X_j) + C_j \le Y_j$$
and
$$\dot{H}_j = X_j - \delta H_j,$$
where $X_j$ denotes investment in human capital. With this reasoning, and using the numbers
implied by Mankiw, Romer and Weil's regression analysis discussed above, they take $\alpha = 1/3$
(or they take the share of accumulable factors in GDP to be 2/3). In this case, (8.33) implies
income differences as large as 64-fold. They therefore conclude that the augmented Solow
model is capable of explaining income differences across countries quantitatively based on
distortions on investment.
However, this conclusion is subject to exactly the same caveats as Mankiw, Romer and
Weil's analysis. A share of 2/3 for the accumulable factors in GDP is too high, and implies
implausibly large effects of education on income as pointed out above.
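The two quantitative claims above follow directly from (8.33) and can be reproduced with a few lines of arithmetic. The sketch below takes the eightfold gap in the relative price of investment goods from the text. The function name `income_ratio` is a hypothetical helper introduced for this illustration.

```python
def income_ratio(relative_price_gap, alpha):
    """Relative income implied by (8.33) for a given gap in (1 + tau).

    relative_price_gap is (1 + tau') / (1 + tau), and alpha is the share of
    the non-accumulable factor in the production function (8.31).
    """
    return relative_price_gap ** ((1.0 - alpha) / alpha)


if __name__ == "__main__":
    gap = 8.0  # eightfold gap in the relative price of investment goods
    print("alpha = 2/3:", income_ratio(gap, 2.0 / 3.0))
    print("alpha = 1/3:", income_ratio(gap, 1.0 / 3.0))
```

With $\alpha = 2/3$ the exponent is 1/2 and the implied income gap is roughly threefold, whereas with $\alpha = 1/3$ the exponent is 2 and the gap becomes 64-fold. The whole quantitative conclusion therefore hinges on the share assigned to accumulable factors.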
8.9 Variants of the Neoclassical Model
Parente-Prescott argue that the simple neoclassical model is not sufficient to account for
the large differences in income per capita across countries. Consistent with the evidence
presented above, they suggest that we have to take differences in technology into account.
Their approach for technology differences, however, is very similar to the tax-type distortions
affecting physical capital (and human capital) decisions in the neoclassical model. In
particular, they argue that technology differences arise because there are barriers to technology
adoption, inducing economies with worse distortions not to adopt superior technologies.
Essentially, this explanation turns technology into an accumulable factor, with a neoclassical
production function. Consequently, even though Parente-Prescott argue that their model is
different from the neoclassical model, it is really a variant of it, and I will treat it that way
here.
Therefore, this explanation circumvents the problems of the neoclassical models without
being forced to increase the share of capital in national income. Moreover, the Parente-Prescott
formulation does this while keeping exogenous growth. However, there is a sense in
which what is being done here is to add a degree of freedom, and to interpret technology as a
broader concept of capital.
Here is a very simple version of their model. Suppose that output is given by
$$Y_t = A_t N_t$$
where $A_t$ is technology/knowledge which will be accumulated endogenously. Each firm can
at most employ $\bar{N}$ workers (so there will be $N_t/\bar{N}$ firms in this economy). This limit on
firm-level employment is imposed because the production technology exhibits increasing returns
to scale, and otherwise all workers would be employed in one firm.
In acquiring their technology, countries/firms benefit from world knowledge, which progresses
exogenously as follows
$$T_t = T_0 (1 + g)^t.$$
In order to benefit from world technology, countries/firms need to undertake some investments.
In particular, the investment required to improve technology from $A_t$ to $A_{t+1}$
is
$$X_t = \frac{1}{\pi} \int_{A_t}^{A_{t+1}} \left(\frac{S}{T_t}\right)^{(1-\alpha)/\alpha} dS.$$
Intuitively, each incremental improvement between $A_t$ and $A_{t+1}$ costs an amount that depends
on the distance of this improvement to the frontier technology, and also a shift parameter
$\pi$. As before, a high level of $\pi$ corresponds to better technology for absorbing
world knowledge, and a low level of $\pi$ corresponds to significant distortions in the process of
technology adoption.
Solving this integral, we obtain
$$X_t = \frac{A_{t+1}^{1/\alpha} - A_t^{1/\alpha}}{\pi\,(1/\alpha)\,T_t^{(1-\alpha)/\alpha}}.$$
Clearly, the budget constraint of the representative consumer is now $C_t + X_t \le Y_t$.

Denote
$$Z_t \equiv \frac{A_t^{1/\alpha}}{T_t^{(1-\alpha)/\alpha}}$$
as the effective knowledge stock. Then, we have the law of motion of this knowledge stock
as $Z_{t+1} \equiv A_{t+1}^{1/\alpha} / \left[(1 + g)\,T_t\right]^{(1-\alpha)/\alpha}$. So
$$Z_{t+1} = \phi_0 \left(Z_t + \frac{\pi}{\alpha}\,X_t\right)$$
as the law of motion, where $\phi_0 \equiv (1+g)^{-(1-\alpha)/\alpha}$ is a constant. This equation makes it clear
that the modeling question is very similar to a neoclassical model, except that $X_t$ has replaced
investments and $Z_t$ has replaced the capital stock.
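The law of motion of the effective knowledge stock can be derived in three short steps. The derivation below is a sketch that assumes the adoption cost takes the form $X_t = \left(A_{t+1}^{1/\alpha} - A_t^{1/\alpha}\right) / \left(\pi (1/\alpha) T_t^{(1-\alpha)/\alpha}\right)$, with $\pi$ the shift parameter and $T_{t+1} = (1+g)T_t$. The symbols $\pi$ and $\alpha$ are notation assumed here, since the corresponding symbols are not legible in the source.

```latex
\begin{align*}
A_{t+1}^{1/\alpha}
  &= A_t^{1/\alpha} + \frac{\pi}{\alpha}\, T_t^{(1-\alpha)/\alpha}\, X_t
  && \text{(rearranging the expression for } X_t\text{)} \\
Z_{t+1} = \frac{A_{t+1}^{1/\alpha}}{T_{t+1}^{(1-\alpha)/\alpha}}
  &= \frac{A_t^{1/\alpha} + \frac{\pi}{\alpha}\, T_t^{(1-\alpha)/\alpha}\, X_t}
          {\left[(1+g)\,T_t\right]^{(1-\alpha)/\alpha}}
  && \text{(using } T_{t+1} = (1+g)\,T_t\text{)} \\
  &= (1+g)^{-(1-\alpha)/\alpha} \left( Z_t + \frac{\pi}{\alpha}\, X_t \right)
  && \text{(using the definition of } Z_t\text{)}.
\end{align*}
```

The factor $(1+g)^{-(1-\alpha)/\alpha} < 1$ plays the role that $1 - \delta$ plays in a capital accumulation equation: the effective knowledge stock erodes as the world frontier advances, and investment $X_t$ is more productive when $\pi$ is high.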
Then, output per capita differences in two economies can be expressed as functions of
their knowledge stocks. Denoting output per capita in country $j$ by $y^j$, we have
$$\frac{y_t^{j}}{y_t^{j'}} = \left(\frac{Z_t^{j}}{Z_t^{j'}}\right)^{\alpha}.$$
Will differences in $\pi$ now lead to large output differences? The answer depends on $\alpha$. If $\alpha$
is small, as in the neoclassical model with a small share of capital in national product, there
will only be quantitatively small differences in output per capita across countries. However,
if $\alpha$ is large, for example $\alpha = 0.7$, this economy will behave similarly to the neoclassical model
with a capital share equal to 2/3.
So therefore the basic dierence between this model and the standard neoclassical model
is that here instead of capital, we have knowledge being accumulated, and is assumed
to be large so knowledge is taken to be very important in production (e.g., corresponding
to a large share of payments to technology in GDP, if everything was priced according to
marginal productbut because of the increasing returns to scale this is not the case).
In other words, by introducing the knowledge stock and increasing returns to scale,
Parente and Prescott take us back to a production function of the form Y = K 0.7 L0.3 but
with K replaced by Z. As a result, they obtain significantly larger eects of distortions on
income than implied by the neoclassical production function with Y = K 0.3 L0.7 .
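To get a feel for the magnitudes, the following is a minimal sketch, assuming only that relative output per capita equals the relative knowledge (or capital) stock raised to a constant exponent, as in the ratio formula above. The function name and the 10-fold income gap are illustrative choices, not part of the original model.

```python
def required_stock_ratio(income_ratio, exponent):
    """Ratio of knowledge (or capital) stocks needed to generate a given
    ratio of output per capita when y is proportional to Z**exponent."""
    return income_ratio ** (1.0 / exponent)


# Hypothetical 10-fold gap in output per capita.
for exponent in (0.3, 0.7):
    ratio = required_stock_ratio(10.0, exponent)
    print(f"exponent {exponent}: stocks must differ by a factor of about {ratio:.1f}")
```

With a small exponent the stocks must differ by a factor in the thousands, while with an exponent of 0.7 a factor below 30 suffices. This is the sense in which a large exponent allows moderate distortions to generate large income gaps.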
Although the explanation is plausible, the model does not generate insights beyond the statement that distortions lead to lower input use, and knowledge is just another input. As we will see below, many models of endogenous technological progress will have a similar flavor of technology being accumulated by purposeful investments, but they will allow for much richer interactions.
Chapter 9
Growth with Overlapping Generations
The models analyzed so far were assumed to admit a representative household or consumer. These models are useful in providing us with a tractable framework for the analysis of capital accumulation and neoclassical growth. Moreover, they had the nice feature that the competitive equilibrium coincided with the natural Pareto optimal allocation. In many situations, however, the assumption of a representative household is not appropriate. This was already discussed above. But one specific set of circumstances where we have to depart from this assumption is the one where we look at an economy in which new households (individuals) arrive over time. These models, first analyzed by Paul Samuelson and then later by Peter Diamond, are referred to as overlapping generations models, since different generations are born at different points in time.
For economic growth, these models are useful, first, because they provide a tractable alternative to the infinite-horizon representative agent models; second, because they have some very different implications; and third, because they allow a discussion of national debt and Social Security type issues.
9.1 Problems of Infinity
Let us start by considering the following economy analyzed by Shell (1971).

Consider the following static economy with a countably infinite number of agents, $i \in \mathbb{N}$, and a countably infinite number of commodities, $j \in \mathbb{N}$. Assume all agents behave competitively (alternatively, we can have more than one agent of each type in order to ensure this). Agent $i$ has preferences given by:
$$u_i = c_i + c_{i+1},$$
that is, he enjoys the consumption of the commodity with the same index as his own and of the next indexed commodity.
Moreover, the endowment vector of the economy is as follows: each agent has one unit endowment of the commodity with the same index as his. Let us choose the price of the first commodity as the numeraire, i.e., $p_0 = 1$.
The following is straightforward to see:
Proposition 22 In the above-described economy, the price vector $\bar{p}$ such that $\bar{p}_j = 1$ for all $j \in \mathbb{N}$ is a competitive equilibrium price vector and induces an equilibrium with no trade, denoted by $\bar{x}$.
Proof. At $\bar{p}$, each individual has income equal to 1, and thus faces a budget constraint of the form
$$c_i + c_{i+1} \le 1.$$
This implies that consuming his endowment is optimal for each individual, establishing that $\bar{p}$ and no trade, $\bar{x}$, constitute a competitive equilibrium.

However, the competitive equilibrium in Proposition 22 is not Pareto optimal. To see this, consider the following alternative allocation, referred to as $\tilde{x}$. Consumer $i = 0$ consumes one unit of good $j = 0$ and one unit of good $j = 1$, and consumer $i > 0$ consumes one unit of good $i + 1$. This is a feasible allocation. All consumers with $i > 0$ are as well off in this allocation as in the competitive equilibrium at $\bar{p}$, and individual $i = 0$ is strictly better off. This establishes
Proposition 23 In the above-described economy, the competitive equilibrium at $(\bar{p}, \bar{x})$ is not Pareto optimal.
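The comparison between the two allocations can be checked mechanically. The sketch below looks only at the first `N` agents of the economy (a hypothetical truncation chosen for illustration; the economy itself is infinite) and represents each consumption bundle as a dictionary from good index to quantity. The function names are illustrative.

```python
def utility(bundle, i):
    """u_i = c_i + c_{i+1}; `bundle` maps a good's index to the quantity consumed."""
    return bundle.get(i, 0) + bundle.get(i + 1, 0)


def no_trade(i):
    """No-trade allocation: agent i consumes his endowment (one unit of good i)."""
    return {i: 1}


def shifted(i):
    """Alternative allocation: agent 0 consumes goods 0 and 1, agent i > 0 consumes good i + 1."""
    return {0: 1, 1: 1} if i == 0 else {i + 1: 1}


N = 1000  # hypothetical truncation: look only at agents 0, ..., N - 1

u_no_trade = [utility(no_trade(i), i) for i in range(N)]
u_shifted = [utility(shifted(i), i) for i in range(N)]

# Quantity of each good used by agents 0, ..., N - 1 under the alternative allocation.
goods_used = {}
for i in range(N):
    for j, q in shifted(i).items():
        goods_used[j] = goods_used.get(j, 0) + q

# Agent N - 1 consumes good N, which is the endowment of agent N.
highest_good_needed = max(goods_used)
```

Agent 0 gains one unit of utility and nobody else loses, but the last agent in any truncation always consumes a good endowed to the next agent. In a finite economy that last agent would have to be made worse off, so the Pareto improvement exists only because there is no last agent.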
Therefore, in this economy the First Welfare Theorem, Theorem 10, does not hold. This is the reason why Theorem 10 was stated under the assumption of a finite number of commodities. This finiteness assumption was used in the proof to make sure that summations existed and were finite. When there are an infinite number of commodities and an infinite number of individuals, summing over the value of the consumption baskets of all individuals may (and often will) lead to infinite sums, and the proof of Theorem 10 does not work.
The reason why this admittedly artificial economy is relevant is that it is isomorphic to the overlapping generations economy which we will analyze next. The same issue of Pareto suboptimality will arise in this overlapping generations economy.
However, recall also that Theorem 11 did not make use of summations in the same way as Theorem 10; instead, it made use of convexity. So one might conjecture that in this model, which is convex, while competitive equilibrium may be suboptimal, Pareto optima must be decentralizable. This is in fact true, and the following proposition shows how the allocation $\tilde{x}$ can be decentralized as a competitive equilibrium at the same price vector, given a suitable reallocation of endowments:

Proposition 24 In the above-described economy, there exists a reallocation of the endowment vector $\omega$ to $\tilde{\omega}$, and an associated competitive equilibrium $(\bar{p}, \tilde{x})$ that is Pareto optimal, where $\tilde{x}$ is as described above, and $\bar{p}$ is such that $\bar{p}_j = 1$ for all $j \in \mathbb{N}$.
Proof. Consider the following reallocation of the endowment vector $\omega$: the endowment of agent $i \ge 1$ is given to agent $i - 1$. Consequently, at the new endowment vector $\tilde{\omega}$, individual $i = 0$ has one unit of good $j = 0$ and one unit of good $j = 1$, while all other agents $i$ have one unit of good $i + 1$. At the price vector $\bar{p}$, individual 0 has a budget set
$$c_0 + c_1 \le 2,$$
thus chooses $c_0 = c_1 = 1$. All other agents have budget sets given by
$$c_i + c_{i+1} \le 1,$$
thus each $i > 0$ is happy to consume one unit of good $i + 1$, which is within his budget set and gives as high utility as any other allocation within his budget set, establishing that $\tilde{x}$ is a competitive equilibrium.
9.2 Overlapping Generations and Overaccumulation
We now discuss the following canonical two-period overlapping generations economy.
9.2.1 Demographics, Preferences and Technology
In this economy, time is discrete and runs to infinity. Each individual lives for two periods. For example, all individuals born at time $t$ live for dates $t$ and $t + 1$. For now let us assume a general utility function for individuals born at date $t$, of the form
$$U(t) = u(c_1(t)) + \beta u(c_2(t+1)), \tag{9.1}$$
where $u(\cdot)$ satisfies the conditions in Assumption 9, $c_1(t)$ denotes the consumption of the individual born at time $t$ during the first period of his life (which is at date $t$), and $c_2(t+1)$ is his consumption during the second period of his life (at date $t + 1$). Also $\beta \in (0, 1)$ is the discount factor.
Individuals can only work in the first period of their lives, and supply one unit of labor inelastically, earning the equilibrium wage rate $w(t)$.
Let us also assume that there is population growth, so that total population is
$$L(t) = (1+n)^t L(0).$$
The production side of the economy is the same as before, characterized by a set of competitive firms, and represented by the standard constant returns to scale aggregate production function, satisfying Assumptions 1 and 2. Also assume that capital fully depreciates after being used. As a result, the gross rate of return to saving equals the rental rate of capital, i.e.,
$$1 + r(t) = R(t) = f'(k(t)), \tag{9.2}$$
where $f(k)$ is the standard per capita production function described above, and the wage rate is
$$w(t) = f(k(t)) - k(t) f'(k(t)). \tag{9.3}$$
9.2.2 Consumption Decisions
Let us start with the individual consumption decisions. Denoting savings by $s(t)$, these are given by the solution to the following maximization problem
$$\max_{c_1(t), c_2(t+1), s(t)} u(c_1(t)) + \beta u(c_2(t+1))$$
subject to
$$c_1(t) + s(t) \le w(t)$$
and
$$c_2(t+1) \le R(t+1) s(t),$$
where we are using the convention that old individuals rent their savings of time $t$ as capital to firms at time $t + 1$, so receive the gross rate of return $R(t+1)$. The second constraint incorporates the notion that individuals will only spend money on their own end of life consumption (there is no consumption term for descendants etc.). I have not imposed the constraint that $s(t) \ge 0$, since with negative savings, individuals would violate their second-period budget constraint (given non-negativity of consumption).
It is clear that both constraints will hold as equalities given that $u(\cdot)$ is strictly increasing. Then the first-order condition for a maximum implies:
$$u'(c_1(t)) = \beta R(t+1) u'(c_2(t+1)). \tag{9.4}$$
Solving these equations for consumption and thus for savings, we also obtain the following implicit solution for savings
$$s(t) = s(w(t), R(t+1)). \tag{9.5}$$
The function $s(\cdot, \cdot)$ is increasing in its first argument, and may be increasing or decreasing in its second argument.
9.2.3 Equilibrium
An equilibrium in this economy is an allocation in which firms maximize and consumers optimize. Therefore, the factor price sequence $\{R(t), w(t)\}_{t=0}^{\infty}$ is given by (9.2) and (9.3), while individual consumption and saving decisions are given by (9.4) and (9.5).
Consequently, given the full depreciation assumption, the law of motion of the capital stock is given by
$$K(t+1) = N(t) s(w(t), R(t+1)),$$
or in words, simply by the savings of the newly born at time $t$, who number $N(t)$. Writing everything in terms of per worker units, this implies
$$k(t+1) = \frac{s(w(t), R(t+1))}{1+n},$$
or substituting for $R(t+1)$ and $w(t)$ from (9.2) and (9.3), we obtain
$$k(t+1) = \frac{s\left( f(k(t)) - k(t) f'(k(t)), f'(k(t+1)) \right)}{1+n} \tag{9.6}$$
as the fundamental law of motion of the overlapping generations economy. A steady state is given by a solution to this equation such that $k(t+1) = k(t) = k^*$, i.e.,
$$k^* = \frac{s\left( f(k^*) - k^* f'(k^*), f'(k^*) \right)}{1+n}. \tag{9.7}$$
Since the savings function $s(\cdot, \cdot)$ can take essentially any form, the difference equation (9.6) can lead to quite complicated dynamics, and multiple steady states are possible. The next figure shows some potential plots of the equation (9.6), which can lead to a unique stable equilibrium, to multiple equilibria, or to an equilibrium with zero capital stock.
9.2.4 More Specific Utility Functions
To get more insights, let us now specialize the above setup by assuming CRRA utility functions, in particular:
$$U(t) = \frac{c_1(t)^{1-\theta} - 1}{1-\theta} + \beta \frac{c_2(t+1)^{1-\theta} - 1}{1-\theta}, \tag{9.8}$$
where $\theta > 0$ and $\beta \in (0, 1)$. Furthermore, assume that technology is Cobb-Douglas, so that
$$f(k) = k^{\alpha}.$$
Everything else is the same as above. This simplifies the first-order condition for consumer optimization and implies
$$\frac{c_2(t+1)}{c_1(t)} = (\beta R(t+1))^{1/\theta}.$$
You can recognize this expression as the discrete-time counterpart of the Euler equation from the Ramsey model, which was $\dot{c}/c = (r - \rho)/\theta$.
Alternatively, the first-order condition can be written as
$$s(t)^{-\theta} \beta R(t+1)^{1-\theta} = (w(t) - s(t))^{-\theta}. \tag{9.9}$$
Equation (9.9) implies that savings can be written as
$$s(t) = \frac{w(t)}{\psi(t+1)}, \tag{9.10}$$
where
$$\psi(t+1) \equiv \left[ 1 + \beta^{-1/\theta} R(t+1)^{-(1-\theta)/\theta} \right] > 1,$$
ensuring that savings are always less than earnings. The relationship between savings and factor prices is given by
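$$s_w \equiv \frac{\partial s(t)}{\partial w(t)} = \frac{1}{\psi(t+1)},$$
$$s_r \equiv \frac{\partial s(t)}{\partial R(t+1)} = \left( \frac{1-\theta}{\theta} \right) (\beta R(t+1))^{-1/\theta} \frac{s(t)}{\psi(t+1)}.$$
Note that $0 < s_w < 1$. Moreover, $s_r > 0$ if $\theta < 1$, but $s_r < 0$ if $\theta > 1$, and $s_r = 0$ if $\theta = 1$.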
The relationship between the rate of return on savings and the level of savings reflects the counteracting influences of income and substitution effects you are familiar with from basic micro. The case of $\theta = 1$, i.e., log preferences, is of special importance and is often used in many applied models. With log preferences, income and substitution effects exactly cancel each other, and thus changes in the interest rate (and therefore changes in the capital-labor ratio of the economy) have no effect on the savings rate.
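The sign pattern of the savings response can be verified numerically. The sketch below implements the savings rule $s = w/\psi$ from (9.10) and evaluates it at two interest factors; the parameter values are hypothetical and chosen only for illustration.

```python
def psi(R, beta, theta):
    """psi = 1 + beta**(-1/theta) * R**(-(1 - theta)/theta), as defined below (9.10)."""
    return 1.0 + beta ** (-1.0 / theta) * R ** (-(1.0 - theta) / theta)


def savings(w, R, beta, theta):
    """CRRA savings rule (9.10): s = w / psi."""
    return w / psi(R, beta, theta)


beta, w = 0.5, 1.0          # hypothetical discount factor and wage
R_low, R_high = 1.5, 2.5    # hypothetical gross returns

for theta in (0.5, 1.0, 2.0):
    s_low = savings(w, R_low, beta, theta)
    s_high = savings(w, R_high, beta, theta)
    print(f"theta = {theta}: s(R_low) = {s_low:.4f}, s(R_high) = {s_high:.4f}")
```

For $\theta < 1$ savings rise with the return, for $\theta > 1$ they fall, and for $\theta = 1$ they equal $\beta w/(1+\beta)$ regardless of the return.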
Now equation (9.6) implies
$$k(t+1) = \frac{s(t)}{1+n} = \frac{w(t)}{(1+n) \psi(t+1)}, \tag{9.11}$$
or more explicitly,
$$k(t+1) = \frac{f(k(t)) - k(t) f'(k(t))}{(1+n) \left[ 1 + \beta^{-1/\theta} f'(k(t+1))^{-(1-\theta)/\theta} \right]}. \tag{9.12}$$
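The steady state then involves a solution to the following implicit equation:
$$k^* = \frac{f(k^*) - k^* f'(k^*)}{(1+n) \left[ 1 + \beta^{-1/\theta} f'(k^*)^{-(1-\theta)/\theta} \right]}.$$
Now using the Cobb-Douglas formula, we have that the steady state is the solution to the equation
$$(1+n) \left[ 1 + \beta^{-1/\theta} \left( \alpha (k^*)^{\alpha-1} \right)^{-(1-\theta)/\theta} \right] = (1-\alpha)(k^*)^{\alpha-1}. \tag{9.13}$$
For simplicity, define $R^* \equiv \alpha (k^*)^{\alpha-1}$ as the marginal product of capital in steady state, in which case equation (9.13) can be rewritten as
$$(1+n) \left[ 1 + \beta^{-1/\theta} (R^*)^{-(1-\theta)/\theta} \right] = \frac{1-\alpha}{\alpha} R^*. \tag{9.14}$$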
The steady-state value of $R^*$ and thus $k^*$ can now be determined from equation (9.14), which always has a unique solution. We can next investigate the stability of this steady state. To do this, substitute for the Cobb-Douglas production function in (9.12):
$$k(t+1) = \frac{(1-\alpha) k(t)^{\alpha}}{(1+n) \left[ 1 + \beta^{-1/\theta} \left( \alpha k(t+1)^{\alpha-1} \right)^{-(1-\theta)/\theta} \right]}. \tag{9.15}$$
Now using (9.15), the following proposition can be proved (proof left for Problem Set 5):
Proposition 25 In the overlapping-generations model with two-period lived agents, Cobb-Douglas technology and CRRA preferences, there exists a unique steady-state equilibrium with the capital-labor ratio $k^*$ given by (9.13) and, as long as $\theta \ge 1$, this steady-state equilibrium is globally stable for all $k(0) > 0$.
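Because $k(t+1)$ appears on both sides of (9.15), the law of motion has to be solved numerically period by period. The following is a minimal sketch, not a proof of the proposition: it solves (9.15) for $k(t+1)$ by bisection and iterates from two different initial conditions. The parameter values, the bracket for the bisection and the function names are all illustrative assumptions.

```python
def next_k(k, alpha, beta, theta, n):
    """Solve (9.15) for k(t+1) given k(t) by bisection.

    (9.15) is rearranged as lhs(k') = (1 - alpha) * k**alpha, where
    lhs(k') = (1 + n) * psi(k') * k' is increasing in k', so the root is unique.
    """
    target = (1.0 - alpha) * k ** alpha

    def lhs(k_next):
        R_next = alpha * k_next ** (alpha - 1.0)
        psi = 1.0 + beta ** (-1.0 / theta) * R_next ** (-(1.0 - theta) / theta)
        return (1.0 + n) * psi * k_next

    lo, hi = 1e-12, 1.0
    while lhs(hi) < target:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(k0, periods, alpha, beta, theta, n):
    """Iterate the law of motion (9.15) starting from k(0) = k0."""
    path = [k0]
    for _ in range(periods):
        path.append(next_k(path[-1], alpha, beta, theta, n))
    return path


params = dict(alpha=0.3, beta=0.5, theta=2.0, n=0.2)  # hypothetical values
from_below = simulate(0.01, 100, **params)
from_above = simulate(5.0, 100, **params)
k_star = from_below[-1]
print(f"limit from below: {from_below[-1]:.6f}, limit from above: {from_above[-1]:.6f}")
```

Both paths settle at the same capital-labor ratio, which can then be checked against the steady-state condition (9.13).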
The next figure shows the dynamics diagrammatically in this particular (well-behaved) case, which look very similar to the dynamics of the basic Solow model:
Figure 9.1
9.2.5 Pareto Optimality
Let us now return to the general problem, and compare the overlapping-generations equilibrium to the choice of a social planner wishing to maximize a weighted average of all generations' utilities. In particular, the social planner in question maximizes
$$\sum_{t=0}^{\infty} \beta_S^t U(t),$$
where $\beta_S$ is the discount factor of the social planner. Substituting from (9.1), this implies:
$$\sum_{t=0}^{\infty} \beta_S^t \left( u(c_1(t)) + \beta u(c_2(t+1)) \right)$$
subject to the resource constraint
$$F(K(t), N(t)) = K(t+1) + N(t) c_1(t) + N(t-1) c_2(t),$$
which can alternatively be divided by $N(t)$ and written in per capita terms as
$$f(k(t)) = (1+n) k(t+1) + c_1(t) + \frac{c_2(t)}{1+n}.$$
This maximization problem immediately implies
$$u'(c_1(t)) = \beta f'(k(t+1)) u'(c_2(t+1)),$$
which is identical to (9.4) noting that $R(t+1) = f'(k(t+1))$. This is not surprising, since the social planner would allocate consumption of a given individual in exactly the same way as the individual himself would do.
However, the social planner's and the competitive economy's allocations across individuals will differ, since the social planner is giving different weights to different generations as captured by the parameter $\beta_S$. In particular, it can be shown that the socially planned economy will converge to a steady state with capital-labor ratio $k^S$ such that
$$\beta_S f'(k^S) = 1 + n,$$
which is similar to the modified golden rule we saw in the context of the Ramsey growth model. In particular, it does not depend on preferences (the utility function $u(\cdot)$) and does not even depend on the individual rate of time preference, $\beta$. Clearly, $k^S$ is typically different from the steady-state value of the competitive economy, $k^*$, given by (9.7), which is not surprising given the different preferences that are being maximized.
More interesting is the question of whether the competitive equilibrium is Pareto optimal. The example from Shell in the previous section suggests that it may not be. In particular, exactly as in Shell's example, we cannot use the First Welfare Theorem (Theorem 10) because of the infinite number of commodities.
In fact, the competitive equilibrium is not in general Pareto optimal. The simplest way of seeing this is that the steady-state level of the capital stock, $k^*$, given by (9.7), can be so high that it is in fact greater than $k_{gold}$, that is, the economy is to the right of the golden rule, so that by reducing savings, consumption can increase for every generation.
More specifically, note that in steady state we have
$$f(k^*) - (1+n) k^* = c_1^* + (1+n)^{-1} c_2^* \equiv c^*,$$
where the first equality follows from the accounting identity, and the second defines $c^*$ as the total steady-state consumption. Therefore
$$\frac{\partial c^*}{\partial k^*} = f'(k^*) - (1+n)$$
and $k_{gold}$ is defined as
$$f'(k_{gold}) = 1 + n.$$
Now if $k^* > k_{gold}$, then $\partial c^*/\partial k^* < 0$, so reducing savings can increase (total) consumption for everybody. If this is the case, the economy is referred to as dynamically inefficient. Another way of expressing dynamic inefficiency is that
$$r^* < n,$$
that is, the steady-state interest rate $r^* = R^* - 1$ is less than the rate of population growth. Recall that in the infinite-horizon Ramsey economy, the transversality condition (which follows from individual optimization) required that $r > g + n$; therefore, dynamic inefficiency could never arise in this Ramsey economy. Dynamic inefficiency arises because of the heterogeneity inherent in the overlapping generations model, which removes the transversality condition.
In particular, suppose we start from steady state at time $T$ with $k^* > k_{gold}$. Consider the following variation where the capital stock for next period is reduced by a small amount, i.e., changed by $-\Delta k$, where $\Delta k > 0$, and we move immediately to a new steady state. Then we have
$$\Delta c_T = (1+n) \Delta k > 0,$$
$$\Delta c_t = -\left( f'(k^* - \Delta k) - (1+n) \right) \Delta k \quad \text{for all } t > T.$$
Since $k^* > k_{gold}$, for small enough $\Delta k$, $f'(k^* - \Delta k) - (1+n) < 0$, thus $\Delta c_t > 0$ for all $t \ge T$. The increase in consumption for each generation can be allocated equally during the two periods of their lives, thus necessarily increasing their utility (by the assumption that $u(\cdot)$ is strictly increasing from Assumption 1). This variation clearly creates a Pareto improvement in which all generations are better off. This establishes:
Theorem 39 In the overlapping-generations economy the competitive equilibrium is not necessarily Pareto optimal. More specifically, whenever $r^* < n$ and the economy is dynamically inefficient, it is possible to reduce the capital stock starting from the competitive steady state and increase the consumption level of all generations.
As the above derivation makes clear, possible lack of Pareto efficiency in the competitive equilibrium is intimately linked with dynamic inefficiency. Dynamic inefficiency, that is, a rate of interest less than the rate of population growth, is not a theoretical curiosum. In Problem Set 5, you will be working through a numerical example that will illustrate under what conditions dynamic inefficiency can happen.
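As a rough illustration of when $r^* < n$ can occur, consider the special case of log preferences ($\theta = 1$) and Cobb-Douglas technology. In that case (9.10) gives $s(t) = \beta w(t)/(1+\beta)$, so (9.6) becomes $k(t+1) = \beta (1-\alpha) k(t)^{\alpha} / ((1+n)(1+\beta))$ and the steady state has a closed form. The sketch below uses this special case; the parameter values and function names are hypothetical.

```python
def steady_state(alpha, beta, n):
    """Steady state (k*, R*) of the log-utility, Cobb-Douglas OLG economy."""
    k_star = (beta * (1 - alpha) / ((1 + n) * (1 + beta))) ** (1 / (1 - alpha))
    R_star = alpha * k_star ** (alpha - 1)
    return k_star, R_star


def golden_rule(alpha, n):
    """k_gold solves f'(k) = 1 + n with f(k) = k**alpha."""
    return (alpha / (1 + n)) ** (1 / (1 - alpha))


def total_consumption(k, alpha, n):
    """Steady-state total consumption c = f(k) - (1 + n) * k."""
    return k ** alpha - (1 + n) * k


alpha, beta, n = 0.2, 0.9, 0.2  # hypothetical values with a low capital share
k_star, R_star = steady_state(alpha, beta, n)
k_gold = golden_rule(alpha, n)
inefficient = (R_star - 1) < n

# Steady-state consumption gain from holding 1% less capital than k*.
gain = total_consumption(0.99 * k_star, alpha, n) - total_consumption(k_star, alpha, n)
print(f"k* = {k_star:.4f}, k_gold = {k_gold:.4f}, r* = {R_star - 1:.4f}, gain = {gain:.6f}")
```

With these values the economy overaccumulates ($k^* > k_{gold}$, $r^* < n$) and a small reduction in the capital stock raises steady-state consumption. In this special case the condition $r^* < n$ reduces to $\alpha (1+\beta) < \beta (1-\alpha)$, so a low capital share combined with patient households is what produces dynamic inefficiency.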
9.3 Role of Social Security in Capital Accumulation
We now briefly discuss how Social Security can be introduced as a way of dealing with overaccumulation in the overlapping-generations model. We first consider a fully-funded system, in which the young make contributions to the Social Security system and their contributions are paid back to them in their old age. The alternative is an unfunded system or a pay-as-you-go Social Security system, where transfers from the young directly go to the current old.
9.3.1 Fully Funded Social Security
In a fully funded system, the government at date $t$ raises some amount $d(t)$ from the young (by compulsory contributions to their Social Security accounts etc.), invests this in the only productive asset of the economy, the capital stock, and pays the workers when they are old an amount $R(t+1) d(t)$. More specifically, we now have the individual maximization problem as
$$\max_{c_1(t), c_2(t+1), s(t)} u(c_1(t)) + \beta u(c_2(t+1))$$
subject to
$$c_1(t) + s(t) + d(t) \le w(t)$$
and
$$c_2(t+1) \le R(t+1) (s(t) + d(t)),$$
for a given choice of $d(t)$ by the government. Notice that now the total amount invested in capital accumulation is $s(t) + d(t) = (1+n) k(t+1)$.
It is also no longer the case that individuals will always choose $s(t) > 0$, since they have the income from Social Security. Therefore this economy can be analyzed under two alternative assumptions, with the constraint that $s(t) \ge 0$ and without.
It is clear that as long as $s(t)$ is free, whatever the sequence of Social Security payments $\{d(t)\}_{t=0}^{\infty}$ (as long as it is feasible), the same competitive equilibrium as without Social Security applies. When $s(t) \ge 0$ is imposed as a constraint, then this competitive equilibrium applies if, given the sequence $\{d(t)\}_{t=0}^{\infty}$, the privately-optimal saving sequence $\{s(t)\}_{t=0}^{\infty}$ is such that $s(t) > 0$ for all $t$. This discussion immediately establishes:
Proposition 26 Consider a fully funded social security system in the above-described environment whereby the government collects $d(t)$ from young individuals at date $t$.
1. Suppose that $s(t) \ge 0$ for all $t$. If, given the feasible sequence $\{d(t)\}_{t=0}^{\infty}$ of Social Security payments, the utility-maximizing sequence of savings $\{s(t)\}_{t=0}^{\infty}$ is such that $s(t) > 0$ for all $t$, then the set of competitive equilibria without Social Security coincides with the set of competitive equilibria with Social Security.
2. Without the constraint $s(t) \ge 0$, given any feasible sequence $\{d(t)\}_{t=0}^{\infty}$ of Social Security payments, the set of competitive equilibria without Social Security coincides with the set of competitive equilibria with Social Security.
This is very intuitive: the $d(t)$ taken out by the government is fully offset by a decrease in $s(t)$ as long as individuals were saving enough (or always when there are no constraints to force positive savings privately).
9.3.2 Unfunded Social Security
The situation is very different with unfunded Social Security. Now the government collects $d(t)$ from the young at time $t$ and distributes this to the current old with per capita transfer $b(t) = (1+n) d(t)$ (which takes into account that there are more young than old because of population growth). Therefore, the individual maximization problem becomes
$$\max_{c_1(t), c_2(t+1), s(t)} u(c_1(t)) + \beta u(c_2(t+1))$$
subject to
$$c_1(t) + s(t) + d(t) \le w(t)$$
and
$$c_2(t+1) \le R(t+1) s(t) + (1+n) d(t+1),$$
for a given feasible sequence of Social Security payment levels $\{d(t)\}_{t=0}^{\infty}$.
What this implies is that the rate of return on Social Security payments is $1 + n$ rather than $R(t+1)$, because unfunded Social Security is a pure transfer system. Only $s(t)$ goes into capital accumulation. Therefore, intuitively we expect unfunded Social Security to reduce capital accumulation, and in economies with dynamic inefficiencies, this may be a good thing. This leads to the following proposition (proof left for Problem Set 5).
Proposition 27 Consider the above-described overlapping generations economy and suppose that there is dynamic inefficiency in the decentralized competitive equilibrium. Then there exists a feasible sequence of unfunded Social Security payments $\{d(t)\}_{t=0}^{\infty}$ which will lead to a competitive equilibrium starting from any date $t$ that Pareto dominates the competitive equilibrium without Social Security.
Intuitively, unfunded Social Security reduces the overaccumulation and improves the allocation of resources. In many ways, this is equivalent to commodities being transferred from high indexed agents to low indexed agents in the Shell example above.
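The different effects of the two systems on capital accumulation can be seen in a simple partial-equilibrium calculation. The sketch below assumes log preferences, a contribution $d$ that is constant over time, and given factor prices $w$ and $R$ (so it ignores the feedback from savings to prices). The closed-form savings rules follow from the first-order conditions of the two maximization problems under these assumptions; all names and numbers are illustrative.

```python
def saving_funded(w, d, beta):
    """Private saving s under a fully funded system with log utility.

    The individual chooses total saving s + d = beta / (1 + beta) * w,
    so s falls one-for-one with d.
    """
    return beta / (1 + beta) * w - d


def saving_unfunded(w, R, d, beta, n):
    """Private saving s under pay-as-you-go with a constant contribution d.

    Solves the first-order condition of
    max log(w - d - s) + beta * log(R * s + (1 + n) * d).
    """
    return (beta * (w - d) - (1 + n) * d / R) / (1 + beta)


w, R, beta, n, d = 1.0, 2.0, 0.5, 0.2, 0.05  # hypothetical values

capital_no_ss = saving_funded(w, 0.0, beta)         # no Social Security
capital_funded = saving_funded(w, d, beta) + d      # s + d is invested in capital
capital_unfunded = saving_unfunded(w, R, d, beta, n)  # only s is invested
print(capital_no_ss, capital_funded, capital_unfunded)
```

Under the funded system total investment is unchanged, while under the unfunded system investment per young worker falls, which is the channel through which pay-as-you-go Social Security can offset overaccumulation.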
Chapter 10
Recitation Material: Stochastic Growth
10.1 The Brock-Mirman Model
The classic analysis of economic growth with stochastic shocks was undertaken by Brock and Mirman in their 1972 paper. This was done in the context of optimal growth. However, if the economy admits a representative household, it turns out that despite the stochastic shocks, the First and Second Welfare Theorems still hold, so equilibrium growth is the same as optimal growth. In fact, the Brock-Mirman model is the starting point of the Real Business Cycle models you will study later. For now, it suffices to note that this model, for all practical purposes, is identical to the non-stochastic model, except that we have to think of expectations. In particular, it is a solution to the following program:
$$\max_{\{c(t), k(t)\}_{t=0}^{\infty}} E_0 \sum_{t=0}^{\infty} \beta^t u(c(t)) \tag{10.1}$$
subject to
$$k(t+1) = A(t) f(k(t)) + (1-\delta) k(t) - c(t) \quad \text{and} \quad k(t) \ge 0, \tag{10.2}$$
with given $k(0)$. Here $E_0$ is the expectations operator conditional on information available at time $t = 0$. The budget constraint with the production function substituted in, equation (10.2), requires some care in interpreting. $A(t)$ is now introduced as a stochastic productivity term. The expectations are taken because the time path of the sequence $\{A(t)\}_{t=0}^{\infty}$ is not known in advance. This implies that strategies have to satisfy the proper measurability conditions. In particular, in general we can do this by assuming that information at time $t$ is represented by a partition $\mathcal{F}_t$, so that $E_t[x] = E[x \mid \mathcal{F}_t]$, and variables chosen at time $t$ have to be measurable with respect to $\mathcal{F}_t$. This simply means that they cannot be conditioned on realizations of future-dated stochastic variables.
The above model can be enriched by assuming that there are stochastic preference shocks,
for example by augmenting the utility function u (c (t)) by a shock b (t), so that u (c (t) | b (t))
is also a random function dependent on the realization of b (t).
In addition, analysis of growth under uncertainty makes the standard assumptions on
the production function as in Assumptions 1 and 2 above, and the standard assumption on
preferences as in Assumption 8.
Given this setup, the problem can again be written as a dynamic programming problem, but now it is a stochastic dynamic programming problem. In particular, it takes the form
$$V(k) = \max_{c(k)} \left\{ u(c) + \beta E V\left[ A f(k) + (1-\delta) k - c \right] \right\}, \tag{10.3}$$
where the expectation is included because there is uncertainty about future values of the stochastic variable $A$. The rest of the analysis is very similar to the non-stochastic case, except that the Euler equations also include expectations. For example, assuming that $A(t)$ is known at time $t$, the key Euler equation becomes:
$$u'(c(t)) = \beta E_t\left[ \left( A(t+1) f'(k(t+1)) + (1-\delta) \right) u'(c(t+1)) \right].$$
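The stochastic Euler equation can be checked in the one textbook special case that has a closed-form solution: log utility, Cobb-Douglas production $f(k) = k^{\alpha}$ and full depreciation ($\delta = 1$). These three assumptions are not imposed in the text above; under them the optimal policy is $c(t) = (1 - \alpha\beta) A(t) k(t)^{\alpha}$ and $k(t+1) = \alpha\beta A(t) k(t)^{\alpha}$. The sketch below simulates this policy with lognormal shocks (also an assumption) and evaluates the term inside the expectation of the Euler equation.

```python
import random


def euler_residuals(alpha, beta, k0, periods, seed=0):
    """Simulate the closed-form policy and return Euler equation residuals.

    Assumes u(c) = log(c), f(k) = k**alpha and delta = 1, so the Euler
    equation reads 1/c(t) = beta * E_t[A(t+1) * alpha * k(t+1)**(alpha-1) / c(t+1)].
    """
    rng = random.Random(seed)
    k = k0
    A = rng.lognormvariate(0.0, 0.1)  # hypothetical shock distribution
    residuals = []
    for _ in range(periods):
        output = A * k ** alpha
        c = (1 - alpha * beta) * output
        k_next = alpha * beta * output
        A_next = rng.lognormvariate(0.0, 0.1)
        c_next = (1 - alpha * beta) * A_next * k_next ** alpha
        # Realized value of the term inside E_t[.], multiplied by c(t).
        realized = beta * A_next * alpha * k_next ** (alpha - 1) / c_next * c
        residuals.append(realized - 1.0)
        k, A = k_next, A_next
    return residuals


residuals = euler_residuals(alpha=0.3, beta=0.95, k0=0.2, periods=500)
print(max(abs(x) for x in residuals))
```

In this special case the term inside the expectation equals $1/c(t)$ for every realization of $A(t+1)$, so the residuals are zero up to rounding error and the Euler equation holds state by state, not only on average. In general only the conditional expectation is pinned down.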
10.2 Application: Risk, Diversification and Growth
I now present the model from Acemoglu-Zilibotti (JPE 1997) aimed at capturing the interaction between diversification of risks and capital accumulation, and emphasizing the endogenous generation of risks in the growth process. This model will give an example of stochastic growth and also illustrate how the productivity of capital can change endogenously over the development process, and differ across countries. Finally, this model will also introduce some tools that are useful for analyses of dynamic stochastic economies.
10.2.1 The Environment
Consider the following model. There is a continuum of equally likely states represented by the unit interval. Agents have to invest their savings in intermediate sectors, which will then pay off in the form of capital in the next period. Intermediate sector $j \in [0, 1]$ pays a positive return only in state $j$ and nothing in any other state.
This formulation implies that investing in a sector is equivalent to buying a Basic Arrow Security that only pays in one state of nature. More formally, an investment of $F^j$ in sector $j$ generates capital of the amount $R F^j$ if state $j$ occurs and $F^j \ge M_j$, and nothing otherwise. There is also a safe project, which transforms one unit of savings into $r < R$ units of capital.
The requirement $F^j \ge M_j$ implies that all intermediate sectors have linear technologies but some require a certain minimum size, $M_j$, before being productive. The distribution of minimum size requirements is given by:
$$M_j = \max\left\{ 0, \frac{D}{1-\gamma} (j - \gamma) \right\}.$$
Sectors $j \le \gamma$ have no minimum size requirement, and for the rest of the sectors, the minimum size requirement increases linearly. The next figure shows the minimum size requirements diagrammatically, and will also be used for determining the equilibrium once demand for assets is introduced:
There are two important features here:

1. risky investments have a higher expected return than the safe asset (i.e., $R > r$);
2. different projects are imperfectly correlated so that there is safety in variety.
A convenient implication of this formulation is that if a portfolio consists of an equi-proportional investment $F$ in all projects $j \in \bar{J} \subseteq [0, 1]$, and the measure of the set $\bar{J}$ is $p$, then the portfolio pays the return $RF$ with probability $p$, and nothing with probability $1 - p$.

These features imply that if the aggregate production set were convex (i.e., $D = 0$), all agents would invest an equal amount in all intermediate goods sectors and diversify all risks. However, in the presence of nonconvexities, as captured by the minimum size requirements, there is a trade-off between insurance and high productivity.
The preferences of consumers over final goods are defined as:
$$E_t U(c_t, c_{t+1}) = \log(c_t) + \beta \int_0^1 \log(c_{t+1}^j) \, dj, \tag{10.4}$$
which again ensure a constant savings rate. Note that integration over $[0, 1]$ is over the states of nature. The individual life cycle and decisions are summarized in the next figure:
Output of the final good sector is given by:
$$Y_t = A K_t^{\alpha} L_t^{1-\alpha}, \tag{10.5}$$
and let us normalize labor to 1.

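The aggregate capital stock depends on the realization of the state of nature. If the state of nature is $j$, then
$$K_{t+1}^j = \int_{\Omega_t} \left( r \phi_{h,t} + R F_{h,t}^j \right) dh,$$
where $F_{h,t}^j$ is the amount of savings invested by agent $h \in \Omega_t$ in sector $j$, $\phi_{h,t}$ is the amount invested in the safe asset, and $\Omega_t$ is the set of young agents at time $t$. Since both labor and capital trade in competitive markets, equilibrium factor prices in state $j$ are given as:
$$W_{t+1}^j = (1-\alpha) A \left( K_{t+1}^j \right)^{\alpha} = (1-\alpha) A \left( \int_{\Omega_t} \left( r \phi_{h,t} + R F_{h,t}^j \right) dh \right)^{\alpha}, \tag{10.6}$$
$$\rho_{t+1}^j = \alpha A \left( K_{t+1}^j \right)^{\alpha-1} = \alpha A \left( \int_{\Omega_t} \left( r \phi_{h,t} + R F_{h,t}^j \right) dh \right)^{\alpha-1}. \tag{10.7}$$
10.2.2 Equilibrium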
Now consider the portfolio decisions of households. Each household takes the set of traded securities as given, and maximizes its utility by allocating its savings across different assets. Securities are labeled by the indices of the project to which they are attached. Therefore, one unit of security $j$ entitles its holder to $R$ units of $t + 1$ capital in state of nature $j$. Denote the unit price of security $j$ (in terms of savings of time $t$) by $P_{j,t}$. Assume that the intermediates are supplied by financial intermediaries. Since 1 unit of savings invested in a project that is open yields one unit of capital, competition among financial intermediaries ensures that in equilibrium $P_{j,t} = 1$, that is, all projects will be offered to households at marginal cost.
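Therefore, denoting the set of open projects at time $t$ by $J_t$, optimal consumption, savings and portfolio decisions can be characterized by:
$$\max_{s_t, \phi_t, \{F_t^j\}_{0 \le j \le 1}} \log(c_t) + \beta \int_0^1 \log(c_{t+1}^j) \, dj, \tag{10.8}$$
subject to:
$$\phi_t + \int_0^1 F_t^j \, dj = s_t, \tag{10.9}$$
$$c_{t+1}^j = \rho_{t+1}^j \left( r \phi_t + R F_t^j \right), \tag{10.10}$$
$$F_t^j = 0, \quad \forall j \notin J_t, \tag{10.11}$$
$$c_t + s_t \le w_t. \tag{10.12}$$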
It is important that these agents take not only $w_t$ and $\rho_{t+1}^j$, but also the set of risky assets $J_t$ as given.
A static equilibrium given wage earnings of young agents, $W_t$ (or given $K_t$), is a solution to the maximization problem (10.8) subject to (10.9)-(10.12), such that $F_t^j \ge M_j$ for all open sectors. A dynamic equilibrium is a sequence of static equilibria linked to each other through (10.6).
Because preferences are logarithmic, the following saving rule is obtained irrespective of the risk-return trade-off:
$$s_t^* \equiv s^*(w_t) = \frac{\beta}{1+\beta} w_t. \tag{10.13}$$
Given this result, a household's optimization problem can be broken into two parts: first, the amount of savings is determined, and then an optimal portfolio is chosen.
Next observe that in equilibrium we will have
1. $F_t^j = F_t^{j'}$ for all $j, j'$ that are open (i.e., $j, j' \in J_t$). Since each individual is facing the same price for all of the traded symmetric Arrow securities, he would want to purchase an equal amount of each, i.e., a balanced portfolio.
2. The set of open projects will be $J_t = [0, n_t]$ for some $n_t \in [0, 1]$. This states that when only a subset of projects can be opened in equilibrium, small projects are opened before large projects. As a result, if a sector $j^*$ is open, all sectors $j \le j^*$ must also be open.
Given this result, the maximization problem simplifies to:
$$\max_{\phi_t, F_t} \; n_t \log\left[ \rho_{t+1}^{(q_G)} \left( R F_t + r \phi_t \right) \right] + (1 - n_t) \log\left[ \rho_{t+1}^{(q_B)} \left( r \phi_t \right) \right], \tag{10.14}$$
subject to:
$$\phi_t + n_t F_t = s_t^*, \tag{10.15}$$
where $n_t$ and the $\rho_{t+1}^j$'s are taken as parametric by the agent, and $s_t^*$ is given by (10.13). $\rho_{t+1}^{(q_B)} = \alpha A (r \phi_t)^{\alpha-1}$ is the marginal product of capital in the bad state, when the realized state is $j > n_t$ and no risky investment pays off. $\rho_{t+1}^{(q_G)} = \alpha A (R F_t + r \phi_t)^{\alpha-1}$ applies in the good state, i.e., when the realized state is $j \le n_t$.

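Maximization of (10.14) gives:
$$\phi_t^* = \frac{(1 - n_t) R}{R - r n_t}\, s_t^*, \tag{10.16}$$
$$F_t^{j,*} = \begin{cases} F_t^* \equiv \dfrac{R - r}{R - r n_t}\, s_t^*, & j \le n_t \\ 0, & j > n_t. \end{cases} \tag{10.17}$$
Then let:
$$n_t^*(K_t) = \begin{cases} \dfrac{(R + r\gamma) - \left\{(R + r\gamma)^2 - 4r\left[(R - r)(1 - \gamma)\frac{\Gamma}{D} K_t^{\alpha} + \gamma R\right]\right\}^{1/2}}{2r} & \text{if } K_t \le D^{1/\alpha}\,\Gamma^{-1/\alpha} \\ 1 & \text{if } K_t > D^{1/\alpha}\,\Gamma^{-1/\alpha}, \end{cases} \tag{10.18}$$
where $\Gamma \equiv A(1 - \alpha)\frac{\beta}{1+\beta}$. Then there exists a unique equilibrium such that $s_t^* = \frac{\beta}{1+\beta}(1 - \alpha) A K_t^{\alpha}$,
and $\phi_t^*$, $F_t^{j,*}$ are given by (10.16) and (10.17) with $n_t = n_t^*(K_t)$.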
This equilibrium can be expressed as the intersection of the aggregate demand of each
risky asset, $F^*(n_t)$, with the thick curve that traces minimum size requirements in the figure.
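The equilibrium objects in (10.16)-(10.18) can be evaluated numerically. The following is a minimal sketch rather than part of the original notes: the function names and parameter values are hypothetical, the minimum-size schedule is assumed to take the linear form $M_j = \max\{0, D(j - \gamma)/(1 - \gamma)\}$ that is consistent with (10.18), and it assumes $R \ge (2 - \gamma) r$ so that the smaller root of the quadratic is the relevant one.

```python
import math

# Hypothetical parameter values, chosen only for illustration (R >= (2 - gamma) * r).
PARAMS = dict(A=1.0, alpha=0.35, beta=0.9, gamma=0.25, D=1.0, R=2.0, r=1.0)


def savings(K, p):
    """Savings of the young: s* = Gamma * K**alpha, Gamma = A(1-alpha)beta/(1+beta)."""
    Gamma = p["A"] * (1 - p["alpha"]) * p["beta"] / (1 + p["beta"])
    return Gamma * K ** p["alpha"]


def n_star(K, p):
    """Share of open sectors from (10.18)."""
    s = savings(K, p)
    if s >= p["D"]:  # same as K >= (D / Gamma)**(1 / alpha): every sector can open
        return 1.0
    R, r, g, D = p["R"], p["r"], p["gamma"], p["D"]
    b = R + r * g
    disc = b * b - 4 * r * ((R - r) * (1 - g) * s / D + g * R)
    return (b - math.sqrt(disc)) / (2 * r)


def portfolio(K, p):
    """Return (n*, phi*, F*): open share, safe holding (10.16), risky holding per open sector (10.17)."""
    s, n = savings(K, p), n_star(K, p)
    R, r = p["R"], p["r"]
    phi = (1 - n) * R / (R - r * n) * s
    F = (R - r) / (R - r * n) * s
    return n, phi, F


if __name__ == "__main__":
    n, phi, F = portfolio(1.0, PARAMS)
    print(f"n* = {n:.4f}, phi* = {phi:.4f}, F* = {F:.4f}")
```

Under these assumptions the holdings exhaust savings ($\phi^* + n^* F^* = s^*$), and the demand for each risky asset equals the assumed minimum size of the marginal open sector, which is the intersection described in the text.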
10.2.3
Dynamics
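Next, it is straightforward to characterize the full stochastic equilibrium process. The equilibrium law of motion of $K_t$ is:
$$K_{t+1} = \begin{cases} \dfrac{r(1 - n_t^*)}{R - r n_t^*}\, R\,\Gamma K_t^{\alpha} & \text{with prob. } 1 - n_t^* \\ R\,\Gamma K_t^{\alpha} & \text{with prob. } n_t^*, \end{cases} \tag{10.19}$$
where $n_t^* = n^*(K_t)$ is given by equation (10.18).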
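The law of motion (10.19) is easy to simulate. The sketch below is illustrative only: the names, the seed handling and the parameter values are hypothetical, $\Gamma = A(1-\alpha)\beta/(1+\beta)$ is the savings constant from (10.18), and $R \ge (2 - \gamma) r$ is assumed so that the smaller root in (10.18) applies.

```python
import math
import random

# Hypothetical parameter values for illustration only.
P = dict(A=1.0, alpha=0.35, beta=0.9, gamma=0.25, D=0.2, R=2.0, r=1.0)


def gamma_const(p):
    """Gamma = A(1 - alpha) beta / (1 + beta), so that savings are Gamma * K**alpha."""
    return p["A"] * (1 - p["alpha"]) * p["beta"] / (1 + p["beta"])


def n_star(K, p):
    """Share of open sectors, equation (10.18)."""
    s = gamma_const(p) * K ** p["alpha"]
    if s >= p["D"]:
        return 1.0
    b = p["R"] + p["r"] * p["gamma"]
    inner = (p["R"] - p["r"]) * (1 - p["gamma"]) * s / p["D"] + p["gamma"] * p["R"]
    return (b - math.sqrt(b * b - 4 * p["r"] * inner)) / (2 * p["r"])


def next_capital(K, p, good_draw):
    """One step of (10.19): next-period capital after a good or a bad draw."""
    n = n_star(K, p)
    good = p["R"] * gamma_const(p) * K ** p["alpha"]
    if good_draw:
        return good
    return p["r"] * (1 - n) / (p["R"] - p["r"] * n) * good


def simulate(K0, T, p, seed=0):
    """Simulate T periods; the good state occurs with probability n*(K_t)."""
    rng = random.Random(seed)
    path = [K0]
    for _ in range(T):
        K = path[-1]
        path.append(next_capital(K, p, rng.random() < n_star(K, p)))
    return path


if __name__ == "__main__":
    print([round(k, 4) for k in simulate(0.01, 10, P)])
```

With these hypothetical numbers, once savings exceed $D$ all sectors are open, the good branch occurs with probability one, and the simulated path converges deterministically to $(R\Gamma)^{1/(1-\alpha)}$.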

The capital stock follows a Markov process in which the level of capital next period
depends on whether the economy is lucky in the current period (which happens when the
risky investments pay off, with probability $n_t^*$).
Moreover, the probability of this event changes over time. As the economy develops, it
can afford to open more sectors, and the probability of transferring a large capital stock to
the next period, $n_t^*$, increases.
Also from (10.19), the expected productivity of an economy depends on its level of
development and diversification. To see this, define expected total factor productivity
(conditional on the proportion of sectors open) by
$$\sigma^e(n^*(K_t)) = (1 - n^*)\,\frac{r(1 - n^*)}{R - r n^*}\, R + n^* R. \tag{10.20}$$
Simple differentiation establishes that as $n_t^*$ increases, this measure also increases.
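The differentiation step can be written out explicitly. The following is a short worked derivation, writing $\sigma^e$ for the expected-productivity measure in (10.20) and $n$ for $n^*(K_t)$; it uses only $R > r > 0$ and $n \in [0, 1]$.

```latex
\begin{align*}
\sigma^e(n) &= R\left[\frac{r(1-n)^2}{R - rn} + n\right], \\
\frac{d\sigma^e}{dn}
  &= R\left[1 - \frac{2r(1-n)}{R - rn} + \frac{r^2(1-n)^2}{(R - rn)^2}\right]
   = R\left[1 - \frac{r(1-n)}{R - rn}\right]^2
   = \frac{R\,(R - r)^2}{(R - rn)^2} > 0 .
\end{align*}
```

In particular $\sigma^e(0) = r$ and $\sigma^e(1) = R$, so expected productivity rises from the safe return to the risky return as diversification becomes complete.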

To formalize the dynamics of development, define the following concepts:
(i) QSSB: quasi steady state of an economy which always has unlucky draws. An economy would converge to this quasi steady state if it follows the optimal investments
characterized above but the sectors invested in never pay off due to bad luck.
(ii) QSSG: quasi steady state of an economy which always receives good news.
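The capital stocks of these two quasi steady states are:
$$K^{QSSB} = \left[\frac{r\left(1 - n^*(K^{QSSB})\right)}{R - r\, n^*(K^{QSSB})}\, R\,\Gamma\right]^{\frac{1}{1-\alpha}} \quad \text{and} \quad K^{QSSG} = (R\,\Gamma)^{\frac{1}{1-\alpha}}. \tag{10.21}$$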
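Because $n^*$ itself depends on $K^{QSSB}$, the first expression in (10.21) is a fixed-point condition rather than a closed form. One way to solve it, sketched below under hypothetical parameter values and names, is bisection on the bad-draw branch of (10.19); the sketch again assumes $\Gamma = A(1-\alpha)\beta/(1+\beta)$ and $R \ge (2 - \gamma) r$.

```python
import math

# Hypothetical parameter values for illustration only.
P = dict(A=1.0, alpha=0.35, beta=0.9, gamma=0.25, D=1.0, R=2.0, r=1.0)


def gamma_const(p):
    return p["A"] * (1 - p["alpha"]) * p["beta"] / (1 + p["beta"])


def n_star(K, p):
    """Share of open sectors, equation (10.18)."""
    s = gamma_const(p) * K ** p["alpha"]
    if s >= p["D"]:
        return 1.0
    b = p["R"] + p["r"] * p["gamma"]
    inner = (p["R"] - p["r"]) * (1 - p["gamma"]) * s / p["D"] + p["gamma"] * p["R"]
    return (b - math.sqrt(b * b - 4 * p["r"] * inner)) / (2 * p["r"])


def bad_draw_map(K, p):
    """Next-period capital when the risky investments do not pay off."""
    n = n_star(K, p)
    return p["r"] * (1 - n) / (p["R"] - p["r"] * n) * p["R"] * gamma_const(p) * K ** p["alpha"]


def k_qssg(p):
    """Good quasi steady state: fixed point of K -> R * Gamma * K**alpha."""
    return (p["R"] * gamma_const(p)) ** (1 / (1 - p["alpha"]))


def k_qssb(p, tol=1e-12):
    """Bad quasi steady state: fixed point of the bad-draw map, found by bisection."""
    lo, hi = 1e-12, k_qssg(p)  # the map is above the 45-degree line at lo, below it at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bad_draw_map(mid, p) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"K_QSSB = {k_qssb(P):.4f}, K_QSSG = {k_qssg(P):.4f}")
```

Bisection is safe here because the ratio of the bad-draw map to $K$ is strictly decreasing in $K$: both $K^{\alpha-1}$ and the factor $r(1-n^*)/(R - rn^*)$ fall as $K$ rises.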
If uncertainty could be completely removed, that is $n^*(K^{QSSG}) = 1$, then there would
never be bad news, and the good quasi steady state would be a real steady state: a point, if
reached, from which the economy would never depart.
From equations (10.18) and (10.21), the condition for this steady state to exist is that
the saving level corresponding to $K^{QSSG}$ be sufficient to ensure a balanced portfolio of
investments, of at least $D$, in all the intermediate sectors. Thus, if:
$$D < \Gamma^{\frac{1}{1-\alpha}} R^{\frac{\alpha}{1-\alpha}}, \tag{10.22}$$
a steady state will exist and we denote it by $K^{SS}$.
The following figure is useful in understanding the dynamics.
At very low levels of capital, the Inada conditions of the production function guarantee
positive growth even conditional on bad news (both curves lie above the 45° line). Then,
there is a range (region II) in which growth only occurs conditional on good draws (the bad
draws curve is below the 45° line).
Regions I and II are separated by $K^{QSSB}$. When they are below this level, all economies
will grow towards it. When they are above this level, their output will fall in case they receive
bad shocks, and the probability of bad news is very high when the economy has a level of
capital stock just above $K^{QSSB}$.
As good news is received, the capital stock will grow and the probability of a further
lucky draw will increase. Note that even when it grows, the economy is still exposed to large
undiversified risks, and will typically experience some set-backs.
Finally, provided (10.22) is satisfied, the economy will eventually enter region III where
all diversifiable risks will be removed (since all sectors are open and an equal amount is
invested in all sectors), and there will be deterministic convergence to $K^{SS}$.
10.2.4
Efficiency
Since all agents are price takers, it may be conjectured that the decentralized equilibrium
here is efficient. This turns out not to be the case. To illustrate this, consider the portfolio
allocation that a social planner maximizing the welfare of the current generation of savers
would choose taking the amount of savings as given.
The difference between the social planner's allocation and a decentralized equilibrium is
that the social planner explicitly chooses $F_t^j$ and the set of open sectors, $J_t$. It is
straightforward to see that the subset of projects in which the planner will invest will be of
the form $J^{FB} = [0, n^{FB}]$, similar to the decentralized equilibrium.
Therefore, subject to feasibility, the planner will solve
$$\max_{n_t,\,\phi_t,\,\{F_t^j\}_{0\le j\le n_t}} \int_0^{n_t} \log(R F_t^j + r\phi_t)\,dj + (1 - n_t)\log(r\phi_t). \tag{10.23}$$
This maximization problem leads to the following result:

Let $n^*(K_t)$ be given by (10.18) and $S_t = s_t^*$ denote total savings. Then, for all $S_t < D$,
$n^{FB}(K_t) > n^*(K_t)$, $\phi^{FB}(K_t) < \phi^*(K_t)$ and each agent receives the following portfolio of
assets: there exists $j_t^* < n^{FB}(S_t)$ such that
$$F_t^{j,FB} = \begin{cases} M_{j_t^*} > M_j & \text{if } j < j_t^* \\ M_j & \text{if } n^{FB}(K_t) \ge j \ge j_t^* \\ 0 & \text{if } j > n^{FB}(K_t). \end{cases} \tag{10.24}$$
And for all $S_t \ge D$, $n^{FB}(K_t) = n^*(K_t)$ and $F_t^{j,FB} = S_t$ for all $j$.

In other words, the social planner will always open more sectors/projects than the decentralized equilibrium, and will finance this by investing less in the sectors without the
minimum size requirement. This is shown in the next picture:
Why is the decentralized equilibrium inefficient? The answer is a pecuniary externality
due to missing markets. As an additional sector opens, all existing projects become more
attractive relative to the safe asset because the amount of undiversified risk they carry is
reduced, and as a result, risk-averse agents are more willing to buy the existing securities.
Since each agent ignores his impact on others' diversification opportunities, the externality
is not internalized.
It is important to also note that the decentralized equilibrium did not correspond to an
Arrow-Debreu equilibrium, and this gives the technical intuition for the inefficiency. In an
Arrow-Debreu equilibrium, all commodities, even those that are not traded in equilibrium,
are priced, whereas such a price schedule does not exist in this economy because of the
non-convexities of the production set. Instead, the analysis here uses a more natural competitive
equilibrium notion (common in general equilibrium analyses of monopolistic competition)
where only commodities traded in equilibrium are priced.
10.2.5
Implications
An important implication of this analysis is that there will be systematic differences in
productivity across countries depending on the realization of past shocks: economies with
low capital, that is those that had received bad shocks in the past, will have fewer sectors
open, and therefore, they will correctly fear undiversified risks. As a result, these economies
will invest in the low productivity safe assets, and achieve low productivity.
The analysis also implies a systematic relationship between the variability of the current
performance and the level of development (the level of the capital stock). Richer countries
will have less variable growth rates. This is a pattern we see in the data.
10.2.6
Inefficiency with Alternative Market Structures
Would the market failure in portfolio choices be overcome if some financial institution could
coordinate households' investment decisions? Imagine that rather than all agents acting
in isolation and ignoring their impact on each other's decisions, funds are intermediated
through a financial coalition-intermediary. This intermediary can collect all the savings and
offer to each saver a complex security (as different from a Basic Arrow Security) that pays
$R F_t^{j,FB} + r\phi_t^{FB}$ in each state $j$, where $F_t^{j,FB}$ and $\phi_t^{FB}$ are as in the optimal portfolio. Holding
this security would make each consumer better off compared to the equilibrium.
Although from this discussion it may appear that the inefficiency we identified may
not be robust to the formation of more complex financial institutions, we will show that
this is not the case. Unless some rather strong assumptions are made about the set of
contracts that a financial intermediary can offer, the unique equilibrium allocation with
unfettered competition among intermediaries will be identical to the one we characterized
as the equilibrium above.
In order to model the endogenous formation of coalitions, let us now assume that savings can be intermediated by some households who decide to act as middlemen and run an
investment fund. Put differently, following Townsend (1983), some agents initiate the formation of a coalition of households which buys securities on behalf of its members. In return,
participants to the financial coalition can be charged an intermediation fee, . Projects are
still run by individual households.
Let me now introduce the following three assumptions for the coalition-formation game:
1. An agent cannot be part of two coalitions at the same time.
2. Coalitions at all points maximize a weighted utility of their members. In particular,
a coalition cannot commit to a path of action that will be against the interests of its
members in the continuation game.
3. Coalitions cannot exclude other agents (or coalitions) from investing in a particular
project.
The first assumption is introduced to simplify the objective function of coalitions. In fact,
this assumption makes it easier for an efficient allocation to be sustained as an equilibrium.
The second one is the most important assumption. We view this as a very natural assumption
along the lines of subgame perfection, and its importance will be discussed further below.

Assumption three is also mainly expositional. We will see below that as long as Assumption
number two holds, coalitions would never want to exclude others, and thus this assumption
is only imposed to simplify the exposition.
Formally, the game has now three stages. In the first stage, each household h can
announce that he is willing to act as an intermediary for a specified set of households h
(where h , the set of all subsets of , and we define (). as the Lebesgue measure over ).
In general, only a subset of agents belonging to h will accept the offer of the intermediary.
Let ah h denote this subset of households. Note that because of assumption 1, in
equilibrium will be partitioned into disjoint coalitions. The intermediary h will invest the
savings he collects (net of his commission h ) in shares of both risky and safe projects so
as to maximize the total utility of the agents belonging to ah . A first-stage strategy for
(1)
h , h ) R+ . If agent h announces that he

household h is an announcement Zh = (
(1)
will not act as an intermediary, then Zh = . Among the possible non-null announcements,
(1)
there is autarky, i.e. Zh = (0, {h}), which means that h will only intermediate (at most)
his own savings. Finally, we denote the set of first-stage announcements of all agents by
Z (1) : R+ .
In the second stage, each agent h can announce his plan to run at most one project
and sell the corresponding Basic Arrow Security, i.e. h announces a pair (j, Pj,h ), as in
the game discussed in Section 3. But, now, securities are sold to financial intermediaries
rather than directly to households. Formally, the second-stage announcement for agent h
(2)
is Zh = (j, Pj,h ) [0, 1] R+ . and Z (2) : [0, 1] R+ is the set of all second-period
announcements. We will also denote the set of minimum security prices announced in the
second stage of the game by P = {P j }jJ .
In the third stage, each household takes the set of prior announcements, $Z^{(1)}$ and $Z^{(2)}$,
as given, and chooses which coalition to join. Or equivalently, $Z_h^{(3)}$ is $h$'s choice of an

o
(1) n
(1)
intermediary from Mh Z
i , i ), h i , the set of coalitions which
i | Zi = (
announced his name. Note that although the set Mh Z (1) could be empty, this will never be
the case in equilibrium, since any agent can costlessly make the autarky announcement in the
first stage. Finally, after all agents announce which coalition-intermediary they will belong
to, each intermediary makes the optimal investment decision. We still use the notation h ,
Fhj to denote the investment of an agent (through a coalition) in the safe and risky assets.
More precisely, if a coalition invests Fj in project j, then Fhj will be the share of agent h
in this coalition times Fj .
Definition 9 A (perfect) equilibrium is a set of announcements Z = (Z (1) , Z (2) , Z (3) )
at each stage of the game, a price function P (Z ) for all Basic Arrow Securities, a saving
decision sh (Z ), and induced holdings of the safe asset h (Z ) and securities Fhj (Z ) for all
agents, and factor payments W and such that given the announcements of the previous
stage(s) and the announcements of all other agents in the current stage, every household
(i)
chooses Zh that maximizes its utility as given by (5) and factor returns are determined by
(10.6) and (10.7).
Note that the definition of equilibrium used so far was also subgame perfect. Here we
emphasize perfection in order to reiterate the importance of Assumption 2 in our analysis.
The first observation is that free entry will drive profits (commissions) to zero in both
the first and second stages. This is established by the following lemma (proof omitted).
Lemma 3 In equilibrium, (i) P j, (Z ) = 1, j; (ii) h = 0, h .
With this remark, it is now possible to establish the following proposition:

Proposition 28 The set of (perfect) equilibria is non-empty and all allocations in this set
have the following characteristics:
1. h , Mh 6= (all agents are included in some coalition).
2. Let $n_t^*$ be defined in (10.18). Then, for every $h$, either $Z_h^{(2)} = \emptyset$ or $Z_h^{(2)} = (j, 1)$ where
$j \in [0, n_t^*]$. And, for every $j \in [0, n_t^*]$, there exists $h$ such that $Z_h^{(2)} = (j, 1)$.
3. In the third stage ah 6= will choose a portfolio which induces t and Ftj = Ft as
given by equations (10.16) and (10.17).
This result implies that even with unrestricted coalitions, the inefficiency cannot be
prevented. The key feature is that each agent would be creating a positive externality by
holding a non-balanced portfolio like the one necessary for efficiency, and they will typically
find a way of moving towards a balanced portfolio, undermining efforts to sustain the efficient
allocation.
Part III
Endogenous Growth

Until now, we investigated economic growth models without growth in the sense that,
either the economy settled into a steady state without any economic growth, or growth
came exogenously from the unmodeled process of labor-augmenting technological progress.
The rest of the lectures will look into issues of endogenous growth and how the process of
development, whereby a society makes the transition from being a less-developed economy
to a more developed one, takes place endogenously.
Chapter 11
First-Generation Models of
Endogenous Growth
The first-generation models of endogenous growth made a big advance relative to the neoclassical growth model in generating sustained growth. Two approaches are noteworthy here.
The first one basically keeps the essence of the neoclassical approach, with competitive markets and no externalities. The second makes the first attempt at endogenizing technology
by introducing externalities and knowledge spillovers (flows) across firms.
11.1
AK Model Revisited
Let us start with the simplest neoclassical model of sustained growth, which we already
encountered in the context of the Solow growth model. This is the so-called AK model, where
the production technology is linear in capital. We will also see that in fact what matters is
that the accumulation technology is linear, not necessarily the production technology. But
for now it makes sense to start with the simpler case of the AK economy.
11.1.1
Since there will be growth and we are, at least at first, interested in balanced growth, we
are forced to use preferences that are asymptotically consistent with balanced growth. We
may as well assume these preferences from the beginning, thus choose the standard CRRA
preferences of the canonical model.
More specifically, let us assume that the economy admits an infinitely-lived representative
household with utility given by
$$U = \int_0^{\infty} \exp\left(-(\rho - n)t\right) \frac{c^{1-\theta} - 1}{1 - \theta}\, dt, \tag{11.1}$$
subject to the constraint
$$\dot{a} = (r - n)a + w - c, \tag{11.2}$$
where $a$ is assets per person, $r$ is the interest rate, $w$ is the wage rate, and $n$ is the growth
rate of population. I have now suppressed time dependence to simplify notation.
We again impose the no-Ponzi game constraint:
$$\lim_{t\to\infty} a(t) \exp\left(-\int_0^t [r(s) - n]\, ds\right) \ge 0. \tag{11.3}$$
The Euler equation for the representative household is the same as before and gives:
$$\frac{\dot{c}}{c} = \frac{1}{\theta}(r - \rho), \tag{11.4}$$
and the transversality condition is:
$$\lim_{t\to\infty} a(t) \exp\left(-\int_0^t [r(s) - n]\, ds\right) = 0. \tag{11.5}$$

The production sector is similar to before, except that Assumptions 1 and 2 are not
satisfied.
More specifically, we have that per capita output is given by
$$y = f(k) = Ak, \tag{11.6}$$
with $A > 0$ being a constant. Equation (11.6) has a number of notable differences from
our standard production function satisfying Assumptions 1 and 2. First, output is only a
function of capital, and there are no diminishing returns (i.e., it is no longer the case that
$f''(\cdot) < 0$). More important is the fact that the Inada conditions embedded in Assumption
2 are no longer satisfied. In particular,
$$\lim_{k\to\infty} f'(k) = A > 0.$$
This feature is essential for sustained growth.

The conditions for profit-maximization are very similar to before, and require that the
marginal product of capital be equal to the rental price of capital, $R = r + \delta$. Since, as is
obvious from equation (11.6), the marginal product of capital is constant and equal to $A$,
we also have that the net rate of return on the savings is constant and equal to:
$$r = A - \delta. \tag{11.7}$$
Since the marginal product of labor is zero, the wage rate, w, is zero. This is a somewhat
extreme result, and again it can be relaxed as we will see below. Alternatively, in this model
we can think of k as a combination of physical and human capital, in which case there will
be labor income coming from human capital, which will be accumulating in the same way
as physical capital (in particular linearly).
11.1.2
Equilibrium
To characterize the equilibrium, which is defined in exactly the same way as in the basic
neoclassical model, we again use $a = k$, $r = A - \delta$, and $w = 0$, and substitute these into
equations (11.2), (11.4), and (11.5), to obtain:
$$\dot{k} = (A - \delta - n)k - c, \tag{11.8}$$
$$\frac{\dot{c}}{c} = \frac{1}{\theta}(A - \delta - \rho), \tag{11.9}$$
$$\lim_{t\to\infty} k(t)\, e^{-(A - \delta - n)t} = 0. \tag{11.10}$$
The important result immediately follows from equation (11.9). There is a constant
rate of consumption growth (as long as $A - \delta - \rho > 0$), and this is entirely independent of
the level of capital stock per person, $k$. This will also imply that there are no transitional
dynamics in this model. Starting from any $k(0)$, the economy will immediately start growing
at a constant rate. To see this, integrate equation (11.9) starting from some initial level of
consumption $c(0)$ [still to be determined]. This gives
$$c(t) = c(0) \exp\left(\frac{1}{\theta}(A - \delta - \rho)t\right). \tag{11.11}$$
Since there is growth in this economy, we have to ensure that the transversality condition
is satisfied (i.e., that lifetime utility is bounded away from infinity), and also we want to
ensure positive growth. Therefore we impose:
Assumption 12
$$A > \rho + \delta > \frac{1 - \theta}{\theta}(A - \delta - \rho) + n + \delta.$$
The first part of this condition ensures that there will be positive consumption growth,
while the second part is the analogous condition to $\rho + \theta g > g + n$ in the neoclassical growth
model with technological progress, which was imposed to ensure bounded utility (and thus
was used in proving that the transversality condition was satisfied).
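A quick way to see what Assumption 12 rules out is to evaluate it for a few parameter configurations. The sketch below is illustrative only: the function names and numbers are hypothetical, and the condition is coded as $A > \rho + \delta > \frac{1-\theta}{\theta}(A - \delta - \rho) + n + \delta$, which is equivalent to positive growth together with $\rho - n > (1 - \theta) g$ for $g = (A - \delta - \rho)/\theta$.

```python
def satisfies_assumption_12(A, rho, delta, theta, n):
    """Check positive growth and bounded utility for the AK economy."""
    positive_growth = A > rho + delta
    bounded_utility = rho + delta > (1 - theta) / theta * (A - delta - rho) + n + delta
    return positive_growth and bounded_utility


def bracket(A, rho, delta, theta, n):
    """(A - delta)(theta - 1)/theta + rho/theta - n, the term appearing in (11.13)-(11.15)."""
    return (A - delta) * (theta - 1) / theta + rho / theta - n


if __name__ == "__main__":
    print(satisfies_assumption_12(A=0.10, rho=0.02, delta=0.03, theta=2.0, n=0.01))
    print(satisfies_assumption_12(A=0.50, rho=0.02, delta=0.03, theta=0.2, n=0.01))
```

The second part of the assumption is the same inequality as `bracket(...) > 0`, which is why the bracketed term can later be treated as strictly positive.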
11.1.3
We now more explicitly show there are no transitional dynamics, that is, not only the growth
rate of consumption, but the growth rates of capital and output are also constant at all points
in time, and equal the growth rate of consumption given in equation (11.9).
To do this, let us substitute for $c(t)$ from equation (11.11) into equation (11.8), which
yields
$$\dot{k} = (A - \delta - n)k - c(0)\exp\left(\frac{1}{\theta}(A - \delta - \rho)t\right), \tag{11.12}$$
which is a first-order, non-autonomous linear differential equation in $k$. This type of equation
can be solved easily. In particular recall that if
$$\dot{z} = az + g(t),$$
then, the solution is
$$z(t) = z_0 \exp(at) + \exp(at)\int_0^t \exp(-as)\, g(s)\, ds$$
for some constant $z_0$ chosen to satisfy the boundary conditions. Therefore, equation (11.12)
solves for:
$$k(t) = \kappa \exp((A - \delta - n)t) + \left[(A - \delta)(\theta - 1)/\theta + \rho/\theta - n\right]^{-1}\left[c(0)\exp\left((1/\theta)(A - \delta - \rho)t\right)\right], \tag{11.13}$$

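where $\kappa$ is a constant to be determined. Notice also that Assumption 12 ensures that
$[(A - \delta)(\theta - 1)/\theta + \rho/\theta - n] > 0$.
From (11.13), it may look like capital is not growing at a constant rate, since it is the
sum of two components growing at different rates. However, this is where the transversality
condition becomes useful. Let us substitute from (11.13) into the transversality condition,
(11.10), which yields
$$\lim_{t\to\infty}\left[\kappa + \left[(A - \delta)(\theta - 1)/\theta + \rho/\theta - n\right]^{-1} c(0)\exp\left(-\left[(A - \delta)(\theta - 1)/\theta + \rho/\theta - n\right]t\right)\right] = 0.$$
Note that $[(A - \delta)(\theta - 1)/\theta + \rho/\theta - n] > 0$, so the second term in this expression converges to zero as $t \to \infty$. But the first term is a constant. Thus the transversality condition
can only be satisfied if $\kappa = 0$. Therefore we have from (11.13) that:
$$\begin{aligned} k(t) &= \left[(A - \delta)(\theta - 1)/\theta + \rho/\theta - n\right]^{-1}\left[c(0)\exp\left((1/\theta)(A - \delta - \rho)t\right)\right] \\ &= k(0)\exp\left((1/\theta)(A - \delta - \rho)t\right), \end{aligned} \tag{11.14}$$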

where the second line immediately follows from the fact that the boundary condition has to
hold for capital at $t = 0$. This equation naturally implies that capital and output grow at
the same rate as consumption.
It also pins down the initial level of consumption exactly as
$$c(0) = \left[(A - \delta)(\theta - 1)/\theta + \rho/\theta - n\right] k(0). \tag{11.15}$$
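The closed-form path can be checked against the resource constraint (11.8). The following is a minimal sketch with hypothetical function names and parameter values; it evaluates the growth rate from (11.9), the initial consumption from (11.15), and the resulting exponential paths for $k$ and $c$.

```python
import math


def ak_equilibrium(k0, *, A, delta, rho, theta, n):
    """Return the growth rate g = (A - delta - rho)/theta and initial consumption c(0)."""
    g = (A - delta - rho) / theta
    c_over_k = (A - delta) * (theta - 1) / theta + rho / theta - n  # bracket in (11.15)
    return g, c_over_k * k0


def ak_path(k0, t, **params):
    """Capital and consumption per person at time t on the equilibrium path."""
    g, c0 = ak_equilibrium(k0, **params)
    growth = math.exp(g * t)
    return k0 * growth, c0 * growth


if __name__ == "__main__":
    params = dict(A=0.10, delta=0.03, rho=0.02, theta=2.0, n=0.01)  # hypothetical values
    g, c0 = ak_equilibrium(1.0, **params)
    k, c = ak_path(1.0, 10.0, **params)
    # On the path k grows at rate g, so g * k should equal (A - delta - n) * k - c.
    residual = g * k - ((params["A"] - params["delta"] - params["n"]) * k - c)
    print(f"g = {g:.4f}, c(0) = {c0:.4f}, residual of (11.8) = {residual:.2e}")
```

Since $c/k$ is constant along the path, consumption, capital and output all grow at the same rate from $t = 0$ onward, which is the no-transitional-dynamics result.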
It is also interesting to note that in this simple AK model, growth is not only endogenous in the sense of being sustained, but it is also endogenous in the sense of being
affected by underlying parameters. For example, consider an increase in the rate of discount,
$\rho$. Recall that in the Ramsey model, this affected the level of income per capita, but had no
effect on the growth rate, which was determined by the exogenous labor-augmenting rate of
technological progress. Here, clearly it will reduce the growth rate. Similarly, changes in $A$
and $\theta$ affect the levels and growth rates of consumption, capital and output. In Problem Set
5, you will also see that policy can now affect the growth rate of output permanently.
Finally, we can calculate the saving rate in this economy. It is defined as total investment
(increase in capital plus replacement investment) divided by output:
$$s = \frac{\dot{K} + \delta K}{Y} = \frac{\dot{k}/k + n + \delta}{A} = \frac{A - \rho + \theta n + (\theta - 1)\delta}{\theta A}, \tag{11.16}$$
where naturally $\dot{k}/k = (A - \delta - \rho)/\theta$. Consequently, the savings rate which was taken as
exogenous in the basic Solow model is now a function of parameters, and more specifically
of exactly the same parameters that determine the per capita growth rate.
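The two expressions for the saving rate can be compared numerically. This is a small sketch with hypothetical names and parameter values: it computes $s$ once from its definition, $(\dot{k}/k + n + \delta)/A$, and once from the closed form $(A - \rho + \theta n + (\theta - 1)\delta)/(\theta A)$.

```python
def saving_rate(A, delta, rho, theta, n):
    """Return the AK saving rate computed from the definition and from the closed form."""
    g = (A - delta - rho) / theta            # growth rate of capital per person
    from_definition = (g + n + delta) / A    # (k_dot/k + n + delta) / A
    closed_form = (A - rho + theta * n + (theta - 1) * delta) / (theta * A)
    return from_definition, closed_form


if __name__ == "__main__":
    print(saving_rate(A=0.10, delta=0.03, rho=0.02, theta=2.0, n=0.01))
```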
Summarizing, we have:
Proposition 29 Consider the above-described AK economy, with a representative household
with preferences given by (11.1), and the production technology given by (11.6). In this
economy there exists a unique equilibrium path in which consumption, capital and output all
grow at the same rate $g^* \equiv (A - \delta - \rho)/\theta$ starting from any initial positive capital stock per
worker k (0), and the savings rate is endogenously determined by (11.16).
One important implication of the AK model is that since all markets are competitive,
there is a representative household, and there are no externalities, the competitive equilibrium will be Pareto optimal. This can be proved either using First Welfare Theorem type
reasoning, or by directly constructing the optimal allocation. The result is stated in the next
proposition (and left for you to prove):
Proposition 30 Consider the above-described AK economy, with a representative household
with preferences given by (11.1), and the production technology given by (11.6). In this
economy the unique equilibrium path in which consumption, capital and output all grow at
the same rate $g^* \equiv (A - \delta - \rho)/\theta$ is Pareto optimal.
11.1.4
The Role of Policy
It is straightforward to incorporate policy into this framework. The simplest and arguably
one of the most relevant classes of policies is, as also discussed above, that which affects the
rate of return to accumulation. In particular, suppose that there is an effective tax rate of $\tau$
on the rate of return from capital income. Repeating the analysis above immediately implies
that this will adversely affect the growth rate of the economy, which will now become:
$$g = \frac{(1 - \tau)(A - \delta) - \rho}{\theta}.$$
Moreover, it can be calculated that the savings rate will now be
$$s = \frac{(1 - \tau)A + \theta n - (1 - \tau - \theta)\delta - \rho}{\theta A},$$
which is a decreasing function of $\tau$ if $A - \delta > 0$. Therefore, in this model, the savings rate
is constant in equilibrium as in the basic Solow model, but in contrast to that model, it
responds endogenously to policy.
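The effect of the tax can be illustrated with a few numbers. The sketch below uses hypothetical function names and parameter values, writes $\tau$ for the tax rate on capital income, and codes the growth rate as $((1-\tau)(A-\delta) - \rho)/\theta$ and the saving rate as $((1-\tau)A + \theta n - (1-\tau-\theta)\delta - \rho)/(\theta A)$.

```python
def growth_rate(tau, *, A, delta, rho, theta):
    """Growth rate of the AK economy with a tax rate tau on capital income."""
    return ((1 - tau) * (A - delta) - rho) / theta


def saving_rate(tau, *, A, delta, rho, theta, n):
    """Saving rate of the AK economy with a tax rate tau on capital income."""
    return ((1 - tau) * A + theta * n - (1 - tau - theta) * delta - rho) / (theta * A)


if __name__ == "__main__":
    base = dict(A=0.10, delta=0.03, rho=0.02, theta=2.0)  # hypothetical values
    for tau in (0.0, 0.2, 0.4):
        g = growth_rate(tau, **base)
        s = saving_rate(tau, n=0.01, **base)
        print(f"tau = {tau:.1f}: g = {g:.4f}, s = {s:.4f}")
```

In both expressions the tax enters only through the after-tax net return $(1 - \tau)(A - \delta)$, so the saving rate still equals $(g + n + \delta)/A$ at every tax rate.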
11.2
The Extended AK Model
The model studied in the previous section is attractive in many respects. It generates
sustained growth, which responds to policy, to underlying preferences and to technology.

Moreover, it is a very close cousin of the neoclassical model. In fact, as argued there, the
endogenous growth equilibrium is Pareto optimal.
One unattractive feature of this model, however, is that all of national income accrues
to capital. Essentially, it is a one sector model with only capital as the factor of production.
This makes it unattractive as an application to real world situations. It also blurs what is
the key underlying characteristic driving growth in this model. As I pointed out above, it
is not that the production technology is AK, but the related feature that the accumulation
technology is linear.
In this section, I will briefly illustrate this by developing a more workable version of the
AK model with two sectors, which in fact features a constant share of capital in national
income less than 1.
The preference and demographics are the same as in the model of the previous section, in
particular, equations (11.1)-(11.5) apply as before (but with a slightly different interpretation
for the interest rate in (11.4) as will be discussed below). Moreover, to simplify the analysis
I will shut down population growth, so n = 0, and the total amount of labor in the economy
is equal to L and is supplied inelastically.
The main difference is in the production technology. Rather than a single good used
for consumption and investment, we now envisage an economy with two sectors. Sector 1
produces consumption goods with the following technology
$$C(t) = B\,(K_C(t))^{\alpha} L_C(t)^{1-\alpha}, \tag{11.17}$$
where the subscript $C$ denotes that these are capital and labor used in the consumption
sector, which has a Cobb-Douglas technology. In fact, the Cobb-Douglas assumption here
is quite important in ensuring that the share of capital in national income is constant [can
you see why?]. The capital accumulation equation is given by:
$$\dot{K}(t) = I(t) - \delta K(t),$$
where $I(t)$ denotes investment. Investment goods are produced with a different technology
than (11.17), however. In particular, we have
$$I(t) = A K_I(t). \tag{11.18}$$
The distinctive feature of the technology for the investment goods sector, (11.18), is that
it is linear in the capital stock and does not feature labor. This is an extreme version of
an assumption often made in two-sector models that the investment-good sector is more
capital-intensive than the consumption-good sector. In the data, there seems to be some
support for this, though the capital intensities of many sectors have been changing over time
as the nature of consumption and investment goods has changed.
Market clearing implies:
$$K_C(t) + K_I(t) \le K(t),$$
for capital, and
$$L_C(t) \le L,$$
for labor (since labor is only used in the consumption sector).
An equilibrium in this economy is defined similarly to that in the neoclassical economy,
but also features an allocation decision of capital between the two sectors. Moreover, since
the two sectors are producing two different goods, consumption and investment goods, there
will be a relative price between the two sectors which will adjust endogenously.
Since both market clearing conditions will hold as equalities (the marginal product of
both factors is always positive), we can simplify notation by letting $\kappa(t)$ denote the share
of capital used in the investment sector:
$$K_C(t) = (1 - \kappa(t))K(t) \quad \text{and} \quad K_I(t) = \kappa(t)K(t).$$
From profit maximization, the rate of return to capital has to be the same when it is
employed in the two sectors. Let the price of the investment good be denoted by $p_I(t)$ and
that of the consumption good by $p_C(t)$, then we have
$$p_I(t) A = p_C(t)\,\alpha B \left(\frac{L}{(1 - \kappa(t))K(t)}\right)^{1-\alpha}. \tag{11.19}$$
Define a steady-state (a balanced growth path) as an equilibrium path in which $\kappa(t)$ is
constant and equal to say $\kappa^*$. Moreover, let us choose the consumption good as the numeraire,
so that $p_C(t) = 1$ for all $t$. Then differentiating (11.19) implies that at the steady state:
$$\frac{\dot{p}_I(t)}{p_I(t)} = -(1 - \alpha) g_K, \tag{11.20}$$
where $g_K$ is the steady-state (BGP) growth rate of capital.

As noted above, the Euler equation for consumers, (11.4), still holds, but the relevant
interest rate has to be for consumption-denominated loans, denoted by rC (t). In other words,
it is the interest rate that measures how many units of consumption good an individual will
receive tomorrow by giving up one unit of consumption today. Since the relative price
of consumption goods and investment goods is changing over time, the proper calculation
goes as follows. By giving up one unit of consumption, the individual will buy $1/p_I(t)$
units of capital goods. This will have an instantaneous return of $r_I(t)$. In addition, the
individual will get back the one unit of capital, which has now experienced a change in
its price of $\dot{p}_I(t)/p_I(t)$, and finally, he will have to buy consumption goods, whose prices
changed by $\dot{p}_C(t)/p_C(t)$. Therefore, the general formula of the rate of return denominated
in consumption goods in terms of the rate of return denominated in investment goods is
$$r_C(t) = \frac{r_I(t)}{p_I(t)} + \frac{\dot{p}_I(t)}{p_I(t)} - \frac{\dot{p}_C(t)}{p_C(t)}.$$
In our setting, given our choice of numeraire, we have $\dot{p}_C(t)/p_C(t) = 0$. Moreover, $\dot{p}_I(t)/p_I(t)$
is given by (11.20). Finally,
$$\frac{r_I(t)}{p_I(t)} = A - \delta$$
given the linear technology in (11.18). Therefore, we have
$$r_C(t) = A - \delta + \frac{\dot{p}_I(t)}{p_I(t)},$$
and in steady state, from (11.20), the steady-state consumption-denominated rate of return
is:
$$r_C = A - \delta - (1 - \alpha) g_K.$$
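From (11.4), this implies a consumption growth rate of
$$g_C \equiv \frac{\dot{C}(t)}{C(t)} = \frac{1}{\theta}\left(A - \delta - (1 - \alpha) g_K - \rho\right). \tag{11.21}$$
Finally, differentiate (11.17) and use the fact that labor is always constant to obtain
$$\frac{\dot{C}(t)}{C(t)} = \alpha\,\frac{\dot{K}_C(t)}{K_C(t)},$$
which, from the constancy of $\kappa(t)$ in steady state, implies the following steady-state relationship:
$$g_C = \alpha g_K.$$
Substituting this into (11.21), we have
$$g_K^* = \frac{A - \delta - \rho}{1 - \alpha(1 - \theta)} \tag{11.22}$$
and
$$g_C^* = \alpha\,\frac{A - \delta - \rho}{1 - \alpha(1 - \theta)}. \tag{11.23}$$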
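The steady-state growth rates can be verified against the Euler equation. The sketch below is illustrative, with hypothetical names and parameter values: it computes $g_K^* = (A - \delta - \rho)/(1 - \alpha(1 - \theta))$, $g_C^* = \alpha g_K^*$ and the growth rate of the investment-good price, $-(1 - \alpha) g_K^*$, and then checks that $g_C^*$ equals $(1/\theta)(r_C - \rho)$ with $r_C = A - \delta - (1 - \alpha) g_K^*$.

```python
def extended_ak_growth(A, delta, rho, theta, alpha):
    """Steady-state growth rates of capital, consumption and the investment-good price."""
    g_K = (A - delta - rho) / (1 - alpha * (1 - theta))
    g_C = alpha * g_K
    g_pI = -(1 - alpha) * g_K  # relative price of investment goods falls when g_K > 0
    return g_K, g_C, g_pI


if __name__ == "__main__":
    A, delta, rho, theta, alpha = 0.10, 0.03, 0.02, 2.0, 1 / 3  # hypothetical values
    g_K, g_C, g_pI = extended_ak_growth(A, delta, rho, theta, alpha)
    r_C = A - delta + g_pI  # consumption-denominated rate of return
    print(f"g_K = {g_K:.4f}, g_C = {g_C:.4f}, Euler-implied g_C = {(r_C - rho) / theta:.4f}")
```

Because $\alpha < 1$, capital grows faster than consumption whenever growth is positive, with the gap absorbed by the falling relative price of investment goods.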
What about wages? Now since labor is being used in the consumption good sector, there
will be positive wages. Since labor markets are competitive, the wage rate at time $t$ is given
by
$$w(t) = (1 - \alpha)\, p_C(t)\, B \left(\frac{(1 - \kappa(t))K(t)}{L}\right)^{\alpha}.$$
Therefore, in the balanced growth path, we obtain
$$\frac{\dot{w}(t)}{w(t)} = \frac{\dot{p}_C(t)}{p_C(t)} + \alpha\,\frac{\dot{K}(t)}{K(t)} = \alpha g_K^*,$$
which implies that wages also grow at the same rate as consumption.
Moreover, with exactly the same arguments as in the previous section, it can be established that there are no transitional dynamics in this economy. This establishes the following
result:
Proposition 31 In the above-described extended AK economy, starting from any K (0) > 0,
consumption and labor income grow at the constant rate given by (11.23), while the capital
stock grows at the constant rate (11.22).
It is straightforward to conduct policy analysis in this model, and as in the basic AK
model, taxes on investment income will depress growth. Similarly, a lower discount rate will
increase the equilibrium growth rate of the economy
One important implication of this model, different from the neoclassical growth model, is
that there is continuous capital deepening. Capital grows at a faster rate than consumption
and output. Whether this is a realistic feature or not is debatable. The Kaldor facts,
discussed above, include a constant capital-output ratio as one of the requirements of balanced
growth. Here we have steady state and balanced growth without this feature. For much of
the 20th century, the capital-output ratio has been constant, but it has been increasing steadily
over the past 30 years. Part of the reason why it has been increasing recently but not before
is relative price adjustments. New capital goods are of higher quality, and this
needs to be incorporated in calculating the capital-output ratio. These calculations have
only been performed in the recent past, which may explain why the capital-output ratio appears
constant in the earlier part of the century, but not recently.
11.3 Growth with Externalities
The model that started much of the interest in endogenous growth is Romer (1986). Romer
wanted to explicitly model the process of knowledge accumulation, but realized that this
would be difficult in the context of a competitive economy. His initial solution (later updated
and improved in his and others' work during the 1990s) was to consider knowledge as a
byproduct of production that accumulates by itself. I now present this model.
11.3.1 Preferences and Technology
Consider an economy without any population growth (we will see why this is important)
and a production function with labor-augmenting knowledge (technology) that satisfies the
standard assumptions, Assumptions 1 and 2. For reasons that will become clear, instead
of working with the aggregate production function, let us look at the production function
facing each one of the many infinitesimal final good producers (each indexed by i):
$$ Y_i(t) = F(K_i(t), A(t) L_i(t)), \tag{11.24} $$
where $K_i(t)$ and $L_i(t)$ are capital and labor rented by firm i. Notice that $A(t)$ is not
indexed by i, since it is technology common to all firms. Let us normalize the measure of
final good producers to 1, so that we have the following market clearing conditions:
$$ \int_0^1 K_i(t)\, di = K(t) $$
and
$$ \int_0^1 L_i(t)\, di = L, $$
where L is the constant level of labor (supplied inelastically) in this economy. Firms are
competitive in all markets, which implies that they will all hire the same capital to effective
labor ratio, and moreover, factor prices will be given by their marginal products, thus
$$ w(t) = \frac{\partial F(K(t), A(t) L)}{\partial L}, \qquad R(t) = \frac{\partial F(K(t), A(t) L)}{\partial K(t)}. $$
The key assumption of Romer (1986) is that although firms take $A(t)$ as given, this stock
of technology (knowledge) advances endogenously for the economy as a whole. In particular, Romer assumes that this takes place because of spillovers across firms, and attributes
spillovers to physical capital. Lucas (1988) develops a similar model in which the structure is
identical, but spillovers work through human capital (i.e., while Romer has physical capital
externalities, Lucas has human capital externalities).
The idea of externalities is not uncommon to economists, but both Romer and Lucas
make an extreme assumption of sufficiently strong externalities such that $A(t)$ can grow
continuously at the economy level. In particular, Romer assumes
$$ A(t) = B K(t), \tag{11.25} $$
i.e., the knowledge stock of the economy is proportional to the capital stock of the economy.
This can be motivated by learning-by-doing, whereby greater investments in certain sectors
increase the experience (of firms, workers, managers) in the production process, making the
production process itself more productive. Alternatively, the knowledge stock of the economy
could be a function of the cumulative output that the economy has produced up to now,
thus giving it more of a flavor of learning-by-doing. The reason why the externalities work
through capital might be justified along the lines of the structural change model we will
discuss below, where it is assumed that the manufacturing sector, which is more capital-intensive, is more important for generating externalities (whether this is so or not is not very
clear, and in any case, there is no compelling evidence that such externalities are very large).
In any case, substituting for (11.25) into (11.24) and using the fact that all firms are
functioning at the same capital-effective labor ratio, we obtain the production function of
the representative firm as
$$ Y(t) = F(K(t), B K(t) L). $$
Using the fact that $F(\cdot, \cdot)$ is homogeneous of degree one, we have
$$ \frac{Y(t)}{K(t)} = F(1, BL) \equiv f(L). $$
Alternatively, output per capita can be written as:
$$ y(t) \equiv \frac{Y(t)}{L} = \frac{Y(t)}{K(t)} \frac{K(t)}{L} = k(t) f(L), $$
where again $k(t) \equiv K(t)/L$ is the capital-labor ratio in the economy.

As in the standard growth model, marginal products and factor prices can be expressed
in terms of the normalized production function, now $f(L)$. In particular, we have
$$ w(t) = K(t) f'(L) \tag{11.26} $$
and
$$ R(t) = R = f(L) - L f'(L), \tag{11.27} $$
which is constant.
11.3.2 Equilibrium
An equilibrium is defined similarly to the neoclassical growth model, as a path of consumption and capital stock for the economy, $[C(t), K(t)]_{t=0}^{\infty}$, that maximize the utility of the
representative household and wage and rental rates $[w(t), R(t)]_{t=0}^{\infty}$ that clear markets. The
important feature is that because the knowledge spillovers, as specified in (11.25), are external to the firm, factor prices are given by (11.26) and (11.27); that is, they do not price the
role of the capital stock in increasing future productivity.
Since the market rate of return is $r(t) = R(t) - \delta$, it is also constant. This immediately
implies that consumption in this economy, given by the usual Euler equation, grows at the
constant rate,
$$ g_C = \frac{1}{\theta}\left(f(L) - L f'(L) - \delta - \rho\right). \tag{11.28} $$
It is also clear that capital grows exactly at the same rate as consumption, so the rates of
capital, output and consumption growth are all given by $g_C$ in (11.28).
Let us assume that
$$ f(L) - L f'(L) - \delta - \rho > 0, \tag{11.29} $$
so that there is positive growth, but also that growth is not fast enough to violate the
transversality condition, in particular,
$$ f(L) - L f'(L) < \frac{\rho}{1-\theta} + \delta. \tag{11.30} $$
It is also straightforward to verify that as in the AK model above, there are no transitional
dynamics in this model. This establishes:
Proposition 32 In the above-described Romer model with physical capital externalities, as
long as conditions (11.29) and (11.30) are satisfied, there exists a unique equilibrium path
where starting with any level of capital stock K (0) > 0, capital, output and consumption
grow at the constant rate (11.28).
You can also see now why population was assumed constant in this model. First,
note that there is a scale effect here, in that when population (labor force) L is higher,
since $f(L) - L f'(L)$ is always increasing in L (by Assumption 1), the growth rate of the
economy will increase. Moreover, if population is growing constantly, the economy will not
admit a steady state and the growth rate of the economy will increase over time (output
reaching infinity in finite time and violating the transversality condition).
11.3.3 Pareto Optimal Allocations
Given the presence of externalities, it is not surprising that the decentralized equilibrium
characterized in Proposition 32 is not Pareto optimal. To characterize the allocation that
maximizes the utility of the representative household, let us again set up the current-value Hamiltonian, noting that the per capita accumulation equation for this economy can
be written as
$$ \dot{k} = f(L) k - c - \delta k. $$
The current-value Hamiltonian is
$$ H(k, c, \mu) = \frac{c^{1-\theta} - 1}{1-\theta} + \mu \left[ f(L) k - c - \delta k \right], $$
and has the necessary conditions:
$$ H_c(k, c, \mu) = 0 = c^{-\theta} - \mu, $$
$$ H_k(k, c, \mu) = -\dot{\mu} + \rho \mu = \mu \left[ f(L) - \delta \right], $$
$$ \lim_{t \to \infty} \left[ \exp(-\rho t)\, \mu(t) k(t) \right] = 0. $$
These equations imply that the social planner's allocation will also have a constant growth
rate for consumption (and output) given by
$$ g_C^S = \frac{1}{\theta}\left(f(L) - \delta - \rho\right), $$
which is always greater than $g_C$ as given by (11.28), since $f(L) > f(L) - L f'(L)$. Essentially, the social planner takes into account that by accumulating more capital, she is
improving the productivity in the future. Since this effect is external to the firms, the
decentralized economy fails to internalize this externality. Therefore we have:
Proposition 33 In the above-described Romer model with physical capital externalities, the
decentralized equilibrium is Pareto suboptimal and grows at a slower rate than the allocation
that would maximize the utility of the representative household.
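To make the comparison in Propositions 32 and 33 concrete, here is a minimal sketch under an assumed Cobb-Douglas technology, $F(K, AL) = K^{a}(AL)^{1-a}$, which is not imposed in the text. The function name, the exponent `a`, and the labels `theta`, `rho` and `delta` (for the preference and depreciation parameters) are hypothetical, as are the numerical values.

```python
def romer_growth_rates(B, L, a, theta, rho, delta):
    """Decentralized and planner growth rates for F(K, AL) = K**a * (A*L)**(1-a).

    With A = B*K, output is Y = K * f(L), where f(L) = F(1, B*L) = (B*L)**(1-a).
    """
    f = (B * L) ** (1.0 - a)                    # f(L)
    f_prime = (1.0 - a) * B * (B * L) ** (-a)   # f'(L)
    R = f - L * f_prime                         # private rental rate, as in (11.27)
    g_market = (R - delta - rho) / theta        # equilibrium growth, as in (11.28)
    g_planner = (f - delta - rho) / theta       # social planner's growth rate
    return g_market, g_planner


# Hypothetical parameter values, for illustration only.
g_market, g_planner = romer_growth_rates(B=0.3, L=1.0, a=1/3, theta=2.0,
                                         rho=0.02, delta=0.05)
# Scale effect: the same economy with a larger labor force.
g_market_big, _ = romer_growth_rates(B=0.3, L=2.0, a=1/3, theta=2.0,
                                     rho=0.02, delta=0.05)

print(f"market: {g_market:.4f}, planner: {g_planner:.4f}, "
      f"market with L doubled: {g_market_big:.4f}")
```

In this example the planner's growth rate exceeds the market rate because the private return omits the term $L f'(L)$, and doubling L raises the market growth rate, which is the scale effect discussed above.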
Chapter 12
Multiple Equilibria and the Process of Development
The models discussed so far generated sustained economic growth, which is important both
for understanding why some countries are much richer today than others, and for understanding the historical
process of economic growth leading to the modern world. However, the process of economic
development is not simply a linear sustained growth process. The process of development, as
emphasized by Simon Kuznets, is also one of the transformation of the economy. Agriculture
becomes less important, manufacturing becomes more important (and then later services
become more important). Urbanization increases. Simultaneously, there is a process of
coordination, or perhaps cumulative causation (where an economic process becomes self-sustaining once underway) going on, in which the increase in demand for certain goods and
services (especially coming from cities) fuels further growth. Many economic and social
institutions also change in the process. To do justice to these topics, we need to
delve much deeper into issues of development economics and political economy, which are
beyond the scope of the current course. But we can start getting a sense of these processes
by quickly looking at models of economic development emphasizing multiple equilibria, or
multiple steady states, and also looking at a very simple version of a model of structural
change incorporating the features emphasized by Simon Kuznets.
12.1 Multiple Equilibria From Aggregate Demand Externalities
Let us start with a very simple model of multiple equilibria arising from aggregate demand
externalities. Below in discussing models of endogenous technological change, monopolistic
competition will play a crucial role, since firms that discover new machines will become the
monopolistic suppliers of these machines or of goods produced with these machines. However,
the focus there will not be on multiple equilibria. Here we start with a simple two-period
model of an economy with monopolistic competition, which will lead to multiple equilibria.
The model is a version of Murphy, Shleifer and Vishny's (1989) "Big Push" paper. As the
name of the paper suggests, the idea is to think of the development process as a move from
one equilibrium to another, likely due to a coordinated move, a "big push".
12.1.1 Preferences and Technology
Consider the following two-period economy. All agents have preferences given by
$$ U = \frac{C_1^{1-\theta} - 1}{1-\theta} + \beta \frac{C_2^{1-\theta} - 1}{1-\theta}, $$
where $C_1$ and $C_2$ denote consumption at the two dates. $\theta$ plays a similar role to before, with
$1/\theta$ being the intertemporal elasticity of substitution, regulating how willing individuals are
to substitute consumption between date 1 and date 2, and $\beta$ is the discount factor of the
households.
The resource constraint for the economy is
$$ C_1 + I_1 \le Y_1, \qquad C_2 \le Y_2, $$
where $I_1$ denotes investment in the first date, $Y_t$ is total output at date t, and investment is
only possible in the first date.
Individuals can borrow and lend, so an individual's budget constraint is
$$ c_1 + \frac{c_2}{R} \le w_1 + \pi_1 + \frac{w_2 + \pi_2}{R}, $$
where $\pi_t$ denotes the profits accruing to the representative consumer, and $w_t$ is the wage
rate at time t. R is the gross interest rate. Although individuals can borrow and lend, in
the aggregate, the resource constraints have to hold, so R will be determined in equilibrium
to ensure this.
The new feature in this model is that output is an aggregate of intermediates. In particular, there is a continuum of differentiated intermediate goods, with their total measure
normalized to 1, and the aggregate production at time t is given by:
$$ Y_t = \left( \int_0^1 y_t(i)^{\frac{\varepsilon-1}{\varepsilon}}\, di \right)^{\frac{\varepsilon}{\varepsilon-1}}, $$
where $y_t(i)$ is the output level of intermediate i at date t. This production function has
the standard love-for-variety feature first introduced by Dixit and Stiglitz. This functional
form can be used either for aggregating intermediates or directly as a utility function. Its
advantage is that it provides an extremely tractable model of substitution between different
goods, both with competition and monopolistic competition, because the elasticity of demand
for each good is constant. We will make extensive use of these preferences in the rest of the
course. For now, note that $\varepsilon$ is the elasticity of substitution between intermediate goods
within a given period, and is assumed to be strictly greater than one, i.e., $\varepsilon > 1$.
The economy has a total labor supply of L, supplied inelastically.
The production function of each good is as follows:
$$ y_1(i) = l_1(i) $$
and
$$ y_2(i) = \begin{cases} l_2(i) & \text{with old technology} \\ \alpha\, l_2(i) & \text{with new technology} \end{cases} \tag{12.1} $$
where $\alpha > 1$ and $l_t(i)$ denotes labor devoted to the production of intermediate good i at
time t. Labor market clearing, naturally, requires
$$ \int_0^1 l_t(i)\, di \le L. \tag{12.2} $$
At date 1, there is a designated producer for each intermediate, but a competitive fringe
can also enter and produce each good as productively as the designated producer. At date
1, the designated producer can also invest in the new technology, which costs F per firm.
If this investment is undertaken, this producer's productivity at date 2 will be higher by a
factor $\alpha$, as indicated by equation (12.1). In contrast, the fringe will not benefit from this
technological improvement, thus the designated producer will have some degree of monopoly
power.
All firms are assumed to be owned equally by all the consumers. They will maximize
profits taking the market prices (especially the market interest rate) as given.
12.1.2 Equilibrium
Since this is a two-period economy, we will be looking for a subgame perfect equilibrium.
Moreover, to simplify the discussion, let us focus on symmetric subgame perfect equilibria,
SSPE. An SSPE consists of an allocation of labor across firms, investment decisions for firms,
wages for both periods and an interest rate linking consumption between the two periods.
First, since all goods are symmetric, the first period labor market clearing is straightforward and we will have
$$ l_1(i) = L \text{ for all } i \in [0, 1] $$
(recall that the measure of sectors and firms is normalized to 1). This implies that
$$ Y_1 = L. $$
At date 2, the equilibrium will depend on how many firms have adopted the new technology.
Since we are looking at the symmetric equilibrium (SSPE), we only consider the two extremes
where all firms adopt and no firm adopts. In either case, again the marginal productivity of
all sectors is the same, so labor will be allocated equally, i.e.,
$$ l_2(i) = L \text{ for all } i \in [0, 1]. $$
Consequently, when the technology is not adopted, we have
$$ Y_2 = L $$
and when the technology is adopted by all the firms, we have
$$ Y_2 = \alpha L. $$
We now turn to the pricing decisions. In the first date, the designated producers have no
monopoly power because of the competitive fringe, thus they charge price equal to marginal
cost, which is $w_1$, and make zero profits. Since total output is equal to $Y_1 = L$, this also
implies that the equilibrium wage rate is equal to
$$ w_1 = 1. $$
In the second date, if the technology is not adopted, the same situation repeats, and we have
$$ w_2 = 1 $$
and no profits. In this case there is also no investment, so consumption at both dates is
equal to L, thus the interest rate that makes individuals happy to consume this amount in
both periods is
$$ \hat{R} = \beta^{-1}. \tag{12.3} $$
To see this more formally, recall that the standard Euler equation in this case is
$$ C_1^{-\theta} = \beta R\, C_2^{-\theta}, \tag{12.4} $$
which can only be satisfied with $C_1 = C_2$ if the gross interest rate is $\hat{R}$ as given in (12.3).
Next consider the situation in which the designated producers have invested in the advanced technology. Now they can produce $\alpha$ units of output with one unit of labor, while
the fringe of competitive firms still produces one unit of output with one unit of labor.
This implies that the designated producers have some monopoly power. The extent of this
monopoly power depends on the comparison of $\alpha$ and $\varepsilon$.
Let us first find the demand facing each producer, which is given as a solution to the
following program of profit maximization for the final goods sector:
$$ \max_{[y_2(i)]_{i \in [0,1]}} \left( \int_0^1 y_2(i)^{\frac{\varepsilon-1}{\varepsilon}}\, di \right)^{\frac{\varepsilon}{\varepsilon-1}} - \int_0^1 p_2(i)\, y_2(i)\, di, $$
where $p_2(i)$ is the price of intermediate i at date 2. The first-order condition to this program
implies
$$ y_2(i)^{-1/\varepsilon}\, Y_2^{1/\varepsilon} = p_2(i), $$
or
$$ y_2(i) = p_2(i)^{-\varepsilon}\, Y_2. \tag{12.5} $$
This expression is useful in laying the foundations for the aggregate demand externalities,
which we will discuss soon; the demand for good i depends on the total amount of production,
$Y_2$. [However, you should ask yourself why this actually causes an externality; even with
perfectly competitive markets, the demand for my goods may depend on the supply of other
goods in the economy. So why is there an externality here?]
A nice feature of the demand curve implied by equation (12.5) is that it is iso-elastic
(i.e., the demand elasticity is constant). This will be a very convenient feature in many of
the models using this class of utility or production functions below.
To make more progress, first imagine the situation in which there is no fringe of competitive producers. In that case, each designated producer will act as an unconstrained
monopolist and maximize its profits given by price minus marginal cost times quantity, i.e.,
$$ \pi_2(i) = \left( p_2(i) - \frac{w_2}{\alpha} \right) y_2(i). $$
Substituting from (12.5), the firm maximization problem is
$$ \max_{p_2(i)} \pi_2(i) = \left( p_2(i) - \frac{w_2}{\alpha} \right) p_2(i)^{-\varepsilon}\, Y_2, $$
which has a first-order condition
$$ p_2(i)^{-\varepsilon}\, Y_2 - \varepsilon \left( p_2(i) - \frac{w_2}{\alpha} \right) p_2(i)^{-\varepsilon-1}\, Y_2 = 0, $$
which implies
$$ p_2(i) = p_2^M \equiv \frac{\varepsilon}{\varepsilon-1} \frac{w_2}{\alpha}. $$
This is the standard monopoly price formula of a markup related to demand elasticity over
the marginal cost, $w_2/\alpha$. Here the markup is constant because the demand elasticity is
constant.
However, the monopolist can only charge this price if the competitive fringe cannot
enter and make profits by stealing the entire market at this price. Since the competitive fringe
can produce one unit using one unit of labor, the monopolist can only charge this price if
$$ \frac{\varepsilon}{\varepsilon-1} \frac{1}{\alpha} \le 1. $$
Otherwise, the price would be too high and the competitive fringe would enter. Let us
assume that $\alpha$ is not so high as to make the monopolist unconstrained. In other words, let
us impose
Assumption 13
$$ \frac{\varepsilon}{\varepsilon-1} \frac{1}{\alpha} > 1. $$
Under this assumption, the monopolist will be forced to charge a limit price. It is
straightforward to see that this equilibrium limit price would be
$$ p_2 = w_2. $$
If it were any higher, the competitive fringe would enter, steal the whole market and make
positive profits. If it were any lower, the monopolist could increase its price without losing
the market, and thus increase its profits. This implies that under Assumption 13, each
monopolist would make per unit profits equal to
$$ w_2 - \frac{w_2}{\alpha} = \frac{\alpha-1}{\alpha} w_2. $$
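The pricing logic above (unconstrained monopoly price versus the limit price imposed by the fringe) can be summarized in a short sketch. The function names and the argument names `alpha` (productivity gain from the new technology) and `eps` (elasticity of substitution between intermediates) are assumed labels, and the numerical values in the usage lines are hypothetical.

```python
def date2_price(w2, alpha, eps):
    """Price charged at date 2 by a designated producer with the new technology.

    The fringe can supply the good at unit cost w2, so the producer charges the
    smaller of the unconstrained monopoly price and the limit price w2.
    """
    if eps <= 1.0 or alpha <= 1.0:
        raise ValueError("the model assumes eps > 1 and alpha > 1")
    monopoly_price = eps / (eps - 1.0) * w2 / alpha
    return min(monopoly_price, w2)


def unit_profit(w2, alpha, eps):
    """Profit per unit sold: price minus marginal cost w2 / alpha."""
    return date2_price(w2, alpha, eps) - w2 / alpha


# Assumption 13 holds (2/(2-1)/1.5 > 1): the producer limit prices at w2.
price_constrained = date2_price(1.0, alpha=1.5, eps=2.0)
# Assumption 13 fails (2/(2-1)/3 < 1): the producer charges the monopoly price.
price_unconstrained = date2_price(1.0, alpha=3.0, eps=2.0)
```

Under Assumption 13 the `min` always selects the limit price, so the per unit profit reduces to the expression derived above.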
The profits of firms are then obtained by substituting from (12.5) as:
$$ \pi_2 = \frac{\alpha-1}{\alpha}\, w_2^{1-\varepsilon}\, Y_2. \tag{12.6} $$
The wage rate can be determined from income accounting. Total production will be
equal to $Y_2 = \alpha L$, and this has to be distributed between profits and wages, thus
$$ \frac{\alpha-1}{\alpha}\, w_2^{1-\varepsilon}\, \alpha L + w_2 L = \alpha L, $$
which has an equilibrium at
$$ w_2 = 1, $$
as in the case without the technological investments. Therefore, in this economy the increased
marginal product does not translate into higher wages. Instead, it leads to profits for firms.
Nevertheless, all of these profits are redistributed to the agents, who are the owners of the
firms. Thus $C_2 = \alpha L$. However, because there was investment in the new technology at
date 1, $C_1 = L - F$. Again the interest rate has to adjust so that individuals are happy to
consume these amounts, i.e., so that they have a steep consumption profile without wanting
to borrow. The Euler equation, (12.4), now implies
$$ (L - F)^{-\theta} = \beta \tilde{R}\, (\alpha L)^{-\theta}, $$
which solves for
$$ \tilde{R} = \beta^{-1} \left( \frac{\alpha L}{L - F} \right)^{\theta} > \hat{R}. \tag{12.7} $$
Consequently, the interest rate in this case is higher than the one in which there is no
investment. This is natural, since investment implies that individuals are being asked to
forgo date 1 consumption for date 2 consumption. Note also that the greater is $\theta$, the higher
is $\tilde{R}$, since with a greater $\theta$, there is less intertemporal substitution. Also a higher F, meaning
a greater consumption sacrifice at date 1, implies a higher interest rate.
The question is whether firms will find it profitable to undertake the investment at date
1. The reason for the possibility of multiplicity is that the answer to this question will depend
on whether other firms are undertaking the investment or not. Let us first take a situation
in which no other firm is undertaking the investment, and consider the incentives of a single
designated firm to undertake such an investment.
In this case total output at date 2 is equal to L (since the firm considering investment
is infinitesimal), and the market interest rate is given by $\hat{R}$. Moreover, from (12.6) and the
fact that $w_2 = 1$, profits at date 2 are
$$ \pi_2^N = \frac{\alpha-1}{\alpha} L, $$
where the superscript N denotes that no other firm is undertaking the investment. Therefore,
the net discounted profits at date 1 for the firm in question are
$$ \Delta\pi^N = -F + \frac{1}{\hat{R}} \frac{\alpha-1}{\alpha} L = -F + \beta \frac{\alpha-1}{\alpha} L. $$
Next consider the case in which all other firms are undertaking the investment. In this
case, profits at date 2 are
$$ \pi_2^I = (\alpha - 1) L, $$
where the superscript I designates that all other firms are undertaking the investment.
Consequently, the profit gain from investing at date 1 is
$$ \Delta\pi^I = -F + \frac{1}{\tilde{R}} (\alpha - 1) L = -F + \beta \left( \frac{L - F}{\alpha L} \right)^{\theta} (\alpha - 1) L. $$
As discussed above, the idea of the paper by Murphy, Shleifer and Vishny (1989), similar to the ideas of many economists writing on development before them, was to generate
multiple equilibria, where one of the equilibria corresponds to backwardness, while the other
one corresponds to industrialization. In this context, this means that for the same parameter values both no investment in the new technology and all firms investing in the new
technology should be equilibria. This is only possible if we have
$$ \Delta\pi^N < 0 \quad \text{and} \quad \Delta\pi^I > 0, \tag{12.8} $$
that is, when nobody else invests, investment is not profitable, and when all other firms
invest, investment is profitable. This is clearly possible because of the aggregate demand
externality, the fact that $\pi_2^I > \pi_2^N$; when other firms invest, they produce more, there is
more aggregate demand, and therefore profits from having invested in the new technology
are higher. Counteracting this effect is the fact that the interest rate is also higher when all
firms invest. Therefore, the existence of multiple equilibria requires the interest rate effect
not to be too strong. For example, in the extreme case where preferences are linear, i.e.,
$\theta = 0$, we have that
$$ \Delta\pi^I = -F + \beta (\alpha - 1) L > \Delta\pi^N = -F + \beta \frac{\alpha-1}{\alpha} L, $$
so (12.8) is certainly possible. More generally, the condition for the existence of multiple
equilibria is that:
$$ \beta \left( \frac{L - F}{\alpha L} \right)^{\theta} (\alpha - 1) L > F > \beta \frac{\alpha-1}{\alpha} L. \tag{12.9} $$
It is also straightforward to see that whenever both equilibria exist, the equilibrium with
investment Pareto dominates the one without investment, since condition (12.9) implies that
all households are better off with the upward sloping consumption profile giving them higher
consumption at date 2.
This leads to the following result:
Theorem 40 Consider the above-described environment and suppose that Assumption 13
holds and condition (12.9) is satisfied. Then there exist two pure strategy SSPE, one in
which all firms undertake the investment at date 1 and the other one in which no firm does.
The equilibrium with investment Pareto dominates the equilibrium without investment.
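The conditions in Theorem 40 are easy to check numerically. The sketch below computes the net discounted profit gain from investing when no other firm invests and when all other firms invest, and reports whether both symmetric equilibria coexist. The function name and the argument names `alpha` (productivity gain), `beta` (discount factor), `theta` (inverse intertemporal elasticity of substitution) and `eps` (elasticity of substitution) are assumed labels for the model's parameters, and the numbers are hypothetical.

```python
def big_push(L, F, alpha, beta, theta, eps):
    """Profit gains from investing in the two-period big push model.

    Returns (gain_none, gain_all, multiple): the net discounted profit gain to a
    single firm when no other firm invests, when all other firms invest, and
    whether both symmetric equilibria coexist, as in condition (12.9).
    """
    # Assumption 13: the unconstrained monopoly price exceeds the fringe's
    # cost, so designated producers limit price at p2 = w2 = 1.
    if not eps / (eps - 1.0) / alpha > 1.0:
        raise ValueError("Assumption 13 fails: limit pricing does not apply")
    if not 0.0 < F < L:
        raise ValueError("need 0 < F < L so that C1 = L - F is positive")

    R_hat = 1.0 / beta                                  # interest rate, (12.3)
    R_tilde = (alpha * L / (L - F)) ** theta / beta     # interest rate, (12.7)

    gain_none = -F + (alpha - 1.0) / alpha * L / R_hat  # no other firm invests
    gain_all = -F + (alpha - 1.0) * L / R_tilde         # all other firms invest
    return gain_none, gain_all, (gain_none < 0.0 < gain_all)


# Hypothetical parameter values: weak interest rate effect (low theta).
gain_none, gain_all, multiple = big_push(L=1.0, F=0.33, alpha=1.5,
                                         beta=0.95, theta=0.2, eps=2.0)
# Same economy with a strong interest rate effect (high theta).
_, gain_all_high, multiple_high = big_push(L=1.0, F=0.33, alpha=1.5,
                                           beta=0.95, theta=2.0, eps=2.0)

print(multiple, multiple_high)
```

In this example, raising `theta` makes the interest rate in the investment equilibrium so high that investing is unprofitable even when all other firms invest, so the multiplicity disappears.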
Intuitively, multiple equilibria arise because (when) there is substantial aggregate demand at date 2 so that investing in the new technology at date 1 is profitable. In turn, there
will be substantial aggregate demand at date 2 when all firms invest in the new technology,
so that they are more productive and produce more at date 2. This intuition highlights the
importance of aggregate demand linkages. In fact, as noted above, these linkages take the
form of aggregate demand externalities. The reason why they take the form of externalities is
that the firm does not realize the full increase in the social product created by its investment,
because the monopoly markup implies that at the margin, further increases in output create
a first-order gain for consumers. The presence of the markup means that the monopolist
does not internalize this first-order gain, thus turning the demand linkages into aggregate
demand externalities.
One interpretation of this result is that societies that can somehow coordinate on the
equilibrium with investment (either because private expectations are aligned or because of
some type of government action) will industrialize and realize both economic growth and
Pareto improvement, and this corresponds to the "big push" ideas suggested by qualitative
accounts of the early development process, for example, that provided by economists such as
Nurkse or Rosenstein-Rodan. Naturally, the model here is essentially a static one, so it does
not allow a literal interpretation of a society being first in the no investment equilibrium
and then changing to the investment equilibrium and thus industrializing. Nevertheless,
it is suggestive of such a process. Also, although the model makes it sound as if simple
government action, for example, in the form of subsidies to firms, might realize such a
big push, in practice government intervention is not easy, partly because it is not clear
which sectors need to be subsidized, and perhaps more importantly because government
interventions are often captured by interest groups, a topic that brings us to the political
economy of development and economic growth.
12.2 Human Capital Accumulation with Imperfect Capital Markets
The previous section illustrated the potential of development traps because of aggregate
demand externalities. Investment by different firms may require coordination, leading to
multiple equilibria. Underdevelopment may be thought to correspond to a situation in
which the coordination is on the bad equilibrium, and the development process starts with
the big push, changing the coordination to the high-investment equilibrium.
Similar issues arise, in a more dynamic way, when the economy is subject to credit market
problems. Moreover, credit market problems will illustrate how the distribution of income
(and the incidence of poverty) in a society might affect economic growth and the process of
economic development. I will illustrate these issues in the simplest possible way, looking at
the effect of credit market problems on human capital investments.
12.2.1 A Simple Case With No Borrowing
When credit markets are imperfect, a major determinant of human capital investments will
be the distribution of income (as well as the degree of imperfection in the credit markets).
I start with a discussion of the simplest case with no borrowing (extreme credit market problems) to illustrate how the distribution of income will matter, and may also self-perpetuate.
Consider an economy with a continuum of measure 1 of dynasties. Each individual lives for two periods, childhood and adulthood, and has an offspring in his adulthood. There is consumption
only at the end of adulthood.
Preferences are given by
$$ (1-\delta) \log c_t^i + \delta \log e_t^i, $$
where c is consumption at the end of the individual's life, and e is the educational spending
on the offspring of this individual. The budget constraint is
$$ c_t^i + e_t^i \le w_t^i, $$
where w is the wage income of the individual.
There are a number of important features embedded in this utility function:
1. Even though it is a very similar utility function to that we worked with in the overlapping generations model, now the utility function refers to the utility that an individual
obtains from his consumption and the indirect utility he obtains from leaving something to his offspring. In other words, this utility function features impure altruism
(sometimes referred to as "warm glow" preferences): parents do not care about the utility
of their offspring, but simply about what they bequeath to them, here education.
2. It is logarithmic, which, as with the two-period overlapping generations model, will
lead to constant savings rates.
The labor market is competitive, and wage income simply depends on human capital:
$$ w_t^i = A h_t^i. $$
Human capital of the offspring of individual i of generation t in turn is given by
$$ h_{t+1}^i = \begin{cases} (e_t^i)^{\gamma} & \text{if } e_t^i \ge 1 \\ \bar{h} & \text{if } e_t^i < 1 \end{cases}, $$
where $\gamma \in (0, 1)$ and $\bar{h} \in (0, 1)$ is some minimum level of human capital that the individual
will attain even without any educational spending. Once spending exceeds a certain level
(here set equal to 1), the individual starts benefiting from the additional spending and
accumulates further human capital (though with diminishing returns since $\gamma < 1$).
This equation introduces a crucial feature necessary for models of credit market imperfections to generate multiple equilibria or multiple steady states: a nonconvexity in the
technology of human capital accumulation.
The budget constraint of individual i of generation t is:
$$ c_t^i + e_t^i \le w_t^i. $$
Given this description, the equilibrium is straightforward to characterize. Each individual
will choose the spending on education that maximizes his own utility. This immediately
implies the following savings rate:
$$ e_t^i = \delta w_t^i = \delta A h_t^i. \tag{12.10} $$
This rule has one unappealing feature (not crucial for any of the results), which is that
because parents derive utility from educational spending on their children, they will spend
on education even when $e_t^i < 1$, in which case educational spending is in fact wasted (it does
not translate into higher human capital of the offspring).
To obtain stark results let us also assume that
$$ \delta A > 1 > \delta A \bar{h}. \tag{12.11} $$
Now, let us look at the dynamics of human capital for a particular dynasty i. If at time
0, we have $h_0^i < (\delta A)^{-1}$, then (12.10) implies that $e_0^i < 1$, so the offspring will have $h_1^i = \bar{h}$.
Given (12.11), we have $h_1^i = \bar{h} < (\delta A)^{-1}$, and repeating this argument, we have $h_t^i < (\delta A)^{-1}$
for all t. Therefore, a dynasty that starts with $h_0^i < (\delta A)^{-1}$ will never reach a human capital
level greater than $\bar{h}$.
Next consider a dynasty with $h_0^i > (\delta A)^{-1}$. Then from (12.10), we have $h_1^i = (\delta A h_0^i)^{\gamma} > 1$,
so this dynasty will accumulate human capital and reach the steady state given by
$h^* = (\delta A h^*)^{\gamma}$, or
$$ h^* = (\delta A)^{\frac{\gamma}{1-\gamma}} > 1 $$
(as long as $h_0^i < h^*$; otherwise, the dynasty would have started with too much human capital
and would decumulate human capital).
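The threshold dynamics just described can be traced with a few lines of code. The sketch below iterates the savings rule (12.10) together with the nonconvex human capital technology for two dynasties, one starting below and one above the critical level. The function names and the labels `delta` (weight on educational spending), `gamma` (exponent of the human capital technology) and `h_bar` (minimum human capital) are assumed names, and the parameter values are hypothetical, chosen to satisfy (12.11).

```python
def next_human_capital(h, A, delta, gamma, h_bar):
    """One-generation update: spend e = delta*A*h, then apply the technology."""
    e = delta * A * h                        # savings rule (12.10)
    return e ** gamma if e >= 1.0 else h_bar


def simulate_dynasty(h0, A, delta, gamma, h_bar, generations=200):
    """Human capital of a dynasty after the given number of generations."""
    h = h0
    for _ in range(generations):
        h = next_human_capital(h, A, delta, gamma, h_bar)
    return h


# Hypothetical values with delta*A = 2 > 1 > delta*A*h_bar = 0.6.
A, delta, gamma, h_bar = 4.0, 0.5, 0.5, 0.3
threshold = 1.0 / (delta * A)                       # critical level of h
h_star = (delta * A) ** (gamma / (1.0 - gamma))     # high steady state

poor = simulate_dynasty(0.4, A, delta, gamma, h_bar)   # starts below threshold
rich = simulate_dynasty(0.6, A, delta, gamma, h_bar)   # starts above threshold

print(f"threshold = {threshold}, poor -> {poor}, rich -> {rich:.6f}")
```

Two dynasties that start close together, on either side of the threshold, end up at very different levels of human capital, which is the sense in which the model features a poverty trap.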
The most important result is that this simple model features poverty traps due to the
nonconvexities created by the credit market problems.
It is interesting to contrast two economies subject to the credit market problems, but
with dierent distributions of income. For example, imagine an economy with two groups
starting at income levels h1 and h2 > h1 such that (A)1 < h2 . Now if inequality (poverty)
is high so that h1 < (A)1 , a significant fraction of the population will never accumulate
278

much human capital. In contrast, if inequality is limited so that h1 > (A)1 , all agents will
accumulate human capital, eventually reaching h .
An important implication of this model is that the distribution of income and how credit
markets work are important for human capital accumulation and the process of economic
growth. This model and the next one (with imperfect capital markets) are sometimes interpreted as implying that an unequal distribution of income will lead to lower output (and
growth), and the above example with two classes seems to support this conclusion. However, this is not a general result. For example, take the same economy with two classes,
now starting with h1 < h2 < (A)1 . In this case, neither group will accumulate human
capital, and redistributing resources away from group 1 to group 2 (thus increasing inequality), so that we push group 2 to h2 > (A)1 would increase human capital accumulation.
This is a general feature: in models with nonconvexities, there are no general results about
whether greater inequality is good or bad for accumulation and economic growth; it depends
on whether greater inequality pushes more people to below or above the critical thresholds.
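The threshold dynamics of this section can be illustrated numerically. The sketch below assumes the law of motion $h_{t+1} = (\delta A h_t)^{\gamma}$ above the threshold $(\delta A)^{-1}$ and $h_{t+1} = \bar{h}$ below it; the function names and all parameter values are hypothetical and chosen only for illustration.

```python
def next_human_capital(h, delta=0.5, A=4.0, gamma=0.5, h_bar=0.2):
    """One-generation update of a dynasty's human capital.

    Assumed law of motion: the offspring gets (delta*A*h)**gamma if the
    parent's h exceeds the threshold 1/(delta*A), and the floor h_bar otherwise.
    """
    threshold = 1.0 / (delta * A)
    if h > threshold:
        return (delta * A * h) ** gamma
    return h_bar


def simulate_dynasty(h0, periods=200, **params):
    """Iterate the update rule and return the final human capital level."""
    h = h0
    for _ in range(periods):
        h = next_human_capital(h, **params)
    return h


# With these illustrative values delta*A = 2, so the threshold is 0.5
# and the high steady state is (delta*A)**(gamma/(1-gamma)) = 2.
poor = simulate_dynasty(0.4)   # starts below the threshold
rich = simulate_dynasty(0.6)   # starts above the threshold

# Two groups both below the threshold, then a redistribution from group 1
# to group 2 that raises inequality but lifts group 2 over the threshold.
before = [simulate_dynasty(h) for h in (0.3, 0.4)]
after = [simulate_dynasty(h) for h in (0.1, 0.6)]
print(poor, rich, before, after)
```

In this example the more unequal allocation is the only one in which any dynasty accumulates human capital, which is the sense in which inequality has no general sign in models with nonconvexities.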
12.2.2 The Galor and Zeira Model
Now let us allow borrowing in the model above. Each individual still lives for two periods.
In his youth, he can either work or acquire education.
The utility function of each individual is
$$(1-\delta) \log c_{it} + \delta \log b_{it},$$
where again $c$ denotes consumption at the end of the life of the individual. The budget
constraint is
$$c_{it} + b_{it} \le m,$$
where $m$ is the individual's income. Note that utility of the parent now depends on monetary
bequest to the offspring rather than the level of education expenditure. It will now be the
individuals themselves who will use the monetary bequests to invest in education. Also, the
logarithmic formulation will once again ensure a constant savings rate equal to $\delta$.
Education is a binary outcome, and educated (skilled) workers earn wage $w_s$ while uneducated
workers earn $w_u$. The required education expenditure to become skilled is $h$, and
workers acquiring education do not earn the unskilled wage, $w_u$, during the first period of
their lives.
Imperfect capital markets are modeled by assuming that there is some amount of monitoring required for loans to be paid back. This creates a wedge between the borrowing and
the lending rates. In particular, assume that there is a linear savings technology open to
all agents, which fixes the lending rate at some constant r. However, the borrowing rate is
i > r, because of costs of monitoring necessary to induce agents to pay back the loans.
Also assume that
$$w_s - (1+r)\,h > w_u\,(2+r), \tag{12.12}$$
which implies that investment in human capital is profitable when financed at the lending
rate $r$.
Let us now consider an individual with wealth $x$. If $x \ge h$, assumption (12.12) implies
that the individual will invest in education. If $x < h$, then whether it is profitable to invest in
education or not will depend on the wealth of the individual and the borrowing interest rate, $i$.
Let us now write the utility of this agent (with $x < h$) in the two scenarios, and also the
bequest that he will leave to his offspring. These utility levels and bequests are given by
$$U_s(x) = \log\left(w_s + (1+i)(x-h)\right) + D,$$
$$b_s(x) = \delta\left(w_s + (1+i)(x-h)\right),$$
when he invests in education, and
$$U_u(x) = \log\left((1+r)(w_u + x) + w_u\right) + D,$$
$$b_u(x) = \delta\left((1+r)(w_u + x) + w_u\right),$$
when he chooses not to invest. $D$ is a constant term.
Comparing these expressions, we obtain that an individual prefers to invest in education if
and only if
$$x \ge f \equiv \frac{(2+r)\,w_u + (1+i)\,h - w_s}{i-r}.$$
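The threshold follows from comparing incomes in the two scenarios, since the constant $D$ and the logarithm do not affect the ranking. A short derivation, using only the expressions for $U_s$ and $U_u$ and the assumption $i > r$ (so that dividing by $i - r$ preserves the inequality):

```latex
\begin{align*}
U_s(x) \ge U_u(x)
&\iff w_s + (1+i)(x-h) \ge (1+r)(w_u + x) + w_u \\
&\iff (i-r)\,x \ge (2+r)\,w_u + (1+i)\,h - w_s \\
&\iff x \ge f \equiv \frac{(2+r)\,w_u + (1+i)\,h - w_s}{i-r}.
\end{align*}
```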
The dynamics of the system can then be obtained simply by using the bequests of unconstrained, constrained-investing and constrained-non-investing agents.
More specifically, the equilibrium correspondence describing equilibrium dynamics is
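$$x_{t+1} = \begin{cases}
b_n(x_t) = \delta\left(w_s + (1+r)(x_t - h)\right) & \text{if } x_t \ge h \\
b_s(x_t) = \delta\left(w_s + (1+i)(x_t - h)\right) & \text{if } h > x_t \ge f \\
b_u(x_t) = \delta\left((1+r)(w_u + x_t) + w_u\right) & \text{if } x_t < f
\end{cases} \tag{12.13}$$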
Equilibrium dynamics can now be analyzed diagrammatically by looking at the graph of
(12.13).
Note an important feature here. The correspondence (12.13) describes the behavior of
the wealth of each individual. However, the whole wealth distribution can also be studied
from (12.13). This is because dynamics in this economy are Markovian, described simply
by the Markov process without any general equilibrium interactions.
Now define $g$ as the intersection of the equilibrium correspondence (12.13) with the 45
degree line, when the equilibrium correspondence is steeper than the 45 degree line. Such
an intersection will exist when the borrowing interest rate, $i$, is large enough.
All individuals with $x_t < g$ converge to the wealth level $x_U$, while all those with $x_t > g$
converge to the greater wealth level $x_S$. As in the example without credit markets, there is a
poverty trap which attracts agents with low initial wealth. The distribution of income again
has a potentially first-order effect on the income level of the economy. If the majority of the
individuals start with $x_t < g$, the economy will have low productivity, low human capital
and low wealth.
It is also clear that financial development should matter for human capital investments.
In an economy with better financial institutions, we may expect the wedge between the
borrowing rate and the lending rate to be smaller, i.e., i to be smaller given r. With a
smaller i, more agents will escape the poverty trap, and in fact the poverty trap may not
exist (there may not be an intersection between (12.13) and the 45 degree line where (12.13)
is steeper).
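The wealth dynamics in (12.13) can be explored by iterating the three-branch map directly. The following is a minimal sketch, assuming that bequests are a fraction $\delta$ of end-of-life income in each branch; the function names and the parameter values (chosen so that (12.12) holds and $x_U < f < g < h < x_S$) are hypothetical.

```python
def threshold_f(w_s, w_u, h, r, i):
    """Wealth level f above which a borrower prefers to invest in education."""
    return ((2 + r) * w_u + (1 + i) * h - w_s) / (i - r)


def next_wealth(x, delta, w_s, w_u, h, r, i):
    """Bequest received by the offspring, following the three branches of (12.13)."""
    f = threshold_f(w_s, w_u, h, r, i)
    if x >= h:   # unconstrained: invests and lends the remainder at rate r
        return delta * (w_s + (1 + r) * (x - h))
    if x >= f:   # constrained: invests and borrows h - x at rate i
        return delta * (w_s + (1 + i) * (x - h))
    return delta * ((1 + r) * (w_u + x) + w_u)   # does not invest


def long_run_wealth(x0, periods=500, **params):
    """Iterate the map from initial wealth x0 and return the final wealth level."""
    x = x0
    for _ in range(periods):
        x = next_wealth(x, **params)
    return x


# Illustrative economy with a large borrowing wedge: here g = 1.6.
params = dict(delta=0.5, w_s=4.2, w_u=0.3, h=2.0, r=0.1, i=1.5)
trapped = long_run_wealth(1.5, **params)   # starts below g
escaped = long_run_wealth(1.7, **params)   # starts above g

# Smaller wedge (better financial institutions): the steep crossing disappears.
no_trap = long_run_wealth(0.1, **dict(params, i=0.9))
print(trapped, escaped, no_trap)
```

With the larger wedge the two dynasties converge to different wealth levels; with the smaller wedge even a dynasty starting with very low wealth reaches the high level in this example.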
12.3 Learning-by-Doing, Structural Change and Non-Balanced Growth
As mentioned above, an important element of the process of economic development, especially starting from the early stages of development, is that of structural change. Pretty
much all societies have started as agricultural economies, and have grown together with a
transformation of the economy, with the share of output of manufacturing (and services) increasing. The most standard reason for this is thought to be Engel's law, which is the name
given to the feature that the budget share of food declines as individuals become richer.
Here I will outline a model by Matsuyama (1992), which incorporates both this feature
and the possibility of learning-by-doing as an important factor in economic growth.
12.3.1
Consider the following continuous time economy, consisting of two sectors: manufacturing
and agriculture. Both sectors produce using only labor. Population is constant and equal
to L = 1, and labor is supplied inelastically.
Technologies in the two sectors are given by the following diminishing returns production
functions
$$X^M(t) = M(t)\,F(n(t)), \quad F(0) = 0,\ F' > 0,\ F'' < 0, \tag{12.14}$$
$$X^A(t) = A\,G(1 - n(t)), \quad G(0) = 0,\ G' > 0,\ G'' < 0, \tag{12.15}$$
where $n(t)$ is the fraction of labor employed in manufacturing as of time $t$. This way of
writing the two production functions already imposes market clearing in the labor market.
Notice that agricultural productivity, $A$, is not indexed by time, hence it is constant.
Manufacturing productivity, $M(t)$, is time-varying. In particular, as in the Romer
(1997) model discussed above, $M(t)$ reflects knowledge accumulation taking place as a non-internalized byproduct of production. Moreover, Matsuyama assumes that this knowledge
accumulation benefits only from production in the manufacturing sector, for example, because greater production in manufacturing allows learning-by-doing in this sector, increasing
future productivity. More specifically, we have:
$$\dot{M}(t) = \delta X^M(t), \tag{12.16}$$
where $\delta > 0$ measures the extent of these learning-by-doing effects.
As in the Romer model, learning-by-doing effects are external to individual firms. Consequently, each firm will choose its labor demand in order to equate the value of the marginal
product to the wage rate, $w(t)$. Assuming an interior solution, this implies
$$w(t) = A\,G'(1 - n(t)) \quad \text{and} \quad w(t) = p(t)\,M(t)\,F'(n(t)),$$
where $p(t)$ is the relative price of the manufactured good (with the price of the agricultural
good normalized to 1 as the numeraire). Therefore, market clearing implies:
$$A\,G'(1 - n(t)) = p(t)\,M(t)\,F'(n(t)). \tag{12.17}$$
The economy admits a representative consumer with preferences given by
$$W = \int_0^\infty \left[\beta \log\left(c^A(t) - \gamma\right) + \log\left(c^M(t)\right)\right] \exp(-\rho t)\,dt, \tag{12.18}$$
with $\beta$, $\gamma$ and $\rho > 0$, and $c^A(t)$ denoting the consumption of the agricultural good and $c^M(t)$
denoting the consumption of the manufacturing good at time $t$. The parameter $\rho$ is the
discount rate, and $\beta$ designates the importance of agricultural goods versus manufacturing
goods in the utility function. The parameter $\gamma$ is the new one relative to models we have
seen so far and represents the subsistence level of food consumption. In particular, imagine
that if $c^A(t)$ does not exceed $\gamma$, the individual will obtain negative infinite utility (recall that
the logarithm of a negative number is undefined).
The presence of $\gamma > 0$ makes preferences non-homothetic and implies that the income
elasticity of demand for agricultural goods will be less than unity (while that for manufacturing goods will be greater than unity). This is the simplest way of introducing Engel's
law.
Let us also assume that
$$A\,G(1) > \gamma L > 0. \tag{12.19}$$
The first inequality states that the economy's agricultural sector is productive enough to
provide the subsistence level of food to all consumers; otherwise individuals would receive
negative infinite utility.
The budget constraint of consumers in each period is
$$c^A(t) + p(t)\,c^M(t) \le w(t) + \pi(t),$$
where $\pi(t)$ is the profits per representative household.
12.3.2 Equilibrium
An equilibrium is defined in the standard way as a sequence of consumption levels in the
two sectors and allocation of labor between the two sectors at all dates, such that consumers
maximize their utility and firms maximize profits given prices, and goods and factor prices
are such that all markets clear.
Maximization of (12.18) implies that for each household, and thus for the entire economy,
we have
$$c^A(t) = \gamma + \beta\,p(t)\,c^M(t). \tag{12.20}$$
Since production has to be equal to consumption, we further have:
$$c^A(t) = X^A(t) = A\,G(1 - n(t))$$
and
$$c^M(t) = X^M(t) = M(t)\,F(n(t)).$$
Now combining these equations with (12.17) and (12.20) yields
$$\phi(n(t)) = \gamma / A, \tag{12.21}$$
where
$$\phi(n) \equiv G(1-n) - \beta\,G'(1-n)\,F(n)/F'(n).$$
Moreover, we have
$$\phi(0) = G(1), \quad \phi(1) < 0 \quad \text{and} \quad \phi'(\cdot) < 0.$$
The function $\phi(n)$ can be interpreted as the excess demand for manufacturing over
agriculture. An equilibrium has to satisfy (12.21). From Assumption (12.19) it is clear that
the equilibrium condition (12.21) has a unique interior solution in which
$$n(t) \in (0, 1).$$
Since $\phi$ is decreasing in $n$ and the right-hand side of (12.21) is decreasing in $A$, this solution
can be written as a function of agricultural productivity, $A$:
$$n(t) = v(A), \quad \text{with } v'(A) > 0. \tag{12.22}$$
This implies that the employment share of manufacturing is constant over time and
positively related to A. This is not in line with the patterns we observe in the data, where
the manufacturing share of employment also increases early on (and then declines while
the share of services increases). However, given the learning-by-doing aspect, it generates
another feature which is consistent with the empirical patterns in the data; the share of
manufacturing output (and consumption of manufacturing goods) increases relative to those
of agriculture.
In particular, given the learning-by-doing in equation (12.16), output in manufacturing
grows at a constant rate, $\delta F(v(A))$, also positively related to $A$. This is an interesting
observation, and shows that the growth rate of output in manufacturing is positively related
to productivity in agriculture.
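The comparative static $v'(A) > 0$ can be checked numerically for particular functional forms. The sketch below assumes $F(n) = n^{a}$ and $G(m) = m^{b}$, so that $\phi(n) = (1-n)^{b} - \beta b (1-n)^{b-1} n / a$, and solves $\phi(n) = \gamma/A$ by bisection; these functional forms, the parameter values and the function names are illustrative assumptions, not part of Matsuyama's model.

```python
BETA, GAMMA, DELTA = 0.5, 0.5, 0.02   # hypothetical preference and learning parameters
A_EXP, B_EXP = 0.5, 0.5               # exponents in F(n) = n**A_EXP and G(m) = m**B_EXP


def phi(n):
    """phi(n) = G(1-n) - beta*G'(1-n)*F(n)/F'(n) for the assumed power functions."""
    m = 1.0 - n
    # For F(n) = n**a the ratio F(n)/F'(n) equals n/a.
    return m ** B_EXP - BETA * B_EXP * m ** (B_EXP - 1.0) * n / A_EXP


def manufacturing_share(A, tol=1e-12):
    """Solve phi(n) = gamma/A on (0, 1) by bisection; phi is decreasing in n."""
    target = GAMMA / A
    if target >= phi(0.0):
        raise ValueError("agriculture cannot cover subsistence: need A*G(1) > gamma")
    lo, hi = 0.0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def manufacturing_growth(A):
    """Growth rate of manufacturing output, delta*F(v(A)), under the same assumptions."""
    return DELTA * manufacturing_share(A) ** A_EXP


for A in (1.0, 2.0, 4.0):
    print(A, manufacturing_share(A), manufacturing_growth(A))
```

Under these assumptions both the manufacturing employment share and the manufacturing growth rate rise with agricultural productivity, in line with the discussion above.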
This observation is consistent with some historical accounts of the development process,
which emphasize how economies with high agricultural productivity were those that were
able to make the transition to manufacturing. In those accounts and in this model, the
reasoning is simple: manufacturing requires a sufficiently large size of employment to grow
rapidly (either for creating aggregate demand externalities or for learning-by-doing), and this
can only be achieved if agriculture is productive enough that sufficient food can be produced
by a relatively small fraction of the workforce.
Since productivity and employment in agriculture are constant, aggregate food consumption and production stay constant at
$$c^A = X^A = A\,G(1 - v(A)) = \gamma + \beta A\,G'(1 - v(A))\,\frac{F(v(A))}{F'(v(A))},$$
which is also increasing in $A$; this implies that higher agricultural productivity also increases
agricultural consumption. Therefore, this discussion leads to the following simple result:
Proposition 34 In the above described model, the combination of learning-by-doing and
Engel's law generates a unique equilibrium in which the shares of employment of manufacturing
and agriculture are constant, and manufacturing output and consumption grow faster than
agricultural output and consumption. The growth rate of real consumption of agriculture is
zero, while the growth rate of manufacturing output is $\delta F(v(A))$.
Although this proposition shows that the real consumption of agricultural goods is constant (and that of manufacturing goods is increasing), the expenditure on agricultural goods
will not remain constant, because relative prices will change in favor of agricultural goods.
This is a general phenomenon, in fact independent of Engel's law; sectors that experience
slower growth will also experience increases in their relative prices, because their output is
becoming scarcer in the economy.
Chapter 13
Interdependence and Growth in the Open Economy
The analysis so far treated each country as a closed island, not interacting with the rest
of the countries in the world. This is clearly not the correct way to view the world. In
this chapter, we have a first look at some models of interdependences. First, I begin with
a model of technology transfer from an exogenously advancing world technology frontier.
Then, I discuss a model of technology transfer and trade. Finally, I look at how international
trade influences the process of economic growth, creating interdependences across growing
countries.
13.1 Human Capital and Technology (Nelson-Phelps)
The Nelson-Phelps model is the simplest model of technology diffusion across countries, and
has proved a useful reduced-form model for many applications. In addition to its growth
applications, the Nelson-Phelps model also suggests a new role of human capital, different
from those emphasized by the Mincer equations we used in order to understand the role of
human capital in contributing to cross-country income differences.
Imagine that there is a world technology frontier, T (t), advancing at an exogenous rate
g, i.e.,
T (t) = T (0) exp (gt) .
Countries can benefit from this world technology by incorporating it into their production
processes. But this is a human capital-intensive task. For example, a country needs highly
skilled engineers to adapt world technologies to their conditions, to fill key positions in the
implementation of these technologies and to train workers in the use of these new techniques.
Nelson and Phelps postulate
$$\frac{\dot{A}_j(t)}{A_j(t)} = \frac{\phi(h_j)\left(T(t) - A_j(t)\right)}{A_j(t)}, \tag{13.1}$$
where $h_j$ is the human capital in country $j$, which is assumed to be time invariant.
This equation states that the farther a country is from the world technology frontier, the
faster is its rate of progress, since there is more technology out there to be absorbed.
But also $\phi'(h_j) > 0$, so that the greater the human capital of a country is, the faster
this convergence will be.
The first implication of (13.1) is that
$$\frac{\partial^2 \left(\dot{A}_j(t)/A_j(t)\right)}{\partial T(t)\,\partial \phi(h_j)} > 0,$$
so that human capital becomes more valuable when frontier technology is more advanced.
Second, note that although equation (13.1) is in terms of technological progress, it does
have a unique stable stationary distribution as long as $\phi(h_j) > 0$ for all countries. In the
stationary state, all $A_j(t)$'s will grow at the same rate $g$, and this stationary cross-country
distribution is given by
$$A_j(t) = \frac{\phi(h_j)}{g + \phi(h_j)}\,T(t). \tag{13.2}$$
Suppose now that output in each country is proportional to Aj (t). Equation (13.2) then
implies that countries with low human capital will be poor, because they will absorb less of
the frontier technology.
This effect is in addition to the direct productive contribution of human capital to output, and suggests that human capital differences across countries can be more important
in causing income differences than calculations based on private returns to schooling might
suggest.
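The convergence of $A_j(t)/T(t)$ to the stationary ratio in (13.2) can be checked by integrating the law of motion in (13.1) forward. The following is a rough sketch using a simple Euler scheme, treating the absorption coefficient $\phi(h_j)$ as a given number `phi`; the function name, step size and parameter values are hypothetical.

```python
import math


def simulate_gap(phi, g=0.02, T0=1.0, A0=0.1, years=400.0, dt=0.01):
    """Euler-integrate dA/dt = phi*(T - A) with T(t) = T0*exp(g*t).

    Returns the ratio A/T at the end of the simulation horizon.
    """
    A = A0
    steps = int(round(years / dt))
    for k in range(steps):
        T = T0 * math.exp(g * k * dt)   # frontier at the start of the step
        A += phi * (T - A) * dt
    T_end = T0 * math.exp(g * years)
    return A / T_end


low_h = simulate_gap(phi=0.05)    # country with less human capital
high_h = simulate_gap(phi=0.10)   # country with more human capital
print(low_h, 0.05 / (0.02 + 0.05))
print(high_h, 0.10 / (0.02 + 0.10))
```

Both simulated ratios settle close to $\phi/(g + \phi)$, and the country with the higher absorption coefficient ends up closer to the frontier.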
13.2 Trade and Technology Diffusion
A more subtle and in many ways more useful model of technology transfer is that of Krugman
(1979), which is also useful for our purposes because it combines interdependences due to
technology transfer with those arising from international trade.
13.2.1 The Basic Krugman Model
Consider two sets of economies, North and South. All individuals in all countries have the
same Dixit-Stiglitz preferences with love for variety given by
$$C = \left(\int_0^M c(i)^{\frac{\varepsilon-1}{\varepsilon}}\,di\right)^{\frac{\varepsilon}{\varepsilon-1}},$$
where $c(i)$ is the consumption of the $i$th good, $M$ is the total number of goods that will be
determined endogenously, and $\varepsilon > 1$ is the elasticity of substitution between these goods.
There is free international trade between countries.
Goods fall into two categories: new goods are just invented in the North and can only be
produced there; old goods have been invented in the past and their production technology
has been transferred to the South, so they can be produced both in the South and in the
North.
One worker produces one unit of any good to which the country in which he is located
has access.
Workers in the North have access to all goods, but workers in the South only have access
to old goods. It is important to emphasize that when producing old goods, Northern
workers have no productive advantage. Their only advantage (and the only difference in
technology) arises because they have access to a larger set of goods.
There can be two types of equilibria. In the first equilibrium, there are sufficiently few
new goods that both workers in the South and the North will produce some of the old goods,
and in this case, both new goods and old goods will command the same price, and incomes
in the North and South will be the same (why?).
Another possibility is that the South specializes in the production of old goods, while
the Northern producers specialize in the production of new goods. In this case prices and
wages will satisfy
$$p_S = w_S,$$
$$p_N = w_N > w_S,$$
where $p_S$ is the price of the old goods produced in the South, and $p_N$ denotes the price of new
goods produced in the North. There will now be income differences arising from technology
differences across countries.
When will we be in this full specialization regime? To answer this question, note that
from the first-order condition of consumers the relative consumption of new and old goods
has to satisfy
$$\frac{c_N}{c_S} = \left(\frac{p_N}{p_S}\right)^{-\varepsilon} = \left(\frac{w_N}{w_S}\right)^{-\varepsilon}. \tag{13.3}$$
Full specialization implies that
$$c_N = \frac{L_N}{M_N} \quad \text{and} \quad c_S = \frac{L_S}{M_S},$$
where $L_N$ is the total labor force in the North, and $L_S$ is the total labor force in the South. $M_N$
is the total number of new goods (produced in the North) and $M_S$ is the total number of
old goods. Combining this with (13.3) we obtain
$$\frac{w_N}{w_S} = \left(\frac{L_N M_S}{L_S M_N}\right)^{-1/\varepsilon}.$$
For this type of equilibrium to exist, we need $w_N/w_S > 1$, and this situation is also drawn
diagrammatically in the next figure as the intersection of the relative demand curve for
Northern labor with the relative supply curve at $L_N/L_S$. Note that $w_N/w_S > 1$ corresponds
to an intersection where the relative demand curve is downward sloping. Instead the flat
portion of the relative demand curve corresponds to the case where there is no full specialization, and hence some of the old goods are produced in the North (and $w_N/w_S = 1$).
So if there is a sufficiently large technology gap between the North and South, Northern
wages and incomes will be higher.
What determines the number of new and old goods? Krugman developed a model to
analyze this, formalizing an idea due to Vernon on the product cycle across countries.
In particular, suppose that new goods are created according to the following Poisson
process
$$\dot{M} = iM,$$
and these goods are imitated by the South slowly, according to the Poisson process
$$\dot{M}_S = t M_N,$$
and recall that
$$M = M_S + M_N.$$
In steady state, we need the number of new and old goods to grow at the same rate, i.e.,
$$\dot{M}_S / M_S = \dot{M} / M.$$
Then
$$\frac{M_S}{M_N} = \frac{t}{i}.$$
Relative wages can be obtained as:
$$\frac{w_N}{w_S} = \left(\frac{L_N\,t}{L_S\,i}\right)^{-1/\varepsilon}.$$
In this economy, relative utility and relative incomes per capita are simply proportional to
relative wages.
It is straightforward to check that as $i$, the rate of creation of new technologies, increases,
wages (and incomes) in the North relative to the South will increase. As the rate of imitation,
$t$, increases, the North becomes relatively poorer.
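The steady-state ratio of old to new goods follows in one line from the two laws of motion. The derivation below spells out that step and restates the condition for full specialization implied by $w_N/w_S > 1$, writing $\varepsilon$ for the elasticity of substitution between goods:

```latex
\begin{align*}
\frac{\dot{M}_S}{M_S} = \frac{t\,M_N}{M_S}, \qquad \frac{\dot{M}}{M} = i
\quad &\Longrightarrow \quad \frac{M_S}{M_N} = \frac{t}{i}, \\
\frac{w_N}{w_S} = \left(\frac{L_N}{L_S}\,\frac{M_S}{M_N}\right)^{-1/\varepsilon}
= \left(\frac{L_N\,t}{L_S\,i}\right)^{-1/\varepsilon} > 1
\quad &\iff \quad L_N\,t < L_S\,i .
\end{align*}
```

So full specialization requires innovation to be fast relative to imitation, or Northern labor to be scarce relative to Southern labor.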
13.2.2 Understanding the Effects of Trade
Next, consider a variation on this model without international trade. In this case, the
number of goods produced and consumed in each country will differ. Standard arguments
give incomes in the North and the South as
$$w_N^0 = M^{\frac{1}{\varepsilon-1}} \quad \text{and} \quad w_S^0 = M_S^{\frac{1}{\varepsilon-1}}.$$
So relative wages and incomes in steady state will now be
$$\frac{w_N^0}{w_S^0} = \left(1 + \frac{i}{t}\right)^{\frac{1}{\varepsilon-1}}.$$
The relative income differences are typically larger now. For example, to illustrate this
point, consider the case in which $\frac{L_N\,t}{L_S\,i} > 1$. Then, there will be no specialization, and the
South and the North will have the same level of income when there is international trade
between the North and the South. In contrast, without international trade, the North will
be richer.
Intuitively, trade between the North and the South enables the Southern consumers to
consume goods that they did not have access to, effectively increasing their real incomes.
In the absence of trade, technology differences will typically matter more! (But not always!
Why?)
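The comparison between the two regimes is easy to compute. The sketch below assumes the steady-state formulas $w_N/w_S = \max\{1, (L_N t / (L_S i))^{-1/\varepsilon}\}$ under trade and $w_N^0/w_S^0 = (1 + i/t)^{1/(\varepsilon-1)}$ under autarky, with $\varepsilon$ denoting the elasticity of substitution; the function names and numbers are hypothetical.

```python
def relative_wage_trade(L_N, L_S, i, t, eps):
    """North/South relative wage under free trade in steady state (M_S/M_N = t/i)."""
    specialized = (L_N * t / (L_S * i)) ** (-1.0 / eps)
    # If the specialization wage ratio falls below 1, the North also produces
    # old goods and wages are equalized.
    return max(1.0, specialized)


def relative_wage_autarky(i, t, eps):
    """North/South relative wage without trade: (M/M_S)**(1/(eps-1)) with M/M_S = 1 + i/t."""
    return (1.0 + i / t) ** (1.0 / (eps - 1.0))


# Case from the text: L_N*t/(L_S*i) > 1, so trade equalizes incomes.
equalized = relative_wage_trade(1.0, 1.0, i=0.02, t=0.04, eps=3.0)
autarky_gap = relative_wage_autarky(i=0.02, t=0.04, eps=3.0)

# A case where the gap under trade exceeds the autarky gap: scarce Northern labor.
trade_scarce = relative_wage_trade(0.1, 1.0, i=0.08, t=0.02, eps=10.0)
autarky_scarce = relative_wage_autarky(i=0.08, t=0.02, eps=10.0)
print(equalized, autarky_gap, trade_scarce, autarky_scarce)
```

With the second set of illustrative numbers, the relative wage under trade is larger than under autarky, which is one way to see why the conclusion that trade narrows income gaps does not always hold.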
13.3 Trade, Specialization and the World Income Distribution
Perhaps the most important source of interaction between countries is international
trade. A number of papers investigate how international trade affects the process of economic
growth and creates interdependences across countries. One example is Acemoglu and Ventura
(2002), who develop a tractable framework for analyzing cross-country income differences
that incorporates international trade. Here I outline a version of that model. An additional
lesson from this model is that the stability of the world income distribution and findings
of conditional convergence do not necessarily rule out endogenous growth (recall that these
patterns were used as evidence against endogenous growth models).
13.3.1 The Model
Consider a world economy consisting of a continuum of small countries with mass 1. There
is a continuum of intermediate products indexed by $z \in [0, M]$, and two final products that
are used for consumption and investment. There is free trade in intermediate goods and no
trade in final products or assets.
Countries differ in their technology, savings and economic policies. For example, country
$j$ will be defined by its characteristics $(\mu_j, \rho_j, \phi_j)$, where $\mu$ is an indicator of how advanced
the technology of the country is, $\rho$ is its rate of time preference, and $\phi$ is a measure of the
effect of policies and institutions on the incentives to invest. The joint distribution of these
characteristics is denoted by $G(\mu, \rho, \phi)$ and is assumed to be time invariant.
All countries admit a representative consumer with utility function:
$$\int_0^\infty \ln c(t) \exp(-\rho t)\,dt, \tag{13.4}$$
where $c(t)$ is consumption at date $t$ in the $(\mu, \rho, \phi)$-country (this is the same as the CRRA
preferences we have used so far with $\theta \to 1$).
The budget constraint of the representative consumer is
$$p_I \dot{k} + p_C c = y \equiv rk + w, \tag{13.5}$$
where $p_I$ and $p_C$ are the prices of the investment and consumption goods, $k$ is capital stock,
$r$ is the rental rate, and $w$ is the wage rate, and also total wage income, since population
in each country is normalized to 1. There is no depreciation of capital. Since there is no
international trade in assets, income, $y$, must be equal to consumption, $p_C c$, plus investment,
$p_I \dot{k}$.
Specialization is introduced as follows: $\mu$ is assumed to be the number of intermediates
produced by the $(\mu, \rho, \phi)$-country, with
$$\int \mu(j)\,dG(j) = M,$$
where I have explicitly introduced the $j$ to emphasize that these refer to country $j$, but I
will drop this notation below and talk of a representative country.
A higher level of $\mu$ corresponds to the ability to produce a larger variety of intermediates,
so we interpret $\mu$ as an indicator of how advanced the technology of the country is. In all
countries, intermediates are produced by competitive firms using a technology that requires
one unit of capital to produce one unit of any intermediate that belongs to that country.
Each country also contains many competitive firms in the consumption and investment
goods sectors with unit cost functions:
$$B_C(w, r, p(z)) = w^{(1-\gamma)(1-\tau)}\, r^{\gamma(1-\tau)} \left(\int_0^M p(z)^{1-\varepsilon}\,dz\right)^{\frac{\tau}{1-\varepsilon}}, \tag{13.6}$$
$$B_I(r, p(z)) = \phi^{-1}\, r^{1-\tau} \left(\int_0^M p(z)^{1-\varepsilon}\,dz\right)^{\frac{\tau}{1-\varepsilon}}, \tag{13.7}$$
where $p(z)$ is the price of the intermediate with index $z$.
There are a number of noteworthy features introduced with these unit cost functions:
1. Labor is only used in the production of consumption goods. This is a convenient way of
introducing endogenous growth following Rebelo (1991); the accumulation equation
is linear.
2. I have written the unit cost functions for convenience. The underlying production
functions are quite similar. For example, the investment good would be produced as
follows
$$I = B\,K_I^{1-\tau} \left(\int_0^M x_I(z)^{\frac{\varepsilon-1}{\varepsilon}}\,dz\right)^{\frac{\tau\varepsilon}{\varepsilon-1}},$$
where $B$ is a normalizing constant, $K_I$ is capital used in the production of the investment good, and $x_I(z)$ is the quantity of the $z$th intermediate good used in the
production of the investment good.
3. The parameter $\tau$ is the share of intermediates in production and it will also turn out
to be the ratio of exports to income (i.e., a measure of openness).
4. The parameter $\varepsilon$ is the elasticity of substitution among the intermediates and also
the price-elasticity of foreign demand for the country's products. The inverse of this
elasticity is often interpreted as a measure of the degree of specialization. Assume that
$\varepsilon > 1$, ruling out immiserizing growth, that is, the country becoming poorer despite
accumulating more.
5. The parameter $\phi$ corresponds to an inverse measure of the distortions affecting investment (this corresponds to the tax distortions modeled in Jones (1995) and Chari,
Kehoe and McGrattan (1997)).
13.3.2 Equilibrium
Consumer maximization of (13.4) subject to (13.5) yields the following first-order condition
$$\frac{r(t) + \dot{p}_I(t)}{p_I(t)} - \frac{\dot{p}_C(t)}{p_C(t)} = \rho + \frac{\dot{c}(t)}{c(t)}, \tag{13.8}$$
and the transversality condition:
$$\lim_{t \to \infty} \frac{p_I(t)\,k(t)}{p_C(t)\,c(t)} \exp(-\rho t) = 0. \tag{13.9}$$
Equation (13.8) is the standard Euler equation and requires the rate of return to capital,
$\frac{r + \dot{p}_I}{p_I} - \frac{\dot{p}_C}{p_C}$, to equal the rate of time preference plus the slope of the consumption path.
The only difference from the familiar version of the Euler equation is that, as in the two-sector extended AK economy discussed above, now the rate of return to savings includes the
relative change in the price of investment goods compared to consumption goods, since by
investing in one unit of investment good today, an individual will receive income tomorrow
which will be spent on consumption goods, whose price may have changed. Thus the term
$\frac{\dot{p}_I}{p_I} - \frac{\dot{p}_C}{p_C}$ is the adjustment for this change in relative prices.
Equation (13.9) is the transversality condition. Integrating the budget constraint and
using the Euler and transversality conditions, the optimal rule is found to be to consume a
fixed fraction of wealth:
$$p_C(t)\,c(t) = \rho \left[ p_I(t)\,k(t) + \int_t^\infty w(v) \exp\left(-\int_t^v \frac{r(s) + \dot{p}_I(s)}{p_I(s)}\,ds\right) dv \right]. \tag{13.10}$$
Next consider firm maximization. The price of any variety of intermediate produced in
the $(\mu, \rho, \phi)$-country is equal to:
$$p(t) = r(t). \tag{13.11}$$
Choose the ideal price index for intermediates as the numeraire, i.e.,
$$\int_0^M p(z)^{1-\varepsilon}\,dz = \int \mu\,p^{1-\varepsilon}\,dG = 1. \tag{13.12}$$
Since all countries export practically all of their production of intermediates and import the
ideal basket of intermediates, this choice of numeraire implies that $p$ is also the terms of
trade of the country, i.e. the price of exports relative to imports.
The conditions for price to equal marginal cost for the consumption and investment
sectors imply:
$$p_C = w^{(1-\gamma)(1-\tau)}\, r^{\gamma(1-\tau)}, \tag{13.13}$$
$$p_I = \phi^{-1}\, r^{1-\tau}. \tag{13.14}$$
Finally, we need to impose market clearing for capital and labor as well as trade balance.
By Walras' law, one of these is redundant, and we drop market clearing for capital. Trade
balance requires
$$y = \mu\,p^{1-\varepsilon}\,Y, \tag{13.15}$$
where $Y \equiv \int y\,dG$ is world income. Intuitively, each country imports a fraction $\tau$ of its
output, $y$, and exports $\tau \mu p^{1-\varepsilon} Y$. Equation (13.15) implies that when the number of varieties,
$\mu$, is larger, a given level of income $y$ is associated with better terms of trade, $p$, and higher
rental rate of capital, since
$$r = p.$$
Intuitively, a greater $\mu$ implies that for a given level of aggregate capital stock, there will be
less capital allocated to each variety of intermediate, so each will command a higher price in
the world market. Conversely, for a given $\mu$, a greater relative income $y/Y$ translates into
lower terms of trade, $p$, and a lower rental rate, $r$.
Market clearing for labor is also straightforward. Labor demand comes only from
the consumption goods sector, and given the Cobb-Douglas assumption, this demand is
$(1-\gamma)(1-\tau)$ times consumption expenditure, $p_C c$, divided by the wage rate, $w$. So the
market clearing condition for labor is:
$$1 = (1-\gamma)(1-\tau)\,\frac{p_C c}{w}. \tag{13.16}$$
Finally, because (13.16) implies labor income, $w$, is always proportional to consumption
expenditure, the optimal consumption rule, (13.10), can be simplified to:
$$p_C c = \frac{\rho}{1 - (1-\gamma)(1-\tau)}\,p_I k. \tag{13.17}$$
The state of the world economy is described by a distribution of capital stocks. This
distribution of capital stocks can be obtained from the law of motion of the capital stock of
each country:
$$\frac{\dot{k}}{k} = \phi\,r^{\tau} - \rho \tag{13.18}$$
(this law of motion simply follows from the budget constraint of the representative consumer,
(13.5), combined with the equilibrium condition in (13.17)).
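For completeness, the steps behind (13.18) can be written out as follows, assuming (13.14) takes the form $p_I = \phi^{-1} r^{1-\tau}$ and using the labor market condition (13.16) to substitute for $w$:

```latex
\begin{align*}
p_I \dot{k} &= rk + w - p_C c && \text{by (13.5)} \\
&= rk - \left[1 - (1-\gamma)(1-\tau)\right] p_C c && \text{since } w = (1-\gamma)(1-\tau)\,p_C c \text{ by (13.16)} \\
&= rk - \rho\,p_I k && \text{by (13.17)} \\
\frac{\dot{k}}{k} &= \frac{r}{p_I} - \rho = \phi\,r^{\tau} - \rho && \text{since } p_I = \phi^{-1} r^{1-\tau} \text{ by (13.14)}.
\end{align*}
```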
In addition, the market clearing conditions also imply that for each country:
$$rk + w = \mu\,r^{1-\varepsilon} \int (rk + w)\,dG, \tag{13.19}$$
$$\frac{w}{rk + w} = \frac{(1-\gamma)(1-\tau)\,\rho}{[\tau + (1-\tau)\gamma]\,\phi\,r^{\tau} + (1-\gamma)(1-\tau)\,\rho}. \tag{13.20}$$
For a given cross-section of rental rates, the set of equations in (13.18) determines the
evolution of the distribution of capital stocks. For a given distribution of capital stocks, the
set of equations in (13.19) and (13.20) determines the cross-section of rental rates.
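It can now be shown that the world economy has a unique and stable steady state in
which all countries grow at the same rate.
Define the world growth rate as $x \equiv \dot{Y}/Y$, and the relative income of a $(\mu, \rho, \phi)$-country
as $y_R \equiv y/Y$. Then, setting the same growth rate for all countries, i.e., $\dot{k}/k = \dot{y}/y = x^*$, the
steady-state cross-section of rental rates is:
$$r^* = \left(\frac{\rho + x^*}{\phi}\right)^{1/\tau}. \tag{13.21}$$
Moreover:
$$y_R^* = \mu \left(\frac{\rho + x^*}{\phi}\right)^{(1-\varepsilon)/\tau}, \tag{13.22}$$
$$\int \mu \left(\frac{\rho + x^*}{\phi}\right)^{(1-\varepsilon)/\tau} dG = 1. \tag{13.23}$$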
Equation (13.22) describes the steady-state world income distribution and states that
rich countries are those which are patient (low $\rho$), create incentives to invest (high $\phi$), and
have access to better technologies (high $\mu$). Equation (13.23) implicitly defines the steady-state world growth rate.
This discussion establishes:
Proposition 35 In the above-described world economy, there exists a unique steady state
equilibrium in which all countries grow at the same rate $x^*$ defined by (13.23), but have
unequal levels of income, terms of trade and rates of return on capital. The terms of trade
and the rental return on capital for each economy are given by (13.21) and the relative position
of each country in the world income distribution is given by (13.22).
The implications of this model and this proposition are described next.
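The steady state in Proposition 35 can be computed for a small example. The sketch below replaces the distribution $G$ with a finite list of equally weighted countries, each described by a tuple $(\mu, \rho, \phi)$, and solves the discrete analogue of (13.23) by bisection, assuming the steady-state expressions $r^* = ((\rho + x^*)/\phi)^{1/\tau}$ and $y_R^* = \mu ((\rho + x^*)/\phi)^{(1-\varepsilon)/\tau}$. The function names, the equal weighting and all parameter values are assumptions made for illustration.

```python
def excess(x, countries, tau, eps):
    """Average of mu*((rho+x)/phi)**((1-eps)/tau) across countries, minus 1."""
    exponent = (1.0 - eps) / tau
    total = sum(mu * ((rho + x) / phi) ** exponent for mu, rho, phi in countries)
    return total / len(countries) - 1.0


def solve_world_growth(countries, tau, eps, tol=1e-12):
    """Bisection for the common growth rate; excess() is decreasing in x when eps > 1."""
    lo = -min(rho for _, rho, _ in countries) + 1e-9   # excess is large and positive here
    hi = 1.0
    while excess(hi, countries, tau, eps) > 0:          # expand until the sign flips
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid, countries, tau, eps) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def steady_state(countries, tau, eps):
    """Return the growth rate, rental rates and relative incomes in steady state."""
    x = solve_world_growth(countries, tau, eps)
    rental = [((rho + x) / phi) ** (1.0 / tau) for _, rho, phi in countries]
    rel_income = [mu * ((rho + x) / phi) ** ((1.0 - eps) / tau)
                  for mu, rho, phi in countries]
    return x, rental, rel_income


# Three hypothetical countries: a baseline, a less patient one, and one with more varieties.
countries = [(1.0, 0.02, 0.10), (1.0, 0.04, 0.10), (1.2, 0.02, 0.10)]
x_star, rental, rel_income = steady_state(countries, tau=0.3, eps=2.0)
print(x_star, rental, rel_income)
```

In this example all three countries grow at the same rate, while the less patient country has a lower relative income and the country with more varieties a higher one.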
13.3.3 Implications
The important implications of this analysis are:
1. There is a stable world income distribution, despite the fact that in the absence of international trade, each country would grow at a different rate (e.g., consider the limiting
case where $\tau = 0$).
2. So why is there a stable world income distribution here? The reason is due to changes
in relative prices. In the open economy, when a country accumulates more capital, it
is supplying more of the goods that it produces to the world economy, experiencing a
decline in its terms of trade. This reduces the return to capital and discourages further
accumulation. When $\tau = 0$, this relative price effect is absent, and each country grows
at a different rate determined by its technology, distortions and savings rate.
3. Differences in saving rates or distortions can have much larger effects than those implied
by the standard neoclassical model. The strength of these effects depends on $\tau$ and $\varepsilon$,
and they become arbitrarily large as $\varepsilon \to \infty$ or as $\tau \to 0$. The first of these is the
Heckscher-Ohlin limit, in which there are no decreasing returns coming from relative
price changes. The second is the closed economy case, with standard endogenous
growth, where small differences will translate into infinitely large level differences (since
they imply differences in growth rates).
4. In the meantime, the share of capital in GDP is independent of this, determined largely
by the share of consumption investment goods in income.
13.4
Growth with Factor Price Equalization
The above model incorporated trade between countries together with terms of trade effects.
An alternative would be to incorporate trade assuming that each country is a small open
economy. This is done in Ventura (1997). If each country is within the cone of diversification,
this means there is factor price equalization, and thus each country takes factor prices as
given.
Imagine the world rate of return to capital is equal to $r$; there is no trade in financial
assets (only in goods, which equalizes factor prices), and each country has identical preferences given by our standard CRRA formula. This implies that consumption growth in all
countries will be given by
$$\frac{\dot{c}_j}{c_j} = \frac{1}{\theta}\,(r - \rho).$$
However, now imagine countries differ according to their patience, i.e., discount rate, $\rho_j$, as
we allowed in the previous model. Then the above equation becomes
$$\frac{\dot{c}_j}{c_j} = \frac{1}{\theta}\,(r - \rho_j).$$
In this case, more patient countries will have lower initial consumption but higher consumption growth, and therefore they will accumulate more capital and invest in their own country.
Ultimately, the more patient countries will become much richer. This process will end either
when the world moves out of the cone of diversification, or one country produces almost all
of the output of the world economy.
In fact, this feature that with given prices, the more patient country will ultimately
become much richer than the rest of the world is more general than the open economy model
outlined here. In a closed economy with individuals that have different discount rates, those
with smaller discount rates (greater patience) will ultimately become much richer than the
rest. In general, we tend to assume that all individuals have the same discount rates in order
to ensure a stable income distribution within a country.
Part IV
Endogenous Technological Change

Until now, we have investigated models of economic growth of exogenous or endogenous
variety, but growth was never a result of the actual process of technological change. Either
growth was exogenous, or it was sustained because of linear technology of accumulation,
or growth took place as a byproduct of knowledge spillovers. Much more attractive are
models in which growth is a consequence of technological change, and technological change
is a consequence of purposeful investments by individuals. These models not only allow us
to talk about the endogenous rates of technological progress, but they make contact with
industrial organization models of technology, innovation, anti-trust, R&D policy etc., and
also enable us to discuss issues of directed technical change. These models will be discussed
in the next few chapters.
Before going into details of the specific models, a general principle of this class of models
is useful to highlight. As originally noted by Arrow (1962), and as assumed so far in all of
the models we studied, knowledge is, in essence, a non-excludable and non-rival good. Once
an idea about how to produce a new good or how to improve the productivity of a certain
process is out there, many individuals and firms will have access to it, unless explicitly
prohibited. Moreover, the fact that I am making use of a particular idea does not preclude
other people from doing so, making knowledge not only non-excludable but also non-rival.
This observation creates a problem in constructing models of purposeful innovation. In fact,
as noted by Arrow, why would a competitive firm invest upfront resources to improve the
production technology if other firms will also benefit from this improvement (and it will still
end up making zero profits)? Romer's (1997) model we studied above tried to avoid this
problem by making knowledge accumulation endogenous, but not a purposeful activity. It
was a byproduct, an externality, created by production.
Endogenous technological change models are explicitly about making knowledge accumulation endogenous. They break the paradox pointed out by Arrow by introducing monopolistic competition and patent rights. In particular, we will now be looking at models
of monopolistic competition, where a firm that invents a new machine, a new product or
a new production process will be protected under either a patent law or because nobody
else will be able to replicate this invention without the specific know-how of the inventor.
Such protection will enable the inventor to become a monopolist. The monopoly profits
the inventor expects will, in turn, stimulate research and induce firms to make the upfront
investments to improve productivity and generate growth.
This insight that monopoly rights are important for innovation, which also goes back to
Schumpeter, will be central to the models that follow, but it will also imply that private and
social incentives for innovation will not be typically aligned.
Chapter 14
Expanding Variety Models
The simplest models of endogenous technological change are those in which the variety of
inputs used by firms increases (expands) over time as a result of R&D undertaken by research
firms. The key is that the R&D is purposeful, undertaken for profits, and it leads to an output
that increases the productivity of existing factors.
Two versions of essentially the same model could be used. In the first, research leads
to the invention of new goods, and individuals have love-for-variety, so they derive greater
utility when they have more goods available, so real income increases. In the second, which
is the one I will use here, it is the variety of machines that expands (because of invention
of new varieties), and a greater variety of machines leads to greater division of labor,
increasing the productivity of final good firms.
In all of these models, and also in the models of quality competition we will see below,
we will use the Dixit-Stiglitz constant elasticity structure.
14.1
The Lab-Equipment Model of Growth with Product Varieties
We start with a particular version of the growth model with expanding varieties of inputs
and an R&D technology such that only output is used in order to undertake research. This
is sometimes referred to as the lab equipment model, since all that is required for research
is additional investment in more equipment in labs etc.
14.1.1
Imagine an infinite-horizon economy in continuous time admitting a representative household
with preferences
$$\int_0^\infty \frac{C(t)^{1-\theta} - 1}{1-\theta}\,\exp(-\rho t)\,dt. \tag{14.1}$$
Throughout I suppress time dependence when this causes no confusion. There is no population growth.
The unique consumption good of the economy is produced with the following aggregate
production function:
$$Y = \frac{1}{1-\beta}\left[\int_0^N k(v)^{1-\beta}\,dv\right] L^{\beta}, \tag{14.2}$$
where $L$ is the aggregate labor input, $N$ denotes the number of different varieties of capital
inputs, and $k(v)$ is the total amount of capital (machines) of input type $v$. The term $(1-\beta)$
in the denominator is included for notational simplicity. Notice that for given $N$, which final
good producers take as given, equation (14.2) exhibits constant returns to scale. Therefore,
final good producers are competitive and subject to constant returns to scale, justifying our
use of the aggregate production function to represent their production possibilities set.

We simplify the analysis by assuming that the capital inputs are just like intermediate
goods and they immediately depreciate after being used (thus it may be easier to think of
them as intermediate goods instead of capital, though the machine interpretation may be
nice for certain purposes).
The budget constraint of the economy is
$$C + I + X \le Y, \tag{14.3}$$
where $I$ is investment and $X$ is expenditure on R&D, which is for now assumed to come out
of the total supply of the final good. (Other models of R&D will be discussed below.)
Assume that the creation of new inputs takes place as follows:
$$\dot{N} = \eta X, \tag{14.4}$$
and the economy starts with some initial technology stock $N(0) > 0$.
This implies that greater spending on R&D leads to the invention of new inputs. There
is no uncertainty in this process, at least at the aggregate level. One may want to think that
there is uncertainty at the individual level, but with many different research labs undertaking
such expenditure, at the aggregate level, equation (14.4) holds deterministically.
The important point is that R&D expenditure expands the potential set of capital/machine
varieties.
A firm that invents a new capital variety is the sole supplier of that type of machine,
and sets its price $\chi(v)$ to maximize profits. The demand for capital of type $v$ is obtained
by maximizing (14.2). Namely, simply considering the aggregate production function, the
maximization problem for inputs is:
$$\max_{[k(v)]_{v\in[0,N]},\,L}\ \frac{1}{1-\beta}\left[\int_0^N k(v)^{1-\beta}\,dv\right] L^{\beta} - \int_0^N \chi(v)\,k(v)\,dv - wL. \tag{14.5}$$
Recall that machines depreciate fully after use, so $\chi(v)$ is also the user cost of machines,
which is incorporated in the expression above.
The first-order condition with respect to $k(v)$ for any $v \in [0, N]$ yields the demand for
machines from the final good sector. These demands take the convenient isoelastic form:
$$k(v) = \frac{L}{\chi(v)^{1/\beta}}. \tag{14.6}$$
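As a rough numerical check on (14.6), the following Python sketch evaluates the demand for a machine at a given price and confirms that, at that quantity, the marginal product of the machine implied by (14.2) equals its price. The parameter values and function names are illustrative assumptions, not part of the model.

```python
def machine_demand(price, L, beta):
    """Isoelastic demand k = L / price**(1/beta), as in (14.6)."""
    return L / price ** (1.0 / beta)


def marginal_product(k, L, beta):
    """Derivative of (1/(1-beta)) * k**(1-beta) * L**beta with respect to k."""
    return k ** (-beta) * L ** beta


# Hypothetical parameter values, chosen only for illustration.
beta, L, price = 0.4, 2.0, 1.5

k = machine_demand(price, L, beta)
# At the profit-maximizing quantity the marginal product equals the price.
foc_gap = marginal_product(k, L, beta) - price

# The price elasticity of demand is constant and equal to -1/beta.
bumped = machine_demand(price * 1.001, L, beta)
elasticity = (bumped / k - 1.0) / 0.001

print(k, foc_gap, elasticity)
```

The constant elasticity $-1/\beta$ is what makes the monopolist's pricing problem below so simple.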
Assume also that, once the blueprint of a particular input is invented, the research firm
can create one unit of that machine at marginal cost equal to $\psi$ units of the final good.
Now consider the monopolist owning a machine of type $v$ invented at time $t$. This
monopolist chooses an investment plan and a sequence of capital stocks so as to maximize
the present discounted value of profits starting from time $t$, as given by
$$V(v,t) = \int_t^\infty \exp\left(-\int_t^s r(\tau)\,d\tau\right)\left[\chi(v,s)\,k(v,s) - \psi\,k(v,s)\right] ds, \tag{14.7}$$
where $r(t)$ is the market interest rate at time $t$. Alternatively, assuming that the value
function is differentiable in time, this could be written as a dynamic programming equation
of the form
$$r(t)\,V(v,t) - \dot{V}(v,t) = \chi(v,t)\,k(v,t) - \psi\,k(v,t). \tag{14.8}$$
14.1.2
Digression on Continuous Time Value Functions
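To see why (14.8) follows from (14.7), you should think of the principle of optimality again
(now in continuous time rather than discrete time). In particular, rewrite (14.7) at time $t$
as:
$$V(v,t) = \int_t^{t+\Delta t} \exp\left(-\int_t^s r(\tau)\,d\tau\right)\left(\chi(v,s) - \psi\right) k(v,s)\,ds + \int_{t+\Delta t}^{\infty} \exp\left(-\int_t^s r(\tau)\,d\tau\right)\left[\chi(v,s)\,k(v,s) - \psi\,k(v,s)\right] ds,$$
which is just an identity for any $\Delta t$. For sufficiently small $\Delta t$, this can be written as
$$V(v,t) \simeq \Delta t\,\left(\chi(v,t) - \psi\right) k(v,t) + \exp\left(-r(t)\,\Delta t\right) V(v,t+\Delta t),$$
$$0 \simeq \Delta t\,\left(\chi(v,t) - \psi\right) k(v,t) + \exp\left(-r(t)\,\Delta t\right) V(v,t+\Delta t) - \exp\left(-r(t)\cdot 0\right) V(v,t),$$
where $\exp(-r(t)\cdot 0) = 1$. Now divide both sides by $\Delta t$ and take the limit $\Delta t \to 0$, which
makes the approximation exact, giving
$$\left(\chi(v,t) - \psi\right) k(v,t) + \lim_{\Delta t \to 0} \frac{\exp\left(-r(t)\,\Delta t\right) V(v,t+\Delta t) - \exp\left(-r(t)\cdot 0\right) V(v,t)}{\Delta t} = 0.$$
When the value function is differentiable in time, this is equivalent to
$$\left(\chi(v,t) - \psi\right) k(v,t) + \left.\frac{\partial\left(\exp\left(-r(t)\,\Delta t\right) V(v,t+\Delta t)\right)}{\partial \Delta t}\right|_{\Delta t = 0} = 0,$$
thus, applying the chain rule,
$$\left(\chi(v,t) - \psi\right) k(v,t) - r(t)\,V(v,t) + \dot{V}(v,t) = 0,$$
which is identical to (14.8).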
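The equivalence between the integral form (14.7) and the differential form (14.8) can also be checked numerically. The sketch below assumes, purely for illustration, a constant interest rate and a profit flow growing at a constant rate below the interest rate (neither assumption is imposed in the text). It approximates $V$ by the trapezoid rule and $\dot{V}$ by a central difference, and then evaluates $rV - \dot{V}$ minus the flow profit, which should be close to zero. All names and numbers are hypothetical.

```python
import math


def value(t, r, profit, horizon=400.0, steps=100_000):
    """Trapezoid approximation of V(t) = int_t^inf exp(-r (s - t)) profit(s) ds.

    The infinite horizon is truncated at t + horizon; r is assumed constant.
    """
    h = horizon / steps
    total = 0.0
    for i in range(steps + 1):
        s = t + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-r * (s - t)) * profit(s)
    return total * h


# Hypothetical numbers: constant interest rate, profits growing at rate a < r.
r, a, pi0 = 0.05, 0.01, 1.0


def profit(s):
    return pi0 * math.exp(a * s)


t, dt = 1.0, 1e-3
V = value(t, r, profit)
V_dot = (value(t + dt, r, profit) - value(t - dt, r, profit)) / (2.0 * dt)

# Residual of the dynamic programming equation r V - V_dot = flow profit.
residual = r * V - V_dot - profit(t)
print(V, V_dot, residual)
```

With these assumptions the closed form is $V(t) = \pi_0 e^{at}/(r-a)$, so the numerical value can be compared against it directly.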
14.1.3
Characterization of Equilibrium
Since (14.6) defines isoelastic demands, the solution to the maximization problem of the
monopolist involves setting the same price in every period,
$$\chi(v,t) = \frac{\psi}{1-\beta},$$
that is, all monopolists charge a constant rental rate, equal to a mark-up over the marginal
cost. Without loss of generality, normalize the marginal cost of machine production to
$\psi \equiv (1-\beta)$, so that
$$\chi(v,t) = \chi = 1.$$
Profit-maximization also implies that each monopolist rents out the same quantity of machines in every period, equal to
$$k(v,t) = L, \tag{14.9}$$
and makes profits
$$\pi(v,t) = \left(\chi(v,t) - \psi\right) k(v,t) = \beta L, \tag{14.10}$$
implying that all monopolists sell exactly the same amount, charge the same price and make
the same amount of profits.
Substituting (14.6) and the machine prices into (14.2), we obtain
$$Y(t) = \frac{1}{1-\beta}\,N(t)\,L. \tag{14.11}$$
This is the major equation of the expanding product or input variety models. It shows
that even though the aggregate production function is constant returns to scale from the
viewpoint of final good firms which take $N$ as given, for the overall economy, there are
increasing returns to scale and increases in the variety of machines, $N$, increase the productivity of output. In particular, (14.11) makes it clear that if $N$ increases at a constant
rate, so will output per capita.
Similarly, the labor decision of the final good sector, from the first-order condition of
maximizing (14.5) with respect to $L$, implies the following equilibrium condition
$$w(t) = \frac{\beta}{1-\beta}\,N(t). \tag{14.12}$$
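The equilibrium objects above can be reproduced with a small numerical experiment. The sketch below searches over a grid of prices for the one that maximizes the monopolist's flow profit, given the isoelastic demand (14.6) and the normalization of marginal cost to $1-\beta$, and then evaluates output and the wage. The grid, the parameter values and the variable names are assumptions made for illustration only.

```python
beta, L = 0.4, 2.0          # hypothetical parameter values
psi = 1.0 - beta            # normalization of marginal cost used in the text


def monopoly_profit(price):
    """Flow profit (price - psi) * k(price), with k(price) from (14.6)."""
    return (price - psi) * L / price ** (1.0 / beta)


# Coarse grid search over prices above marginal cost (step 1e-4).
grid = [psi + i * 1e-4 for i in range(1, 40001)]
best_price = max(grid, key=monopoly_profit)
best_profit = monopoly_profit(best_price)

N = 3.0                                  # hypothetical number of varieties
k = L / best_price ** (1.0 / beta)       # machines per variety, close to L
# Output from (14.2) when every variety is used in the same quantity k.
Y = N * k ** (1.0 - beta) * L ** beta / (1.0 - beta)
w = beta * Y / L                         # marginal product of labor

print(best_price, best_profit, Y, w)
```

Up to the grid resolution, the search should return a price close to 1, profits close to $\beta L$, output close to $N L/(1-\beta)$ and a wage close to $\beta N/(1-\beta)$, in line with (14.9) to (14.12).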
Finally, there is free entry into research. This implies that at all points in time we must
have
$$\eta\,V(v,t) = 1, \tag{14.13}$$
where $V(v,t)$ is given by (14.7). Recall that one unit of final good spent on R&D leads to
the invention of $\eta$ units of new inputs, each making profits given by (14.7).
Naturally, this free entry condition may be violated if research is so unprofitable that
nobody wants to enter, so it should really be written as a complementary slackness condition
with
$$\eta\,V(v,t) \le 1, \quad X(v,t) \ge 0 \quad \text{and} \quad \left(\eta\,V(v,t) - 1\right) X(v,t) = 0,$$
but for the relevant parameter values there will be entry and economic growth (through
technological change alone), so we simplify the exposition by writing it in the form of (14.13).
14.1.4
An equilibrium in this economy is described as a sequence of consumption and R&D decisions,
$[C(t), X(t)]_{t=0}^{\infty}$, such that given the price path $[r(t), w(t)]_{t=0}^{\infty}$, the representative household
is maximizing its utility given by (14.1), capital demands by the final goods sector satisfy
(14.9), the wage rate is given by (14.12), and the value of each monopolist, $V(v,t)$, satisfies
(14.7) and (14.13).
14.1.5
Steady State
Let us start with the steady state. In the steady state, the value of an invention will be
constant, thus $\dot{V} = 0$, and also the interest rate will be constant, i.e., $r(t) = r^*$ (where I
again use stars to denote BGP/steady-state values). Substituting this in either (14.7) or
(14.8), we obtain
$$V^* = \frac{\pi}{r^*}, \tag{14.14}$$
where $\pi$ is the (constant) flow of net profits per period, given by (14.10) above.
For there not to be further incentives to undertake R&D, we need one unit of final good
spent for R&D to generate exactly the same discounted value. Therefore, the no entry (free
entry) condition (14.13) can be expressed as:
$$\frac{\eta \beta L}{r^*} = 1.$$
This equation pins down the steady-state interest rate, $r^*$, as:
$$r^* = \eta \beta L.$$
From consumer maximization, in particular from the standard Euler equation, we also have
that the rate of growth of consumption, $g_c$, is given by
$$g_c \equiv \frac{\dot{C}}{C} = \frac{1}{\theta}\,(r - \rho), \tag{14.15}$$
and in steady state, the rate of growth of the economy is the same as the rate of growth of
consumption, so we have that the whole economy grows at the rate $g^* = g_c^*$.
Therefore, given the steady-state interest rate we can simply determine the long-run
growth rate of the economy as:
$$g^* = \frac{1}{\theta}\,(\eta \beta L - \rho). \tag{14.16}$$
Since this is a growing economy, we need to ensure that the transversality condition is
satisfied in equilibrium. As usual, this requires $r^* > g^*$ (since there is no population growth),
i.e.,
$$(1-\theta)\,\eta \beta L < \rho, \tag{14.17}$$
which we assume holds.
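To make the steady-state formulas concrete, here is a minimal Python sketch that computes the interest rate, the growth rate (14.16), and whether the transversality condition (14.17) holds. The function name and the parameter values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def lab_equipment_bgp(eta, beta, L, theta, rho):
    """Steady-state interest rate, growth rate (14.16) and check of (14.17)."""
    r_star = eta * beta * L                 # from free entry: eta*beta*L / r = 1
    g_star = (r_star - rho) / theta         # from the Euler equation (14.15)
    tvc_ok = (1.0 - theta) * eta * beta * L < rho   # condition (14.17)
    return r_star, g_star, tvc_ok


# Hypothetical parameter values.
r_star, g_star, tvc_ok = lab_equipment_bgp(
    eta=0.2, beta=0.5, L=1.0, theta=2.0, rho=0.02
)
print(r_star, g_star, tvc_ok)

# Doubling L doubles the interest rate and raises the growth rate.
r_big, g_big, _ = lab_equipment_bgp(eta=0.2, beta=0.5, L=2.0, theta=2.0, rho=0.02)
print(r_big, g_big)
```

With these numbers the interest rate is 10 percent and the growth rate is 4 percent, and condition (14.17) is equivalent to the interest rate exceeding the growth rate.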

Notice that there is a scale effect here: the larger is $L$, the greater is the growth rate. The
scale effect comes from the increasing returns to scale nature of the technology of the model
of endogenous technical change (this is a point related to the non-rival nature of knowledge,
emphasized in Romer, 1990). I will return to the issue of the scale effect further below.
This discussion establishes:
Proposition 36 In the above-described expanding input-variety model of endogenous technological change, there exists a unique steady state in which technology, output and consumption all grow at the same rate given by (14.16).
14.1.6
It is also straightforward to see that there are no transitional dynamics in this model. To
see this, let us go back to the value function for each monopolist. Substituting for profits,
this gives
$$r(t)\,V(v,t) - \dot{V}(v,t) = \beta L.$$
Free entry gives
$$\eta\,V(v,t) = 1.$$
Differentiating this with respect to time immediately implies $\dot{V}(v,t) = 0$, which is only
consistent with $r(t) = r^*$ for all $t$, thus
$$r(t) = \eta \beta L \quad \text{for all } t.$$
This establishes:
Proposition 37 In the above-described expanding input-variety model of endogenous technological change, with initial technology stock $N(0) > 0$, there is a unique equilibrium path
in which technology, output and consumption always grow at the rate $g^*$ as in (14.16).

In other words, exactly as in the AK model, the economy always grows at a constant
rate. At some level this is not surprising, since the derived equation for output, (14.11), is
essentially a linear AK production function.
14.1.7
Pareto Optimal Allocations
The presence of monopolistic competition implies that the competitive equilibrium is no
longer Pareto optimal. There is a version of the aggregate demand externalities we saw in
the static context in previous lectures. It is straightforward to set up the problem of the
social planner and derive the optimal growth rate. To do this, notice that the social planner
will also use the same quantity of all types of machines in production, but because of the
absence of a markup, this quantity will be different. The social planner will also take into
account the effect of an increase in the variety of inputs on the overall productivity in the
economy, which monopolists do not because they do not capture the full surplus from
inventions.
More explicitly, given $N$, the social planner will choose
$$\max_{[k(v)]_{v\in[0,N]},\,L}\ \frac{1}{1-\beta}\left[\int_0^N k(v)^{1-\beta}\,dv\right] L^{\beta} - \int_0^N \psi\,k(v)\,dv - wL,$$
which only differs from the private maximization problem because the marginal cost of
machine creation, $\psi$, is used in place of the machine price. Recalling that $\psi \equiv 1-\beta$, this implies
$$k_s(v) = \frac{L}{(1-\beta)^{1/\beta}},$$
thus
$$Y(t) = \frac{(1-\beta)^{-(1-\beta)/\beta}}{1-\beta}\,N(t)\,L = (1-\beta)^{-1/\beta}\,N(t)\,L.$$

Recall that the aggregate budget constraint is
$$C(t) + I(t) + X(t) \le Y(t).$$
Let
$$Y^n(t) \equiv Y(t) - I(t)$$
be net output, after the costs of machines are subtracted (recall that it is net output that is
distributed between R&D expenditure and consumption). We have that
$$\begin{aligned}
Y^n(t) &= (1-\beta)^{-1/\beta}\,N(t)\,L - \int_0^{N(t)} \psi\,k_s(v,t)\,dv \\
&= (1-\beta)^{-1/\beta}\,N(t)\,L - (1-\beta)^{-(1-\beta)/\beta}\,N(t)\,L \\
&= (1-\beta)^{-1/\beta}\,\beta\,N(t)\,L.
\end{aligned}$$
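Now, given this and (14.4), the maximization problem of the social planner can be written
as
$$\max \int_0^\infty \frac{C(t)^{1-\theta} - 1}{1-\theta}\,\exp(-\rho t)\,dt$$
subject to
$$\dot{N}(t) = \eta\,(1-\beta)^{-1/\beta}\,\beta\,N(t)\,L - \eta\,C(t).$$
In this problem, $N(t)$ is the state variable, and $C(t)$ is the control variable.
Let us set up the current-value Hamiltonian
$$\hat{H}(N, C, \mu) = \frac{C(t)^{1-\theta} - 1}{1-\theta} + \mu(t)\left[\eta\,(1-\beta)^{-1/\beta}\,\beta\,N(t)\,L - \eta\,C(t)\right].$$
The necessary conditions are
$$\hat{H}_C(N, C, \mu) = 0 \ \Longrightarrow\ C(t)^{-\theta} = \eta\,\mu(t),$$
$$\hat{H}_N(N, C, \mu) = \rho\,\mu(t) - \dot{\mu}(t) = \mu(t)\,\eta\,(1-\beta)^{-1/\beta}\,\beta\,L,$$
$$\lim_{t\to\infty} N(t)\,\mu(t)\,e^{-\rho t} = 0.$$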

Combining these conditions, we obtain the following growth rate for consumption in the
social planner's allocation:
$$\frac{\dot{C}}{C} = \frac{1}{\theta}\left(\eta\,(1-\beta)^{-1/\beta}\,\beta\,L - \rho\right), \tag{14.18}$$
which can be directly compared to the growth rate in the decentralized equilibrium, (14.16).
The comparison boils down to that of
$$(1-\beta)^{-1/\beta}\,\beta \quad \text{to} \quad \beta,$$
and it is straightforward to see that the former is always greater since $(1-\beta)^{-1/\beta} > 1$ by
virtue of the fact that $\beta \in (0, 1)$. This implies that the socially-planned economy will always
grow faster than the decentralized economy. Intuitively, the social planner values innovation
more, because it will be able to use the machines more intensively after innovation, since
the monopoly markup reducing the demand for machines is absent in the social planner's
allocation.
This establishes:
Proposition 38 In the above-described expanding input-variety model, the decentralized
equilibrium is not Pareto optimal, and always grows more slowly than the allocation that would maximize the utility of the representative household.
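The gap between (14.16) and (14.18) is easy to tabulate. The following Python sketch, with hypothetical parameter values and function names, computes both growth rates for a few values of $\beta$ and reports the difference, which should be positive for every $\beta$ in $(0, 1)$.

```python
def decentralized_growth(eta, beta, L, theta, rho):
    """Equilibrium growth rate (14.16)."""
    return (eta * beta * L - rho) / theta


def planner_growth(eta, beta, L, theta, rho):
    """Social planner's growth rate (14.18)."""
    return (eta * (1.0 - beta) ** (-1.0 / beta) * beta * L - rho) / theta


# Hypothetical parameter values.
eta, L, theta, rho = 0.2, 1.0, 2.0, 0.02

gaps = {}
for beta in (0.2, 0.5, 0.8):
    g_eq = decentralized_growth(eta, beta, L, theta, rho)
    g_sp = planner_growth(eta, beta, L, theta, rho)
    gaps[beta] = g_sp - g_eq
    print(beta, g_eq, g_sp, gaps[beta])
```

The difference is driven entirely by the factor $(1-\beta)^{-1/\beta}$, which reflects the more intensive use of machines when they are priced at marginal cost.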
14.1.8
Policy in the Endogenous Technology Model
The divergence between the decentralized equilibrium and the socially planned allocation
introduces the possibility that there might be Pareto-improving interventions. The most
natural alternatives to consider in this model are two:

1. Subsidies to Research: by subsidizing research, the government can increase the growth
rate of the economy, and this can be turned into a Pareto improvement if taxation is
not distortionary and there can be appropriate redistribution of resources so that all
parties benefit.
2. Subsidies to Capital Inputs: the problem also arises from the fact that the decentralized
economy is not using as many units of the machines/capital inputs (because of the
monopoly markup); so subsidies to capital inputs given to final good producers would
also be useful in increasing the growth rate.
Moreover, it is noteworthy that as in the first-generation endogenous growth models, a
variety of different policy interventions, including taxes on investment income and subsidies
of various forms, will have growth effects, not just level effects, in this framework.
Naturally, once we start thinking of policy in order to close the gap between the decentralized equilibrium and the Pareto optimal allocation, we also have to think of the objectives
of policymakers, and this brings us again to political economy issues. For that reason, rather
than go into a detailed discussion of optimal policy, I simply note the gap between the decentralized equilibrium and the Pareto optimal allocation, leaving you to draw your own
conclusions about what the implications of this gap will be.
I will discuss some of the implications of different types of competition policies and
intellectual property rights policies further below.
14.2
Growth with Knowledge Spillovers
In the model of the previous section, growth resulted from the use of final output for R&D.
This is similar, in some ways, to the endogenous growth model of Rebelo (1991), since the
accumulation equation is linear in accumulable factors. As a result, we saw that, in equilibrium, output took a linear form in the stock of knowledge (new machines), thus an $AN$ form
instead of Rebelo's $AK$ form.
An alternative is to have scarce factors used in R&D. In other words, instead of the
lab equipment, we now have scientists as the key creators of R&D. In this case, there will not
be endogenous growth, unless there are knowledge spillovers from past R&D. In other words,
now current researchers need to stand on the shoulders of past giants. In fact, the original
formulation by Romer (1990) was exactly of this knowledge-spillovers form, imposing
"standing on the shoulders of giants" as part of the technological possibilities frontier of the
economy.
A typical formulation in this case is
$$\dot{N} = \eta\,N\,L_R, \tag{14.19}$$
where $L_R$ is labor allocated to R&D. The term $N$ on the right-hand side captures spillovers
from the stock of existing ideas. The greater is $N$, the more productive is an R&D worker.
$L_R$ could be skilled workers as in Romer (1990), or scientists or regular workers. In the
latter case, there will be competition between the production sector and the R&D sector for
workers, and the marginal cost of workers in research would be given by the wage rate in the
production sector. In particular, the free entry condition is now
$$\eta\,N(t)\,V(v,t) = w(t),$$
where $N$ is on the left-hand side because it parameterizes the productivity of an R&D worker
from (14.19), and $V(v,t)$ is again given by (14.7) above, while the flow cost of undertaking
research is hiring workers for R&D, thus the wage rate $w(t)$.
In the model I outlined in the previous section, the equilibrium wage rate was derived
as (recall equation (14.12)):
$$w(t) = \frac{\beta}{1-\beta}\,N(t).$$
So the steady-state free-entry condition, with a constant steady-state (balanced growth path)
interest rate, $r^*$, becomes
$$\eta\,N(t)\,\frac{\beta L}{r^*} = \frac{\beta}{1-\beta}\,N(t).$$
Hence the steady-state equilibrium interest rate is
$$r^* = (1-\beta)\,\eta\,L.$$
Now using the Euler equation of the representative household, we have
$$g_c^* \equiv \frac{\dot{C}}{C} = \frac{1}{\theta}\left((1-\beta)\,\eta\,L - \rho\right). \tag{14.20}$$
The rest of the analysis is unchanged. In particular, the growth rates of technology and
output are also given by (14.20). Also, there are again no transitional dynamics, and we can
also compare the decentralized equilibrium to the Pareto optimal allocation. It is also useful
to note that there is again a scale effect here: greater $L$ increases the interest rate and the
growth rate in the economy.
This discussion immediately establishes:
Proposition 39 In the above-described expanding input-variety model with knowledge spillovers,
there exists a unique balanced growth path equilibrium in which technology, output and consumption grow at the same rate given by (14.20) starting from any initial level of technology
stock $N(0) > 0$.
14.2.1
The Role of Competition Policy
Since we now have a model with monopolistic competition, we can also relate the results
to standard issues in industrial organization, such as competition policy, anti-trust, patents,
etc. For example, in this model we can introduce a fringe of competitive firms which could
limit the markup that each monopolist can charge. Recall that the price that the
monopolist optimally charges is
$$\chi = \frac{\psi}{1-\beta}.$$
Imagine, instead, that a fringe of competitive firms can copy the innovation of any monopolist, but they will not be able to produce at the same level of costs (because the inventor
has more know-how). In particular, suppose that instead of a marginal cost $\psi$, they will
have marginal cost of $\gamma\psi$ with $\gamma > 1$. If $\gamma > 1/(1-\beta)$, this fringe is not a threat to the
monopolist, since the monopolist could set its ideal, profit maximizing, markup and the
fringe would not be able to enter without making losses. However, if $\gamma < 1/(1-\beta)$, the
fringe would prevent the monopolist from setting its ideal monopoly price. In particular in
this case the monopolist would be forced to set a limit price, exactly equal to
$$\chi = \gamma\psi. \tag{14.21}$$
This price formula follows immediately by noting that, if the price of the monopolist were
higher than this, the fringe could undercut and make profits, since their marginal cost is
equal to $\gamma\psi$. If it were below this, the monopolist could further increase its price without
losing any customers to the fringe and make more profits. Thus, there is a unique equilibrium
price given by (14.21).
When the monopolist charges this limit price, its profits per unit would be
$$\text{profits per unit} = (\gamma - 1)\,\psi = (\gamma - 1)(1-\beta),$$
which is less than $\beta$, the profits per unit that the monopolist made in the absence of the
competitive fringe.
What is the implication of this on the rate of economic growth? It is straightforward to
work out that in this case the economy would grow at a slower rate. For example, in the
baseline model with the lab-equipment technology, this growth rate would be
$$\hat{g} = \frac{1}{\theta}\left(\eta\,\gamma^{-1/\beta}\,(\gamma - 1)\,(1-\beta)^{-(1-\beta)/\beta}\,L - \rho\right),$$
which is less than (14.16). Therefore, in this model, somewhat counter-intuitively, greater
competition, which reduces markups (and thus static distortions), also reduces long-run
growth. This is because profits are important in this model to encourage innovation by new
research firms. If these profits are cut, incentives for research are also reduced. Of course,
welfare is not the same as growth, and some degree of competition reducing prices below the
unconstrained monopolistic level might be useful for welfare depending on the discount rate
of the representative household. Essentially, with a lower markup, households are happier
in the present, but suffer slower consumption growth. The exact tradeoff between these two
opposing effects depends on the discount rate of the representative household.
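The effect of the competitive fringe on growth can be illustrated numerically. The sketch below assumes the lab-equipment technology and the normalization of marginal cost to $1-\beta$ from the baseline model, derives the monopolist's flow profit at the limit price (14.21) from the isoelastic demand (14.6), and plugs it into the free-entry and Euler conditions. Function names and parameter values are hypothetical.

```python
def growth_with_fringe(gamma, eta, beta, L, theta, rho):
    """Growth rate when the fringe has marginal cost gamma * psi, psi = 1 - beta.

    If gamma is at least 1/(1-beta) the fringe does not bind and the
    monopolist charges its unconstrained price, so gamma is capped there.
    """
    gamma_eff = min(gamma, 1.0 / (1.0 - beta))
    price = gamma_eff * (1.0 - beta)             # limit price (14.21)
    quantity = L / price ** (1.0 / beta)         # demand (14.6)
    profit = (price - (1.0 - beta)) * quantity   # flow profit per variety
    r_hat = eta * profit                         # free entry: eta * profit / r = 1
    return (r_hat - rho) / theta


# Hypothetical parameter values.
eta, beta, L, theta, rho = 0.2, 0.5, 1.0, 2.0, 0.02
g_star = (eta * beta * L - rho) / theta          # baseline growth rate (14.16)

for gamma in (1.2, 1.5, 1.8, 2.0, 3.0):
    print(gamma, growth_with_fringe(gamma, eta, beta, L, theta, rho), g_star)
```

In this example a tighter fringe (lower $\gamma$) lowers growth, and growth returns to the baseline rate once $\gamma$ reaches $1/(1-\beta)$.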
Another similar application is to patent policy. In practice, patents are for
limited durations. In the baseline model, we assumed that patents are perpetual; once a
firm invents a new good, it has a patent forever and it becomes the monopolist for that good
forever. If patents are enforced strictly, then this might rule out the competitive fringe from
competing, restoring the growth rate of the economy to (14.16). Also, even in the absence of
the competitive fringe, we can imagine that once the patent runs out, the firm will cease to
make profits on its innovation. In this case, it can easily be shown that growth is maximized
by making patents as long as possible. Again there is a tradeoff here between the equilibrium
growth rate of the economy and the static level of welfare.
But more important than these trade-offs between growth and level is the fact that
these models are the most basic models, so they do not feature some of the potential benefits
of competition. For example, competitive pressure from other firms might encourage faster
innovation. We will see this issue in Problem Set 6.
14.3
Growth without Scale Effects
As we have seen, the models used so far feature a scale effect in the sense that a larger
population, $L$, translates into a higher interest rate and a higher growth rate. This is
problematic for three reasons as argued in a series of papers by Chad Jones:
1. Larger countries do not necessarily grow faster (though the larger market of the United
States or European economies may have been an advantage during the early phases of
the industrialization process).
2. The population in general is not constant, but growing. If we have constant population
growth as in the standard neoclassical growth model, e.g., $L(t) = \exp(nt)\,L(0)$, these
models would not feature a balanced growth path. Instead, growth would become faster
and faster over time, eventually leading to an infinite output in finite time, violating
the transversality condition.
3. In the data, we see the total amount of resources devoted to R&D increases steadily,
but this has not been associated with an increase in the growth rate.
These observations have motivated Jones (1995) to suggest the following modification
of the baseline model. Population at time $t$ is $L(t)$ and grows at the constant rate $n$ (i.e.,
$\dot{L}(t) = n\,L(t)$). All agents have the standard CRRA preferences
$$\int_0^\infty \exp(-\rho t)\,\frac{C^{1-\theta} - 1}{1-\theta}\,dt, \tag{14.22}$$
where $C$ is consumption defined over the final good of the economy. This good is produced as
before, more specifically, with the production function (14.2), and all the other assumptions
are the same as before.
New goods are produced by allocating workers to the R&D process as in the knowledge-spillovers model studied in the previous section. However, now there are limited knowledge
spillovers, in particular,
$$\dot{N}(t) = \eta\,N(t)^{\phi}\,L_R(t), \tag{14.23}$$
where $\phi < 1$ and $L_R$ is labor allocated to R&D. So labor market clearing requires
$$L_E(t) + L_R(t) = L(t), \tag{14.24}$$
where $L_E(t)$ is the level of employment in the production sector. The fact that not all
workers are in the production sector implies that the aggregate output of the economy (by
an argument similar to before) is given by
$$Y(t) = \frac{1}{1-\beta}\,N(t)\,L_E(t),$$
and profits of monopolists from selling their machines are
$$\pi(t) = \beta\,L_E(t).$$
The key assumption for the model is that $\phi < 1$. The case where $\phi = 1$ is the one
analyzed in the previous section, and as commented above, with population growth this
would lead to an exploding path, leading to infinite utility. However, the model is well
behaved when $\phi < 1$.
In particular, let us focus on the BGP (steady state), where a constant fraction of
workers is allocated to R&D, and the interest rate and the growth rate are constant. In this
BGP allocation, we have the following free-entry condition:
$$\eta\,N(t)^{\phi}\,\frac{\beta\,L_E(t)}{r^*} = w(t) = \frac{\beta}{1-\beta}\,N(t),$$
where the wage is again substituted from (14.12). This implies
$$\frac{\eta\,N(t)^{\phi-1}\,(1-\beta)\,L_E(t)}{r^*} = 1.$$
Now taking logs and differentiating this condition with respect to time, we obtain
$$(\phi - 1)\,\frac{\dot{N}(t)}{N(t)} + \frac{\dot{L}_E(t)}{L_E(t)} = 0.$$
Since in BGP, the fraction of workers allocated to research is constant, $\dot{L}_E(t)/L_E(t) = n$.
This implies that the BGP growth rate of technology is given by
$$g_N^* \equiv \frac{\dot{N}(t)}{N(t)} = \frac{n}{1-\phi}. \tag{14.25}$$
From equation (14.11), this implies that total output grows at the rate $g_N^* + n$. But now there
is population growth, so consumption per capita grows at the rate
$$g_c^* = g_N^* = \frac{n}{1-\phi}. \tag{14.26}$$

Consequently, this model generates sustained growth in income per capita as well, and does so in the presence of population growth. More interestingly, in order to achieve this growth rate, it allocates more and more workers to R&D (a constant fraction of a growing labor force). The reason for this is that the technology for creating new ideas, (14.23), only features limited spillovers, thus to maintain sustained growth more resources need to be allocated to R&D. The result is summarized in the next proposition:
Proposition 40 In the above-described expanding input-variety model with limited knowledge spillovers as given by (14.23), starting from any initial level of technology stock $N(0) > 0$, there exists a unique balanced growth path equilibrium in which technology and consumption per capita grow at the rate (14.25), and output grows at rate $g_N^* + n$.
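As a rough numerical check of (14.25), the following Python sketch integrates the innovation possibilities frontier under the assumption that R&D employment grows at the population growth rate $n$ (a constant research share). The name `eta` for the R&D productivity parameter in (14.23), the initial values and the Euler step size are illustrative assumptions, not part of the notes.

```python
import math


def bgp_growth_rates(n: float, phi: float) -> dict:
    """Growth rates on the BGP implied by (14.25)-(14.26), valid for phi < 1."""
    if phi >= 1.0:
        raise ValueError("the model is only well behaved for phi < 1")
    g_N = n / (1.0 - phi)  # technology growth, (14.25)
    return {"g_N": g_N, "g_c": g_N, "g_Y": g_N + n}


def simulate_technology_growth(n, phi, eta=1.0, N0=1.0, LR0=1.0, T=400.0, dt=0.01):
    """Euler-integrate dN/dt = eta * N**phi * L_R, with L_R growing at rate n.

    Returns the growth rate of N at time T. eta, N0 and LR0 are hypothetical
    values chosen only for illustration.
    """
    N, LR = N0, LR0
    for _ in range(int(T / dt)):
        N += eta * N**phi * LR * dt   # limited spillovers: exponent phi < 1
        LR *= math.exp(n * dt)        # research employment grows with population
    return eta * N ** (phi - 1.0) * LR
```

With `n = 0.02` and `phi = 0.5`, the simulated growth rate of `N` should settle close to `n / (1 - phi) = 0.04`, whatever the (positive) starting values.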
This type of model is sometimes referred to as semi-endogenous growth, because while
there is sustained growth, the per capita growth rate of the economy given in (14.26) is
determined only by population growth and technology and does not respond to taxes or
other policies. Some papers in the literature have attempted to develop models of endogenous
growth without scale effects, but where economic growth still responds to policies, though
this normally requires a combination of restrictive assumptions.
Chapter 15
Models of Quality Competition
15.1 Baseline Model
In the model of expanding machine variety, different machines were complements in production. However, in practice when a better computer comes to the market, it replaces previous models. This is captured in the models of vertical quality competition or quality improvement, such as the models in Aghion and Howitt, or Grossman and Helpman. Population and labor supply are again constant at $L$. The major difference from the previous setup is that the production function is now
$$Y(t) = \frac{1}{1-\beta}\left[\int_0^1 q(v,t)\,k(v,t)^{1-\beta}\,dv\right] L^{\beta}, \tag{15.1}$$
where $q(v,t)$ is the quality of machine $v$ at time $t$ and, because now the number of varieties is constant, I have normalized it to 1. Consequently, while in the previous section growth took place because the variety of inputs expanded, here it takes place because existing inputs become more productive. In many ways, this seems to describe the growth process better, and it also has a nice Schumpeterian flavor of creative destruction. When a better vintage of a particular machine is created, it replaces (destroys) the existing vintage.
The rest of the setup is the same as before. In particular, as in the baseline endogenous technological change model, there is no population growth. Instead, the population and labor supply are fixed at $L$. The economy admits a representative household with preferences given by the standard CRRA form, (14.1).
To invent a new machine, firms undertake R&D on an existing machine (of type $v$). If a firm spends $qz$ units of the final good for R&D on a machine of quality $q$, then it has a flow rate $\eta z$ of inventing a new machine, with quality $\lambda q$. Notice that the cost of undertaking R&D is proportional to the quality of the machine on which the firm is working. This is natural. Without this assumption, R&D would become more and more profitable over time, leading to an explosive path.
The new machine will take over the market for this type of capital, but unless $\lambda$ is very large, it will have to charge a limit price in order to exclude the previous leader. This is similar to the discussion of the limit price above (which led to equation (14.21) there). I assume that $\lambda$ is not too large, so we will observe limit prices in equilibrium. Also, assume that the marginal cost of production is $\psi q$ for a machine of quality $q$.
One issue here, which was absent in the expanding input variety model, is whether the existing leader will undertake R&D and innovation. In the expanding input variety model, this was irrelevant, since machines could not be improved upon, so there was only R&D for new machines, and who undertook it was not important. Here, in contrast, existing machines can be (and are) improved, and this is the source of economic growth. But also, incentives to undertake such innovations may differ between the incumbent monopolist and entrants. A major insight here comes from Arrow (1962), who noted the presence of the "replacement effect": the incumbent would be replacing its own machine, and thus destroying the profits that it is already making. In contrast, a new entrant does not have this replacement calculation in mind. As a result, with the same technology of innovation, it will always be the entrants (new firms) who do R&D in this model. This is an attractive implication, since it creates a real sense of "creative destruction" or "churning". Of course in practice we see established big firms undertake innovation. This might be because the technology of innovation differs between incumbents and new potential entrants, or there is only a limited number of new entrants. One of the questions in Problem Set 6 will get you to work through a model along these lines.
Following the same analysis as before, the demand for machines is now
$$k(v,t) = \left[\frac{q(v,t)}{\chi(v,t)}\right]^{1/\beta} L. \tag{15.2}$$
Let us normalize $\psi = \lambda^{-1}$, so the monopolist sets the price $\chi(v,t) = q(v,t)$, and sells $k(v) = L$. This generates profits
$$\pi(v,t) = \frac{\lambda-1}{\lambda}\,L\,q(v,t). \tag{15.3}$$
Substituting (15.2) into (15.1), we obtain total output as
$$Y(t) = \frac{1}{1-\beta}\,Q(t)\,L, \quad\text{where}\quad Q(t) \equiv \int_0^1 q(v,t)\,dv$$
is the average total quality of machines.
The value of being the inventor is different now, because this position will not last forever. More formally, the standard dynamic programming equation now becomes:
$$r(t)V(v,t) - \dot V(v,t) = \pi(v,t) - x(v,t)V(v,t),$$
where $x(v,t)$ is the rate at which new innovations occur in sector $v$ at time $t$. When this event occurs, the existing monopolist loses its monopoly position and is replaced by the monopolist of the higher-quality machine. From then on, it receives zero profits, and thus has zero value.
In the balanced growth path $x(v,t)$ will be constant across different types of goods and over time, and let us denote it by $x^*$.
Note that there is an immediate relationship between the innovation rate, $x^*$, and the BGP growth rate, $g^*$, given by:
$$g^* = (\lambda - 1)\,x^*.$$
This simply follows from the fact that on average, growth occurs because there are new and better machines, and those increase output by a proportional amount $\lambda - 1$.
Free entry into R&D implies that
$$\eta V(v,t) = q(v,t). \tag{15.4}$$
Otherwise, there will be entry into or exit from research, since one more unit of the final good provides a flow rate $\eta/q$ of obtaining $V$.
In steady state, $\dot V(v,t) = 0$. So, dropping time and sector dependence and using stars again to denote BGP values, we have
$$V^* = \frac{\pi}{r^* + x^*} = \frac{\pi}{r^* + g^*/(\lambda-1)} = \frac{q(\lambda-1)^2 L}{\lambda\left[(\lambda-1)r^* + g^*\right]} = \eta^{-1} q,$$
where the penultimate equality follows from substituting for profits from (15.3), and the last equality follows from free entry condition (15.4).
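Moreover, the Euler equation (14.15) still applies, so combining those, we have that in steady state, $r^* = \theta g^* + \rho$, so
$$\frac{\eta(\lambda-1)^2 L}{\lambda\left[(\lambda-1)(\theta g^* + \rho) + g^*\right]} = 1,$$
therefore
$$g^* = \frac{1}{(\lambda-1)\theta + 1}\left[\frac{\eta(\lambda-1)^2 L}{\lambda} - (\lambda-1)\rho\right],$$
or rearranging,
$$g^* = \frac{1}{\theta + 1/(\lambda-1)}\left[\frac{\eta(\lambda-1)}{\lambda}\,L - \rho\right]. \tag{15.5}$$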
This establishes:
Proposition 41 In the above-described quality-improvement model, there exists a unique
balanced growth path equilibrium in which output and consumption grow at the same rate
given by (15.5). The rate of innovation is $g^*/(\lambda - 1)$.
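To see how the pieces of this derivation fit together, here is a minimal Python sketch that evaluates the growth rate in (15.5) and checks that it satisfies free entry. It assumes the profit function (15.3), the free-entry condition (15.4) and the Euler equation in the form $r = \theta g + \rho$; the parameter names (`eta`, `lam`, `theta`, `rho`) and the numerical values are illustrative assumptions.

```python
def bgp_growth_quality_ladder(eta, lam, L, theta, rho):
    """BGP growth rate g* as in (15.5)."""
    return (eta * (lam - 1.0) / lam * L - rho) / (theta + 1.0 / (lam - 1.0))


def free_entry_residual(g, eta, lam, L, theta, rho):
    """eta * V / q - 1 evaluated at growth rate g; zero when free entry holds."""
    r = theta * g + rho                         # Euler equation in steady state
    x = g / (lam - 1.0)                         # innovation rate, from g = (lam - 1) x
    profit_per_quality = (lam - 1.0) / lam * L  # (15.3) divided by q
    value_per_quality = profit_per_quality / (r + x)
    return eta * value_per_quality - 1.0


# Hypothetical parameter values, chosen only for illustration.
params = dict(eta=1.0, lam=1.2, L=1.0, theta=2.0, rho=0.09)
g_star = bgp_growth_quality_ladder(**params)
residual = free_entry_residual(g_star, **params)
```

The residual should be zero up to floating-point error, which is just (15.5) restated: the growth rate is whatever makes the value of a new machine equal to the cost of inventing it.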
15.2 Pareto Optimality
This equilibrium, like that of the endogenous technology model with expanding input varieties, is not, generally, Pareto optimal. But in fact, this can be because there is too little or too much innovation. The reason why there may be too little innovation is the same as in the model in the previous section: a monopolist does not sell as many units of the new machines as the social planner would like, and does not fully internalize the benefits accruing to final good producers (and the economy) from further innovation. However, counteracting this there is the "business stealing effect" coming from the Schumpeterian nature of the model: a new innovation steals the profits of the existing monopolist. This tends to induce entrants to do too much R&D, even when R&D has small social returns, because it enables them to become the monopoly producers of a new machine, thus becoming the claimant of the natural monopoly power accruing to the leader in a particular line of machines.
The analysis of Pareto optimality is straightforward here because of the parallel between the structure of this model and that with expanding input variety. In particular, it is immediate to see that a social planner would choose demands for machines as
$$k^S(v) = \frac{L}{\psi^{1/\beta}} = \lambda^{1/\beta} L,$$
given the assumption that in this case $\psi = \lambda^{-1}$. This implies that total output, under the socially-planned economy, is equal to
$$Y(t) = \frac{\lambda^{(1-\beta)/\beta}}{1-\beta}\,Q(t)\,L.$$
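Recall again that the aggregate budget constraint is
$$C(t) + I(t) + X(t) \leq Y(t).$$
It is once more useful to work in terms of net output, which is defined as $Y^n(t) \equiv Y(t) - I(t)$ as in the expanding variety model, and we have
$$\begin{aligned}
Y^n(t) &= \frac{\lambda^{(1-\beta)/\beta}}{1-\beta}\,Q(t)L - \int_0^1 \psi\,q(v,t)\,k^S(v,t)\,dv \\
&= \frac{\lambda^{(1-\beta)/\beta}}{1-\beta}\,Q(t)L - \lambda^{(1-\beta)/\beta}\,Q(t)L \\
&= \lambda^{(1-\beta)/\beta}\,\frac{\beta}{1-\beta}\,Q(t)L.
\end{aligned} \tag{15.6}$$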
Finally, note that given the assumptions above, the social planner faces an aggregate technology frontier of the form
$$\dot Q(t) = \eta(\lambda - 1)X(t),$$
since an R&D spending of $X(t)$, spread across machines of average quality $Q(t)$, leads to discoveries of better vintages at the flow rate $\eta X(t)/Q(t)$, and each of these vintages increases the average quality of machines by a proportional amount $\lambda - 1$.
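Now, given this equation, the maximization problem of the social planner can be written as
$$\max \int_0^\infty \frac{C(t)^{1-\theta} - 1}{1-\theta}\,\exp(-\rho t)\,dt$$
subject to
$$\dot Q(t) = \eta(\lambda-1)\,\lambda^{(1-\beta)/\beta}\,\frac{\beta}{1-\beta}\,Q(t)L - \eta(\lambda-1)\,C(t),$$
where the constraint uses net output, (15.6), and the budget constraint.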
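In this problem, $Q(t)$ is the state variable, and $C(t)$ is the control variable.
Let us again set up the current-value Hamiltonian
$$\hat H(Q, C, \mu) = \frac{C(t)^{1-\theta} - 1}{1-\theta} + \mu(t)\left[\eta(\lambda-1)\,\lambda^{(1-\beta)/\beta}\,\frac{\beta}{1-\beta}\,Q(t)L - \eta(\lambda-1)\,C(t)\right].$$
The necessary conditions are
$$\hat H_C(Q, C, \mu) = 0 \implies C(t)^{-\theta} = \eta(\lambda-1)\,\mu(t),$$
$$\hat H_Q(Q, C, \mu) = \rho\mu(t) - \dot\mu(t) = \mu(t)\,\eta(\lambda-1)\,\lambda^{(1-\beta)/\beta}\,\frac{\beta}{1-\beta}\,L,$$
$$\lim_{t\to\infty} Q(t)\,\mu(t)\,e^{-\rho t} = 0.$$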
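Combining these conditions, we obtain the following growth rate for consumption in the social planner's allocation:
$$\frac{\dot C}{C} = \frac{1}{\theta}\left(\eta(\lambda-1)\,\lambda^{(1-\beta)/\beta}\,\frac{\beta}{1-\beta}\,L - \rho\right). \tag{15.7}$$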
Comparing this to (15.5), we can see that either could be greater. For example, when is
small, we see that $g^* > g^S$, so that there is too much innovation in the decentralized economy
relative to the social optimum.
This illustrates the contrasting influences of the standard underinvestment and the business stealing effect discussed above. In particular:
Proposition 42 In the above-described quality-improvement model, the decentralized equilibrium is not Pareto optimal, and may grow less or more rapidly than the allocation that would maximize the utility of the representative household.
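The ambiguity in Proposition 42 can be illustrated numerically. The sketch below compares the decentralized growth rate (15.5) with the planner's growth rate (15.7) at two parameter sets; the function and parameter names and all numerical values are hypothetical, chosen only to show that the sign of the gap can go either way.

```python
def decentralized_growth(eta, lam, L, theta, rho):
    """Equilibrium BGP growth rate g*, as in (15.5)."""
    return (eta * (lam - 1.0) / lam * L - rho) / (theta + 1.0 / (lam - 1.0))


def planner_growth(eta, lam, L, theta, rho, beta):
    """Social planner's consumption growth rate g^S, as in (15.7)."""
    social_return = eta * (lam - 1.0) * lam ** ((1.0 - beta) / beta) * beta / (1.0 - beta) * L
    return (social_return - rho) / theta


def growth_gap(eta, lam, L, theta, rho, beta):
    """g* - g^S: positive means too much innovation in equilibrium."""
    return decentralized_growth(eta, lam, L, theta, rho) - planner_growth(eta, lam, L, theta, rho, beta)


# Small quality step: business stealing dominates (gap > 0).
gap_small_step = growth_gap(eta=1.0, lam=1.2, L=1.0, theta=2.0, rho=0.09, beta=0.15)
# Large quality step: underinvestment dominates (gap < 0).
gap_large_step = growth_gap(eta=1.0, lam=2.0, L=1.0, theta=2.0, rho=0.09, beta=0.5)
```

In the first case the private return to displacing the incumbent exceeds the social return from a small quality improvement; in the second the planner values the large improvement by more than an entrant can appropriate.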
Chapter 16
Directed Technical Change
The framework analyzed so far assumed technical change to be neutral towards different factors, and in fact, in most applications, we limited ourselves to the Cobb-Douglas production function.
Technical change is often not neutral towards different factors of production, and the elasticity of substitution between different factors is often found not to be equal to 1.
So it is important to consider the implications of more general production functions, and think of endogenizing technology and technological differences within this more general framework. There are, however, reasons for economists' focus on the Cobb-Douglas production function. The most important one is that a general production function, associated with arbitrary technological progress, does not generate balanced growth. Instead, with a non-Cobb-Douglas production function, balanced growth requires all technical change to be labor-augmenting. Therefore, once we abandon the Cobb-Douglas production function, we need to develop a theory of why technical change is purely labor-augmenting, and more generally think about various biases in the nature of technical change.
Do we have reason to think that biased technical change is important? The answer appears to be yes: there are many examples of systematic biases in technical change. For example, the consensus among labor and macroeconomists is that technical change throughout the 20th century has been skill-biased. There is also a possible acceleration in skill-biased technical change during the past 25 years. In contrast, evidence suggests that technical change during the 19th century may have been, at least in part, skill-replacing.
This reasoning leads to the following major question: What explains these various biases and the direction of technical change?
Let us consider a model in which profit incentives determine what type of technologies
are developed. When developing technologies complementing a particular factor (say skilled
workers) is more profitable, more of these technologies will be developed. Whether the
development of these technologies makes aggregate technology more skill-biased or not will
depend on the elasticity of substitution between this factor and the rest.
What determines the relative profitability of developing different technologies?
1. The price effect: there will be stronger incentives to develop technologies when the goods produced by these technologies command higher prices.
2. The market size effect: it is more profitable to develop technologies that have a larger market. The importance of market size in innovation was much emphasized by the famous scholar of innovation, Jacob Schmookler (1966), who, for example, argued: "invention is largely an economic activity which, like other economic activities, is pursued for gain;... expected gain varies with expected sales of goods embodying the invention."
16.1 Basics and Definitions
16.1.1 Definitions
First consider what factor-augmenting and factor-biased technical change correspond to. For this purpose, take the standard constant elasticity of substitution (CES) production function
$$y = \left[\gamma (A_L L)^{\frac{\sigma-1}{\sigma}} + (1-\gamma)(A_Z Z)^{\frac{\sigma-1}{\sigma}}\right]^{\frac{\sigma}{\sigma-1}},$$
where $L$ is labor, and $Z$ denotes another factor of production, which could be capital or skilled labor.
Here $\sigma \in (0, \infty)$ is the elasticity of substitution between the two factors. $A_L$ is labor-augmenting (labor-complementary) and $A_Z$ is $Z$-complementary. The relative marginal product of the two factors is
$$\frac{MP_Z}{MP_L} = \frac{1-\gamma}{\gamma}\left(\frac{A_Z}{A_L}\right)^{\frac{\sigma-1}{\sigma}}\left(\frac{Z}{L}\right)^{-\frac{1}{\sigma}}. \tag{16.1}$$
This implies that when $\sigma > 1$, i.e., when the two factors are gross substitutes, $A_L$ is labor-biased and $A_Z$ is $Z$-biased. In contrast, when $\sigma < 1$, i.e., when the two factors are gross complements, $A_Z$ is labor-biased and $A_L$ is $Z$-biased.
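The distinction between factor-augmenting and factor-biased change can be checked numerically. The following Python sketch evaluates the CES function, compares a finite-difference ratio of marginal products with the closed form (16.1), and shows how the direction of bias flips with $\sigma$. The share parameter name `gamma` and all numerical values are assumptions for illustration; the code requires $\sigma \neq 1$.

```python
def ces_output(L, Z, A_L, A_Z, sigma, gamma):
    """CES production function with factor-augmenting terms A_L and A_Z (sigma != 1)."""
    e = (sigma - 1.0) / sigma
    return (gamma * (A_L * L) ** e + (1.0 - gamma) * (A_Z * Z) ** e) ** (1.0 / e)


def relative_marginal_product(A_Z, A_L, Z, L, sigma, gamma=0.5):
    """Closed-form MP_Z / MP_L, as in (16.1)."""
    return ((1.0 - gamma) / gamma) * (A_Z / A_L) ** ((sigma - 1.0) / sigma) * (Z / L) ** (-1.0 / sigma)


def numerical_marginal_products(L, Z, A_L, A_Z, sigma, gamma, h=1e-6):
    """Central finite differences for MP_L and MP_Z."""
    mp_L = (ces_output(L + h, Z, A_L, A_Z, sigma, gamma)
            - ces_output(L - h, Z, A_L, A_Z, sigma, gamma)) / (2.0 * h)
    mp_Z = (ces_output(L, Z + h, A_L, A_Z, sigma, gamma)
            - ces_output(L, Z - h, A_L, A_Z, sigma, gamma)) / (2.0 * h)
    return mp_L, mp_Z


def is_Z_biased(sigma):
    """True if a higher A_Z raises MP_Z / MP_L at this sigma."""
    return relative_marginal_product(2.0, 1.0, 1.0, 1.0, sigma) > relative_marginal_product(1.0, 1.0, 1.0, 1.0, sigma)
```

For example, `is_Z_biased(2.0)` holds while `is_Z_biased(0.5)` does not: with gross complements, an increase in $A_Z$ raises the relative marginal product of labor instead.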
16.1.2 Basic Model
Now we are in a position to consider a simple model of directed technical change. Assume that preferences are again given by the CRRA function
$$\int_0^\infty \frac{C^{1-\theta} - 1}{1-\theta}\,\exp(-\rho t)\,dt. \tag{16.2}$$
The budget constraint:
$$C + I + X \leq Y \equiv \left[\gamma Y_L^{\frac{\varepsilon-1}{\varepsilon}} + (1-\gamma)Y_Z^{\frac{\varepsilon-1}{\varepsilon}}\right]^{\frac{\varepsilon}{\varepsilon-1}}. \tag{16.3}$$
In words, the output aggregate is produced from two other (intermediate) goods, $Y_L$ and $Y_Z$, with elasticity of substitution $\varepsilon$. Here $Y$ can either be interpreted as the final good aggregated from the two intermediates, $Y_L$ and $Y_Z$, or $Y$ could be an index of utility defined over the two final goods, $Y_L$ and $Y_Z$. Total output is again distributed between consumption, $C$, spending on machines, $I$, and spending on R&D, $X$.
The fact that there is R&D spending signifies that I will use the lab-equipment model to expose the basic ideas, but exactly the same results apply with the knowledge-spillovers model.
Intermediate good production functions are:
$$Y_L(t) = \frac{1}{1-\beta}\left(\int_0^{N_L} x_L(j,t)^{1-\beta}\,dj\right) L^{\beta}, \tag{16.4}$$
and
$$Y_Z(t) = \frac{1}{1-\beta}\left(\int_0^{N_Z} x_Z(j,t)^{1-\beta}\,dj\right) Z^{\beta}. \tag{16.5}$$
Note here that the ranges of machines used in the two sectors are different (there are two disjoint sets of machines, though we use the index $j$ to denote either for notational simplicity).
Assume that machines to both sectors are supplied by "technology monopolists". This is a straightforward generalization of the endogenous technical change model of product variety discussed above.
Each monopolist sets a rental price $\chi_L(j,t)$ or $\chi_Z(j,t)$ for the machine it supplies to the market. These prices are potentially time-varying, but we will see that they will be constant in equilibrium.
The marginal cost of production is the same for all machines and normalized to $\psi \equiv 1-\beta$ in terms of the final good.
Price taking implies the following maximization problem for sector $L$ firms at time $t$:
$$\max_{L,\{x_L(j,t)\}}\; p_L(t)Y_L(t) - w_L(t)L - \int_0^{N_L}\chi_L(j,t)\,x_L(j,t)\,dj. \tag{16.6}$$
This gives machine demands as
$$x_L(j,t) = \left[\frac{p_L(t)}{\chi_L(j,t)}\right]^{1/\beta} L. \tag{16.7}$$
Similarly,
$$x_Z(j,t) = \left[\frac{p_Z(t)}{\chi_Z(j,t)}\right]^{1/\beta} Z. \tag{16.8}$$
Since the demand curve for machines facing the monopolist, (16.7), is iso-elastic, the profit-maximizing price will be a constant markup over marginal cost. In particular, all machine prices will be given by
$$\chi_L(j,t) = \chi_Z(j,t) = 1 \quad\text{for all } j \text{ and } t.$$
These imply that
$$x_L(j,t) = [p_L(t)]^{1/\beta} L \quad\text{for all } j,$$
and
$$x_Z(j,t) = [p_Z(t)]^{1/\beta} Z \quad\text{for all } j.$$
Substituting these into (16.4) and (16.5), we obtain
$$Y_L(t) = \frac{1}{1-\beta}\,[p_L(t)]^{\frac{1-\beta}{\beta}}\,N_L(t)\,L$$
and
$$Y_Z(t) = \frac{1}{1-\beta}\,[p_Z(t)]^{\frac{1-\beta}{\beta}}\,N_Z(t)\,Z.$$
Profits of technology monopolists at time $t$ are then obtained as
$$\pi_L(t) = \beta\,[p_L(t)]^{1/\beta} L \quad\text{and}\quad \pi_Z(t) = \beta\,[p_Z(t)]^{1/\beta} Z. \tag{16.9}$$
Let $V_Z$ and $V_L$ be the net present discounted values of new innovations. Then in steady state, we have that (dropping time dependence):
$$V_L = \frac{\beta\,p_L^{1/\beta} L}{r} \quad\text{and}\quad V_Z = \frac{\beta\,p_Z^{1/\beta} Z}{r}. \tag{16.10}$$
The comparison of these two values is of crucial importance. The greater is $V_Z$ relative to $V_L$, the greater are the incentives to develop $Z$-complementary machines, $N_Z$, rather than $N_L$.
This highlights the two effects on the direction of technical change that I mentioned above:
1. The price effect: a greater incentive to invent technologies producing more expensive goods.
2. The market size effect: a larger market for the technology leads to more innovation. The market size effect encourages innovation for the more abundant factor.
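It is straightforward from the final good production function given in (16.3) that the relative price of good $Z$ to good $L$ will be given by
$$p \equiv \frac{p_Z}{p_L} = \frac{1-\gamma}{\gamma}\left(\frac{Y_Z}{Y_L}\right)^{-\frac{1}{\varepsilon}} = \frac{1-\gamma}{\gamma}\left(p^{\frac{1-\beta}{\beta}}\,\frac{N_Z Z}{N_L L}\right)^{-\frac{1}{\varepsilon}}.$$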
Substituting for relative prices into the steady state (BGP) value functions, relative profitability is obtained as:
$$\frac{V_Z}{V_L} = \left(\frac{1-\gamma}{\gamma}\right)^{\frac{\varepsilon}{\sigma}}\left(\frac{N_Z}{N_L}\right)^{-\frac{1}{\sigma}}\left(\frac{Z}{L}\right)^{\frac{\sigma-1}{\sigma}}, \tag{16.11}$$
where
$$\sigma \equiv \varepsilon - (\varepsilon - 1)(1-\beta)$$
is the (derived) elasticity of substitution between the two factors. An increase in the relative factor supply, $Z/L$, will increase $V_Z/V_L$ as long as $\sigma > 1$ and it will reduce it if $\sigma < 1$. Therefore, the elasticity of substitution regulates whether the price effect dominates the market size effect.
Note also that we have
$$\sigma \gtreqless 1 \iff \varepsilon \gtreqless 1.$$
So the two factors will be gross substitutes when the two goods in the utility function (or the two intermediates in the production of the final good) are gross substitutes.
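The step from the relative price to (16.11) can be verified with a short Python sketch. It solves the relative-price equation in closed form, checks that the solution satisfies the original implicit equation, and evaluates relative profitability as $p^{1/\beta} Z/L$, which is what (16.10) implies. Function names and the numerical values are illustrative assumptions.

```python
def derived_sigma(eps, beta):
    """Derived elasticity of substitution between the two factors."""
    return eps - (eps - 1.0) * (1.0 - beta)


def relative_price(NZ_NL, Z_L, eps, beta, gamma):
    """Closed-form solution for p = p_Z / p_L of the implicit relative-price equation."""
    sigma = derived_sigma(eps, beta)
    return ((1.0 - gamma) / gamma) ** (beta * eps / sigma) * (NZ_NL * Z_L) ** (-beta / sigma)


def relative_profitability(NZ_NL, Z_L, eps, beta, gamma):
    """V_Z / V_L = p**(1/beta) * Z/L, from (16.10)."""
    return relative_price(NZ_NL, Z_L, eps, beta, gamma) ** (1.0 / beta) * Z_L


def relative_profitability_closed_form(NZ_NL, Z_L, eps, beta, gamma):
    """V_Z / V_L as written in (16.11)."""
    sigma = derived_sigma(eps, beta)
    return (((1.0 - gamma) / gamma) ** (eps / sigma)
            * NZ_NL ** (-1.0 / sigma)
            * Z_L ** ((sigma - 1.0) / sigma))
```

Holding $N_Z/N_L$ fixed, raising `Z_L` increases `relative_profitability` when `eps > 1` (the market size effect dominates) and lowers it when `eps < 1` (the price effect dominates).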
We have so far characterized the demand for new technologies. Next we have to determine the supply of all new technologies, which will be, in part, regulated by the technological possibilities for generating new machine varieties. Suppose as in the analysis above that new machines in the two sectors are produced by investing in lab equipment:
$$\dot N_L = \eta_L X_L \quad\text{and}\quad \dot N_Z = \eta_Z X_Z, \tag{16.12}$$
where $X$ denotes R&D expenditure.
This gives the following steady-state technology market clearing condition:
$$\eta_L V_L = \eta_Z V_Z. \tag{16.13}$$
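Then, the steady-state relative physical productivities can be solved for as
$$\left(\frac{N_Z}{N_L}\right)^* = \left(\frac{\eta_Z}{\eta_L}\right)^{\sigma}\left(\frac{1-\gamma}{\gamma}\right)^{\varepsilon}\left(\frac{Z}{L}\right)^{\sigma-1}, \tag{16.14}$$
where the *s denote that this expression refers to the steady-state value.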
Before going further, using the same type of analysis as before, we can characterize the equilibrium in this economy. Because there are two state variables now, the economy features transitional dynamics, but still has a unique balanced growth path. These results are stated in the next proposition (and left for you to prove):
Proposition 43 In the directed technical change model described here, there exists a unique balanced growth path equilibrium in which the relative technologies are given by (16.14), and consumption and output grow at the rate
$$g = \frac{1}{\theta}\left(\beta\left[(1-\gamma)^{\varepsilon}(\eta_Z Z)^{\sigma-1} + \gamma^{\varepsilon}(\eta_L L)^{\sigma-1}\right]^{\frac{1}{\sigma-1}} - \rho\right).$$
Starting from any $N_L(0) > 0$ and $N_Z(0) > 0$, the economy converges to this balanced growth path.
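A small Python sketch of the balanced growth path in Proposition 43 may help fix ideas. It computes the relative technology from (16.14) and the growth rate from the proposition, and lets one confirm that at the BGP technology ratio the relative profitability in (16.11) equals $\eta_L/\eta_Z$, as the market clearing condition (16.13) requires. It assumes $\varepsilon \neq 1$ (so $\sigma \neq 1$); all names and parameter values are illustrative.

```python
def directed_bgp(eta_L, eta_Z, L, Z, eps, beta, gamma, theta, rho):
    """BGP objects of the directed technical change model (requires eps != 1)."""
    sigma = eps - (eps - 1.0) * (1.0 - beta)
    # Interest rate implied by the growth formula in Proposition 43: g = (r - rho) / theta.
    r = beta * ((1.0 - gamma) ** eps * (eta_Z * Z) ** (sigma - 1.0)
                + gamma ** eps * (eta_L * L) ** (sigma - 1.0)) ** (1.0 / (sigma - 1.0))
    g = (r - rho) / theta
    # Relative technology N_Z / N_L from (16.14).
    n_ratio = (eta_Z / eta_L) ** sigma * ((1.0 - gamma) / gamma) ** eps * (Z / L) ** (sigma - 1.0)
    return {"sigma": sigma, "r": r, "g": g, "NZ_NL": n_ratio}


def value_ratio(NZ_NL, Z_L, eps, sigma, gamma):
    """V_Z / V_L from (16.11)."""
    return (((1.0 - gamma) / gamma) ** (eps / sigma)
            * NZ_NL ** (-1.0 / sigma)
            * Z_L ** ((sigma - 1.0) / sigma))


# Hypothetical symmetric economy.
symmetric = directed_bgp(1.0, 1.0, 1.0, 1.0, eps=2.0, beta=0.5, gamma=0.5, theta=2.0, rho=0.02)
# Hypothetical asymmetric economy: Z is more abundant and Z-research more productive.
asymmetric = directed_bgp(1.0, 2.0, 1.0, 2.0, eps=2.0, beta=0.5, gamma=0.4, theta=2.0, rho=0.02)
```

In the asymmetric case `value_ratio(asymmetric["NZ_NL"], 2.0, 2.0, asymmetric["sigma"], 0.4)` should equal `eta_L / eta_Z = 0.5`, so that researchers are indifferent between the two sectors.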
More interesting than the aggregate growth rate of the economy in this case is how the direction of technical change affects relative factor prices and how it responds to changes in relative supplies. To study this issue, recall that relative factor prices are given by
$$\frac{w_Z}{w_L} = p^{1/\beta}\,\frac{N_Z}{N_L} = \left(\frac{1-\gamma}{\gamma}\right)^{\frac{\varepsilon}{\sigma}}\left(\frac{N_Z}{N_L}\right)^{\frac{\sigma-1}{\sigma}}\left(\frac{Z}{L}\right)^{-\frac{1}{\sigma}}. \tag{16.15}$$
First, the relative factor reward, $w_Z/w_L$, is decreasing in the relative factor supply, $Z/L$. Second, the same combination of parameters, $(\sigma-1)/\sigma$, which determines whether innovation for more abundant factors is more profitable also determines whether a greater $N_Z/N_L$ (i.e., a greater relative physical productivity of factor $Z$) increases $w_Z/w_L$.
When $\sigma > 1$, greater $N_Z/N_L$ increases $w_Z/w_L$, but when $\sigma < 1$, it has the opposite effect. This implies that irrespective of whether $\sigma$ is greater than or less than one, an increase in $Z/L$ will change $N_Z/N_L$ in a direction that increases the relative reward to factor $Z$, i.e., $w_Z/w_L$. To capture this notion, let us define weak endogenous (relative) bias as the phenomenon that an increase in the relative supply of a factor changes technology in a direction that benefits the factor that is becoming more abundant.
This discussion, together with the definition of weak endogenous bias, establishes:
Proposition 44 In the above-described directed technical change model, there is always weak endogenous (relative) bias, meaning that an increase in $Z/L$ always causes relatively $Z$-biased technical change.
Substituting (16.14) into (16.15), relative factor rewards in the long run are
$$\frac{w_Z}{w_L} = \left(\frac{\eta_Z}{\eta_L}\right)^{\sigma-1}\left(\frac{1-\gamma}{\gamma}\right)^{\varepsilon}\left(\frac{Z}{L}\right)^{\sigma-2}. \tag{16.16}$$
Comparing this equation to the relative demand for a given technology, we see that the response of relative factor rewards to changes in relative supply is always more elastic in (16.16) than in (16.15), as implied by Proposition 44.
This is simply an application of the LeChatelier principle, which states that demand curves become more elastic when other factors adjust, but with a new interpretation: the relative demand curves become flatter when technology adjusts.
The more important and surprising result here is that if $\sigma$ is sufficiently large, in particular if $\sigma > 2$, the relationship between relative factor supplies and relative factor rewards can be upward sloping. Let us refer to a situation in which an increase in the relative supply of a factor changes technology so much that the relative price of the factor becoming more abundant increases as strong endogenous (relative) bias. Therefore, the analysis so far has established:
Proposition 45 In the above-described directed technical change model, if $\sigma > 2$, there is strong endogenous (relative) bias in the sense that an increase in $Z/L$ raises the relative marginal product and the relative wage of the $Z$ factor compared to the $L$ factor.
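The difference between weak and strong bias can be made concrete with a minimal Python sketch that traces the relative wage after an increase in $Z/L$: first holding technology fixed, as in (16.15), and then letting $N_Z/N_L$ adjust to its new BGP value, as in (16.14). The function names, the default parameters and the size of the supply shock are assumptions for illustration.

```python
def derived_sigma(eps, beta):
    """sigma = eps - (eps - 1)(1 - beta), written equivalently as 1 + beta (eps - 1)."""
    return 1.0 + beta * (eps - 1.0)


def bgp_relative_technology(Z_L, sigma, eps, gamma, eta_ratio):
    """(N_Z / N_L)* from (16.14), with eta_ratio = eta_Z / eta_L."""
    return eta_ratio ** sigma * ((1.0 - gamma) / gamma) ** eps * Z_L ** (sigma - 1.0)


def relative_wage(NZ_NL, Z_L, sigma, eps, gamma):
    """w_Z / w_L for a given technology, from (16.15)."""
    return (((1.0 - gamma) / gamma) ** (eps / sigma)
            * NZ_NL ** ((sigma - 1.0) / sigma)
            * Z_L ** (-1.0 / sigma))


def wage_path(Z_L_old, Z_L_new, eps, beta, gamma=0.5, eta_ratio=1.0):
    """Return (initial, short-run, long-run) relative wages after a supply shift."""
    sigma = derived_sigma(eps, beta)
    tech_old = bgp_relative_technology(Z_L_old, sigma, eps, gamma, eta_ratio)
    tech_new = bgp_relative_technology(Z_L_new, sigma, eps, gamma, eta_ratio)
    initial = relative_wage(tech_old, Z_L_old, sigma, eps, gamma)
    short_run = relative_wage(tech_old, Z_L_new, sigma, eps, gamma)  # technology fixed
    long_run = relative_wage(tech_new, Z_L_new, sigma, eps, gamma)   # technology adjusted
    return initial, short_run, long_run
```

For instance, `wage_path(1.0, 2.0, eps=4.0, beta=0.5)` has $\sigma = 2.5$ and the long-run relative wage ends above its initial level (strong bias), whereas `wage_path(1.0, 2.0, eps=2.0, beta=0.5)` has $\sigma = 1.5$ and the long-run wage recovers part of the short-run fall but stays below the initial level (weak bias only).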
16.1.3 Implications
Let us now consider the implications of this simple model of directed technical change, and in
particular of Propositions 44 and 45. One of the most interesting applications is to changes
in the skill premium. For this application, imagine that Z = H stands for skilled workers,
for example, college-educated workers. In the United States labor market, the skill premium
has shown no tendency to decline despite a very large increase in the supply of college-educated workers. On the contrary, following a brief period of decline during the 1970s in the
face of the very large increase in the supply of college-educated workers, the skill (college)
premium has increased very sharply throughout the 1980s and 1990s, to reach a level not
experienced in the postwar era. The following figure shows the general patterns by plotting
the college premium and the relative supply of college graduate workers in the United States
since WWII.
[Figure: Relative Supply of College Skills and College Premium. The college wage premium and the relative supply of college skills are plotted against year, 39 to 96.]
In the labor and macro literature, the most popular explanation for these patterns is skill-biased technological change. For example, computers or new IT technologies are argued to favor skilled workers relative to unskilled workers. But why should the economy adopt and develop more skill-biased technologies throughout the past 20 years, or more generally throughout the entire 20th century? This question becomes more relevant once we remember that during the 19th century many of the technologies that were fueling economic growth, such as the factory system and the major spinning and weaving innovations, were skill-replacing rather than skill-complementary.
Thus, in summary, we have the following stylized facts:
1. Secular skill-biased technical change increasing the demand for skills throughout the 20th century.
2. Possible acceleration in skill-biased technical change over the past 25 years.
3. Many skill-replacing technologies during the 19th century.
The current model, in particular Propositions 44 and 45, gives us a way to think about these issues.
Recall that if $\sigma > 2$, then the long-run relationship between the relative supply of skills and the skill premium is positive. With an upward sloping relative demand curve, or simply with the degree of skill bias endogenized, we have a natural explanation for all of the patterns mentioned above.
1. The increase in the number of skilled workers that has taken place throughout the 20th century is predicted to cause steady skill-biased technical change.
2. Acceleration in the increase in the number of skilled workers over the past 25 years is predicted to induce an acceleration in skill-biased technical change.
3. The large increase in the number of unskilled workers available to be employed in the factories during the 19th century could be expected to induce skill-replacing/labor-biased technical change.
In addition, this framework with endogenous technology also gives a nice interpretation for the dynamics of the college premium during the 1970s and 1980s. It is reasonable to presume that the equilibrium skill bias of technologies, $N_H/N_L$, is a sluggish variable determined by the slow buildup and development of new technologies. In this case, a rapid increase in the supply of skills would first reduce the skill premium as the economy would be moving along a constant technology (constant $N_H/N_L$) curve in the figure. After a while the technology would start adjusting, and the economy would move back to the upward sloping relative demand curve, with a very sharp increase in the college premium. This approach can therefore explain both the decline in the college premium during the 1970s and the subsequent large surge, and relates both to the large increase in the supply of skilled workers.
[Figure: relative wage against the relative supply of skills. Labels: long-run relative wage, initial relative wage, long-run relative demand for skills, short-run response, exogenous shift in relative supply.]
If on the other hand we have $\sigma < 2$, the long-run relative demand curve will be downward sloping, though again it will be shallower than the short-run relative demand curve. Then following the increase in the relative supply of skills there will be an initial decline in the skill premium (college premium), and as technology starts adjusting the skill premium will increase. But it will end up below its initial level. To explain the larger increase in the 1980s, in this case we need some exogenous skill-biased technical change. The next figure draws this case.
[Figure: relative wage against the relative supply of skills. Labels: initial relative wage, long-run relative demand for skills, long-run relative wage, short-run response, exogenous shift in relative supply.]
16.2 Equilibrium Technology Bias: Some More General Results
The above model derived the relative bias results by assuming a constant elasticity of substitution production function. In fact, the spirit of the results is much more general. The following proposition generalizes these results:
Proposition 46 Consider the above economy with two factors, $(Z, L) \in \mathbb{R}^2_+$, and two factor-augmenting technologies, $(A_Z, A_L) \in \mathbb{R}^2_+$, such that the production function is $F(A_Z Z, A_L L)$. Assume that $F$ is twice continuously differentiable, concave and homothetic in its two arguments, and that the cost of producing technologies $A_Z$ and $A_L$, $C(A_Z, A_L)$, is also twice continuously differentiable, strictly convex and homothetic in $A_Z$ and $A_L$. Denote the first derivatives of $C(A_Z, A_L)$ by $C_Z$ and $C_L$. Let $\sigma$ be the (local) elasticity of substitution between $Z$ and $L$ defined by
$$\sigma = -\left.\frac{\partial \ln(Z/L)}{\partial \ln(w_Z/w_L)}\right|_{A_Z/A_L},$$
and let
$$\delta = \frac{\partial \ln\left(C_Z(A_Z, A_L)/C_L(A_Z, A_L)\right)}{\partial \ln(A_Z/A_L)}.$$
Finally, suppose that factor supplies are given by $(\bar Z, \bar L)$ and denote equilibrium technologies by $(A_Z^*, A_L^*)$, and equilibrium factor prices by $w_Z(\bar Z, \bar L, A_Z^*, A_L^*)$ and $w_L(\bar Z, \bar L, A_Z^*, A_L^*)$. Then we have that for all $(\bar Z, \bar L)$:
$$\frac{\partial \ln(A_Z^*/A_L^*)}{\partial \ln(Z/L)} = \frac{\sigma - 1}{1 + \sigma\delta} \tag{16.17}$$
and
$$\frac{\partial \ln\left(w_Z(\bar Z, \bar L, A_Z^*, A_L^*)/w_L(\bar Z, \bar L, A_Z^*, A_L^*)\right)}{\partial \ln(A_Z/A_L)}\,\frac{\partial \ln(A_Z^*/A_L^*)}{\partial \ln(Z/L)} \geq 0, \tag{16.18}$$
so that there is always weak relative equilibrium bias. Moreover,
$$\frac{d \ln\left(w_Z(\bar Z, \bar L, A_Z^*, A_L^*)/w_L(\bar Z, \bar L, A_Z^*, A_L^*)\right)}{d \ln(Z/L)} = \frac{\sigma - 2 - \delta}{1 + \sigma\delta}, \tag{16.19}$$
so that there is strong relative equilibrium bias if $\sigma - 2 - \delta > 0$.
Proof. The proof is provided in Acemoglu (2005).
In this environment, therefore, the condition for strong equilibrium bias is $\sigma > 2 + \delta$, thus more restrictive than in the previous model. This is because costs of creating new technologies are convex. In models with knowledge spillovers, there are typically nonconvexities in the creation of new technologies as well. For example, invention of skill-biased technologies today may make further invention of skill-biased technologies easier, as in the standard "building on the shoulders of giants" specification. In that case, Acemoglu (2002) shows that the condition for an upward-sloping relative demand curve (i.e., strong relative equilibrium bias) is in fact $\sigma > 2 - \delta'$, for some other parameter $\delta' > 0$ measuring the extent of this nonconvexity. Thus in general, strong equilibrium bias requires sufficient substitutability between factors, with the exact threshold depending on the structure of costs (or the technology possibilities frontier of the economy).
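The elasticities in Proposition 46 can be cross-checked with a few lines of Python. The sketch below assumes that, for given technology, the relative wage responds to $A_Z/A_L$ with elasticity $(\sigma-1)/\sigma$ and to $Z/L$ with elasticity $-1/\sigma$, as in (16.1), and that the induced technology response is (16.17); adding the direct and induced effects should then reproduce (16.19). The symbol `delta` for the cost elasticity and the function names are assumptions.

```python
def technology_response(sigma, delta):
    """d ln(A_Z*/A_L*) / d ln(Z/L), as in (16.17)."""
    return (sigma - 1.0) / (1.0 + sigma * delta)


def long_run_wage_elasticity(sigma, delta):
    """d ln(w_Z/w_L) / d ln(Z/L) with technology adjusting, as in (16.19)."""
    return (sigma - 2.0 - delta) / (1.0 + sigma * delta)


def decomposed_wage_elasticity(sigma, delta):
    """Direct effect (-1/sigma) plus the effect working through induced technology."""
    direct = -1.0 / sigma
    induced = (sigma - 1.0) / sigma * technology_response(sigma, delta)
    return direct + induced


def has_strong_bias(sigma, delta):
    """Strong relative equilibrium bias: the long-run relative demand curve slopes up."""
    return long_run_wage_elasticity(sigma, delta) > 0.0
```

With `delta = 0` the expressions reduce to $\sigma - 1$ and $\sigma - 2$, the values found in the CES model above; with `delta > 0` a value such as `sigma = 2.5` no longer suffices for strong bias once `delta >= 0.5`.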
16.3 Endogenous Labor-Augmenting Technological Change
One of the advantages of the models of directed technical change is that they allow us to investigate why technological change might be purely labor-augmenting as required for balanced growth. Here I outline a model which generates this result (though under somewhat more restrictive assumptions than the directed technical change results we have seen so far).
16.3.1
Consider an economy consisting of $L$ unskilled workers who work in the production sector, and $S$ scientists who perform R&D. The distinction between unskilled workers and scientists is adopted to ensure that the production and R&D sectors do not compete for workers. The economy again admits a representative consumer with the usual constant relative risk aversion (CRRA) preferences:
$$\int_0^\infty \frac{C(t)^{1-\theta} - 1}{1-\theta}\,\exp(-\rho t)\,dt, \tag{16.20}$$
where $C(t)$ is consumption at time $t$ and $\theta \geq 0$ is the elasticity of marginal utility.
The budget constraint of the representative consumer requires that consumption and investment expenditures do not exceed total income:
$$C + I \leq wL + rK + \omega_S S + \Pi, \tag{16.21}$$
where $I$ denotes investment, $w$ is the wage rate of labor, $r$ is the interest rate, $K$ denotes the capital stock, $\omega_S$ is the wage rate for scientists, and $\Pi$ is total profit income. The resource constraint of the economy implies that
$$wL + rK + \omega_S S + \Pi = Y = \left[\gamma Y_L^{\frac{\varepsilon-1}{\varepsilon}} + (1-\gamma)Y_K^{\frac{\varepsilon-1}{\varepsilon}}\right]^{\frac{\varepsilon}{\varepsilon-1}}, \tag{16.22}$$
where $Y$ is an output aggregate produced from a labor-intensive and a capital-intensive good, respectively $Y_L$ and $Y_K$, with elasticity of substitution $\varepsilon$, where $0 \leq \varepsilon < \infty$. We will see below that $\varepsilon$ will also determine the short-run elasticity of substitution between capital and labor. A host of evidence suggests that this short-run elasticity between capital and labor is less than one, which in the context of this model implies that $\varepsilon < 1$.
For simplicity, let us assume that there is no depreciation of capital, so the change in the capital stock (and in the representative consumer's asset level) is given by
$$\dot K = I. \tag{16.23}$$
Let us also use this opportunity to develop a variant of the models studied above. In particular, let us assume that the labor-intensive and capital-intensive goods are produced competitively from constant elasticity of substitution (CES) production functions of labor-intensive and capital-intensive intermediates, with elasticity $\nu \equiv 1/(1-\beta)$:
$$Y_L = \left(\int_0^n y_l(i)^{\beta}\,di\right)^{1/\beta} \quad\text{and}\quad Y_K = \left(\int_0^m y_k(i)^{\beta}\,di\right)^{1/\beta}, \tag{16.24}$$
where the $y(i)$'s denote the intermediate goods and $\beta \in (0, 1)$, so that $\nu > 1$ and different intermediate goods are gross substitutes. This formulation implies that there are two different sets of intermediate goods, $n$ of those that are produced with labor, and $m$ that are produced using only capital. An increase in $n$ (an expansion in the set of labor-intensive intermediates) corresponds to labor-augmenting technical change, while an increase in $m$ corresponds to capital-augmenting technical change.
Intermediate goods are supplied by monopolists who hold the relevant patent, and are
produced linearly from their respective factors:
$$y_l(i) = l(i) \quad \text{and} \quad y_k(i) = k(i), \tag{16.25}$$
where $l(i)$ and $k(i)$ are labor and capital used in the production of good $i$. Market clearing for labor and capital then requires:
$$\int_0^n l(i)\, di = L \quad \text{and} \quad \int_0^m k(i)\, di = K. \tag{16.26}$$
To close the model, we need to specify the innovation possibilities frontier, that is, the technological possibilities for transforming resources into blueprints for new varieties of capital-intensive and labor-intensive intermediates.
Let us assume that these blueprints are created by the R&D efforts of scientists, who are, in turn, employed by R&D firms. There is free entry into the R&D sector. Once an R&D firm invents a new intermediate, it receives a perfectly enforced patent and becomes the perpetual monopolist of that intermediate. R&D firms have access to the following technologies for invention:
$$\frac{\dot n}{n} = b_l \phi(S_l) S_l - \delta \quad \text{and} \quad \frac{\dot m}{m} = b_k \phi(S_k) S_k - \delta, \tag{16.27}$$
where $b_l$, $b_k$ and $\delta$ are strictly positive constants and $\phi(\cdot)$ is a continuously differentiable and decreasing function such that $\phi(s)s$ is always increasing, and $\phi(0) < \infty$. $S_l$ and $S_k$ denote, respectively, the number of scientists working to discover new labor-intensive and capital-intensive intermediates, with the market clearing condition
$$S_l + S_k = S. \tag{16.28}$$
I also assume that the economy starts at $t = 0$ with $n(0) > 0$ and $m(0) > 0$.
Equation (16.27) implies a number of important features:
1. Technical change is directed, in the sense that the society (researchers) can generate faster improvements in one type of intermediates than the other. This feature will enable the analysis of whether equilibrium technical change will be labor- or capital-augmenting.
2. The fact that $\phi(\cdot)$ is decreasing means that there are intra-temporal decreasing returns to R&D effort; when more scientists are allocated to the invention of labor-intensive intermediates, the productivity of each declines. This might be, for example, because scientists crowd each other out in competing for the invention of similar intermediates. This decreasing returns assumption is adopted to simplify the analysis of transitional dynamics: when $\phi(\cdot)$ is constant, the behavior of $S_l$ and $S_k$ is discontinuous.
3. Research effort devoted to the invention of labor-intensive intermediates, $\phi(S_l) S_l$, leads to a proportional increase in the supply of these intermediates at the rate $b_l$, while the same effort devoted to the discovery of capital-using intermediates leads to a proportional increase at the rate $b_k$. The parameters $b_l$ and $b_k$ potentially differ since the discovery of one type of new intermediate may be technically more difficult than discovering the other type (the standard model with only labor-augmenting technical change can be thought of as the special case with $b_k = 0$). I also assume that the crowding effect captured by the function $\phi(\cdot)$ is not internalized by individual R&D firms, so each R&D firm takes the productivity of allocating one more scientist to each of the two sectors, $b_l \phi(S_l)$ or $b_k \phi(S_k)$, as given when deciding which sector to enter. The results are identical when R&D firms act non-competitively and form global research consortiums, internalizing these crowding-out effects.
4. Each intermediate disappears at the rate $\delta$, so that when there is no research effort devoted to a particular type of intermediates, its stock declines exponentially. With $\delta = 0$, the results are similar, but there will exist multiple balanced growth paths (see below).
Notice that in (16.27) scientists are "standing on the shoulders of giants" as in the model of knowledge spillovers analyzed above. In fact, equation (16.27) is a direct generalization of the accumulation equation in the one-sector knowledge spillovers model analyzed above, where we had the special form $\dot n/n = b_l \phi(S) S$.
However, when we go to an economy with two sectors, there is the issue of how innovations in one sector affect the knowledge base of the other sector. An additional assumption implicit in (16.27) is that a higher stock of knowledge accumulated in one sector benefits only that sector (i.e., a higher $n$ increases the productivity of scientists working in the $n$-sector). This, as we will see, is the crucial assumption that enables the model to generate endogenous technological change that is purely labor augmenting.
Finally, define $\bar S_l$ and $\bar S_k$ as the number of scientists required to keep the state of technology in each sector constant, i.e., $b_l \phi(\bar S_l) \bar S_l = \delta$ and $b_k \phi(\bar S_k) \bar S_k = \delta$. Let us impose:
Assumption 14 $\bar S_l + \bar S_k < S$.
This assumption implies that there are enough scientists in the society to enable technological progress in both sectors.
16.3.2
Consumer and Firm Decisions
An equilibrium in this economy is given by time paths of factor, intermediate and good prices, $w$, $r$, $\omega_S$, $[p_l(i)]_{i=0}^n$, $[p_k(i)]_{i=0}^m$, $p_L$ and $p_K$, employment, consumption and saving decisions, $[l(i)]_{i=0}^n$, $[k(i)]_{i=0}^m$, $[y_l(i)]_{i=0}^n$, $[y_k(i)]_{i=0}^m$, $C$ and $I$, and the allocation of scientists between the two sectors, $S_l$ and $S_k$, such that $[y_l(i)]_{i=0}^n$, $[y_k(i)]_{i=0}^m$, $C$ and $I$ maximize the utility of the representative consumer given factor, intermediate and good prices; $[l(i)]_{i=0}^n$, $[k(i)]_{i=0}^m$, $[p_l(i)]_{i=0}^n$ and $[p_k(i)]_{i=0}^m$ maximize profits of intermediate goods monopolists; $S_l$ and $S_k$ imply zero profits for all R&D firms; and all markets clear.
I start with the optimal consumption path of the representative consumer, which satisfies
the familiar Euler equation:
$$\frac{\dot C}{C} = \frac{1}{\theta}(r - \rho), \tag{16.29}$$
where recall that $r$ is the rate of interest. The consumption sequence $[C(t)]_0^\infty$ also satisfies the lifetime budget constraint of the representative agent (the no-Ponzi-game constraint):
$$\lim_{t \to \infty} K(t) \exp\left(-\int_0^t r(v)\, dv\right) = 0. \tag{16.30}$$
Consumer maximization gives the relative price of the capital-intensive good as:
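$$p \equiv \frac{p_K}{p_L} = \frac{1-\gamma}{\gamma}\left(\frac{Y_K}{Y_L}\right)^{-\frac{1}{\varepsilon}}, \tag{16.31}$$
where $p_K$ is the price of $Y_K$ and $p_L$ is the price of $Y_L$. To determine the level of prices, I choose the price of the consumption aggregate, $Y$, in each period as numeraire, i.e.,
$$\left[\gamma^{\varepsilon} p_L^{1-\varepsilon} + (1-\gamma)^{\varepsilon} p_K^{1-\varepsilon}\right]^{\frac{1}{1-\varepsilon}} = 1,$$
which implies that:
$$p_K = \left[\gamma^{\varepsilon} p^{\varepsilon-1} + (1-\gamma)^{\varepsilon}\right]^{\frac{1}{\varepsilon-1}} \quad \text{and} \quad p_L = \left[\gamma^{\varepsilon} + (1-\gamma)^{\varepsilon} p^{1-\varepsilon}\right]^{\frac{1}{\varepsilon-1}}. \tag{16.32}$$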
Next, consumer maximization and the CES functions in (16.24) yield the following isoelastic demand curves for intermediates:
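$$\frac{p_l(i)}{p_L} = \left(\frac{y_l(i)}{Y_L}\right)^{-\frac{1}{\nu}} \quad \text{and} \quad \frac{p_k(i)}{p_K} = \left(\frac{y_k(i)}{Y_K}\right)^{-\frac{1}{\nu}}. \tag{16.33}$$
Given these isoelastic demands, profit maximization by the monopolists implies that prices will be set as a constant markup over marginal cost (which is $w$ for the labor-intensive intermediates and $r$ for the capital-intensive intermediates):
$$p_l(i) = \left(1 - \frac{1}{\nu}\right)^{-1} w = \frac{w}{\beta} \quad \text{and} \quad p_k(i) = \left(1 - \frac{1}{\nu}\right)^{-1} r = \frac{r}{\beta}. \tag{16.34}$$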
Since, from (16.34), all labor-intensive intermediates sell at the same price, equation (16.33) implies that $y_l(i) = y_l$ for all $i$, and since all capital-intensive intermediates also sell at the same price, $y_k(i) = y_k$ for all $i$ as well. Then from the market clearing equation (16.26), we obtain
$$y_l(i) = l(i) = \frac{L}{n} \quad \text{and} \quad y_k(i) = k(i) = \frac{K}{m}. \tag{16.35}$$
Substituting (16.35) into (16.24) and integrating gives the total supply of labor- and capital-intensive goods as:
$$Y_L = n^{\frac{1-\beta}{\beta}} L \quad \text{and} \quad Y_K = m^{\frac{1-\beta}{\beta}} K. \tag{16.36}$$
These equations reiterate that $n$ and $m$ correspond to labor- and capital-augmenting technologies. Greater $n$ enables the production of a greater level of $Y_L$ for a given quantity of labor, and similarly an increase in $m$ raises the productivity of capital.
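As a quick numerical check of this aggregation step, the sketch below compares a discrete analogue of the CES aggregator in (16.24) with the closed form in (16.36), read here as $Y_L = n^{(1-\beta)/\beta} L$ with $\beta \in (0,1)$ the CES exponent. The function names and parameter values are illustrative assumptions, not part of the original notes.

```python
import math


def ces_aggregate(quantities, beta):
    """Discrete analogue of (16.24): (sum_i y_i**beta) ** (1/beta)."""
    return sum(y ** beta for y in quantities) ** (1.0 / beta)


def symmetric_output(n, factor_supply, beta):
    """Closed form (16.36): n ** ((1 - beta) / beta) times the factor supply."""
    return n ** ((1.0 - beta) / beta) * factor_supply


if __name__ == "__main__":
    beta, L = 0.5, 100.0  # hypothetical parameter values
    for n in (1, 10, 100):
        # With symmetric intermediates each variety uses L / n units of labor (16.35).
        direct = ces_aggregate([L / n] * n, beta)
        closed = symmetric_output(n, L, beta)
        print(n, direct, closed, math.isclose(direct, closed))
```

Holding labor fixed, a larger number of varieties raises the aggregate, which is the sense in which $n$ acts as labor-augmenting technology.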
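Equations (16.33), (16.34), (16.35) and (16.36) give the wage rate and the rental rate of capital as:
$$w = \beta n^{\frac{1-\beta}{\beta}} p_L \quad \text{and} \quad r = \beta m^{\frac{1-\beta}{\beta}} p_K. \tag{16.37}$$
Finally, using (16.31) and (16.36), the relative price of the capital-intensive good is
$$p \equiv \frac{p_K}{p_L} = \frac{1-\gamma}{\gamma}\left(\frac{m^{\frac{1-\beta}{\beta}} K}{n^{\frac{1-\beta}{\beta}} L}\right)^{-\frac{1}{\varepsilon}}. \tag{16.38}$$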
The value of a monopolist who invents a new $f$-intermediate, for $f = l$ or $k$, is:
$$V_f(t) = \int_t^\infty \exp\left[-\int_t^v \left(r(\tau) + \delta\right) d\tau\right] \pi_f(v)\, dv, \tag{16.39}$$
where $r(t)$ is the interest rate at date $t$, $\delta$ is the depreciation (obsolescence) rate of existing intermediates, and
$$\pi_l = \frac{1-\beta}{\beta}\frac{wL}{n} \quad \text{and} \quad \pi_k = \frac{1-\beta}{\beta}\frac{rK}{m} \tag{16.40}$$
are the flow profits from the sale of labor- and capital-intensive intermediate goods.
Scientists are paid a wage $\omega_S$, and competition between the two sectors and free entry ensure that this wage is equal to the maximum of their contribution to the value of monopolists in the two sectors. Recall that R&D firms do not internalize the crowding effects, so the marginal value of allocating one more scientist to the invention of labor-intensive intermediates is $b_l \phi(S_l) n V_l$, and for capital-intensive intermediates, it is $b_k \phi(S_k) m V_k$, where $V_l$ and $V_k$ are given by (16.39). Therefore, free entry requires:
$$\omega_S = \max\left\{b_l \phi(S_l) n V_l,\; b_k \phi(S_k) m V_k\right\}. \tag{16.41}$$
Equation (16.41) implies zero expected profits for all firms at all points in time, so $\Pi = 0$ in (16.21).
An equilibrium in this economy is therefore a set of factor prices, $w$, $r$ and $\omega_S$, that satisfy (16.37) and (16.41), good prices, $[p_l(i)]_{i=0}^n$ and $[p_k(i)]_{i=0}^m$, that satisfy (16.34), intermediate production levels given by (16.35), output levels given by (16.36), sequences of aggregate consumption and investment levels that satisfy (16.29) and (16.30), and sequences of $S_l$ and $S_k$ that satisfy (16.41).
16.3.3
Asymptotic and Balanced Growth Paths
Let us define an asymptotic path (AP) as an equilibrium path that the economy tends to as $t \to \infty$, and does not include limit cycles. In an AP, we can have either $\lim_{t\to\infty} \dot C(t)/C(t) = \infty$, i.e., consumption grows more than exponentially (explodes), or $\lim_{t\to\infty} \dot C(t)/C(t) = g_c$, i.e., the rate of consumption growth tends to a constant, possibly 0 (including the case where $\lim_{t\to\infty} C(t) = 0$ as a special case). A balanced growth path (BGP) is defined as an AP where output, consumption and the capital stock grow at the same finite constant rate, i.e., $\lim_{t\to\infty} \dot C(t)/C(t) = \lim_{t\to\infty} \dot Y(t)/Y(t) = \lim_{t\to\infty} \dot K(t)/K(t) = g$.
This subsection will show that with $\varepsilon < 1$, only BGPs can be an AP, so if the economy is going to tend to a non-cycling path, this has to be a BGP. In contrast, with $\varepsilon \ge 1$, there may exist asymptotic paths where consumption grows more than exponentially or grows at a different rate than capital, but these are not as interesting for us given our focus on $\varepsilon < 1$.
To facilitate the analysis, let us adopt the notation:
$$N \equiv n^{\frac{1-\beta}{\beta}} \quad \text{and} \quad M \equiv m^{\frac{1-\beta}{\beta}},$$
which, together with (16.36), allows me to write output in a more compact way:
$$Y = \left[\gamma (NL)^{\frac{\varepsilon-1}{\varepsilon}} + (1-\gamma)(MK)^{\frac{\varepsilon-1}{\varepsilon}}\right]^{\frac{\varepsilon}{\varepsilon-1}}. \tag{16.42}$$
In addition, I define a normalized capital stock,
$$k \equiv \frac{MK}{NL}, \tag{16.43}$$
which is a direct generalization of the normalized capital stock defined in the neoclassical growth model as capital stock divided by the effective units of labor. Here the numerator contains the effective units of capital as well, since there can be capital-augmenting technical change. Then, using (16.32), (16.37), (16.38) and (16.43), we can write the interest rate as:
$$r = R(M, k) \equiv \beta (1-\gamma) M \left[\gamma k^{\frac{1-\varepsilon}{\varepsilon}} + (1-\gamma)\right]^{\frac{1}{\varepsilon-1}}. \tag{16.44}$$
Also, define the relative share of capital, $\sigma_K$, as
$$\sigma_K \equiv \frac{rK}{wL} = pk = \frac{1-\gamma}{\gamma} k^{\frac{\varepsilon-1}{\varepsilon}}. \tag{16.45}$$
The relationship between the relative share of capital and the normalized capital stock depends on $\varepsilon$, which is the elasticity of substitution between capital-intensive and labor-intensive goods. Equation (16.45) shows that $\varepsilon$ is also the elasticity of substitution between capital and labor in this economy. In response to an increase in $k$, $\sigma_K$ will increase if $\varepsilon > 1$, and will decrease if $\varepsilon < 1$.
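The following minimal sketch evaluates the interest rate function in (16.44) and the relative capital share in (16.45), and cross-checks them against the price equations (16.32), (16.37) and (16.38). It assumes the reading $r = \beta M p_K$ and $p = \frac{1-\gamma}{\gamma} k^{-1/\varepsilon}$, with `beta` the CES exponent for intermediates, `gamma` the distribution parameter and `eps` the elasticity of substitution; all function names and numbers are hypothetical.

```python
def relative_price(k, gamma, eps):
    """Relative price p = p_K / p_L as a function of normalized capital, from (16.38)."""
    return (1.0 - gamma) / gamma * k ** (-1.0 / eps)


def price_capital_good(p, gamma, eps):
    """Price of the capital-intensive good under the numeraire choice, from (16.32)."""
    return (gamma ** eps * p ** (eps - 1.0) + (1.0 - gamma) ** eps) ** (1.0 / (eps - 1.0))


def interest_rate(M, k, beta, gamma, eps):
    """R(M, k) in (16.44)."""
    bracket = gamma * k ** ((1.0 - eps) / eps) + (1.0 - gamma)
    return beta * (1.0 - gamma) * M * bracket ** (1.0 / (eps - 1.0))


def relative_capital_share(k, gamma, eps):
    """Relative share of capital rK / (wL) in (16.45)."""
    return (1.0 - gamma) / gamma * k ** ((eps - 1.0) / eps)


if __name__ == "__main__":
    M, beta, gamma = 1.3, 0.5, 0.4  # hypothetical values
    for eps in (0.6, 1.5):          # gross complements vs. gross substitutes
        for k in (1.0, 2.0, 4.0):
            print(eps, k, interest_rate(M, k, beta, gamma, eps),
                  relative_capital_share(k, gamma, eps))
```

With `eps` below one the printed capital share falls as `k` rises, and with `eps` above one it rises, matching the comparative static stated in the text. The formulas are undefined at `eps == 1`, which the sketch does not handle.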
This analysis leads to the following crucial result:
Proposition 47 With $\varepsilon < 1$, all APs are BGPs and feature purely labor-augmenting technical change, i.e., they have $\lim_{t\to\infty} \dot M(t)/M(t) = 0$.
This result demonstrates that with $\varepsilon < 1$, i.e., with labor and capital as gross complements, the only asymptotic (non-cycling) paths will feature purely labor-augmenting technical change. There will be research effort devoted to the invention of capital-intensive intermediates, but this is only to keep the state of technology in that sector at a constant level.
This is essentially a generalization of the steady-state growth theorem, Theorem 7, which showed that balanced growth is only consistent with purely labor-augmenting technological change. This proposition shows the same is the case in this more general model.
16.3.4
The Balanced Growth Path
We saw above that with $\varepsilon < 1$, only a BGP with purely labor-augmenting technical change can be an AP. Now I show that there in fact exists a unique BGP as long as $\delta > 0$, and characterize the properties of this equilibrium path.
First note that from the Euler equation, (16.29), the BGP rate of interest has to be constant. Moreover, since from Proposition 47, $\dot M/M = 0$, equation (16.44) immediately implies that the price index for capital-intensive goods, $p_K$, and therefore, the relative price of capital-intensive goods, $p$, must remain constant.
In addition, in BGP, output, $Y$, the wage rate, $w$, and the capital stock, $K$, will all grow at a common rate, $g$. Furthermore, for $p$ to remain constant, (16.31) implies that $Y_L$ and $Y_K$ should grow at the same rate. Therefore, with $M$ constant, $n$ has to grow at the rate $\beta g/(1-\beta)$ (or $N$ has to grow at the rate $g$). We can then integrate equation (16.39), allowing for the depreciation of technologies at the rate $\delta$, and the growth of $w$, $K$ and $n$, to obtain the values of inventing labor- and capital-intensive goods as:
$$V_l = \frac{\frac{1-\beta}{\beta}\, wL/n}{r + \delta - (1-2\beta) g/(1-\beta)} \quad \text{and} \quad V_k = \frac{\frac{1-\beta}{\beta}\, rK/m}{r + \delta - g}. \tag{16.46}$$
Notice that these values also grow at a constant rate along the BGP because $w$, $K$ and $n$ are growing. The denominator for $V_l$ is different from that of $V_k$ because its BGP growth rate is lower than that of $V_k$: $n$, which is in the denominator of $\pi_l$, grows along the balanced growth path, while $m$ remains constant.
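To see where the two denominators in (16.46) come from, the sketch below integrates (16.39) numerically for a profit flow that grows at a constant rate and compares the result with the closed form $\pi / (r + \delta - g_\pi)$. It assumes, as the text states, that $n$ grows at $\beta g/(1-\beta)$, so that $\pi_l \propto wL/n$ grows at $(1-2\beta)g/(1-\beta)$ while $\pi_k \propto rK/m$ grows at $g$. Function names, the trapezoid rule and the truncation horizon are choices made here for illustration.

```python
import math


def labor_profit_growth(g, beta):
    """Growth rate of pi_l: w grows at g while n grows at beta * g / (1 - beta)."""
    return (1.0 - 2.0 * beta) * g / (1.0 - beta)


def patent_value_closed_form(profit0, r, delta, profit_growth):
    """Present value of a profit flow growing at a constant rate, as in (16.46)."""
    discount = r + delta - profit_growth
    if discount <= 0:
        raise ValueError("value is unbounded unless r + delta exceeds profit growth")
    return profit0 / discount


def patent_value_numeric(profit0, r, delta, profit_growth, horizon=400.0, steps=200_000):
    """Trapezoid-rule approximation of (16.39) with constant r, truncated at `horizon`."""
    h = horizon / steps
    total = 0.0
    for j in range(steps + 1):
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * profit0 * math.exp((profit_growth - r - delta) * j * h)
    return total * h


if __name__ == "__main__":
    g, beta, r, delta = 0.02, 0.4, 0.05, 0.02  # hypothetical values
    for label, growth in (("V_l", labor_profit_growth(g, beta)), ("V_k", g)):
        print(label, patent_value_closed_form(1.0, r, delta, growth),
              patent_value_numeric(1.0, r, delta, growth))
```

The two columns should agree closely; the comparison is only meaningful when $r + \delta$ exceeds the relevant profit growth rate, which the closed-form function checks.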
Recall that in BGP, $p$ and $m$ are constant, so there is no net capital-augmenting technical change. This implies $\phi(S_k) S_k = \delta/b_k$, i.e., $S_k = \bar S_k$ as defined above. The remaining scientists will work on labor-augmenting technical change. The growth rate of the economy is therefore
$$g^* = \frac{1-\beta}{\beta}\frac{\dot n}{n} = \frac{1-\beta}{\beta}\left[b_l \phi(S - \bar S_k)(S - \bar S_k) - \delta\right]. \tag{16.47}$$
Assumption 14 ensures that $g^* > 0$.
The Euler equation (16.29) then gives the BGP interest rate as $r^* = \rho + \theta g^*$. The interest rate has to be higher when the growth rate is higher in order to convince consumers to delay consumption, and the elasticity of marginal utility, $\theta$, determines how strong this effect needs to be.
Let $k = G(M)$ such that $M$ and $k$ are consistent with BGP (i.e., $r^* = R(M, k)$). It is clear from (16.44) that $G' > 0$, that is, there is a strictly increasing relationship between $M$ and $k$. This is because a greater $k$ implies a lower price of capital-intensive goods, so capital has to become more productive, i.e., $M$ has to increase in order to keep the interest rate at $r^*$.
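Next, let $k^*$ be the level of normalized capital such that at this normalized capital stock and at $\dot M/M = 0$, R&D firms are indifferent between capital- and labor-augmenting technical change, i.e., $b_l \phi(S - \bar S_k) n V_l = b_k \phi(\bar S_k) m V_k$, or from equation (16.46),
$$\frac{b_l \phi(S - \bar S_k)\, wL}{r^* + \delta - (1-2\beta) g^*/(1-\beta)} = \frac{b_k \phi(\bar S_k)\, r^* K}{r^* + \delta - g^*}. \tag{16.48}$$
This implies that, at $k = k^*$, the relative share of capital, $\sigma_K \equiv rK/(wL)$, must satisfy:
$$\sigma_K = \hat\sigma \equiv \frac{b_l \phi(S - \bar S_k)\,(1-\beta)\left(\rho + \delta + (\theta - 1) g^*\right)}{b_k \phi(\bar S_k)\left((1-\beta)(\rho + \delta) + \left((1-\beta)(\theta - 1) + \beta\right) g^*\right)}, \tag{16.49}$$
with $g^*$ given by (16.47). In other words, using equation (16.45), we have:
$$k = k^* \equiv \left(\frac{\gamma \hat\sigma}{1-\gamma}\right)^{\frac{\varepsilon}{\varepsilon-1}} \iff \sigma_K = \hat\sigma. \tag{16.50}$$
Finally, let $M^*$ be such that $k^* \equiv G(M^*)$, i.e., $M^*$ is the level of capital-augmenting technology that is consistent with the equilibrium interest rate taking its BGP value when $k = k^*$. As a result, when $k = k^*$ and $M = M^*$, the interest rate will be equal to $r^*$ and the relative share of capital will be $\hat\sigma$.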
In BGP, $\dot M/M = 0$, while $\dot N/N > 0$. Because of the depreciation of technologies, there must be research to invent both new labor-intensive and new capital-intensive intermediates: if there were no research directed at capital-intensive intermediates, we would have $\dot M/M < 0$. This implies that firms working to invent both types of goods have to make equal profits, so we need conditions (16.48) and (16.49) to hold, i.e., $k = k^*$, which in turn requires that $M = M^*$ so that $r = r^*$.
We can therefore state:
Proposition 48 Suppose that $\varepsilon < 1$ and $\delta > 0$. Then there exists a unique BGP where $k = k^*$ as given by (16.50), $M = M^* = G^{-1}(k^*)$, $r = r^* = \rho + \theta g^*$, and output, consumption and wages grow at the rate $g^*$ given by (16.47).
This proposition characterizes the unique BGP, which features purely labor-augmenting technical change. In this BGP, most research is devoted to the invention of labor-intensive intermediates. There is just enough capital-augmenting technical change to keep the productivity of capital constant, that is, there is no net capital-augmenting technical change. As a result, despite growth and capital deepening, factor shares remain constant in the long run. Intuitively, when the relative share of capital is equal to $\sigma_K = \hat\sigma$, R&D firms are just indifferent between inventing capital-intensive and labor-intensive intermediates; so in equilibrium they allocate their effort between the two sectors precisely to keep the relative share of capital at $\hat\sigma$. We have already seen that when $\varepsilon < 1$, the BGP with purely labor-augmenting technical change is the only possible asymptotic equilibrium path. In addition, we will also see below that, under certain conditions, this BGP is dynamically stable, so starting from different initial conditions, the economy will tend towards this growth path.
Given the CRRA preferences, the conclusion that for a BGP with constant interest rate and growth rate, we need $M = M^*$ (i.e., no net capital-augmenting technical change) is not surprising. What is important (perhaps surprising), however, is that such a BGP exists despite the possibility of capital-augmenting technical change.
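The construction of the BGP in (16.47) to (16.50) can be traced numerically. The sketch below is only an illustration under strong assumptions: the crowding function is taken to be $\phi(s) = 1/(1+s)$ (decreasing, finite at zero, with $\phi(s)s$ increasing), which is not specified in the text, and the relative capital share is read as in (16.45). Parameter names and values are hypothetical.

```python
def phi(s):
    """Assumed crowding function: decreasing, phi(0) = 1, and phi(s) * s increasing."""
    return 1.0 / (1.0 + s)


def maintenance_scientists(b, delta):
    """Solve b * phi(s) * s = delta for the assumed phi, giving s = delta / (b - delta)."""
    if b <= delta:
        raise ValueError("need b > delta for a finite solution under this phi")
    return delta / (b - delta)


def balanced_growth_path(beta, gamma, eps, rho, theta, delta, b_l, b_k, S):
    """Return g*, r*, sigma_hat, k* and M* following (16.47), (16.49), (16.50), (16.44)."""
    S_k = maintenance_scientists(b_k, delta)
    if maintenance_scientists(b_l, delta) + S_k >= S:
        raise ValueError("Assumption 14 fails: not enough scientists")
    S_l = S - S_k
    g = (1.0 - beta) / beta * (b_l * phi(S_l) * S_l - delta)           # (16.47)
    r = rho + theta * g                                                # Euler equation
    num = b_l * phi(S_l) * (1.0 - beta) * (rho + delta + (theta - 1.0) * g)
    den = b_k * phi(S_k) * ((1.0 - beta) * (rho + delta)
                            + ((1.0 - beta) * (theta - 1.0) + beta) * g)
    sigma_hat = num / den                                              # (16.49)
    k_star = (gamma * sigma_hat / (1.0 - gamma)) ** (eps / (eps - 1.0))  # (16.50)
    bracket = gamma * k_star ** ((1.0 - eps) / eps) + (1.0 - gamma)
    M_star = r / (beta * (1.0 - gamma) * bracket ** (1.0 / (eps - 1.0)))  # invert (16.44)
    return {"g": g, "r": r, "sigma_hat": sigma_hat, "k_star": k_star, "M_star": M_star}


if __name__ == "__main__":
    print(balanced_growth_path(beta=0.4, gamma=0.5, eps=0.5, rho=0.02, theta=2.0,
                               delta=0.02, b_l=0.1, b_k=0.1, S=1.0))
```

Under this functional form the number of scientists needed to offset depreciation has a closed form, which keeps the sketch free of root-finding; a different $\phi$ would need a numerical solver at that step.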
The results are similar in spirit when there is no technological depreciation, i.e., $\delta = 0$, but there are now many balanced growth paths. These paths have the same growth rate, $g^*$ (given by (16.47) evaluated at $\delta = 0$), but different factor distributions of income. This reflects that the equilibrium correspondence is lower hemicontinuous, but not continuous, in $\delta$ at $\delta = 0$. Summarizing this result:
Proposition 49 Suppose that $\varepsilon < 1$ and $\delta = 0$. Then, there exists a BGP for each $M \ge M^* \equiv G^{-1}(k^*)$, where $k^*$ is given by (16.49) and (16.50) with $\delta = 0$. In all BGPs, output, consumption, wages, and the capital stock grow at the same rate $g^*$ given by (16.47) with $\delta = 0$, and the share of labor is constant. Each BGP has a different normalized capital stock, $k = G(M)$, and a different relative share of capital, $\sigma_K$.
The intuition for the multiplicity of BGPs is simple: without depreciation, all that is required for a BGP is that labor-augmenting improvements should be more profitable than capital-augmenting improvements, i.e., $V_k \le V_l$, and this can happen for a range of capital (labor) shares.
16.3.5
Finally, we would like to know whether the economy will tend to the balanced growth path with purely labor-augmenting technical change. Here, the feature that $\varepsilon < 1$ ensures this. In particular, we have the following result (which is proved in Acemoglu, 2003):
Proposition 50 Suppose that $\delta > 0$ and $\varepsilon < 1$. Then the BGP characterized above is locally saddle-path stable.
Therefore, this model provides a framework in which technological change can be capital-augmenting in the short run or in the medium run, but in the long run it will be endogenously labor-augmenting, ensuring a balanced growth path equilibrium as in the standard neoclassical growth model.
16.3.6
Policy Implications
Despite the similarity of this model to the neoclassical one, the implications are actually quite different. Let us consider one example here. Suppose that there is taxation of capital income at the rate $\tau$, so that the budget constraint of the representative household becomes:
$$C + I \le wL + (1-\tau) rK + \omega_S S + \Pi + T.$$
It can be verified that in the standard neoclassical growth model with exogenously labor-augmenting technological change, an increase in $\tau$ will affect the capital to effective labor ratio and the share of capital in national income. In contrast, here we have:
Proposition 51 Suppose that $\delta > 0$ and $\varepsilon < 1$. Then the BGP capital share in national income is constant and independent of $\tau$.
The reason for this result is interesting: when taxes reduce the rate of return to capital, the composition of technology between capital-augmenting and labor-augmenting types adjusts endogenously in order to restore the rate of interest and the share of capital in national income back to their BGP levels. Therefore, in this model, a variety of policies affect the composition of technological change, but may have much less effect on long-run growth properties.
Chapter 17
Recitation Material: Appropriate
Technology
Thinking of the composition of technology also opens the way for us to consider issues of appropriate technologies. Recall that in previous models technological differences were often explained by assuming that technologies did not freely flow from advanced countries to less advanced ones. Why should ideas not flow and machines not be exported to poor countries? Perhaps because of distortions as in the previous model, but even when ideas can flow at no cost, productivity differences may remain.
Why? Many technologies used by LDCs are "inappropriate" because they are designed to make optimal use of the prevailing factors and conditions in DCs, where most technologies are developed.
There is a mismatch between technologies developed in the North and LDCs' weather conditions, labor force skills, etc.
Most technologies are developed in the North. For example, over 90% of the world R&D expenditure takes place in OECD economies.
17.1
Differences in Capital-Labor Ratios (Atkinson-Stiglitz)
Atkinson and Stiglitz suggested the following idea: new technologies are specific to a given capital-labor ratio. When used with different capital-labor ratios, they are less productive.
For example, suppose that the production technology is
$$Y = A(k \mid k') K^{1-\alpha} L^{\alpha} = A(k \mid k')\, k^{1-\alpha} L,$$
where $k = K/L$ is the capital-labor ratio, and $A(k \mid k')$ is the productivity of technology designed to be used with capital-labor ratio $k'$ when used instead with capital-labor ratio $k$.
For example, suppose that
$$A(k \mid k') = A \min\left\{1, \left(\frac{k}{k'}\right)^{\gamma}\right\}$$
for $\gamma \in (0, 1)$. That is, when a technology designed for the capital-labor ratio $k'$ is used with a lower capital-labor ratio, there is a loss in efficiency.
Now suppose that new technologies are developed in richer economies, which have greater capital-labor ratios. Then output in a less developed country with the capital-labor ratio $k < k'$ will be
$$Y = A(k \mid k')\, k^{1-\alpha} L = A k^{1-\alpha+\gamma} (k')^{-\gamma} L.$$
So less developed countries will produce with worse technologies. Moreover, this technological disadvantage will be larger when the gap in the capital intensity of production between these countries and the technologically advanced economies is greater.
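A small numerical sketch of this mechanism follows. It assumes the functional form above, read as a productivity term $A \min\{1, (k/k')^{\gamma}\}$ multiplying $k^{1-\alpha}$ in output per worker; the exponent names `alpha` and `gamma` and all numbers are assumptions made for illustration.

```python
def productivity(k, k_design, A=1.0, gamma=0.5):
    """Productivity of a technology designed for k_design when used at capital-labor ratio k."""
    return A * min(1.0, (k / k_design) ** gamma)


def output_per_worker(k, k_design, A=1.0, alpha=0.3, gamma=0.5):
    """Output per worker Y / L = A(k | k_design) * k ** (1 - alpha)."""
    return productivity(k, k_design, A, gamma) * k ** (1.0 - alpha)


if __name__ == "__main__":
    k_poor = 1.0  # hypothetical capital-labor ratio of the adopting country
    for k_design in (1.0, 2.0, 4.0, 8.0):
        # The wider the gap between k_design and k_poor, the larger the productivity loss.
        print(k_design, productivity(k_poor, k_design), output_per_worker(k_poor, k_design))
```

Holding the adopting country's own capital-labor ratio fixed, the printed productivity term falls as the design ratio rises, which is the static mismatch effect described above.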
The problem with this formulation is that it is static. A recent paper by Basu and Weil
(1998) presents a dynamic formulation based on the idea that A is determined not only by
the current capital-labor ratio in the technologically advanced economies, but by the whole
history of these capital-labor ratios.
17.2
The Role of Human Capital (Acemoglu-Zilibotti)
The Atkinson-Stiglitz and Basu-Weil approaches emphasize differences in capital intensity between rich and poor economies.
Another possibility is a mismatch between the skill requirements of the frontier technologies in the rich economies and the available skills in the LDCs.
Here I will outline a model where the skill requirements of new technologies are determined by directed technical change in the technologically advanced economies, and this creates a mismatch between these technologies and the supply of human capital in the LDCs.
17.2.1
A Model
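Consider two groups of countries: the North and the South, with
$$H^n/L^n > H^s/L^s.$$
All technological progress originates in the North. But the South can adopt technologies without any impediments or costs.
Preferences:
$$\int_t^\infty \frac{C(\tau)^{1-\theta} - 1}{1-\theta} \exp\left(-\rho(\tau - t)\right) d\tau.$$
Macroeconomic equilibrium:
$$C + I + X \le Y \equiv \exp\left[\int_0^1 \ln y(i)\, di\right]. \tag{17.1}$$
Here $i$ denotes either a task that needs to be performed for production, or an industry that will contribute to final output.
Technology:
$$y(i) = \left[\int_0^{N_L} k_L(i, v)^{1-\beta}\, dv\right]\left[(1-i)\, l(i)\right]^{\beta} + \left[\int_0^{N_H} k_H(i, v)^{1-\beta}\, dv\right]\left[i Z h(i)\right]^{\beta}. \tag{17.2}$$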
There are three important features embedded in this technology:
1. Each task/industry output can be produced using two alternative technologies, one using skilled workers, the other one using unskilled labor.
2. The productivities of these two technologies are parameterized by $N_L$ and $N_H$.
3. Skilled and unskilled labor have different comparative advantages across sectors. In particular, skilled workers are more productive in tasks/industries with high indices. The technological parameter, $Z$, determines how productive skilled workers are relative to unskilled workers.
There is a continuum of machines $v \in [0, N_L]$ (complementary to unskilled workers), and a continuum $v \in [0, N_H]$ (complementary to skilled workers), as in the basic directed technical change model.
The final goods sector is competitive and rents capital and labor services.
A (technology) monopolist owns the patent for each type of machine, and produces and rents machines at the rental rate $\chi_z(v)$, for $z = L, H$.
Producers of the final good $i \in [0, 1]$ are price takers. They maximize profits,
$$p(i)\, y(i) - w_L l(i) - w_H h(i) - \int_0^{N_L} \chi_L(v)\, k_L(i, v)\, dv - \int_0^{N_H} \chi_H(v)\, k_H(i, v)\, dv.$$
Solve for equilibrium $k_z(v)$, $\chi_z(v)$ and replace $k_z(v)$ in production functions:
$$k_L(i, v) = \left[(1-\beta)\, p(i)\, \left((1-i)\, l(i)\right)^{\beta} / \chi_L(v)\right]^{1/\beta}, \tag{17.3}$$
$$k_H(i, v) = \left[(1-\beta)\, p(i)\, \left(i Z h(i)\right)^{\beta} / \chi_H(v)\right]^{1/\beta}. \tag{17.4}$$
Given these isoelastic demands for machines, the optimal rental rates for the technology monopolists will again be a constant markup over marginal cost.
Substituting machine prices into (17.3), and then using the resulting expressions with (17.2), we obtain output in sector $i$ as
$$y(i) = p(i)^{(1-\beta)/\beta} \left[N_L (1-i)\, l(i) + N_H\, i Z h(i)\right].$$
Technical progress: increases in $N_L$ and $N_H$ (as in the baseline directed technical change model discussed above).
$N_L$ and $N_H$ are the only state variables in the model.
Now taking $N_L$ and $N_H$ as given, the equilibrium is straightforward to characterize.
The equilibrium will take a similar form both in the North and in the South: there exists a threshold $J \in [0, 1]$ such that skilled workers will be used only in sectors $i > J$. More explicitly, for all $i < J$, $h(i) = 0$, and for all $i > J$, $l(i) = 0$.
In equilibrium:
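$$\forall i < J: \quad p(i) = P_L (1-i)^{-\beta} \quad \text{and} \quad l(i) = L/J,$$
$$\forall i > J: \quad p(i) = P_H\, i^{-\beta} \quad \text{and} \quad h(i) = H/(1-J),$$
where $P_L$ and $P_H$ are price indices for goods produced intensively using unskilled and skilled workers, respectively.
Relative price of skill-intensive goods:
$$\frac{P_H}{P_L} = \left(\frac{N_H Z H}{N_L L}\right)^{-\beta/2}.$$
The equilibrium threshold will be given by
$$\frac{J}{1-J} = \left(\frac{N_H Z H}{N_L L}\right)^{-1/2}.$$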
Total output:
$$Y = \exp(-1)\left[(N_L L)^{1/2} + (N_H Z H)^{1/2}\right]^2.$$
wH
=Z
wL
NH
NL
1/2
ZH
L
1/2
(17.5)
Next, we have to determine $N_L$ and $N_H$. This is similar to the analysis of directed technical change we saw above. In particular, assuming no state dependence (i.e., in terms of the model of directed technical change above, the innovation possibilities frontier as given by (16.12)), steady state/balanced growth requires $\pi_H = \pi_L$. This implies that for technological equilibrium:
$$\frac{P_H^n}{P_L^n} = \left(\frac{Z H^n}{L^n}\right)^{-\beta}.$$
Hence, the equilibrium relative technologies have to satisfy:
$$\frac{N_H}{N_L} = \frac{1 - J^n}{J^n} = \frac{Z H^n}{L^n}. \tag{17.6}$$
It is interesting that in steady state, the skill premium in the North is
$$w_H^n / w_L^n = Z,$$
independent of factor endowments in the North (this is the effect of directed technical change, but also the special case corresponding to $\sigma = 2$ in terms of the directed technical change model above).
Next, assume that Southern producers take $N_L$ and $N_H$ from the North, and maximize profits. This captures the notion that the North is the technologically advanced economy, and the South is the follower.
A monopolist in each Southern country copies each new machine and sells it to the producers in its country.
In equilibrium:
$$J^s > J^n.$$
In other words, certain tasks that are performed by skilled workers in the North will be performed by unskilled workers in the South; this is simply an implication of the greater skill abundance in the North.
Technology levels $N_H$ and $N_L$ are determined in the North, and $Y^s$ grows at the same rate $g$ as in the North.
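The threshold and skill-premium formulas can be checked with a few lines of code. The sketch assumes the threshold condition is read as $J/(1-J) = (N_H Z H / (N_L L))^{-1/2}$ and the wage premium as in (17.5), with the technology ratio pinned down in the North by (17.6); function names and endowment numbers are hypothetical.

```python
def northern_technology_ratio(Z, H_n, L_n):
    """N_H / N_L implied by (17.6): technologies are tailored to Northern endowments."""
    return Z * H_n / L_n


def threshold(N_L, N_H, Z, H, L):
    """Threshold sector J: tasks above J are performed by skilled workers."""
    ratio = (N_H * Z * H / (N_L * L)) ** -0.5  # equals J / (1 - J)
    return ratio / (1.0 + ratio)


def skill_premium(N_L, N_H, Z, H, L):
    """Wage premium w_H / w_L from (17.5)."""
    return Z * (N_H / N_L) ** 0.5 * (Z * H / L) ** -0.5


if __name__ == "__main__":
    Z = 1.5
    H_n, L_n = 1.0, 1.0  # hypothetical Northern endowments
    H_s, L_s = 0.2, 1.0  # hypothetical Southern endowments (less skill-abundant)
    N_L = 1.0
    N_H = N_L * northern_technology_ratio(Z, H_n, L_n)
    print("J north:", threshold(N_L, N_H, Z, H_n, L_n))
    print("J south:", threshold(N_L, N_H, Z, H_s, L_s))
    print("premium north:", skill_premium(N_L, N_H, Z, H_n, L_n))
    print("premium south:", skill_premium(N_L, N_H, Z, H_s, L_s))
```

Under these assumptions the Southern threshold exceeds the Northern one whenever the South has a lower ratio of skilled to unskilled workers, and the Northern skill premium collapses to `Z`.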
Let us allow for the price of capital to be larger in the South than in the North (see
Jones, 1995). This will imply that capital-labor ratios may dier between countries.
17.2.2
Implications
What are the productivity implications of directed technical change in the North, and of the South importing technologies developed in the North?
Define
$$A = \frac{Y}{L + ZH}: \text{ output per effective unit of labor},$$
$$y = \frac{Y}{L + H}: \text{ output per capita}.$$
Both output per effective unit of labor and output per capita are greater in the North than in the South, even if both countries have the same cost of capital. This is true a fortiori if the cost of capital is higher in the South.
Intuition: TFP is maximized in the North.
Why? The world technologies are designed to make best use of factor abundance/scarcity in the North. For example, there are many more skilled workers in the North, so technologies developed in the North are more skill-biased than what is required in the South. Since $J^s > J^n$, these skill-biased technologies will be less useful in the South than in the North, leading to endogenous productivity differences between these countries.
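The claim that output per effective unit of labor is maximized at Northern endowments can be illustrated numerically. The sketch below takes total output to be proportional to $[(N_L L)^{1/2} + (N_H Z H)^{1/2}]^2$, dropping the multiplicative constant since it does not affect comparisons, and evaluates $Y/(L + ZH)$ at different skill endowments for a technology ratio $N_H/N_L = Z H^n / L^n$. Names and numbers are assumptions for illustration.

```python
def output_up_to_constant(N_L, N_H, Z, H, L):
    """Total output up to a positive multiplicative constant."""
    return ((N_L * L) ** 0.5 + (N_H * Z * H) ** 0.5) ** 2


def tfp(N_L, N_H, Z, H, L):
    """Output per effective unit of labor, A = Y / (L + Z * H), up to the same constant."""
    return output_up_to_constant(N_L, N_H, Z, H, L) / (L + Z * H)


if __name__ == "__main__":
    Z, N_L = 1.5, 1.0
    H_n, L_n = 1.0, 1.0            # hypothetical Northern endowments
    N_H = N_L * Z * H_n / L_n      # technology ratio chosen for the North
    for H in (0.05, 0.2, 0.5, 1.0, 2.0, 5.0):
        print(H, tfp(N_L, N_H, Z, H, 1.0))
```

By the Cauchy-Schwarz inequality the ratio is bounded above by $N_L + N_H$, with equality exactly when $ZH/L = N_H/N_L$, so any country whose skill mix differs from the North's obtains lower measured productivity from the same technologies.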
17.2.3
Calibration
Can this mechanism lead to sizable effects? Can we generate output per worker differences which resemble those in the data? Can we improve on the neoclassical model?
Comparison between:
Benchmark neoclassical model:
$$Y^c_{NC} = A\, (K^c)^{1-\beta} (L^c + Z H^c)^{\beta},$$
with $A$ (identical for all countries) chosen so as to normalize $y^{USA}_{NC} = 1$.
AZ model, allowing for differences in the price of capital:
$$Y^c_{AZ} = N_L\, (K^c)^{1-\beta} \left[(L^c)^{1/2} + \left(\frac{N_H}{N_L} Z H^c\right)^{1/2}\right]^{2\beta},$$
with $N_H/N_L = Z H^n / L^n$ and North = US. $N_L$ is chosen so as to normalize $y^{USA}_{AZ} = 1$.

                      Neoclassical model               Our model
H/L          Z      y^LDC_NC  y^5th_NC  R^2_NC     y^LDC_AZ  y^5th_AZ  R^2_AZ
Primary      1.8    0.45      0.16      0.651      0.39      0.09      0.728
Sec. att.    1.8    0.39      0.15      0.816      0.26      0.05      0.937
Sec. compl.  1.8    0.39      0.15      0.808      0.28      0.07      0.934
Higher       1.8    0.43      0.18      0.718      0.37      0.13      0.843
Primary      1.5    0.46      0.17      0.625      0.40      0.09      0.723
Sec. att.    1.5    0.41      0.16      0.757      0.28      0.05      0.931
Sec. compl.  1.5    [...]     0.17      0.745      0.31      0.08      0.918
Higher       1.5    0.45      0.19      0.666      0.39      0.14      0.803
Primary      1.0    0.49      0.21      0.540      0.41      0.10      0.707
Sec. att.    1.0    0.49      0.21      0.540      0.31      0.07      0.901
Sec. compl.  1.0    [...]     0.21      0.540      0.36      0.11      0.840
Higher       1.0    0.49      0.21      0.540      0.44      0.17      0.689

y^LDC = 0.21 (avg. GDP per worker in non-OECD).
y^5th = 0.03 (GDP of the 5th poorest country).
Chapter 18
Epilogue: Political Economy of
Growth
This course so far has been about understanding the mechanics of economic growth. The models we have seen are very useful for understanding how individuals accumulate capital, how physical and human capital affect economic growth and income levels, and how technology endogenously changes and is transferred from one country to another.
However, the major question motivating much of the analysis of economic growth is to understand why some countries are rich while some others are poor, or why some countries grow faster while others stagnate.
At some level, what we have focused on are the proximate causes of this process. Exactly as in the empirical analysis of decomposing cross-country income differences into differences in physical capital, human capital and technology, we have learned how to construct microfounded models which help us in thinking about the process of economic growth in a careful and rigorous way.
But after seeing all these models, can we answer the question of "why is Nigeria poorer than the United States?" The answer is yes and no, probably with more emphasis on the no.
Inevitably, the answer comes to preferences, policies and institutions. The models we have seen give us a way of translating differences in preferences, policies, institutions (and sometimes technology) into differences in growth rates and income levels.
Therefore, the next step in the study of economic growth is to understand why different countries adopt different policies. This is the realm of the political economy of growth or political economy of development.
These topics fall beyond the scope of this course. Nevertheless, as a pointer for those who are interested in thinking about these topics, I include a brief discussion of some of the issues here and also provide a very simple model showing how institutions and policies can be incorporated into a simple growth-type model to analyze how distributional conflict influences the growth prospects of a society.
18.1 Thinking of Institutions and Growth
As discussed above, institutions (and related policy differences originating from institutional
differences) have recently become popular in thinking about the fundamental causes of differences
in income per capita and growth performance across countries. In this context, institutions
contrast with other potential fundamental causes such as geographical differences or cultural
factors. While geographic characteristics of countries and regions may lead to differences
in the technology available to individuals or make their investments in physical and human
capital more difficult, institutional differences, associated with differences in the organization
of society, shape economic and political incentives and affect the nature of equilibria via these
channels.
18.1.1 The Impact of Institutions
Douglass North (1990, p. 3) offers the following definition: "Institutions are the rules of the
game in a society or, more formally, are the humanly devised constraints that shape human
interaction." Three important features of institutions are apparent in this definition: (1)
that they are "humanly devised," which contrasts with other potential fundamental causes,
like geographic factors, which are outside human control; (2) that they are "the rules of the
game" setting "constraints" on human behavior; (3) that their major effect will be through
incentives (see also North, 1981).
There are tremendous cross-country differences in the way that economic and political life
is organized. A voluminous literature documents large cross-country differences in economic
institutions, and a strong correlation between these institutions and economic performance;
we have seen some of this evidence in the early lectures of this course.
Knack and Keefer (1995), for instance, look at measures of property rights enforcement
compiled by international business organizations, Mauro's (1995) study looks at measures
of corruption, and work by Djankov, La Porta, Lopez-De-Silanes and Shleifer compiles
measures of entry barriers across countries, while many studies look at variation in educational
institutions and the corresponding differences in human capital. All of these authors find
substantial differences in these measures of economic institutions, and significant correlation
between these measures and various indicators of economic performance. For example,
Djankov et al. find that, while the total cost of opening a medium-size business in the United
States is less than 0.02 percent of GDP per capita in 1999, the same cost is 2.7 percent of
GDP per capita in Nigeria, 1.16 percent in Kenya, 0.91 percent in Ecuador and 4.95 percent
in the Dominican Republic. These entry barriers are highly correlated with various economic
outcomes, including the rate of economic growth and the level of development.
Nevertheless, as already discussed in the earlier lectures, this type of correlation does
not establish that the countries with worse institutions are poor because of their institutions.
After all, the United States differs from Nigeria, Kenya and the Dominican Republic in its
social, geographic, cultural and economic fundamentals, so these may be the source of their
poor economic performance. In fact, these differences may be the source of the institutional
differences themselves. Consequently, evidence based on correlation does not establish whether
institutions are important determinants of economic outcomes.
To make further progress, one needs to isolate a source of exogenous differences in institutions,
so that we approximate a situation in which a number of otherwise-identical societies
end up with different sets of institutions. European colonization of the rest of the world
provides a potential laboratory to investigate these issues. From the late 15th century,
Europeans dominated and colonized much of the rest of the globe. Together with European
dominance came the imposition of very different institutions and social power structures in
different parts of the world.
Acemoglu, Johnson and Robinson (AJR, 2001) document that in a large number of
colonies, especially those in Africa, Central America, the Caribbean and South Asia,
European powers set up "extractive states." These institutions (again broadly construed) did not
introduce much protection for private property, nor did they provide checks and balances
against the government. The explicit aim of the Europeans in these colonies was extraction
of resources, in one form or another. This colonization strategy and the associated
institutions contrast with the institutions Europeans set up in other colonies, especially in colonies
where they settled in large numbers, for example, the United States, Canada, Australia and
New Zealand. In these colonies the emphasis was on the enforcement of property rights for
a broad cross section of the society, especially smallholders, merchants and entrepreneurs.
The term "broad cross section" is emphasized here, since even in the societies with the worst
institutions, the property rights of the elite are often secure, but the vast majority of the
population enjoys no such rights and faces significant barriers preventing their participation
in many economic activities. Although investments by the elite can generate economic
growth for limited periods, for sustained growth property rights for a broad cross section
seem to be crucial (AJR, 2002a; Acemoglu, 2003).
A crucial determinant of whether Europeans chose the path of extractive institutions was
whether they settled in large numbers. In colonies where Europeans settled, the institutions
were being developed for their own future benefit. In colonies where Europeans did not
settle, their objective was to set up a highly centralized state apparatus, and other associated
institutions, to oppress the native population and facilitate the extraction of resources in
the short run. Based on this idea, AJR (2001) suggest that in places where the disease
environment made it easy for Europeans to settle, the path of institutional development
should have been different from areas where Europeans faced high mortality rates.
In practice, during the time of colonization, Europeans faced widely different mortality
rates in colonies because of differences in the prevalence of malaria and yellow fever.
These mortality rates provide a possible candidate for a source of exogenous variation in
institutions. They should not influence output today directly, but by affecting the settlement
patterns of Europeans, they may have had a first-order effect on institutional development.
Consequently, these potential settler mortality rates can be used as an instrument for broad
institutional differences across countries in an instrumental-variables estimation strategy.
The key requirement for an instrument is that it should have no direct effect on the
outcome of interest (other than its effect via the endogenous regressor). There are a number
of channels through which potential settler mortality could influence current economic
outcomes or may be correlated with other factors influencing these outcomes. Nevertheless,
there are also good reasons why, as a first approximation, these mortality rates should not
have a direct effect. Malaria and yellow fever were fatal to Europeans who had no immunity,
thus having a major effect on settlement patterns, but they had much more limited effects
on natives who, over centuries, had developed various types of immunities. The exclusion
restriction is also supported by the death rates of native populations, which appear to be
similar between areas with very different mortality rates for Europeans.
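The logic of the instrumental-variables strategy can be illustrated with a minimal sketch on
synthetic data. Everything below is an assumption made for illustration: the variable names,
the coefficients and the simulated data are hypothetical and are not the AJR (2001) data or
estimates. The sketch builds a world in which an unobserved factor raises both institutional
quality and income, so that OLS overstates the effect of institutions, while the instrument
(settler mortality) affects income only through institutions.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and x."""
    return covariance(x, y) / covariance(x, x)


def two_stage_least_squares(y, x, z):
    """2SLS slope of y on x, instrumenting x with z (one instrument, one regressor)."""
    mean_x = sum(x) / len(x)
    mean_z = sum(z) / len(z)
    first_stage_slope = ols_slope(x, z)   # first stage: regress x on z
    # Fitted values keep only the variation in x that is driven by the instrument.
    x_fitted = [mean_x + first_stage_slope * (zi - mean_z) for zi in z]
    return ols_slope(y, x_fitted)         # second stage: regress y on fitted x


def simulate_colonies(n=20000, true_effect=0.9, seed=0):
    """Generate synthetic data; every coefficient here is made up for illustration."""
    rng = random.Random(seed)
    mortality, institutions, log_income = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)           # instrument: (log) settler mortality
        u = rng.gauss(0.0, 1.0)           # unobserved factor affecting x and y
        # Higher settler mortality leads to worse institutions.
        x = -0.6 * z + 0.8 * u + 0.5 * rng.gauss(0.0, 1.0)
        # Income depends on institutions and on u, but not directly on mortality:
        # the exclusion restriction holds by construction.
        y = 1.0 + true_effect * x + u + 0.5 * rng.gauss(0.0, 1.0)
        mortality.append(z)
        institutions.append(x)
        log_income.append(y)
    return mortality, institutions, log_income


mortality, institutions, log_income = simulate_colonies()
ols_estimate = ols_slope(log_income, institutions)
iv_estimate = two_stage_least_squares(log_income, institutions, mortality)
print(f"OLS: {ols_estimate:.2f}, 2SLS: {iv_estimate:.2f} (true effect: 0.90)")
```

In this constructed example the 2SLS estimate recovers the true effect only because the
exclusion restriction is imposed in the simulation; with real data it is an assumption that
has to be defended, as in the discussion above.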
The data also show that there were major differences in the institutional development
of the high-mortality and low-mortality colonies. Moreover, consistent with the key idea
in AJR (2001), various measures of broad institutions, for example, measures of protection
against expropriation, are highly correlated with the death rates Europeans faced more than
100 years ago and with early European settlement patterns. AJR (2001) also show that these
institutional differences induced by mortality rates and European settlement patterns have
a major (and robust) effect on income per capita. For example, the estimates imply that
improving Nigeria's institutions to the level of those in Chile could, in the long run, lead
to as much as a 7-fold increase in Nigeria's income. This evidence suggests that once we
focus on potentially exogenous sources of variation, the data point to a large effect of broad
institutional differences on economic development.
Naturally, mortality rates faced by Europeans were not the only determinant of Europeans'
colonization strategies. AJR (2002) focus on another important aspect, how densely
different regions were settled before colonization. They document that in more densely
settled areas, Europeans were more likely to introduce extractive institutions because it was
more profitable for them to exploit the indigenous population, either by having them work
in plantations and mines, or by maintaining the existing system and collecting taxes and
tributes. This suggests another source of variation in institutions that may have persisted
to the present, and AJR (2002) show similarly large effects from this source of variation.
Another example that illustrates the consequences of differences in institutions is the
contrast between North and South Korea. The geopolitical balance between the Soviet
Union and the United States following World War II led to separation along the 38th parallel.
The North, under the dictatorship of Kim Il Sung, adopted a very centralized command
economy with little role for private property. In the meantime, South Korea, though far
from a free-market economy, relied on a capitalist organization of the economy, with private
ownership of the means of production, and legal protection for a range of producers, especially
those under the umbrella of the chaebols, the large family conglomerates that dominated the
South Korean economy. Although not democratic during its early phases, the South Korean
state was generally supportive of rapid development and is often credited with facilitating,
or even encouraging, investment and rapid growth in Korea.
Under these two highly contrasting regimes, the economies of North and South Korea
diverged. While South Korea grew rapidly under capitalist institutions and policies, North
Korea has experienced minimal growth since 1950, under communist institutions and policies.
Overall, a variety of evidence paints a picture in which broad institutional differences
across countries have had a major influence on their economic development. This evidence
suggests that to understand why some countries are poor we should understand why their
institutions are dysfunctional. But this is only part of a first step in the journey towards
an answer. The next question is even harder: if institutions have such a large effect on
economic riches, why do some societies choose, end up with and maintain these dysfunctional
institutions?
18.1.2 Modeling Institutional Differences
As a first step in modeling institutions, let us consider the relationship between three
institutional characteristics: (1) economic institutions; (2) political power; (3) political institutions.
As already mentioned above, economic institutions matter for economic growth because
they shape the incentives of key economic actors in society; in particular, they influence
investments in physical and human capital and technology, and the organization of
production. Economic institutions determine not only the aggregate economic growth potential of
the economy, but also the distribution of resources in the society, and herein lies part of the
problem: different institutions will be associated not only with different degrees of efficiency
and potential for economic growth, but also with different distributions of the gains across
different individuals and social groups.
How are economic institutions determined? Although various factors play a role here, including history and chance, at the end of the day, economic institutions are collective choices
of the society. And because of their influence on the distribution of economic gains, not all
individuals and groups typically prefer the same set of economic institutions. This leads
to a conflict of interest among various groups and individuals over the choice of economic
institutions, and the political power of the different groups will be the deciding factor.
The distribution of political power in society is also endogenous. To make more progress
here, let us distinguish between two components of political power: de jure (formal) and de
facto political power (see Acemoglu and Robinson, 2005). De jure political power refers to
power that originates from the political institutions in society. Political institutions, similar
to economic institutions, determine the constraints on and the incentives of the key actors,
but this time in the political sphere. Examples of political institutions include the form
of government, for example, democracy vs. dictatorship or autocracy, and the extent of
constraints on politicians and political elites.
A group of individuals, even if they are not allocated power by political institutions,
may possess political power; for example, they can revolt, use arms, hire mercenaries, co-opt
the military, or undertake protests in order to impose their wishes on society. This type of
de facto political power originates both from the ability of the group in question to solve
its collective action problem and from the economic resources available to the group (which
determine its capacity to use force against other groups).
This discussion highlights that we can think of political institutions and the distribution
of economic resources in society as two state variables, affecting how political power will
be distributed and how economic institutions will be chosen. An important notion is that
of persistence: the distribution of resources and political institutions are relatively
slow-changing and persistent. Since, like economic institutions, political institutions are collective
choices, the distribution of political power in society is the key determinant of their evolution.
This creates a central mechanism of persistence: political institutions allocate de jure political
power, and those who hold political power influence the evolution of political institutions,
and they will generally opt to maintain the political institutions that give them political
power. A second mechanism of persistence comes from the distribution of resources: when a
particular group is rich relative to others, this will increase its de facto political power and
enable it to push for economic and political institutions favorable to its interests, reproducing
the initial disparity. Despite these tendencies for persistence, the framework also emphasizes
the potential for change. In particular, shocks to the balance of de facto political power,
including changes in technologies and the international environment, have the potential to
generate major changes in political institutions, and consequently in economic institutions
and economic growth.
Therefore, what we need is a framework for organizing our approach to the determination
of economic institutions and policies, taking into account that political institutions themselves are endogenous, and are chosen for their dynamic influences on economic allocations.
A simple way of summarizing some of these ideas in the form of a flow diagram is as follows:
political institutions_t      =>  de jure political power_t
distribution of resources_t   =>  de facto political power_t
de jure political power_t & de facto political power_t
                              =>  economic institutions_t and political institutions_{t+1}
economic institutions_t       =>  economic performance_t & distribution of resources_{t+1}
This diagram illustrates both the effect of economic institutions on economic performance
and the distribution of resources in a society, and the role of the combination of de jure and
de facto political power in shaping both economic and political institutions.
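One way to make the structure of this flow diagram concrete is to write it down as a directed
graph and ask which variables lie downstream of a given state variable. The following is only
an illustrative sketch: the dictionary `CAUSAL_LINKS` and the helper `downstream` are
hypothetical names introduced here, and the graph records the direction of influence only,
not its magnitude.

```python
from collections import deque

# Arrows of the flow diagram: each key feeds into the variables listed for it.
CAUSAL_LINKS = {
    "political institutions (t)": ["de jure political power (t)"],
    "distribution of resources (t)": ["de facto political power (t)"],
    "de jure political power (t)": [
        "economic institutions (t)",
        "political institutions (t+1)",
    ],
    "de facto political power (t)": [
        "economic institutions (t)",
        "political institutions (t+1)",
    ],
    "economic institutions (t)": [
        "economic performance (t)",
        "distribution of resources (t+1)",
    ],
}


def downstream(variable, links=CAUSAL_LINKS):
    """Return every variable reachable from `variable` by following the arrows."""
    reached = set()
    queue = deque([variable])
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in reached:
                reached.add(target)
                queue.append(target)
    return reached


# Both state variables at date t feed into both state variables at date t+1,
# which is the source of persistence discussed in the text.
for state in ("political institutions (t)", "distribution of resources (t)"):
    print(state, "->", sorted(downstream(state)))
```

Both date-t state variables reach both date-(t+1) state variables in this graph, which is the
sense in which the two mechanisms of persistence operate side by side.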
18.1.3 Institutions in Action
As a brief example, consider the development of property rights in Europe during the
Middle Ages. Lack of property rights for landowners, merchants and proto-industrialists was
detrimental to economic growth during this epoch. Since political institutions at the time
placed political power in the hands of kings and various types of hereditary monarchies, such
rights were largely decided by these monarchs. The monarchs often used their powers to
expropriate producers, impose arbitrary taxation, renege on their debts, and allocate the
productive resources of society to their allies in return for economic benefits or political
support. Consequently, economic institutions during the Middle Ages provided little incentive
to invest in land, physical or human capital, or technology, and failed to foster economic
growth. These economic institutions also ensured that the monarchs controlled a large
fraction of the economic resources in society, solidifying their political power and ensuring the
continuation of the political regime.
The seventeenth century, however, witnessed major changes in the economic and political
institutions that paved the way for the development of property rights and limits on
monarchs' power, especially in England after the Civil War of 1642 and the Glorious
Revolution of 1688, and in the Netherlands after the Dutch Revolt against the Hapsburgs. How
did these major institutional changes take place? In England until the sixteenth century the
king also possessed a substantial amount of de facto political power, and leaving aside civil
wars related to royal succession, no other social group could amass sufficient de facto political
power to challenge the king. But changes in the English land market and the expansion of
Atlantic trade in the sixteenth and seventeenth centuries gradually increased the economic
fortunes, and consequently the de facto power, of landowners and merchants opposed to the
absolutist tendencies of the kings.
By the seventeenth century, the growing prosperity of the merchants and the gentry,
based both on internal and overseas, especially Atlantic, trade, enabled them to field military
forces capable of defeating the king. This de facto power overcame the Stuart monarchs
in the Civil War and Glorious Revolution, and led to a change in political institutions
that stripped the king of much of his previous power over policy. These changes in the
distribution of political power led to major changes in economic institutions, strengthening
the property rights of both land and capital owners and spurring a process of financial and
commercial expansion. The consequence was rapid economic growth, culminating in the
Industrial Revolution, and a very different distribution of economic resources from that in
the Middle Ages.
This discussion poses, and also gives clues about the answers to, two crucial questions.
First, why do the groups with conflicting interests not agree on the set of economic institutions that maximize aggregate growth? Second, why do groups with political power want
to change political institutions in their favor? In the context of the example above, why did
the gentry and merchants use their de facto political power to change political institutions
rather than simply implement the policies they wanted? The issue of commitment is at the
root of the answers to both questions.
An agreement on the efficient set of institutions is often not forthcoming because of
the complementarity between economic and political institutions and because groups with
political power cannot commit to not using their power to change the distribution of resources
in their favor. For example, economic institutions that increased the security of property
rights for land and capital owners during the Middle Ages would not have been credible
as long as the monarch monopolized political power. He could promise to respect property
rights, but then at some point, renege on his promise, as exemplified by the numerous
financial defaults by medieval kings. Credible secure property rights necessitated a reduction
in the political power of the monarch. Although these more secure property rights would
foster economic growth, they were not appealing to the monarchs who would lose their rents
from predation and expropriation as well as various other privileges associated with their
monopoly of political power. This is why the institutional changes in England as a result of
the Glorious Revolution were not simply conceded by the Stuart kings. James II had to be
deposed for the changes to take place.
The reason why political power is often used to change political institutions is related.
In a dynamic world, individuals care not only about economic outcomes today but also in
the future. In the example above, the gentry and merchants were interested in their profits
and therefore in the security of their property rights, not only in the present but also in the
future. Therefore, they would have liked to use their (de facto) political power to secure
benefits in the future as well as the present. However, commitment to future allocations (or
economic institutions) is in general not possible because decisions in the future are made by
those who hold political power at the time. If the gentry and merchants had been sure
of maintaining their de facto political power, this would not have been a problem. However, de
facto political power is often transient, for example because the collective action problems
that are solved to amass this power are likely to resurface in the future, or other groups,
especially those controlling de jure power, can become stronger in the future. Therefore,
any change in policies and economic institutions that relies purely on de facto political
power is likely to be reversed in the future. In addition, many revolutions are followed
by conflict among the revolutionaries. Recognizing this, the English gentry and merchants
strove not just to change economic institutions in their favor following their victories against
the Stuart monarchy, but also to alter political institutions and the future allocation of de
jure power. Using political power to change political institutions then emerges as a useful
strategy to make gains more durable. Consequently, political institutions and changes in
political institutions are important as ways of manipulating future political power, and thus
indirectly shaping future, as well as present, economic institutions and outcomes.
18.2 A Simple Model of Non-Growth Enhancing Institutions
Now I present a simple model of the determination of institutions in the context of
investigating their impact on economic growth. The basic setup is one in which an existing elite
is in control of political power, and uses this monopoly of political power for its own
interests even when this is costly for society at large. The model highlights
various sources of inefficiencies in policies, which in turn
translate into inefficient (non-growth enhancing) institutions. It should be noted at this
point, however, that the concept of inefficiency here is not that of Pareto inefficiency, since
when distributional issues are important, Pareto efficiency is not a strong enough concept.
An economy in which all of the resources are allocated to a single individual who has no
investment opportunities, so that growth is stifled, may nevertheless be Pareto efficient. Thus
the concept of inefficiency here is being used in the sense of "non-growth enhancing" or
"non-surplus maximizing."
The various sources of inefficiencies in policies are:
1. Revenue extraction: the group in power (the elite) will set high taxes on middle
class producers in order to extract resources from them. These taxes are distortionary. This
source of inefficiency results from the absence of non-distortionary taxes, which implies that
the distribution of resources cannot be decoupled from efficient production.
2. Factor price manipulation: the group in power may want to tax middle class
producers in order to reduce the prices of the factors they use in production. This inefficiency
arises because the elite and middle class producers compete for factors (here labor). By
taxing middle class producers, the elite ensure lower factor prices and thus higher profits for
themselves.
3. Political consolidation: to the extent that the political power of the middle class
depends on their economic resources, greater middle class profits reduce the elite's political
power and endanger their future rents. The elite will then want to tax the middle class in
order to impoverish them and consolidate their political power.
Although all three inefficiencies in policies arise because of the desire of the elite to
extract rents from the rest of the society, the analysis will reveal that of the three sources of
inefficiency, revenue extraction is typically the least harmful, since, in order to extract
revenues, the elite need to ensure that the middle class undertakes efficient investments. In
contrast, the factor price manipulation and political consolidation mechanisms encourage
the elite to directly impoverish the middle class. An interesting comparative static result
is that greater state capacity shifts the balance towards the revenue extraction mechanism,
and thus, by allowing the elite to extract resources more efficiently from other groups, may
improve the allocation of resources.
Additional inefficiencies arise when there are commitment problems on the part of the
elites, in the sense that they may renege on policy promises once key investments are made.
Following the literature on organizational economics, I refer to this as a "holdup" problem.
With holdup, taxes are typically higher and more distortionary. Holdup problems, in turn,
are likely to be important, for example, when the relevant investment decisions are long-term,
so that a range of policies will be decided after these investments are undertaken.
The inefficiencies in policies translate into inefficient institutions. Institutions determine
the framework for policy determination, and economic institutions determine both the limits
of various redistributive policies and other rules and regulations that affect the economic
transactions and productivity of producers. In the context of the simple model here, I
associate economic institutions with two features:
1. limits on taxation and redistribution, and
2. regulation on the technology used by middle class producers.
The same forces that lead to inefficient policies imply that there will be reasons for
the elite to choose inefficient economic institutions. In particular, they may not want to
guarantee enforcement of property rights for middle class producers or they may prefer
to block technology adoption by middle class producers. Holdup problems, which imply
equilibrium taxes even higher than those preferred by the elite, create a possible exception,
and may encourage the elite to use economic institutions to place credible limits on their own
future policies (taxes). This suggests that economic institutions that restrict future policies
may be more likely to arise in economies in which there are more long-term investments
and thus more room for holdup.
The model also sheds light on the conditions under which economic institutions
discourage or block technology adoption. If the source of inefficiencies in policies is revenue
extraction, the elite always wish to encourage the adoption of the most productive
technologies by the middle class. However, when the source of inefficiencies in policies is factor
price manipulation or political consolidation, the elite may want to block the adoption of
more efficient technologies, or at the very least, they would choose not to invest in
activities that would increase the productivity of middle class producers. This again reiterates
that when the factor price manipulation and political consolidation mechanisms are at work,
significantly more inefficient outcomes can emerge.
While economic institutions regulate fiscal policies and technology choices, political
institutions govern the process of collective decision-making in society. In the baseline model,
the elite have de jure political power, which means that they have the formal right to make
policy choices and influence economic decisions. To understand the inefficiencies in the
institutional framework, we need to investigate the induced preferences of different groups over
institutions. In the context of political institutions, this means asking whether the elite wish
to change the institutional structure towards a more equal distribution of political power.
The same forces that make the elite choose inefficient policies also imply that the answer
to this question is no. Consequently, despite the inefficiencies that follow, the institutional
structure with elite control tends to persist.
The framework also enables me to discuss issues of "appropriate" and "inappropriate"
institutions. Concentrating political power in the hands of the elite may have limited costs
(may even be efficient), if the elite are sufficiently productive (more productive than the
middle class). However, a change in the productivity of the elite relative to the middle class
could make a different distribution of political power more beneficial. In this case, existing
institutions, which may have previously functioned relatively well, become inappropriate to
the new economic environment. Yet there is no guarantee that there will be a change in
institutions in response to the change in environment.
Finally, I extend the framework here for analyzing changes in political institutions. Political institutions regulate the allocation of de jure political power, as in the example of
constitutions or elections determining the party in government. There is more to political
power than this type of de jure power, however. Certain groups may be able to disrupt
the existing system, for example, by solving their collective action problem and undertaking
demonstrations, unrest, protests, revolutions or military action. Each group may therefore
possess de facto political power even when excluded from de jure political power. In this
context, middle class producers, even though they have no formal say in a dictatorship or
399

an oligarchic society, may sometimes have sucient de facto political power to change the
system or at least to demand some concessions from the elite. Under these circumstances,
changes in political institutions may emerge as an equilibrium outcome. They are useful as
a way of committing to future allocations, because, by aecting the distribution of de jure
political power in the future, they shape future policies and economic allocations. Such a
commitment may be necessary when the current elite need to make concessions in response
to a shift in the distribution of de facto political power and when their ability to make
concessions within a given political system is limited. Consequently, changes in political
institutions take place when the elite are forced to respond to temporary changes in de facto
political power by changing the political system (and thus the distribution of de jure political power in the future). The analysis also shows that changes in political institutions are
less likely when political stakes are higher, because, in this case, the elite will fight and use
repression to defend the existing regime. Rents from natural resources or land tend to
increase political stakes and thus contribute to institutional persistence. Interestingly, state
capacity, which makes redistribution more efficient, also increases political stakes and may
create dynamic costs by increasing the longevity of the dictatorship of the elite.
18.2.1 Baseline Model
Consider an infinite horizon economy populated by a continuum $1 + \theta^e + \theta^m$ of risk neutral
agents, each with a discount factor equal to $\beta < 1$. There is a unique non-storable final good
denoted by $y$. The expected utility of agent $j$ at time 0 is given by:
$$ U_0^j = E_0 \sum_{t=0}^{\infty} \beta^t c_t^j, \tag{18.1} $$
where $c_t^j \in \mathbb{R}$ denotes the consumption of agent $j$ at time $t$ and $E_t$ is the expectations
operator conditional on information available at time $t$.
Agents are in three groups. The first are workers, whose only action in the model is to
supply their labor inelastically. There is a total mass 1 of workers. The second is the elite,
denoted by $e$, who initially hold political power in this society. There is a total of $\theta^e$ elites.
Finally, there are $\theta^m$ "middle class" agents, denoted by $m$. The sets of elite and middle class
producers are denoted by $S^e$ and $S^m$ respectively. With a slight abuse of notation, I will use
$j$ to denote either individual or group.
Each member of the elite and middle class has access to production opportunities, represented by the production function
$$ y_t^j = \frac{1}{1-\alpha} (A_t^j)^{\alpha} (k_t^j)^{1-\alpha} (l_t^j)^{\alpha}, \tag{18.2} $$
where $k$ denotes capital and $l$ labor. Capital is assumed to depreciate fully after use. The
Cobb-Douglas form is adopted for simplicity.
The key difference between the two groups is in their productivity. To start with, let
us assume that the productivity of each elite agent is $A^e$ in each period, and that of each
middle class agent is $A^m$. Productivity of the two groups differs, for example, because they
are engaged in different economic activities (e.g., agriculture versus manufacturing, old versus
new industries, etc.), or because they have different human capital or talent.
On the policy side, there are activity-specific tax rates on production, $\tau^e$ and $\tau^m$, which
are constrained to be nonnegative, i.e., $\tau^e \ge 0$ and $\tau^m \ge 0$. There are no other fiscal
instruments (in particular, no lump-sum non-distortionary taxes). In addition there is a
total income (rent) of $R$ from natural resources. The proceeds of taxes and revenues from
natural resources can be redistributed as nonnegative lump-sum transfers targeted towards
each group, $T^w \ge 0$, $T^m \ge 0$ and $T^e \ge 0$.
Let us also introduce a parameter $\phi \in [0, 1]$, which measures how much of the tax revenue
can be redistributed. This parameter, therefore, measures state capacity, i.e., the ability
of the state to penetrate and regulate the production relations in society (though it does
so in a highly reduced-form way). When $\phi = 0$, state capacity is so limited that all tax revenue
is lost, whereas when $\phi = 1$ we can think of a society with substantial state capacity that
is able to raise taxes and redistribute the proceeds as transfers. The government budget
constraint is
$$ T_t^w + \theta^m T_t^m + \theta^e T_t^e \le \phi \int_{j \in S^e \cup S^m} \tau_t^j y_t^j \, dj + R. \tag{18.3} $$
Let us also assume that there is a maximum scale for each firm, so that $l_t^j \le \lambda$ for all $j$
and $t$. This prevents the most productive agents in the economy from employing the entire
labor force. Since only workers can be employed, the labor market clearing condition is
$$ \int_{j \in S^e \cup S^m} l_t^j \, dj \le 1, \tag{18.4} $$
with equality corresponding to full employment. Since $l_t^j \le \lambda$, (18.4) implies that if
$$ \theta^e + \theta^m \le \frac{1}{\lambda}, \tag{ES} $$
there can never be full employment. Consequently, depending on whether Condition (ES)
holds, there will be excess demand or excess supply of labor in this economy. Throughout,
I assume that
Assumption 15
$$ \theta^e \le \frac{1}{\lambda} \quad \text{and} \quad \theta^m \le \frac{1}{\lambda}. $$
This assumption ensures that neither of the two groups will create excess demand for
labor by itself. Assumption 15 is adopted only for convenience and simplifies the notation
(by reducing the number of cases that need to be studied).
18.2.2 Economic Equilibrium
I first characterize the economic equilibrium for a given sequence of taxes, $\{\tau_t^e, \tau_t^m\}_{t=0,1,\ldots,\infty}$
(the transfers do not affect the economic equilibrium). An economic equilibrium is defined as
a sequence of wages $\{w_t\}_{t=0,1,\ldots,\infty}$, and investment and employment levels for all producers,
$\{[k_t^j, l_t^j]_{j \in S^e \cup S^m}\}_{t=0,1,\ldots,\infty}$, such that given $\{\tau_t^e, \tau_t^m\}_{t=0,1,\ldots,\infty}$ and $\{w_t\}_{t=0,1,\ldots,\infty}$, all producers
choose their investment and employment optimally and the labor market clears.
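Each producer (firm) takes wages, denoted by $w_t$, as given. Given the absence of
adjustment costs and full depreciation of capital, firms simply maximize current net profits.
Consequently, the optimization problem of each firm can be written as
$$ \max_{k_t^j, l_t^j} \; \frac{1-\tau_t^j}{1-\alpha} (A^j)^{\alpha} (k_t^j)^{1-\alpha} (l_t^j)^{\alpha} - w_t l_t^j - k_t^j, $$
where $j \in S^e \cup S^m$. This maximization yields
$$ k_t^j = (1-\tau_t^j)^{1/\alpha} A^j l_t^j, \tag{18.5} $$
and
$$ l_t^j \begin{cases} = 0 & \text{if } w_t > \frac{\alpha}{1-\alpha} (1-\tau_t^j)^{1/\alpha} A^j \\ \in [0, \lambda] & \text{if } w_t = \frac{\alpha}{1-\alpha} (1-\tau_t^j)^{1/\alpha} A^j \\ = \lambda & \text{if } w_t < \frac{\alpha}{1-\alpha} (1-\tau_t^j)^{1/\alpha} A^j \end{cases} \tag{18.6} $$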
A number of points are worth noting. First, in equation (18.6), the expression
$\alpha (1-\tau_t^j)^{1/\alpha} A^j / (1-\alpha)$ is the net marginal product of a worker employed by a producer of group
$j$. If the wage is above this amount, this producer would not employ any workers, and if it is
below, he or she would prefer to hire as many workers as possible (i.e., up to the maximum,
$\lambda$). Second, equation (18.5) highlights the source of potential inefficiency in this economy.
Producers invest in physical capital but only receive a fraction $(1-\tau_t^j)$ of the revenues.
Therefore, taxes discourage investments, creating potential inefficiencies.
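To make the step from the firm's problem to the factor demands explicit, the following worked derivation may help. It is only a sketch using the notation of this section; the symbol $\pi_t^j$ for the maximized net profit of producer $j$ is a label introduced here and does not appear in the text.

```latex
% First-order condition with respect to capital k_t^j:
(1-\tau_t^j)\,(A^j)^{\alpha}\,(k_t^j)^{-\alpha}\,(l_t^j)^{\alpha} = 1
\quad\Longrightarrow\quad
k_t^j = (1-\tau_t^j)^{1/\alpha} A^j\, l_t^j .

% Substituting this capital choice back into the objective,
% profits are linear in employment:
\pi_t^j = \left[\frac{\alpha}{1-\alpha}\,(1-\tau_t^j)^{1/\alpha} A^j - w_t\right] l_t^j .
```

Because profits are linear in $l_t^j$, the producer hires no workers, is indifferent, or hires up to the maximum scale depending on the sign of the bracketed term, which is the case distinction in (18.6).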
Combining (18.6) with (18.4), equilibrium wages are obtained as follows:
(i) If Condition (ES) holds, there is excess supply of labor and $w_t = 0$.
(ii) If Condition (ES) does not hold, then there is excess demand for labor and the
equilibrium wage is
$$ w_t = \min \left\{ \frac{\alpha}{1-\alpha} (1-\tau_t^e)^{1/\alpha} A^e, \; \frac{\alpha}{1-\alpha} (1-\tau_t^m)^{1/\alpha} A^m \right\}. \tag{18.7} $$
The form of the equilibrium wage is intuitive. Labor demand comes from two groups, the
elite and middle class producers, and when condition (ES) does not hold, their total labor
demand exceeds available labor supply, so the market clearing wage will be the minimum of
their net marginal product.
One interesting feature, which will be used below, is that when Condition (ES) does
not hold, the equilibrium wage is equal to the net productivity of one of the two groups of
producers, so either the elite or the middle class will make zero profits in equilibrium.
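The two cases for the equilibrium wage can be sketched in a few lines of Python. This is an illustration only: the function names and the parameter values in the usage note are assumptions made here, not part of the original analysis.

```python
def net_marginal_product(tau, A, alpha):
    """Net marginal product of a worker: alpha/(1-alpha) * (1-tau)^(1/alpha) * A."""
    return alpha / (1.0 - alpha) * (1.0 - tau) ** (1.0 / alpha) * A


def equilibrium_wage(tau_e, tau_m, A_e, A_m, alpha, theta_e, theta_m, lam):
    """Equilibrium wage: zero under Condition (ES), otherwise the minimum in (18.7)."""
    # Condition (ES): theta_e + theta_m <= 1/lam, i.e. excess supply of labor.
    if (theta_e + theta_m) * lam <= 1.0:
        return 0.0
    return min(net_marginal_product(tau_e, A_e, alpha),
               net_marginal_product(tau_m, A_m, alpha))
```

For instance, with `alpha = 0.5`, `A_e = 1`, `A_m = 2` and no taxes, the wage equals the elite's net marginal product when labor is scarce, and a sufficiently high tax on the middle class pushes the wage down to the middle class's (lower) net marginal product.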
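Finally, the equilibrium level of aggregate output is
$$ Y_t = \frac{1}{1-\alpha} (1-\tau_t^e)^{(1-\alpha)/\alpha} A^e \int_{j \in S^e} l_t^j \, dj + \frac{1}{1-\alpha} (1-\tau_t^m)^{(1-\alpha)/\alpha} A^m \int_{j \in S^m} l_t^j \, dj + R. \tag{18.8} $$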
The equilibrium is summarized in the following proposition:

Proposition 52 Suppose Assumption 15 holds. Then for a given sequence of taxes
$\{\tau_t^e, \tau_t^m\}_{t=0,1,\ldots,\infty}$, the equilibrium takes the following form: if Condition (ES) holds, then
$w_t = 0$, and if Condition (ES) does not hold, then $w_t$ is given by (18.7). Given the wage
sequence, factor demands are given by (18.5) and (18.6), and aggregate output is given by
(18.8).
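As a rough numerical companion to the aggregate output expression, the sketch below evaluates it under the additional assumption (made here for simplicity) that all producers within a group choose the same employment level, so that each integral reduces to the group size times employment per producer. The function name and argument names are illustrative.

```python
def aggregate_output(tau_e, tau_m, A_e, A_m, alpha, theta_e, theta_m, l_e, l_m, R):
    """Aggregate output as in (18.8), assuming symmetric employment within each group."""

    def output_per_worker(tau, A):
        # Output per worker once capital is chosen according to (18.5).
        return (1.0 - tau) ** ((1.0 - alpha) / alpha) * A / (1.0 - alpha)

    elite_output = output_per_worker(tau_e, A_e) * theta_e * l_e
    middle_output = output_per_worker(tau_m, A_m) * theta_m * l_m
    return elite_output + middle_output + R
```

Holding employment fixed, raising `tau_m` lowers output through reduced investment by middle class producers, which is the distortion highlighted by (18.5).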
18.2.3 Inefficient Policies
Now I use the above economic environment to illustrate a number of distinct sources of
inefficient policies. In this section, political institutions correspond to the dictatorship of
the elite in the sense that they allow the elite to decide the policies, so the focus will be on
the elite's desired policies. The main (potentially inefficient) policy will be a tax on middle
class producers, though more generally, this could correspond to expropriation, corruption
or entry barriers. As discussed in the introduction, there will be three mechanisms leading to
inefficient policies: (1) Revenue Extraction; (2) Factor Price Manipulation; and (3) Political
Consolidation.
To illustrate each mechanism in the simplest possible way, I will focus on a subset of the
parameter space and abstract from other interactions. Throughout, I assume that there is
an upper bound on taxation, so that $\tau_t^m \le \bar{\tau}$ and $\tau_t^e \le \bar{\tau}$, where $\bar{\tau} \le 1$. This limit can be
institutional, or may arise because of the ability of producers to hide their output or shift
into informal production.
The timing of events within each period is as follows: first, taxes are set; then, investments are made. This removes an additional source of inefficiency related to the holdup
problem whereby groups in power may seize all of the output of other agents in the economy
once it has been produced. Holdup will be discussed below.
To start with, I focus on Markov Perfect Equilibria (MPE) of this economy, where
strategies are only dependent on payoff-relevant variables. In this context, this means that
strategies are independent of past taxes and investments (since there is full depreciation). In
the dictatorship of the elite, policies will be chosen to maximize the elite's utility. Hence, a
political equilibrium is given by a sequence of policies $\{\tau_t^e, \tau_t^m, T_t^w, T_t^m, T_t^e\}_{t=0,1,\ldots,\infty}$ (satisfying
(18.3)) which maximizes the elite's utility, taking the economic equilibrium as a function of
the sequence of policies as given.
More specifically, substituting (18.5) into (18.2), we obtain elite consumption as
$$ c_t^e = \left[ \frac{\alpha}{1-\alpha} (1-\tau_t^e)^{1/\alpha} A^e - w_t \right] l_t^e + T_t^e, \tag{18.9} $$
with $w_t$ given by (18.7). This expression follows immediately by recalling that the first term
in square brackets is the after-tax profits per worker, while the second term is the equilibrium
wage. Total consumption per elite agent is given by their profits plus the lump sum transfer they
receive. Then the political equilibrium, starting at time $t = 0$, is simply given by a sequence
of $\{\tau_t^e, \tau_t^m, T_t^w, T_t^m, T_t^e\}_{t=0,1,\ldots,\infty}$ that satisfies (18.3) and maximizes the discounted utility of
the elite, $\sum_{t=0}^{\infty} \beta^t c_t^e$.
The determination of the political equilibrium is simplified further by the fact that in the
MPE with full capital depreciation, this problem is simply equivalent to maximizing (18.9).
We now characterize this political equilibrium under a number of different scenarios.
18.2.4 Revenue Extraction
To highlight this mechanism, suppose that Condition (ES) holds, so wages are constant at
zero. This removes any effect of taxation on factor prices. In this case, from (18.6), we also
have $l_t^j = \lambda$ for all producers. Also assume that $\phi > 0$ (for example, $\phi = 1$).
It is straightforward to see that the elite will never tax themselves, so $\tau_t^e = 0$, and will
redistribute all of the government revenues to themselves, so $T_t^w = T_t^m = 0$. Consequently
taxes will be set in order to maximize tax revenue, given by
$$ \text{Revenue}_t = \frac{\phi}{1-\alpha} \tau_t^m (1-\tau_t^m)^{(1-\alpha)/\alpha} A^m \lambda \theta^m + R \tag{18.10} $$
at time $t$, where the first term is obtained by substituting $l_t^m = \lambda$ and (18.5)
into (18.2), multiplying the result by $\tau_t^m$, and taking into account that there are $\theta^m$ middle class
producers and that a fraction $\phi$ of tax revenues can be redistributed. The second term is simply
the revenues from natural resources. It is clear that tax revenues are maximized by $\tau_t^m = \alpha$.
In other words, this is the tax rate that puts the elite at the peak of their Laffer curve. In
contrast, output maximization would require $\tau_t^m = 0$. However, the output-maximizing tax
rate is not an equilibrium because, despite the distortions, the elite would prefer a higher
tax rate to increase their own consumption.
At the root of this inefficiency is a limit on the tax instruments available to the elite.
If they could impose lump-sum taxes that would not distort investment, these would be
preferable. Inefficient policies here result from the redistributive desires of the elite coupled
with the absence of lump-sum taxes.
It is also interesting to note that as $\alpha$ increases, the extent of distortions is reduced,
since there are greater diminishing returns to capital and investment will not decline much
in response to taxes.
Even though $\tau_t^m = \alpha$ is the most preferred tax for the elite, the exogenous limit on
taxation may become binding, so the equilibrium tax is
$$ \tau_t^m = \tau^{RE} \equiv \min \{\alpha, \bar{\tau}\} \tag{18.11} $$
for all $t$. In this case, equilibrium taxes depend only on the production technology (in
particular, how distortionary taxes are) and on the exogenous limit on taxation. For example,
as $\alpha$ decreases and the production function becomes more linear in capital, equilibrium taxes
decline.
This discussion is summarized in the following proposition (proof in the text):
Proposition 53 Suppose Assumption 15 and Condition (ES) hold and $\phi > 0$, then the
unique political equilibrium features $\tau_t^m = \tau^{RE} \equiv \min \{\alpha, \bar{\tau}\}$ for all $t$.
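The location of the peak of the Laffer curve can be checked numerically. The sketch below evaluates the revenue expression (18.10) on a grid and picks the maximizer on $[0, \bar{\tau}]$; the grid search, the function names, and the default parameter values are all assumptions made for illustration rather than part of the original argument.

```python
def tax_revenue(tau, alpha, phi, A_m, lam, theta_m, R):
    """Revenue in (18.10) for a tax rate tau on middle class producers."""
    return (phi / (1.0 - alpha) * tau * (1.0 - tau) ** ((1.0 - alpha) / alpha)
            * A_m * lam * theta_m + R)


def revenue_maximizing_tax(alpha, tau_bar, phi=1.0, A_m=1.0, lam=1.0,
                           theta_m=0.5, R=0.0, n=10000):
    """Grid search for the revenue-maximizing tax rate on [0, tau_bar]."""
    grid = [tau_bar * i / n for i in range(n + 1)]
    return max(grid, key=lambda tau: tax_revenue(tau, alpha, phi, A_m, lam, theta_m, R))
```

With `alpha = 0.3` and `tau_bar = 1`, the maximizer is approximately 0.3, and with `tau_bar = 0.2` the bound binds, in line with the minimum in (18.11). Note that `R` shifts revenue but not the maximizing tax rate.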
18.2.5 Factor Price Manipulation
I now investigate how inefficient policies can arise in order to manipulate factor prices. To
highlight this mechanism in the simplest possible way, let us first assume that $\phi = 0$ so that
there are no direct benefits from taxation for the elite. There are indirect benefits, however,
because of the effect of taxes on factor prices, which will be present as long as the equilibrium
wage is positive. For this reason, I now suppose that Condition (ES) does not hold, so that
the equilibrium wage is given by (18.7).
Inspection of (18.7) and (18.9) then immediately reveals that the elite prefer high taxes
in order to reduce the labor demand from the middle class, and thus wages, as much as
possible. The desired tax rate for the elite is thus $\tau_t^m = 1$. Given constraints on taxation,
the equilibrium tax is $\tau_t^m = \tau^{FPM} \equiv \bar{\tau}$ for all $t$. We therefore have:
Proposition 54 Suppose Assumption 15 holds, Condition (ES) does not hold, and $\phi = 0$,
then the unique political equilibrium features $\tau_t^m = \tau^{FPM} \equiv \bar{\tau}$ for all $t$.
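A small numerical sketch can illustrate why the elite want the highest possible tax in this case. It evaluates elite profits from (18.9) with $\phi = 0$ and $\tau^e = 0$, using the wage from (18.7). The function name and the parameter values in the usage note are hypothetical, and the transfer from natural resource rents is omitted because it does not depend on the tax rate.

```python
def elite_profit_fpm(tau_m, A_e, A_m, alpha, lam):
    """Elite profits per producer when phi = 0, tau_e = 0 and labor is scarce."""
    nmp_e = alpha / (1.0 - alpha) * A_e                                    # elite net marginal product
    nmp_m = alpha / (1.0 - alpha) * (1.0 - tau_m) ** (1.0 / alpha) * A_m   # middle class, after tax
    wage = min(nmp_e, nmp_m)                                               # equation (18.7)
    # Profits are zero when the wage equals the elite's net marginal product;
    # otherwise each elite producer hires lam workers.
    return (nmp_e - wage) * lam
```

For example, with `A_e = 1`, `A_m = 1.5` and `alpha = 0.5`, elite profits are zero at `tau_m = 0` (the middle class bids the wage up to the elite's net marginal product) and are weakly increasing in `tau_m` thereafter, so the upper bound on taxation binds.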
This result suggests that the factor price manipulation mechanism generally leads to
higher taxes than the pure revenue extraction mechanism. This is because, with the factor
price manipulation mechanism, the objective of the elite is to reduce the profitability of the
middle class as much as possible, whereas for revenue extraction, the elite would like the
middle class to invest and generate revenues. It is also worth noting that, in contrast to
the pure revenue extraction case, the tax policy of the elite is not only extracting resources
from the middle class, but it is also doing so indirectly from the workers, whose wages are
being reduced because of the tax policy.
The role of $\phi = 0$ also needs to be emphasized. Taxing the middle class at the highest
rate is clearly inefficient. Why is there not a more efficient way of transferring resources
to the elite? The answer relates to the limited fiscal instruments available to the elite. In
particular, $\phi = 0$ implies that they cannot use taxes at all to extract revenues from the
middle class, so they are forced to use inefficient means of increasing their consumption, by
directly impoverishing the middle class. In the next subsection, I discuss how the factor
price manipulation mechanism works in the presence of an instrument that can directly
raise revenue from the middle class. This will illustrate that the absence of any means
of transferring resources from the middle class to the elite is not essential for the factor
price manipulation mechanism (though the absence of non-distortionary lump-sum taxes is
naturally important).
18.2.6 Revenue Extraction and Factor Price Manipulation Combined
I now combine the two effects isolated in the previous two subsections. By itself the factor
price manipulation effect led to the extreme result that the tax on the middle class should
be as high as possible. Revenue extraction, though typically another motive for imposing
taxes on the middle class, will serve to reduce the power of the factor price manipulation
effect. The reason is that high taxes also reduce the revenues extracted by the elite (moving
the economy beyond the peak of the Laffer curve), and are costly to the elite.
Characterizing the equilibrium in this case again requires the maximization of (18.9).
This is simply the same as maximizing transfers minus wage bill for each elite producer. As
before, transfers are obtained from (18.10), while wages are given by (18.7). When Condition
(ES) holds and there is excess supply of labor, wages are equal to zero, and we obtain the
same results as in the case of pure revenue extraction.
The interesting case is the one where (ES) does not hold, so that wages are not equal to
zero, and are given by the minimum of the two expressions in (18.7). Incorporating the fact
that the elite will not tax themselves and will redistribute all the revenues to themselves,
the maximization problem can be written as
$$ \max_{\tau_t^m} \; \left[ \frac{\alpha}{1-\alpha} A^e - w_t \right] l_t^e + \frac{1}{\theta^e} \left[ \frac{\phi}{1-\alpha} \tau_t^m (1-\tau_t^m)^{(1-\alpha)/\alpha} A^m l_t^m \theta^m + R \right], \tag{18.12} $$
subject to (18.7) and
$$ \theta^e l_t^e + \theta^m l_t^m = 1, \quad \text{and} \tag{18.13} $$
$$ l_t^m = \lambda \quad \text{if } (1-\tau_t^m)^{1/\alpha} A^m \ge A^e. \tag{18.14} $$
The first term in (18.12) is the elite's net revenues and the second term is the transfer they
receive. Equation (18.13) is the market clearing constraint, while (18.14) ensures that middle
class producers employ as much labor as they wish provided that their net productivity is
greater than that of elite producers.
The solution to this problem can take two different forms depending on whether (18.14)
holds in the solution. If it does, then $w = \alpha A^e / (1-\alpha)$, and elite producers make zero
profits and their only income is derived from transfers. Intuitively, this corresponds to the
case where the elite prefer to let the middle class producers undertake all of the profitable
activities and maximize tax revenues. If, on the other hand, (18.14) does not hold, then the
elite generate revenues both from their own production and from taxing the middle class
producers. In this case $w = \alpha (1-\tau^m)^{1/\alpha} A^m / (1-\alpha)$. Rather than provide a full taxonomy,
I impose the following additional assumption:
Assumption 16
$$ A^e \ge \phi (1-\alpha)^{(1-\alpha)/\alpha} A^m \frac{\theta^m}{\theta^e}. $$
This assumption ensures that the solution will always take the latter form (i.e., (18.14)
does not hold). Intuitively, this condition makes sure that the productivity gap between the
middle class and elite producers is not so large as to make it attractive for the elite to make
zero profits themselves (recall that $\phi (1-\alpha)^{(1-\alpha)/\alpha} < 1$, so if $\theta^e = \theta^m$ and $A^e = A^m$, this
condition is always satisfied).
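Consequently, when Assumption 16 holds, we have $w_t = \alpha (1-\tau_t^m)^{1/\alpha} A^m / (1-\alpha)$,
and the elite's problem simply boils down to choosing $\tau_t^m$ to maximize
$$ \frac{1}{\theta^e} \left[ \frac{\phi}{1-\alpha} \tau_t^m (1-\tau_t^m)^{(1-\alpha)/\alpha} A^m l^m \theta^m + R \right] - \frac{\alpha}{1-\alpha} (1-\tau_t^m)^{1/\alpha} A^m \lambda, \tag{18.15} $$
where I have used the fact that all elite producers will employ $\lambda$ employees, and from (18.13),
$l^m = (1 - \lambda \theta^e) / \theta^m$. (The term $\alpha A^e \lambda / (1-\alpha)$ from (18.12) is omitted because it does not depend on $\tau_t^m$.)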
The maximization of (18.15) gives
$$ \frac{\tau_t^m}{1-\tau_t^m} = \kappa(\lambda, \theta^e, \alpha, \phi) \equiv \frac{\alpha}{1-\alpha} \left( 1 + \frac{\lambda \theta^e}{(1 - \lambda \theta^e) \phi} \right). $$
The first interesting feature is that $\kappa(\lambda, \theta^e, \alpha, \phi)$ is always less than $\infty$. This implies that
$\tau_t^m$ is always less than 1, which is the desired tax rate in the case of pure factor price
manipulation. Moreover, $\kappa(\lambda, \theta^e, \alpha, \phi)$ is strictly greater than $\alpha / (1-\alpha)$, so that $\tau_t^m$ is
always greater than $\alpha$, the desired tax rate with pure revenue extraction. Therefore, the
factor price manipulation motive always increases taxes above the pure revenue maximizing
level (beyond the peak of the Laffer curve), while the revenue maximization motive reduces
taxes relative to the pure factor price manipulation case. Naturally, if this level of tax is
greater than $\bar{\tau}$, the equilibrium tax will be $\bar{\tau}$, i.e.,
$$ \tau_t^m = \tau^{COM} \equiv \min \left\{ \frac{\kappa(\lambda, \theta^e, \alpha, \phi)}{1 + \kappa(\lambda, \theta^e, \alpha, \phi)}, \; \bar{\tau} \right\}. \tag{18.16} $$
It is also interesting to look at the comparative statics of this tax rate. First, as $\phi$
increases, taxation becomes more beneficial (generates greater revenues), but $\tau^{COM}$ declines.
This might at first appear paradoxical, since one may have expected that as taxation becomes
less costly, taxes should increase. Intuition for this result follows from the observation that an
increase in $\phi$ raises the importance of revenue extraction, and as commented above, in this
case, revenue extraction is a force towards lower taxes (it makes it more costly for the elite
to move beyond the peak of the Laffer curve). Since the parameter $\phi$ is related, among other
things, to state capacity, this comparative static result suggests that higher state capacity
will translate into lower taxes, because greater state capacity enables the elite to extract
revenues from the middle class through taxation, without directly impoverishing them. In
other words, greater state capacity enables more efficient forms of resource extraction by the
groups holding political power.
Second, as $\theta^e$ increases and the number of elite producers increases, taxes also increase.
The reason for this effect is again the interplay between the revenue extraction and factor
price manipulation mechanisms. When there are more elite producers, reducing factor prices
becomes more important relative to gathering tax revenue. One interesting implication of
this discussion is that when the factor price manipulation effect is more important, there will
typically be greater inefficiencies. Finally, an increase in $\alpha$ raises taxes for exactly the same
reason as above; taxes create fewer distortions and this increases the revenue-maximizing
tax rate.
Once again summarizing the analysis:
Proposition 55 Suppose Assumptions 15 and 16 hold, Condition (ES) does not hold, and
$\phi > 0$. Then the unique political equilibrium features $\tau_t^m = \tau^{COM}$ as given by (18.16) for all
$t$. Equilibrium taxes are increasing in $\theta^e$ and $\alpha$ and decreasing in $\phi$.
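The closed form for the combined tax rate and its comparative statics can be checked with a short sketch. The code below computes $\kappa$ and $\tau^{COM}$ and also encodes the objective (18.15) so that the closed form can be compared with a brute-force maximizer. Function names and numerical values are hypothetical, and the code presumes $\lambda \theta^e < 1$ and $\phi > 0$ so that the expressions are well defined.

```python
def kappa(lam, theta_e, alpha, phi):
    """kappa(lambda, theta^e, alpha, phi) from the first-order condition of (18.15)."""
    return alpha / (1.0 - alpha) * (1.0 + lam * theta_e / ((1.0 - lam * theta_e) * phi))


def tau_com(lam, theta_e, alpha, phi, tau_bar):
    """Equilibrium tax (18.16): min{kappa / (1 + kappa), tau_bar}."""
    k = kappa(lam, theta_e, alpha, phi)
    return min(k / (1.0 + k), tau_bar)


def objective_com(tau, lam, theta_e, theta_m, alpha, phi, A_m, R):
    """Objective (18.15), with l^m = (1 - lam * theta_e) / theta_m from (18.13)."""
    l_m = (1.0 - lam * theta_e) / theta_m
    transfers = (phi / (1.0 - alpha) * tau * (1.0 - tau) ** ((1.0 - alpha) / alpha)
                 * A_m * l_m * theta_m + R) / theta_e
    wage_bill = alpha / (1.0 - alpha) * (1.0 - tau) ** (1.0 / alpha) * A_m * lam
    return transfers - wage_bill
```

As an example, with `lam = 1`, `theta_e = 0.5`, `alpha = 0.5`, `phi = 1` and no binding upper bound, `kappa` equals 2 and the tax rate is 2/3, above the pure revenue-maximizing rate of 0.5 and below 1. Lowering `phi` or raising `theta_e` or `alpha` increases the tax, in line with the comparative statics discussed above.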
18.2.7 Political Consolidation
I now discuss another reason for inefficient taxation, the desire of the elite to preserve their
political power. This mechanism has been absent so far, since the elite were assumed to
always remain in power. To illustrate it, the model needs to be modified to allow for endogenous switches of power. Institutional change will be discussed in greater detail later. For
now, let us assume that there is a probability pt in period t that political power permanently
shifts from the elite to the middle class. Once they come to power, the middle class will
pursue a policy that maximizes their own utility. When this probability is exogenous, the
previous analysis still applies. Interesting economic interactions arise when this probability
is endogenous. Here I will use a simple (reduced-form) model to illustrate the trade-offs and
assume that this probability is a function of the income level of the middle class agents, in
particular
$$ p_t = p(\theta^m c_t^m) \in [0, 1], \tag{18.17} $$
where I have used the fact that income is equal to consumption. Let us assume that $p$ is
continuous and differentiable with $p' > 0$, which captures the fact that when the middle
class producers are richer, they have greater de facto political power. This reduced-form
formulation might capture a variety of mechanisms. For example, when the middle class are
richer, they may be more successful in solving their collective action problems or they may
increase their military power.
This modification implies that the fiscal policy that maximizes current consumption
may no longer be optimal. To investigate this issue we now write the utility of elite agents
recursively, and denote it by $V^e(E)$ when they are in power and by $V^e(M)$ when the middle
class is in power. Naturally, we have
$$ V^e(E) = \max_{\tau_t^m} \left\{ \left[ \frac{\alpha}{1-\alpha} A^e - w_t \right] l_t^e + \frac{1}{\theta^e} \left[ \frac{\phi}{1-\alpha} \tau_t^m (1-\tau_t^m)^{(1-\alpha)/\alpha} A^m l_t^m \theta^m + R \right] + \beta \left[ (1-p_t) V^e(E) + p_t V^e(M) \right] \right\} $$
subject to (18.7), (18.13), (18.14) and (18.17), with
$$ p_t = p\left( \left[ \frac{\alpha}{1-\alpha} (1-\tau_t^m)^{1/\alpha} A^m l_t^m - w_t l_t^m \right] \theta^m \right). $$
I wrote $V^e(E)$ and $V^e(M)$ not as functions of time, since the structure of the problem makes
it clear that these values will be constant in equilibrium.
The first observation is that if the solution to the static problem involves $c_t^m = 0$, then
the same fiscal policy is optimal despite the risk of losing power. This implies that, as
long as Condition (ES) does not hold and Assumption 16 holds, the political consolidation
mechanism does not add an additional motive for inefficient taxation.
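To see the role of the political consolidation mechanism, suppose instead that Condition
(ES) holds. In this case, $w_t = 0$ and the optimal static policy is $\tau_t^m = \tau^{RE} \equiv \min \{\alpha, \bar{\tau}\}$ as
discussed above and implies positive profits and consumption for middle class agents. The
dynamic maximization problem then becomes
$$ V^e(E) = \max_{\tau_t^m} \left\{ \frac{\alpha}{1-\alpha} A^e \lambda + \frac{1}{\theta^e} \left[ \frac{\phi}{1-\alpha} \tau_t^m (1-\tau_t^m)^{(1-\alpha)/\alpha} A^m \lambda \theta^m + R \right] + \beta \left[ V^e(E) - p\left( \frac{\alpha}{1-\alpha} (1-\tau_t^m)^{1/\alpha} A^m \lambda \theta^m \right) \left( V^e(E) - V^e(M) \right) \right] \right\}. \tag{18.18} $$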
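The first-order condition for an interior solution can be expressed as
$$ 1 - \frac{1-\alpha}{\alpha} \frac{\tau_t^m}{1-\tau_t^m} + \beta \frac{\theta^e}{\phi} \, p'\left( \frac{\alpha}{1-\alpha} (1-\tau_t^m)^{1/\alpha} A^m \lambda \theta^m \right) \left( V^e(E) - V^e(M) \right) = 0. $$
It is clear that when $p'(\cdot) = 0$, we obtain $\tau_t^m = \tau^{RE} \equiv \min \{\alpha, \bar{\tau}\}$ as above. However,
when $p'(\cdot) > 0$, $\tau_t^m = \tau^{PC} > \tau^{RE} \equiv \min \{\alpha, \bar{\tau}\}$ as long as $V^e(E) - V^e(M) > 0$. That
$V^e(E) - V^e(M) > 0$ is the case is immediate since when the middle class are in power, they
get to tax the elite and receive all of the transfers.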
Intuitively, as with the factor price manipulation mechanism, the elite tax beyond the
peak of the Laffer curve, yet now not to increase their revenues, but to consolidate their
political power. These high taxes reduce the income of the middle class and their political
power. Consequently, there is a higher probability that the elite remain in power in the
future, enjoying the benefits of controlling the fiscal policy.
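To see how the first-order condition pushes the tax above the revenue-maximizing rate, it can be solved in closed form under two simplifying assumptions that are made here purely for illustration and are not in the text: the slope $p'$ is treated as a constant (`p_slope`), and the value gap $V^e(E) - V^e(M)$ is taken as a given number (`value_gap`) instead of being solved for as part of the recursive problem. Rearranging the condition then gives $\tau / (1-\tau) = \frac{\alpha}{1-\alpha} \left( 1 + \beta \theta^e p' (V^e(E) - V^e(M)) / \phi \right)$.

```python
def tau_pc(alpha, beta, theta_e, phi, p_slope, value_gap, tau_bar):
    """Tax rate solving the first-order condition with a constant slope p'.

    p_slope   -- assumed constant derivative of the replacement probability p
    value_gap -- assumed value of V^e(E) - V^e(M), taken as exogenous here
    """
    wedge = beta * theta_e / phi * p_slope * value_gap   # political consolidation term
    ratio = alpha / (1.0 - alpha) * (1.0 + wedge)        # equals tau / (1 - tau)
    return min(ratio / (1.0 + ratio), tau_bar)
```

With `p_slope = 0` the function returns the pure revenue extraction rate, while any positive slope combined with a positive value gap raises the tax, and more so the larger the gap. This is only a partial-equilibrium check, since in the full model the value gap itself depends on the chosen tax rate.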
An interesting comparative static is that as $R$ increases, the gap between $V^e(E)$ and
$V^e(M)$ increases, and the tax that the elite set increases as well. Intuitively, the party
in power receives the revenues from natural resources, $R$. When $R$ increases, the elite
become more willing to sacrifice tax revenue (by overtaxing the middle class) in order to
increase the probability of remaining in power, because remaining in power has now become
more valuable. This contrasts with the results so far where $R$ had no effect on taxes. More
interestingly, a higher $\phi$, i.e., greater state capacity, also increases the gap between $V^e(E)$ and
$V^e(M)$ (because this enables the group in power to raise more tax revenues) and thus implies
a higher tax rate on the middle class. Intuitively, when there is no political competition,
greater state capacity, by allowing more efficient forms of transfers, improves the allocation
of resources. But in the presence of political competition, by increasing the political stakes,
it leads to greater conflict and more distortionary policies.
Summarizing this discussion:
Proposition 56 Consider the economy with political replacement. Suppose also that Assumption 15 and Condition (ES) hold and $\phi > 0$, then the political equilibrium features
$\tau_t^m = \tau^{PC} > \tau^{RE}$ for all $t$. This tax rate is increasing in $R$ and $\phi$.
18.2.8 Subgame Perfect Versus Markov Perfect Equilibria
I have so far focused on Markov perfect equilibria (MPE). In general, such a focus can be
restrictive. In this case, however, it can be proved that subgame perfect equilibria (SPE)
coincide with the MPE. This will not be true in the next subsection, so it is useful to briefly
discuss why it is the case here.
MPE are a subset of the SPE. Loosely speaking, SPEs that are not Markovian will be
supported by some type of history-dependent punishment strategies. If there is no room
for such history dependence, SPEs will coincide with the MPEs.
In the models analyzed so far, such punishment strategies are not possible even in the
SPE. Intuitively, each individual is infinitesimal and makes its economic decisions to maximize profits. Therefore, (18.5) and (18.6) determine the factor demands uniquely in any
equilibrium. Given the factor demands, the payoffs from various policy sequences are also
uniquely pinned down. This means that the returns to various strategies for the elite are
independent of history. Consequently, there cannot be any SPEs other than the MPE characterized above. Therefore, we have:
Proposition 57 The MPEs characterized in Propositions 53-56 are the unique SPEs.
18.2.9 Lack of Commitment: Holdup
The models discussed so far featured full commitment to taxes by the elites. Using a term
from organizational economics, this corresponds to the situation without any holdup.
Holdup (lack of commitment to taxes or policies) changes the qualitative implications of
the model; if expropriation (or taxation) happens after investments, revenues generated by
investments can be ex post captured by others. These types of holdup problems are likely
to arise when the key investments are long-term, so that various policies will be determined
and implemented after these investments are made (and sunk).
The problem with holdup is that the elite will be unable to commit to a particular
tax rate before middle class producers undertake their investments (taxes will be set after
investments). This lack of commitment will generally increase the amount of taxation and
inefficiency. To illustrate this possibility, I consider the same model as above, but change the
timing of events such that first individual producers undertake their investments and then
the elite set taxes. The economic equilibrium is unchanged, and in particular, (18.5) and
(18.6) still determine factor demands, with the only difference that $\tau^m$ and $\tau^e$ now refer to
expected taxes. Naturally, in equilibrium expected and actual taxes coincide.
What is different is the calculus of the elite in setting taxes. Previously, they took
into account that higher taxes would discourage investment. Since, now, taxes are set after
investment decisions, this effect is absent. As a result, in the MPE, the elite will always want
to tax at the maximum rate, so in all cases, there is a unique MPE where $\tau_t^m = \tau^{HP} \equiv \bar{\tau}$ for
all $t$. This establishes (proof in the text):
Proposition 58 With holdup, there is a unique political equilibrium with $\tau_t^m = \tau^{HP} \equiv \bar{\tau}$ for
all $t$.
It is clear that this holdup equilibrium is more inefficient than the equilibria characterized
above. For example, imagine a situation in which Condition (ES) holds so that with the
original timing of events (without holdup), the equilibrium tax rate is $\tau_t^m = \alpha$. Consider
the extreme case where $\bar{\tau} = 1$. Now without holdup, $\tau_t^m = \alpha$ and there is positive economic
activity by the middle class producers. In contrast, with holdup, the equilibrium tax is
$\tau_t^m = 1$ and the middle class stop producing. This is naturally very costly for the elite as
well since they lose all their tax revenues.
In this model, it is no longer true that the MPE is the only SPE, since there is room
for an implicit agreement between different groups whereby the elite (credibly) promise a
different tax rate than $\bar{\tau}$. To illustrate this, consider the example where Condition (ES)
holds and $\bar{\tau} = 1$. Recall that the history of the game is the complete set of actions taken up
to that point. In the MPE, the elite raise no tax revenue from the middle class producers.
Instead, consider the following trigger-strategy combination: the elite always set $\tau^m = \alpha$
and the middle class producers invest according to (18.5) with $\tau^m = \alpha$ as long as the history
consists of $\tau^m = \alpha$ and investments have been consistent with (18.5). If there is any other
action in the history, the elite set $\tau^m = 1$ and the middle class producers invest zero. With
this strategy profile, the elite raise a tax revenue of $\phi \alpha (1-\alpha)^{(1-\alpha)/\alpha} A^m \lambda \theta^m / (1-\alpha)$ in every
period, and receive transfers worth
$$ \frac{\phi \alpha (1-\alpha)^{(1-\alpha)/\alpha} A^m \lambda \theta^m}{(1-\beta)(1-\alpha)}. \tag{18.19} $$
If, in contrast, they deviate at any point, the most profitable deviation for them is to set
$\tau^m = 1$, and they will raise
$$ \frac{\phi (1-\alpha)^{(1-\alpha)/\alpha} A^m \lambda \theta^m}{1-\alpha}. \tag{18.20} $$
The trigger-strategy profile will be an equilibrium as long as (18.19) is greater than or equal
to (18.20), which requires $\beta \ge 1 - \alpha$. Therefore we have:
Proposition 59 Consider the holdup game, and suppose that Assumption 15 and Condition
(ES) hold and $\bar{\tau} = 1$. Then for $\beta \ge 1 - \alpha$, there exists a subgame perfect equilibrium where
$\tau_t^m = \alpha$ for all $t$.
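The sustainability condition for the trigger strategy can be illustrated by comparing the two payoffs directly. The sketch below computes the discounted value of keeping the tax at $\alpha$ forever, as in (18.19), and the one-off gain from deviating to a tax of 1, as in (18.20). The function names and default parameter values are assumptions chosen for illustration.

```python
def trigger_strategy_values(alpha, beta, phi, A_m, lam, theta_m):
    """Return (value of cooperating, value of the best deviation), cf. (18.19)-(18.20)."""
    # Taxable output of middle class producers who invested expecting tau^m = alpha,
    # scaled by the share phi of revenue that can be redistributed.
    base = phi * (1.0 - alpha) ** ((1.0 - alpha) / alpha) * A_m * lam * theta_m / (1.0 - alpha)
    cooperate = alpha * base / (1.0 - beta)   # tax at rate alpha in every period, discounted
    deviate = base                            # tax everything once, then no investment ever again
    return cooperate, deviate


def is_sustainable(alpha, beta, phi=1.0, A_m=1.0, lam=1.0, theta_m=0.5):
    """True when the trigger-strategy profile is an equilibrium."""
    cooperate, deviate = trigger_strategy_values(alpha, beta, phi, A_m, lam, theta_m)
    return cooperate >= deviate
```

Since the common factor cancels, the comparison reduces to $\alpha / (1-\beta) \ge 1$: for `alpha = 0.4`, the agreement is sustainable with `beta = 0.7` but not with `beta = 0.5`.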
An important implication of this result is that in societies where there are greater holdup
problems, for example, because typical investments involve longer horizons, there is room for
coordinating on a subgame perfect equilibrium supported by an implicit agreement (trigger
strategy profile) between the elite and the rest of the society.
18.2.10 Technology Adoption and Holdup
Suppose now that taxes are set before investments, so the source of holdup in the previous
subsection is absent. Instead, suppose that at time $t = 0$ before any economic decisions or
policy choices are made, middle class agents can invest to increase their productivity. In
particular, suppose that there is a cost $\Gamma(A^m)$ of investing in productivity $A^m$. The function
$\Gamma$ is non-negative, continuously differentiable and convex. This investment is made once and
the resulting productivity $A^m$ applies forever after.
Once investments in technology are made, the game proceeds as before. Since investments in technology are sunk after date t = 0, the equilibrium allocations are the same as in
the results presented above. Another interesting question is whether, if they could, the elite
would prefer to commit to a tax rate sequence at time t = 0.
The analysis of this case follows closely that of the baseline model, and I simply state
the results (without proofs to save space):
Proposition 60 Consider the game with technology adoption and suppose that Assumption 15 holds, Condition (ES) does not hold, and $\phi = 0$, then the unique political equilibrium features $\tau_t^m = \tau^{FPM}$ for all $t$. Moreover, if the elite could commit to a tax sequence at time $t = 0$, then they would still choose $\tau_t^m = \tau^{FPM}$.
That this is the unique MPE is quite straightforward. It is also intuitive that it is the
unique SPE. In fact, the elite would choose exactly this tax rate even if they could commit at
time t = 0. The reason is as follows: in the case of pure factor price manipulation, the only
objective of the elite is to reduce the middle class labor demand, so they have no interest
in increasing the productivity of middle class producers.
For contrast, let us next consider the pure revenue extraction case with Condition (ES) satisfied. Once again, the MPE is identical to before. As a result, the first-order condition for an interior solution to the middle class producers' technology choice is:
$$\Gamma'(A^m) = \frac{1}{1-\beta}\,\frac{\alpha}{1-\alpha}\,(1-\tau^m)^{1/\alpha}\lambda, \tag{18.21}$$
where $\tau^m$ is the constant tax rate that they will face in all future periods. In the pure revenue extraction case, recall that the equilibrium is $\tau^m = \tau^{RE} \equiv \min\{\alpha, \bar{\tau}\}$. With the same arguments as before, this is also the unique SPE. Once the middle class producers have made their technology decisions, there is no history-dependent action left, and it is impossible to create history-dependent punishment strategies to support a tax rate different from the static optimum for the elite. Nevertheless, this is not necessarily the allocation that the elite prefer. If the elite could commit to a tax rate sequence at time $t = 0$, they would choose lower taxes. To illustrate this, suppose that they can commit to a constant tax rate (it is straightforward to show that they will in fact choose a constant tax rate even without this restriction, but this restriction saves on notation). Therefore, the optimization problem of the elite is to maximize tax revenues taking the relationship between taxes and technology as in (18.21) as given. In other words, they will solve: $\max_{\tau^m} \phi\tau^m(1-\tau^m)^{(1-\alpha)/\alpha}A^m\theta^m\lambda/(1-\alpha)$ subject to (18.21). The constraint (18.21) incorporates the fact that (expected) taxes affect technology choice.
The first-order condition for an interior solution can be expressed as
$$A^m - \frac{1-\alpha}{\alpha}\,\frac{\tau^m}{1-\tau^m}\,A^m + \tau^m\frac{dA^m}{d\tau^m} = 0$$
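where $dA^m/d\tau^m$ takes into account the effect of future taxes on technology choice at time $t = 0$. This expression can be obtained from (18.21) as:
$$\frac{dA^m}{d\tau^m} = -\frac{1}{1-\beta}\,\frac{1}{1-\alpha}\,\frac{(1-\tau^m)^{(1-\alpha)/\alpha}\lambda}{\Gamma''(A^m)} < 0.$$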
This implies that the solution to this maximization problem satisfies $\tau^m = \tau^{TA} < \tau^{RE} \equiv \min\{\alpha, \bar{\tau}\}$. If they could, the elite would like to commit to a lower tax rate in the future in order to encourage the middle class producers to undertake technological improvements. Their inability to commit to such a tax policy leads to greater inefficiency than in the case without technology adoption. Summarizing this discussion:
Proposition 61 Consider the game with technology adoption, and suppose that Assumption 15 and Condition (ES) hold and $\phi > 0$, then the unique political equilibrium features $\tau_t^m = \tau^{RE} \equiv \min\{\alpha, \bar{\tau}\}$ for all $t$. If the elite could commit to a tax policy at time $t = 0$, they would prefer to commit to $\tau^{TA} < \tau^{RE}$.
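One way to see why the commitment tax rate lies below the revenue-maximizing rate is to evaluate the elite's first-order condition at $\tau^m = \alpha$. The following is a sketch for the interior case in which the upper bound on taxes does not bind, so that $\tau^{RE} = \alpha$; "LHS" is a label used here for the left-hand side of the condition.

```latex
% First-order condition of the commitment problem (interior solution):
A^m - \frac{1-\alpha}{\alpha}\,\frac{\tau^m}{1-\tau^m}\,A^m
    + \tau^m \frac{dA^m}{d\tau^m} = 0 .
% At \tau^m = \alpha the first two terms cancel exactly:
A^m - \frac{1-\alpha}{\alpha}\,\frac{\alpha}{1-\alpha}\,A^m = 0
\quad\Longrightarrow\quad
\text{LHS}\,\Big|_{\tau^m=\alpha} = \alpha\,\frac{dA^m}{d\tau^m} < 0 .
```

Since $dA^m/d\tau^m < 0$, the elite's objective is strictly decreasing at $\tau^m = \alpha$, so a slightly lower tax rate raises their payoff once the response of technology adoption is taken into account.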
An important feature is that in contrast to the pure holdup problem where SPE could prevent the additional inefficiency (when $\beta \geq 1-\alpha$, recall Proposition 59), with the technology adoption game, the inefficiency survives the SPE. The reason is that, since middle class producers invest only once at the beginning, there is no possibility of using history-dependent punishment strategies. This illustrates the limits of implicit agreements to keep tax rates low. Such agreements not only require a high discount factor ($\beta \geq 1-\alpha$), but also frequent investments by the middle class, so that there is a credible threat against the elite if they deviate from the promised policies. When such implicit agreements fail to prevent the most inefficient policies, there is greater need for economic institutions to play the role of placing limits on future policies.
18.2.11 Inefficient Economic Institutions
The previous analysis shows how inefficient policies emerge out of the desire of the elite, which possesses political power, to redistribute resources towards themselves. I now discuss the implications of these mechanisms for inefficient institutions. Since the elite prefer to implement inefficient policies to transfer resources from the rest of the society (the middle class and the workers) to themselves, they will also prefer inefficient economic institutions that enable and support these inefficient policies.
To illustrate the main economic interactions, I consider two prototypical economic institutions: (1) Security of property rights; there may be constitutional or other limits on the extent of redistributive taxation and/or other policies that reduce profitability of producers' investments. In terms of the model above, we can think of this as determining the level of $\bar{\tau}$. (2) Regulation of technology, which concerns direct or indirect factors affecting the productivity of producers, in particular middle class producers.
As pointed out in the introduction, the main role of institutions is to provide the framework for the determination of policies, and consequently, preferences over institutions are
derived from preferences over policies and economic allocations. Bearing this in mind, let
us now discuss the determination of economic institutions in the model presented here. To
simplify the discussion, for the rest of the analysis, and in particular, throughout this section,
I focus on MPE, and start with security of property rights.
The environment is the same as in the previous section, with the only difference that at time $t = 0$, before any decisions are taken, the elite can reduce $\bar{\tau}$, say from $\bar{\tau}^H$ to some level in the interval $[0, \bar{\tau}^H]$, thus creating an upper bound on taxes and providing greater security of property rights to the middle class. The key question is whether the elite would like to do so, i.e., whether they prefer $\bar{\tau} = \bar{\tau}^H$ or $\bar{\tau} < \bar{\tau}^H$.
The next three propositions answer this question:
Proposition 62 Without holdup and technology adoption, the elite prefer $\bar{\tau} = \bar{\tau}^H$.
The proof of this result is immediate, since without holdup or technology adoption, putting further restrictions on the taxes can only reduce the elite's utility. This proposition implies that if economic institutions are decided by the elite (which is the natural benchmark since they are the group with political power), they will in general choose not to provide additional security of property rights to other producers. Therefore, the underlying economic institutions will support the inefficient policies discussed above.
The results are different when there are holdup concerns. To illustrate this, suppose that the timing of taxation decision is after the investment decisions (so that there is the holdup problem), and consider the case with revenue extraction and factor price manipulation combined. In this case, the elite would like to commit to a lower tax rate than $\bar{\tau}^H$ in order to encourage the middle class to undertake greater investments, and this creates a useful role for economic institutions (to limit future taxes):
Proposition 63 Consider the game with holdup and suppose Assumptions 15 and 16 hold, Condition (ES) does not hold, and $\phi > 0$, then as long as $\tau^{COM}$ given by (18.16) is less than $\bar{\tau}^H$, the elite prefer $\bar{\tau} = \tau^{COM}$.
The proof is again immediate. While $\tau^{COM}$ maximizes the elite's utility, in the presence of holdup the MPE involves $\tau = \bar{\tau}^H$, and the elite can benefit by using economic institutions to manipulate equilibrium taxes.
This result shows that the elite may provide additional property rights protection to
producers in the presence of holdup problems. The reason is that because of holdup, equilibrium taxes are too high even relative to those that the elite would prefer. By manipulating
economic institutions, the elite may approach their desired policy (in fact, they can exactly commit to the tax rate that maximizes their utility).
Finally, for similar reasons, in the economy with technology adoption discussed above,
the elite will again prefer to change economic institutions to restrict future taxes:
Proposition 64 Consider the game with holdup and technology adoption, and suppose that Assumption 15 and Condition (ES) hold and $\phi > 0$, then as long as $\tau^{TA} < \bar{\tau}^H$, the elite prefer $\bar{\tau} = \tau^{TA}$.
As before, when we look at SPE, with pure holdup, there may not be a need for changing economic institutions, since credible implicit promises might play the same role (as long as $\beta \geq 1-\alpha$ as shown in Proposition 59). However, parallel to the results above, in the technology adoption game, SPE and MPE coincide, so a change in economic institutions is necessary for a credible commitment to a low tax rate (here $\tau^{TA}$).
Turning to the regulation of technology now, we see that economic institutions also have a major effect on the environment for technology adoption or, more directly, on the technology choices of producers. For example, by providing infrastructure or protection of intellectual property rights, a society may improve the technology available to its producers. Conversely, the elite may want to block, i.e., take active actions against, the technological improvements of the middle class. Therefore the question is: do the elite have an interest in increasing the productivity of the middle class as much as possible?
Consider the baseline model. Suppose that there exists a government policy $g \in \{0, 1\}$, which influences only the productivity of middle class producers, i.e., $A^m = A^m(g)$, with $A^m(1) > A^m(0)$. Assume that the choice of $g$ is made at $t = 0$ before any other decisions, and has no other influence on payoffs (and in particular, it imposes no costs on the elite). Will the elite always choose $g = 1$, increasing the middle class producers' productivity, or will they try to block technology adoption by the middle class?
When the only mechanism at work is revenue extraction, the answer is that the elite
would like the middle class to have the best technology:
Proposition 65 Suppose Assumption 15 and Condition (ES) hold and $\phi > 0$, then $w = 0$ and the elite always choose $g = 1$.
The proof follows immediately since $g = 1$ increases the tax revenues and has no other effect on the elite's consumption. Consequently, in this case, the elite would like the producers to be as productive as possible, so that they generate greater tax revenues. Intuitively, there is no competition between the elite and the middle class (either in factor markets or in the political arena), and when the middle class is more productive, the elite generate greater tax revenues.
The situation is different when the elite wish to manipulate factor prices:
Proposition 66 Suppose Assumption 15 holds, Condition (ES) does not hold, $\phi = 0$, and $\bar{\tau} < 1$, then the elite choose $g = 0$.
Once again the proof of this proposition is straightforward. With $\bar{\tau} < 1$, labor demand from the middle class is high enough to generate positive equilibrium wages. Since $\phi = 0$, taxes raise no revenues for the elite, and their only objective is to reduce the labor demand from the middle class and wages as much as possible. This makes $g = 0$ the preferred policy for the elite. Consequently, the factor price manipulation mechanism suggests that, when it is within their power, the elite will choose economic institutions so as to reduce the productivity of competing (middle class) producers.
The next proposition shows that a similar effect is in operation when the political power of the elite is in contention.
Proposition 67 Consider the economy with political replacement. Suppose also that Assumption 15 and Condition (ES) hold and $\phi = 0$, then the elite prefer $g = 0$.
In this case, the elite cannot raise any taxes from the middle class since $\phi = 0$. But differently from the previous proposition, there are no labor market interactions, since there
is excess labor supply and wages are equal to zero. Nevertheless, the elite would like the
profits from middle class producers to be as low as possible so as to consolidate their political
power. They achieve this by creating an environment that reduces the productivity of middle
class producers.
Overall, this section has demonstrated how the elite's preferences over policies, and in particular their desire to set inefficient policies, translate into preferences over inefficient (non-growth-enhancing) economic institutions. When there are no holdup problems, introducing economic institutions that limit taxation or put other constraints on policies provides no benefits to the elite. However, when the elite are unable to commit to future taxes (because of holdup problems), equilibrium taxes may be too high even from the viewpoint of the elite, and in this case, using economic institutions to manipulate future taxes may be beneficial. Similarly, the analysis reveals that the elite may want to use economic institutions to discourage productivity improvements by the middle class. Interestingly, this never happens when the main mechanism leading to inefficient policies is revenue extraction. Instead, when factor price manipulation and political consolidation effects are present, the elite may want to discourage or block technological improvements by the middle class.
18.3 Modeling Political Institutions
The above analysis characterized the equilibrium under the dictatorship of the elite, a set
of political institutions that gave all political power to the elite producers. An alternative is
to have the dictatorship of the middle class, i.e., a system in which the middle class makes
the key policy decisions (this could also be a democratic regime with the middle class as
the decisive voters). Finally, another possibility is democracy in which there is voting over
different policy combinations. If $\theta^e + \theta^m < 1$, then the majority are the workers, and they will pursue policies to maximize their own income.
I now briefly discuss the possibility of a switch from the dictatorship of the elite to one of these two alternative regimes. It is clear that whether the dictatorship of the elite or that of the middle class is more efficient depends on the relative numbers and productivities of the two groups, and whether elite control or democracy is more efficient depends on policies in democracy. Hence, this section will first characterize the equilibrium under these alternative political institutions. Moreover, for part of the analysis in this subsection, I simplify the discussion by imposing the following assumption:
Assumption 17
$$\theta^m = \theta^e < \frac{1}{2}.$$
This assumption ensures that the number of middle class and elite producers is the same,
and they are in the minority relative to workers.
18.3.1 Dictatorship of the Middle Class
With the dictatorship of the middle class, the political equilibrium is identical to the dictatorship of the elite, with the roles reversed. To avoid repetition, I will not provide a full
analysis. Instead, let me focus on the case combining revenue extraction and factor price
manipulation. The analog of Assumption 16 in this case is:
Assumption 18
m
(1)/
A (1 )
e
A m.
Given this assumption, a similar proposition to that above immediately follows; the middle class will tax the elite and will redistribute the proceeds to themselves, i.e., $T_t^w = T_t^e = 0$, and moreover, the same analysis as above gives their most preferred tax rate as
(, m , , )
e
COM
t =
min
, .
(18.22)
1 + (, m , , )
Proposition 68 Suppose Assumptions 15 and 17 hold, Condition (ES) does not hold, and $\phi > 0$, then the unique political equilibrium with middle class control features $\tau_t^e = \tau^{COM}$ as given by (18.22) for all $t$.
Comparing this equilibrium to the equilibrium under the dictatorship of the elite, it is apparent that the elite equilibrium will be more efficient when $A^e$ and $\theta^e$ are large relative to $A^m$ and $\theta^m$, and the middle class equilibrium will be more efficient when the opposite is the case.
Proposition 69 Suppose Assumptions 15-18 hold, then aggregate output is higher with the dictatorship of the elite than the dictatorship of the middle class if $A^e > A^m$ and it is higher under the dictatorship of the middle class if $A^m > A^e$.
Intuitively, the group in power imposes taxes on the other group (and since $\theta^m = \theta^e$, these taxes are equal) and not on themselves, so aggregate output is higher when the group with greater productivity is in power and is spared from distortionary taxation.
18.3.2 Democracy
Under Assumption (A4), workers are in the majority in democracy, and have the power to tax the elite and the middle class to redistribute to themselves. More specifically, each worker's consumption is $c_t^w = w_t + T_t^w$, with $w_t$ given by (18.7), so that workers care about equilibrium wages and transfers. Workers will then choose the sequence of policies $\{\tau_t^e, \tau_t^m, T_t^w, T_t^m, T_t^e\}_{t=0,1,\ldots}$ that satisfy (18.3) to maximize $\sum_{t=0}^{\infty}\beta^t c_t^w$.
It is straightforward to see that the workers will always set $T_t^m = T_t^e = 0$. Substituting for the transfers from (18.3), we obtain that democracy will solve the following maximization problem to determine policies:
$$\max_{\tau_t^e,\,\tau_t^m}\; w_t + \frac{\phi}{1-\alpha}\left[\tau_t^m(1-\tau_t^m)^{(1-\alpha)/\alpha}A^m\theta^m l^m + \tau_t^e(1-\tau_t^e)^{(1-\alpha)/\alpha}A^e\theta^e l^e\right] + R$$
with $w_t$ given by (18.7).
As before, when Condition (ES) holds, taxes have no effect on wages, so the workers will tax at the revenue maximizing rate, similar to the case of revenue extraction for the elite above. This result is stated in the next proposition (proof omitted):
Proposition 70 Suppose Assumption 15 and Condition (ES) hold and $\phi > 0$, then the unique political equilibrium with democracy features $\tau_t^m = \tau_t^e = \tau^{RE} \equiv \min\{\alpha, \bar{\tau}\}$.
Therefore, in this case democracy is more inefficient than both middle class and elite control, since it imposes taxes on both groups. The same is not the case, however, when Condition (ES) does not hold and wages are positive. In this case, workers realize that by taxing the marginal group they are reducing their own wages. In fact, taxes always reduce wages more than the revenue they generate because of their distortionary effects. As a result, workers will only tax the group with the higher marginal productivity. More specifically, for example, if $A^m > A^e$, we will have $\tau_t^e = 0$, and $\tau_t^m$ will be such that $(1-\tau_t^m)^{1/\alpha}A^m = A^e$ or $\tau_t^m = \bar{\tau}$ and $(1-\bar{\tau})^{1/\alpha}A^m \geq A^e$. Therefore, we have:
Proposition 71 Suppose Assumptions 15 and 18 hold and Condition (ES) does not hold. Then in the unique political equilibrium with democracy, if $A^m > A^e$, we will have $\tau_t^e = 0$, and $\tau_t^m = \tau^{Dm}$ will be such that $(1-\tau^{Dm})^{1/\alpha}A^m = A^e$ or $\tau^{Dm} = \bar{\tau}$ and $(1-\bar{\tau})^{1/\alpha}A^m \geq A^e$. If $A^m < A^e$, we will have $\tau_t^m = 0$, and $\tau_t^e = \tau^{De}$ will be such that $(1-\tau^{De})^{1/\alpha}A^e = A^m$ or $\tau^{De} = \bar{\tau}$ and $(1-\bar{\tau})^{1/\alpha}A^e \geq A^m$.
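As a purely illustrative calculation of the democratic tax rate in the first case of Proposition 71, take $\alpha = 1/2$, $A^m = 2$ and $A^e = 1$, and suppose that the upper bound on taxes does not bind. These parameter values are assumptions chosen for convenience; they do not come from the text.

```latex
% Illustrative values (assumed): \alpha = 1/2, A^m = 2, A^e = 1, upper bound on taxes not binding.
(1-\tau^{Dm})^{1/\alpha} A^m = A^e
\;\Longrightarrow\;
2\,(1-\tau^{Dm})^{2} = 1
\;\Longrightarrow\;
\tau^{Dm} = 1 - \frac{1}{\sqrt{2}} \approx 0.293 .
```

In this example workers tax the more productive middle class at roughly 29 percent and leave the elite untaxed. If the upper bound on taxes were below this value, the bound itself would be the equilibrium rate.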
The most interesting implication of this proposition comes from the comparison of the
case with and without excess supply. While in the presence of excess labor supply, democracy taxes both groups of producers and consequently generates more inefficiency than the
dictatorship of the elite or the middle class, when there is no excess supply, it is in general
less distortionary than the dictatorship of the middle class or the elite. The intuition is that
when Condition (ES) does not hold, workers understand that high taxes will depress wages
and are therefore less willing to use distortionary taxes.
18.3.3 Inefficiency of Political Institutions and Inappropriate Institutions
Consider a society where Assumption 18 is satisfied and $A^e < A^m$ so that middle class control is more productive (i.e., generates greater output). Despite this, the elite will have no incentive, without some type of compensation, to relinquish their power to the middle class. In this case, political institutions that lead to more inefficient policies will persist even though alternative political institutions leading to better outcomes exist.
One possibility is a Coasian deal between the elite and the middle class. For example,
perhaps the elite can relinquish political power and get compensated in return. However,
such deals are in general not possible. To discuss why (and why not), let us distinguish
between two alternative approaches.
First, the elite may relinquish power in return for a promise of future transfers. This
type of solution will run into two difficulties. (i) Such promises will not be credible, and once they have political power, the middle class will have no incentive to keep on making such transfers. (ii) Since there are no other, less distortionary, fiscal instruments to compensate the elite, the middle class will have to impose similar taxes on itself, so that the alternative political institutions will not be as efficient in the first place.
Second, the elite may relinquish power in return for a lump-sum transfer from the middle
class. Such a solution is also not possible in general, since the net present value of the benefit
of holding political power often exceeds any transfer that can be made. Consequently, the
desire of the elite to implement inefficient policies also implies that they support political institutions that enable them to pursue these policies. Thus, in the same way as preferences over inefficient policies translate into preferences over inefficient economic institutions, they also lead to preferences towards inefficient political institutions. I will discuss how political institutions can change from the ground-up in Section 18.3.4 below.
Another interesting question is whether a given set of economic institutions might be
appropriate for a while, but then become inappropriate and costly for economic activity
later. This question might be motivated, for example, by the contrast of the Northeastern United States and the Caribbean colonies between the 17th and 19th centuries. The
Caribbean colonies were clear examples of societies controlled by a narrow elite, with political power in the monopoly of plantation owners, and few rights for the slaves that made
up the majority of the population. In contrast, Northeastern United States developed as
a settler colony, approximating a democratic society with significant political power in the
hands of smallholders and a broader set of producers. While in both the 17th and 18th
centuries, the Caribbean societies were among the richest places in the world, and almost
certainly richer and more productive than the Northeastern United States, starting in the
late 18th century, they lagged behind the United States and many other more democratic
societies, which took advantage of new investment opportunities, particularly in industry
and commerce. This raises the question as to whether the same political and economic institutions that encouraged the planters to invest and generate high output in the 17th and
early 18th centuries then became a barrier to further growth.
The baseline model used above suggests a simple explanation along these lines. Imagine
an economy in which the elite are in power, Condition (ES) does not hold, $\phi$ is small, $A^e$ is relatively high and $A^m$ is relatively small to start with. The above analysis shows that the
elite will choose a high tax rate on the middle class. Nevertheless, output will be relatively
high, because the elite will undertake the right investments themselves, and the distortion
on the middle class will be relatively small since Am is small.
Consequently, the dictatorship of the elite may generate greater income per capita than
an alternative society under the dictatorship of the middle class. This is reminiscent of the
planter elite controlling the economy in the Caribbean.
However, if at some point the environment changes so that Am increases substantially
relative to Ae , the situation changes radically. The elite, still in power, will continue to
impose high taxes on the middle class, but now these policies have become very costly
because they distort the investments of the more productive group. Another society where
the middle class have political power will now generate significantly greater output.
This simple example illustrates how institutions that were initially appropriate (i.e.,
that did not generate much distortion or may have even encouraged growth) later caused
the society to fall substantially behind other economies.
18.3.4 Institutional Change and Persistence
To develop a better understanding of why inefficient institutions emerge and persist, we
need an equilibrium model of institutional change. I now briefly discuss such a model.
It is first useful to draw a distinction between de jure and de facto political power. De
jure political power is determined by political institutions. In the baseline model, de jure
political power is in the hands of the elite, since the political institutions give them the right
to set taxes and determine the economic institutions. De facto political power, which comes
from other sources, did not feature so far in the model (except in the discussion of political
consolidation). The simplest example of de facto political power is when a group manages
to organize itself and poses a military challenge to an existing regime or threatens it with
a revolution. I will conceptualize institutional change as resulting from the interplay of de
jure and de facto political power.
Imagine a society described by the baseline model above where de jure political power is
initially in the hands of the elite. In each period, with probability q the middle class solve the
collective action problem among its members and gather sufficient de facto political power
to overthrow the existing regime and to monopolize political power (establish a dictatorship
of the middle class). However, violently overthrowing the existing regime is still costly, and
in particular, each middle class agent incurs a cost of in the process. Moreover, in the
process, the elite are harmed substantially. In particular, I assume that following a violent
overthrow, the elite receive zero utility.
Let us assume that the dictatorship of the middle class, if established, is an absorbing
state and once the middle class comes to power, there will never be any further institutional
change. With probability $1-q$, the middle class has no de facto political power. Also denote the state at time $t$ by the tuple $(P_t, s_t)$, where $P_t \in \{E, M\}$ denotes whether the elite or the middle class are in control, and $s_t \in \{H, L\}$ denotes the level of threat (high or low) against the regime controlled by the elite.
When the middle class amass de facto political power, the elite need to respond in some way, since letting the middle class overthrow the existing regime is excessively costly for them. The elite can respond in three different ways: (i) they can make temporary concessions, such
as reducing taxes on the middle class, etc.; (ii) they can give up power; (iii) they can use
repression, which is costly, but manages to prevent the regime from falling to the middle
class. I assume that repression costs for the elite as a whole.
Throughout this section, I focus on MPE. In an MPE, strategies are only a function of the state $s_t$, so when $s_t = L$, the elite will set the policies that maximize their utility, which were characterized above. So the interesting actions take place in the state $s_t = H$. Moreover, to simplify the discussion, I assume throughout that Condition (ES) is satisfied, so that the main motive for inefficient policy is revenue extraction.
Let us first calculate the value of a middle class agent when the middle class is in power. Since Condition (ES) is satisfied, the above analysis shows that they will not tax themselves, set a tax of $\tau^e = \tau^{RE}$ on the elite in every period, and redistribute all the revenue to themselves. To write the resulting value function, let us introduce the following notation: $T^j(\tau) \equiv \phi\tau(1-\tau)^{(1-\alpha)/\alpha}A^j\theta^j\lambda/(1-\alpha)$ as the tax revenue raised from group $j$ at the tax rate $\tau$, and $\pi^j(\tau) \equiv \alpha(1-\tau)^{1/\alpha}A^j\lambda/(1-\alpha)$ as the profit of a producer in group $j$ facing the tax rate $\tau$. Then, using $M$ to indicate a value function under the dictatorship of the middle class, we have
$$V^m(M) = \frac{\pi^m(0) + \left(T^e(\tau^{RE}) + R\right)/\theta^m}{1-\beta}, \tag{18.23}$$
where $\tau^{RE}$ is given by (18.11). The first term in the numerator is their own revenues, $\alpha A^m\lambda/(1-\alpha)$, and the second is the distribution from the revenue obtained by taxing the elite and from natural resources. The term $1-\beta$ in the denominator provides the net present discounted value of this stream of revenues. Similarly, the value of an elite producer in this case is
$$V^e(M) = \frac{\pi^e(\tau^{RE})}{1-\beta}. \tag{18.24}$$
What about the dictatorship of the elite? Let us write this value recursively starting in the no threat state:
$$V^m(E, L) = \pi^m(\tau^{RE}) + \beta\left[(1-q)V^m(E, L) + qV^m(E, H)\right]. \tag{18.25}$$
This expression incorporates the fact that, in the MPE, during periods of low threat, the elite will follow their most preferred policy, $\tau^m = \tau^{RE}$ and $T^m = 0$. The low threat state recurs with probability $1-q$. What happens when $s_t = H$? As noted above, there are three possibilities. Let us first start by investigating whether the elite can prevent a switch of political power by making concessions in the high threat state. For this purpose, let us denote the highest possible value to the middle class under the dictatorship of the elite by $\bar{V}^m(E, H)$. Then, the condition for concessions within the given political regime to prevent
action by the middle class is simply
V m (E, H) V m (M) ,
(18.26)
where recall that is the cost of regime change for the middle class. When this constraint
holds, the elite could make sufficient concessions to keep the middle class happy within the existing regime.
Therefore, to determine whether concessions within the dictatorship of the elite will be sufficient to satisfy the middle class, we simply need to calculate $\bar{V}^m(E, H)$. Note that the best concession that the elite can make is to adopt a policy that is most favorable for the middle class, i.e., $\tau^m = 0$, $\tau^e = \tau^{RE}$, and $T^m = \left(T^e(\tau^{RE}) + R\right)/\theta^m$. Therefore,
$$\bar{V}^m(E, H) = \pi^m(0) + \left(T^e(\tau^{RE}) + R\right)/\theta^m + \beta\left[(1-q)V^m(E, L) + q\bar{V}^m(E, H)\right] \tag{18.27}$$
where $V^m(E, L)$ is given by expression (18.25), with $\bar{V}^m(E, H)$ replacing $V^m(E, H)$ on the right hand side. Combining (18.27) and (18.25), we obtain:
$$\bar{V}^m(E, H) = \frac{\beta(1-q)\,\pi^m(\tau^{RE}) + \left(1-\beta(1-q)\right)\left[\pi^m(0) + \left(T^e(\tau^{RE}) + R\right)/\theta^m\right]}{1-\beta} \tag{18.28}$$
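The step from (18.25) and (18.27) to (18.28) can be spelled out as follows. This is a sketch of the algebra only; $a$, $b$ and $C$ are shorthand introduced here and are not notation from the text, and $\bar{V}^m(E,H)$ denotes the highest value the elite can credibly offer the middle class.

```latex
% Shorthand (hypothetical labels, not from the text):
%   a = \pi^m(0) + (T^e(\tau^{RE}) + R)/\theta^m    flow payoff under maximal concessions
%   b = \pi^m(\tau^{RE})                            flow payoff in the low-threat state
%   C = \beta[(1-q) V^m(E,L) + q \bar{V}^m(E,H)]    continuation value common to both states
\begin{align*}
V^m(E,L) &= b + C, \qquad \bar{V}^m(E,H) = a + C, \\
C &= \beta\left[(1-q)(b + C) + q\,(a + C)\right]
   \;\Longrightarrow\;
   C = \frac{\beta\left[(1-q)\,b + q\,a\right]}{1-\beta}, \\
\bar{V}^m(E,H) &= a + C
   = \frac{\beta(1-q)\,b + \left(1-\beta(1-q)\right)a}{1-\beta} .
\end{align*}
```

Because the two states share the same continuation value, $\bar{V}^m(E,H) - V^m(E,L) = a - b$: concessions raise the middle class's payoff only in the periods in which the threat is actually high.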
This is the maximum credible utility that the elite can promise the middle class within the existing regime. The reason why they cannot give them greater utility is because of commitment problems. As (18.28) makes clear, the elite transfer resources to the middle class only in the state $s_t = H$. Even if they promise to make further transfers or not tax them in the state $s_t = L$, these promises will not be credible (they cannot commit to them), and in the MPE, when the state $s_t = L$ arrives, the elite will choose their most preferred policy of taxing the middle class and transferring the resources to themselves.
If given this expression, (18.26) is satisfied, then the elite can prevent a violent overthrow
by making concessions within the existing regime. Nevertheless, the elite may not necessarily
prefer such concessions. To investigate this issue, we first need to determine the exact
concessions that the elite will make. They will clearly not follow the most preferable policy
for the middle class, since this will give more than sufficient utility to prevent an overthrow.
Instead, the elite will choose a policy combination m , e , Tm , Te such that V m (E, H) =
V m (M) , i.e., they will make the middle class just indifferent between overthrowing the
regime or accepting the concessions. The value of such concessions to the elite is, by similar
arguments, given by:
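$$V^e(E,H) = \frac{\beta(1-q)\left[\pi^e(0) + \left(\phi T^m(\tau^{RE}) + R\right)/\theta^e\right] + \left(1-\beta(1-q)\right)\left(\pi^e(\hat{\tau}^e) + \hat{T}^e\right)}{1-\beta} \tag{18.29}$$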
Whether the elite will make these concessions then depends on the values of the other
options available to them. Another alternative is the use of repression whenever there is a
threat from the middle class. Such repression is always effective, so the only cost of this
strategy for the elite is the cost they incur in the use of repression, $\mu$. Denote by $V^e(O,s_t)$
the value function of the elite when it uses repression and the state is $s_t$. By standard
arguments, we can obtain this value from the following recursive formulae:
$$V^e(O,H) = \pi^e(0) + \frac{\phi T^m(\tau^{RE}) + R}{\theta^e} - \mu + \beta\left[(1-q)\,V^e(O,L) + q\,V^e(O,H)\right]$$
and
$$V^e(O,L) = \pi^e(0) + \frac{\phi T^m(\tau^{RE}) + R}{\theta^e} + \beta\left[(1-q)\,V^e(O,L) + q\,V^e(O,H)\right].$$
These two expressions incorporate the fact that, when using the repression strategy, the elite
will always choose their most preferred policy combination, and will use repression when
$s_t = H$ to defend their regime. Combining these two equations, we obtain:
$$V^e(O,H) = \frac{\pi^e(0) + \left(\phi T^m(\tau^{RE}) + R\right)/\theta^e - \left(1-\beta(1-q)\right)\mu}{1-\beta}. \tag{18.30}$$
Consequently, for the elite to prefer concessions, it needs to be the case that
$V^e(E,H) \geq V^e(O,H)$.
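The step from the two repression recursions to (18.30) can be spelled out as follows. This is a sketch in shorthand introduced here for brevity: $a$ denotes the elite's per-period payoff under their most preferred policy and $c$ the per-period cost of repression, which is incurred only in state $H$.

```latex
\begin{align*}
V^e(O,H) &= a - c + \beta\left[(1-q)\,V^e(O,L) + q\,V^e(O,H)\right], \\
V^e(O,L) &= a + \beta\left[(1-q)\,V^e(O,L) + q\,V^e(O,H)\right], \\
\Rightarrow\quad V^e(O,L) &= V^e(O,H) + c
  && \text{(subtract the first line from the second)} \\
\Rightarrow\quad V^e(O,H) &= a - c + \beta(1-q)\,c + \beta\,V^e(O,H)
  && \text{(substitute into the first line)} \\
\Rightarrow\quad V^e(O,H) &= \frac{a - \left(1-\beta(1-q)\right)c}{1-\beta}.
\end{align*}
```

The weight $1-\beta(1-q)$ on the repression cost reflects that the cost is paid today, since the current state is $H$, and again in every future period in which the state is $H$.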
Finally, the third alternative for the elite is to allow regime change, and obtain $V^e(M)$ as
given by (18.24). Evidently, $V^e(M)$ is less than $V^e(E,H)$, since in the latter case they only
make concessions (in fact limited concessions) with probability $q$. Therefore, regime change
will only happen when (18.26) does not hold. In addition, for similar reasons, for regime
change to take place, we need $V^e(M) \geq V^e(O,H)$. Note that all of the values here are
simple functions of parameters, so comparing these values essentially amounts to comparing
nonlinear functions of the underlying parameters.
Putting all these pieces together and assuming for convenience that when indifferent the
elite opt against repression, we obtain the following proposition:
Proposition 72 Consider the above environment with potential regime change and suppose
that Condition (ES) holds. Then there are three different types of political equilibria:
1. If (18.26) holds and $V^e(E,H) \geq V^e(O,H)$, in the unique equilibrium the regime
always remains the dictatorship of the elite. When $s_t = L$, the elite set their most preferred
policy of $\tau^m = \tau^{RE}$, $\tau^e = 0$ and $T^m = 0$, and when $s_t = H$, the elite make concessions
sufficient to ensure $V^m(E,H) = V^m(M)$, i.e., they adopt the policy $(\hat{\tau}^m, \hat{\tau}^e, \hat{T}^m, \hat{T}^e)$.

2. If (18.26) holds but $V^e(E,H) < V^e(O,H)$, or if (18.26) does not hold and $V^e(M) <
V^e(O,H)$, then the regime always remains the dictatorship of the elite. The elite always set
their most preferred policy of $\tau^m = \tau^{RE}$, $\tau^e = 0$ and $T^m = 0$, and when $s_t = H$, they use
repression against the middle class.
3. If (18.26) does not hold and $V^e(M) \geq V^e(O,H)$, then there is equilibrium institutional change. When $s_t = L$, the elite set their most preferred policy of $\tau^m = \tau^{RE}$, $\tau^e = 0$
and $T^m = 0$, and when $s_t = H$, the elite voluntarily pass political control to the middle class.
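The case distinction in the proposition can be summarized as a small decision rule. The sketch below is only an illustration: the function name, argument names and return labels are hypothetical, and the weak inequalities encode the stated tie-breaking assumption that the elite opt against repression when indifferent.

```python
def equilibrium_type(concessions_suffice, v_e_concede, v_e_repress, v_e_regime_change):
    """Classify the political equilibrium in the high-threat state.

    concessions_suffice: True if condition (18.26) holds.
    v_e_concede:         elite value from making concessions, V^e(E, H).
    v_e_repress:         elite value from using repression, V^e(O, H).
    v_e_regime_change:   elite value from passing power to the middle class, V^e(M).
    """
    if concessions_suffice:
        # Concessions are feasible; the elite weigh them against repression.
        if v_e_concede >= v_e_repress:
            return "concessions"          # case 1
        return "repression"               # case 2
    # Concessions cannot appease the middle class; the choice is repression vs. regime change.
    if v_e_regime_change >= v_e_repress:
        return "institutional change"     # case 3
    return "repression"                   # case 2


# Illustrative values only.
print(equilibrium_type(True, 5.0, 4.0, 3.0))
print(equilibrium_type(False, 5.0, 4.0, 3.0))
print(equilibrium_type(False, 5.0, 2.0, 3.0))
```

Note that regime change is only ever chosen when concessions are insufficient, which mirrors the two requirements for institutional change: concessions fail, and repression is too costly.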
This proposition illustrates how various different institutional equilibria can arise. The
most interesting case is 3, where there is equilibrium institutional change as a result of the
elite voluntarily relinquishing political control. Why would the elite give up their dictatorship? The reason is the de facto political power of the middle class, which threatens the
elite with a violent overthrow, an outcome worse than the dictatorship of the middle class.
The elite then prevent such a violent overthrow by changing political institutions to transfer de jure political power to the middle class. This transfer exploits the role of political
institutions as a commitment device (a commitment to a different distribution of de jure
political power), and acts as a credible promise of future policies that favor the middle class
(the promise is credible, since institutional change gives de jure political power and thus the
right to set fiscal policy in the future to the middle class).
This discussion highlights that institutional change has two requirements: (i) that concessions within the existing regime are not sufficient to appease the middle class; (ii) that
repression is sufficiently costly for the elites to accept regime change.
The comparative statics of regime change are also interesting. First, when repression is
more costly, i.e., $\mu$ is higher, institutional change is more likely. Moreover:
$$V^e(O,H) - V^e(M) = \frac{\pi^e(0) - \pi^e(\tau^{RE}) + \left(\phi T^m(\tau^{RE}) + R\right)/\theta^e - \left(1-\beta(1-q)\right)\mu}{1-\beta}$$
is increasing in $R$ and $\phi$. This implies that when $R$ is high, so that there are greater rents from
natural resources, $V^e(M) \geq V^e(O,H)$ becomes less likely, and the elite now prefer to use
repression rather than allowing institutional change. Similarly, greater $\phi$, which corresponds
to greater state capacity, has the same impact on institutional equilibrium, since greater
state capacity enables greater tax revenues in the future. This implies that, as already
suggested by the results in subsection 18.2.7, greater state capacity, which typically leads to
less distortionary policies, also increases political stakes and makes the use of repression by
the elite more likely. Nevertheless, increases in $R$ or $\phi$ do not make institutional change
unambiguously less likely, since they also make (18.26) more likely to be violated, making it
more difficult for the elite to use concessions to appease the middle class. Therefore, when
the trade-off for the elite is between repression and institutional change, greater $R$ and $\phi$
make repression more likely, while when the trade-off is between concessions and institutional
change, they may encourage institutional change.
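These comparative statics can be verified by direct differentiation. The sketch below assumes the difference in values takes the form $\Delta = \left[\pi^e(0) - \pi^e(\tau^{RE}) + (\phi T + R)/\theta - (1-\beta(1-q))\,c\right]/(1-\beta)$, where, as shorthand introduced here, $T > 0$ is the revenue raised by taxing the middle class at the revenue-maximizing rate (treated as independent of $\phi$ and $R$), $\theta$ is the size of the group sharing the revenue, and $c$ is the cost of repression.

```latex
\begin{align*}
\frac{\partial \Delta}{\partial R} &= \frac{1}{\theta\,(1-\beta)} > 0, \\
\frac{\partial \Delta}{\partial \phi} &= \frac{T}{\theta\,(1-\beta)} > 0, \\
\frac{\partial \Delta}{\partial c} &= -\frac{1-\beta(1-q)}{1-\beta} < 0.
\end{align*}
```

Under these assumptions, higher resource rents and greater state capacity both raise the payoff to repression relative to regime change, while a higher cost of repression lowers it.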
This analysis also illustrates the conditions for institutional persistence. Persistence
is the natural course of things; it takes something unusual, the success of the middle class in
solving their collective action problem and amassing de facto political power, to create the
platform for institutional change. However, even the possibility of collective action by the
middle class is not sufficient, since the elite can use costly methods to defend the existing
regime. Therefore, institutions will be more persistent when the elite are unwilling to give
up the right to determine policies in the future, which will in turn be the case when there is
significant distributional conflict between the elite and the middle class, for example because
tax revenues are important (i.e., high $\phi$) or because rents from natural resources, $R$, are
high. Therefore, a set of political institutions will persist when political stakes are high, i.e.,
when alternative institutional arrangements are costly for those who currently hold political
power and have the means to use force to maintain the existing political institutions.
The model also suggests the possibility of interesting interactions between economic
forces and institutional equilibria. The first is an interaction between economic and political
institutions. Suppose that economic institutions impose a low . This implies that control
of fiscal policy will generate only limited gains, reducing political stakes, and the elite will
have less reason to use repression in order to defend the existing regime. Consequently,
institutional persistence might be more of an issue in societies where economic institutions
enable those with political power to capture greater rents.
When the ability of the middle class to solve their collective action problem is endogenous
(as in the model used above to illustrate the political consolidation effect), there will be a
further interaction between economics and politics. In particular, suppose that the probability $q$ that the middle class will be able to pose an effective threat to the regime is endogenous
and depends on the profits of the middle class. In this case, the elite realize that the richer
the middle class are, the greater the threat from them in the future. When political power is
very valuable, for example because tax revenues or rents from natural resources are high, the
elite will wish to overtax the middle class to impoverish them and to reduce their political
power. These higher taxes will, in turn, increase institutional persistence by making it more
difficult for the middle class to solve their collective action problem and mount challenges
against the dictatorship of the elite. This suggests another interesting interaction, this time
between inefficient policies and institutional persistence.