Abstract

General equilibrium macroeconomic models are a core tool used by policymakers to understand a nation's economy. They represent the economy as a collection of forward-looking actors whose behaviours combine, possibly with stochastic effects, to determine global variables (such as prices) in a dynamic equilibrium. However, standard semi-analytical techniques for solving these models make it difficult to include the important effects of heterogeneous economic actors. The COVID-19 pandemic has further highlighted the importance of heterogeneity, for example in age and sector of employment, in macroeconomic outcomes, and the need for models that can more easily incorporate it. We use techniques from reinforcement learning to solve such models incorporating heterogeneous agents in a way that is simple, extensible, and computationally efficient. We demonstrate the method's accuracy and stability on a toy problem for which there is a known analytical solution, its versatility by solving a general equilibrium problem that includes global stochasticity, and its flexibility by solving a combined macroeconomic and epidemiological model to explore the economic and health implications of a pandemic. The latter successfully captures plausible economic behaviours induced by differential health risks by age.

1 Bank of England. 2 University College London, Department of Computer Science. 3 Data Science Campus, Office for National Statistics. Correspondence to: Edward Hill <ed.hill@bankofengland.co.uk>. The views expressed are solely those of the authors, and do not represent the views of the Bank of England (BoE) or Office for National Statistics (ONS) or state the policies of the BoE or ONS.

1. Introduction

One of the core problems in macroeconomics is to create models that capture how the self-interested actions of individuals and firms combine to drive the aggregate behaviour of the economy. These models can provide a guide for policymakers as to what actions they should take in any particular circumstance. Historically, macroeconomic models have tended to be simple because of the need for interpretability, but also because of a heavy reliance on solution methods that are semi-analytical. Such methods allow for the solution of a wide range of important macroeconomic problems. However, events such as the Great Financial Crisis and the COVID-19 crisis have shown that the ability to solve more general problems that include multiple, discrete agents and complex state spaces is desirable. We propose a way to use reinforcement learning to extend the frontier of what is possible in macroeconomic modelling, both in terms of the model assumptions that can be used and the ease with which models can be changed.

Specifically, we show that reinforcement learning can solve the 'rational expectations equilibrium' (REE) models that are ubiquitous in macroeconomics, where choice variables are continuous and may have time-dependency, and where there are global constraints that bind agents' collective actions. Importantly, we show how to solve rational expectations equilibrium models with discrete heterogeneous agents (rather than a continuum of agents or a single representative agent).

We apply reinforcement learning to solve three REE models: precautionary saving; the interaction between a pandemic and the macroeconomy (an 'epi-macro' model), with stochasticity in health statuses; and a macroeconomic model which has global stochasticity, i.e. where the background is changing in a way that the agents are unable to predict. With these three models, we show that we can capture a macroeconomy that has rational, forward-looking agents, that is dynamic in time, that is stochastic, and that attains 'general equilibrium' between the supply and demand of goods or services in different markets.
Solving Heterogeneous General Equilibrium Economic Models with Deep Reinforcement Learning
terms proportional to hours worked and amount consumed, with avoidance of infection captured through a disutility of death. Market clearing is assumed.

Finally, recent work has seen reinforcement learning applied to multi-agent systems of relevance to economics, in the case of bidding in auctions under constraints (Feng et al., 2018; Dütting et al., 2019), and of deciding on behaviours for both agents and a social planner in a gather-and-build economy (Zheng et al., 2020).

In the rest of this paper, we show how to use reinforcement learning to solve typical rational expectations macroeconomic models while also incorporating discrete agent heterogeneity and, potentially, stochasticity; demonstrating that all three can be combined is by far our major contribution and has applications for a wide class of economic problems.

3. Model and Experiments

3.1. Precautionary Savings

A typical rational expectations equilibrium problem is that of precautionary saving, in which agents anticipate a change in circumstances that will adversely affect their utilities, in this case a reduction in wages, and respond in advance in order to smooth their consumption. Such behaviour is typical of the agents in a REE model. The simplest version of this problem has a known analytical solution. We solve this model using reinforcement learning so that we may compare it to the analytical solution, and we also use it as a way to demonstrate many of the challenges of using RL for this class of problems: notably, the speed and accuracy of convergence given the sensitivity to the estimate of the value function; the continuous action and state spaces; and the enforcement of the 'no Ponzi' condition.

3.1.1. MODEL

We assume that there is a single household agent with rational expectations. There are I = 2 firms, with the firms and the good each firm produces indexed by i. The household agent is employed by one of these firms, which we will denote e, and has per-period utility u_t = Σ_{i∈I} ln c_it − (θ/2) n_t², with actions (choice variables) c_it, consumption, and n_t, hours worked. 0 ≤ t < T is the discrete timestep. The price vector is fixed to p_it = 1, and the interest rate is fixed to 0. The wage is imposed as w = 1 for t < T/2 and w = 0.5 afterwards, a fall that is anticipated. The household agent is subject to a budget constraint such that b_{t+1} = b_t + w_t n_t − Σ_{i∈I} p_it c_it. The no Ponzi condition is imposed via b_T = 0, which prevents unlimited borrowing by the household. The agent maximises its discounted utility Σ_{0≤t<T} β^t u_t with β = 0.97 and T = 20.

We use I = 2 firms rather than a single representative firm, allow θ (the weight on hours worked in the utility) to vary, and introduce four discrete states, indexed by d, that the household can be in at any given time. While making no economic difference, the expansion of the action and state spaces means that the hyperparameters we find are transferable to the more realistic problems we will come to. It also allows us to test that training the parameters of a single network to provide the value function for any agent is a good approximation to training a network for each agent individually, which is significantly more computationally expensive.

The analytical solution for the consumption and hours paths for the household is given by c_t = λ^{-1} β^t / I and n_t = w_t θ^{-1} λ β^{-t}, where λ is a Lagrange multiplier determined from the no Ponzi condition. Computationally, we start from the Bellman equation:

    U(t, s, S) = max_a [ u_t(s, S, a) + β E_{s_{t+1}, S_{t+1} | a, s, S} U(t+1, s_{t+1}, S_{t+1}) ]    (1)

where s is the agent's state, a is the action vector, E is the expectation operator, and S is the global state (which includes t, but we make t explicit for clarity). For the current problem, we drop the global state to obtain U(t, s) = max_a [ u_t(s, a) + β E_{s_{t+1} | a, s} U(t+1, s_{t+1}) ].

The optimal action vector under local and global constraints, a*_t(s, S), is computed using the method of Lagrange multipliers; this requires accurate values of the gradient of U with respect to the state variables.

We use a deep neural network to approximate U(t, s) = U(t, s = (e, θ, d; b_t)); however, direct approximation is problematic because ∂_s U(t, s) is slow to converge and is highly sensitive to initialisation and hyperparameter choice. To mitigate this, we find D(t, s) = ∂_s U(t, s) explicitly by solving

    D(t, s) = ∂_s u_t(s, a*) + β Σ_{s_{t+1}} ∂_s P(a*(s) → s_{t+1} | s, a*) U(t+1, s_{t+1}) + β E_{s_{t+1} | s, a*} (∂_s s_{t+1}) D(t+1, s_{t+1})    (2)

That a directly learnt estimate of ∂_s U(t, s) aids stability and convergence has been noted in both deterministic (Balduzzi & Ghifary, 2015) and stochastic (Heess et al., 2015) continuous control problems.

In the case of precautionary saving, ∂_s P(s → s_{t+1} | s, a*) = 0 and P(s → s_{t+1} | s, a*) = δ_{s, s_{t+1}} (with δ the Kronecker delta), so that D(t, s) = ∂_s u_t(s, a*) + β (∂_s s_{t+1}) D(t+1, s_{t+1}(a*(s))). The values of ∂_a U(t+1, s_{t+1}) that are required for finding a* are then written as (∂_a s_{t+1}) D(t+1, s_{t+1}). U and D need not be consistent.
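As a concrete check on the analytical solution above, the no Ponzi condition b_T = 0 pins down λ in closed form: substituting c_t = λ^{-1} β^t / I and n_t = w_t θ^{-1} λ β^{-t} into the budget constraint and telescoping from the initial bond position to b_T = 0 gives λ = sqrt(θ Σ_t β^t / Σ_t w_t² β^{-t}). A minimal sketch of this check (θ = 1 and b_0 = 0 are assumed placeholders not fixed by the text above; this reproduces only the analytical benchmark, not the paper's RL solution):

```python
import numpy as np

# Analytical benchmark for the precautionary-saving model of 3.1.1.
# Assumptions not fixed by the text: theta = 1.0 and b_0 = 0.
beta, theta, T, I = 0.97, 1.0, 20, 2
t = np.arange(T)
w = np.where(t < T // 2, 1.0, 0.5)   # anticipated wage fall at t = T/2

disc = beta ** t                     # beta^t
inv_disc = beta ** (-t)              # beta^{-t}

# No Ponzi condition b_T = 0 pins down the Lagrange multiplier lambda.
lam = np.sqrt(theta * disc.sum() / (w ** 2 * inv_disc).sum())

c = disc / (lam * I)                 # per-good consumption path c_t
n = w * lam * inv_disc / theta       # hours-worked path n_t

# Budget path b_{t+1} = b_t + w_t n_t - sum_i p_it c_it, with p_it = 1.
b = np.concatenate(([0.0], np.cumsum(w * n - I * c)))
```

Before the anticipated wage fall the agent accumulates a positive bond position (b at t = T/2 is positive), which is exactly the precautionary-saving behaviour the RL agent must learn to reproduce; the terminal position b_T returns to zero up to floating-point error.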
demonstrate the approach; the solution method is applicable to a wide range of models. Also, note that there is no need for linearisation around a steady-state, which is a common solution technique for models of this type.

Household agents have four possible health states: susceptible, infected, recovered, and deceased (SIRD), with the probability of transition between states given by, e.g., P(S → I) for going from susceptible to infected. In what follows, epidemiological parameters are noted with tildes. At each time-step (a day),

    P_j(S → I) = β̃ Σ_i ρ̃_i (c_ji / c_ji,t=0) (Σ_{j'∈Infected} c_j'i / Σ_{j'} c_j'i)

so that there is a 'shopping risk' of acquiring the infection if many infected are consuming the same goods. ρ̃_i is a vector of relative consumption risks such that Σ_i ρ̃_i = 1. P(I → R) = γ̃, and P_j(I → D) = ξ̃(j) for each household j, where ξ̃(j) is an exponentially increasing function of age rising from 0.006 at age 40 through 0.024 at 55 to 0.165 for a 70 year old, then flattening at age 80. β̃ = 0.56 and γ̃ = 0.2 from Lin et al. (2020). The relative risk of consuming each sector's product is ρ̃_i = {0.8, 1.2}/2, and we refer to these as 'remote' and 'social' consumption respectively. At time t = 0, k_j = 15 = K_e/2 ∀j, and at time t = 2 we infect the youngest 10% of the population. On death, any investments are redistributed across living agents, and deceased agents are not replaced. We use J = 100 agents, with a distribution of ages given by Age(j) ∼ U(20, 95). Households are evenly distributed across employers and cannot change employer.

Aside from using a reasonable distribution of the death rates, this model is entirely uncalibrated.

3.2.2. SOLVING THE MODEL WITH MULTIPLE AGENTS

We run a number of simulations indexed by τ. Solving the model means finding a history H_{τ=∞} = {S_t}_{t∈0,...,T} and agent behaviours U_{τ=∞}(t, k_t, S_t) that are consistent. We begin with H_{τ=0} and U_{τ=0}. We use H̄_τ, an average created from {H_τ'}_{τ'≤τ}, to re-train U_τ, obtaining U_{τ+1}. A multi-agent simulation is run with agent behaviours governed by U_{τ+1} to obtain H_{τ+1} in general equilibrium. As is standard in iterative methods, we terminate at a sufficiently large τ_max in order to provide a good approximation to the values at τ = ∞; the results we present use τ_max = 50. In the limit of large J, each household's behaviour makes no difference to the system, so despite the stochastic transitions of health statuses, we are able to obtain a unique history of the variables characterising the global system (including the I = 2 sectors). The observed Ĥ_τ can be seen as noisy observations of that unique history, and H̄_τ as an estimate of it, where the averaging is chosen such that H̄_τ converges to it as τ → ∞ and H̄_τ has significantly less noise than Ĥ_τ. We use the average of {H_τ'}_{τ'≤τ} weighted by γ^{τ−τ'} with γ ≤ 1. While in general H_τ is therefore produced solely as a function of U_{τ−1}, we modify this general scheme for this specific case by providing the infection fraction Σ_{j∈Infected} c_ji / Σ_j c_ji in multi-agent simulation τ from H̄_{τ−1}. This counteracts the propagation of the high levels of noise created by the initiation of the pandemic through to later times of the simulation, and could be avoided by using a larger number of agents.

At each timestep, we iteratively solve to find the values of the prices (p_i, w_i, r) for which the markets for labour and capital clear, by gradient descent using scipy.optimize.least_squares with the default parameters, initialised from the previous timestep for t > 0. We then advance both the capital and epidemiological state to proceed to the next timestep.

3.2.3. RE-TRAINING AGENT BEHAVIOURS

U_{τ+1} is obtained from U_τ by continuing training using H̄_τ. The network parameters are the same as in §3.1; however, we use a gentler decay of the learning rate. No problems with convergence are observed, and adherence to the no Ponzi condition is a good test of this, since achieving it is sensitive to the entire time history of the simulation.

As in the multi-agent model, consumers take prices (p_i, w_i and r) as given, and find their optimal consumptions, hours worked, and investments before advancing their state using the budget constraint and the probabilities of their SIRD state changing.

3.2.4. RESULTS

We examine two cases: a 'heterogeneous' case as described above, and a 'homogeneous' case without age heterogeneity but with the same mean death rate.

Figure 1 shows the percentages of susceptible, infected, recovered and deceased agents as the pandemic progresses. Each line is an average over the results of 3 simulations, and each simulation's result is an average over 20 histories. As can be seen from the 95% confidence intervals in the figure, the behaviour is similar across simulations. In the homogeneous-age case, more people are infected over the course of the pandemic, but there are fewer deaths in total.

Figure 2 shows the agents' consumptions. We bin the uniform age distribution into young (< 40), old (> 70) and middle-age groups; we find considerable differences in behaviour between them. After consuming the most before the pandemic, since they anticipate the coming opportunity to save, the old strongly reduce consumption in response to infection risk, and reduce consumption of the riskier 'social' good more. The young, conversely, are unlikely to die and so their consumption is relatively unchanged, governed by
the decrease in the size of the economy.

[Figure 1: percent of agents who are susceptible, infected, recovered, and deceased, against simulation time.]

[Figure 2. Average consumption of living agents from the 'remote' and 'social' sectors binned into three age groups, against simulation time.]

Figure 3 shows the mean investments per agent in each age group, normalised to the salary (product of wage and hours worked) of an agent in the same system with no pandemic and no saving. The young anticipate the pandemic and save before it in order to spend when infection rates are higher, with the converse behaviour for the old. It also shows the no Ponzi condition holds with good accuracy for all age groups.

Finally, Figure 4 shows the total consumption for both heterogeneous-age and homogeneous-age cases. The inclusion of distributional effects causes significant changes to bulk macroeconomic quantities.

Together these results show that in this uncalibrated model the inclusion of age heterogeneity makes a substantial difference to both the epidemiological and economic progress of a pandemic. This model and solution method has been tested with a range of epidemiological and economic parameters; this exercise has also shown the sensitivity of the model's conclusions to those parameters, emphasising the importance of calibration in all aspects of the model if it were used as more than a test case of the methodology.

[Figure 3. Average investments of living agents binned into three age groups. The vertical line shows the time of peak infections.]

3.2.5. SPECIFICATION

The hardware, software, and parameters are identical to §3.1, with the exception of the learning rate decay, which is slower here to allow for the longer time history. Scaling is linear with J, since the number of iterations of the least squares optimisation seems to scale very weakly with J for this problem. Each history calculation followed by an RL update takes ∼30 minutes on the reference machine, so a single simulation takes ∼24 hours, ∼720 GFLOPs-hour.

3.3. Stochasticity of global variables: A toy wage-shock problem

In the previous section, only the agents' state was stochastic and, because of being in the limiting case of large numbers of agents, each individual agent's state did not affect the global state. We now return to the model in §3.1, but instead of having a deterministic drop in wages, wages now follow a log-autoregressive stochastic process.

3.3.1. THE MODEL

We return to the Bellman equation, (1). We assume there is no stochasticity in health states, so the E_{s'|s,a}(·) are no longer present; however, the E_{S_{t+1}|S}(·) remain.
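To make the setting concrete, a log-autoregressive wage process of this kind can be simulated directly. In the sketch below, the persistence rho, the volatility sigma, and the unit initial wage are illustrative assumptions, not the paper's calibration:

```python
import numpy as np

def log_ar_wage_paths(T=20, n_hist=1000, rho=0.9, sigma=0.05, seed=0):
    """Simulate wage histories w_t = exp(x_t) with x_t = rho*x_{t-1} + sigma*eps_t,
    eps_t ~ N(0, 1) and x_0 = 0 (so w_0 = 1). rho and sigma are assumed values."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_hist, T))
    for s in range(1, T):
        x[:, s] = rho * x[:, s - 1] + sigma * rng.standard_normal(n_hist)
    return np.exp(x)

w = log_ar_wage_paths()   # shape (n_hist, T), one row per global history
```

Because w_t is log-normal at every horizon, its moments are available in closed form (for example E[w_t] = exp(Var[x_t]/2)), which makes comparisons against simulated paths using the second moment of the distribution straightforward.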
and so

    ∂_a L(t, S, s, a) = ∂_a u_t(S, s, a) + β (∂_a s_{t+1}) D̃(t + 1, S, s_{t+1})    (7)

As is standard in reinforcement learning, the expectations are approximated by using a large number of global state histories to update Ũ and D̃.

Excepting the wage history, the set-up is as in §3.1. The agents are statically heterogeneous in their propensity to

paths that have w_{p,t} = w_{h,t}, using the second moment of the log-normal distribution. We implement prioritised experience replay (Schaul et al., 2015) by retaining experiences with a larger error for a larger number of training epochs. This increases the agents' performance relative to a base agent trained as in previous sections, particularly on the d_h > 0.2 histories that deviate significantly from the mean path. This is expected, since the training examples are sparser and more varied at higher d_h. The wage-observing agent outperforms the non-wage-
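The retention-based replay variant described above differs from the stochastic prioritised sampling of Schaul et al. (2015): high-error experiences are simply kept in the training set for more epochs. A minimal sketch of such a retention rule, in which the normalisation and the epoch counts are illustrative assumptions:

```python
import numpy as np

def retention_epochs(errors, base_epochs=1, extra_epochs=4):
    """Keep every experience for base_epochs, plus up to extra_epochs more
    in proportion to its normalised training error (larger error -> kept longer)."""
    e = np.asarray(errors, dtype=float)
    scaled = e / (e.max() + 1e-12)             # normalise errors to [0, 1]
    return base_epochs + np.rint(extra_epochs * scaled).astype(int)

epochs = retention_epochs([0.01, 0.5, 1.0, 0.2])   # -> array([1, 3, 5, 2])
```

Experiences whose error is large relative to the batch maximum are then revisited in more training epochs, concentrating training on the rarer histories that deviate strongly from the mean path.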
4. Discussion and Conclusion

This work shows that reinforcement learning can be used to solve a wide range of important macroeconomic rational expectations models in a way that is simple, flexible, and computationally tractable. Furthermore, these methods can be immediately applied to previously intractable problems with multiple degrees of discrete heterogeneity and stochasticity. Being highly relevant to real-world phenomena, such as climate change and disease transmission, these capabilities are of great value to policymakers and can be developed into serious tools to aid decision making in complex scenarios.

Finally, by linking to reinforcement learning, this work provides the potential to apply its extensive toolkit of techniques, many of which have direct relevance to economic questions: examples include accessing larger state and action spaces (e.g. Lillicrap et al., 2015), including bounded rationality, or applying inverse reinforcement learning to deduce agents' objectives and rewards from observed micro- and macro-economic behaviours. Additionally, we can harness improvements in implementation such as GPU/TPU acceleration (Paszke et al., 2019) and distributed computing (Mnih et al., 2016).

Acknowledgements

We would like to thank Federico Di Pace for useful discussions.

References

Balduzzi, D. and Ghifary, M. Compatible value gradients for reinforcement learning of continuous deep policies. arXiv preprint arXiv:1509.03005, 2015.

Dütting, P., Feng, Z., Narasimhan, H., Parkes, D., and Ravindranath, S. S. Optimal auctions through deep learning. In International Conference on Machine Learning, pp. 1706–1715. PMLR, 2019.

Eichenbaum, M. S., Rebelo, S., and Trabandt, M. The macroeconomics of epidemics. Working Paper 26882, National Bureau of Economic Research, March 2020a.

Feng, Z., Narasimhan, H., and Parkes, D. C. Deep learning for revenue-optimal auctions with budgets. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, pp. 354–362, 2018.

Ferguson, N., Laydon, D., Nedjati-Gilani, G., Imai, N., Ainslie, K., Baguelin, M., Bhatia, S., Boonyasiri, A., Cucunubá, Z., Cuomo-Dannenburg, G., et al. Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. Imperial College COVID-19 Response Team, 2020.

Haldane, A. G. and Turrell, A. E. Drawing on different disciplines: macroeconomic agent-based models. Journal of Evolutionary Economics, 29(1):39–66, 2019.

Heess, N., Wayne, G., Silver, D., Lillicrap, T., Tassa, Y., and Erez, T. Learning continuous control policies by stochastic value gradients. arXiv preprint arXiv:1510.09142, 2015.

Hoertel, N., Blachier, M., Blanco, C., Olfson, M., Massetti, M., Rico, M. S., Limosin, F., and Leleu, H. A stochastic agent-based model of the SARS-CoV-2 epidemic in France. Nature Medicine, 26(9):1417–1421, 2020.

Kaplan, G., Moll, B., and Violante, G. L. Monetary policy according to HANK. American Economic Review, 108(3):697–743, 2018.

Kaplan, G., Moll, B., and Violante, G. L. The great lockdown and the big stimulus: Tracing the pandemic possibility frontier for the U.S. Working Paper 27794, National Bureau of Economic Research, September 2020.

Kermack, W. O. and McKendrick, A. G. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 115(772):700–721, 1927.

Kerr, C. C., Stuart, R. M., Mistry, D., Abeysuriya, R. G., Hart, G., Rosenfeld, K., Selvaraj, P., Nunez, R. C., Hagedorn, B., George, L., et al. Covasim: an agent-based model of COVID-19 dynamics and interventions. medRxiv, 2020. URL https://doi.org/10.1101/2020.05.10.20097469.
Lin, Q., Zhao, S., Gao, D., Lou, Y., Yang, S., Musa, S. S., Wang, M. H., Cai, Y., Wang, W., Yang, L., et al. A conceptual model for the outbreak of coronavirus disease 2019 (COVID-19) in Wuhan, China with individual reaction and governmental action. International Journal of Infectious Diseases, 2020.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.

Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937. PMLR, 2016.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. PyTorch: An imperative style, high-performance deep learning library. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc., 2019.

Schaul, T., Quan, J., Antonoglou, I., and Silver, D. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

Smets, F. and Wouters, R. Shocks and frictions in US business cycles: A Bayesian DSGE approach. American Economic Review, 97(3):586–606, 2007.

Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, 2018.

Tracy, M., Cerdá, M., and Keyes, K. M. Agent-based modeling in public health: Current applications and future directions. Annual Review of Public Health, 39(1):77–94, 2018.

Van Hasselt, H., Guez, A., and Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2094–2100, 2016.

Zheng, S., Trott, A., Srinivasa, S., Naik, N., Gruesbeck, M., Parkes, D. C., and Socher, R. The AI Economist: Improving equality and productivity with AI-driven tax policies. arXiv preprint arXiv:2004.13332, 2020.

Appendix I: The Real Business Cycle model

We use a standard real business cycle model; however, we adopt notation and variables common in reinforcement learning, in particular an emphasis on state variables (capital) rather than action variables (consumption, hours worked), and the inclusion of an action-dependent expectation. A baseline RBC model would use Equations 9, 12, 13, 14, 15, 16, and 19. We use Equations 9, 10, 11, 14, 15, 16, and 19, but note that 10 and 12, and 11 and 13, are the same up to algebraic manipulation, expressing the future behaviour in terms of U(k) and U′(k), functions of the state, rather than in terms of consumption or other actions as is usually seen.

We work in real quantities, using p_{i=0} = 1 as the numéraire and other quantities, including p_{i≠0} ≠ 1, defined relative to this.

Notation

j is an index that runs over the J consumer-workers. i runs over the I consumption goods, each of which is produced by a different sector/firm.

For consumers, n_j is hours worked, c_ji is consumption, k_j is capital held by the consumer, v_j is their investment in capital in that timestep, and θ_j > 0 is the weight given to hours in the utility function. E_i denotes the set of agents employed at firm i, and e(j) is the index, i, of the employer of j.

For firms, N_i is the number of hours worked at the firm, K_i is its capital, A_i is the firm's technology, Y_i is the production function, and C_i is the consumption of the firm's goods.

r is the real interest rate, w_i are wages, and p_i are the prices of goods.

Consumers

For consumers, the time-t utility is

    u_j({c_ji}, n_j; θ_j) = Σ_i ln c_ji − (1/2) θ_j n_j²

and their total utility from time t onward is

    U_{j,t,S}(k_{t,j}) = max_{c_ji, n_j} { u_j(c_ji, n_j) + β Σ_{S'} P_{S→S'}(c_ji, n_j) U_{j,t+1,S'}(k_{t+1,j}) }

Firms

Firms are profit maximising with production function