
How Does Government Expenditure Impact Sustainable Development?

Studying the Multidimensional Link between Budgets and Development Gaps

Omar A. Guerrero¹,² and Gonzalo Castañeda³

¹ Department of Economics, UCL, United Kingdom
² The Alan Turing Institute, United Kingdom
³ Centro de Investigación y Docencia Económica (CIDE), Mexico

Abstract

We develop a bottom-up causal framework to study the impact of public spending on high-dimensional and interdependent policy spaces in the context of socioeconomic and environmental development. Using data from 140 countries, we estimate the indicator-country-specific development gaps that will remain open in 2030. We find large heterogeneity in development gaps, and non-linear responses to changes in the total amount of government expenditure. Importantly, our method identifies bounds to how much a gap can be reduced by 2030 through sheer increments in public spending. We show that these structural bottlenecks cannot be addressed through expenditure on the existing government programs, but require novel micro-policies intended to affect behaviors, technologies, and organizational practices. One particular set of bottlenecks that stands out relates to the environmental issues contained in Sustainable Development Goals 14 and 15.

1 Introduction

In recent years, a vast literature on the Sustainable Development Goals (SDGs) and the possibility of reaching them by 2030 has emerged. Some of these studies analyze specific SDGs and explore projections of indicators for different micro-policy interventions (e.g., González-Pier et al. (2016); Porciello et al. (2020); Boeren (2019); Sobczak et al. (2021); Mensi and Udenigwe (2021)), while others focus on identifying synergies and trade-offs between different SDGs (indicators or targets) (e.g., Fuso Nerini et al. (2019); Lusseau and Mancini (2019); McGowan et al. (2019); Pedercini et al. (2019); Asadikia et al. (2021)). This latter approach provides a more holistic evaluation of policy measures attempting to improve the performance of specific SDGs. A third variant of studies explores how the nature of the relationships between SDGs has changed over time and how likely it is that trade-offs can successfully transform into synergies in the coming years (e.g., Machingura and Lally (2017); Fader et al. (2018); Kroll et al. (2019); Amos and Lydgate (2020); Philippidis et al. (2020)). Finally, a fourth set of studies makes use of expert advice or indicator trends to assess the extent to which the SDGs might be achieved by 2030 (e.g., Luken et al. (2020); Moyer and Hedden (2020); Pradhan et al. (2021); Benedek et al. (2021); Ionescu et al. (2020)).

Two major points stand out from this succinct overview: (1) a systemic perspective, emphasizing interactions among SDGs, is critical for policy evaluation; and (2) a comprehensive quantitative understanding of how budgetary allocations impact SDG performance is almost entirely absent. This paper focuses on the latter point and tries to fill this knowledge gap by developing a modeling framework to study policy prioritization in the context of the SDGs. Akenroye et al. (2018) mention the importance of addressing the problem of policy prioritization and of leveraging existing budget resources for meeting these goals. Such funding frameworks are necessary to analyze pressing questions related to the effectiveness of public funding on existing government programs, for example: Do changes in the size and distribution of the budget (on existing programs) effectively help to close development gaps? Which SDGs are the most and least sensitive to such budgetary rearrangements? Can the commitments to the 2030 Agenda be met if there is enough government spending? To what extent do structural factors hinder the effectiveness of existing programs? From the perspective of governments, understanding how their expenditure actions translate, at a systemic level, into effective policies is critical to guarantee the success of any international development agenda.

In this paper, we develop a bottom-up computational model in which public expenditure generates development advancement (with various degrees of effectiveness). A bottom-up approach to budgetary prioritization is necessary to properly account for political-economy factors that are present in a multidimensional and interdependent policy space (Guerrero and Castañeda, 2020a; Castañeda et al., 2018). One of the analytic benefits of this agent-based model (ABM) is that it can be calibrated with coarse-grained data¹ of individual countries, without needing to pool cross-national data.² We exploit this feature to study the sensitivity of country-specific indicators to changes in public expenditure.

We study the feasibility of the SDGs across 140 countries using data from the 2020 edition of the Sustainable Development Report (SDR) (Sachs et al., 2020).³ Our three main results are the following. First, we provide estimates of the SDG gaps that might remain by 2030 if government programs were kept unaltered.⁴ Second, we demonstrate that the sensitivity of these gaps varies, in diverse and non-linear ways across countries and indicators, with the amount of per capita government expenditure. Third, we identify the maximum reduction of the SDG gaps that can be achieved by 2030 through sheer expenditure increments. That is to say, there are stringent ‘budgetary frontiers’ that cannot be overcome without addressing long-term structural factors (i.e., redesigning the government programs). Altogether, our results provide quantitative and theoretically sound insights into what makes the SDGs infeasible from the perspective of government expenditure and existing development strategies.⁵


¹ In the ABM literature, this type of indicator data is considered coarse-grained since it does not provide disaggregated information about the individual behaviors of the agents, something typically needed to calibrate ABMs (e.g., microdata or administrative records).
² While some studies use non-pooled country-level data to describe the structure of trade-offs and synergies (see Pradhan et al. (2017) and references therein), their capacity to produce quantitative prospective analysis may be limited because, under traditional statistical tools, proper statistical power can only be achieved with a large number of observations (which can only be obtained by pooling cross-country data).
³ The SDR is produced by the Sustainable Development Solutions Network and the Bertelsmann Stiftung.
⁴ A government program is the set of policies that a government has in place to affect a specific development issue. Funding or defunding these programs is a short/medium-term decision, while redesigning them is a long-term one (which needs to address structural factors). Our model focuses on the former (short/medium-term decisions), so it assumes that the specific policies in place remain unchanged.
⁵ Although the third type of literature mentioned above argues for the need for structural changes to achieve the SDGs and to break away from trade-offs, the meaning of structural bottleneck remains broad and often ambiguous in terms of policy instruments. Our paper sheds new light by introducing a more nuanced concept of bottlenecks, one with a direct link to government programs that can be directly affected through budgetary readjustments.

We structure the remainder of the paper in five more sections and a set of appendices. In Section 2, we present the methods employed: model description, network estimation, calibration procedure, goodness of fit, and definition of SDG gaps. In Section 3, we describe the sources of our database, which includes time series for development indicators and government expenditure, and explain how we geographically cluster the information for producing visualizations. Then, in Section 4, we show different figures describing the main results from our simulations. Section 5 compares alternative methodologies (data-fitting and aggregated models) with our bottom-up computational approach in the context of systemic policy analysis and budgetary allocations. Finally, in Section 6, we conclude the paper with a brief summary of the model’s purpose and assumptions, and with a suggestion on how to use the simulation results for country-specific policy guidelines.

2 Methods

Essentially, the proposed model is designed to study how different budgetary allocations affect the simultaneous and interdependent evolution of a large set of development indicators. The model takes as inputs a vector with the initial conditions of the indicators, a network with their interdependencies, a budget size, the fraction of positive changes in the indicators (as a measure of variation), and the final values they achieved in the last period of the sample. With this information, the parameters are calibrated to (1) match the simulated and empirical indicators in their final observations, and (2) match the fraction of positive changes. Due to the interdependent dynamics produced by the model, calibrating its parameters is not trivial. Nonetheless, we devise an efficient method that yields a goodness of fit above 90% for most countries. Our model is a variant of Castañeda et al. (2018) and Guerrero and Castañeda (2020a), with the improvement of accounting for the size of a government’s budget. Similar models have been successfully applied to ex-ante policy evaluation (Castañeda and Guerrero, 2019a), policy resilience (Castañeda and Guerrero, 2018), policy coherence (Guerrero and Castañeda, 2020b), public governance (Guerrero and Castañeda, 2021), and sub-national development (Guerrero et al., 2021). Some of them have also been used in the provision of policy advice (Castañeda and Guerrero, 2019b,d,c; Sulmont et al., 2021; Gobierno del Estado de México, 2020). While the full details of the model are provided in Appendix A, here we explain the mechanisms that are most salient for this study and elaborate on the new calibration procedure.

2.1 Model description

The model consists of an agent representing the government or central authority in charge of deciding how to spend a budget of size $B$. There are $N$ policy issues, each one with a level of development measured by an indicator. Of these policy issues, $n \le N$ can be directly impacted through existing government programs, and we assume that there is one program for each of them. We call these policy issues instrumental, while the remaining $N - n$ are considered collateral. An issue may be collateral because there is no policy instrument to intervene in it directly; this may occur, for instance, because the issue is too aggregate (e.g., GDP growth).⁶

In addition to the government agent, there are $n$ policymaking agents (functionaries), one in charge of each instrumental indicator.⁷ Thus, the problem of the central authority is to allocate $B$ resources across $n$ policymakers in order to improve the $N$ indicators. Policymakers, however, may have goals different from those of the central authority or may simply be inefficient. Therefore, some of the allocated resources might end up diverted or wasted. Let us denote the allocation to instrumental policy issue $i$ as $P_i$, and the amount of resources that the policymaker uses effectively as $C_i$; we call the latter the contribution of the policymaker.

In an iterative process, the government agent reallocates its resources, prioritizing the most laggard⁸ and the most efficient policy issues. In parallel, the policymakers try to maximize their benefit by choosing a level of $C_i$ that signals proficiency to the central authority (for political reputation) but that also benefits them through the wasted resources $P_i - C_i$. The determination of $C_i$ happens through a behavioral model of reinforcement learning (which has extensive empirical validation), subject to the monitoring of the government and to the corresponding penalties in case it spots inefficiencies.

The quality of the procurement mechanisms aimed at minimizing inefficiencies varies across countries according to empirical data on public governance, which we use as an input. In each step, the contributions and the total incoming spillovers $S_i$ (which could be positive or negative) determine the success probability $\gamma_i$ of the policy aimed at issue $i$. If the policy succeeds, the amount of improvement reflected in the indicator is proportional to the existing long-term structural factors, which we capture explicitly in a parameter $\alpha_i$. Altogether, the model runs for $T$ periods that can be mapped into calendar time. Parameter $B$ corresponds to the empirical budget that a given country spent during the sampling period. Thus, in the calibration, the budget runs out after $T$ simulation periods, reflecting different spending capabilities across countries and enabling us to test the potential effects of budgetary increments and reductions. We perform Monte Carlo simulations to generate stable measures of the indicators and other variables of interest. The reader should be aware that the model is calibrated and implemented for each country independently, so this approach overcomes concerns about biases from grouping countries or indicators.

⁶ Other reasons include lack of capacity (e.g., cybersecurity), an advanced level of development (e.g., extreme poverty in some advanced economies), or lack of awareness (e.g., pollution and over-exploitation of natural resources in several poor countries).
⁷ The model is flexible enough to accommodate multiple agents per indicator or multiple indicators per agent. This, however, requires detailed contextual information that we leave for country-specific studies.
⁸ Prioritizing laggard issues has been a promoted practice since the Millennium Development Project, under the assumption that laggard indicators reveal potential bottlenecks.

In the interest of clarity and space, we summarize the model in Algorithm 1 and Figure 1. In this section, we focus on the two equations that drive the dynamics of the indicators, and provide the details of the remaining equations in Appendix A. These equations connect the outcomes of the behavioral components with the spillover effects shaped by the network of interdependencies, and establish a clear differentiation between short/mid-term and long-term dynamics.

Algorithm 1: Model pseudocode

foreach period $t$ do
    foreach public servant $i$ do
        receive public funds $P_{i,t}$;
        evaluate the benefits from the previous contribution $C_{i,t-1}$;
        establish new contribution level $C_{i,t}$;
    foreach indicator $i$ do
        if the indicator is instrumental then
            implement public policy using the resources $C_{i,t}$;
        receive the incoming spillovers $S_{i,t}$;
        determine the probability of success $\gamma_{i,t}$ according to $C_{i,t}$ and $S_{i,t}$;
        if the public policy is successful (with probability $\gamma_{i,t}$) then
            improve the indicator according to the long-term structural factors $\alpha_i$;
    the government monitors the policymakers through imperfect mechanisms;
    the government penalizes those who are found to be inefficient;
    the policymakers receive the benefit from their chosen contributions;
    the government updates the allocation profile $P_{1,t}, \dots, P_{n,t}$;

Figure 1: Structure of the model
[Diagram not reproduced. Panels: Interventions (institutional reforms, government actions, structural reforms); Political economy, split into a macro level (spillovers over the conditional dependencies network of collateral and instrumental nodes) and a micro level (central authority, public servants, inefficiency social norm); Outcomes (development-indicator dynamics, policy priorities, sector-level inefficiencies).]

Notes: The left panel shows examples of policy interventions that could be implemented by manipulating some of the model’s exogenous variables. All the interventions take place at the micro level and exert a direct impact on budgetary decisions. The panel at the center shows that the model establishes linkages between the micro and the macro levels. At the micro level, the central authority allocates budgetary resources, while policymakers implement the government programs. At the macro level, the network of interdependencies produces spillover effects that condition the evolution of the development indicators. In the upward causation component (right vertical arrow), functionaries make effective use of some of the resources that they receive from the central government. In the downward causation component (left vertical arrow), the overall dynamics produce reductions in the development gaps of the 2030 Agenda. This channel also transmits signals reflecting a certain misuse of resources, which causes the government to penalize inefficient functionaries and reallocate resources. Moreover, the three circling arrows in the middle of the bottom panel describe a horizontal causation mechanism responsible for the social norms of inefficiency guiding functionaries’ behavior. Finally, the right panel presents some of the outcomes that can be obtained from the model: the evolution of the indicators, policy priorities (allocation profiles), and sectoral inefficiencies.
Source: Subsection 3.1 of Guerrero and Castañeda (2020a).

Now, let us define the evolution of indicator $i$ as

$$ I_{i,t+1} = I_{i,t} + \alpha_i \, \xi(\gamma_{i,t}), \tag{1} $$

where parameter $\alpha_i > 0$ captures long-term structural factors.⁹ Parameter $\alpha_i$ imposes a limit on the growth that can be achieved in the short term through sheer spending. Here, $\xi(\gamma_{i,t})$ denotes the outcome of a Bernoulli trial that takes the value 1 (successful) or 0 (unsuccessful). This means that, if a positive event materializes, the indicator grows by $\alpha_i$. As previously mentioned, the probability of a successful trial is $\gamma_{i,t}$. Note that $\gamma_{i,t}$ is an endogenous variable of the model, so we proceed to explain how it is formed.

⁹ Note that, if the indicator exceeds its theoretical maximum (if provided by the user), the model assigns zero growth.
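To make equation (1) concrete, the sketch below iterates it for a single indicator and averages the final values over Monte Carlo runs, as described for the model. It is a minimal illustration under stated assumptions, not the authors' implementation: the success probability is held fixed at a hypothetical value `gamma` (in the model, $\gamma_{i,t}$ is endogenous and changes every period), the treatment of the theoretical maximum is one simple reading of footnote 9, and the function names `simulate_indicator` and `monte_carlo_final` are hypothetical.

```python
import random


def simulate_indicator(I0, alpha, gamma, T, rng, I_max=None):
    """Iterate equation (1), I_{t+1} = I_t + alpha * xi(gamma), for T periods.

    Assumption: gamma is constant here; in the model it is endogenous.
    """
    I = I0
    for _ in range(T):
        # Bernoulli trial xi(gamma): 1 with probability gamma, 0 otherwise.
        xi = 1 if rng.random() < gamma else 0
        # Assumed reading of footnote 9: no growth once the optional
        # theoretical maximum has been reached.
        if I_max is not None and I >= I_max:
            xi = 0
        I += alpha * xi
    return I


def monte_carlo_final(I0, alpha, gamma, T, M, seed=0, I_max=None):
    """Average final value of the indicator across M simulated runs."""
    rng = random.Random(seed)
    finals = [simulate_indicator(I0, alpha, gamma, T, rng, I_max) for _ in range(M)]
    return sum(finals) / M


# With gamma fixed, the expected final value is I0 + alpha * T * gamma.
estimate = monte_carlo_final(I0=0.5, alpha=0.01, gamma=0.6, T=100, M=2000)
print(round(estimate, 3))  # close to 0.5 + 0.01 * 100 * 0.6 = 1.1
```

Each run grows in steps of exactly $\alpha_i$, so even with $\gamma = 1$ the indicator advances by at most $\alpha_i T$; averaging over many runs is what turns the noisy individual trajectories into a stable estimate.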

Recall that the total budget size across periods is $B$. This stock can be turned into flows by defining a disbursement schedule $B_1, \dots, B_T$ such that $\sum_{t=1}^{T} B_t = B$. For simplicity, let us assume that the disbursement schedule is homogeneous, so $B_t = B/T$ for all $t$. Next, consider the allocation profile $P_{1,t}, \dots, P_{n,t}$ that the central authority defines in period $t$. Under the homogeneous disbursement schedule assumption, $\sum_{i=1}^{n} P_{i,t} = B_t$ holds, so the contributions of the policymakers are in the same units as the budget. In order to map $C_{i,t}$ into the success probability $\gamma_{i,t}$, we define

$$ \gamma_{i,t} = \beta \, \frac{C_{i,t} + \frac{1}{n}\sum_{j} C_{j,t}}{1 + e^{-S_{i,t}}}, \tag{2} $$

where $\beta$ is a normalizing constant¹⁰ and $S_{i,t}$ is the total amount of spillovers received by indicator $i$ in period $t$ (which could be positive or negative).¹¹ The spillovers are computed in every simulation period according to $S_{i,t} = \sum_j \mathbb{1}_{j,t} A_{j,i}$, where $\mathbb{1}$ is an indicator function that returns 1 if indicator $j$ grew in the previous period and 0 otherwise. The adjacency matrix $A$ corresponds to the empirical network of interlinkages, with each entry $A_{j,i}$ representing a conditional dependence from indicator $j$ to indicator $i$. Importantly, these conditional dependencies do not represent causal links, but rather an empirical regularity that the model takes into account (see Ospina-Forero et al. (2020) for a detailed discussion on estimating SDG networks and the impossibility of interpreting them as causal networks). While the structure of the network represented by $A$ is considered a long-term feature, the actual realization of the spillovers is a short/mid-term phenomenon because it results from the dynamics of the other indicators in the previous period.
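The spillover rule and equation (2) can be sketched in a few lines of pure Python. This is an illustrative sketch under assumptions, not the authors' code: the adjacency matrix is a nested list in which `A[j][i]` holds the weight from indicator $j$ to indicator $i$, contributions of collateral indicators are passed as zeros (as footnote 11 states), `beta` is assumed to be chosen so that every $\gamma_{i,t}$ stays in $[0, 1]$, and the function names are hypothetical.

```python
import math


def spillovers(A, grew):
    """S_i = sum_j 1{indicator j grew in the previous period} * A[j][i]."""
    N = len(A)
    return [sum(A[j][i] for j in range(N) if grew[j]) for i in range(N)]


def success_probabilities(C, S, beta, n):
    """Equation (2): gamma_i = beta * (C_i + mean contribution) / (1 + exp(-S_i)).

    C holds one contribution per indicator (zero for collateral ones), and
    n is the number of instrumental indicators, so sum(C) / n is the
    average contribution across policymakers.
    """
    mean_contribution = sum(C) / n
    return [
        beta * (c_i + mean_contribution) / (1.0 + math.exp(-s_i))
        for c_i, s_i in zip(C, S)
    ]


# Three indicators: the first two are instrumental, the third is collateral.
A = [[0.0, 0.5, 0.0],
     [0.0, 0.0, -0.2],
     [0.3, 0.0, 0.0]]
S = spillovers(A, grew=[True, False, True])  # [0.3, 0.5, 0.0]
gammas = success_probabilities(C=[2.0, 1.0, 0.0], S=S, beta=0.4, n=2)
```

The logistic term $1/(1 + e^{-S_{i,t}})$ maps spillovers of any sign into $(0, 1)$, so negative spillovers lower the success probability without ever making it negative, while a collateral indicator still has a positive chance of success through the government's average contribution.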

Equation (2) represents the short/mid-term component of the model, while parameter $\alpha_i$ from equation (1) captures the long-term factors limiting the impact of public expenditure on the indicators. For example, a government may increase the funds allocated to train quantum-computing engineers with the aim of strengthening this strategic area. While the number of engineers in this field may indeed increase due to the availability of scholarships, they may leave for another country or end up in unrelated jobs due to a lack of employment opportunities in the domestic labor market. A labor-market-related structural factor, the demand for quantum-computing engineers, limits the speed with which this sector can develop; such speed will be reflected in modest improvements of the relevant indicators. Naturally, a structural reform could be seen as a change in $\alpha_i$, but its interpretation proves difficult due to the multiple variables that are absorbed in this parameter; this is a challenge that we leave for future work. Nevertheless, $\alpha_i$ is informative about the limits of sheer spending at the level of each indicator, something lacking in all other approaches. For this reason, the model is consistent with the idea of analyzing budgetary changes over existing government programs.

¹⁰ Importantly, if expenditure data at the level of each indicator were available, they could be used as an input for $P_i$, in which case $\beta_i$ could be indicator-specific and more intuitive in terms of returns to expenditure in specific policy issues. Hence, while we use aggregate expenditure data in this paper, the model is flexible enough to allow various types of disaggregated data.
¹¹ The term $C_{i,t}$ accounts for the expenditure contribution to an instrumental policy issue. For a collateral issue, $C_{i,t}$ equals zero, so its success depends on the overall ‘financial health’ of the government, $\frac{1}{n}\sum_j C_{j,t}$, and on the spillovers $S_{i,t}$. Therefore, we assume that public funding is a necessary but not sufficient condition for development.
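Equation (1) directly implies the kind of spending limit discussed above. The short derivation below is our own illustration rather than a result stated in this form in the text; it uses only equation (1), the fact that $\xi(\gamma_{i,t})$ has conditional mean $\gamma_{i,t}$, and $0 \le \gamma_{i,t} \le 1$. The zero-growth rule of footnote 9 can only lower growth, so the first inequality holds whether or not that rule binds (with equality when it does not).

```latex
\begin{align*}
\mathbb{E}\left[I_{i,T}\right]
  &\le I_{i,0} + \alpha_i \sum_{t=0}^{T-1} \mathbb{E}\left[\xi(\gamma_{i,t})\right]
   = I_{i,0} + \alpha_i \sum_{t=0}^{T-1} \mathbb{E}\left[\gamma_{i,t}\right] \\
  &\le I_{i,0} + \alpha_i T .
\end{align*}
```

Hence, however large the budget (and therefore the contributions entering the success probability), the expected level of indicator $i$ cannot rise by more than $\alpha_i T$ over $T$ periods; a goal $G_i$ with $G_i - I_{i,0} > \alpha_i T$ lies beyond the reach of sheer spending on the existing program.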

2.2 Networks

As we have previously explained, the structure of the interdependencies between indicators is assumed to be a long-term feature, so the networks are exogenous inputs. As such, adjacency matrices can be built for each country following any preferred criterion. A popular approach among development scholars is the qualitative one of eliciting expert opinions. Unfortunately, this strategy is not scalable to a large set of countries and indicators (and is difficult to use when governments face severe time constraints). Ospina-Forero et al. (2020) provide a comprehensive review of quantitative methods that may be suitable for estimating SDG networks.¹² With this information in hand, our method of choice is the Bayesian approach of Sparse Gaussian Bayesian Networks developed by Aragam et al. (2019) (known as sparsebn). This procedure has the distinctive advantages of working well with high-dimensional datasets, even if they have short series, and of producing adjacency matrices that try to minimize the number of links that may be false positives (hence the “sparse” in the name).¹³

Recall that the resulting networks should not be interpreted as causal relations, but as conditional probabilities, which means that a link $A \to B$ does not imply that $\Delta A$ guarantees $\Delta B$. This is the reason why spillovers affect the probability of success $\gamma_i$, and not the magnitude of the outcome. Of course, like any statistical method, sparsebn makes certain assumptions, such as a linear Gaussian structural equation model and no temporal dependence between observations. The former is a standard assumption in causal Bayesian models. Temporal dependencies can be partially removed by computing first differences of the series. Overall, we consider these assumptions more reasonable than those made by alternative methods; further arguments are provided by Ospina-Forero et al. (2020). Finally, the networks are estimated for each country individually, an important improvement over the existing literature on SDG synergies and trade-offs, which tends to use pooled data.¹⁴

¹² For instance, correlation thresholding is one of the methods commonly used in the literature (e.g., Warchold et al. (2021); Putra et al. (2020)).
¹³ For more details on the network and its estimation procedure, see Appendix D.

2.3 Calibration

The aim of the calibration method is twofold: to ensure (1) that the simulated dynamics of the indicators start and end at the empirical levels, and (2) that the model’s average success probability corresponds to the empirical fraction of positive first differences of the indicators.¹⁵ To achieve this, we need to find the parameter vector $\alpha_1, \dots, \alpha_N, \beta$ that minimizes an error measure.
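The empirical target used for $\beta$, the fraction of positive first differences across all indicators (denoted $\Gamma$ in the calibration equations), can be computed as in the sketch below. It assumes that each indicator's series is a list of values ordered in time, that the share is pooled over all indicators of one country, and that zero changes do not count as positive; the helper names are hypothetical.

```python
def first_differences(series):
    """Changes between consecutive observations of one indicator."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def fraction_positive_changes(panel):
    """Share of strictly positive first differences, pooled over all
    indicators of one country (panel is a list of time series)."""
    diffs = [d for series in panel for d in first_differences(series)]
    return sum(1 for d in diffs if d > 0) / len(diffs)


panel = [[1.0, 2.0, 2.0, 3.0],   # changes: +1, 0, +1
         [5.0, 4.0, 6.0, 7.0]]   # changes: -1, +2, +1
Gamma = fraction_positive_changes(panel)  # 4 positive out of 6 changes
```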

There are two features that characterize this calibration problem. First, the dynamics of the indicators are interdependent. This means that if $\alpha_i$ changes, the ‘speed’ of another indicator $j$ may be altered as well. Furthermore, these interdependencies are not transparent enough for the model to be written as a system of equations that can be solved simultaneously (as one might think by looking at equation (1)). For instance, the fact that $\gamma_i$ is endogenous renders homogeneous Markov chains ineffective. The second feature is the computational cost of each evaluation. Since each simulation may yield a different trajectory for the same indicator, stable metrics have to be obtained from Monte Carlo simulations. This means that evaluating a given set of parameters involves several independent runs.¹⁶

¹⁴ Naturally, the network plays a role in the model, so different topologies may influence some of the model’s variables. In fact, Castañeda et al. (2018) and Guerrero and Castañeda (2020a) show that removing the spillovers alters the incentive structure of the policymaking agents, resulting in lower variation of inefficiency across policy issues. Nevertheless, for the variables of interest of this study (the SDG gaps), we find that our results are robust to different networks. Appendix I provides detailed evidence.
¹⁵ Appendix E discusses how to deal with indicators whose final values are lower than their initial conditions.
¹⁶ Heuristic optimization algorithms that can handle dynamic landscapes, such as simulated annealing and particle swarm optimization, fail, arguably due to the sensitivity of the landscape and to the cost of each evaluation. Evolutionary approaches such as differential evolution have also been ineffective for similar reasons. Finally, Bayesian methods such as the Tree-structured Parzen Estimator, which perform well with expensive-evaluation models, do not work in this context due to the high dimensionality of the solution space (and the sensitivity of the fitness landscape).
We develop a multi-objective gradient descent method that exploits the fact that each parameter can be associated with a specific error. Let us define an indicator-specific error as $e_{\alpha_i} = I_{i,-1} - \bar{I}_{i,T}$, where $I_{i,-1}$ is the final empirical value of indicator $i$, and $\bar{I}_{i,T}$ is the average final simulated value of the same indicator across $M$ independent Monte Carlo simulations. The corresponding error for $\beta$ is

$$ e_\beta = \Gamma - \frac{\sum_{i,t,m} \gamma_{i,t,m}}{M \times T \times N}, $$

where $\Gamma$ is the fraction of positive first differences across all indicators. The calibration algorithm tries to minimize the average absolute error

$$ e = \frac{1}{N+1}\left( \sum_{i=1}^{N} |e_{\alpha_i}| + |e_\beta| \right). $$
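The three error measures just defined can be computed as in the following sketch. It is a hedged illustration that assumes the simulated success probabilities are stored as a nested list `gammas[m][t][i]` (run, period, indicator) and that the Monte Carlo averages of the final simulated values have already been computed; the function names are hypothetical.

```python
def mean_success_probability(gammas):
    """Average of gamma over runs m, periods t and indicators i.

    gammas[m][t][i] is the success probability of indicator i in period t
    of Monte Carlo run m, so the divisor is M * T * N.
    """
    M, T, N = len(gammas), len(gammas[0]), len(gammas[0][0])
    total = sum(g for run in gammas for period in run for g in period)
    return total / (M * T * N)


def calibration_errors(I_final_emp, I_final_sim_mean, Gamma, mean_gamma):
    """Return (e_alpha, e_beta, e) as defined in the text."""
    # e_alpha_i = I_{i,-1} - mean simulated final value (positive: too slow).
    e_alpha = [emp - sim for emp, sim in zip(I_final_emp, I_final_sim_mean)]
    # e_beta = empirical share of positive changes - mean simulated gamma.
    e_beta = Gamma - mean_gamma
    # Average absolute error over the N + 1 parameters.
    e = (sum(abs(x) for x in e_alpha) + abs(e_beta)) / (len(e_alpha) + 1)
    return e_alpha, e_beta, e


mean_gamma = mean_success_probability([[[0.2, 0.4], [0.6, 0.8]]])  # M=1, T=2, N=2
e_alpha, e_beta, e = calibration_errors([0.8, 0.5], [0.7, 0.6], 0.6, mean_gamma)
```

In this toy case the first indicator is too slow ($e_{\alpha_1} \approx 0.1$), the second too fast ($e_{\alpha_2} \approx -0.1$), and the mean simulated success probability (about 0.5) falls short of $\Gamma = 0.6$.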

To minimize the error, we first start with a proposed vector $\alpha_1, \dots, \alpha_N, \beta$. Next, we perform a set of $M$ Monte Carlo simulations and compute the error vector $e_{\alpha_1}, \dots, e_{\alpha_N}, e_\beta$. For each indicator $i$, if $e_{\alpha_i} < 0$ (meaning that the indicator grew too fast), we multiply $\alpha_i$ by a factor $1 - \delta_{\alpha_i}$. If $e_{\alpha_i} > 0$ (the indicator was too slow), we multiply $\alpha_i$ by $1 + \delta_{\alpha_i}$. The same logic applies to $\beta$, which has a corresponding factor $\delta_\beta$. Ideally, we want the mean error to converge to zero as we search the parameter space. We can generate such behavior by setting factors $\delta_{\alpha_1}, \dots, \delta_{\alpha_N}, \delta_\beta$ that change in proportion to the errors. As it turns out, a factor that achieves this for indicator $i$ is $\delta_{\alpha_i} = |e_{\alpha_i}| / (I_{i,-1} - I_{i,0})$, where $I_{i,0}$ and $I_{i,-1}$ are the empirical initial and final values of the indicator, while $\delta_\beta = |e_\beta|$ for $\beta$. Our simulations suggest zero-error convergence for a large enough $M$.¹⁷ Thus, the procedure can be run for several iterations until a certain threshold for the average error is reached. The calibration procedure for the model parameters is described in Algorithm 2. As the reader will notice, we bound the step factor $1 \pm \delta$ to the interval $[1/2, 3/2]$, as we have found that this accelerates the convergence rate significantly.
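One iteration of the multiplicative update just described can be sketched as follows. This is a minimal illustration under assumptions: every indicator is taken to satisfy $I_{i,-1} > I_{i,0}$ (declining indicators are treated in Appendix E and are not covered here), the Monte Carlo step that produces the errors is left out, and the names `step_factor` and `update_parameters` are hypothetical. Since $\delta \ge 0$ by construction, the absolute values around $\delta$ used in the calibration pseudocode are implicit.

```python
def step_factor(error, delta):
    """Shrink when error < 0 (too fast), grow otherwise; bounded to [1/2, 3/2]."""
    if error < 0:
        return max(1.0 - delta, 0.5)
    return min(1.0 + delta, 1.5)


def update_parameters(alphas, beta, e_alpha, e_beta, I0_emp, I_final_emp):
    """Return updated (alphas, beta) after one calibration iteration."""
    new_alphas = []
    for alpha, err, i0, i_final in zip(alphas, e_alpha, I0_emp, I_final_emp):
        # delta_alpha_i = |e_alpha_i| / (I_{i,-1} - I_{i,0})
        delta = abs(err) / (i_final - i0)
        new_alphas.append(alpha * step_factor(err, delta))
    # delta_beta = |e_beta|
    new_beta = beta * step_factor(e_beta, abs(e_beta))
    return new_alphas, new_beta


new_alphas, new_beta = update_parameters(
    alphas=[0.1, 0.2], beta=0.5, e_alpha=[-0.05, 0.3], e_beta=-0.8,
    I0_emp=[0.0, 0.0], I_final_emp=[1.0, 0.5],
)
# alpha_1 shrinks by 5%, alpha_2 hits the 3/2 cap, beta hits the 1/2 floor.
```

Normalizing by the historical gap $I_{i,-1} - I_{i,0}$ makes the step size comparable across indicators measured on different scales, and the $[1/2, 3/2]$ bounds prevent a single large error from moving a parameter too far in one iteration.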

2.3.1 Goodness of fit

For a single indicator $i$, the goodness of fit of its corresponding parameter $\alpha_i$ is

$$ \mathrm{GoF}_{\alpha_i} = 1 - \frac{|e_{\alpha_i}|}{I_{i,-1} - I_{i,0}}, \tag{3} $$

which takes values in the interval $(-\eta, 1]$, where $-\eta$ is the lower bound induced by the theoretical maximum of the indicator. If no theoretical maximum exists, then the lower bound is $-\infty$.

The basic idea behind $\mathrm{GoF}_{\alpha_i}$ is that, in a good fit, the error $e_{\alpha_i}$ should represent a small fraction of the historical gap that needs to be closed in a simulation ($I_{i,-1} - I_{i,0}$). Errors where the simulated average indicator ends below the empirical value are bounded because the simulated average cannot fall below $I_{i,0}$ (the model only allows non-negative growth), so such errors never exceed the historical gap. However, an error where the simulated average indicator ends above the empirical value may represent multiple times the size of the historical gap. Therefore, this metric not only takes into account accuracy with respect to the final value, but also penalizes extreme errors with negative contributions when computing the mean goodness of fit across all indicators.

Importantly, when testing alternative calibration methods, several indicators display a negative $\mathrm{GoF}_{\alpha_i}$. This is not the case for our algorithm.

¹⁷ The resulting parameter vector is robust across different calibrations using random initial parameters.

Algorithm 2: Calibration pseudocode

initialize vector $\alpha_1, \dots, \alpha_N, \beta$ with random values;
while the average error $e$ exceeds an error tolerance threshold do
    run $M$ Monte Carlo simulations;
    compute the error vector $e_{\alpha_1}, \dots, e_{\alpha_N}, e_\beta$;
    foreach indicator $i$ do
        if $e_{\alpha_i} < 0$ then
            update $\alpha_i$ to $\alpha_i \times \max(1 - |\delta_{\alpha_i}|, 1/2)$;
        else
            update $\alpha_i$ to $\alpha_i \times \min(1 + |\delta_{\alpha_i}|, 3/2)$;
    if $e_\beta < 0$ then
        update $\beta$ to $\beta \times \max(1 - |\delta_\beta|, 1/2)$;
    else
        update $\beta$ to $\beta \times \min(1 + |\delta_\beta|, 3/2)$;

The metric for the goodness of fit of parameter $\beta$ follows the same logic, but the target feature is the rate of positive first differences. Formally, the goodness of fit of $\beta$ is

$$ \mathrm{GoF}_\beta = 1 - \frac{|e_\beta|}{\Gamma}, \tag{4} $$

where $\Gamma$ is the number of positive first differences in the empirical data as a fraction of all first differences.

The overall goodness of fit for a country is the average

$$ \mathrm{GoF} = \frac{1}{N+1}\left( \sum_{i=1}^{N} \mathrm{GoF}_{\alpha_i} + \mathrm{GoF}_\beta \right). \tag{5} $$
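Equations (3) to (5) translate directly into code. The sketch below is illustrative and assumes that the errors enter in absolute value, which is what makes $\mathrm{GoF}_{\alpha_i} \le 1$ and lets overshooting errors produce negative scores, as the text describes; the function names are hypothetical and the historical gap $I_{i,-1} - I_{i,0}$ is assumed to be positive.

```python
def gof_alpha(e_alpha_i, I0, I_final):
    """Equation (3): 1 - |e_alpha_i| / (I_{i,-1} - I_{i,0})."""
    return 1.0 - abs(e_alpha_i) / (I_final - I0)


def gof_beta(e_beta, Gamma):
    """Equation (4): 1 - |e_beta| / Gamma."""
    return 1.0 - abs(e_beta) / Gamma


def gof_country(e_alpha, e_beta, I0s, I_finals, Gamma):
    """Equation (5): average of the N indicator scores and the beta score."""
    scores = [gof_alpha(e, i0, i1) for e, i0, i1 in zip(e_alpha, I0s, I_finals)]
    return (sum(scores) + gof_beta(e_beta, Gamma)) / (len(scores) + 1)


# An overshoot twice as large as the historical gap (0.5) scores about -1.0.
overshoot_score = gof_alpha(-1.0, 0.2, 0.7)
country_score = gof_country([0.05, -0.05], 0.01, [0.2, 0.2], [0.7, 0.7], 0.5)
```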

Figure 2 shows the distribution of the GoF after calibrating the model.¹⁸ More detailed results on the goodness of fit are provided in Appendix F. Notice that this calibration procedure yields a remarkable goodness of fit at the country level. Furthermore, the large majority of the parameters $\alpha_i$ exhibit a fit above 0.9, and this is always the case for $\beta$.

Figure 2: Distribution of goodness of fit metrics
[Figure not reproduced. Panels: (a) Country-level GoF, (b) Indicator-level GoF, (c) GoF_β. Horizontal axes: goodness of fit, spanning 0.94 to 0.99 in (a), 0.5 to 1.0 in (b), and 0.994 to 1.000 in (c). Vertical axes: frequency (logarithmic scale in panel (b)).]
Sources: Authors’ own calculations.

2.4 Definition of SDG gaps

The main estimates of the paper are the SDG gaps, that is, the distances between the development goals and the levels predicted for the indicators in 2030. If a prediction surpasses its goal, we say that the gap has been closed. Formally, an SDG gap is


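$$
\mathrm{gap}_i =
\begin{cases}
100 \times \dfrac{G_i - \bar{I}_{i,T}}{G_i} & \text{if } G_i \geq \bar{I}_{i,T},\\[4pt]
0 & \text{if } G_i < \bar{I}_{i,T},
\end{cases}
\qquad (6)
$$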

where $G_1, \dots, G_N$ are the development goals obtained from the SDR, and $\bar{I}_{i,T}$ is the expected value of indicator $i$ (across $M$ independent Monte Carlo simulations) after $T$ simulation periods, which are equivalent to 10 years. The underlying yearly budget for the 2021-30 period is assumed to be identical, in per capita terms, to the average annual expenditure observed in the 21 years of data. We express the gaps as a proportion of their goals, in percentage terms. Thus, an SDG gap reads as: “by 2030, indicator $i$ will still need to close $x$% of its goal”.19

18
The choice of the number of simulation periods $T$ does not alter the results significantly because the calibration of $\beta$ compensates for a higher or lower frequency of the disbursement schedule. Appendix H provides evidence of robustness under different disbursement schedules.
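Equation (6) translates into a few lines of Python. In this sketch, which is an illustration rather than the authors' implementation, the expected value $\bar{I}_{i,T}$ is the mean of the final indicator levels across the $M$ Monte Carlo runs, and goals are assumed to be strictly positive so that the division is defined.

```python
import numpy as np


def sdg_gap(goal: float, mc_final_values) -> float:
    """Equation (6): remaining gap in 2030 as a percentage of the goal.

    `mc_final_values` holds the level of indicator i after T periods in each
    of the M Monte Carlo runs; their mean is the expected value of I_{i,T}.
    Assumes goal > 0.
    """
    expected = float(np.mean(mc_final_values))
    if goal >= expected:
        return 100.0 * (goal - expected) / goal
    return 0.0  # the projection surpasses the goal, so the gap is closed


print(sdg_gap(0.8, [0.5, 0.7, 0.6]))  # about 25: a quarter of the goal remains
print(sdg_gap(0.5, [0.6, 0.7]))       # 0.0: goal already surpassed
```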

3 Data

There exist different databases from which one could obtain indicators classified into the SDGs, for

example, the SDG Indicators from the United Nations Statistics Division (United Nations, 2020), the

World Bank Atlas of Sustainable Development Goals (World Bank, 2020), the OECD SDG distance

indicators (OECD, 2020), and the indicators compiled by the Bertelsmann Stiftung and Sustainable

Development Solutions Network to produce the Sustainable Development Report (SDR) (Sachs

et al., 2020). In this study, we use the SDR database for three main reasons.20 First, the SDR is

the only dataset that provides quantitative values for the goals to be achieved by each indicator.

Furthermore, these goals are consistent across all the countries in the sample because the chosen

indicators are applicable to each nation. Since the aim of this paper is to assess the feasibility of

reaching the SDGs, having quantitative goals is necessary. Second, the SDR data have consistently

longer time series than alternative databases. This is helpful for the calibration of the model because

the estimation of the structural factors α1 , . . . , αN assumes that they capture long-term features of

the data. For a sub-sample of 140 countries, the SDR provides time series with a length of almost

21 years (from 2000 to 2020) in numerous indicators. Alternative datasets, while they contain more

indicators, fail to provide consistently long time series. Third, the majority of the data sources for

the SDR indicators are recognized international (and intergovernmental) organizations, while the

rest are scientifically sound products such as surveys from statistics bureaus, NGOs, and academic

institutions.

While the SDR team makes a substantial effort in gathering as much data as possible for each

country, there are countries that lack some of the indicators, or that have too few observations.

For this reason, different countries in our sample may have more or fewer indicators than others.

This is problematic for all studies that pool cross-national data, since decisions have to be made

regarding the imputation of missing observations, or the complete elimination of certain indicators.
19
Appendix C reports confidence intervals and provides a method to incorporate uncertainty about the quality of
the data into the intervals when information on the indicators’ errors is available.
20
A caveat of the chosen data is that they do not contain time series for SDG 12. The relevant indicators in SDG 12 relate to issues such as waste management, which have only recently been quantified in a handful of countries. However, this is also an issue in most datasets.
Our approach overcomes this problem because we do not need to produce estimates on pooled

data. Thus, we allow each country to have its potentially unique set of indicators and perform

the estimations independently of other nations.21 This allows capturing as many policy dimensions

as possible for each country, which is consistent with the philosophy behind multidimensional

development. While unbalanced panels are not the ideal setup for ex-post cross-country comparisons, we believe that this framework overcomes some of the main hurdles of data-fitting approaches. Appendix B provides detailed information on the 77 indicators of our

sample, and their distribution across countries.

For the purpose of visualizing some of our results, we may color or aggregate them into country clusters. We emphasize that this grouping is only for visualization purposes. For these country clusters,

we use the following grouping scheme: Sub-Saharan Africa (Africa), Eastern Europe and Central

Asia (E. Europe & C. Asia), East and South Asia (East & South Asia), Latin America and the

Caribbean (LAC ), Middle East and North Africa (MENA), and Western Countries (West). Figure

3 provides a map of the countries covered in our sample.

For the national budgets, we use data on total government expenditure in current USD (which

can be accessed through data.worldbank.org/indicator/NE.CON.GOVT.KD). This information

is obtained from the dataset on General Government Final Consumption Expenditure which, in

turn, sources the data from the World Bank National Accounts Data and the OECD National

Accounts data files. We compute the total expenditure exercised by each country in the 2000-2020

period. Missing values are imputed with the average yearly expenditure, and the final amount

is transformed into per capita expenditure in order to remove population-size effects (we use the

population size reported by the SDR).
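The expenditure preprocessing described above can be sketched as follows. The details are assumptions on our part: the series is a NumPy array of yearly values (2000-2020) with NaN for missing years, and the total is converted to per capita terms by dividing by the average population over the period, since the exact population year is not specified here. Dividing by the number of years gives the average annual per capita budget that Section 2.4 projects forward to 2021-30. The function name is hypothetical.

```python
import numpy as np


def per_capita_budget(expenditure, population):
    """Total and average annual per capita expenditure over the sample period.

    Missing yearly values (NaN) are imputed with the average of the observed
    years before summing; population is averaged over the period.
    """
    expenditure = np.asarray(expenditure, dtype=float)
    population = np.asarray(population, dtype=float)
    yearly_mean = np.nanmean(expenditure)
    filled = np.where(np.isnan(expenditure), yearly_mean, expenditure)
    total_per_capita = filled.sum() / population.mean()
    annual_per_capita = total_per_capita / filled.size
    return total_per_capita, annual_per_capita


# Three toy years, one of them missing.
total, annual = per_capita_budget([100.0, np.nan, 300.0], [10.0, 10.0, 10.0])
print(total, annual)
```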

21
In Appendix D.3, we present a methodology for the imputation of missing observations that works very well when indicators exhibit non-linear dynamics and the network is estimated with pooled country data. This is one of several methods available for the imputation of missing information in the SDGs. For instance, Gaussian processes are reliable for non-linear dynamics when a database only includes time series for one country (or region), while heuristic approaches are more common in cross-section analyses (e.g., Warchold et al., 2021).
Figure 3: Countries and their clusters

Blue: Africa. Orange: E. Europe & C. Asia. Green: East & South Asia. Red: LAC. Purple: MENA. Brown: West.
Countries in gray were excluded from the sample due to lack of data.

4 Results

Because we only have aggregate yearly government expenditure for making worldwide comparisons,

we limit our analysis to three types of simulation exercises. Firstly, we study whether SDG gaps of

the 2030 United Nations Agenda can be closed assuming a benchmark scenario in which we project

the historical yearly average of public expenditure for the following ten years. Secondly, we analyze

the sensitivity of these gaps to different increments/decrements of the budget size. We also visualize

the response function of budgetary changes in terms of delays (or savings) in the number of years

to reach the 2030 levels obtained in the benchmark scenario. Thirdly, we study the magnitude of

structural bottlenecks that hamper the possibility of improving the indicators’ performances by just

increasing the allocated funds. These bottlenecks are made evident when, by construction, limited

and inefficient funding are ruled out in a counterfactual simulation. Although these exercises are

produced at the country level using all available indicators separately, for exposition reasons we

present several visualizations at the SDG or geographical cluster level in the main body of the

paper.

Our methodology can accommodate country-specific features such as the following: the network of interdependencies between indicators; the historical context reflected in the data and used for calibration; the indicators’ initial conditions for prospective analyses; and the setting of the 2030 goals according to each country’s idiosyncrasies and political system. Country-specific estimations are key when using the model to provide policy guidelines; however, they are not always technically feasible with other methodologies. For

example, in regression analyses using aggregate data, information from different countries has to

be pooled to obtain enough degrees of freedom. The latter approach precludes the possibility of

making inferences for particular countries and, thus, the estimates have limitations in terms of

policy advice.

4.1 SDG gaps

We present our estimates of SDG gaps for 2030 at the level of each country in Figure 4. The bars indicate average levels across indicators, and the colored dots correspond to the 10 indicators with the largest estimated gaps. The dots illustrate the gap disparities that exist within each country. As expected, the advanced market economies of the West exhibit gaps that are substantially lower than those estimated for the least developed countries (such as those in Africa). However, there are also relatively successful countries in other regions of the world, such as Cyprus (CYP) and Croatia (HRV) in E. Europe & C. Asia; Japan (JPN), South Korea (KOR), and Singapore (SGP) in East and South Asia; and the United Arab Emirates (ARE) in MENA. In contrast, the least successful countries are Haiti (HTI) in LAC; and the Central African Republic (CAF), Eritrea (ERI), and Chad (TCD) in Africa.

At a more aggregate level, we can observe gap disparities across clusters and across SDGs.

For example, while no country in Africa has an average gap below 18%, all the countries in the

West have an average gap below 12%. The systematic persistence of certain dot colors (such as orange, corresponding to SDG 9) suggests that the gaps are more difficult to close in some SDGs. Some of the most persistent SDGs across the dot markers are SDG 9 (‘Industry, Innovation and Infrastructure’) and SDG 7 (‘Affordable and Clean Energy’). This pattern is especially visible in Africa.

Figure 5 provides a complementary visualization of the SDG gaps. Here, the gaps of the indicators have been averaged across countries in the same cluster. These plots reveal that only one indicator of SDG 7 is persistently close to a 100% gap in Africa, and that several indicators

of SDG 9 exhibit high gaps. Another feature revealed by this visualization is that most of the

environmental indicators in SDGs 14 and 15 present gaps above the cluster average (identified with

the solid black ring). The reader should be aware of the risks of aggregation, which are evident

when comparing the gaps estimated for the indicators in SDG 2 in Africa and West. Here, Africa

is expected to perform better than West in obesity, nitrogen emission, and human trophic levels

(which relates to dietary diversity). These problems are endemic to advanced market economies,

so our results are intuitive. However, if we were to aggregate these gaps for the whole SDG 2,

the result would suggest a similar performance in both clusters, since the indicators related to hunger and malnutrition show the opposite pattern (so their gaps would approximately cancel each other out). Clearly, even within a multidimensional view of development, there

exist specific policy issues that perform in substantially different ways across countries, even if they

belong to the same dimension. This is one of the reasons why it is so important to move beyond

the common practice of pooling cross-national and SDG-level data, and to produce more granular

estimates that reflect the context and the spending capabilities of each country in each indicator.

Figure 4: Average SDG gaps for 2030 by country
[Panels: Africa; E. Europe & C. Asia; East & South Asia; LAC; MENA; West. Each panel lists the countries of the cluster (ISO3 codes) on the vertical axis; the horizontal axis shows the development gap estimated for 2030 as a percentage of its goal (0 to 100).]

Notes: The bars denote the average SDG gap for 2030 (across indicators) for each individual country. Bars are
colored according to the country clusters described in Figure 3. The dots correspond to the 10 indicators with the
largest estimated gaps. Each dot is colored according to the corresponding SDG of its indicator. We use the model
to estimate the indicators’ projections for 2030. For precise estimates and confidence intervals of each individual
indicator gap, see Appendix C.
Sources: Authors’ own calculations.
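The two summaries plotted for each country (the bar and the dots) amount to a mean and a top-k selection over the country's own indicators. A hypothetical sketch, assuming the gaps of one country are stored in a dictionary keyed by indicator code; the codes below are SDR-style labels used only as placeholders, and the values are made up.

```python
def summarize_country_gaps(gaps: dict, k: int = 10):
    """Return the average gap across a country's indicators and its k largest gaps.

    Only the indicators available for that country are used, so countries
    with different indicator sets are summarized independently.
    """
    mean_gap = sum(gaps.values()) / len(gaps)
    largest = sorted(gaps.items(), key=lambda item: item[1], reverse=True)[:k]
    return mean_gap, largest


example = {"sdg9_rdex": 80.0, "sdg7_cleanfuel": 60.0, "sdg3_vac": 10.0}
mean_gap, largest = summarize_country_gaps(example, k=2)
print(mean_gap)  # 50.0
print(largest)   # [('sdg9_rdex', 80.0), ('sdg7_cleanfuel', 60.0)]
```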

Figure 5: SDG gaps for 2030 aggregated by cluster and indicator

(a) Africa (b) E. Europe & C. Asia (c) East & South Asia
(d) LAC (e) MENA (f) West

[Circular bar charts with one bar per indicator, labeled with SDR indicator codes (from sdg1_320pov and sdg1_oecdpov to sdg17_govefe and sdg17_govrev), on a scale from 0 to 100.]

Notes: We use the model to estimate the indicators’ projections for 2030. The height of each bar represents the average gap between the SDG and the indicator level predicted for 2030, computed across the countries in the cluster. Empty spaces between bars indicate that no data were available for the corresponding indicator in any country of the cluster. The solid black ring corresponds to the average gap across countries (in the cluster) and indicators. The dashed red ring indicates the largest average gap (among the indicators in the cluster). The black lines at the top of each bar denote ± one standard error of the mean gap across the countries of a cluster. For precise estimates and confidence intervals of each individual indicator gap, see Appendix C.
Sources: Authors’ own calculations.
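The cluster-level quantities described in these notes (bar heights, error bars, and the two rings) can be computed from a country-by-indicator matrix of gaps. This is a sketch under assumptions: NaN marks an indicator that is missing in a country, the error bars use the sample standard deviation divided by the square root of the number of reporting countries, and the solid ring is taken as the mean of the bar heights. The function name is hypothetical.

```python
import warnings

import numpy as np


def cluster_summary(gaps):
    """Summarize a (countries x indicators) gap matrix for one cluster.

    NaN marks an indicator that is missing in a country. Indicators that no
    country in the cluster reports yield NaN (an empty bar).
    """
    gaps = np.asarray(gaps, dtype=float)
    counts = np.sum(~np.isnan(gaps), axis=0)
    with warnings.catch_warnings():
        # All-NaN and single-country columns raise RuntimeWarnings in
        # nanmean/nanstd; they simply produce NaN here.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        bar_heights = np.nanmean(gaps, axis=0)
        sem = np.nanstd(gaps, axis=0, ddof=1) / np.sqrt(counts)
        solid_ring = float(np.nanmean(bar_heights))
        dashed_ring = float(np.nanmax(bar_heights))
    return bar_heights, sem, solid_ring, dashed_ring


gaps = np.array([[10.0, np.nan, 40.0],
                 [30.0, np.nan, 20.0]])
heights, sem, solid, dashed = cluster_summary(gaps)
print(heights, sem, solid, dashed)
```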

4.2 Sensitivity to changes in the budget size

A key issue to be addressed when studying the feasibility of SDGs is the impact that budgetary

changes have on the evolution of social, economic, and environmental indicators. In the context of

this paper, we are interested in understanding how sensitive the different SDG gaps are to changes in public expenditure. Thus, we estimate the country-specific sensitivity of each indicator to changes in the overall size of the budget during the 2020-30 period. Our data show substantial changes in public spending between the 2000-10 and the 2010-20 decades (47% on average).

Thus, our estimates consider prospective simulations with positive and negative changes of up to

50% with respect to the historical expenditure levels reflected in the data.22 We measure sensitivity

by calculating the difference between gaps from a benchmark scenario that maintains the historical

expenditure levels (the average yearly expenditure from the data, projected over 10 years) and a

scenario that considers changes in the size of the budget.
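The sensitivity measure is a difference of gap matrices. The sketch below assumes that the gaps for the benchmark and for the adjusted-budget scenario have already been simulated and arranged as (countries × indicators) arrays with NaN for absent indicators. It also shows one possible way of scaling marker sizes so that the largest reduction gets the largest marker; that scaling is our assumption, not necessarily the plotting choice behind the figures. Both function names are hypothetical.

```python
import numpy as np


def gap_reduction(benchmark_gaps, scenario_gaps):
    """Sensitivity: benchmark gap minus scenario gap (positive = gap narrows).

    NaN entries (indicator absent in a country) propagate unchanged.
    """
    return np.asarray(benchmark_gaps, dtype=float) - np.asarray(scenario_gaps, dtype=float)


def marker_sizes(reductions, max_size=100.0):
    """Scale sizes proportionally so that the largest reduction gets `max_size`."""
    r = np.clip(np.asarray(reductions, dtype=float), 0.0, None)  # ignore widenings
    return max_size * r / np.nanmax(r)


bench = np.array([[20.0, np.nan], [10.0, 5.0]])
plus_50 = np.array([[15.0, np.nan], [8.0, 5.0]])
red = gap_reduction(bench, plus_50)
print(red)
print(marker_sizes(red))
```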

Figure 6 presents a highly disaggregated picture of the different sensitivities when the budget is

increased by 50%. Larger markers denote more sensitivity, while the gray lines indicate the absence

of an indicator in a particular country. As a reference point, the largest marker corresponds to

a reduction of 13% in an SDG gap. From this visualization we can highlight several important results. First, there is substantial heterogeneity across countries (vertical axis) and indicators (horizontal axis) with respect to the magnitude of gap reductions. Second, the most notable impacts are not randomly scattered but rather concentrated in specific SDGs (compare columns of different colors) and indicators. For instance, two gaps in SDG 9 (‘Logistics performance index’ and ‘Mobile broadband subscriptions’) show notable reductions in most of the countries where data are available. Third, the gaps of economic indicators in SDG

8 are not particularly sensitive to a 50% increase in the budget size, especially when compared

with those of SDG 9 (see the size of brown markers versus that of orange markers). Fourth, with

the exception of some African cases, the gaps in SDGs 13, 14, and 15 (the environmental ones)

rarely exhibit substantial improvements. Fifth, excluding a few country-indicator cases, the SDG

16 gaps do not seem responsive to a 50% increase in the budget. In Section 4.3, we show that these diverse sensitivities are the result of long-term structural factors that constrain the effectiveness of public expenditure in government programs. Before elaborating on these structural factors, we provide further sensitivity results related to reductions in the budget size, and to an alternative sensitivity metric.

22
An expenditure growth scenario for the next 10 years may be hindered by the COVID-19 global pandemic.

Figure 6: SDG gap shrinkage due to a 50% increment in per capita expenditure
[Vertical axis: countries (ISO3 codes, from AGO to USA) grouped by cluster. Horizontal axis: indicators (SDR indicator codes, from sdg1_320pov to sdg17_govrev) grouped by SDG.]
Notes: The size of the markers is proportional to the reduction of the SDG gap caused by an increase in government
spending. The biggest marker corresponds to the largest reduction in the sample. The gray lines indicate the absence
of an indicator in a particular country.
Sources: Authors’ own calculations.

Figure 7 presents sensitivity results for a 50% reduction in budget sizes. In this case, the sensitivity outcomes correspond to widening SDG gaps. As a reference point, the largest marker corresponds to a gap increase of nearly 20% with respect to the benchmark case. This implies that, in general, SDG gaps are more sensitive to a 50% reduction than to an increment of the same proportion in the budget size. This asymmetry becomes evident when contrasting the outcomes of the SDG 8 indicator ‘Adults with an account at a bank or other financial institution’ in Figures 6 and 7. A similar asymmetric pattern can be found in environmental indicators from SDGs 14 and 15.

To better understand the asymmetric sensitivity between a 50% increment and a 50% reduction in the budget, we revisit three assumptions of our modeling approach. First, we aim to model short-term dynamics and, hence, long-term structural factors are given

Figure 7: SDG gap growth due to a 50% reduction in per capita expenditure
[Vertical axis: countries (ISO3 codes, from AGO to USA) grouped by cluster. Horizontal axis: indicators (SDR indicator codes, from sdg1_320pov to sdg17_govrev) grouped by SDG.]
Notes: The size of the markers is proportional to the increase of the SDG gap caused by a reduction in government spending. The biggest marker corresponds to the largest increase in the sample. The gray lines indicate the absence of an indicator in a particular country.
Sources: Authors’ own calculations.

through the exogenous parameters $\alpha_i$, which are specific to each indicator and country. Second, the impact of the public funds devoted to the different government programs is viewed in terms of short- and mid-term effects. This is because we model a probability $\gamma_{i,t}$ representing the chance that indicator $i$ improves in the subsequent period $t+1$. These two aspects are combined in the evolution equation (1). While more public spending increases $\gamma_{i,t}$, the long-term structural factors $\alpha_i$ limit the speed of such growth. Therefore, government expenditure only affects $\gamma_{i,t}$, not $\alpha_i$. Third, public spending is a necessary condition for development. Thus, the evolution equation implies that, if less expenditure brings $\gamma_{i,t}$ close to zero, then the growth trials are almost always unsuccessful, and the indicator dynamics stagnate.
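The asymmetry argument can be illustrated with a deliberately simplified toy model. It is not the paper's evolution equation (1). We assume, only for illustration, that in each period an indicator attempts to grow by $\alpha_i$ and succeeds with probability $\gamma = P/(P + k)$, where $P$ is spending and $k$ is a hypothetical half-saturation constant, and we use the expected value instead of Monte Carlo draws. Because this $\gamma$ is concave in $P$, a 50% cut removes more growth than a 50% increase adds, growth stops as $P \to 0$, and no amount of spending pushes growth beyond $T\alpha_i$.

```python
def expected_final_level(i0, alpha, spending, periods, half_saturation=1.0):
    """Toy dynamics (hypothetical, not equation (1) of the paper).

    Each period the indicator tries to grow by `alpha`; the trial succeeds
    with probability gamma = spending / (spending + half_saturation).
    Returns the expected level after `periods` trials.
    """
    gamma = spending / (spending + half_saturation)
    return i0 + periods * alpha * gamma


base = expected_final_level(0.2, 0.01, 1.0, 10)
gain = expected_final_level(0.2, 0.01, 1.5, 10) - base  # +50% spending
loss = base - expected_final_level(0.2, 0.01, 0.5, 10)  # -50% spending
print(base, gain, loss)  # the loss exceeds the gain
print(expected_final_level(0.2, 0.01, 0.0, 10))  # no spending: stays at i0
```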

On the side of budgetary increments, there seems to be a limit to how much some gaps can be

reduced in a given period while, on the side of reductions, no improvements can be expected in the

absence of public funds. Furthermore, given the law of motion of the indicators, and other micro-

foundations of the model, the response to expenditure changes may vary in non-linear ways. Thus,

to provide a full picture of these non-linear response functions, we measure sensitivity in terms of the number of years that it would take to achieve the SDGs (for each indicator and country), and compute it for 1% variations (positive and negative) in the budget size. We present the results aggregated into SDGs and clusters, but provide country-level plots at http://github.com/oguerrer/sdg_feasibility. In Figure 8, we present the aggregate response functions for budgetary changes from -50% to +50% (in steps of 1%).23 We calculate the response functions as the difference in the number of years it takes an adjusted budget to reach the indicator levels obtained in 2030 under the benchmark scenario. In the latter calculation, the historical annual expenditure average is projected forward throughout the following decades. If a budgetary change requires additional years, there is a delay; if it saves years, the delay takes negative values.
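The delay measure can be sketched as follows, assuming yearly simulated trajectories (index 0 is the first projected year) that extend beyond 2030, so that slower budgets have room to catch up. The helper names are hypothetical; a trajectory that never reaches the target returns `None`, which corresponds to the non-converging indicators that are excluded from the aggregate curves.

```python
import numpy as np


def years_to_reach(trajectory, target):
    """First year (1-based) in which a yearly trajectory reaches `target`, or None."""
    hits = np.nonzero(np.asarray(trajectory, dtype=float) >= target)[0]
    return int(hits[0]) + 1 if hits.size else None


def delay_in_years(benchmark, adjusted, horizon=10):
    """Extra years (positive) or saved years (negative) that the adjusted budget
    needs to reach the level the benchmark attains after `horizon` years."""
    target = np.asarray(benchmark, dtype=float)[horizon - 1]
    base_years = years_to_reach(benchmark, target)
    new_years = years_to_reach(adjusted, target)
    if new_years is None:
        return None  # did not converge within the simulated horizon
    return new_years - base_years


years = np.arange(1, 21)   # 20 projected years
benchmark = 1.0 * years    # reaches level 10 after 10 years
print(delay_in_years(benchmark, 0.5 * years))  # 10 (slower budget)
print(delay_in_years(benchmark, 2.0 * years))  # -5 (faster budget)
print(delay_in_years(benchmark, 0.1 * years))  # None (never reaches the target)
```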

Due to the aforementioned problems of aggregating indicators, the results presented in Figure 8 should be treated as qualitative evidence of the non-linear responses to changes in public expenditure.24 First, note that every SDG shows some level of sensitivity to both positive and negative budgetary changes. Second, confirming our previous findings, the sensitivities to positive and negative budgetary changes are systematically asymmetric in magnitude. Third, the sensitivity rankings across SDGs vary between clusters and depend on the magnitude and direction of the budgetary change. For example, for countries in West, SDG 13 is the most sensitive to budgetary reductions, but it does not hold the same ranking position in other clusters. Nevertheless, SDG 13 systematically exhibits substantial delays in all clusters.

23
The curves in Figure 8 are composed of indicators that converged during a set of Monte Carlo simulations. Because excluding non-converging indicators introduces selection bias, these curves should be considered a qualitative result about the non-linear response of development outcomes to budgetary changes.
24
We provide the data on the country-indicator-specific responses in http://github.com/oguerrer/sdg_feasibility.

Figure 8: Changes in convergence time as a function of the budget size

[Panels: (a) Africa, (b) E. Europe & C. Asia, (c) East & South Asia, (d) LAC, (e) MENA, (f) West. Each panel plots the average years of delay (vertical axis) against the percentage change in per capita expenditure (horizontal axis), with one line per SDG.]

Notes: These response functions are calculated for each cluster (panel) and SDG (colored lines), averaging across indicators. The horizontal axis denotes the increase or reduction of the annual budget during the decades following 2020. A positive value on the vertical axis indicates the number of additional years that it would take to reach the levels originally projected for 2030 (hence, when the budget change is zero, all the lines collapse at zero on the vertical axis). A negative value on the vertical axis translates into years saved in reaching the 2030 levels. The reference 2030 levels are determined in the baseline scenario used for Section 4.1.
Sources: Authors’ own calculations.

4.3 Budgetary frontiers and structural bottlenecks

Now that we have established non-linear responses of the SDG gaps to public spending, we elaborate on their structural origins. Let us open our argument by stating the obvious: every government is constrained by time and resources. Thus, in a short/mid-term scenario, time is critical to achieving a set of goals. While a particular policy may succeed in improving an issue, the amount of improvement is constrained by factors such as infrastructure, organizational practices, individuals’ incentives, and technology, which can only be modified through changes to the existing government programs, and such changes take place over a longer time span. Thus, within the scope of existing government programs, these factors are considered exogenous, and we capture them through the parameter αi. It follows that success in reaching development goals is partly determined by how much αi allows an indicator to improve during a set amount of time. Not knowing these limits to success could lead to ineffective policy priorities and poor planning of long-term versus short/mid-term policies.

To unearth the limits imposed by structural factors, it is useful to consider the following hypothetical question: how much can the SDG gaps be closed in the years left to reach the SDGs if public funding were unlimited and fully efficient? This theoretical scenario removes the resource constraints from the equation and leaves us with the interaction between structural factors and time. Therefore, by estimating the SDG gaps under this hypothetical setting, it is possible to establish bounds on how small an SDG gap can become by 2030. To achieve this, we only need to assume ξ(γi,t) = 1 in equation 1. When ξ(γi,t) = 1 for every indicator, we say that the country operates at the ‘budgetary frontier’. Thus, the SDG gaps obtained at this frontier describe the limitations of increasing expenditure on the current government programs. In other words, if an SDG gap remains open at the budgetary frontier, then, regardless of how much public expenditure increases, the strategy will be unsuccessful unless the long-term structural factors (i.e., their bottlenecks) in that policy issue are addressed.
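Under a strong simplification, the frontier gap has a closed form. The sketch below assumes, hypothetically, that at ξ(γi,t) = 1 the indicator gains exactly `alpha` (standing in for αi) every year, and it expresses the remaining gap as a percentage of the 2020 distance to the target. Both the linear growth and that normalisation are our assumptions, not the paper's definitions; the point is only that, once funding stops binding, the gap depends on the structural parameter and the time left.

```python
def frontier_gap(i_2020: float, target: float, alpha: float,
                 years: int = 10) -> float:
    """Gap left open in 2030 when every growth trial succeeds (xi = 1).

    Hypothetical simplification: the indicator gains exactly `alpha` per year,
    so only the structural parameter limits progress. The gap is expressed as
    a percentage of the 2020 distance to the target (an assumed normalisation).
    """
    if i_2020 >= target:
        return 0.0
    level_2030 = min(target, i_2020 + alpha * years)
    return 100.0 * (target - level_2030) / (target - i_2020)


print(frontier_gap(0.4, 1.0, alpha=0.03))   # slow structural growth: gap stays open
print(frontier_gap(0.4, 1.0, alpha=0.10))   # fast structural growth: gap closes
```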

Figure 9 presents the average SDG gaps at the budgetary frontiers of the different countries in the sample. Panel (a) aggregates the gaps across indicators within each country. Note that none of the average gaps closes entirely, even in the most advanced nations. As expected, these gaps are wider in Africa, reaching 39% in the Central African Republic (CAF). This diagram illustrates the role that local features play in the wide disparities observed across countries’ performances. The estimated SDG gaps at the budgetary frontiers show that countries exhibit long-term structural hindrances of different magnitudes, even when they belong to the same cluster. Although the model cannot distinguish the specific reasons behind these discrepancies, it is reasonable to argue that their causes lie in bottlenecks of a local nature.

Panel (b) of Figure 9 shows the average gaps at the budgetary frontier, aggregated into SDGs within each cluster. The fact that SDG 13 presents a near-null gap in countries from Africa and LAC indicates that environmental issues related to climate action could be improved, on average, by properly funding existing government programs in those regions. However, this is not the case for other environmental SDGs. For instance, in SDGs 14 and 15, the frontier gaps vary between 27% and 42%.²⁵

25
SDG 15 for West is an outlier with a gap of 15%.

Figure 9: Budgetary frontiers

[Panel (a), country level: countries (ISO codes) positioned by their average SDG gap at the budgetary frontier (%), roughly from 5% to 39%, and colored by cluster (Africa, E. Europe & C. Asia, East & South Asia, LAC, MENA, West). Panel (b), SDG-cluster level: SDG numbers positioned by their average gap at the budgetary frontier (%), from 0% to 60%, colored by cluster.]

Notes: A country operating at the budgetary frontier has ξ(γi,t) = 1 for every indicator i and every period t (see equation 2 in the methods section). At the budgetary frontier, the only frictions slowing down the indicators’ growth are in the structural parameter αi. Panel (a): budgetary frontiers calculated by averaging gaps across indicators for each individual country. Panel (b): budgetary frontiers calculated by averaging gaps across countries and indicators at the level of SDG for each cluster. The average gaps have been discretized in order to produce the visualizations.
Sources: Authors’ own calculations.
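The two aggregation levels described in the notes of Figure 9 can be sketched as simple group averages. The records, the country codes `AAA`, `BBB` and `CCC`, and the gap values below are hypothetical. Panel (b) is computed here as a pooled mean over all country-indicator pairs in each cluster-SDG group, which is one reading of "averaging gaps across countries and indicators". The `discretize` helper shows one way to bin the averages for plotting; the bin widths are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (country, cluster, sdg, indicator, frontier gap in %).
rows = [
    ("AAA", "Africa", 14, "ind1", 40.0),
    ("AAA", "Africa", 13, "ind2", 2.0),
    ("BBB", "Africa", 14, "ind3", 30.0),
    ("CCC", "West", 15, "ind4", 15.0),
]


def country_level(rows):
    """Panel (a): average gap across indicators within each country."""
    groups = defaultdict(list)
    for country, _cluster, _sdg, _ind, gap in rows:
        groups[country].append(gap)
    return {country: mean(gaps) for country, gaps in groups.items()}


def sdg_cluster_level(rows):
    """Panel (b): pooled average gap per (cluster, SDG) over all
    country-indicator pairs in the group (an assumed reading)."""
    groups = defaultdict(list)
    for _country, cluster, sdg, _ind, gap in rows:
        groups[(cluster, sdg)].append(gap)
    return {key: mean(gaps) for key, gaps in groups.items()}


def discretize(value: float, width: float) -> float:
    """Snap an average gap down to the lower edge of its bin."""
    return width * (value // width)


print(country_level(rows))        # {'AAA': 21.0, 'BBB': 30.0, 'CCC': 15.0}
print(sdg_cluster_level(rows))
```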

Finally, a cautious reader may consider that public spending should have structural consequences, so the exogenous factors αi could also be affected in the short term. While this reasoning is, in principle, correct, the empirical evidence suggests that this process is rather weak. For instance, if the structural factors contained in αi were to change substantially in the short term, then the SDG gaps estimated from simulations using more recent data samples should differ significantly. To show that this is not the case, we recalibrate the model and perform the same analysis as in Section 4.1 but, instead of using the full 21-year dataset (covering 2000-2020), we employ a 10-year (2011-2020) and a 5-year (2016-2020) sample.²⁶

Figure 10 shows that our original estimates are robust to these alternative samples, as the six clusters show relatively small differences in their average gaps (see Appendix I for more disaggregated yet robust results).²⁷ To calculate these differences, we compare the average SDG gap for each country produced in the benchmark simulations (using 21 years of data) with the gap estimated from shorter historical time series (either 10 or 5 years). Notice also that the closer the length of these time series is to the whole historical sample, the smaller the difference in the average gaps; that is, the dark bars are shorter than the light ones. From this, we conclude that the SDG network and the structural factors exhibit slow dynamics, validating our conceptualization of long- versus short/mid-term effects. Accordingly, the budgetary frontiers involve long-term considerations and demand the implementation of innovative micro-policies.

26
This involves re-estimating the network, the structural parameters, and the gaps.
27
These are only differences in the average gaps. The numbers to the right of these bars show the indicators most sensitive to the length of the time series considered. The first column of numbers corresponds to comparisons with 10-year time series, and the second to comparisons with 5-year time series.

Figure 10: Robustness to different sampling lengths

[Six panels, one per cluster (Africa, E. Europe & C. Asia, East & South Asia, LAC, MENA, West), with one pair of bars per country (ISO codes), each followed by two numbers identifying its most sensitive indicators. Horizontal axis: average absolute difference in terms of SDG gap (%), from 0 to 4.]

Notes: The bars indicate the average absolute difference in estimated gaps (in percentage) between the benchmark case (using 21 years of data) and one where the model was calibrated with shorter time series. The dark bars are calculated using the model calibrated with 10-year time series. The light bars are computed using the model calibrated with 5-year time series. The solid squares on the right of each panel denote the color of the SDG to which the most sensitive indicator belongs in the case of differences using 10-year time series. The hollow ones correspond to 5-year time series. For a more disaggregated presentation of these results, see Appendix I.
Sources: Authors’ own calculations.
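The robustness comparison can be sketched as follows, assuming a hypothetical set of indicator-level gaps for one country under each calibration window. Here the "average absolute difference" is read as the mean of indicator-level absolute differences within a country, and the "most sensitive indicator" as the one with the largest absolute difference; both readings, and all numbers, are assumptions for illustration.

```python
from statistics import mean


def avg_abs_difference(benchmark: dict, alternative: dict) -> float:
    """Mean absolute difference (percentage points) between the gaps from the
    benchmark calibration and an alternative one, over shared indicators."""
    shared = benchmark.keys() & alternative.keys()
    return mean(abs(benchmark[k] - alternative[k]) for k in shared)


def most_sensitive(benchmark: dict, alternative: dict) -> str:
    """Indicator whose gap moves the most when the calibration window changes."""
    shared = benchmark.keys() & alternative.keys()
    return max(shared, key=lambda k: abs(benchmark[k] - alternative[k]))


# Hypothetical gaps (%) for one country under each calibration window.
gaps_21y = {"ind1": 20.0, "ind2": 35.0, "ind3": 10.0}
gaps_10y = {"ind1": 21.0, "ind2": 33.0, "ind3": 10.5}
gaps_5y = {"ind1": 24.0, "ind2": 30.0, "ind3": 12.0}

print(round(avg_abs_difference(gaps_21y, gaps_10y), 2))   # 1.17
print(round(avg_abs_difference(gaps_21y, gaps_5y), 2))    # 3.67
print(most_sensitive(gaps_21y, gaps_10y))                 # ind2
```

With these made-up numbers the 10-year window stays closer to the benchmark than the 5-year one, which is the pattern (dark bars shorter than light bars) described in the text.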

5 A discussion on the model’s strengths and limitations

Models of multidimensional development typically use composite indices such as the Human Development Index and the SDG Index. However, if analysts wish to provide more nuanced advice on specific SDGs in terms of policy prioritization and budgetary allocations, they need to model the evolution of each dimension separately, without aggregating them and losing valuable information. This task is problematic for statistical/econometric and machine learning approaches, since they cannot easily deal with a high-dimensional policy space characterized by few observations (short time series). For example, multi-output models (such as systems of regression equations or neural networks) demand unrealistically large numbers of observations for each dimension/indicator. To overcome this limitation, analysts pool cross-national data to produce their estimates. This, however, has the costly implication of removing country specificity, because any interpretation of the estimated parameters is limited to a hypothetical country with the average characteristics of the sample. In addition, data-pooling strategies only work with a limited number of indicators, since there are only so many countries.
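To make the mismatch between dimensions and observations concrete, the back-of-the-envelope count below shows how many potential interaction terms appear as the number of indicators `n` grows, next to the 21 annual observations a single country provides in a 2000-2020 sample. It assumes directed links without self-loops for the pairwise case and counts every subset of two or more indicators for the higher-order case; it is an illustration of scaling, not a model specification.

```python
def pairwise_links(n: int) -> int:
    """Directed pairwise interactions among n indicators (no self-loops)."""
    return n * (n - 1)


def higher_order_terms(n: int) -> int:
    """All interaction terms among two or more of n indicators: 2^n - n - 1."""
    return 2 ** n - n - 1


observations = 21   # one country's annual series, 2000-2020
for n in (5, 10, 20):
    print(n, pairwise_links(n), higher_order_terms(n), observations)
```

Even with only 20 indicators, the pairwise links alone (380) far exceed the available observations per country.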

In data-fitting approaches (e.g., Asadikia et al. (2021); Osuji and Nwani (2020); Dhaoui (2018)), the problem of few observations is aggravated when considering interdependencies between indicators, because the number of potential interactions (parameters to be estimated) grows rapidly with the number of dimensions: quadratically for pairwise interactions alone, and exponentially once higher-order interactions are allowed. On the other hand, aggregate models such as system dynamics and integrated assessment frameworks try to overcome this limitation by imposing the structure of interactions ex ante (e.g., Zelinka and Amadei (2019); Pedercini et al. (2020); Collste et al. (2017)). This approach introduces strong assumptions and still demands large amounts of data, since analysts tend to estimate the model’s parameters through regressions. Often, if data are not available, such parameters are taken directly from existing estimates for other countries/regions or, again, from pooled regressions, which brings us back to the context-specificity problem. Such limitations to the quantitative study of the SDGs become more evident in the context of the causal relationship between government expenditure and development indicators. In terms of this nexus, we provide a list of more specific drawbacks.

1. Much of the empirical quantitative literature, which policymakers often use to guide their decisions, focuses on the impact of one (or multiple) indicator(s) on another (or others). However, indicators are not instruments that governments can directly manipulate but rather endogenous variables resulting from spending decisions. Hence, governments often justify their expenditure choices with studies that offer no evidence on how effective or viable it would be to fund a particular SDG given the existing government programs. There are two alternatives to remedy this analytical hurdle: the use of granular expenditure data or the implementation of a generative model of public spending.

2. Highly granular public expenditure data, properly linked to specific development indicators, are practically non-existent. Under these circumstances, analysts have to rely on data aggregated into a few broad sectors (e.g., education, health, poverty alleviation) and select a ‘representative’ indicator for each one, or an average index.

3. When constructing a dataset for these methods, one must assemble a large cross-country panel to obtain the degrees of freedom needed to make estimation possible.

4. Most data-fitting approaches can only consider one dependent variable, which is inconsistent with the systemic view of the SDGs.

5. Establishing a causal link between an expenditure variable and an aggregate indicator is problematic because of confounding factors and reverse causation (a government can adjust its budget according to the observed performance of the indicators). This problem also applies to studies using more sophisticated machine learning methods.

6. Due to inefficiencies during the policymaking process and spillover effects, the level of expenditure in a policy issue does not reflect the actual amount of resources effectively used.

In general, data-fitting and aggregate models are ill-suited to this task due to their lack of explicit causal mechanisms. To overcome this problem, computational approaches such as agent-based models (ABMs) can be useful. Nonetheless, these models may also demand large amounts of data, so they are typically employed in micro-level studies relevant to a specific SDG. However, this does not mean that ABMs cannot be used for comprehensive analyses of the SDGs, but rather that substantial efforts need to be made in this direction. For instance, Allen et al. (2016) provide an extensive review of model types used for the assessment of SDGs. They find that ABMs account for only 1% of the studies in their literature survey.

In this paper, we contribute to this effort by developing an ABM that is explicit about a critical causal channel: public expenditure. In contrast with recent studies on the SDGs, like those mentioned in the paper’s introduction, our model can be calibrated for individual countries over a large policy space, helping researchers and practitioners make the most of the available data. Furthermore, it does not impose aggregate relationships between the different indicators. Rather, it is flexible, since it allows the user to introduce any network of interdependencies that is relevant to the context under study (Ospina-Forero et al., 2020). While our model is not explicit about the full complexity of the system (and no other model is), it provides a rich yet parsimonious specification, which facilitates counterfactual experiments (‘what if’ scenarios) and allows estimating the impact of budgetary changes.

The proposed computational method also has some limitations that the reader should be aware of. Our model cannot produce ex-ante evaluations of new government programs. Nor can it yield policy prescriptions if, in the out-of-sample analysis, there is a drastic transformation in the technological, political, and organizational underpinnings of a country. In this sense, the model assumes a ‘business as usual’ setting in which the system keeps working with government programs and structural features similar to those observed during the sampling period. This assumption is realistic in short-term analyses (less than 6 years) and admissible for evaluating policy design in a medium-term setting (5-15 years). Thus, our approach focuses on short/mid-term effects, and it explicitly separates the structural factors that shape long-term dynamics. Ironically, while many existing methods suffer from the same limitations, their applications tend to emphasize long-term scenarios.

The robustness of the model can be enhanced when disaggregated expenditure data are available at the SDG or government-program level. However, such databases exist for very few countries. Therefore, in this paper, we offer a worldwide application of the model and show that it can provide insightful policy guidelines even if a country only has aggregate expenditure data. As in any quantitative approach, when more detailed empirical information is available, our ABM can generate more specific policy prescriptions. For instance, with expenditure data disaggregated at the SDG level, it is possible to establish whether budgetary allocations different from those observed historically could have an impact on closing development gaps.

6 Conclusion

We propose a bottom-up computational framework to analyze the short- and mid-term impact of budgetary allocations on a large set of SDG indicators. Our simulations use data from individual countries; hence, the framework allows specifying context-dependent settings: initial conditions, calibrated parameters, and spillover effects among indicators. The underlying theory assumes fixed structural factors in the indicators’ evolution equations and exogenous network topologies (which could be constructed in tandem with other qualitative and quantitative approaches). Our approach is useful for understanding how to allocate resources across the existing government programs, so it facilitates identifying key priority areas. Moreover, through counterfactual simulations, the model can uncover bottlenecks associated with the inefficacy of public expenditure, which is key to achieving any development agenda. However, the model is not designed to identify the causes behind the structural constraints that prevent a country from closing its SDG gaps. Hence, the outputs are not informative about how to reformulate the existing micro-policies or how to generate new ones.

Our main results provide novel and nuanced estimates of the development gaps that will remain open in 2030, at the level of each country and indicator. We also find that more government spending is not enough to close the SDG gaps, even if countries were operating at a budgetary frontier that entails enough resources for the existing government programs. Hence, complementary micro-policies are ultimately needed to overcome structural (long-term) bottlenecks and to improve the relevant indicators. The model’s estimates also lend themselves to detailed interpretations. For instance, some environmental concerns such as clean air can be substantially ameliorated with a larger budget, while others (e.g., SDGs 14 and 15) require undertaking well-designed government programs in order to shift the historical course of ineffective policies.

Despite its simplicity, the model can be used to infer two crucial country-specific features: (i) the possibility of closing the SDG gaps, and (ii) the existence of long-term bottlenecks. Therefore, when analysts can associate one or several government programs with a particular indicator, it is possible to establish some policy guidelines from the model’s estimates. Depending on the values of these features and the level of the indicator’s historical performance, it is possible to define different routes of policy action; that is, whether the program should be reviewed in terms of incentives and organizational practices before spending more public funds, or whether the SDG gaps can be closed by just channeling more funds into the existing programs.
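A coarse version of these routes can be sketched as a small decision rule over two model outputs: the gap left in 2030 under historical spending and the gap left at the budgetary frontier. The three route labels and the 1% tolerance below are illustrative assumptions, not thresholds defined in the paper, and the sketch leaves out the indicator's historical performance, which the text also mentions.

```python
def policy_route(baseline_gap: float, frontier_gap: float,
                 tolerance: float = 1.0) -> str:
    """Map two model outputs (gaps in %) to a coarse, hypothetical policy route.

    baseline_gap: gap left in 2030 under historical spending.
    frontier_gap: gap left in 2030 at the budgetary frontier (xi = 1).
    tolerance: assumed threshold below which a gap counts as closed.
    """
    if baseline_gap <= tolerance:
        return "on track: keep funding the existing programs"
    if frontier_gap <= tolerance:
        return "fund: more money for existing programs can close the gap"
    return "review: address incentives and organizational practices first"


print(policy_route(baseline_gap=0.5, frontier_gap=0.0))
print(policy_route(baseline_gap=12.0, frontier_gap=0.3))
print(policy_route(baseline_gap=30.0, frontier_gap=27.0))
```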

References

Akenroye, T., Nygård, H., and Eyo, A. (2018). Towards Implementation of Sustainable Development Goals (SDG) in Developing Nations: A Useful Funding Framework. International Area Studies Review, 21(1):3–8.

Allen, C., Metternicht, G., and Wiedmann, T. (2016). National pathways to the Sustainable Development Goals (SDGs): A Comparative Review of Scenario Modelling Tools. Environmental Science & Policy, 66:199–207.

Amos, R. and Lydgate, E. (2020). Trade, Transboundary Impacts and the Implementation of SDG 12. Sustainability Science, 15(6):1699–1710.

Aragam, B., Gu, J., and Zhou, Q. (2019). Learning Large-Scale Bayesian Networks with the Sparsebn Package. Journal of Statistical Software, 91(11).

Asadikia, A., Rajabifard, A., and Kalantari, M. (2021). Systematic Prioritisation of SDGs: Machine Learning Approach. World Development, 140:105269.

Benedek, D., Gemayel, E., Senhadji, A., and Tieman, A. (2021). A Post-Pandemic Assessment of the Sustainable Development Goals. Staff Discussion Notes, 2021(003).

Boeren, E. (2019). Understanding Sustainable Development Goal (SDG) 4 on “Quality Education” from Micro, Meso and Macro Perspectives. International Review of Education, 65(2):277–294.

Castañeda, G., Chávez-Juárez, F., and Guerrero, O. (2018). How Do Governments Determine Policy Priorities? Studying Development Strategies through Networked Spillovers. Journal of Economic Behavior & Organization, 154:335–361.

Castañeda, G. and Guerrero, O. (2018). The Resilience of Public Policies in Economic Development. Complexity, 2018.

Castañeda, G. and Guerrero, O. (2019a). The Importance of Social and Government Learning in Ex Ante Policy Evaluation. Journal of Policy Modeling.

Castañeda, G. and Guerrero, O. (2019b). Inferencia de Prioridades de Política para el Desarrollo Sostenible. Reporte Metodológico, Programa de las Naciones Unidas para el Desarrollo.

Castañeda, G. and Guerrero, O. (2019c). Inferencia de Prioridades de Política para el Desarrollo Sostenible: El Caso Sub-Nacional de México. Reporte Técnico, Programa de las Naciones Unidas para el Desarrollo.

Castañeda, G. and Guerrero, O. (2019d). Inferencia de Prioridades de Política para el Desarrollo Sostenible: Una Aplicación para el Caso de México. Reporte Técnico, Programa de las Naciones Unidas para el Desarrollo.

Collste, D., Pedercini, M., and Cornell, S. E. (2017). Policy Coherence to Achieve the SDGs: Using Integrated Simulation Models to Assess Effective Policies. Sustainability Science, 12(6):921–931.

Dhaoui, I. (2018). Achieving Sustainable Development Goals in MENA countries: An Analytical and Econometric Approach.

Fader, M., Cranmer, C., Lawford, R., and Engel-Cox, J. (2018). Toward an Understanding of Synergies and Trade-Offs Between Water, Energy, and Food SDG Targets. Frontiers in Environmental Science, 0.

Fuso Nerini, F., Sovacool, B., Hughes, N., Cozzi, L., Cosgrave, E., Howells, M., Tavoni, M., Tomei, J., Zerriffi, H., and Milligan, B. (2019). Connecting Climate Action with Other Sustainable Development Goals. Nature Sustainability, 2(8):674–680.

Gobierno del Estado de México (2020). Informe de Ejecución del Plan de Desarrollo del Estado de México 2017-2023; a 3 Años de la Administración.

González-Pier, E., Barraza-Lloréns, M., Beyeler, N., Jamison, D., Knaul, F., Lozano, R., Yamey, G., and Sepúlveda, J. (2016). Mexico’s path towards the Sustainable Development Goal for health: An assessment of the feasibility of reducing premature mortality by 40% by 2030. The Lancet Global Health, 4(10):e714–e725.

Guerrero, O. and Castañeda, G. (2020a). Policy Priority Inference: A Computational Framework to Analyze the Allocation of Resources for the Sustainable Development Goals. Data & Policy, 2.

Guerrero, O. and Castañeda, G. (2020b). Quantifying the Coherence of Development Policy Priorities. Development Policy Review, 00:1–26.

Guerrero, O. and Castañeda, G. (2021). Does expenditure in public governance guarantee less corruption? Large non-linearities and complementarities of the rule of law. Economics of Governance, forthcoming.

Guerrero, O., Castañeda, G., Trujillo, G., Hackett, L., and Chávez-Juárez, F. (2021). Subnational Sustainable Development: The Role of Vertical Intergovernmental Transfers in Reaching Multidimensional Goals. SSRN Working Paper.

Ionescu, G., Firoiu, D., Tănasie, A., Sorin, T., Pîrvu, R., and Manta, A. (2020). Assessing the Achievement of the SDG Targets for Health and Well-Being at EU Level by 2030. Sustainability, 12(14):5829.

Jones, B., Baumgartner, F., Breunig, C., Wlezien, C., Soroka, S., Foucault, M., François, A., Green-Pedersen, C., Koski, C., John, P., Mortensen, P., Varone, F., and Walgrave, S. (2009). A General Empirical Law of Public Budgets: A Comparative Analysis. American Journal of Political Science, 53(4):855–873.

Kroll, C., Warchold, A., and Pradhan, P. (2019). Sustainable Development Goals (SDGs): Are We Successful in Turning Trade-Offs into Synergies? Palgrave Communications, 5(1):1–11.

Luken, R., Mörec, U., and Meinert, T. (2020). Data Quality and Feasibility Issues with Industry-Related Sustainable Development Goal Targets for Sub-Saharan African Countries. Sustainable Development, 28(1):91–100.

Lusseau, D. and Mancini, F. (2019). Income-Based Variation in Sustainable Development Goal Interaction Networks. Nature Sustainability, 2(3):242–247.

35
Machingura, F. and Lally, S. (2017). The Sustainable Development Goals and Their Trade-Offs.

Technical report, Overseas Development Institute, London, United Kingdom.

McGowan, P., Stewart, G., Long, G., and Grainger, M. (2019). An Imperfect Vision of Indivisibility

in the Sustainable Development Goals. Nature Sustainability, 2(1):43–45.

Mensi, A. and Udenigwe, C. (2021). Emerging and Practical Food Innovations for Achieving

the Sustainable Development Goals (SDG) Target 2.2. Trends in Food Science & Technology,

111:783–789.

Moyer, J. and Hedden, S. (2020). Are We on the Right Path to Achieve the Sustainable Development

Goals? World Development, 127:104749.

OECD (2020). Measuring the Distance to the SDGs in Regions and Cities.

Ospina-Forero, L., Castañeda Ramos, G., and Guerrero, O. (2020). Estimating Networks of Sus-

tainable Development Goals. Information and Management.

Osuji, E. and Nwani, S. (2020). Achieving Sustainable Development Goals: Does Government

Expenditure Framework Matter? International Journal of Management, Economics and Social

Sciences (IJMESS), 9(3):131–160.

Pedercini, M., Arquitt, S., and Chan, D. (2020). Integrated Simulation for the 2030 Agenda. System

Dynamics Review, 36(3):333–357.

Pedercini, M., Arquitt, S., Collste, D., and Herren, H. (2019). Harvesting synergy from sustainable

development goal interactions. Proceedings of the National Academy of Sciences, 116(46):23021–

23028.

Philippidis, G., Shutes, L., M’Barek, R., Ronzon, T., Tabeau, A., and van Meijl, H. (2020). Snakes

and Ladders: World Development Pathways’ Synergies and Trade-Offs through the Lens of the

Sustainable Development Goals. Journal of Cleaner Production, 267:122147.

Porciello, J., Ivanina, M., Islam, M., Einarson, S., and Hirsh, H. (2020). Accelerating Evidence-

Informed Decision-Making for the Sustainable Development Goals Using Machine Learning. Na-

ture Machine Intelligence, 2(10):559–565.

36
Pradhan, P., Costa, L., Rybski, D., Lucht, W., and Kropp, J. (2017). A Systematic Study of Sustainable Development Goal (SDG) Interactions. Earth’s Future, 5(11):1169–1179.

Pradhan, P., Subedi, D., Khatiwada, D., Joshi, K., Kafle, S., Chhetri, R., Dhakal, S., Gautam, A., Khatiwada, P., Mainaly, J., Onta, S., Pandey, V., Parajuly, K., Pokharel, S., Satyal, P., Singh, D., Talchabhadel, R., Tha, R., Thapa, B., Adhikari, K., Adhikari, S., Bastakoti, R., Bhandari, P., Bharati, S., Bhusal, Y., Bk, B., Bogati, R., Kafle, S., Khadka, M., Khatiwada, N., Lal, A., Neupane, D., Neupane, K., Ojha, R., Regmi, N., Rupakheti, M., Sapkota, A., Sapkota, R., Sharma, M., Shrestha, G., Shrestha, I., Shrestha, K., Tandukar, S., Upadhyaya, S., Kropp, J., and Bhuju, D. (2021). The COVID-19 Pandemic Not Only Poses Challenges, but Also Opens Opportunities for Sustainable Transformation. Earth’s Future, 9(7):e2021EF001996.

Putra, M., Pradhan, P., and Kropp, J. (2020). A Systematic Analysis of Water-Energy-Food Security Nexus: A South Asian Case Study. Science of The Total Environment, 728:138451.

Sachs, J., Schmidt-Traub, G., Kroll, C., Lafortune, G., Fuller, G., and Woelm, F. (2020). Sustainable Development Report 2020. Bertelsmann Stiftung and Sustainable Development Solutions Network (SDSN), New York.

Sobczak, E., Bartniczak, B., and Raszkowski, A. (2021). Implementation of the No Poverty Sustainable Development Goal (SDG) in Visegrad Group (V4). Sustainability, 13(3):1030.

Sulmont, A., García de Alba Rivas, M., and Visser, S. (2021). Policy Priority Inference for Sustainable Development: A Tool for Identifying Global Interlinkages and Supporting Evidence-Based Decision Making. In Understanding the Spillovers and Transboundary Impacts of Public Policies. OECD Publishing, Paris.

United Nations (2020). SDG Indicators, United Nations Global SDG Database.

Warchold, A., Pradhan, P., and Kropp, J. (2021). Variations in Sustainable Development Goal Interactions: Population, Regional, and Income Disaggregation. Sustainable Development, 29(2):285–299.

World Bank (2020). SDG Atlas 2020.

Zelinka, D. and Amadei, B. (2019). A Systems Approach for Modeling Interactions Among the Sustainable Development Goals Part 2: System Dynamics. International Journal of System Dynamics Applications (IJSDA), 8(1):41–59.

How Does Government Expenditure Impact Sustainable Development?

Studying the Multidimensional Link between Budgets and Development Gaps

Online Appendix

Omar A. Guerrero¹,² and Gonzalo Castañeda³

¹ Department of Economics, UCL, United Kingdom
² The Alan Turing Institute, United Kingdom
³ Centro de Investigación y Docencia Económica (CIDE), Mexico

A Full model details

This appendix provides all the equations of the agent-based model together with their motivations. Guerrero and Castañeda (2020a) provide further discussion of the theoretical foundations of the model, as well as internal and external validation tests.

A.1 Policy-making agents

There are $n$ agents (or public officials), each in charge of a public policy that is specific to a single policy issue. To implement the mandated policy in a given period $t$, agent $i$ receives resources $P_{i,t}$ from the central authority (or government). With these resources, the public official weighs two potential benefits: (1) the reputation from being a proficient public servant and (2) the utility derived from being inefficient. This trade-off is modeled through the benefit function

$$F_{i,t+1} = \Delta I^{*}_{i,t}\,\frac{C_{i,t}}{P_{i,t}} + \left(1 - \theta_{i,t}\,\tau\right)\frac{P_{i,t} - C_{i,t}}{P_{i,t}}, \qquad (7)$$

where $F_{i,t+1}$ represents the benefit or utility obtained in the next period. The first summand in equation 7 captures the benefit of being proficient. $\Delta I^{*}_{i,t}$ is the change in indicator $i$ with respect to the previous period (its performance), relative to the changes of all other indicators. More specifically, the relative change in indicator $i$ is computed as

$$\Delta I^{*}_{i,t} = \frac{I_{i,t} - I_{i,t-1}}{\sum_{j}\left(I_{j,t} - I_{j,t-1}\right)}, \qquad (8)$$

and it captures the idea that the central authority compares and evaluates the relative performance of each public official, and their implemented policies, through the corresponding development indicators.

Going back to the first summand in equation 7, the relative change in the indicator is weighted by $C_{i,t}/P_{i,t}$. Here, $C_{i,t}$ is the portion of the allocated resources $P_{i,t}$ that is effectively used towards the policy. We call it the contribution of agent $i$.

Next, let us focus on the second summand of equation 7, which corresponds to the utility derived from being inefficient. Here, $P_{i,t} - C_{i,t}$ is the benefit extracted from not devoting resources to the policy; dividing it by $P_{i,t}$ yields the level of inefficiency. Public procurement mechanisms such as monitoring and penalties may curb inefficiencies. This is captured by the factor $(1 - \theta_{i,t}\tau)$. Variable $\theta_{i,t}$ is the binary outcome of monitoring inefficiencies. If $\theta_{i,t} = 1$, the government has caught agent $i$ behaving inefficiently. In that case, $i$ is penalized by a factor $\tau$, so that the benefit from these private gains is reduced.
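To make equations 7 and 8 concrete, the following is a minimal Python sketch (standard library only) of how an agent's benefit could be computed from indicator levels, contributions, allocations, and audit outcomes. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the zero-denominator rule in `relative_change` is our own assumption, since the text does not say what happens when the indicators' changes sum to zero.

```python
def relative_change(I_now, I_prev):
    """Equation 8: each indicator's change divided by the sum of all changes."""
    diffs = [now - prev for now, prev in zip(I_now, I_prev)]
    total = sum(diffs)
    # Assumption (not in the text): if the changes sum to zero,
    # every relative change is set to zero to avoid division by zero.
    if total == 0:
        return [0.0 for _ in diffs]
    return [d / total for d in diffs]


def benefit(delta_rel, C, P, theta, tau):
    """Equation 7: proficiency reward plus (possibly penalized) inefficiency gains."""
    F = []
    for d, c, p, th in zip(delta_rel, C, P, theta):
        proficiency = d * c / p                      # first summand
        inefficiency = (1 - th * tau) * (p - c) / p  # second summand
        F.append(proficiency + inefficiency)
    return F


# Two indicators that both rose by 0.1; the second agent is audited (theta = 1).
delta = relative_change([0.3, 0.6], [0.2, 0.5])
print(benefit(delta, C=[0.8, 0.4], P=[1.0, 1.0], theta=[0, 1], tau=0.5))
```

In this example each relative change is 0.5, so the unaudited agent that uses 80% of its allocation obtains 0.5 × 0.8 + 0.2 = 0.6, while the audited agent using 40% obtains 0.2 + 0.5 × 0.6 = 0.5.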

To model the binary outcomes of monitoring efforts, we assume that, every period, an independent realization of $\theta_{i,t}$ takes place for each indicator. This is simply a Bernoulli process with a probability of success $\lambda_{i,t}$ determined by

$$\lambda_{i,t} = \phi\,\frac{P_{i,t} - C_{i,t}}{P^{*}_{t}}, \qquad (9)$$

where $P^{*}_{t}$ is the largest allocation in period $t$. Parameter $\phi$ in equation 9 corresponds to the quality of the monitoring efforts. By normalizing the inefficiencies by $P^{*}_{t}$, we account for the emergence of social norms of corruption. That is, the factor $(P_{i,t} - C_{i,t})/P^{*}_{t}$ captures how much a diversion of resources stands out relative to the largest allocation. Thus, diversions that deviate from this norm are more likely to come under the spotlight and become media scandals.
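A minimal sketch of equation 9 and of the monitoring draw follows; the function names are hypothetical and `random.Random` is used as the source of the Bernoulli trials. Because $P_{i,t} - C_{i,t} \leq P_{i,t} \leq P^{*}_{t}$, each $\lambda_{i,t}$ is bounded by $\phi$, so it is a valid probability whenever $\phi$ lies in $[0, 1]$.

```python
import random


def monitoring_probabilities(P, C, phi):
    """Equation 9: audit probability for each agent in one period."""
    P_star = max(P)  # largest allocation in period t
    return [phi * (p - c) / P_star for p, c in zip(P, C)]


def draw_audits(lambdas, rng):
    """Independent Bernoulli draws of theta_{i,t}: 1 means the agent was caught."""
    return [1 if rng.random() < lam else 0 for lam in lambdas]


rng = random.Random(7)
lam = monitoring_probabilities(P=[2.0, 1.0], C=[1.0, 1.0], phi=0.5)
theta = draw_audits(lam, rng)
```

An agent that diverts nothing has $\lambda_{i,t} = 0$ and can never be caught, while the largest diversions face probabilities close to $\phi$.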

If an agent becomes more inefficient and their benefits increase, reinforcement learning takes place and the agent becomes even more inefficient in the next period. If, in contrast, the government penalizes the agent, the learning process leads them to become more proficient in the next period. There are several ways in which an agent may become more or less inefficient. Therefore, we represent any action through an abstract variable $X_{i,t}$, which may take any real value. If $X_{i,t} > 0$, the agent has a proclivity to be more efficient than inefficient; otherwise, the agent is more prone to being inefficient. We model the reinforcement of action $X_{i,t}$ as

$$X_{i,t+1} = X_{i,t} + \operatorname{sgn}\!\left[\left(X_{i,t} - X_{i,t-1}\right)\left(F_{i,t} - F_{i,t-1}\right)\right]\left|F_{i,t} - F_{i,t-1}\right|, \qquad (10)$$

where $\operatorname{sgn}(\cdot)$ is the sign function. Equation 10 corresponds to the directed learning model (Dhami, 2016), which is a type of reinforcement learning.

To translate action $X_{i,t}$ into a contribution of resources bounded by $[0, P_{i,t}]$, we define

$$C_{i,t} = \frac{P_{i,t}}{1 + e^{-X_{i,t}}}. \qquad (11)$$
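Equations 10 and 11 can be sketched as two small functions. The names are hypothetical, and the logistic of equation 11 is evaluated in a numerically stable form, algebraically identical to $P_{i,t}/(1 + e^{-X_{i,t}})$, so that very negative actions do not overflow `math.exp`.

```python
import math


def sign(x):
    """Sign function: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def update_action(X_t, X_prev, F_t, F_prev):
    """Equation 10 (directed learning)."""
    step = abs(F_t - F_prev)
    return X_t + sign((X_t - X_prev) * (F_t - F_prev)) * step


def contribution(P, X):
    """Equation 11, written in a numerically stable but equivalent form."""
    if X >= 0:
        return P / (1 + math.exp(-X))
    z = math.exp(X)  # avoids overflow of exp(-X) when X is very negative
    return P * z / (1 + z)
```

If an agent increased $X_{i,t}$ and its benefit rose, $X$ keeps rising by $|F_{i,t} - F_{i,t-1}|$; if the benefit fell, the agent reverses direction by the same amount. An action of zero maps to a contribution of exactly half the allocation.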

A.2 The government agent

Policy priorities are represented by the allocation profile $P = (P_1, \ldots, P_n)$. It is important to distinguish between indicators that can be targeted via public policies, which we call instrumental, and those that cannot, which we call collateral. An indicator is instrumental if the government has a program to directly impact it (i.e., it receives public funds). In contrast, a collateral indicator cannot be directly impacted, for example because it aggregates various topics, as with GDP per capita or financial development. Policy priorities can only be defined on the $n$ instrumental indicators, and we assume that there are $n$ public officials (one in charge of each instrumental indicator). When referring to all the indicators together, we say that there are $N \geq n$ policy issues in total, and the government has goals for all of them (even for the collateral ones).

The objective of the government is to close the gap between the goals and the indicators by solving the problem

$$\min \left[\sum_{i=1}^{N} \left(G_i - I_{i,t}\right)^2\right], \qquad (12)$$

where $G_i$ is the goal established for indicator $i$. The central authority achieves this by adapting its allocation profile.

In the real world, identifying the precise mechanisms through which governments establish their budgets is extremely challenging. A starting point is the principle of ‘gaping’, which suggests that governments prioritize the most lagging issues, as these may be development bottlenecks. Nevertheless, the political process also introduces adaptations motivated by signals such as people’s demands and the performance of the different expenditure programs. In the political science literature, these budgetary changes exhibit punctuated dynamics and are modeled through simple stochastic processes (Jones et al., 2009). We combine all these insights into a government heuristic in which the policy priorities are established according to

$$P_{i,t} = B\,\frac{q_{i,t}}{\sum_{j} q_{j,t}}, \qquad (13)$$

where $q_{i,t}$ is the propensity to spend on policy issue $i$ in period $t$, and $B$ is the budget available in period $t$.

The evolution of the policy priorities takes place through the propensities. In the first period, these are determined by the normalized gaps

$$q_{i,0} = \frac{G_i - I_{i,0}}{\max\left(G_{\cdot} - I_{\cdot,0}\right)}. \qquad (14)$$

Then, as time progresses, the propensities are updated according to

$$q_{i,t} = q_{i,t-1} + U(0,1)\left(\sum_{k}^{t-1}\theta_{i,k}\right)^{-1}\sum_{k\,|\,\theta_{i,k}=1}^{t-1}\frac{P_{i,k} - C_{i,k}}{P_{i,k}}. \qquad (15)$$

The previous equation is rather intuitive. The term $U(0,1)$ is a random draw from a uniform distribution on the interval $(0,1)$. This captures the randomness of societal signals received by the government (and is consistent with the stochastic processes used to model budgetary changes in the literature). The remaining terms on the right correspond to the inter-temporal average inefficiency, which lies in the interval $[0,1]$. Therefore, the government encourages increments among the most efficient policymaking agents. Note that, in general, the contribution $C_{i,t}$ is not observable by the government unless the monitoring authority carries out a successful audit. This is why equation 15 conditions the efficiency bias in the budget allocation on successful outcomes of the monitoring random variable $\theta_{i,t}$. Thus, the government tends to be more inquisitive with policymakers whose inefficiencies have been spotted in the past.
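The government heuristic of equations 13 to 15 can be sketched as follows. The function names are hypothetical, and three choices go beyond the text: histories are passed as plain lists of past periods; when no audit has yet succeeded (so the denominator in equation 15 would be zero), the propensity is left unchanged; and `rng.random()` draws from $[0, 1)$, a close stand-in for $U(0,1)$.

```python
import random


def initial_propensities(G, I0):
    """Equation 14: each gap divided by the largest gap (assumed positive)."""
    gaps = [g - i for g, i in zip(G, I0)]
    largest = max(gaps)
    return [gap / largest for gap in gaps]


def allocate(B, q):
    """Equation 13: split the budget B in proportion to the propensities."""
    total = sum(q)
    return [B * qi / total for qi in q]


def update_propensity(q_prev, P_hist, C_hist, theta_hist, rng):
    """Equation 15 for one indicator, using its history over periods k < t."""
    audits = sum(theta_hist)
    if audits == 0:
        # Assumption: with no successful audit there is no observed
        # inefficiency, so the propensity is left unchanged.
        return q_prev
    observed = sum((p - c) / p
                   for p, c, th in zip(P_hist, C_hist, theta_hist) if th == 1)
    return q_prev + rng.random() * observed / audits
```

In the first period, an indicator whose gap is half of the largest gap receives half the propensity of the most lagging one, and hence half its share of the budget.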

A.3 Indicator dynamics

As discussed in the main text, we model indicator dynamics through a random growth process. Let $\gamma_i$ denote the probability of an improvement in indicator $i$. This probability depends on a combination of network effects (i.e., incoming spillovers) and budgetary allocations. Therefore, the growth process is modeled as independent Bernoulli trials with a probability of success

$$\gamma_{i,t} = \beta\,\frac{C_{i,t} + \frac{1}{n}\sum_{j} C_{j,t}}{1 + e^{-S_{i,t}}}, \qquad (16)$$

where $\beta$ is a normalizing parameter and $S_{i,t}$ is the net amount of spillovers received by indicator $i$ in period $t$ (which can be positive or negative). The spillovers are computed every period as $S_{i,t} = \sum_{j} \mathbb{1}_{j,t} A_{j,i}$, where $A_{j,i}$ is the weight of the link from indicator $j$ to indicator $i$ in the network of interlinkages, and $\mathbb{1}$ is the indicator function: 1 if indicator $j$ grew in the previous period and 0 otherwise.

Next, we define the difference equation of indicator $i$ as

$$I_{i,t+1} = I_{i,t} + \alpha_i\,\xi\left(\gamma_{i,t}\right), \qquad (17)$$

where $\xi(\cdot)$ is the binary outcome (0 or 1) of a growth trial. Note that, if the indicator exceeds its theoretical maximum (when provided by the user), the model assigns zero growth.
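A minimal sketch of one period of equations 16 and 17 follows. The function names are hypothetical, and the sketch assumes that contributions are stored for all $N$ indicators with zeros for collateral ones, that `A[j][i]` holds $A_{j,i}$, and that the theoretical-maximum check is applied to the current value before growth.

```python
import math
import random


def logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)


def growth_probabilities(C, n, A, grew, beta):
    """Equation 16 for all N indicators.

    C: contributions for all N indicators (0 for collateral ones, an assumption).
    n: number of instrumental indicators.
    A: A[j][i] is the weight of the link from indicator j to indicator i.
    grew: grew[j] is 1 if indicator j grew in the previous period, else 0.
    """
    N = len(C)
    mean_C = sum(C) / n
    gammas = []
    for i in range(N):
        S = sum(grew[j] * A[j][i] for j in range(N))  # net incoming spillovers
        gammas.append(beta * (C[i] + mean_C) * logistic(S))
    return gammas


def step_indicators(I, alpha, gammas, rng, I_max=None):
    """Equation 17: one Bernoulli growth trial per indicator."""
    new_I, grew = [], []
    for i, (value, a, g) in enumerate(zip(I, alpha, gammas)):
        success = 1 if rng.random() < g else 0
        # No growth once the indicator exceeds its theoretical maximum (if given).
        if I_max is not None and value > I_max[i]:
            success = 0
        new_I.append(value + a * success)
        grew.append(success)
    return new_I, grew
```

The `grew` list returned by one step is what the next call to `growth_probabilities` needs, since the spillovers depend on which indicators grew in the previous period.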

B Data

B.1 Development indicators

The original dataset of the indicators used for the 2020 Sustainable Development Report (SDR) can be downloaded here: github.com/sdsna/SDR2020. In total, we use 77 indicators and 140 countries. Table B.1 provides the complete list of indicators, their codes, and the SDG to which they belong. Table B.2, on the other hand, presents the complete list of countries in our sample, with counts of indicators and SDG coverage.

Table B.2: Indicators and SDGs per country

Country Indicators SDGs Country Indicators SDGs Country Indicators SDGs

AFG 49 13 AGO 56 15 ALB 54 14


ARE 51 13 ARG 56 14 ARM 52 13
AUS 71 16 AUT 66 15 AZE 51 13
BDI 50 14 BEL 69 16 BEN 53 15
BFA 52 14 BGD 55 14 BGR 55 14
BHR 46 14 BIH 50 13 BLR 52 13
BOL 51 13 BRA 56 14 BWA 53 14
CAF 49 14 CAN 69 16 CHE 66 15
CHL 71 16 CHN 52 14 CIV 55 15
CMR 54 15 COG 52 15 COL 56 14
CRI 56 14 CYP 51 14 CZE 68 15
DEU 72 16 DNK 72 16 DOM 54 14
DZA 55 15 ECU 55 14 EGY 57 15
ERI 46 14 ESP 71 16 EST 72 16
FIN 72 16 FRA 72 16 GAB 54 14
GBR 69 16 GEO 55 14 GHA 56 15
GIN 53 15 GMB 53 15 GNB 44 15
GRC 70 16 GTM 56 14 HND 55 14

HRV 55 14 HTI 50 13 HUN 68 15


IDN 56 14 IND 55 14 IRL 71 16
IRN 56 15 IRQ 54 15 ISR 67 16
ITA 72 16 JAM 53 14 JOR 53 15
JPN 68 16 KAZ 52 13 KEN 57 15
KGZ 52 13 KHM 56 14 KOR 67 16
KWT 51 14 LAO 49 13 LBN 52 15
LBR 54 15 LKA 56 14 LSO 51 14
LTU 70 16 LVA 72 16 MAR 56 15
MDA 51 13 MDG 56 15 MEX 71 16
MKD 51 13 MLI 52 14 MMR 54 14
MNG 52 13 MOZ 56 15 MRT 53 15
MUS 55 15 MWI 51 14 MYS 56 14
NAM 57 15 NER 51 14 NGA 54 15
NIC 55 14 NLD 71 16 NOR 72 16
NPL 51 13 NZL 69 16 OMN 49 14
PAK 56 14 PAN 54 14 PER 56 14
PHL 56 14 POL 70 16 PRT 72 16
PRY 52 13 QAT 47 14 RUS 54 14
RWA 51 14 SAU 52 14 SDN 55 15
SEN 55 15 SGP 49 14 SLE 54 15
SLV 56 14 SVK 68 15 SVN 70 16
SWE 70 16 SWZ 48 14 TCD 50 14
TGO 54 15 THA 56 14 TJK 51 13
TKM 45 11 TUN 57 15 TUR 67 16
TZA 57 15 UGA 52 14 UKR 56 14
URY 55 14 USA 71 16 UZB 51 13
VEN 53 14 VNM 56 14 ZAF 57 15

ZMB 52 14 ZWE 51 13

Sources: Sample from the 2020 Sustainable Development Report.

Table B.1: List of policy issues by SDG
SDG Code Description
1 320pov Poverty headcount ratio at $3.20/day (%)
1 oecdpov Poverty rate after taxes and transfers (%)
2 crlyld Cereal yield (tonnes per hectare of harvested land)
2 obesity Prevalence of obesity, BMI ≥ 30 (% of adult population)
2 snmi Sustainable Nitrogen Management Index (worst 0-1.41 best)
2 trophic Human Trophic Level (best 2-3 worst)
2 undernsh Prevalence of undernourishment (%)
2 wasting Prevalence of wasting in children under 5 years of age (%)
3 births Births attended by skilled health personnel (%)
3 fertility Adolescent fertility rate (births per 1,000 adolescent females aged 15 to 19)
3 hiv New HIV infections (per 1,000 uninfected population)
3 incomeg Gap in self-reported health status by income (percentage points)
3 lifee Life expectancy at birth (years)
3 matmort Maternal mortality rate (per 100,000 live births)
3 ncds Age-standardized death rate due to cardiovascular disease, cancer, diabetes, or chronic respiratory disease in adults aged 30–70 years (%)
3 smoke Daily smokers (% of population aged 15 and over)
3 swb Subjective well-being (average ladder score, worst 0-10 best)
3 tb Incidence of tuberculosis (per 100,000 population)
3 traffic Traffic deaths (per 100,000 population)
3 u5mort Mortality rate, under-5 (per 1,000 live births)
3 uhc Universal health coverage (UHC) index of service coverage (worst 0-100 best)
3 vac Percentage of surviving infants who received 2 WHO-recommended vaccines (%)
4 earlyedu Participation rate in pre-primary organized learning (% of children aged 4 to 6)
4 pisa PISA score (worst 0-600 best)
4 primary Net primary enrollment rate (%)
4 second Lower secondary completion rate (%)
4 socioec Variation in science performance explained by socio-economic status (%)
4 tertiary Tertiary educational attainment (% of population aged 25 to 34)
5 edat Ratio of female-to-male mean years of education received (%)
5 familypl Demand for family planning satisfied by modern methods (% of females aged 15 to 49 who are married or in unions)
5 lfpr Ratio of female-to-male labor force participation rate (%)
5 parl Seats held by women in national parliament (%)
5 paygap Gender wage gap (% of male median wage)
6 safesan Population using safely managed sanitation services (%)
6 safewat Population using safely managed water services (%)
6 sanita Population using at least basic sanitation services (%)
6 scarcew Scarce water consumption embodied in imports (m³/capita)
6 water Population using at least basic drinking water services (%)
7 cleanfuel Population with access to clean fuels and technology for cooking (%)
7 co2twh CO₂ emissions from fuel combustion for electricity and heating per total electricity output (MtCO₂/TWh)
7 elecac Population with access to electricity (%)
7 ren Share of renewable energy in total primary energy supply (%)
8 accounts Adults with an account at a bank or other financial institution or with a mobile-money-service provider (% of population aged 15 or over)
8 empop Employment-to-population ratio (%)
8 impacc Fatal work-related accidents embodied in imports (per 100,000 population)
8 unemp Unemployment rate (% of total labor force)
8 yneet Youth not in employment, education or training (NEET) (% of population aged 15 to 29)
9 articles Scientific and technical journal articles (per 1,000 population)
9 intuse Population using the internet (%)
9 lpi Logistics Performance Index: Quality of trade and transport-related infrastructure (worst 1-5 best)
9 mobuse Mobile broadband subscriptions (per 100 population)
9 netacc Gap in internet access by income (percentage points)
9 patents Triadic patent families filed (per million population)
9 rdex Expenditure on research and development (% of GDP)
9 rdres Researchers (per 1,000 employed population)
10 elder Elderly poverty rate (% of population aged 66 or over)
10 palma Palma ratio
11 pipedwat Access to improved water source, piped (% of urban population)
11 pm25 Annual mean concentration of particulate matter of less than 2.5 microns in diameter (PM2.5) (µg/m³)
11 rentover Population with rent overburden (%)
11 transport Satisfaction with public transport (%)
13 co2import CO₂ emissions embodied in imports (tCO₂/capita)
13 co2pc Energy-related CO₂ emissions (tCO₂/capita)
14 cleanwat Ocean Health Index: Clean Waters score (worst 0-100 best)
14 cpma Mean area that is protected in marine sites important to biodiversity (%)
14 fishstocks Fish caught from overexploited or collapsed stocks (% of total catch)
14 trawl Fish caught by trawling (%)
15 cpfa Mean area that is protected in freshwater sites important to biodiversity (%)
15 cpta Mean area that is protected in terrestrial sites important to biodiversity (%)
15 redlist Red List Index of species survival (worst 0-1 best)
16 detain Unsentenced detainees (% of prison population)
16 homicides Homicides (per 100,000 population)
16 prison Persons held in prison (per 100,000 population)
16 rsf Press Freedom Index (best 0-100 worst)
16 safe Percentage of population who feel safe walking alone at night in the city or area where they live (%)
17 govex Government spending on health and education (% of GDP)
17 govrev Other countries: Government revenue excluding grants (% of GDP)

B.2 Public governance indicators

As part of the behavioral component of the policymaking agents described in Appendix A.1, we consider two mechanisms that affect the agents’ incentives when determining their contributions. These are the quality of monitoring (parameter $\phi$ from equation 9) and the quality of the rule of law (parameter $\tau$ from equation 7). Since both are institutional variables, we consider them exogenous and impute their values directly from empirical data.¹ These parameters are intended to provide a comparative metric of the quality of public procurement mechanisms across countries. Therefore, rather than being actual estimates, they are indicators reflecting relative qualities. We use the Worldwide Governance Indicators database, which can be obtained here: info.worldbank.org/governance/wgi. In particular, we obtain the control of corruption indicator, reflecting the quality of the monitoring efforts by the central authority, and the rule of law indicator, capturing the quality of institutions designed to ensure a law-abiding society. These data are normalized using their theoretical minima and maxima (provided in the source dataset), so all their values lie in the interval (0, 1). Then, for the countries in the SDR sample, we compute the inter-temporal values of these two indicators for the period 2000-2020. As we later show in Appendix G, this institutional information, together with the behavioral component, facilitates the external validation of the model.
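The normalization step can be sketched as below. Two assumptions are labeled here because the text does not pin them down: the theoretical bounds are taken as -2.5 and 2.5 (the range commonly quoted for WGI estimates), and the inter-temporal value is taken to be a simple mean over the available years, skipping missing ones. The yearly values in the example are hypothetical.

```python
def normalize(values, lower=-2.5, upper=2.5):
    """Min-max normalization with assumed theoretical bounds; None marks a missing year."""
    span = upper - lower
    return [None if v is None else (v - lower) / span for v in values]


def intertemporal_mean(series):
    """Assumed inter-temporal value: simple mean over the non-missing years."""
    observed = [v for v in series if v is not None]
    return sum(observed) / len(observed)


# Hypothetical yearly WGI estimates for one country (None = missing year).
control_of_corruption = [0.5, None, -0.5]
rule_of_law = [1.0, 1.5, 2.0]
phi = intertemporal_mean(normalize(control_of_corruption))  # monitoring quality
tau = intertemporal_mean(normalize(rule_of_law))            # rule of law
```

The resulting value for control of corruption plays the role of $\phi$ in equation 9, and the one for rule of law plays the role of $\tau$ in equation 7.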

B.3 Instrumental indicators

In the model, policy priorities (budgetary allocations) are defined over the indicators considered to be directly impacted by specific government programs; this is why we call them instrumental indicators. In this study, we identify a subset of indicators that, from our experience, are likely to be instrumental. The remaining indicators are defined as collateral because we consider them too aggregate for any government to claim a capability of direct manipulation. Of course, some indicators could be instrumental in some countries but not in others. Making this distinction, however, requires extensive contextual knowledge, which is difficult to obtain when studying 140 nations. Therefore, the 55 indicators identified in Table B.3 are assumed to be instrumental in all countries in our sample.

¹ Castañeda et al. (2018) and Guerrero and Castañeda (2021) show that these governance variables can also be treated as endogenous when they are relevant indicators.

Table B.3: Instrumental indicators

SDG Indicator code SDG Indicator code SDG Indicator code SDG Indicator code

1 320pov 1 oecdpov 2 snmi 2 undernsh


2 wasting 3 births 3 incomeg 3 matmort
3 tb 3 u5mort 3 uhc 3 vac
4 earlyedu 4 pisa 4 primary 4 second
4 socioec 4 tertiary 5 edat 5 familypl
6 safesan 6 safewat 6 sanita 6 water
7 cleanfuel 7 co2twh 7 elecac 7 ren
8 accounts 8 yneet 9 intuse 9 lpi
9 mobuse 9 netacc 9 rdex 10 elder
11 pipedwat 11 pm25 11 rentover 11 transport
13 co2import 13 co2pc 14 cleanwat 14 cpma
14 fishstocks 14 trawl 15 cpfa 15 cpta
15 redlist 16 detain 16 homicides 16 prison
16 safe 17 govex 17 govrev

Sources: Authors’ manual classification.

C Confidence

C.1 Confidence intervals

We are interested in building confidence intervals for the SDG gaps. Recall that these are the average gaps computed over multiple Monte Carlo simulations. To construct their confidence intervals, the brute-force approach is to perform $X$ sets of $M$ Monte Carlo simulations. Since we use $M = 10000$ for our estimates, confidence intervals based on, for example, $X = 10000$ would imply a hundred million simulations for each country. A more efficient strategy is to construct bootstrap confidence intervals from the $M$ Monte Carlo simulations of the original estimate. Figure C.1 shows an example of the distributions of SDG gaps obtained through both methods for illustrative indicators from Mexico. As predicted by bootstrap theory (Efron, 1981), the bootstrap intervals closely resemble the brute-force intervals for a large enough $M$ and sufficient resampling.
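A percentile bootstrap for the mean gap can be sketched as follows. This is an illustration of the general technique, not necessarily the authors' exact procedure: the function name, the default number of resamples, the seed, and the percentile indexing are our own choices.

```python
import random
import statistics


def bootstrap_ci(gaps, n_resamples=10000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of M Monte Carlo gap estimates."""
    rng = random.Random(seed)
    M = len(gaps)
    # Each resample draws M values with replacement; its mean is one replicate.
    means = sorted(statistics.fmean(rng.choices(gaps, k=M))
                   for _ in range(n_resamples))
    lo = means[int((1 - level) / 2 * n_resamples)]
    hi = means[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi
```

Each replicate reuses the $M$ stored simulation outcomes, so no new runs of the model are needed; this is what makes the strategy far cheaper than the brute-force approach.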

Figure C.1: Brute-force and bootstrap confidence intervals for SDG gaps

(a) Nitrogen management (sdg2_snmi); (b) Researchers (sdg9_rdres); (c) Fish stocks (sdg14_fishstocks). Each panel plots the density of the SDG gap (%) under the bootstrap and brute-force methods.

Notes: Each gap was estimated using 1000 Monte Carlo simulations. The brute-force distributions use 10000 gap estimates. The bootstrap distributions use 10000 re-samples.
Sources: Authors’ own calculations.

Figure C.1 shows that the confidence intervals of the SDG gaps are narrow. This is a systematic feature across all the estimated SDG gaps: the widest intervals have an amplitude of nearly 0.5% between the 2.5th and 97.5th percentiles. While the data files with exact estimates and percentiles can be found in the repository http://github.com/oguerrer/sdg_feasibility, here we provide a qualitative view of the amplitude of the intervals for the SDG gaps of each country and indicator. In Figure C.2, marker sizes are proportional to the amplitude of the corresponding 95% confidence interval. This visualization conveys the relative uncertainty across countries and indicators with respect to their SDG gaps.

Figure C.2: Amplitude of 95% bootstrap confidence intervals of SDG gaps
(Grid of markers with countries as rows, identified by their ISO 3166-1 alpha-3 codes, and indicators as columns, from sdg1_320pov to sdg17_govrev; the countries are split across two panels.)
Notes: The size of each marker is proportional to the amplitude of the 95% bootstrap confidence interval. The largest marker corresponds to an amplitude of approximately 0.5%. Gray lines indicate the absence of an indicator in a particular country.
Sources: Authors’ own calculations.

C.2 Uncertainty from data quality

The distributions of SDG gaps reported above result from the stochastic elements of the model (such as the growth process of the indicators), from path dependency in the learning component, and from the random initial conditions of the endogenous variables. A natural question for any empirical work that relies on indicators is how confident we can be in the inferences if the data are subject to errors. Here, we show how to incorporate this additional source of uncertainty when estimating the SDG gaps.

In the agent-based modeling literature, one can find different strategies to tackle this problem because each model may use the data in a very particular way. Furthermore, models that can be approximated through stationary stochastic processes may benefit from existing asymptotic results in the statistical literature. For the model presented in this paper, propagating the uncertainty of the data into the parameters is computationally heavy. In this appendix, we present a viable strategy and provide some results for an illustrative country.

First, let us assume that we know the standard error $e_{i,t}$ of each empirical indicator at each point in time. With this information, we can generate an alternative dataset in which each data point has been perturbed according to its standard error, so that its value is a randomly chosen point in the interval $[I_{i,t} - e_{i,t}, I_{i,t} + e_{i,t}]$. Once this alternative dataset has been built, we compute the fraction of positive first differences $\Gamma$ across all indicators, which is used to calibrate parameter $\beta$. The alternative dataset also provides updated values for the indicators’ initial and final conditions, which are necessary to calibrate $\alpha_1, \ldots, \alpha_N$. With this information, we calibrate the model following the procedure described in Section 2.3 of the main text. Finally, we store the resulting parameters and repeat the entire process to obtain a sample of parameter configurations. To compute confidence intervals that account for the indicators’ errors, we perform independent estimations of the SDG gaps for each parameter configuration.²
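The perturbation step and the statistic $\Gamma$ can be sketched as follows. The sketch assumes that the data are a list of indicator series (one row per indicator, one column per year) and that the "randomly chosen point" is drawn uniformly from the interval; the function names and example values are hypothetical, and the subsequent calibration of $\beta$ and $\alpha_1, \ldots, \alpha_N$ is not shown.

```python
import random


def perturb(data, errors, rng):
    """Redraw each I_{i,t} in [I - e, I + e] (uniform draw is an assumption)."""
    return [[rng.uniform(x - e, x + e) for x, e in zip(row, err_row)]
            for row, err_row in zip(data, errors)]


def fraction_positive_diffs(data):
    """Gamma: share of positive first differences pooled across all indicators."""
    diffs = [b - a for row in data for a, b in zip(row, row[1:])]
    return sum(d > 0 for d in diffs) / len(diffs)


# One perturbed replicate; calibrating beta and alpha from it is not shown.
rng = random.Random(3)
series = [[1.0, 2.0, 3.0], [3.0, 2.0, 2.5]]   # rows: indicators, columns: years
errors = [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]]
gamma = fraction_positive_diffs(perturb(series, errors, rng))
```

Repeating this replicate many times, and calibrating once per replicate, yields the sample of parameter configurations described above.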

The procedure described above assumes knowledge of the indicators’ errors. Unfortunately, the SDR does not provide this information, and most of the original sources do not report it either. For this reason, we do not report these intervals in the main text. However, here we present an example of the procedure for the case of Mexico, using the inter-temporal standard deviation $\rho_i$ of each indicator to obtain a proxy error $\rho_i/(\text{number of years})$. First, in Figure C.3, we present histograms approximating the distributions of the model’s parameters. Then, in Table C.1, we show the confidence intervals obtained for the different SDG gaps and compare them with those estimated when no measurement error is assumed.
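The proxy error can be computed as in the sketch below. The text does not specify whether $\rho_i$ is a population or a sample standard deviation; this sketch assumes the population version (`statistics.pstdev`) and assigns the same proxy to every year of an indicator. The function name is hypothetical.

```python
import statistics


def proxy_errors(data):
    """Proxy error rho_i / (number of years) for each indicator series.

    rho_i is taken as the population standard deviation (an assumption);
    the same proxy is assigned to every year of the series.
    """
    out = []
    for row in data:
        rho = statistics.pstdev(row)
        out.append([rho / len(row)] * len(row))
    return out
```

These proxies can then stand in for $e_{i,t}$ in the perturbation step described earlier in this appendix.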

In Figure C.4, we compare the distributions from Figure C.1 with those obtained when accounting for the indicators’ errors. As expected, these errors introduce more variability into the distributions of the SDG gaps. However, the range of the new distributions is still modest, usually not exceeding 1%.

² The estimations must be performed strictly with the stored parameters, not by randomizing them using their distributions. The reason is that the calibration procedure does not treat the parameters independently of each other; each configuration is the result of a joint estimation, so any ex post randomization would have to account for their interdependencies.

Figure C.3: Parameter distributions obtained from randomized indicators

(Two frequency histograms: left panel, structural factor; right panel, normalizing constant.)

Sources: Authors’ own calculations.

Figure C.4: Confidence intervals of SDG gaps with and without data errors

(a) Nitrogen management (sdg2_snmi); (b) Researchers (sdg9_rdres); (c) Fish stocks (sdg14_fishstocks). Each panel plots the density of the SDG gap (%) without and with data errors.
Sources: Authors’ own calculations.

Table C.1: 95% confidence intervals for Mexico

Indicator Gap CI CI+se Indicator Gap CI CI+se Indicator Gap CI CI+se

oecdpov 5.32 ±0.14 ±0.29 crlyld 35.86 ±0.32 ±0.50 obesity 28.81 ±0.00 ±0.00
snmi 53.38 ±0.05 ±0.20 trophic 15.02 ±0.00 ±0.01 undernsh 3.75 ±0.01 ±0.02
wasting 1.02 ±0.02 ±0.03 births 0.00 ±0.00 ±0.00 fertility 4.44 ±0.04 ±0.07
hiv 0.06 ±0.00 ±0.00 lifee 6.71 ±0.04 ±0.07 matmort 0.22 ±0.00 ±0.00
ncds 6.29 ±0.03 ±0.05 smoke 0.00 ±0.00 ±0.00 swb 8.85 ±0.05 ±0.21
tb 0.22 ±0.00 ±0.00 traffic 11.02 ±0.00 ±0.00 u5mort 0.45 ±0.01 ±0.02
uhc 13.48 ±0.16 ±0.49 vac 13.98 ±0.03 ±0.24 earlyedu 0.00 ±0.00 ±0.00
pisa 21.01 ±0.00 ±0.00 primary 0.00 ±0.00 ±0.00 second 0.00 ±0.00 ±0.00
socioec 4.63 ±0.00 ±0.00 tertiary 49.76 ±0.11 ±0.43 edat 2.00 ±0.06 ±0.12
familypl 18.02 ±0.04 ±0.09 lfpr 39.63 ±0.12 ±0.25 parl 0.00 ±0.00 ±0.00
paygap 12.12 ±0.08 ±0.24 safesan 13.78 ±0.63 ±1.89 safewat 52.73 ±0.06 ±0.22
scarcew 0.71 ±0.01 ±0.01 cleanfuel 11.17 ±0.05 ±0.13 co2twh 1.45 ±0.01 ±0.01
elecac 0.00 ±0.00 ±0.00 ren 84.23 ±0.00 ±0.00 accounts 43.09 ±0.09 ±0.66

impacc 0.14 ±0.00 ±0.00 yneet 12.40 ±0.06 ±0.12 articles 85.33 ±0.13 ±0.20
intuse 0.37 ±0.24 ±0.20 lpi 31.03 ±0.00 ±0.00 mobuse 1.53 ±0.33 ±0.44
netacc 1.37 ±0.32 ±0.75 patents 99.74 ±0.00 ±0.00 rdex 87.04 ±0.04 ±0.08
rdres 94.13 ±0.01 ±0.06 elder 15.86 ±0.13 ±0.26 palma 1.56 ±0.00 ±0.00
pipedwat 0.00 ±0.01 ±0.01 pm25 16.08 ±0.04 ±0.13 rentover 2.24 ±0.00 ±0.01
transport 32.30 ±0.12 ±0.41 co2import 0.48 ±0.00 ±0.00 co2pc 4.14 ±0.00 ±0.01
cleanwat 35.37 ±0.00 ±1.30 cpma 1.93 ±0.40 ±0.44 fishstocks 28.57 ±0.03 ±0.29
trawl 12.68 ±0.02 ±0.13 cpfa 75.90 ±0.05 ±0.32 cpta 55.16 ±0.11 ±0.54
redlist 32.77 ±0.00 ±0.00 detain 13.89 ±0.02 ±0.09 homicides 2.50 ±0.00 ±0.02
prison 10.71 ±0.10 ±0.23 rsf 45.57 ±0.02 ±0.11 safe 56.55 ±0.00 ±0.00
govex 37.48 ±0.18 ±0.34 govrev 35.51 ±0.25 ±0.63

Notes: All quantities are expressed in percentages.


Column ‘Gap’ denotes the estimated SDG gap expected in 2030.
‘CI’ corresponds to the bootstrap confidence intervals.
‘CI+se’ indicates confidence intervals that account for data errors.
Recall that the SDR does not report data errors, so ‘CI+se’ is illustrative.
Sources: Authors’ own calculations.
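The ‘CI’ column reports bootstrap intervals as ± half-widths. As a minimal sketch of how such a half-width can be obtained, the snippet below applies a percentile bootstrap to the mean of a set of simulated gaps. It assumes the intervals are computed over Monte Carlo samples of the 2030 gap; the function name, the number of resamples, and the sample data are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def bootstrap_half_width(samples, n_boot=2000, level=0.95, seed=0):
    """Half-width of a percentile-bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    # Resample with replacement and record the mean of each resample.
    idx = rng.integers(0, samples.size, size=(n_boot, samples.size))
    boot_means = samples[idx].mean(axis=1)
    alpha = 1.0 - level
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return (hi - lo) / 2.0


# Hypothetical 2030 gaps (in %) for one indicator across Monte Carlo runs.
gaps = np.random.default_rng(1).normal(53.38, 0.5, size=500)
half_width = bootstrap_half_width(gaps)
print(f"{gaps.mean():.2f} ± {half_width:.2f}")
```

Because the interval is for the mean, its half-width shrinks with the number of simulated samples, which is consistent with the narrow ‘CI’ values in Table C.1.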

D Network

D.1 Estimation

The network of interlinkages consists of a directed acyclic graph estimated through Bayesian methods from the package sparsebn (Aragam et al., 2019), which can be accessed here: https://github.com/itsrainingdata/sparsebn. The links do not represent causal relationships, but conditional dependencies. This is discussed in detail by Guerrero and Castañeda (2020a); Ospina-Forero et al. (2020). In order to remove the influence of temporal trends, we transform the series into their first differences.

A virtue of Bayesian methods over alternative network estimation approaches is their ability to specify a ‘white list’ of edges that can be considered true positives. In other words, with prior knowledge, one can determine a set of links that would be expected from the estimation. We identify 109 synergies (links with positive weights) that should be expected in the network of any country. Of course, this list could be refined for each individual country, should more specific contextual information become available. These synergies are reported in Table D.1.

Table D.1: White list of synergies

Origin Destination Origin Destination Origin Destination

wpc (SDG 1) undernsh (SDG 2) wpc (SDG 1) u5mort (SDG 3) wpc (SDG 1) fertility (SDG 3)
wpc (SDG 1) vac (SDG 3) wpc (SDG 1) primary (SDG 4) wpc (SDG 1) earlyedu (SDG 4)
wpc (SDG 1) accounts (SDG 8) wpc (SDG 1) netacc (SDG 9) wpc (SDG 1) elder (SDG 10)
320pov (SDG 1) undernsh (SDG 2) 320pov (SDG 1) u5mort (SDG 3) 320pov (SDG 1) fertility (SDG 3)
320pov (SDG 1) vac (SDG 3) 320pov (SDG 1) primary (SDG 4) 320pov (SDG 1) earlyedu (SDG 4)
320pov (SDG 1) accounts (SDG 8) 320pov (SDG 1) netacc (SDG 9) 320pov (SDG 1) elder (SDG 10)
oecdpov (SDG 1) undernsh (SDG 2) oecdpov (SDG 1) u5mort (SDG 3) oecdpov (SDG 1) fertility (SDG 3)
oecdpov (SDG 1) vac (SDG 3) oecdpov (SDG 1) primary (SDG 4) oecdpov (SDG 1) earlyedu (SDG 4)
oecdpov (SDG 1) accounts (SDG 8) oecdpov (SDG 1) netacc (SDG 9) oecdpov (SDG 1) elder (SDG 10)
undernsh (SDG 2) u5mort (SDG 3) undernsh (SDG 2) lifee (SDG 3) undernsh (SDG 2) swb (SDG 3)
undernsh (SDG 2) pisa (SDG 4) wasting (SDG 2) ncds (SDG 3) wasting (SDG 2) lifee (SDG 3)
obesity (SDG 2) ncds (SDG 3) obesity (SDG 2) lifee (SDG 3) trophic (SDG 2) obesity (SDG 2)
crlyld (SDG 2) undernsh (SDG 2) snmi (SDG 2) crlyld (SDG 2) matmort (SDG 3) oecdpov (SDG 1)
matmort (SDG 3) lifee (SDG 3) matmort (SDG 3) swb (SDG 3) neonat (SDG 3) lifee (SDG 3)
u5mort (SDG 3) lifee (SDG 3) tb (SDG 3) u5mort (SDG 3) ncds (SDG 3) swb (SDG 3)
fertility (SDG 3) second (SDG 4) births (SDG 3) matmort (SDG 3) births (SDG 3) u5mort (SDG 3)
vac (SDG 3) u5mort (SDG 3) uhc (SDG 3) oecdpov (SDG 1) uhc (SDG 3) u5mort (SDG 3)
uhc (SDG 3) tb (SDG 3) uhc (SDG 3) ncds (SDG 3) uhc (SDG 3) vac (SDG 3)
uhc (SDG 3) swb (SDG 3) incomeg (SDG 3) oecdpov (SDG 1) smoke (SDG 3) ncds (SDG 3)
smoke (SDG 3) lifee (SDG 3) primary (SDG 4) swb (SDG 3) second (SDG 4) edat (SDG 5)
second (SDG 4) yneet (SDG 8) pisa (SDG 4) empop (SDG 8) socioec (SDG 4) pisa (SDG 4)
science (SDG 4) pisa (SDG 4) resil (SDG 4) pisa (SDG 4) familypl (SDG 5) fertility (SDG 3)
edat (SDG 5) fertility (SDG 3) edat (SDG 5) lfpr (SDG 5) edat (SDG 5) paygap (SDG 5)
lfpr (SDG 5) parl (SDG 5) lfpr (SDG 5) paygap (SDG 5) water (SDG 6) u5mort (SDG 3)
water (SDG 6) swb (SDG 3) sanita (SDG 6) u5mort (SDG 3) sanita (SDG 6) swb (SDG 3)
elecac (SDG 7) empop (SDG 8) cleanfuel (SDG 7) co2pc (SDG 13) co2twh (SDG 7) co2pc (SDG 13)
ren (SDG 7) cleanfuel (SDG 7) unemp (SDG 8) intuse (SDG 9) empop (SDG 8) intuse (SDG 9)
empop (SDG 8) mobuse (SDG 9) empop (SDG 8) govrev (SDG 17) yneet (SDG 8) empop (SDG 8)
intuse (SDG 9) accounts (SDG 8) lpi (SDG 9) empop (SDG 8) rdex (SDG 9) rdres (SDG 9)
rdres (SDG 9) articles (SDG 9) rdres (SDG 9) patents (SDG 9) netacc (SDG 9) empop (SDG 8)
adjgini (SDG 10) rentover (SDG 11) palma (SDG 10) rentover (SDG 11) pipedwat (SDG 11) water (SDG 6)
transport (SDG 11) swb (SDG 3) rentover (SDG 11) swb (SDG 3) co2pc (SDG 13) ncds (SDG 3)
co2import (SDG 13) ncds (SDG 3) cpma (SDG 14) fishstocks (SDG 14) cpma (SDG 14) trawl (SDG 14)
cpta (SDG 15) redlist (SDG 15) cpfa (SDG 15) fishstocks (SDG 14) safe (SDG 16) swb (SDG 3)
cpi (SDG 16) homicides (SDG 16) cpi (SDG 16) safe (SDG 16) prison (SDG 16) detain (SDG 16)
govex (SDG 17) uhc (SDG 3) govex (SDG 17) tertiary (SDG 4) govex (SDG 17) rdex (SDG 9)
govrev (SDG 17) govex (SDG 17)

Sources: Authors’ manual identification.

We also identify negative links, or trade-offs, that should be expected. Our white list of trade-offs is substantially smaller because establishing negative structural relations requires highly contextual information, unless the relation is self-evident, as in the case of industrial growth versus the environment. We report them in Table D.2.

Table D.2: White list of trade-offs

Origin Destination

elecac (SDG 7) co2twh (SDG 7)


elecac (SDG 7) co2pc (SDG 13)
empop (SDG 8) pm25 (SDG 11)
empop (SDG 8) co2pc (SDG 13)

Sources: Authors’ manual identification.

The specification of the white list does not force the estimation to yield a specific sign. Instead, sparsebn takes the white lists and forces the algorithm to keep those links in the estimated network. Some of these links may therefore come out with the opposite sign from the expected one. We consider these to be false positives, so we remove them from the network in an edge-correction procedure.

D.2 Edge correction

Besides eliminating links with an incorrect sign, we also remove negative edges between indicators that belong to the same SDG. The intuition is that trade-offs are likely to occur across topics in different SDGs, not within the same one. While intra-SDG trade-offs remain possible, we prefer to sacrifice them and accept this type of error (losing some true positives) rather than permit a large number of false positives.

Finally, some links may still have excessively large weights, i.e., outliers. We consider these to be false positives, as such magnitudes are likely to be an artifact of exogenously produced high variance in the data. To eliminate these links, we set weight thresholds at the 5th and 95th percentiles of the weights of all the networks pooled together. If the weight of a particular link lies below the lower threshold or above the upper one, the link is eliminated from its corresponding network.
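The three correction rules can be sketched as a single filter over the estimated edges. This is a minimal illustration, assuming each network is stored as a dictionary of weighted edges and that the pooled percentile thresholds are computed on the raw estimates before the other two rules are applied (the text does not specify the order). The function name, the example networks, and their weights are hypothetical; the white-listed edges are a subset of Tables D.1 and D.2.

```python
import numpy as np


def correct_edges(networks, sdg_of, whitelist_sign, lower_q=5, upper_q=95):
    """Apply the three edge-correction rules of Appendix D.2.

    networks: {country: {(origin, destination): weight}}
    sdg_of: {indicator: SDG number}
    whitelist_sign: {(origin, destination): +1 (synergy) or -1 (trade-off)}
    """
    # Outlier thresholds from the weights of all networks pooled together
    # (assumption: computed on the raw estimates, before the other rules).
    pooled = [w for net in networks.values() for w in net.values()]
    lo, hi = np.percentile(pooled, [lower_q, upper_q])

    corrected = {}
    for country, net in networks.items():
        kept = {}
        for (origin, dest), w in net.items():
            expected = whitelist_sign.get((origin, dest))
            if expected is not None and np.sign(w) != expected:
                continue  # white-listed link estimated with the wrong sign
            if w < 0 and sdg_of[origin] == sdg_of[dest]:
                continue  # negative edge within the same SDG
            if w < lo or w > hi:
                continue  # outlier weight
            kept[(origin, dest)] = w
        corrected[country] = kept
    return corrected


# Hypothetical estimated networks for two countries.
networks = {
    "A": {("snmi", "crlyld"): 0.30, ("crlyld", "undernsh"): -0.20,
          ("wasting", "obesity"): -0.05, ("elecac", "co2pc"): -0.10,
          ("rdex", "rdres"): 0.25, ("govex", "uhc"): 9.00},
    "B": {("snmi", "crlyld"): 0.20, ("rdex", "rdres"): 0.15,
          ("elecac", "co2pc"): -0.12, ("safe", "pm25"): -0.30,
          ("lfpr", "paygap"): 0.10},
}
sdg_of = {"snmi": 2, "crlyld": 2, "undernsh": 2, "wasting": 2, "obesity": 2,
          "elecac": 7, "co2pc": 13, "rdex": 9, "rdres": 9, "govex": 17,
          "uhc": 3, "safe": 16, "pm25": 11, "lfpr": 5, "paygap": 5}
# A subset of Table D.1 (+1) and Table D.2 (-1).
whitelist_sign = {("snmi", "crlyld"): 1, ("crlyld", "undernsh"): 1,
                  ("rdex", "rdres"): 1, ("govex", "uhc"): 1,
                  ("lfpr", "paygap"): 1, ("elecac", "co2pc"): -1}
corrected = correct_edges(networks, sdg_of, whitelist_sign)
```

In this toy example, crlyld→undernsh is dropped for its wrong sign, wasting→obesity for being a negative intra-SDG edge, and the two extreme weights (9.00 and −0.30) fall outside the pooled percentile thresholds.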

D.3 Imputation of missing observations

Like most statistical methods, sparsebn requires a balanced panel, since it cannot produce estimates with missing observations. Thus, we resort to a novel data imputation method created by de Wolff et al. (2021). Traditional imputation methods rely on linear inter- and extrapolations, or on some type of clustering criterion across data from other indicators. The issue with these approaches is that indicators often display non-linear dynamics, so traditional approaches fail to account accurately for non-linear shifts and empirical variance (and parametric approaches like splines may be too rigid). Gaussian processes are currently considered among the most reliable approaches for data imputation because they do not try to fit a particular function to the data; instead, they estimate the moments (mean and variance) of a point-specific predictive distribution. This non-parametric approach can accommodate a wide variety of non-linear empirical behaviors, while providing uncertainty estimates for each imputation.
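The imputation itself relies on the multi-output toolkit of de Wolff et al. (2021), which is not reproduced here. To illustrate the underlying idea of a point-specific predictive mean and variance, the sketch below fits a plain single-output Gaussian process with a squared-exponential kernel and fixed hyperparameters (no optimisation, no neural networks, no reference countries). The function names, the length scale, the noise level, and the example series are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=3.0):
    """Squared-exponential covariance (unit prior variance) between years."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_impute(years, values, noise=1e-2, length_scale=3.0):
    """Fill NaNs with the GP posterior mean and return posterior std devs.

    Hyperparameters are fixed rather than optimised; the data are centred
    and scaled so that a zero-mean, unit-variance prior is reasonable.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    obs = ~np.isnan(values)
    mu = values[obs].mean()
    scale = values[obs].std() or 1.0
    y = (values[obs] - mu) / scale

    K = rbf_kernel(years[obs], years[obs], length_scale) + noise * np.eye(obs.sum())
    K_star = rbf_kernel(years, years[obs], length_scale)
    L = np.linalg.cholesky(K)                        # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_star @ alpha                            # posterior mean (scaled)
    v = np.linalg.solve(L, K_star.T)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None)

    filled = values.copy()
    filled[~obs] = mu + scale * mean[~obs]
    return filled, scale * np.sqrt(var)


# Hypothetical indicator with a gap in 2005-2006 and missing 2016-2020.
years = np.arange(2000, 2021)
series = 50.0 + 0.8 * (years - 2000)
series[[5, 6, 16, 17, 18, 19, 20]] = np.nan
filled, sd = gp_impute(years, series)
```

The returned standard deviations grow as the extrapolation moves away from the last observation, which is the per-point uncertainty estimate mentioned above.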

The method developed by de Wolff et al. (2021) goes one step further and embeds Gaussian processes within a multi-input multi-output framework that uses neural networks. This means that the imputation of the missing observations of indicator i in country k can be improved by providing additional data on the same indicator i from countries similar to k. We exploit this virtue and construct groups with 3 reference countries whose data can be used to impute the missing observations of a given country k. While this strategy may seem similar to the econometric practice of pooling cross-national data, our groups are considerably smaller (only 4 countries: the country of interest k plus the 3 reference ones). In addition, this imputation procedure is mainly used for the estimation of the network, and to assign final values (for 2020) to those countries that lack them.

The reference groups are unique to each country because we define them through a multidimensional criterion. Under this criterion, we construct an index to rank the countries most similar to k, and pick the top 3. For a given country k, the similarity index to another country h takes into account:

• whether both countries share a common border ($\text{border}_{k,h}$);

• whether they belong to the same country group ($\text{group}_{k,h}$);

• their distance, weighted by population centers ($\text{distance}_{k,h}$);

• the total imports of k from h ($\text{imports}_{k,h}$); and

• the total exports from k to h ($\text{exports}_{k,h}$).

To compute the similarity index, we employ trade and geographical data from the Centre d’Etudes Prospectives et d’Informations Internationales (CEPII). Data on imports and exports between every pair of countries are provided by the CEPII BACI Database, covering 2002 to 2018 (Gaulier and Zignago, 2010). The information on geographical proximity weighted by urban population centers is obtained from the CEPII GeoDist Database (de Sousa et al., 2012).

The variable $\text{border}_{k,h}$ is binary and takes the value 1 if there is a shared border, and 0 otherwise. Component $\text{group}_{k,h}$ is also binary and becomes 1 if both countries belong to the same group (i.e., geographical cluster), and 0 otherwise. The term $\text{distance}_{k,h}$ is the geographical distance between k and h, divided by the largest distance between k and any other country, and subtracted from 1. The value of $\text{imports}_{k,h}$ consists of the total imports received by country k from h, divided by the maximum imports received by k from any country. Similarly, $\text{exports}_{k,h}$ consists of the total exports sent by country k to h, divided by the maximum exports sent by k to any country. Finally, the similarity index is expressed as

$$\text{similarity}_{k,h} = \text{border}_{k,h} + \text{group}_{k,h} + \text{distance}_{k,h} + \text{imports}_{k,h} + \text{exports}_{k,h}. \tag{18}$$

We compute the similarity index for every pair of countries. Then, for a given country k, we rank all other countries according to the index. We select the 3 countries most similar to k, and create a mini-pooled dataset that includes these nations and k (i.e., a group of 4 countries in total). Once the pooled dataset of a single indicator has been built, we perform the imputation procedure for country k only. Finally, before proceeding to estimate the networks, we make sure that the imputed extrapolations are bounded by the indicators’ theoretical limits. For this, we develop a variance correction procedure that we explain next.
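Equation (18) and the top-3 selection can be sketched with a few array operations. This is a minimal illustration, assuming the five components are supplied as n × n arrays indexed as [k, h] (imports of k from h, exports from k to h); the function names and the five-country example are hypothetical, and the real inputs would come from the CEPII BACI and GeoDist databases.

```python
import numpy as np


def similarity_matrix(border, group, dist, imports, exports):
    """Equation (18) for every pair (k, h); all inputs are n x n arrays.

    imports[k, h]: imports of k from h; exports[k, h]: exports from k to h.
    """
    border, group, dist, imports, exports = (
        np.asarray(x, dtype=float) for x in (border, group, dist, imports, exports))
    n = dist.shape[0]
    others = ~np.eye(n, dtype=bool)  # exclude h == k from the maxima

    # 1 - distance / largest distance between k and any other country
    max_dist = np.where(others, dist, -np.inf).max(axis=1, keepdims=True)
    distance = 1.0 - dist / max_dist

    def normalise(flows):
        # Divide by k's largest partner flow; rows with no trade stay at 0.
        row_max = np.where(others, flows, 0.0).max(axis=1, keepdims=True)
        return np.divide(flows, row_max, out=np.zeros_like(flows), where=row_max > 0)

    sim = border + group + distance + normalise(imports) + normalise(exports)
    np.fill_diagonal(sim, -np.inf)  # a country is never its own reference
    return sim


def reference_group(sim, k, size=3):
    """Indices of the `size` countries most similar to country k."""
    return [int(h) for h in np.argsort(sim[k])[::-1][:size]]


# Hypothetical five countries placed on a line, 100 distance units apart
# (the last one far away); neighbours share a border, no trade data.
pos = np.array([0, 1, 2, 3, 10]) * 100.0
dist = np.abs(pos[:, None] - pos[None, :])
border = (dist == 100.0).astype(float)
zeros = np.zeros_like(dist)
sim = similarity_matrix(border, zeros, dist, zeros, zeros)
print(reference_group(sim, 0))
```

Since every component lies in [0, 1], the index in Eq. (18) ranges from 0 to 5, and ties between candidate countries would need an explicit tie-breaking rule that the text does not specify.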

D.4 Variance correction

To correct imputed extrapolations that lie beyond the theoretical limits of an indicator, we perform a variance compression procedure that preserves the periodicity of the extrapolations, but re-normalizes the imputed data points in order to bound them to the limits established in the SDR dataset. We also apply this procedure to extrapolations that, even if they remain within the theoretical bounds, have a variance that exceeds the empirical one. By correcting the variance of the extrapolations, we produce data imputations with a volatility that is closer to the empirical one.

To explain the variance compression procedure, let us consider forward extrapolations. Given a time series $I_{2000,t}$ covering the years $\{2000, \ldots, t\}$ and an extrapolation $E_{t+1,2020}$ covering $\{t+1, \ldots, 2020\}$, we want to compress the extrapolation such that $\operatorname{var}(E_{t+1,2020}) \leq \operatorname{var}(I_{2000,t})$. We perform this compression procedurally, by iteratively re-normalizing $E_{t+1,2020}$ by a factor $z < 1$. The compression procedure for forward extrapolations is described in Algorithm 3.

Algorithm 3: Variance compression pseudocode

Input: $I_{2000,t}$ and $E_{t+1,2020}$
while $\operatorname{var}(E_{t+1,2020}) > \operatorname{var}(I_{2000,t})$ or any value in $E_{t+1,2020}$ lies beyond a theoretical limit do
    $E_{t+1,2020} \leftarrow I_{2000,t}(t) + z\,[E_{t+1,2020} - I_{2000,t}(t)]$
end while

Figure D.1 shows an example of the outcome of this procedure. In this case, the extrapolation remains within the theoretical boundaries, but its variance was substantially larger than that of the empirical time series. This variance may have been the result of large changes in the time series of the reference group. With the compression algorithm, we are able to preserve the information on relative fluctuations and trend direction provided by the reference group, while normalizing the imputed data to be consistent with the empirical data in terms of volatility. The same logic and algorithm apply to backward extrapolations.
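Algorithm 3 can be sketched as a short loop. This is a minimal illustration under stated assumptions: the anchor $I_{2000,t}(t)$ is taken as the last observed value, the factor z = 0.9 is an arbitrary choice (the text only requires z < 1), and a `max_iter` guard is added because the loop need not terminate when the anchor itself sits on a theoretical limit. The function name and the example data are hypothetical.

```python
import numpy as np


def compress_extrapolation(history, extrapolation, lower, upper, z=0.9, max_iter=10_000):
    """Algorithm 3 for a forward extrapolation.

    history: observed values I_{2000,t}; extrapolation: E_{t+1,2020};
    lower/upper: the indicator's theoretical limits.
    """
    history = np.asarray(history, dtype=float)
    ext = np.asarray(extrapolation, dtype=float).copy()
    anchor = history[-1]         # I_{2000,t}(t), the last observed value
    target_var = history.var()
    for _ in range(max_iter):    # guard: may not end if the anchor sits on a limit
        out_of_bounds = np.any((ext < lower) | (ext > upper))
        if ext.var() <= target_var and not out_of_bounds:
            break
        # Shrink deviations from the anchor by z < 1; variance scales by z**2.
        ext = anchor + z * (ext - anchor)
    return ext


# Hypothetical indicator bounded in [0, 20] with a volatile extrapolation.
history = [10.0, 11.0, 12.0, 13.0, 14.0]
extrapolation = [14.0, 20.0, 8.0, 25.0, 30.0]
compressed = compress_extrapolation(history, extrapolation, lower=0.0, upper=20.0)
```

Each pass multiplies every deviation from the anchor by the same factor, so the sign and relative size of the fluctuations (the trend direction) are preserved. For a backward extrapolation, one can reverse both arrays, call the function, and reverse the result, so that the anchor becomes the first observation.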

Figure D.1: Example of variance compression

Now that we have corrected the imputations, we proceed to estimate the networks. We take advantage of the reference groups previously constructed in order to pool their data and create longer first-difference series (with 80 observations in total, i.e., 20 first differences for each of the 4 countries in a group). This helps sparsebn produce sparser graphs, which reduces the rate of potential false positive links. Again, even though some data are pooled, the estimated networks are unique because each country has a unique reference group and its own set of indicators.
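The pooling step can be sketched as follows. This is a minimal illustration, assuming each country's imputed data are stored as a (years × indicators) array with the same indicator columns across the group; the country labels and random data are hypothetical. Differencing is done within each country before stacking, so no spurious difference is taken across the boundary between two countries' series.

```python
import numpy as np


def pooled_first_differences(panel, country, reference):
    """Stack within-country first differences for k and its reference group.

    panel: {country_code: array of shape (years, indicators)}, already imputed.
    """
    group = [country, *reference]
    # np.diff along the time axis inside each country, then stack the groups.
    return np.vstack([np.diff(panel[c], axis=0) for c in group])


# Hypothetical panel: 21 years (2000-2020) and 5 indicators per country.
rng = np.random.default_rng(0)
panel = {c: rng.random((21, 5)).cumsum(axis=0) for c in ["K", "R1", "R2", "R3"]}
pooled = pooled_first_differences(panel, "K", ["R1", "R2", "R3"])
print(pooled.shape)
```

With 21 annual observations per country, each contributes 20 first differences, which gives the 80 pooled observations per indicator mentioned above.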

Finally, let us remind the reader that none of these imputation, pooling, and correction procedures is necessary for our method to work. We decided to apply them as part of our empirical strategy, which aims to minimize the number of false positives in the spillover network, and we would also apply them if we were using alternative methods to study SDG gaps. Hence, our method does not depend on this data pre-processing, and it can be adapted to the particular needs of each empiricist.

E Some calibration nuances

E.1 Development goals

The model assumes a given set of development goals for the government agent. This assumption works well for prospective estimations such as the ones done for the SDG gaps, where public documents provide precise values for each goal. However, revealed goals may not exist in the case of historical data. There are two alternative ways to deal with this data limitation. The first is to assume that the final values of the data were the goals that the government wanted to achieve, which is the strategy followed by Guerrero and Castañeda (2020b), and which is justified in the context of the Mexican government’s public discourse about emulating the OECD countries. The second (the one adopted here) is to produce a random allocation profile as the initial condition, disregarding any specific goal vector. In a Monte Carlo setting, this captures the uncertainty (without particular priors) about the goals that the government could possibly have had. Formally, it means that, in equation 14, the initial propensities are randomly determined. Overall, the model is flexible enough to accommodate any type of goals or, more generally, any prioritization heuristic that the user may want to employ for the government agent. Importantly, our particular modeling choice has been guided by the principles of parsimony and by evidence from prior work by political scientists.

E.2 Negative trends

By design, the model simulates indicator dynamics with non-negative growth. There are two reasons for this. First, from the point of view of government expenditure, generating improvements through public spending is an intuitive way to think about the expenditure-indicator linkage. Second, it makes the model more parsimonious and easier to calibrate on a multidimensional space (since fitting dynamics with both positive and negative changes is extremely challenging from a bottom-up modeling perspective). In our application, we assume that, if an empirical indicator has a negative trend, it reflects poor performance of the existing government program. This means that the contribution of public spending to improving the indicator is poor or close to zero.

In order to capture poor performance, we apply a simple data transformation procedure to those indicators showing negative trends. If, in the empirical data, indicator i presents a final value lower than its initial value, then we replace the final value with the largest value in the time series. This transformation presumes that a fall in the final observation may not reflect an ineffective government program, but rather an exogenous event that moved the indicator away from its usual performance. If, on the other hand, no value in the time series is higher than the initial observation, then we set the final value to $I_{0,i} + 10^{-3}$.
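The transformation can be written as a small function returning the final value used for calibration. This is a minimal sketch, assuming the rule is triggered only when the final value is below the initial one (series with a non-negative trend keep their observed final value); the function name and the example series are hypothetical.

```python
import numpy as np


def adjust_negative_trend(series):
    """Return the final value used for calibration (Appendix E.2 rule)."""
    series = np.asarray(series, dtype=float)
    initial, final = series[0], series[-1]
    if final >= initial:
        return final              # no negative trend: keep the observed final value
    if series.max() > initial:
        return series.max()       # the drop is treated as a transient, exogenous event
    return initial + 1e-3         # never above the initial value: minimal growth


print(adjust_negative_trend([1.0, 5.0, 0.5]))  # peak value replaces the final value
```

The small increment of $10^{-3}$ keeps the target strictly above the initial value, so the calibrated dynamics still have non-negative growth while reflecting near-null program performance.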

E.3 Calibration algorithm

The algorithm presented in section 2.3 produces excellent goodness-of-fit results. Here, we discuss some details regarding its precision and scalability, especially in relation to an algorithm previously developed by Guerrero and Castañeda (2020a). The proposed calibration procedure outperforms the one of Guerrero and Castañeda (2020a) not only in precision, but also in speed. Furthermore, the calibration is simultaneous for all parameters, as opposed to the ceteris paribus approach of Guerrero and Castañeda (2020a).

As in Guerrero and Castañeda (2020a), the precision with which the algorithm can reduce the error depends on the number M of Monte Carlo simulations run for each evaluation. This is because the average final values of the simulated indicators become more stable as M grows. Nevertheless, remarkably low error levels can be achieved with, for example, M = 1000, without the need to resort to parallel computing. Panel (a) in Figure E.1 shows how the error decreases exponentially during the first iterations of the algorithm, and then continues to decrease at a slower rate. Running more Monte Carlo simulations allows a lower error to be achieved, but at a computational cost. For this reason, we implement an adaptive algorithm that increases M as the average error falls, so it performs a larger number of simulations only when needed. Panel (b) shows the same decaying dynamics, but at the level of each indicator, for M = 1000. Finally, panel (c) depicts the distribution of indicator-level errors (not in absolute values) across the entire dataset.
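Why precision depends on M can be illustrated with a toy Monte Carlo average: the spread of an average over M independent runs shrinks roughly like $1/\sqrt{M}$. The sketch below uses a normally distributed stand-in for one simulation's final value (not the actual model) and pairs it with a hypothetical step schedule for increasing M as the average error falls. The thresholds, the schedule, and all function names are illustrative assumptions, not the adaptive rule used in the calibration.

```python
import numpy as np


def mc_estimate(simulate, M, rng):
    """Average of a simulated final value over M independent runs."""
    return np.mean([simulate(rng) for _ in range(M)])


def adaptive_M(avg_error, schedule=((1e-1, 10), (1e-2, 100), (0.0, 1000))):
    """Hypothetical schedule: use more runs once the average error is small."""
    for threshold, M in schedule:
        if avg_error >= threshold:
            return M
    return schedule[-1][1]


def toy_run(rng):
    # Stand-in for one simulation's final indicator value (not the model).
    return rng.normal(0.5, 0.2)


rng = np.random.default_rng(42)
spread = {M: np.std([mc_estimate(toy_run, M, rng) for _ in range(100)])
          for M in (10, 100, 1000)}
print(spread)  # spread shrinks roughly like 1 / sqrt(M)
```

Running few simulations while the error is still large, and more only near convergence, is what keeps the overall cost low without sacrificing final precision.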

Figure E.1: Calibration algorithm performance

[Figure: (a) Precision & sample size: average error (log scale) against iterations 1 to 10 for 10, 100, and 1000 samples. (b) Indicators’ errors: indicator error against iterations 1 to 10. (c) Distribution of errors: frequency of indicator-level errors, on a horizontal axis from −0.015 to 0.015.]

Notes: Panel (a) shows the evolution of the average error for different numbers of Monte Carlo simulations using data
from Mexico. Panel (b) shows the evolution of the indicator-specific errors for M = 1000 using data from Mexico.
Panel (c) presents the distribution of all the errors (not in absolute values) calculated across the entire sample after
calibration.
Sources: Authors’ own calculations.

F Goodness of fit

Table F.1 presents the goodness-of-fit calculations for each country. In addition, we provide the standard deviation, minimum, and maximum values of the indicator-specific $\text{GoF}_{\alpha}$ metrics. The reader may consult the complete results in the data files provided at github.com/oguerrer/SDG_feasibility.

Table F.1: Model goodness of fit

Country GoFβ GoFαi stdαi minαi maxαi Country GoFβ GoFαi stdαi minαi maxαi

AFG 0.9989 0.9960 0.003 0.986 0.9999 AGO 0.9991 0.9943 0.004 0.984 1.0000
ALB 0.9998 0.9937 0.006 0.970 0.9998 ARE 0.9985 0.9957 0.003 0.986 0.9997
ARG 0.9990 0.9933 0.005 0.981 0.9999 ARM 0.9995 0.9943 0.004 0.982 0.9997
AUS 0.9997 0.9926 0.005 0.975 1.0000 AUT 0.9992 0.9940 0.005 0.977 0.9999
AZE 0.9997 0.9957 0.003 0.982 0.9998 BDI 0.9992 0.9944 0.004 0.983 0.9999
BEL 0.9993 0.9933 0.005 0.982 0.9998 BEN 0.9997 0.9951 0.004 0.984 0.9999
BFA 1.0000 0.9965 0.003 0.984 1.0000 BGD 0.9992 0.9940 0.006 0.976 1.0000
BGR 0.9992 0.9953 0.003 0.981 0.9993 BHR 0.9995 0.9956 0.004 0.985 0.9997
BIH 0.9998 0.9933 0.005 0.981 0.9995 BLR 0.9992 0.9949 0.005 0.980 1.0000
BOL 1.0000 0.9949 0.004 0.985 0.9999 BRA 0.9993 0.9914 0.005 0.978 0.9996
BWA 0.9976 0.9933 0.005 0.976 0.9995 CAF 0.9974 0.9920 0.007 0.973 0.9999
CAN 0.9997 0.9904 0.009 0.951 0.9999 CHE 0.9991 0.9918 0.006 0.972 0.9999
CHL 0.9998 0.9951 0.004 0.979 0.9999 CHN 1.0000 0.9942 0.004 0.983 0.9998
CIV 0.9988 0.9948 0.004 0.983 0.9998 CMR 0.9976 0.9945 0.004 0.984 1.0000
COG 0.9993 0.9957 0.003 0.985 0.9999 COL 0.9986 0.9949 0.004 0.984 0.9999
CRI 0.9997 0.9937 0.005 0.979 1.0000 CYP 0.9988 0.9544 0.091 0.636 0.9989
CZE 0.9969 0.9933 0.005 0.981 1.0000 DEU 0.9981 0.9896 0.007 0.973 0.9999
DNK 0.9995 0.9929 0.005 0.973 0.9999 DOM 0.9996 0.9952 0.004 0.979 1.0000
DZA 0.9978 0.9939 0.005 0.980 1.0000 ECU 0.9998 0.9946 0.005 0.978 1.0000
EGY 0.9987 0.9939 0.004 0.984 1.0000 ERI 0.9986 0.9934 0.005 0.982 0.9997
ESP 0.9996 0.9942 0.005 0.982 0.9998 EST 0.9996 0.9964 0.003 0.987 1.0000
FIN 0.9990 0.9858 0.026 0.798 0.9999 FRA 0.9989 0.9927 0.006 0.971 1.0000
GAB 0.9993 0.9950 0.004 0.983 0.9999 GBR 0.9988 0.9905 0.009 0.946 0.9998
GEO 0.9990 0.9951 0.004 0.980 1.0000 GHA 0.9985 0.9958 0.004 0.982 0.9999
GIN 0.9987 0.9941 0.004 0.986 0.9996 GMB 0.9987 0.9953 0.004 0.984 1.0000
GNB 0.9989 0.9954 0.003 0.984 0.9998 GRC 0.9996 0.9942 0.004 0.981 0.9998
GTM 0.9978 0.9933 0.005 0.978 0.9998 HND 0.9989 0.9951 0.005 0.979 0.9999
HRV 0.9996 0.9954 0.003 0.986 1.0000 HTI 0.9938 0.9785 0.037 0.807 0.9989
HUN 0.9988 0.9941 0.004 0.980 0.9999 IDN 1.0000 0.9948 0.004 0.982 0.9999
IND 0.9995 0.9960 0.003 0.988 0.9999 IRL 0.9999 0.9948 0.004 0.981 1.0000
IRN 0.9997 0.9943 0.004 0.981 0.9999 IRQ 0.9969 0.9945 0.005 0.978 1.0000
ISR 0.9993 0.9520 0.095 0.524 0.9999 ITA 0.9997 0.9941 0.005 0.975 0.9996
JAM 0.9973 0.9786 0.036 0.869 0.9996 JOR 0.9994 0.9936 0.005 0.980 1.0000
JPN 0.9964 0.9454 0.099 0.499 0.9997 KAZ 0.9997 0.9946 0.004 0.982 0.9999
KEN 0.9975 0.9940 0.004 0.981 1.0000 KGZ 0.9988 0.9939 0.005 0.978 0.9995
KHM 0.9982 0.9958 0.003 0.987 1.0000 KOR 0.9985 0.9330 0.134 0.441 0.9998
KWT 0.9978 0.9920 0.007 0.971 1.0000 LAO 0.9993 0.9956 0.003 0.987 0.9996
LBN 0.9997 0.9935 0.005 0.979 0.9997 LBR 0.9992 0.9931 0.005 0.978 0.9998
LKA 0.9986 0.9935 0.005 0.978 0.9999 LSO 0.9981 0.9935 0.005 0.979 1.0000
LTU 0.9999 0.9955 0.003 0.989 0.9998 LVA 0.9998 0.9955 0.004 0.987 1.0000
MAR 0.9996 0.9963 0.003 0.989 0.9999 MDA 0.9990 0.9941 0.005 0.979 1.0000
MDG 0.9990 0.9946 0.005 0.976 0.9997 MEX 0.9983 0.9932 0.006 0.976 1.0000
MKD 0.9995 0.9931 0.005 0.977 0.9997 MLI 0.9999 0.9964 0.003 0.984 1.0000
MMR 0.9964 0.9930 0.005 0.984 0.9999 MNG 0.9994 0.9954 0.004 0.986 0.9999
MOZ 0.9988 0.9942 0.005 0.982 0.9999 MRT 0.9993 0.9951 0.004 0.977 0.9995
MUS 0.9995 0.9926 0.006 0.970 1.0000 MWI 0.9996 0.9933 0.006 0.972 0.9996
MYS 0.9998 0.9817 0.036 0.785 0.9996 NAM 0.9986 0.9964 0.003 0.987 0.9999
NER 0.9994 0.9957 0.003 0.983 1.0000 NGA 0.9987 0.9946 0.004 0.987 0.9998
NIC 0.9958 0.9912 0.006 0.976 0.9995 NLD 0.9999 0.9797 0.030 0.854 0.9998
NOR 0.9980 0.9916 0.007 0.966 0.9999 NPL 0.9982 0.9951 0.004 0.984 0.9999
NZL 0.9985 0.9901 0.008 0.952 0.9999 OMN 0.9990 0.9948 0.005 0.974 0.9996

PAK 0.9996 0.9930 0.005 0.984 0.9999 PAN 0.9998 0.9946 0.005 0.975 0.9998
PER 0.9987 0.9944 0.005 0.974 0.9997 PHL 0.9997 0.9945 0.004 0.984 0.9996
POL 0.9985 0.9947 0.004 0.978 0.9998 PRT 0.9998 0.9940 0.004 0.982 1.0000
PRY 0.9998 0.9953 0.004 0.986 1.0000 QAT 0.9990 0.9937 0.004 0.983 0.9990
RUS 0.9997 0.9935 0.005 0.981 1.0000 RWA 0.9984 0.9960 0.004 0.983 1.0000
SAU 0.9998 0.9957 0.003 0.986 0.9998 SDN 0.9975 0.9938 0.005 0.978 0.9996
SEN 0.9994 0.9961 0.003 0.985 1.0000 SGP 0.9959 0.9665 0.073 0.624 0.9993
SLE 0.9974 0.9930 0.005 0.979 1.0000 SLV 0.9990 0.9952 0.004 0.985 0.9998
SVK 0.9998 0.9940 0.004 0.981 0.9999 SVN 0.9996 0.9945 0.004 0.984 0.9995
SWE 0.9986 0.9786 0.036 0.794 0.9998 SWZ 0.9996 0.9953 0.003 0.986 0.9998
TCD 0.9988 0.9938 0.004 0.981 0.9995 TGO 0.9998 0.9956 0.003 0.988 0.9999
THA 0.9988 0.9944 0.004 0.984 0.9999 TJK 0.9969 0.9937 0.006 0.975 0.9999
TKM 0.9961 0.9803 0.031 0.801 0.9997 TUN 0.9999 0.9944 0.004 0.980 0.9999
TUR 0.9985 0.9927 0.006 0.971 0.9999 TZA 0.9992 0.9951 0.004 0.980 0.9990
UGA 0.9999 0.9950 0.005 0.980 1.0000 UKR 0.9990 0.9935 0.006 0.975 0.9999
URY 0.9972 0.9933 0.005 0.981 0.9991 USA 0.9975 0.9823 0.026 0.888 1.0000
UZB 0.9990 0.9931 0.005 0.980 0.9999 VEN 0.9975 0.9926 0.006 0.975 0.9999
VNM 0.9995 0.9955 0.004 0.981 0.9999 ZAF 0.9988 0.9943 0.005 0.980 0.9998
ZMB 0.9983 0.9953 0.004 0.981 0.9999 ZWE 0.9985 0.9928 0.006 0.976 0.9997

Sources: Authors’ own calculations.

G Validation

In computational simulation models, validation can be tackled at multiple levels; the work of Carley (1996) is a classic reference on this topic. In the context of the model developed in this paper, Castañeda et al. (2018) and Guerrero and Castañeda (2020a) present several levels of validation that are consistent with Carley’s view. In this appendix, we present a new validation exercise that highlights the importance of the behavioral component of the model in producing governance-related outcomes that are consistent with an independent data source.

As explained in the main text, policymaking agents determine a contribution level $C_i \leq P_i$ every period. This means that the resources $D_i = P_i - C_i$ are diverted for private gain. Our main interpretation for such diversions is corruption, a central challenge of the international public governance agenda (World Bank, 2017; Izquierdo et al., 2018; OECD, 2019). Our validation procedure consists of demonstrating that the model is capable of accurately reproducing international empirical patterns of corruption. Importantly, the calibration of the model does not attempt to optimize the parameters in order to generate such patterns. In fact, it cannot do so, because the model is calibrated for each country individually. Hence, showing that the model’s endogenous variable $D_i$ reproduces the empirical distribution of corruption across countries (from an independent dataset) provides a strong case for external validity. Furthermore, by showing the sensitivity of these results to modifications in the learning model, we also provide evidence of internal validity.

First, let us define the endogenous variable of corruption for a single simulation as

$$D = \frac{1}{B} \sum_{t}^{T} \sum_{i}^{n} \left( P_{i,t} - C_{i,t} \right). \tag{19}$$

For a set of M independent Monte Carlo simulations, the expected level of corruption is

$$\bar{D} = \frac{1}{M} \sum_{m}^{M} \left( 1 - D_m \right), \tag{20}$$

where we have applied the complement operator $1 - D_m$ so that $\bar{D}$ denotes better outcomes through higher values.
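Equations (19) and (20) translate directly into array sums. The sketch below assumes that B is the total budget over all periods (so that D lies in [0, 1]) and stores allocations and contributions as (T, n) arrays; the function names and all numbers are hypothetical. The last lines show the kind of cross-country Pearson correlation reported in Figure G.1, on made-up values.

```python
import numpy as np


def corruption(P, C, B):
    """Equation (19): resources diverted in one simulation, relative to B.

    P, C: arrays of shape (T, n) holding allocations P_{i,t} and contributions C_{i,t}.
    """
    return float(np.sum(np.asarray(P) - np.asarray(C)) / B)


def expected_integrity(D_runs):
    """Equation (20): average of the complements 1 - D_m over M runs."""
    return float(np.mean(1.0 - np.asarray(D_runs, dtype=float)))


# Hypothetical run with T = 2 periods and n = 2 indicators.
P = [[10.0, 10.0], [10.0, 10.0]]
C = [[8.0, 10.0], [9.0, 10.0]]
D = corruption(P, C, B=40.0)         # 3 diverted units out of a budget of 40
D_bar = expected_integrity([D, 0.125])

# Cross-country check in the spirit of Figure G.1 (hypothetical values).
d_bar = np.array([0.50, 0.55, 0.60])
cpi = np.array([0.2, 0.4, 0.6])
rho = np.corrcoef(d_bar, cpi)[0, 1]
```

Because $\bar{D}$ is a complement, a positive correlation with a corruption index on which higher values mean cleaner governance indicates agreement between the model and the data.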

We are interested in testing whether $\bar{D}$ correlates with an empirical indicator of corruption across countries. Notice that the SDR dataset contains Transparency International’s Corruption Perceptions Index for the 140 countries analyzed in the paper. We intentionally left this indicator out of our study, since it is redundant with our endogenous variable of corruption. Thus, since it contains data that were not used to calibrate the model, we can exploit it to test the model’s validity.

Panel (a) in Figure G.1 shows a high Pearson correlation between the empirical indicator of corruption and the one generated by the model ($\bar{D}$) from 10000 Monte Carlo simulations. Note that the model has been calibrated for each country individually, so this cross-sectional match is not the result of any fitting procedure. Next, recall that the agent’s learning process may be influenced by two parameters related to public governance: the quality of monitoring and the quality of the rule of law. The former affects the probability of being caught diverting funds; the latter sets the size of the penalty incurred when caught. Both parameters are taken from the Worldwide Governance Indicators, and are known to be strongly correlated with corruption indices. Therefore, in the remaining panels of Figure G.1, we show that the strong correlation between the empirical and the simulated corruption variables is not trivially driven by the data on public governance.

First, we remove the learning model from the policymaking agents, and replace it with random choices of $C_i$ from a uniform distribution in $[0, P_i]$. Panel (b) shows that the correlation is entirely lost. Next, let us put back the learning model, but not the empirical parameters of monitoring and rule of law. Instead, the probability of being caught and the corresponding penalty are determined every period through a uniform random draw in [0, 1]. Panel (c) shows that a substantial portion of the correlation is recovered through this procedure. This is an intriguing result because it suggests that, even without empirical data on public governance, the model is able to produce a cross-sectional distribution of corruption that resembles the empirical one. We conjecture that this happens because the SDG indicators contain implicit information about the efficiency with which resources are being used. This information is distilled into the model when calibrating its parameters. Once calibrated, the learning model is sensitive to this information through the proficiency component of the policymakers’ benefit function, something that we find remarkable.

Figure G.1: Validation via corruption/inefficiencies

[Figure: model’s corruption output (vertical axis) against the corruption perceptions index (horizontal axis, 0.1 to 0.9) across countries. (a) Full model: ρ = 0.97, p-value = 0.00. (b) No learning: ρ = −0.06, p-value = 0.45. (c) Random governance: ρ = 0.65, p-value = 0.00.]

Notes: Panel (a) is obtained from the model presented in the main text. Panel (b) is the result of removing the learning component of the model and replacing it with random choices of $C_i$. Panel (c) results from keeping the learning component, but replacing the public governance parameters (obtained from empirical data) with random values.
Sources: Corruption Perceptions Index of Transparency International and authors’ own calculations.

H Robustness to the disbursement schedule

In the calibration procedure, we assume that all the indicators reach their final values in T = 50 simulation periods (the disbursement schedule). This implies that the disbursement schedule is mapped to the number of years covered in the dataset. However, this assumption could bias the simulation results. Hence, as a robustness test, we change the number of disbursement periods to 25 and to 100. In Figure H.1, we analyze whether the average gaps estimated for 2030 under these two new schedules differ significantly from those of the benchmark simulation in the main text. The differences in estimated SDG gaps are minimal; hence, our results are not sensitive to the chosen disbursement schedule. The same conclusion holds when looking at average SDG gaps in specific countries. Here, we compare the simulation results presented in Figure 5 of the main text (T = 50) with those in Figures H.2 and H.3 (T = 25 and T = 100, respectively). In absolute terms, these differences are extremely narrow, as indicated by the calculation presented in Figure H.4. Even for the most sensitive indicators, the discrepancies are rather small, as shown by the numbers in the colored squares.
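The comparison behind Figures H.1 and H.4 can be sketched as follows, assuming that the gap estimates of each calibration are stored in a dictionary keyed by (country, indicator), with gaps in percentage of the goal. The key layout, the indicator labels in the usage note, and the sign convention (alternative minus benchmark) are assumptions; the source does not specify them.

```python
from collections import defaultdict


def gap_differences(benchmark, alternative):
    """Alternative minus benchmark gap for every (country, indicator) key."""
    return {key: alternative[key] - benchmark[key] for key in benchmark}


def country_sensitivity(benchmark, alternative):
    """Per country: (mean absolute difference, most sensitive indicator, its |difference|).

    The mean absolute difference corresponds to the bars of Figure H.4;
    the most sensitive indicator corresponds to the squares.
    """
    by_country = defaultdict(list)
    for (country, indicator), diff in gap_differences(benchmark, alternative).items():
        by_country[country].append((abs(diff), indicator))
    summary = {}
    for country, pairs in by_country.items():
        mean_abs = sum(d for d, _ in pairs) / len(pairs)
        worst_diff, worst_indicator = max(pairs)  # largest absolute difference
        summary[country] = (mean_abs, worst_indicator, worst_diff)
    return summary
```

For example, with hypothetical benchmark gaps of 40.0 and 70.0 for two indicators of one country and T = 25 gaps of 40.5 and 68.8, the country's mean absolute difference is 0.85 and the second indicator (absolute difference 1.2) is the most sensitive one. The histogram in Figure H.1 would be built from the raw values returned by `gap_differences`.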

Figure H.1: Distribution of differences in the 2030 SDG gaps under different disbursement schedules

[Histograms. Panel (a): T = 25; panel (b): T = 100. Vertical axis: frequency; horizontal axis: difference in SDG gap (%).]
Notes: The difference is with respect to the benchmark estimations of T = 50.
Sources: Authors’ own calculations.

Figure H.2: Average SDG gaps by country under 25 disbursements

[Bar chart of the average SDG gap of each country, grouped by region (Africa; E. Europe & C. Asia; East & South Asia; LAC; MENA; West). Horizontal axis: development gap estimated for 2030 as a percentage of its goal (0 to 100).]

Notes: The bars denote the average SDG gap (across indicators) for each individual country. The dots correspond
to the 10 indicators with the largest estimated gaps. Each dot is colored according to the corresponding SDG of its
indicator. For precise estimates and confidence intervals of each individual indicator gap, see the data provided in
http://github.com/oguerrer/SDG_feasibility.
Sources: Authors’ own calculations.
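The aggregation described in these notes (a bar for the average gap across indicators and dots for the 10 largest indicator gaps) can be sketched as below. The dictionary layout keyed by (country, indicator) is an assumption for illustration; the actual estimates live in the repository cited in the notes, whose file format is not described here.

```python
import heapq
from collections import defaultdict


def country_profile(gaps, k=10):
    """Per country: (average gap across indicators, k largest indicator gaps).

    `gaps` maps (country, indicator) -> gap in percentage of the goal.
    """
    by_country = defaultdict(dict)
    for (country, indicator), gap in gaps.items():
        by_country[country][indicator] = gap
    profile = {}
    for country, ind_gaps in by_country.items():
        mean_gap = sum(ind_gaps.values()) / len(ind_gaps)  # the bar
        top = heapq.nlargest(k, ind_gaps.items(), key=lambda item: item[1])  # the dots
        profile[country] = (mean_gap, top)
    return profile
```

`heapq.nlargest` avoids sorting every indicator of a country when only the top k are needed, and it returns them in descending order of gap.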

Figure H.3: Average SDG gaps by country under 100 disbursements

[Bar chart of the average SDG gap of each country, grouped by region (Africa; E. Europe & C. Asia; East & South Asia; LAC; MENA; West). Horizontal axis: development gap estimated for 2030 as a percentage of its goal (0 to 100).]

Notes: The bars denote the average SDG gap (across indicators) for each individual country. The dots correspond
to the 10 indicators with the largest estimated gaps. Each dot is colored according to the corresponding SDG of its
indicator. For precise estimates and confidence intervals of each individual indicator gap, see the data provided in
http://github.com/oguerrer/SDG_feasibility.
Sources: Authors’ own calculations.

Figure H.4: Robustness to different disbursement schedules

[Bar chart of the average absolute difference in SDG gap of each country, grouped by region (Africa; E. Europe & C. Asia; East & South Asia; LAC; MENA; West), with squares giving the magnitude for the most sensitive indicator under T = 25 and T = 100. Horizontal axis: average absolute difference in terms of SDG gap (%), 0 to 0.3.]

Notes: The bars indicate the average absolute difference in estimated gaps (in percentage) between the benchmark case and one where the model was calibrated with a different disbursement schedule (a different number of simulation periods T). The dark bars are calculated using T = 25 and the light bars using T = 100. The comparison benchmark corresponds to T = 50. The solid squares on the right of each panel are colored according to the SDG to which the most sensitive indicator belongs when using T = 25; the hollow ones correspond to T = 100. Their magnitudes are indicated inside each square.
Sources: Authors' own calculations.

I Robustness to shorter time series

We perform a second test to show that our results are robust when re-calibrating the model with sub-samples of the data. This is important because one may argue, for example, that investments made during the 2000-10 decade must have produced structural changes, which should be reflected in better performance of the indicators during the 2010-20 decade. Thus, if the model is re-calibrated using 2010-20 data, its structural parameters α_i should induce faster indicator dynamics. These new parameters, in turn, should produce SDG gap predictions that differ substantially from those reported in the main text.

We explore this line of reasoning by using only the most recent 5 or 10 years, instead of the 21 years included in the database. With these shorter time series, we re-estimate the network of interlinkages and re-calibrate the model. Then, for each SDG gap (of each indicator and country), we compute the difference between the original estimate (the one in the main text) and the one obtained from the more recent sub-sample (recall that SDG gaps are expressed as percentages of the goal).
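The data-handling side of this exercise can be sketched as follows, assuming each indicator's series is stored as a mapping from year to value. The sketch only covers truncating a series to its most recent years and binning the resulting gap differences into a histogram like Figure I.1; re-estimating the network and re-calibrating the model are outside its scope, and both helper names are hypothetical.

```python
def recent_window(series, years):
    """Keep only the most recent `years` observations of a yearly series.

    `series` maps year -> value; the result keeps the latest `years` keys.
    """
    kept = sorted(series)[-years:]
    return {year: series[year] for year in kept}


def histogram(values, lo, hi, n_bins):
    """Counts in n_bins equal-width bins over [lo, hi].

    Values outside the range are clipped into the edge bins, and the
    right edge `hi` is counted in the last bin.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) // width)
        counts[min(max(idx, 0), n_bins - 1)] += 1
    return counts
```

For a 21-year series covering 2000 to 2020, `recent_window(series, 10)` keeps 2011 to 2020 and `recent_window(series, 5)` keeps 2016 to 2020.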

Figure I.1: Distribution of differences in the 2030 SDG gaps under shorter time series

[Histograms. Panel (a): ten years; panel (b): five years. Vertical axis: frequency; horizontal axis: difference in SDG gap (%).]
Notes: The difference is with respect to the benchmark estimations of 21 years of data.
Sources: Authors’ own calculations.

Panels (a) and (b) in Figure I.1 show the differences in the SDG gaps between the benchmark from the main text and the estimates obtained from shorter time series. These differences are computed at the level of each indicator of each country. As the histograms, which are highly concentrated around zero, suggest, the two sets of gap estimates show no significant differences, pointing to only modest changes in the long-term structural components of the model during the last two decades. Likewise, when comparing the average gaps at the country level using the full sample (see Figure 5 in the main text) with those obtained from the reduced samples (Figures I.2 and I.3), no noticeable differences emerge. If there had been substantial structural improvements in recent decades, they should be discernible as significantly smaller SDG gaps for the reduced datasets. Since this is not the case, our assumption that α_1, ..., α_N capture long-term structural factors, and our choice of 21 years of data, are justified.
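One way to put numbers on "concentrated around zero" is sketched below: the mean difference, a one-sample t statistic for a zero mean, and the share of differences within a tolerance band. This is an illustrative check, not the procedure the authors report. In particular, the t statistic assumes independent observations, which the interdependent indicators and countries do not strictly satisfy, and the 1-percentage-point default tolerance is an arbitrary assumption.

```python
import math
import statistics


def zero_mean_check(diffs, tol=1.0):
    """Summaries for judging whether gap differences are centered on zero.

    Returns (mean, one-sample t statistic for H0: mean = 0,
    share of differences with |d| <= tol), with tol in percentage points.
    """
    n = len(diffs)
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean
    t_stat = mean / se if se > 0 else float("nan")
    share_close = sum(abs(d) <= tol for d in diffs) / n
    return mean, t_stat, share_close
```

Reporting the share within a tolerance alongside the t statistic guards against the case where a huge number of indicator-country pairs makes a negligible mean shift look statistically significant.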

Figure I.2: Average SDG gaps by country inferred from 10 years of data

[Bar chart of the average SDG gap of each country, grouped by region (Africa; E. Europe & C. Asia; East & South Asia; LAC; MENA; West). Horizontal axis: development gap estimated for 2030 as a percentage of its goal (0 to 100).]

Notes: The bars denote the average SDG gap (across indicators) for each individual country. The dots correspond
to the 10 indicators with the largest estimated gaps. Each dot is colored according to the corresponding SDG of its
indicator. For precise estimates and confidence intervals of each individual indicator gap, see the data provided in
http://github.com/oguerrer/SDG_feasibility.
Sources: Authors’ own calculations.

Figure I.3: Average SDG gaps by country inferred from 5 years of data

[Bar chart of the average SDG gap of each country, grouped by region (Africa; E. Europe & C. Asia; East & South Asia; LAC; MENA; West). Horizontal axis: development gap estimated for 2030 as a percentage of its goal (0 to 100).]

Notes: The bars denote the average SDG gap (across indicators) for each individual country. The dots correspond
to the 10 indicators with the largest estimated gaps. Each dot is colored according to the corresponding SDG of its
indicator. For precise estimates and confidence intervals of each individual indicator gap, see the data provided in
http://github.com/oguerrer/SDG_feasibility.
Sources: Authors’ own calculations.

References

Aragam, B., Gu, J., and Zhou, Q. (2019). Learning Large-Scale Bayesian Networks with the Sparsebn Package. Journal of Statistical Software, 91(11).

Carley, K. (1996). Validating Computational Models. Working Paper. CASOS Program, Pittsburgh, PA.

Castañeda, G., Chávez-Juárez, F., and Guerrero, O. (2018). How Do Governments Determine Policy Priorities? Studying Development Strategies through Networked Spillovers. Journal of Economic Behavior & Organization, 154:335–361.

de Sousa, J., Mayer, T., and Zignago, S. (2012). Market Access in Global and Regional Trade. Regional Science and Urban Economics, 42(6):1037–1052.

de Wolff, T., Cuevas, A., and Tobar, F. (2021). MOGPTK: The Multi-Output Gaussian Process Toolkit. Neurocomputing, 424:49–53.

Dhami, S. (2016). The Foundations of Behavioral Economic Analysis. Oxford University Press, Oxford.

Efron, B. (1981). Censored Data and the Bootstrap. Journal of the American Statistical Association, 76(374):312–319.

Gaulier, G. and Zignago, S. (2010). BACI: International Trade Database at the Product-Level. Technical Report 2010-23, CEPII.

Guerrero, O. and Castañeda, G. (2020a). Policy Priority Inference: A Computational Framework to Analyze the Allocation of Resources for the Sustainable Development Goals. Data & Policy, 2.

Guerrero, O. and Castañeda, G. (2020b). Quantifying the Coherence of Development Policy Priorities. Development Policy Review, 00:1–26.

Guerrero, O. and Castañeda, G. (2021). Does Expenditure in Public Governance Guarantee Less Corruption? Large Non-linearities and Complementarities of the Rule of Law. Economics of Governance, forthcoming.

Izquierdo, A., Pessino, C., and Vuletin, G., editors (2018). Better Spending for Better Lives: How Latin America and the Caribbean Can Do More with Less. Inter-American Development Bank.

OECD (2019). Governance as an SDG Accelerator: Country Experiences and Tools. OECD Publishing.

Ospina-Forero, L., Castañeda Ramos, G., and Guerrero, O. (2020). Estimating Networks of Sustainable Development Goals. Information and Management.

World Bank (2017). World Development Report 2017: Governance and the Law. International Bank for Reconstruction and Development / The World Bank, Washington, D.C.
