
Journal Pre-proof

Embracing multiple perspectives of sustainable development in a composite measure:
The Multilevel Sustainable Development Index

Claudia Lemke, Karola Bastini

PII: S0959-6526(19)33754-0
DOI: https://doi.org/10.1016/j.jclepro.2019.118884
Reference: JCLP 118884

To appear in: Journal of Cleaner Production

Received Date: 22 May 2019
Revised Date: 13 September 2019
Accepted Date: 13 October 2019

Please cite this article as: Lemke C, Bastini K, Embracing multiple perspectives of sustainable
development in a composite measure: The Multilevel Sustainable Development Index, Journal of
Cleaner Production (2019), doi: https://doi.org/10.1016/j.jclepro.2019.118884.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier Ltd.


Embracing multiple perspectives of sustainable
development in a composite measure:
The Multilevel Sustainable Development Index

Claudia Lemke
Affiliation:
Technical University of Berlin
Strasse des 17. Juni 136
10623 Berlin
Germany
c.lemke@tu-berlin.de
Phone: +49 (0)30 314 21660
Fax: +49 (0)30 314 25040

Acknowledgements:
None

JProf. Dr. Karola Bastini (Corresponding author)
Affiliation:
Technical University of Berlin
Strasse des 17. Juni 136
10623 Berlin
Germany
karola.bastini@tu-berlin.de
Phone: +49 (0)30 314 25615
Fax: +49 (0)30 314 25040

Acknowledgements:
None


Abstract

To manage progress toward the Sustainable Development Goals (SDGs),
sustainability indicators and indices are useful analytical tools. Indices proposed in
the literature rarely comprise all three contentual domains of sustainable
development (environment, society, and economy), fail to enable comparisons of
agents at multiple levels, and exhibit methodological shortcomings. Comparability of
micro, meso, and macro agents is crucial because the macro SDGs can only be
achieved if micro and meso agents contribute. Contributions must be measured in a
methodologically sound manner to avoid misguided decision making. To overcome these
deficiencies, we develop the comprehensive and comparable Multilevel Sustainable
Development Index (MLSDI). A novel information-theoretic algorithm is shown to
outperform established multivariate statistical weighting methods such as
principal component analysis (PCA). The application of the MLSDI to 62 industries in
the German economy reveals that the chemical industry requires urgent action with
regard to environmental performance, and that the agricultural industry demands
assistance and action in all three domains.

Keywords: Sustainability; Sustainable Development Goals; composite indicators;
multilevel perspective; principal component analysis; information theory

1 INTRODUCTION
Sustainable development with its three interconnected domains – environmental
protection, social development, and economic prosperity – has become highly
integrated into corporate and political leadership and agenda setting. The
Sustainable Development Goals (SDGs) frame the objectives of the three domains to
be achieved by 2030.¹ To assess and monitor progress toward the SDGs, measuring
their 169 targets and 234 indicators (UN, 2018a, 2018b) is essential for governments
around the world.
Indicator sets are central for sustainability assessment because they cover a wide
range of environmental, social, and economic aspects (Almássy and Pintér, 2018).
Composite measures derived from indicator sets offer several advantages that
enhance their impact on policy and outcomes. An index is a compressed description of a
multidimensional state (Ebert and Welsch, 2004), which reduces the broad range of
topics and complexity (Bell and Morse, 2018). It restores the focus of measurement
(Griggs et al., 2014) and counters the risk that a rich indicator set
causes more confusion than understanding (Wu and Wu, 2012).
Furthermore, with aggregation of single indicators into an index, interactions of
individual sustainable development elements can be explored (Costanza et al., 2016;
Hahn and Figge, 2011). These interactions are prerequisites for the effectiveness of
coordinated actions and for maximizing progress on the SDGs. However, they have
not yet been fully understood and are thus the subject of current research (Weitz et
al., 2018).
Against this background, sustainable development indicators and indices derived
from them should fulfil two conceptual requirements. First, all three domains must be
included (e.g., Hacking and Guthrie, 2008) because the domains are interdependent
and mutually reinforcing, requiring a simultaneous and integrated consideration (e.g.,
Costanza et al., 2016). Second, comparable and reliable assessment of agents of
any aggregational size must be performed, because sustainable development is a
society-level concept (Hahn et al., 2015). The SDGs are macro-level goals, which
can only be achieved if micro and meso agents contribute to them (Griggs et al.,
2014; Sachs, 2012). Research on sustainability transitions has emphasized the roles
of particular agents involved in dynamic processes within and between different
levels (Köhler et al., 2019). The multilevel perspective according to Rotmans et al.
(2001) organizes these perspectives into three levels,² delimiting individuals (micro),
organizations such as corporations (meso), and conglomerates of organizations such
as industries or overall economies (macro). This multilevel perspective is crucial for a
holistic sustainable development assessment, because it characterizes agents that
are responsible for making progress toward the SDGs. To the best of our knowledge,
sustainable development indices that integrate the multilevel perspective are not
represented in the academic literature.

¹ A fourth component – institutions – is disregarded because the major macro and meso reporting
frameworks applied only distinguish between three and not four domains (see Section 3.4).

To address this gap, we develop the Multilevel Sustainable Development Index
(MLSDI), which is comprehensive and can be meaningfully applied to different
aggregational levels. It includes environmental, social, and economic indicators,
supporting an enhanced understanding of the interactions of individual sustainable
development elements. As a result, contributions of micro and meso agents to the
macro SDGs can be measured, and coordinated actions for sustainable development
can be managed. Furthermore, the MLSDI addresses and overcomes several
methodological shortcomings identified in the literature (Böhringer and Jochem,
2007; Mayer, 2008). These include inadequate indicator weighting techniques and
composite measure aggregation as well as the lack of sensitivity analysis. These
methodological shortcomings may result in misdirected decision making.
We explore three statistical weighting methods: A novel information-theoretic
approach – the maximum relevance minimum redundancy backward algorithm
(MRMRB) – is tested against established weighting methods such as principal
component analysis (PCA) and partial triadic analysis (PTA). Mathematical
aggregation rules are obeyed and further sensitivities are tested.
The remainder of this paper is structured as follows. Section 2 reviews the state-of-
the-art sustainable development indices and discusses their scopes and
methodological approaches against assessment principles. Section 3 derives the
methodology of the MLSDI, and Section 4 illustrates its application to a sample of 62
industries in the German economy, several aggregated branches, and the overall
economy. Section 5 summarizes and reflects on the implications and limitations of
the current study.

² In a more recent conceptualization, the multilevel perspective differentiates between landscapes
(macro), regimes (meso), and niches (micro), within which technological change occurs (Geels, 2010,
2002; Loorbach, 2007; Smith et al., 2010). This specification of the multilevel perspective is not
considered here because it refers to technological change.

2 LITERATURE REVIEW
In this section, we review sustainable development indices proposed in the literature.
Generally, only weighted composite measures that encompass the three contentual
domains are capable of quantifying sustainable development in its entirety. Thus,
weighted composite measures constitute the scope of this review; adjusted economic
measures such as the Genuine Progress Indicator (GPI, formerly: Index of
Sustainable Economic Welfare, ISEW) or subjective measures of well-being are
disregarded (Costanza et al., 2014; Lawn, 2003).
The evaluation of the identified weighted composite indicators, i.e., sustainable
development indices, is performed along their calculation steps and is guided by
assessment principles proposed in the literature. The calculation steps are illustrated
in Figure 1.

Figure 1. Calculation steps of a sustainable development index.

Core assessment principles for selecting indicators state that the indicator base
should:

− be comprehensive in covering the three contentual domains (Hacking and
Guthrie, 2008; Pintér et al., 2018; Sala et al., 2015),
− include efficiency and effectiveness measures, referring to relative and
absolute measures, respectively (Figge and Hahn, 2004), and
− be target- and boundary-oriented, indicating distances to goals (Sala et al.,
2015).
Both indicators and indices should enable comparability and benchmarking across
multiple dimensions (Becker et al., 2017; Esty, 2018; Hacking and Guthrie, 2008;
Pintér et al., 2018; Sala et al., 2015), including comparability within and across
aggregational levels.
Core assessment principles for constructing the index state that the index should:

− reflect interconnections of goals and themes (Costanza et al., 2016; Hacking
and Guthrie, 2008; Sala et al., 2015) and indicate relevance (Janoušková et
al., 2018), both reflected in the indicators’ weights,
− be effective in communication (Pintér et al., 2018), and
− be methodologically sound, which is characterized by objectivity, robustness
(Sala et al., 2015), and credibility, i.e., the scientific and technical adequacy of
measurement (Cash et al., 2003; Parris and Kates, 2003) throughout the
calculation steps.
Among the numerous indices proposed in the literature, two meso-level and seven
macro-level indices are potentially suitable for comprehensively measuring contributions
to the SDGs. These indices generally lack multilevel comparability and benchmarking
of micro, meso, and macro agents because their indicator selection focuses on a
single level. No multilevel indices could be identified, emphasizing the contribution of
this work. Moreover, micro-level indices are absent, constituting a research gap
discussed in Section 5. Table 1 reviews the indices and their compliance with core
assessment principles.

| Calculation step / Assessment principle | DJSI | ICSD | FEEM SI | HSDI | MISD | SDGI | SDI | SSI | WI |
|---|---|---|---|---|---|---|---|---|---|
| 1. Collection of key figures (to derive indicators) | 81 indicators | 38 indicators (based on GRI) | 19 indicators | 4 indicators | 31 indices | 77 indicators (based on SDGs) | 12 indicators | 21 indicators | 86 indicators |
| Comprehensiveness | Y | P | Y | N | U | Y | Y | Y | Y |
| Multilevel comparability | N | N | N | N | N | N | N | N | N |
| 2. Preparation of key figures | U | N | N | N | Transformation to normality | N | N | N | N |
| Methodological soundness | U | U | U | U | Y | U | U | U | U |
| 3. Missing value imputation | U | N | N | N | Multiple imputation | P | Mean imputation | Expert judgement | P |
| Methodological soundness | U | N | N | N | Y | N | N | N | N |
| 4. Standardization: calculation of key indicators | U | Y, e.g., per units of production | Y, e.g., per GDP | Y, e.g., per capita | U | Y, e.g., per capita | Y, e.g., per GDP | Y, e.g., per capita | Y, e.g., per GDP |
| Agent comparability | U | N | Y | Y | U | Y | Y | Y | Y |
| Efficiency and effectiveness | Y | Y | Y | Y | U | Y | Y | Y | Y |
| 5. Outlier detection and treatment | U | N | Lower weights on outliers | N | N | Truncate the bottom below the 2.5 percentile | N | Thresholds on skewness and kurtosis; nonlinear transformation | Truncate the top, exceeding the base-best scale by 90% |
| Methodological soundness | U | N | N | N | N | P | N | N | P |
| 6. Scaling | U | Continuous ratio scaling | Discrete rescaling | Continuous rescaling | Continuous rescaling | Continuous rescaling | Continuous z-scores | Continuous rescaling | Continuous rescaling |
| Comparability | U | Y | Y | Y | Y | Y | Y | Y | Y |
| Target orientation | N | Y | Y | Y | U | Y | N | Y | Y |
| Effective in communication | U | N | Y | Y | Y | Y | N | Y | Y |
| Methodological soundness | U | N | N | Y | Y | Y | Y | Y | Y |
| 7. Weighting | U | Analytical hierarchy process | Experts’ elicitation | Equal weighting | FA | Top-down equal weighting | PCA | Top-down equal weighting | Arbitrary weighting |
| Comparability | N (floating/industry-specific weights) | Y (constant weights) | Y (constant weights) | Y (constant weights) | Y (constant weights) | Y (constant weights) | Y (constant weights) | Y (constant weights) | Y (constant weights) |
| Interconnection of goals and relevance | U | Y | Y | N | Y | N | Y | N | P |
| Methodological soundness | U | N (subjective) | N (subjective) | N (not credible) | N (not credible) | N (not credible) | Y | N (not credible) | N (subjective and not credible) |
| 8. Aggregation | U | Arithmetic aggregation | Choquet integral | Geometric aggregation | Geometric aggregation | Arithmetic aggregation | Arithmetic aggregation | Geometric aggregation | Arithmetic aggregation |
| Methodological soundness | U | N (violation of aggregation rules) | N (subjective) | Y | Y | N (violation of aggregation rules) | N (violation of aggregation rules) | Y | N (violation of aggregation rules) |
| 9. Sensitivity analyses | U | N | Weighting | N | N | Outlier and aggregation | N | Weighting | N |
| Methodological soundness | N | N | P | N | N | Y | N | P | N |

Table 1. Meso and macro weighted composite measures and assessment principle compliance of their methodological approaches.
Y: Yes; P: Partially; N: No; U: (cannot be assessed due to) unavailable information; DJSI: Dow Jones Sustainability Indices; FA: factor analysis; FEEM SI: FEEM
Sustainability Index; GDP: gross domestic product; GRI: Global Reporting Initiative; HSDI: Human Sustainable Development Index; ICSD: Composite
Sustainable Development Index; MISD: Mega Index of Sustainable Development; PCA: principal component analysis; SDI: Sustainable Development Index;
SDGI: Sustainable Development Goal Index; SDGs: Sustainable Development Goals; SSI: Sustainable Society Index; WI: Well-being Index.

2.1 Meso-level sustainable development indices
The Dow Jones Sustainability Indices (DJSI) aim to provide investors with
benchmarks of corporate sustainability performance for managing their sustainability
investment portfolios (RobecoSAM, 2018; S&P Dow Jones Indices, 2019, 2018). A
lack of transparent information on the data input and methodology hampers the DJSI’s
evaluation. Based on the available information, the DJSI appear to be
comprehensive and to involve both efficiency and effectiveness indicators, but they are
neither target-oriented nor comparable, given their floating and industry-specific weights.
Sensitivity analyses are not mentioned, such that robustness and credibility are
questionable (Sala et al., 2015). The detailed methodology for data cleaning (i.e.,
treatment of missing values and outliers), scaling, weighting, and aggregation
remains unknown.
The Composite Sustainable Development Index (ICSD) intends to monitor corporate
contributions to sustainable development (Krajnc and Glavič, 2005). Data input is
based on Global Reporting Initiative (GRI) indicators, generally ensuring
comprehensiveness and credibility of the base data. However, the social domain is
not represented sufficiently. For example, aspects concerning equality (SDG 5,
“Gender equality”) are missing. Efficiency and effectiveness are mapped, and targets
are set, but several methodological deficits are present: data are not cleaned,
entailing statistical biases (e.g., Little and Rubin, 2002). Agent comparability within
the meso level is not ensured because indicators are standardized to units of
production and are thus only comparable to agents with equivalent production. Ratio
scaling is deployed but involves mathematical inconsistencies (Pollesch and Dale,
2016). Objectivity is neglected by the subjective weighting factors of decision makers
derived from an analytical hierarchy process (Saaty, 2001). Arithmetic aggregation is
applied on ratio scales, violating Ebert and Welsch’s (2004) rules for meaningful
aggregation. Last, sensitivities are not tested.

2.2 Macro-level sustainable development indices


The FEEM Sustainability Index (FEEM SI) is designed to support political target
setting by projecting future evolutions of macroeconomic contributions to sustainable
development (Carraro et al., 2013; Pinar et al., 2014). It is comprehensive and includes
targets by definition, as well as absolute and relative measures. Comparability within
the macro level is provided by standardizing key figures to, for example, the gross
domestic product (GDP). However, missing values are not imputed; scales are
discrete, leading to information loss (Yang and Webb, 2009; Zhou et al., 2010);
weights and aggregation are based on subjective methods; and sensitivities are
tested for a single calculation step only.
The Human Sustainable Development Index (HSDI, formerly Human Development
Index, HDI) aims to monitor macro-level human development (Bravo, 2018; Togtokh,
2011). It is an aggregate of four indicators and therefore not able to comprehensively
depict sustainable development. On the methodological side, it continuously scales
indicators between zero and one, a credible method that ensures effective
communication with its easy interpretation (Pollesch and Dale, 2016). The geometric
mean is applied, complying with Ebert and Welsch’s (2004) rules for meaningful
aggregation. However, missing values and outliers are not dealt with, and equal
weighting is applied. Equal weights are “universally considered to be wrong”
(Chowdhury and Squire, 2006) because they do not tackle interconnections and
relevance of indicators. Moreover, sensitivities are not investigated.
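
The two steps named above for the HSDI, continuous scaling between zero and one and aggregation by the geometric mean, can be illustrated with a minimal Python sketch. The min-max formula, the equal weights, the function names, and the sample values are illustrative assumptions and not the HSDI's published procedure.

```python
import math
from typing import List


def rescale(values: List[float]) -> List[float]:
    """Continuous min-max rescaling of one indicator to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("rescaling is undefined for a constant indicator")
    return [(v - lo) / (hi - lo) for v in values]


def geometric_mean(scores: List[float]) -> float:
    """Equally weighted geometric mean of non-negative scores."""
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    return math.prod(scores) ** (1.0 / len(scores))


def arithmetic_mean(scores: List[float]) -> float:
    """Equally weighted arithmetic mean, shown for comparison only."""
    return sum(scores) / len(scores)


# Hypothetical values of one indicator for three agents.
scaled = rescale([2.0, 4.0, 6.0])

# Hypothetical rescaled scores of one agent on two indicators.
agent_scores = [0.25, 1.0]
geo = geometric_mean(agent_scores)
ari = arithmetic_mean(agent_scores)
```

In this sketch the geometric mean of the two scores is lower than their arithmetic mean, which shows that a weak score is compensated less by a strong one under geometric aggregation.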
The Mega Index of Sustainable Development (MISD) is a function of 31 known
indices (Shaker, 2018, 2015) and is thus difficult to evaluate. With regard to
methodology, the MISD overcomes other indices’ shortcomings in
terms of missing value imputation and weighting. First, the MISD fills missing values
by multiple imputation, reducing statistical biases; second, it determines weights by
multivariate statistical analysis, which is generally the preferred field of methods
(Mayer, 2008). However, deriving weights by factor analysis (FA) is not suitable for
construction of a sustainable development index. FA is a top-down approach
(Haerdle and Simar, 2012), but bottom-up approaches are required (Mayer, 2008)
because sustainable development only becomes defined when measured (Bell and
Morse, 2008). Furthermore, outliers remain untreated, and sensitivities are not
investigated.
The Sustainable Development Goal Index (SDGI) stands out in being clearly linked to
the SDGs: Its purpose is to assess countries’ baselines for the SDGs (Schmidt-Traub
et al., 2017). It is comprehensive, and targets are included in terms of the SDG
agenda. Generally, the SDGI leaves missing values untreated in order to draw attention to
missing data. Where exceptions are made, cold deck or mean imputation is carried
out, but statistical biases remain with these methods (Rässler et al., 2013). Outliers are
only treated at the bottom, but two-sided treatment is required (Aggarwal, 2017).
Further deficits are the deployment of top-down equal weighting and arithmetic
aggregation. However, robustness is claimed by testing sensitivities for outlier
treatment and aggregation.
The Sustainable Development Index (SDI) aims to quantify the macro-level
sustainable development of countries (Bolcárová and Kološta, 2015). Absolute and
relative indicators are present, sound weighting is executed, applying bottom-up PCA
(Mayer, 2008), and missing values are imputed. However, the chosen mean
imputation leads to invalid inferences (Rässler et al., 2013); targets are not included;
outliers are not treated; z-score scaling is applied, which is difficult to interpret
(Pollesch and Dale, 2016); indicators are aggregated arithmetically; and sensitivities
are not investigated.
The Sustainable Society Index (SSI) also aspires to measure a country’s contribution
to macro-level sustainable development (Saisana and Philippas, 2012; van de Kerk
et al., 2014; van de Kerk and Manuel, 2008). Data are cleaned, but the chosen
methods are not sound: Missing value imputation is based on subjective expert
judgments, outliers are identified with thresholds on nonrobust skewness and
kurtosis (Aggarwal, 2017), and outliers are treated by nonlinear transformation.
Nonlinear transformation is harmful in index calculation because it changes variables’
correlations (Oh and Lee, 1994), which should be assessed in statistical weighting
procedures (Mayer, 2008) to investigate the interconnections and relevance of
indicators. The SSI does not deploy statistical weighting but top-down equal
weighting. Hence, nonlinear transformation is not harmful to the SSI, but equal
weighting is generally insufficient (see above). However, the SSI tests sensitivities of
weights and claims robustness.
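
The statement that a nonlinear transformation changes variables' correlations can be checked with a small Python sketch. The data are made up for illustration, and the square-root transformation is an assumed stand-in for whichever nonlinear transformation an index applies.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up indicator values; y is the square of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]

r_raw = pearson(x, y)                                   # before transformation
r_transformed = pearson(x, [math.sqrt(v) for v in y])   # after square root
```

Because the correlation differs before and after the transformation, any weighting procedure that relies on the correlation structure would yield different weights for the transformed data.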
Last, the Well-being Index (WI) intends to assess macro-level human and
environmental well-being (Prescott-Allen, 2001) and features the following
methodological shortcomings: insufficient data cleaning, arbitrary weighting,
arithmetic aggregation, and lack of sensitivity analyses.
Summarizing, the review yields the following conclusions:
1. If indices encompass the three contentual domains (a prerequisite of this
review), they are generally comprehensive.
2. Efficiencies and effectiveness are mostly mapped.

3. Targets and boundaries are mostly included. However, these are subjective,
corporate or policy targets. Their scientific derivation has emerged only
recently, and further research is needed (e.g., Haffar and Searcy, 2018).
4. Comparability within an aggregational level is generally ensured, but
benchmarking of micro, meso, and macro agents is not. Multilevel indices do
not exist, and the reviewed indices’ scopes and objectives preclude multilevel
applications.
5. Weighting methods are insufficient: Fewer than half of the reviewed indices
investigate the interconnections and relevance of indicators, and only one of
these does so in an objective and credible manner.
6. Aggregation methods are unsatisfactory: Only one third of the reviewed
indices perform objective and credible aggregation.
7. Sensitivity analyses are missing: Only one third of the reviewed indices
investigate sensitivities, and only one of these does so for more than one
calculation step.
8. Data cleaning is deficient: Only one index reduces the statistical bias of
missing values objectively, and none treats outliers credibly.
The following section develops the MLSDI, implementing the multilevel perspective
and overcoming the identified methodological deficiencies.

3 METHODOLOGY
The computation of a meaningful, methodologically sound index is complex and
challenging (Ebert and Welsch, 2004; Fusco, 2015). Thus, thorough methodological
research is carried out and described in this section. The subsections are structured
chronologically according to an index’s calculation steps (see Figure 1).

3.1 Collection of key figures


The first step in calculating the MLSDI is to collect the sustainable development key
figures. These critically determine the comprehensiveness and quality of an index
(Amor-Esteban et al., 2018; Mayer, 2008) and are inferred from the sustainable
development key indicators (see Section 3.4). The key figures must be applicable to
multiple levels (see Section 3.4; Rotmans et al., 2001), and macro-level data for
benchmarking must be available from official statistics, given their comparability and
easy open-access acquisition (Zuo et al., 2017). The set of sustainable development
key figures, also called the set of class-5 indicators, is four-dimensional (Witjes et al.,
2017) and formally represented by:

\[ x^{5}_{i,j,t,r}, \tag{1} \]

where $i \in [1, I]$ represents an economic agent of any aggregational level, $j \in [1, J]$
portrays a key figure of the three contentual domains, $t \in [1, T]$ is the time period,
and $r \in [1, R]$ represents the geographical region. The key figures are collected in
Section 4.1 and listed in the supplementary material. The structure of the data set is
shown in Figure 2.

Figure 2. Structure of the data set.
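
As a minimal sketch of the four-dimensional structure described above (agent, key figure, time period, region), the data set could be held in a mapping keyed by those four indices. The container layout, the function names, and the sample entries (agent "chemicals", key figure "co2_emissions", region "DE", and all values) are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Optional, Tuple

# Index order follows the text: (agent, key figure, period, region).
Key = Tuple[str, str, int, str]


def add_key_figure(data: Dict[Key, float], agent: str, figure: str,
                   period: int, region: str, value: float) -> None:
    """Store one observed key figure under its four indices."""
    data[(agent, figure, period, region)] = value


def time_series(data: Dict[Key, float], agent: str, figure: str,
                region: str, periods: List[int]) -> List[Optional[float]]:
    """Return the series over the given periods; None marks a missing value."""
    return [data.get((agent, figure, t, region)) for t in periods]


# Hypothetical observations with a gap in 2009.
data: Dict[Key, float] = {}
add_key_figure(data, "chemicals", "co2_emissions", 2008, "DE", 10.0)
add_key_figure(data, "chemicals", "co2_emissions", 2010, "DE", 12.0)

series = time_series(data, "chemicals", "co2_emissions", "DE",
                     [2008, 2009, 2010])
```

Keeping missing periods explicit as `None` makes the gaps that the subsequent imputation step has to fill directly visible.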

3.2 Preparation of key figures


The second step in the calculation procedure is the preparation of the key figures,
which is required for credibility of the input data (e.g., Cash et al., 2003). First,
macro-level industry data are converted between statistical classifications, because the data are not uniformly
available.³ Second, meso-level data are transferred to macro-level categories for
multilevel comparability. Meso-level data and not macro-level data are transferred,
because sustainable development is a society-level concept (Hahn et al., 2015), and
thus measurement of contributions to the macro level is targeted (see Section 1).

³ Detailed information on statistical classifications of economic activities can be found in Eurostat
(2008a) and UN (2008).

3.3 Missing value imputation
Third, missing values are imputed: The incomplete data set is turned into a complete
set, because it is assumed that the missing data are meaningful for the model and
would otherwise cause a bias (Little and Rubin, 2002). Therefore, missing value
imputation ensures input data credibility (e.g., Cash et al., 2003). In the case of the
MLSDI, missing values are single points in time or total time series on lower
aggregational levels with data availabilities on higher aggregational levels. For
example, company or industry data are missing, while data for the overall economy
are available. Because of the hierarchical data structure, the uncertainty in the
imputation process is relatively low. This entails two aspects: First, the complete and
incomplete samples are assumed to have identical distributions, i.e., the data are
assumed to be missing at random (MAR) (Rässler et al., 2013; Schafer and Graham,
2002); second, single imputation methods, which – in contrast to multiple imputation
methods – do not account for uncertainty in the imputation process (Little and Rubin,
2002; Rubin, 1978), are adequate. Moreover, the MLSDI’s imputation treats the
data as a time series so that no a priori relationship between key figures is imposed. The temporal
dimension is an efficient predictor given stable trends (see Section 4.2) and the
strong correlation over time (see Section 4.3). Depending on the number of
observations in the time series, the following time series imputation methods are
chosen for completing the set of key figures:

• > 2 observations: Kalman smoothing on a structural time series model fitted by
maximum likelihood (Harvey, 1989; Kalman, 1960) – this method requires a
minimum of three observations and is implemented because of its stable
output, suiting the key figures’ stable trends (see Section 4.2),
• 2 observations: Stineman algorithm (Stineman, 1980) – this method requires
two observations and is also chosen because of its stable output,
• 1 observation: Hot deck imputation by keeping the observation constant (Little
and Rubin, 2002) – because of the strong correlation over time, the only
observation is assumed to be the best predictor,
• 0 observations: Hot deck imputation by adjusted higher aggregational data
(Little and Rubin, 2002) – because lower aggregational economic agents are
included in higher aggregational economic agents, the latter are assumed to
be the best predictors in case of a total missing time series.
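
The selection rule in the list above can be sketched in Python as a dispatch on the number of observed values. The function names and method labels are hypothetical, only the two simplest cases are implemented, and the scaling by a fixed `share` is an assumed reading of "adjusted higher aggregational data"; the Kalman smoother and the Stineman algorithm themselves are not reproduced here.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing value


def choose_method(series: Series) -> str:
    """Pick the imputation method from the number of observed values."""
    n_obs = sum(v is not None for v in series)
    if n_obs > 2:
        return "kalman_smoothing"        # structural time series model
    if n_obs == 2:
        return "stineman_interpolation"
    if n_obs == 1:
        return "hold_constant"           # repeat the single observation
    return "scaled_parent"               # use higher-level data


def hold_constant(series: Series) -> List[float]:
    """1 observation: keep the single observed value constant over time."""
    value = next(v for v in series if v is not None)
    return [value] * len(series)


def scaled_parent(parent: List[float], share: float) -> List[float]:
    """0 observations: scale the higher-level series by an assumed share."""
    return [share * v for v in parent]


# Hypothetical series with a single observation.
example: Series = [None, 4.0, None]
method = choose_method(example)
filled = hold_constant(example)
```

A full implementation would replace the two string labels for the longer series with calls to a state space library and an interpolation routine of the reader's choice.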

Kalman smoothing on a structural time series model assumes normal and stationary
input data as well as independent and identically distributed (i.i.d.) residuals (Greene,
2003; Harvey, 1989). These assumptions are tested with the Shapiro-Wilk,
Kolmogorov-Smirnov, augmented Dickey-Fuller, and Ljung-Box tests. Stationarity
and i.i.d. residuals are confirmed, but normality is rejected. However, Harvey (1989) asserts
that the Kalman filter remains an optimal linear estimator, minimizing the mean square
error, if the normality assumption is violated. Despite the rejected normality assumption,
the deployed single missing value imputation yields reasonable results and is hence adopted. The
remaining three methods are free of assumptions.

3.4 Standardization: calculation of key indicators


Multilevel comparability within and across aggregational levels is required because
sustainable development is a society-level concept (Hahn et al., 2015), and the
macro-level SDGs can only be accomplished with the aid of micro and meso agents
(Griggs et al., 2014; Sachs, 2012). To obtain the required comparability, the
multilevel perspective of micro, meso, and macro agents by Rotmans et al. (2001) is
applied: The multilevel comparable key indicators are computed by standardizing the
key figures. At the macro level, 234 SDG indicators (UN, 2018b, 2018a) are relevant,
as the United Nations (UN) has released the most elaborated concept of sustainable
development (Lock and Seele, 2017). At the meso level, the GRI disclosures are
most pertinent (GRI, 2016) because GRI is the most widely used standard for
corporate reporting on sustainability (KPMG, 2017). Micro-level frameworks could not
be identified, such that embracement of multiple perspectives is currently limited to
the meso-level (corporate) and the macro-level (industry or overall economic)
perspectives (see Section 5). Both frameworks only distinguish between three
contentual domains, disregarding a fourth domain: institutions. We follow this
approach, and only indicators that are meaningful on both meso and macro levels are
admitted to the MLSDI. The intersection of SDG and GRI indicators constitutes the
ideal data set for the MLSDI in the three contentual domains. We determine the
intersection because an alignment is only available at the target level (GRI and
UNGC, 2018). The set of key indicators should reflect both efficiency and
effectiveness (Figge and Hahn, 2004). Intensity indicators reflect efficiencies and are
relative measures of sustainable development influences and their generating activity
in monetary units (Maxime et al., 2006; Spangenberg, 2015). They may positively or
negatively affect sustainable development and the composite measures. Intensity
indicators standardize an economic agent’s sustainable development influence by its
generating activity in terms of gross value added (GVA) or employment; in some
cases, further key figures are utilized as standardization measures (UN, 2018b). Any
standardization must fulfil the assessment principle of comparability. Units of production
are inappropriate standardization measures because comparability is only achieved
among the same units. Intensity indicators’ reciprocals are productivity indicators,
and both efficiency and productivity indicators are referred to as ratio indicators.
Ratio indicators standardized by GVA or employment capture the relationship
between sustainable development and economic growth. In the environmental
domain, for example, a decoupling of environmental degradation and economic
growth is desired: Efficiency indicators should increase (Kallis et al., 2018).
Generally, economic growth is ambiguously related to sustainable development. On
the one hand, economic growth might contribute to sustainable development by, for
example, inducing technological advancement required to mitigate environmental
degradation (Stern, 2015; van den Bergh, 2011), or by lifting people out of poverty
(Holden et al., 2017). On the other hand, economic growth might harm sustainable
development, as it typically entails environmental damage and might reduce social
equality (Atkinson, 2015; Holden et al., 2017; Piketty, 2014). The Environmental
Kuznets Curve (EKC) harmonizes these ambiguous causal relationships of the
environmental and economic domain by assuming an inverted U-shape (e.g., Dinda,
2004). Empirical evidence of the EKC has been critically discussed in the literature
(e.g., das Neves Almeida et al., 2017).
Effectiveness indicators are absolute measures (Figge and Hahn, 2004), which are
generally not comparable across economic agents. Growth rates indicate changes in
absolute values. Hence, they map effectiveness in a comparable manner. Because
sustainable development is a long-term goal (Dragicevic, 2018), growth rates from
the first period $t = 1$ to the last period $t = T$ are computed. That is, the last period is
standardized by the first period and held constant over time, equalizing the
composite indicators’ variable base for temporal comparability. These indicators are
referred to as growth indicators. The result of the standardization is the set of
sustainable development key indicators:

$I^{4} = \{ i_{j,e,t} \}$, (2)

where $j \in \{1, \dots, J\}$ indexes the key indicators, or class-4 indicators, for economic agent $e$ in period $t$; they are of ratio or growth
type. The key indicators are computed in Section 4.2 and presented in the
supplementary material.
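As a rough illustration of the two indicator types, the following sketch computes an intensity indicator, its reciprocal productivity indicator, and a growth indicator for a single branch. The function names and all figures are hypothetical, and the growth rate is assumed here to be the relative change from the first to the last period; the actual computation is documented in the supplementary material.

```python
def intensity(influence, activity):
    """Intensity indicator: influence per unit of generating activity (e.g., GVA)."""
    return influence / activity


def productivity(influence, activity):
    """Productivity indicator: reciprocal of the intensity indicator."""
    return activity / influence


def growth_rate(series):
    """Relative change from the first to the last period (assumed definition)."""
    first, last = series[0], series[-1]
    return (last - first) / first


# Hypothetical figures for one branch (not taken from the sample).
emissions = [120.0, 110.0, 96.0]   # air emissions per period
gva = [400.0, 420.0, 480.0]        # gross value added per period

print(intensity(emissions[-1], gva[-1]))     # 0.2
print(productivity(emissions[-1], gva[-1]))  # 5.0
print(growth_rate(emissions))                # -0.2
```

In this example, the negative growth rate of emissions would count as a favorable development once the indicator's effective direction is taken into account during scaling.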

3.5 Outlier detection and treatment


Outliers are nonnormal observations, and similar to missing values (see Section 3.3),
they cause a bias when untreated (Aggarwal, 2017). In the case of the MLSDI, outlier
treatment is required for unbiased scaling (see Section 3.6) and weighting (see
Section 3.7). In eliminating this bias, credibility is established (e.g., Cash et al.,
2003). The inter-quartile range (IQR) method is applied for two reasons. First, the
IQR method is a univariate method, in line with the univariate (time series) missing
value imputation. Second, it is based on the median, which is robust to outliers. In
contrast, outlier detection based on the mean, skewness, or kurtosis is sensitive to
outliers (Field, 2009; Hadi et al., 2009) and should thus be avoided. The IQR method
detects a key indicator as an outlier if it surpasses or falls below the outlier thresholds
$th_{j}$, which are defined by:

$th^{up}_{j} = Q_{3,j} + c \cdot IQR_{j}$, (3)
$th^{low}_{j} = Q_{1,j} - c \cdot IQR_{j}$,

where $th^{up}$ is the upper threshold, $th^{low}$ represents the lower threshold, $c$ portrays
the outlier coefficient, $Q_{3}$ is the third quartile, $Q_{1}$ depicts the first quartile, and
$IQR$ measures the IQR, which is the difference between the third and first quartile. The
data set’s total time series is incorporated such that outlier thresholds are invariant
over economic agents and time for comparability. However, the outlier thresholds
may vary across geographical regions, as Nilsson et al. (2016) suggest countries
interpret progress in sustainable development according to their national
circumstances. This approach, however, disables country comparison and should be
abandoned if multinational analyses are conducted. The outlier coefficient is
set to 1.5, and outliers are treated by replacement with the thresholds
(Aggarwal, 2017; Han et al., 2012). This method is free of assumptions and
rests on the premise that statistical biases would remain if outliers were treated on only one side of
the distribution or if outlying values were retained with lower weights. After outliers are treated,
data cleaning is complete, and the core index computation steps – scaling, weighting,
and aggregation – are executed.
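The detection and replacement rule described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the data are hypothetical, the values stand for one key indicator pooled over all economic agents and periods, and the quartile convention (`method="inclusive"`) is a choice made here, not one specified in the text.

```python
from statistics import quantiles


def outlier_thresholds(values, coefficient=1.5):
    """Return (lower, upper) thresholds: Q1 - c * IQR and Q3 + c * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - coefficient * iqr, q3 + coefficient * iqr


def treat_outliers(values, coefficient=1.5):
    """Replace values beyond the thresholds with the thresholds themselves."""
    lower, upper = outlier_thresholds(values, coefficient)
    return [min(max(v, lower), upper) for v in values]


# Hypothetical pooled observations of one key indicator.
pooled = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
lower, upper = outlier_thresholds(pooled)
treated = treat_outliers(pooled)

print(lower, upper)  # -3.0 13.0
print(treated[-1])   # 13.0
```

Because the thresholds are computed once on the pooled series, the same bounds apply to every economic agent and period, which mirrors the invariance requirement stated above.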

3.6 Scaling
The key indicators feature different units, and a scaling procedure is required to unify
the key indicators’ scales (Pollesch and Dale, 2016). Scaling enables cross-indicator
comparison and is thus a prerequisite for credible weighting and aggregation (Ebert
and Welsch, 2004). Scales further serve the assessment principles of target
orientation and effective communication: Targets should be a defining element, and
scales should be easy to interpret for effective communication. Similar to the outlier
thresholds, scales are kept constant over economic agents and time to allow cross-
agent comparability and progress analysis (Nardo et al., 2008), but they may vary
over geographical regions (Nilsson et al., 2016).
In addition to outlier-free data, the subsequent multivariate analysis (see Section 3.7)
requires z-scores (mean equal to zero and variance equal to one). Otherwise,
variables with higher variances would dominate (Jolliffe, 2002). The set of z-score
scaled key indicators is formally described by:

$I^{4z} = \{ i_{z,e,t} \}$, (4)

where $z \in \{1, \dots, Z\}$ represents the z-score scaled key indicators. However, because
z-scores indicate distances to the mean in units of standard deviations, they are neither
easy to interpret nor effective in communication (Pollesch and Dale, 2016).
Furthermore, given their definition in both negative and positive domains, they are
not suitable for the chosen aggregation method (see Section 3.8). Therefore, z-
scores are only used for the multivariate analysis, and continuous rescaling between
10 and 100 is applied for aggregation (Bravo, 2014; Krajnc and Glavič, 2005;
Saisana and Philippas, 2012). Continuous rescaling overcomes both shortcomings of
z-score scaling, minimizes information loss (Yang and Webb, 2009; Zhou et al.,
2010), and does not lead to mathematical inconsistencies as ratio scaling, for
example, does (Pollesch and Dale, 2016). The set of rescaled indicators is described
by:

$I^{4r} = \{ i_{r,e,t} \}$, (5)

where $r \in \{1, \dots, R\}$ describes the rescaled key indicators. Their scores are interpreted as
follows (Prescott-Allen, 2001):

• 10 to 20: bad performance,
• 21 to 40: poor performance,
• 41 to 60: medium performance,
• 61 to 80: fair performance,
• 81 to 100: good performance.
Rescaling’s disadvantage of being sensitive to outliers has been eliminated via the
previous detection and treatment of outliers (see Section 3.5). Both scaling
procedures are linear, and negative effective directions of class-4 indicators are
accounted for, such that the scaled data sets positively affect the composite
measures (Bravo, 2014).
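Both scaling procedures can be sketched in a few lines. The sketch below is illustrative only: it assumes population standard deviations for the z-scores, uses the observed minimum and maximum as rescaling anchors, reverses indicators whose effective direction is negative, and treats the band limits as inclusive upper bounds. None of these details is specified in the text, and all names and values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Linear scaling to mean zero and (population) variance one."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def rescale(values, higher_is_better=True, lower=10.0, upper=100.0):
    """Continuous linear rescaling to [lower, upper], reversed if lower values are better."""
    lo, hi = min(values), max(values)
    unit = [(v - lo) / (hi - lo) for v in values]
    if not higher_is_better:
        unit = [1.0 - u for u in unit]
    return [lower + (upper - lower) * u for u in unit]


def performance_band(score):
    """Map a rescaled score to the performance bands listed above."""
    for limit, label in [(20, "bad"), (40, "poor"), (60, "medium"), (80, "fair")]:
        if score <= limit:
            return label
    return "good"


# Hypothetical outlier-treated values of an indicator where lower is better.
raw = [2.0, 3.0, 7.0, 8.0]
z = z_scores(raw)
scores = rescale(raw, higher_is_better=False)
bands = [performance_band(s) for s in scores]

print([round(s) for s in scores])  # [100, 85, 25, 10]
print(bands)                       # ['good', 'good', 'poor', 'bad']
```

The lower bound of 10 rather than zero keeps every rescaled score strictly positive, which matters for the geometric aggregation in Section 3.8.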

3.7 Weighting
Weighting of the key indicators is required because correlations exist and double
counting of similar information must be avoided (Bolcárová and Kološta, 2015; Greco
et al., 2019; Nilsson et al., 2016). By means of a weighting procedure, correlations
are investigated and interconnections of goals and themes (Costanza et al., 2016;
Hacking and Guthrie, 2008; Sala et al., 2015) as well as the relevance of indicators
are explored (Janoušková et al., 2018): Weighting factors may enhance or degrade a
key indicator’s influence towards the composite measures (Greco et al., 2019). For
the weighting process, differentiation of the contentual domains is not made, as
correlations and interconnections occur across domains. However, to account for the
unbalanced number of indicators within the three domains, weights are adjusted to
sum up to one in each domain after the integrated assessment. Therefore, weights
are importance factors within a domain (i.e., subindices; see Section 3.8) and are
subsequently adjusted to importance factors towards the overall MLSDI. Consistent
with outlier thresholds and scales, weights are invariant over economic agents and
time (Nardo et al., 2008), but may vary over geographical regions (Nilsson et al.,
2016).
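The adjustment of weights to sum to one within each domain can be sketched as a simple renormalization. The raw weights and domain labels below are hypothetical; the sketch only assumes that each indicator carries one weight from the integrated assessment and belongs to exactly one domain.

```python
def normalize_within_domains(raw_weights, domains):
    """Rescale weights so that they sum to one inside each domain."""
    totals = {}
    for weight, domain in zip(raw_weights, domains):
        totals[domain] = totals.get(domain, 0.0) + weight
    return [w / totals[d] for w, d in zip(raw_weights, domains)]


# Hypothetical raw weights from an integrated (cross-domain) assessment.
domains = ["env", "env", "soc", "soc", "soc", "eco"]
raw = [0.10, 0.30, 0.05, 0.15, 0.20, 0.20]
adjusted = normalize_within_domains(raw, domains)

print([round(w, 3) for w in adjusted])  # [0.25, 0.75, 0.125, 0.375, 0.5, 1.0]
```

Relative importance within a domain is preserved, while a domain with many indicators no longer dominates one with few.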
Weighting is highly controversial. Within the environmental domain, for example, it
can only be conducted properly if the natural scientific relationship is known (Ebert
and Welsch, 2004). The concept of planetary boundaries addresses this scientific
relationship (Rockström et al., 2009; Steffen et al., 2015), but weights have not yet
been established to the best of our knowledge. Research on disaggregation and
connection of planetary boundaries to the country or corporate level has emerged
only recently (e.g., Haffar and Searcy, 2018). Therefore, equal weights may be
applied, or they may be derived by expert opinion or statistical methods. Weights
based on expert opinions are subjective, leading to excessively high sensitivities
(Giannetti et al., 2009) and violating scientific requirements (Sala et al., 2015). Equal
weighting or top-down equal weighting is not sufficient (Rogge, 2012) and is
“universally considered to be wrong” (Chowdhury and Squire, 2006) because it does
not account for correlations of individual elements (Griggs et al., 2014). Statistical
weighting is the preferred approach because it is the least-biased (Mayer, 2008) and
least-subjective approach (Greco et al., 2019; Zhou et al., 2007), fulfilling the
assessment principles of credibility (e.g., Cash et al., 2003) and objectivity (Sala et al.,
2015).
A variety of statistical methods is reviewed, resulting in the implementation of the
PCA, PTA, and the information-theoretic MRMRB. In the field of sustainable
development assessment, data envelopment analysis (DEA), benefit of the doubt
(BoD), FA, and PCA are frequently conducted. However, DEA, BoD, and FA are not
suitable for sustainable development index construction. For arguments regarding
FA, see Section 2.2. DEA is generally not suitable in weighting sustainable
development elements because it is a technique for measuring efficiencies of
decision-making units. Weights maximize the composite indicator, relative to other
decision-making units (Ramanathan, 2003). However, sustainable development
index construction is not an optimization problem, but it aims to quantify
unsupervised sustainable development performances (Bell and Morse, 2008;
Böhringer and Jochem, 2007). DEA overemphasizes well-performing elements, such
that economic agents may appear as brilliant performers while they are not (Rogge,
2012). BoD is a particular case of DEA and hence also not appropriate. However,
both methods may be valuable tools in further contexts (e.g., Fusco, 2015; Lee and
Farzipoor Saen, 2012; Tziogkidis et al., 2018). PCA, in contrast, fits the frame of
sustainable development index construction: First, it is a bottom-up method and the
underlying key indicators drive the composite measure (Mayer, 2008); second, it
detects linear correlations, eliminating the multiplicity of included information. It is the
first method to be deployed and tested for the MLSDI. In particular, PCA transforms
the data set into uncorrelated principal components (PCs), where the first PC
possesses the highest variation in the data set (Hotelling, 1933; Jolliffe, 2002;
Pearson, 1901). By setting up a system of linear equations subject to restrictions on
the variance, factor loadings are determined, which then serve to compute the
indicators’ weights. PCs with eigenvalues greater than one, or those necessary to reach a
cumulative contribution to the explained overall variance of more than 70
percent, are included (Field, 2009; Kaiser, 1960). The PCs correspond to categories
or class-3 indicators, and their set is formally represented by:

$I^{3} = \{ i_{p,e,t} \}$, (6)

where $p \in \{1, \dots, P\}$ is a PC. In classical PCA, the PCs are computed by arithmetic
aggregation, but these are disregarded in the MLSDI: The PCA only derives the
weights, and a different aggregation method is applied (see Section 3.8). PCA does
not make distributional assumptions, but the data must be free of outliers, z-score
scaled, and linearly correlated. The first two requirements are addressed in Section
3.5 and Section 3.6. The latter requirement is tested with the Kaiser-Meyer-Olkin
(KMO) and Bartlett’s tests for sampling adequacy and sphericity. The thresholds for
efficient results are 0.5 and 0.05, respectively (Bartlett, 1950; Kaiser, 1974). Because
the PCA is a static technique, it is performed for each time period, and the results are
averaged arithmetically. This shortcoming of incorrect assessment of the temporal
dimension is overcome by the PTA. It interprets three-way tables as a sequence of
two-way tables and accounts for the temporal dimension by including temporal
importance factors (Kroonenberg, 1983; Thioulouse et al., 2004). In doing so, a so-
called compromise matrix is calculated, on which PCA is performed (Gallego-Álvarez
et al., 2015). Assumptions, requirements, and tests are equivalent to PCA. Both PCA
and PTA, hereafter called the PC family, are limited to linear correlations. To detect
higher-order correlations, the information-theoretic MRMRB is deployed. Information-
theoretic approaches are bottom-up methods (Mayer, 2008) and generally known for
their efficiency and effectiveness (Meyer et al., 2008; Peng et al., 2005; Yu and Liu,
2004). Methods based on entropy and mutual information are preferred over the
Fisher information because they are nonparametric (Cover and Thomas, 1991). The
MRMRB is such an algorithm and is deployed because simulation studies have
shown that it outperforms alternative entropy-based algorithms (Bourdakou et al.,
2016; Meyer et al., 2010). Nonlinear correlations are captured by measuring their
mutual information, which is defined as the “amount of information a random variable
contains about another [random variable]” (Cover and Thomas, 1991). The MRMRB
ranks variables according to the difference in their mutual information and the
average mutual information, starting with the lowest amount of mutual information
(Meyer et al., 2007; Peng et al., 2005). It does not rely on assumptions. Since the
MRMRB captures higher-order correlations, it is expected to yield superior results
compared to the PC family. Moreover, within these three approaches, sensitivities of
weights are expected to be low, and restrictions of weights as frequently applied in
DEA or expert elicitations are expected to be dispensable (e.g., Podinovski, 2016).
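The difference between linear correlation and mutual information can be illustrated with a small example. The sketch below is not the MRMRB; it only contrasts the Pearson correlation with a plug-in estimate of mutual information for discrete values, using hypothetical data in which one variable is a purely nonlinear function of the other. Continuous indicators would additionally require binning or a dedicated estimator.

```python
from collections import Counter
from math import log2


def pearson(xs, ys):
    """Pearson correlation coefficient (linear association only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) for discrete observations."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Hypothetical data: y depends on x, but not linearly.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]

print(pearson(x, y))                       # 0.0
print(round(mutual_information(x, y), 2))  # 1.52
```

The linear measure reports no association, whereas the mutual information is clearly positive, which is the kind of higher-order dependence that the PC family cannot capture.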

3.8 Aggregation
The rescaled and weighted indicators are aggregated into three subindices for each
contentual domain and subsequently, these are aggregated into the overall MLSDI.
Aggregation is the major step in index construction (Zhou et al., 2010) because the
aggregation function moderates the degree of compensability of the indicators
(Grabisch et al., 2009). In a sustainable development index, the aggregation function
should be compensatory – high input components can be offset by low input
components and vice versa (Pollesch and Dale, 2015) – because multiple pathways
to sustainable development allow for compensability and weak sustainability (Leach
et al., 2013). Nonetheless, the aggregation function should minimize compensability
because most ecological economists promote strong sustainability (Costanza and
Daly, 1992; Daly, 1990; Dragicevic, 2018; Neumayer, 2010). Geometric (product)
aggregation exactly maps these requirements and is therefore applied in the MLSDI.
Geometric aggregation is a compensatory aggregation function (Pollesch and Dale,
2015), but in contrast to arithmetic (additive) aggregation, compensability is reduced
to partial compensability only. Balanced performances yield better aggregated scores
than unbalanced performances, because geometric aggregation punishes bad
performances and rewards good performances (Saisana and Philippas, 2012; Zhou
et al., 2006). The lower an indicator’s score, the lower the rate of compensability is. If only one
indicator equals zero, the composite measure vanishes. To avoid this
noncompensatory case, the geometric aggregation is combined with rescaled key
indicators between 10 – instead of zero – and 100 (Saisana and Philippas, 2012).
Moreover, Zhou et al. (2006) detect that the weighted product performs best with
respect to minimum loss of information, and geometric aggregation is in line with the
aggregation rules established by Ebert and Welsch (2004).
The subindices for each domain, i.e., the rescaled, weighted, and geometrically
aggregated key indicators are called class-2 indicators, and their set is denoted by:

$I^{2} = \{ i_{d,e,t} \}$, (7)

where $d \in \{1, \dots, D\}$ represents a contentual domain’s subindex. Thereafter, the
subindices are geometrically aggregated with equal weights (as correlations are
accounted for at the class-2 level) to obtain the overall MLSDI or the class-1
indicator:

$I^{1} = \{ i_{e,t} \}$. (8)
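The effect of geometric versus arithmetic aggregation can be illustrated with a short sketch. The scores, weights, and function names are hypothetical; the sketch assumes the weighted product form of geometric aggregation, with weights summing to one, and equal weights for the three subindices.

```python
from math import prod


def geometric(scores, weights):
    """Weighted geometric (product) aggregation."""
    return prod(s ** w for s, w in zip(scores, weights))


def arithmetic(scores, weights):
    """Weighted arithmetic (additive) aggregation."""
    return sum(s * w for s, w in zip(scores, weights))


weights = [0.5, 0.5]
balanced, unbalanced = [55.0, 55.0], [100.0, 10.0]

# Arithmetic aggregation cannot tell the two profiles apart.
print(arithmetic(balanced, weights), arithmetic(unbalanced, weights))  # 55.0 55.0
# Geometric aggregation penalizes the unbalanced profile.
print(round(geometric(balanced, weights), 2))    # 55.0
print(round(geometric(unbalanced, weights), 2))  # 31.62

# Hypothetical subindices aggregated with equal weights into an overall index.
subindices = [80.0, 50.0, 20.0]
mlsdi = geometric(subindices, [1 / 3] * 3)
print(round(mlsdi, 2))  # 43.09
```

With scores bounded below by 10, a single weak indicator lowers the composite measure markedly but cannot force it to zero.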

3.9 Sensitivity analyses


Sensitivity analyses must be performed to test for methodological biases (Sala et al.,
2015) and increase robustness and overall credibility. The MLSDI’s sensitivities are
tested for five calculation steps. Single missing value imputation is tested against the
expectation-maximization bootstrapping algorithm Amelia II (Honaker et al., 2018,
2011). Amelia II yields less robust results because it is not robust to violations of
the normality assumption. The outlier coefficient is varied between 1.0 and 3.0
(Aggarwal, 2017; Han et al., 2012), and a nontreatment case is explored: Outlier
treatment is required, and robust results are reached for any coefficient. The common
value of 1.5 is therefore retained. Scaling by the distance-to-reference method, i.e., dividing by
the mean and median, leads to a loss of interpretability, strengthening the application
of rescaling. Arithmetic aggregation generates less reasonable results than
geometric aggregation due to its high compensability. The results of the three
weighting methods are evaluated against each other in Section 4.3.

4 EMPIRICAL FINDINGS
In this section, computation results of the MLSDI are analyzed. The sample is
described in Section 4.1. Section 4.2 investigates the key indicators’ descriptive
statistics and results of the selected branches. Section 4.3 explores the results of the
three presented weighting methods, and last, Section 4.4 analyzes the composite
indicators’ descriptive statistics and results of the selected branches.

4.1 The sample


The geographical region of the sample is Germany with 62 divisions on the two-digit
level of the statistical classification of economic activities in the European
Community (NACE)4, representing the economic agents $e$, also called industries or
branches. Additionally, after determining scales and weights, several aggregated
sectors are computed. As a particular characteristic, the health economy, a cross-
sectional industry whose statistical classification is defined and published by the
economic research institute WifOR and the Federal Ministry of Economic Affairs and
Energy (BMWi) (Gerlach et al., 2018), is included in the sample. The data set does
not comprise corporate data yet, but corporations are strongly encouraged to
benchmark their performances with various macro industries. The results for the
following selected branches are displayed: the health economy, agricultural sector,
manufacturing sector, chemical industry, automobile industry, service sector, IT
industry, financial industry, real estate industry, and the overall German economy.
Information on the selected branches’ NACE codes is included in the
supplementary material.
The MLSDI’s collected key figures are inferred from the key indicators as the
intersection of SDG and GRI indicators (see Section 3.4). Data collection is bounded
by restrictions of official statistics retrieved from the Federal Bureau of Statistics
(Destatis), Eurostat, and the Federal Employment Agency (BA). The sample’s key
figures and key indicators are listed in the supplementary material.
The time periods range from 2008 to 2016, the most recent year for major
macroeconomic statistics available at the time.

4.2 Empirical findings of the rescaled sustainable development key indicators
The rescaled indicators positively affect the composite measures such that high-
performance scores are always desirable. The distributional properties of the
indicators are relatively stable over time, with mostly increasing trends. For example,
decoupling of environmental degradation and economic activity is observed for the
German sample. The observed decoupling results from both efficiency and
effectiveness gains, which is the most desirable case. Central performances of environmental
indicators are mostly fair to good. The median mostly exceeds the mean, resulting in
high negative skewness.5 This is a positive distributional property in terms of
sustainable development: Relatively many economic agents are clustered at the top
range, yielding fair to good performances. Social performances mostly center around
medium to fair scores with moderate positive skewness (undesirable). Economic
indicators exhibit the lowest performance levels among the domains. Central
measures are poor to medium, and the distributions are mostly positively skewed.
Descriptive statistics of the rescaled indicators are included in the supplementary
material.

4
Classification for economic activities instead of products is used because comparative analysis with
corporations is aimed at, and companies are typically active in more than one production area. The 62
included divisions range from NACE codes A to S on two-digit level. Further information can be found
in Eurostat (2008a).

The rescaled key indicators of the selected branches are summarized into a radar
chart for each contentual domain and displayed in Figure 3.
The radar chart of the environmental ratio indicators (see Figure 3a) reveals that
service sector industries (i.e., the IT, financial, and real estate industry) are clustered
at the outskirts of the graph and are environmentally efficient but yield low scores for
environmental tax intensity. The agricultural sector reports diverse and the chemical
industry bad performances. The health economy, which is a cross-sectional industry
comprising elements of both the manufacturing and the service sector, is clustered
along with the service sector’s selected branches. Its stakes in the manufacturing
sector are not concentrated in environmentally polluting industries.
The best displayed performers among the environmental growth indicators6 are the
financial industry in reduction of air emissions and primary energy consumption as
well as the automobile industry regarding reduction of water use and waste water
(see Figure 3b). The IT industry scores best among the selected branches in
reducing hazardous waste. However, its further outcomes are sparse. The chemical
industry’s reduction rates are among the lowest and only medium with respect to
reductions in air emissions and hazardous waste.

5
Negative skewness does not necessarily occur if the median exceeds the mean (von Hippel,
2005).
6
Growth indicators on taxes are not computed because these would not indicate effectiveness of the
taxation system but merely an increase in the tax base. Evaluation of a taxation system’s
effectiveness is complex and typically investigated with computable general equilibrium models (e.g.,
Bergman, 2005).

[Figure 3: radar charts with a radial axis from 0 to 100 for the selected branches (health economy, agricultural sector, manufacturing sector, chemical industry, automobile industry, service sector, IT industry, financial industry, real estate industry, overall German economy). Panels: (a) Environmental ratio indicators; (b) Environmental growth indicators; (c) Social ratio indicators; (d) Social and economic growth indicators; (e) Economic ratio indicators.]

Figure 3. Rescaled ratio and growth indicators in the environmental, social, and economic domain for
selected branches in the German economy in 2016.
CIT: corporate income tax; GVA: gross value added; p.c.: per capita; p.h.: per hour; R&D: research
and development; VAT: value added tax.

Social ratio indicators of the selected branches are diverse and distributed across the
whole scale (see Figure 3c). A segmentation of service and manufacturing sectors is
not observed. The financial industry stands out positively with good performances in
several indicators. It is followed by the automobile industry, which leads with regard
to average compensations of employees and quota of severely-disabled employees.
However, the automobile industry exhibits bad to poor performances in the VAT
intensity and quota of gender equality. Even more diverse are the real estate
industry’s performances: It scores well in two indicators but badly in several other
indicators. The IT industry is a balanced mid-ranging industry and is neither among
the best nor worst performers. The agricultural industry performs poorly and only
scores fairly with the share of apprentices.
Social growth indicators are relatively homogeneous among the selected branches: A
branch’s performance in one growth indicator is similar to its performance in another
growth indicator (see Figure 3d). The IT industry scores best among the selected
branches and is only overtaken regarding the reduction of female marginally-
employed employees by the financial industry. However, the financial industry mainly
achieves poor to medium performances in the social growth indicators. The other
industries’ scores are intermediate.
In accordance with the summary statistics, the economic performances of the
selected branches are located at the interior of the radar chart (see Figure 3e).
Performances are diverse: Being a good performer in one economic ratio indicator
does not imply good performance in other indicators. The real estate industry,
especially, experiences unbalanced performances. It performs best in several
indicators, but worst in others. The chemical and automobile industries yield similar
but less extreme results. The IT industry is rather balanced, with mostly medium to
fair performances.
The economic domain’s only growth indicator – growth of working population – is
reported along with the social domain and is ordinary.7

7
Growth indicators are generally not included in the economic domain because economic growth is
only required to eliminate poverty (Holden et al., 2017), and Germany is one of the seven major
economies of the world (UN, 2019). However, growth of employment is included because employment
is not only a source of income but key to any successful transition (Harangozo et al., 2018).

4.3 Analysis of the key indicators’ weights and importance factors
The PC family is tested with the KMO and Bartlett’s tests. In both cases (PCA and
PTA) the KMO measure reaches 0.84, classifying the results of the analysis as
meritorious (Kaiser, 1974). Bartlett’s tests obtain $p < .000$.8 The data are suitable for
both methods. The first 11 PCs (PCA) and the first 13 PCs (PTA) are included in determining weights.
The PTA’s weighting factors for the years of calculation are listed in
Table 2. These range from 11.03% to 11.16%, not significantly deviating from equal
yearly weights (11.11%). In conclusion, strong correlation over time is present and
the implicit equal weighting of time periods by the PCA and the MRMRB is
legitimized.

Year          2008    2009    2010    2011    2012    2013    2014    2015    2016
Weights PTA   0.1103  0.1106  0.1112  0.1115  0.1116  0.1116  0.1116  0.1110  0.1105
Table 2. Weights of the years of observation of the partial triadic analysis (PTA).
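The closeness of the PTA's yearly weights to equal weighting can be checked with simple arithmetic on the values in Table 2. The variable names below are illustrative only.

```python
# Yearly PTA weights as reported in Table 2.
pta_year_weights = {
    2008: 0.1103, 2009: 0.1106, 2010: 0.1112, 2011: 0.1115, 2012: 0.1116,
    2013: 0.1116, 2014: 0.1116, 2015: 0.1110, 2016: 0.1105,
}

equal_weight = 1 / len(pta_year_weights)  # nine years of observation
max_deviation = max(abs(w - equal_weight) for w in pta_year_weights.values())

print(round(equal_weight * 100, 2))   # 11.11
print(round(max_deviation * 100, 2))  # 0.08
```

The largest deviation from the equal yearly weight is below one tenth of a percentage point.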

Table 3 to Table 5 compare the indicators’ weights derived by the three weighting
methods (see Section 3.7). The PC family’s weights remain close to equal weights,9
empirically justifying the absence of weight restrictions in PC applications (see
Section 3.7). However, weights derived by the MRMRB feature higher variations.
Within the environmental domain, efficiency indicators are mostly weighted more
heavily than the corresponding effectiveness indicators (see Table 3). The climate
change topic with indicators for air emissions and energy consumption receives the
highest weights. The MRMRB weights energy efficiency (13.07%) more heavily than
air emissions efficiency (10.36%), because energy consumption is a source of air
pollution and the original problem should be managed. The PC family is not capable
of capturing this background information and assigns similar weights to both
indicators.

Environmental key indicators              Weights PCA  Weights PTA  Weights MRMRB
Air emissions efficiency                  0.0990       0.0965       0.1036
Energy efficiency                         0.0998       0.0964       0.1307
Water efficiency                          0.0917       0.0884       0.1114
Waste water efficiency                    0.0888       0.0850       0.1115
Hazardous waste efficiency                0.0885       0.0863       0.0904
Environmental tax intensity               0.0809       0.0782       0.0863
Reduction of air emissions                0.0942       0.0940       0.0692
Reduction of primary energy consumption   0.0956       0.0962       0.0700
Reduction of water use                    0.0909       0.0947       0.0805
Reduction of waste water                  0.0865       0.0946       0.0760
Reduction of hazardous waste              0.0842       0.0895       0.0705
Table 3. Weights of the environmental key indicators, derived by the principal component analysis
(PCA), partial triadic analysis (PTA), and maximum relevance minimum redundancy backward
algorithm (MRMRB).

8
For the PCA, both test results are arithmetically averaged over the years of calculation.
9
Because of the uneven number of key indicators across the three domains, equal weights vary:
9.09% in the environmental domain, 5.00% in the social domain, and 7.69% in the economic domain.
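The statement that the PC family stays close to equal weights while the MRMRB varies more can be retraced from the environmental weights in Table 3. The sketch below simply compares the largest absolute deviation from the equal weight of 1/11 for each method; the choice of this dispersion measure is an assumption made for illustration.

```python
# Environmental weights as reported in Table 3 (same indicator order).
pca = [0.0990, 0.0998, 0.0917, 0.0888, 0.0885, 0.0809,
       0.0942, 0.0956, 0.0909, 0.0865, 0.0842]
pta = [0.0965, 0.0964, 0.0884, 0.0850, 0.0863, 0.0782,
       0.0940, 0.0962, 0.0947, 0.0946, 0.0895]
mrmrb = [0.1036, 0.1307, 0.1114, 0.1115, 0.0904, 0.0863,
         0.0692, 0.0700, 0.0805, 0.0760, 0.0705]

equal_weight = 1 / len(pca)  # 11 environmental key indicators


def max_deviation(weights):
    """Largest absolute distance of any weight from the equal weight."""
    return max(abs(w - equal_weight) for w in weights)


print(round(max_deviation(pca), 4))    # 0.01
print(round(max_deviation(pta), 4))    # 0.0127
print(round(max_deviation(mrmrb), 4))  # 0.0398
```

The MRMRB's largest deviation stems from energy efficiency, in line with the discussion of the climate change indicators above.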

The social scaled growth indicators receive higher weights than their ratio indicator
counterparts (see Table 4). The growth of socially-insured employees (MRMRB:
7.85%) and the growth of employees (MRMRB: 7.38%) are most important in the
social domain. This finding is reasonable because employment possesses a dual
purpose: It is a source of income and key to any transition (Harangozo et al., 2018).
A further interesting finding rests in the indicators for compensations of employees.
The MRMRB assigns a lower weight to the average compensation of employees per
capita (p.c.; 5.36%) because compared to the indicator per hour (p.h.; 6.89%), it is
less precise given the mixture of full-time and part-time employees (see
supplementary material). Moreover, the MRMRB downgrades the labor share
(4.02%), which is the compensation of employees per GVA. What is of interest is not the proportion of GVA
distributed but the monetary value received in relation to the work
done. The PCA does not yield the same finding. Furthermore, the MRMRB
recognizes that the quota of gender equality of marginally-employed employees
(3.51%) is informationally richer in terms of social development than the quota of
gender equality (3.17%), while the PC family does not recognize this.

Social key indicators                                      Weights PCA   Weights PTA   Weights MRMRB
Average compensation of employees p.c.                     0.0532        0.0515        0.0536
Average compensation of employees p.h.                     0.0543        0.0524        0.0689
Labor share                                                0.0537        0.0512        0.0402
Share of non-marginally-employed employees                 0.0472        0.0512        0.0446
Quota of gender equality                                   0.0514        0.0536        0.0317
Quota of gender equality of marginally-employed employees  0.0398        0.0385        0.0351
Quota of severely-disabled employees                       0.0494        0.0476        0.0353
Share of apprentices                                       0.0358        0.0516        0.0333
VAT intensity                                              0.0418        0.0423        0.0434
Intensity of net taxes on products                         0.0447        0.0429        0.0334
CIT intensity                                              0.0532        0.0451        0.0507
Local business tax intensity                               0.0531        0.0494        0.0558
Growth of compensation of employees                        0.0531        0.0515        0.0656
Growth of employees                                        0.0552        0.0538        0.0738
Growth of socially-insured employees                       0.0559        0.0559        0.0785
Reduction of marginally-employed employees                 0.0492        0.0514        0.0568
Growth of female socially-insured employees                0.0542        0.0526        0.0583
Reduction of female marginally-employed employees          0.0528        0.0523        0.0516
Growth of severely-disabled employees                      0.0491        0.0527        0.0470
Growth of apprentices                                      0.0531        0.0526        0.0425
Table 4. Weights of the social key indicators, derived by the principal component analysis (PCA),
partial triadic analysis (PTA), and maximum relevance minimum redundancy backward algorithm
(MRMRB).
CIT: corporate income tax; p.c.: per capita; p.h.: per hour; VAT: value added tax.
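
The reasoning behind the lower weight of the per capita compensation indicator can be illustrated with a small numerical sketch. The two branches, the annual hours, and the hourly pay below are hypothetical and are not taken from the paper's data; the function names are likewise assumptions made for illustration.

```python
# Illustrative sketch only: branches and figures are hypothetical, not from the paper's data.

FULL_TIME_HOURS = 1600   # assumed annual hours of a full-time employee
PART_TIME_HOURS = 800    # assumed annual hours of a part-time employee
HOURLY_PAY = 30.0        # assumed identical hourly compensation in both branches


def branch_totals(full_time, part_time):
    """Return (employees, hours worked, total compensation) for one branch."""
    employees = full_time + part_time
    hours = full_time * FULL_TIME_HOURS + part_time * PART_TIME_HOURS
    compensation = hours * HOURLY_PAY
    return employees, hours, compensation


def compensation_per_capita(full_time, part_time):
    """Average compensation per employee (p.c.)."""
    employees, _, compensation = branch_totals(full_time, part_time)
    return compensation / employees


def compensation_per_hour(full_time, part_time):
    """Average compensation per hour worked (p.h.)."""
    _, hours, compensation = branch_totals(full_time, part_time)
    return compensation / hours


# Branch A employs only full-time staff; branch B employs half of its staff part-time.
for name, (ft, pt) in {"A": (100, 0), "B": (50, 50)}.items():
    print(name, compensation_per_capita(ft, pt), compensation_per_hour(ft, pt))
```

Both hypothetical branches pay the same hourly rate, yet the per capita figure of branch B is lower solely because of its part-time share. This is the imprecision that the per hour indicator avoids.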

Among the economic indicators on capital, gross capital productivity receives the
highest weight from all three weighting methods (see Table 5). This finding is justified
by the fact that the gross capital productivity contains the most information: It
includes the current value of assets as well as the depreciated value in relation to
generated GVA (see supplementary material). As with the average compensation
of employees, the MRMRB weights the labor productivities in an economically
reasonable way, placing the per hour measure above the per capita measure, while
the PC family neglects this aspect. Further examples may
be found to demonstrate the MRMRB’s superiority.

Economic key indicators         Weights PCA   Weights PTA   Weights MRMRB
Gross capital productivity      0.0862        0.0844        0.0940
Net capital productivity        0.0858        0.0843        0.0891
Degree of modernity             0.0692        0.0686        0.0518
Consumed capital productivity   0.0824        0.0825        0.0797
Investment intensity            0.0784        0.0763        0.0815
Internal R&D intensity          0.0840        0.0815        0.0907
Share of R&D employees          0.0820        0.0788        0.0961
GVA rate                        0.0650        0.0650        0.0492
Labor productivity p.c.         0.0852        0.0821        0.0646
Labor productivity p.h.         0.0785        0.0808        0.0729
Net import intensity            0.0596        0.0740        0.0842
Share of imported input         0.0661        0.0652        0.0578
Working population growth       0.0776        0.0764        0.0885
Table 5. Weights of the economic key indicators, derived by the principal component analysis (PCA),
partial triadic analysis (PTA), and maximum relevance minimum redundancy backward algorithm
(MRMRB).
GVA: gross value added; p.c.: per capita; p.h.: per hour; R&D: research and development.

Figure 4 portrays the importance factors according to the MRMRB in decreasing
order. Because the MRMRB’s importance factors are gradual, weight restrictions are
also not required in this case. According to the MRMRB, the growth of
socially-insured employees, growth of employees, and energy efficiency are most important.
This order is reasonable because employment serves a dual purpose, and climate
change is one of the main topics of sustainable development. The PC family yields
a different order.
In conclusion, the theoretical superiority of the MRMRB is supported by empirical
evidence: Among the tested methods, the MRMRB is the only one capable of
detecting informationally richer indicators. Higher-order correlations are present,
which the PC family ignores. This results in almost equal weighting that neglects
the interconnections of indicators. In contrast, the MRMRB reflects these
interconnections and generates gradually varied weights. By assigning higher
weights to informationally richer indicators, the MRMRB incentivizes decision makers
to focus actions on these indicators. Synergies are emphasized, and improvements
will also be reached in other indicators, given their correlations. Improvements in
overall sustainable development performance, as measured by the overall MLSDI,
will thereby be accelerated. Given the MRMRB’s theoretical and empirical dominance,
the calculation proceeds with its results.
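
The contrast between almost equal and gradually varied weights can be made concrete with the economic weights reported in Table 5. The following sketch is a simple descriptive check rather than part of the paper's methodology: it computes the range and the population standard deviation of each method's weights. The choice of these two dispersion measures and all variable names are assumptions made for illustration.

```python
import statistics

# Economic key indicator weights copied from Table 5 (same indicator order as the table).
weights = {
    "PCA": [0.0862, 0.0858, 0.0692, 0.0824, 0.0784, 0.0840, 0.0820,
            0.0650, 0.0852, 0.0785, 0.0596, 0.0661, 0.0776],
    "PTA": [0.0844, 0.0843, 0.0686, 0.0825, 0.0763, 0.0815, 0.0788,
            0.0650, 0.0821, 0.0808, 0.0740, 0.0652, 0.0764],
    "MRMRB": [0.0940, 0.0891, 0.0518, 0.0797, 0.0815, 0.0907, 0.0961,
              0.0492, 0.0646, 0.0729, 0.0842, 0.0578, 0.0885],
}


def dispersion(values):
    """Return (range, population standard deviation) of a weight vector."""
    return max(values) - min(values), statistics.pstdev(values)


# Benchmark: the weight each of the 13 indicators would receive under equal weighting.
equal_weight = 1 / len(weights["PCA"])
spread = {method: dispersion(values) for method, values in weights.items()}

for method, (value_range, std_dev) in spread.items():
    print(f"{method}: range={value_range:.4f}, std={std_dev:.4f}, "
          f"equal weight={equal_weight:.4f}")
```

For the Table 5 values, the MRMRB weights show the largest range and standard deviation of the three methods, in line with the gradual variation described above.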

Figure 4. Ranking of importance factors by the principal component analysis (PCA), partial triadic
analysis (PTA), and maximum relevance minimum redundancy backward algorithm (MRMRB).
CIT: corporate income tax; GVA: gross value added; p.c.: per capita; p.h.: per hour; R&D: research
and development; VAT: value added tax.

4.4 Empirical findings of the composite indicators
Figure 5 provides an overview of the composite indicators’ results from the total
sample. Descriptive statistics are included in the supplementary material. The
environmental subindex features the highest spread and yields medium to fair central
performances. Its evolution over time is insignificant, but its distributional shape is
in favor of environmental protection, as it is negatively skewed. Therefore, actions
should be directed towards the bottom of the distribution to lift the weakest
performances to at least fair.
The social subindex’s spread and central performances are smaller. More effort is
required to yield fair performances. Minima of the social subindex are highest among
the four composite measures; hence, not the bottom but the center of the distribution
should be the focus for improved social development. The economic subindex performs
worst. Its central scores are rated as poor performances, minima deteriorate over
time, and the distribution is positively skewed. Major improvements are required
across the whole distribution.

[Figure 5 plot. x-axis: Years (2008 to 2016); y-axis: Rescaled performance scores from 10 to 100; series: Environmental subindex, Social subindex, Economic subindex, Overall MLSDI.]

Figure 5. The environmental, social, and economic subindices as well as the overall Multilevel
Sustainable Development Index (MLSDI) of the German economy from 2008 to 2016.
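
The reading of the distributional shapes above can be illustrated with a short sketch of the moment-based skewness coefficient. The two score samples are hypothetical and serve only to show that a negative sign indicates a tail of low performers and a positive sign a tail of high performers; the paper's own descriptive statistics are those in the supplementary material.

```python
def skewness(values):
    """Moment-based (population) skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5


# Hypothetical performance scores on the 10 to 100 scale.
low_tail = [20, 60, 65, 70, 70, 75]    # most scores fairly high, one low performer
high_tail = [30, 35, 35, 40, 45, 85]   # most scores low, one high performer

print(round(skewness(low_tail), 2))    # negative: the tail points to the bottom
print(round(skewness(high_tail), 2))   # positive: the tail points to the top
```

In the negatively skewed sample, lifting the single low performer changes the picture most, whereas the positively skewed sample calls for improvements across nearly all observations.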

Figure 6 illustrates the evolution of the composite indicators for selected branches in
the German economy from 2008 to 2016. Results are generally stable over time, and
slightly increasing trends are observed in the social and economic domains.

[Figure 6 plots. Panels: (a) Environmental subindex, (b) Social subindex, (c) Economic subindex, (d) Overall MLSDI; x-axis: Years (2008 to 2016); y-axis: Rescaled performance scores from 10 to 100; series: Health economy, Agricultural sector, Manufacturing sector, Chemical industry, Automobile industry, Service sector, IT industry, Financial industry, Real estate industry, Overall German economy.]

Figure 6. The environmental, social, and economic subindices as well as the overall Multilevel
Sustainable Development Index (MLSDI) for selected branches in the German economy from 2008 to
2016.

The environmental domain’s subindex for the selected branches is displayed in
Figure 6a. The financial industry ranks first, given its fair performances in both
environmental ratio and growth indicators. It is followed by the automobile industry, which
owes its fair scores to environmental effectiveness. The agricultural industry performs
poorly, and the chemical industry exhibits the worst environmental performance. The
latter obtains values above the absolute minimum of ten only because it pays
environmental taxes.

In the social domain, the properties of the geometric aggregation
become apparent. The IT industry exhibits constant medium to fair performances and
is the leader in the social domain (see Figure 6b). The financial and automobile
industries feature unbalanced performances, with several fair and bad results.
Because of the geometric aggregation, bad performances are not easily offset, such
that the financial and automobile industries are surpassed by the IT industry. The real
estate industry is also adversely affected by the geometric aggregation, and its highly
diverse performances result in a poor social subindex. Only the agricultural industry
scores lower.
As in the social subindex, the IT industry ranks first in the economic subindex
due to its regular fair to good performances (see Figure 6c). Since the real estate
industry stands out with both good and bad performances, its geometrically
aggregated score is among the lowest. Again, the agricultural sector performs worst.
Last, Figure 6d portrays the overall MLSDI for the selected branches. Due to its
constant medium to good performances, the IT industry is ranked most sustainable.
The automobile industry ranks second. The automobile and chemical industries
exhibit similar performances in the social and economic domains, but the
environmental domain separates them: The chemical industry does not
recover from its poor environmental performances because the geometric
aggregation restricts compensability between the domains. The chemical industry
obtains an overall MLSDI similar to that of the agricultural industry, which exhibits
bad to poor performances in all domains. The agricultural sector’s importance for
sustainable development is highlighted in the SDGs (UN, 2018b), and urgent action
is required.
Weak sustainability with minimized compensability is implemented by the geometric
aggregation, and the empirical results demonstrate its success. At both aggregation
stages (class-4 into class-2 and class-2 into class-1 indicators), balanced
performances benefit from geometric aggregation, while bad performances cannot be
easily compensated.
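
The effect of geometric aggregation on compensability can be illustrated with a minimal sketch. It assumes the standard weighted geometric mean with weights summing to one and compares it with a weighted arithmetic mean. The equal weights and the two score profiles are hypothetical, whereas the paper's own calculation proceeds with the MRMRB weights.

```python
import math


def arithmetic_aggregate(scores, weights):
    """Weighted arithmetic mean: low scores can be fully offset by high ones."""
    return sum(w * s for s, w in zip(scores, weights))


def geometric_aggregate(scores, weights):
    """Weighted geometric mean: low scores pull the aggregate down disproportionately."""
    return math.prod(s ** w for s, w in zip(scores, weights))


equal_weights = [1 / 3, 1 / 3, 1 / 3]   # hypothetical; not the MRMRB weights
balanced = [55.0, 55.0, 55.0]           # hypothetical scores on the 10 to 100 scale
unbalanced = [100.0, 55.0, 10.0]        # same arithmetic mean, one bad performance

for label, scores in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(label,
          round(arithmetic_aggregate(scores, equal_weights), 2),
          round(geometric_aggregate(scores, equal_weights), 2))
```

Both hypothetical profiles share an arithmetic mean of 55, but the geometric aggregate of the unbalanced profile is roughly 38, so the single bad score is only partly offset by the good one.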

5 DISCUSSION AND CONCLUSION


This paper develops a novel sustainable development index: the MLSDI. A literature
review shows that existing indices frequently fail to incorporate the three domains of
sustainable development and the multilevel perspective, which embraces the
aggregational levels of micro, meso, and macro agents. However, comprehensive
multilevel applications are essential, because the macro level SDGs comprise
environmental, social, and economic aspects and can only be reached with the
contributions of micro and meso agents. Moreover, previous indices lack data
cleaning, sound weighting, aggregation, and sensitivity analysis, and may therefore
be misleading regarding decision making. The MLSDI fills these gaps by involving a
reasonable number of indicators within each domain that constitute the intersection
of the leading meso (GRI) and macro (SDG) frameworks. Data are cleaned, three
statistical weighting methods are implemented, and sensitivities of five calculation
steps are tested for enhanced understanding of the effects that alternative methods
induce. The MLSDI emerges as a robust and innovative sustainable development
index for accurate decision making.
The introduced methodology is applied to a sample of 62 industries in the German
economy as well as several aggregated branches, including the cross-sectional
health economy, from 2008 to 2016. The application confirms the superiority of
weight derivation by the information-theoretic MRMRB. Informationally richer
indicators are weighted more heavily, while the PCA and PTA yield unreasonable
results. The highest weights are assigned to climate change efficiency indicators
(energy efficiency) and employment effectiveness indicators. The application of
geometric aggregation achieves the desired effect of weak sustainability with
minimized compensability: Bad performances are punished and cannot be easily
compensated. As a result, industries with unbalanced performances lag behind
industries with more balanced results.
The comparative analysis of the selected branches demonstrates their contributions
to sustainable development. The IT industry contributes most, while improvements in
the chemical industry’s environmental performance and the agricultural industry’s
performance with respect to all domains are required. The agricultural industry’s
importance for sustainable development is highlighted in the SDGs and thus, actions
and aid are urgently needed. Generally, the environmental domain yields the highest
central outcomes, while the economic domain yields the lowest results.
Compared to single-level indices, the MLSDI features a wider scope and may serve
management decisions, national industry policy, and international affairs. Single-level
indices only address one level. For example, the DJSI supports corporate decision
making and the SSI international policy making by comparing country performances.
Moreover, compared to indices of single domains (e.g., Environmental Performance
Index, EPI; Esty and Emerson, 2018) or indices with a limited number of indicators
(e.g., HSDI), the MLSDI assists decision making with regard to a broader range of
essential sustainable development topics.
Several limitations remain and may be investigated in future research. First and
foremost, consideration of multiple levels sacrifices detailed analysis within one level.
For instance, topics such as economic proximity are only reflected in the
performance scores, but benefits economic agents may experience through proximity
cannot be analyzed in detail. Despite successful punishment of bad performances by
the geometric aggregation, the MLSDI is not capable of indicating urgency. This
judgment remains with decision makers and is hence subjective. The current sample
is limited to meso- and macro-level applications because micro-level frameworks and
macro-level boundaries are not available. For a complete micro-to-macro connection,
micro frameworks must be developed, and macro boundaries must be downscaled to
lower aggregational levels. Initial research on breakdowns exists, but more is
required (Antonini and Larrinaga, 2017; Dahlmann et al., 2019; Haffar and Searcy,
2018; Li et al., 2019; O’Neill et al., 2018; Whiteman et al., 2013). Moreover, to
demonstrate the MLSDI’s capability of implementing the multilevel perspective and
highlighting the benchmarking opportunities across aggregational levels, an empirical
application to meso agents (i.e., corporations) should be performed in future research.
Data sources are attached in the supplementary material to facilitate future
applications. The inclusion of more indicators in the MLSDI is desirable to cover all
multilevel aspects of the SDGs, but further data are missing for the German sample.
The current selection of indicators focuses on developed countries such as Germany.
However, the SDGs are, in contrast to their preceding Millennium Development
Goals (MDGs), universally applicable to all countries (Glaser, 2012; Sachs, 2012),
inviting multinational applications and country comparisons. In such applications,
outlier thresholds, scales, and weights must be homogeneous. To evaluate the
usefulness of national vs. multinational calculations, the MLSDI’s sample should be
enlarged to explore both scopes.
In conclusion, the usefulness of the approach for informed managerial and policy
decisions is expected to be high from both the theoretical and methodological
viewpoint but remains subject to further empirical investigation on all levels – micro,
meso, and macro.

REFERENCES
Aggarwal, C.C., 2017. Outlier analysis, 2nd ed. Springer, Cham.
Almássy, D., Pintér, L., 2018. Environmental governance indicators and indices in support of policy-
making, in: Bell, S., Morse, S. (Eds.), Routledge Handbook of Sustainability Indicators.
Routledge, Abingdon, pp. 204–223.
Amor-Esteban, V., Galindo-Villardón, M.-P., García-Sánchez, I.-M., 2018. Useful information for
stakeholder engagement: A multivariate proposal of an Industrial Corporate Social Responsibility
Practices Index. Sustain. Dev. 26, 620–637. https://doi.org/10.1002/sd.1732
Antonini, C., Larrinaga, C., 2017. Planetary boundaries and sustainability indicators: A survey of
corporate reporting boundaries. Sustain. Dev. 25, 123–137. https://doi.org/10.1002/sd.1667
Atkinson, A.B., 2015. Inequality: What can be done? Harvard University Press, Cambridge.
Bartlett, M.S., 1950. Tests of significance in factor analysis. Br. J. Stat. Psychol. 3, 77–85.
https://doi.org/10.1111/j.2044-8317.1950.tb00285.x
Becker, W., Saisana, M., Paruolo, P., Vandecasteele, I., 2017. Weights and importance in composite
indicators: Closing the gap. Ecol. Indic. 80, 12–22. https://doi.org/10.1016/j.ecolind.2017.03.056
Bell, S., Morse, S., 2018. What next?, in: Bell, S., Morse, S. (Eds.), Routledge Handbook of
Sustainability Indicators. Routledge, Abingdon, pp. 543–555.
Bell, S., Morse, S., 2008. Sustainability indicators: Measuring the immeasurable?, 2nd ed.
Earthscan, London.
Bergman, L., 2005. CGE modeling of environmental policy and resource management, in: Handbook
of Environmental Economics. Elsevier, Amsterdam, pp. 1273–1306.
Böhringer, C., Jochem, P.E.P., 2007. Measuring the immeasurable - A survey of sustainability indices.
Ecol. Econ. 63, 1–8. https://doi.org/10.1016/j.ecolecon.2007.03.008
Bolcárová, P., Kološta, S., 2015. Assessment of sustainable development in the EU 27 using
aggregated SD index. Ecol. Indic. 48, 699–705. https://doi.org/10.1016/j.ecolind.2014.09.001
Bourdakou, M.M., Athanasiadis, E.I., Spyrou, G.M., 2016. Discovering gene re-ranking efficiency and
conserved gene-gene relationships derived from gene co-expression network analysis on breast
cancer data. Nat. Sci. Reports 6, 20518. https://doi.org/10.1038/srep20518
Bravo, G., 2018. Human Sustainable Development Index, in: Bell, S., Morse, S. (Eds.), Routledge
Handbook of Sustainability Indicators. Routledge, Abingdon, pp. 284–293.
Bravo, G., 2014. The Human Sustainable Development Index: New calculations and a first critical
analysis. Ecol. Indic. 37, 145–150. https://doi.org/10.1016/j.ecolind.2013.10.020
Carraro, C., Campagnolo, L., Eboli, F., Giove, S., Lanzi, E., Parrado, R., Pinar, M., Portale, E., 2013.
The FEEM Sustainability Index: An integrated tool for sustainability assessment, in:
Erechtchoukova, M.G., Khaiter, P.A., Golinska, P. (Eds.), Sustainability Appraisal: Quantitative
Methods and Mathematical Techniques for Environmental Performance Evaluation. Springer,
Berlin, pp. 9–32.
Cash, D.W., Clark, W.C., Alcock, F., Dickson, N.M., Eckley, N., Guston, D.H., Jäger, J., Mitchell, R.B.,
2003. Knowledge systems for sustainable development. Proc. Natl. Acad. Sci. 100, 8086–8091.
https://doi.org/10.1073/pnas.1231332100
Chowdhury, S., Squire, L., 2006. Setting weights for aggregate indices: An application to the
commitment to Development Index and Human Development Index. J. Dev. Stud. 42, 761–771.
https://doi.org/10.1080/00220380600741904
Costanza, R., Daly, H.E., 1992. Natural capital and sustainable development. Conserv. Biol. 6, 37–46.
https://doi.org/10.1046/j.1523-1739.1992.610037.x
Costanza, R., Fioramonti, L., Kubiszewski, I., 2016. The UN Sustainable Development Goals and the
dynamics of well-being. Front. Ecol. Environ. 14, 59. https://doi.org/10.1002/fee.1231
Costanza, R., Kubiszewski, I., Giovannini, E., Lovins, H., McGlade, J., Pickett, K.E., Vala
Ragnarsdóttir, K., Roberts, D., de Vogli, R., Wilkinson, R., 2014. Time to leave GDP behind.
Nature 505, 283–285. https://doi.org/10.1038/505283a
Cover, T.M., Thomas, J.A., 1991. Elements of information theory. John Wiley & Sons, New York.
Dahlmann, F., Stubbs, W., Griggs, D., Morrell, K., 2019. Corporate actors, the UN Sustainable
Development Goals and Earth System Governance: A research agenda. Anthr. Rev. 6, 167–176.
https://doi.org/10.1177/2053019619848217
Daly, H.E., 1990. Toward some operational principles of sustainable development. Ecol. Econ. 2, 1–6.
https://doi.org/10.1016/0921-8009(90)90010-R
das Neves Almeida, T.A., Cruz, L., Barata, E., García-Sánchez, I.-M., 2017. Economic growth and
environmental impacts: An analysis based on a Composite Index of Environmental Damage.
Ecol. Indic. 76, 119–130. https://doi.org/10.1016/j.ecolind.2016.12.028
Dinda, S., 2004. Environmental Kuznets Curve hypothesis: A survey. Ecol. Econ. 49, 431–455.
https://doi.org/10.1016/j.ecolecon.2004.02.011
Dragicevic, A.Z., 2018. Deconstructing sustainability. Sustain. Dev. 26, 525–532.
https://doi.org/10.1002/sd.1746
Ebert, U., Welsch, H., 2004. Meaningful environmental indices: A social choice approach. J. Environ.
Econ. Manage. 47, 270–283. https://doi.org/10.1016/j.jeem.2003.09.001
Esty, D.C., 2018. Measurement matters: Toward data-driven environmental policy-making, in: Bell, S.,
Morse, S. (Eds.), Routledge Handbook of Sustainability Indicators. Routledge, Abingdon, pp.
494–506.
Esty, D.C., Emerson, J.W., 2018. From crises and gurus to science and metrics: Yale’s Environmental
Performance Index and the rise of data-driven policymaking, in: Bell, S., Morse, S. (Eds.),
Routledge Handbook of Sustainability Indicators. Routledge, Abingdon, pp. 93–102.
Eurostat, 2008a. Manual of supply, use and input-output tables. European Communities, Luxembourg.
Eurostat, 2008b. NACE Rev. 2: Statistical classification of economic activities in the European
Community. European Communities, Luxembourg.
Field, A., 2009. Discovering statistics using SPSS, 3rd ed. Sage, London.
Figge, F., Hahn, T., 2004. Sustainable value added - Measuring corporate contributions to
sustainability beyond eco-efficiency. Ecol. Econ. 48, 173–187.
https://doi.org/10.1016/j.ecolecon.2003.08.005
Fusco, E., 2015. Enhancing non-compensatory composite indicators: A directional proposal. Eur. J.
Oper. Res. 242, 620–630. https://doi.org/10.1016/j.ejor.2014.10.017
Gallego-Álvarez, I., Galindo-Villardón, M.P., Rodríguez-Rosa, M., 2015. Evolution of sustainability
indicator worldwide: A study from the economic perspective based on the X-STATICO method.
Ecol. Indic. 58, 139–151. https://doi.org/10.1016/j.ecolind.2015.05.025
Geels, F.W., 2010. Ontologies, socio-technical transitions (to sustainability), and the multi-level
perspective. Res. Policy 39, 495–510. https://doi.org/10.1016/j.respol.2010.01.022
Geels, F.W., 2002. Technological transitions as evolutionary configuration processes: A multi-level
perspective and a case study. Res. Policy 31, 1257–1274. https://doi.org/10.1016/S0048-
7333(02)00062-8
Gerlach, J.N., Legler, B., Ostwald, D.A., 2018. Gesundheitswirtschaft Fakten und Zahlen: Handbuch
zur Gesundheitswirtschaftlichen Gesamtrechnung mit Erläuterungen und Lesehilfen.
Bundesministerium für Wirtschaft und Energie (BMWi), Berlin.
Giannetti, B.F., Bonilla, S.H., Silva, C.C., Villas Bôas de Almeida, C.M., 2009. The reliability of experts’
opinions in constructing a composite environmental index: The case of ESI 2005. J. Environ.
Manage. 90, 2448–2459. https://doi.org/10.1016/j.jenvman.2008.12.018
Glaser, G., 2012. Base Sustainable Development Goals on science. Nature 491, 35.
https://doi.org/10.1038/491035a
Grabisch, M., Marichal, J.-L., Mesiar, R., Pap, E., 2009. Aggregation functions. Cambridge University
Press, Cambridge.
Greco, S., Ishizaka, A., Tasiou, M., Torrisi, G., 2019. On the methodological framework of composite
indices: A review of the issues of weighting, aggregation, and robustness. Soc. Indic. Res. 141,
61–94. https://doi.org/10.1007/s11205-017-1832-9
Greene, W.H., 2003. Econometric analysis, 5th ed. Prentice Hall, Upper Saddle River.
GRI, 2016. Consolidated set of GRI sustainability reporting standards. Global Reporting Initiative
(GRI), Amsterdam.
GRI, UNGC, 2018. An analysis of the goals and targets. Global Reporting Initiative (GRI), United
Nations Global Compact (UNGC), Amsterdam.
Griggs, D., Stafford-Smith, M., Rockström, J., Öhman, M.C., Gaffney, O., Glaser, G., Kanie, N., Noble,
I., Steffen, W.L., Shyamsundar, P., 2014. An integrated framework for Sustainable Development
Goals. Ecol. Soc. 19, 49. https://doi.org/10.5751/ES-07082-190449
Hacking, T., Guthrie, P., 2008. A framework for clarifying the meaning of triple bottom-line, integrated,
and sustainability assessment. Environ. Impact Assess. Rev. 28, 73–89.
https://doi.org/10.1016/j.eiar.2007.03.002
Hadi, A.S., Rahmatullah Imon, A.H.M., Werner, M., 2009. Detection of outliers. WIREs Comput. Stat.
1, 57–70. https://doi.org/10.1002/wics.6
Haerdle, W.K., Simar, L., 2012. Applied multivariate statistical analysis, 3rd ed. Springer, Berlin.
Haffar, M., Searcy, C., 2018. Target-setting for ecological resilience: Are companies setting
environmental sustainability targets in line with planetary thresholds? Bus. Strateg. Environ. 27,
1079–1092. https://doi.org/10.1002/bse.2053
Hahn, T., Figge, F., 2011. Beyond the bounded instrumentality in current corporate sustainability
research: Toward an inclusive notion of profitability. J. Bus. Ethics 104, 325–345.
https://doi.org/10.1007/s10551-011-0911-0
Hahn, T., Pinkse, J., Preuss, L., Figge, F., 2015. Tensions in corporate sustainability: Towards an
integrative framework. J. Bus. Ethics 127, 297–316. https://doi.org/10.1007/s10551-014-2047-5
Han, J., Kamber, M., Pei, J., 2012. Data mining: Concepts and techniques, 3rd ed. Morgan
Kaufmann, Waltham.
Harangozo, G., Csutora, M., Kocsis, T., 2018. How big is big enough? Toward a sustainable future by
examining alternatives to the conventional economic growth paradigm. Sustain. Dev. 26, 172–
181. https://doi.org/10.1002/sd.1728
Harvey, A.C., 1989. Forecasting, structural time series models and the Kalman filter. Cambridge
University Press, Cambridge.
Holden, E., Linnerud, K., Banister, D., 2017. The imperatives of sustainable development. Sustain.
Dev. 25, 213–226. https://doi.org/10.1002/sd.1647
Honaker, J., King, G., Blackwell, M., 2018. Package “Amelia.” Comprehensive R Archive Network
(CRAN).
Honaker, J., King, G., Blackwell, M., 2011. Amelia II: A program for missing data. J. Stat. Softw. 45, 1–
54. https://doi.org/10.18637/jss.v045.i07
Hotelling, H., 1933. Analysis of complex of statistical variables into principal components. Baltimore,
Warwick.
Janoušková, S., Hák, T., Moldan, B., 2018. Relevance - A neglected feature of sustainability
indicators, in: Bell, S., Morse, S. (Eds.), Routledge Handbook of Sustainability Indicators.
Routledge, Abingdon, pp. 477–493.
Jolliffe, I.T., 2002. Principal component analysis, 2nd ed. Springer, New York.
Kaiser, H.F., 1974. An index of factorial simplicity. Psychometrika 39, 31–36.
https://doi.org/10.1007/BF02291575
Kaiser, H.F., 1960. The application of electronic computers to factor analysis. Educ. Psychol. Meas.
20, 141–151. https://doi.org/10.1177/001316446002000116
Kallis, G., Kostakis, V., Lange, S., Muraca, B., Paulson, S., Schmelzer, M., 2018. Research on
degrowth. Annu. Rev. Environ. Resour. 43, 291–316. https://doi.org/10.1146/annurev-environ-
102017-025941
Kalman, R.E., 1960. A new approach to linear filtering and prediction problems. J. Basic Eng. (Series
D) 82, 35–45. https://doi.org/10.1115/1.3662552
Köhler, J., Geels, F.W., Kern, F., Markard, J., Onsongo, E., Wieczorek, A., Alkemade, F., Avelino, F.,
Bergek, A., Boons, F., Fünfschilling, L., Hess, D., Holtz, G., Hyysalo, S., Jenkins, K., Kivimaa, P.,
Martiskainen, M., McMeekin, A., Mühlemeier, M.S., Nykvist, B., Pel, B., Raven, R., Rohracher,
H., Sandén, B., Schot, J., Sovacool, B., Turnheim, B., Welch, D., Wells, P., 2019. An agenda for
sustainability transitions research: State of the art and future directions. Environ. Innov. Soc.
Transitions 31, 1–32. https://doi.org/10.1016/j.eist.2019.01.004
KPMG, 2017. The road ahead: The KPMG survey of corporate responsibility reporting. KPMG
International, Zurich.
Krajnc, D., Glavič, P., 2005. A model for integrated assessment of sustainable development. Resour.
Conserv. Recycl. 43, 189–208. https://doi.org/10.1016/j.resconrec.2004.06.002
Kroonenberg, P.M., 1983. Three-mode principal component analysis. DSWO Press, Leiden.
Lawn, P.A., 2003. A theoretical foundation to support the Index of Sustainable Economic Welfare
(ISEW), Genuine Progress Indicator (GPI), and other related indexes. Ecol. Econ. 44, 105–118.
https://doi.org/10.1016/S0921-8009(02)00258-6
Leach, M., Raworth, K., Rockström, J., 2013. Between social and planetary boundaries: Navigating
pathways in the safe and just space for humanity, in: World Social Science Report: Changing
Global Environments. United Nations Educational, Scientific and Cultural Organization
(UNESCO), Paris, pp. 84–89.
Lee, K.-H., Farzipoor Saen, R., 2012. Measuring corporate sustainability management: A data
envelopment analysis approach. Int. J. Prod. Econ. 140, 219–226.
https://doi.org/10.1016/j.ijpe.2011.08.024
Li, M., Wiedmann, T., Hadjikakou, M., 2019. Towards meaningful consumption-based planetary
boundary indicators: The phosphorus exceedance footprint. Glob. Environ. Chang. 54, 227–238.
https://doi.org/10.1016/j.gloenvcha.2018.12.005
Little, R.J.A., Rubin, D.B., 2002. Statistical analysis with missing data, 2nd ed. John Wiley & Sons,
Hoboken.
Lock, I., Seele, P., 2017. Theorizing stakeholders of sustainability in the digital age. Sustain. Sci. 12,
235–245. https://doi.org/10.1007/s11625-016-0404-2
Loorbach, D.A., 2007. Transition management: New mode of governance for sustainable
development. International Books, Utrecht.
Maxime, D., Marcotte, M., Arcand, Y., 2006. Development of eco-efficiency indicators for the
Canadian food and beverage industry. J. Clean. Prod. 14, 636–648.
https://doi.org/10.1016/j.jclepro.2005.07.015
Mayer, A.L., 2008. Strengths and weaknesses of common sustainability indices for multidimensional
systems. Environ. Int. 34, 277–291. https://doi.org/10.1016/j.envint.2007.09.004
Meyer, P.E., Kontos, K., Lafitte, F., Bontempi, G., 2007. Information-theoretic inference of large
transcriptional regulatory networks. J. Bioinforma. Syst. Biol. 2007, 79879.
Meyer, P.E., Lafitte, F., Bontempi, G., 2008. minet: A R/Bioconductor package for inferring large
transcriptional networks using mutual information. BMC Bioinformatics 9, 461.
https://doi.org/10.1186/1471-2105-9-461
Meyer, P.E., Marbach, D., Roy, S., Kellis, M., 2010. Information-theoretic inference of gene networks
using backward elimination. Int. Conf. Bioinforma. Comput. Biol.
Nardo, M., Saisana, M., Saltelli, A., Tarantola, S., Hoffman, A., Giovannini, E., 2008. Handbook on
constructing composite indicators: Methodology and user guide. Organisation for Economic
Co-operation and Development (OECD), Paris.
Neumayer, E., 2010. Weak versus strong sustainability. Edward Elgar, Cheltenham.
Nilsson, M., Griggs, D., Visbeck, M., 2016. Map the interactions between Sustainable Development
Goals. Nature 534, 320–322. https://doi.org/10.1038/534320a
O’Neill, D.W., Fanning, A.L., Lamb, W.F., Steinberger, J.K., 2018. A good life for all within planetary
boundaries. Nat. Sustain. 1, 88–95. https://doi.org/10.1038/s41893-018-0021-4
Oh, S.-H., Lee, Y.-J., 1994. Effect of nonlinear transformations on correlation between weighted sums
in multilayer perceptrons. IEEE Trans. Neural Networks 5, 508–510.
https://doi.org/10.1109/72.286927
Parris, T.M., Kates, R.W., 2003. Characterizing and measuring sustainable development. Annu. Rev.
Environ. Resour. 28, 559–586. https://doi.org/10.1146/annurev.energy.28.050302.105551
Pearson, K., 1901. LIII. On lines and planes of closest fit to systems of points in space. Philos. Mag.
(Series 6) 2, 559–572. https://doi.org/10.1080/14786440109462720
Peng, H., Long, F., Ding, C., 2005. Feature selection based on mutual information: Criteria of max-
dependency, max-relevance and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27,
1226–1238. https://doi.org/10.1109/TPAMI.2005.159
Piketty, T., 2014. Capital in the twenty-first century. Harvard University Press, Cambridge.
Pinar, M., Cruciani, C., Giove, S., Sostero, M., 2014. Constructing the FEEM Sustainability Index: A
Choquet integral application. Ecol. Indic. 39, 189–202.
https://doi.org/10.1016/j.ecolind.2013.12.012
Pintér, L., Hardi, P., Martinuzzi, A., Hall, J., 2018. Bellagio STAMP: Principles for sustainability
assessment and measurement, in: Bell, S., Morse, S. (Eds.), Routledge Handbook of
Sustainability Indicators. Routledge, Abingdon, pp. 21–41.
Podinovski, V.V., 2016. Optimal weights in DEA models with weight restrictions. Eur. J. Oper. Res.
254, 916–924. https://doi.org/10.1016/j.ejor.2016.04.035
Pollesch, N.L., Dale, V.H., 2016. Normalization in sustainability assessment: Methods and
implications. Ecol. Econ. 130, 195–208. https://doi.org/10.1016/j.ecolecon.2016.06.018
Pollesch, N.L., Dale, V.H., 2015. Applications of aggregation theory to sustainability assessment. Ecol.
Econ. 114, 117–127. https://doi.org/10.1016/j.ecolecon.2015.03.011
Prescott-Allen, R., 2001. The wellbeing of nations: A country-by-country index of quality of life and the
environment. Island Press, Washington, D.C.
Ramanathan, R., 2003. An introduction to data envelopment analysis: A tool for performance
measurement. Sage, New Delhi.
Rässler, S., Rubin, D.B., Zell, E.R., 2013. Imputation. WIREs Comput. Stat. 5, 20–29.
https://doi.org/10.1002/wics.1240
RobecoSAM, 2018. Corporate sustainability assessment companion. RobecoSAM, Zurich.
Rockström, J., Steffen, W.L., Noone, K., Persson, Å., Chapin III, F.S., Lambin, E., Lenton, T.M.,
Scheffer, M., Folke, C., Schellnhuber, H.J., Nykvist, B., de Wit, C.A., Hughes, T., van der Leeuw,
S., Rodhe, H., Sörlin, S., Snyder, P.K., Costanza, R., Svedin, U., Falkenmark, M., Karlberg, L.,
Corell, R.W., Fabry, V.J., Hansen, J., Walker, B., Liverman, D., Richardson, K., Crutzen, P.,
Foley, J., 2009. A safe operating space for humanity. Nature 461, 472–475.
https://doi.org/10.1038/461472a
Rogge, N., 2012. Undesirable specialization in the construction of composite policy indicators: The
Environmental Performance Index. Ecol. Indic. 23, 143–154.
https://doi.org/10.1016/j.ecolind.2012.03.020
Rotmans, J., Kemp, R., van Asselt, M., 2001. More evolution than revolution: Transition management
in public policy. Foresight 3, 15–31. https://doi.org/10.1108/14636680110803003
Rubin, D.B., 1978. Multiple imputations in sample surveys - A phenomenological Bayesian approach
to nonresponse. Proc. Surv. Res. Methods Sect. Am. Stat. Assoc. 20–28.

S&P Dow Jones Indices, 2019. Float adjustment: Methodology. S&P Dow Jones Indices, New York.
S&P Dow Jones Indices, 2018. Dow Jones Sustainability Indices: Methodology. S&P Dow Jones
Indices, New York.
Saaty, T.L., 2001. Fundamentals of the analytic hierarchy process, in: Schmoldt, D.L., Kangas, J.,
Mendoza, G.A., Pesonen, M. (Eds.), The Analytic Hierarchy Process in Natural Resource and
Environmental Decision Making. Springer, Dordrecht, pp. 15–36.
Sachs, J.D., 2012. From Millennium Development Goals to Sustainable Development Goals. Lancet
379, 2206–2211. https://doi.org/10.1016/S0140-6736(12)60685-0
Saisana, M., Philippas, D., 2012. Sustainable Society Index (SSI): Taking societies’ pulse along social,
environmental and economic issues: The Joint Research Centre audit on the SSI. European
Union (EU), Luxembourg.
Sala, S., Ciuffo, B., Nijkamp, P., 2015. A systemic framework for sustainability assessment. Ecol.
Econ. 119, 314–325. https://doi.org/10.1016/j.ecolecon.2015.09.015
Schafer, J.L., Graham, J.W., 2002. Missing data: Our view of the state of the art. Psychol. Methods 7,
147–177. https://doi.org/10.1037//1082-989X.7.2.147
Schmidt-Traub, G., Kroll, C., Teksoz, K., Durand-Delacre, D., Sachs, J.D., 2017. National baselines
for the Sustainable Development Goals assessed in the SDG index and dashboards. Nat.
Geosci. 10, 547–555. https://doi.org/10.1038/NGEO2985
Shaker, R.R., 2018. A mega-index for the Americas and its underlying sustainable development
correlations. Ecol. Indic. 89, 466–479. https://doi.org/10.1016/j.ecolind.2018.01.050
Shaker, R.R., 2015. The spatial distribution of development in Europe and its underlying sustainability
correlations. Appl. Geogr. 63, 304–314. https://doi.org/10.1016/j.apgeog.2015.07.009
Smith, A., Voß, J.-P., Grin, J., 2010. Innovation studies and sustainability transitions: The allure of the
multi-level perspective and its challenges. Res. Policy 39, 435–448.
https://doi.org/10.1016/j.respol.2010.01.023
Spangenberg, J.H., 2015. Indicators for sustainable development, in: Redclift, M.R., Springett, D.
(Eds.), Routledge International Handbook of Sustainable Development. Routledge, Abingdon,
pp. 308–322.
Steffen, W.L., Richardson, K., Rockström, J., Cornell, S.E., Fetzer, I., Bennett, E.M., Biggs, R.,
Carpenter, S.R., de Vries, W., de Wit, C.A., Folke, C., Gerten, D., Heinke, J., Mace, G.M.,
Persson, L.M., Ramanathan, V., Reyers, B., Sörlin, S., 2015. Planetary boundaries: Guiding
human development on a changing planet. Science 347, 736–746.
https://doi.org/10.1126/science.1259855
Stern, N., 2015. Why are we waiting? The logic, urgency, and promise of tackling climate change. MIT
Press, Cambridge.
Stineman, R.W., 1980. A consistently well-behaved method of interpolation. Creat. Comput. 6, 54–57.
Thioulouse, J., Simier, M., Chessel, D., 2004. Simultaneous analysis of a sequence of paired
ecological tables. Ecology 85, 272–283. https://doi.org/10.1890/02-0605
Togtokh, C., 2011. Time to stop celebrating the polluters. Nature 479, 269.
https://doi.org/10.1038/479269a
Tziogkidis, P., Matthews, K., Philippas, D., 2018. The effects of sector reforms on the productivity of
Greek banks: A step-by-step analysis of the pre-Euro era. Ann. Oper. Res. 266, 531–549.
https://doi.org/10.1007/s10479-016-2381-3
UN, 2019. World economic situation prospects. United Nations (UN), New York.
UN, 2018a. Sustainable development knowledge platform: Sustainable Development Goals [WWW
Document]. URL https://sustainabledevelopment.un.org/sdgs (accessed 8.14.18).
UN, 2018b. Global indicator framework for the Sustainable Development Goals and targets of the
2030 agenda for sustainable development. United Nations (UN), New York.
UN, 2008. International standard industrial classification of all economic activities, Rev. 4. United
Nations (UN), New York. https://doi.org/10.1017/CBO9781107415324.004
van de Kerk, G., Manuel, A.R., 2008. A comprehensive index for a sustainable society: The SSI - The
Sustainable Society Index. Ecol. Econ. 66, 228–242.
https://doi.org/10.1016/j.ecolecon.2008.01.029
van de Kerk, G., Manuel, A.R., Kleinjans, R., 2014. Sustainable Society Index: SSI-2014. Sustainable
Society Foundation, The Hague.
van den Bergh, J.C.J.M., 2011. Environment versus growth - A criticism of “degrowth” and a plea for
“a-growth.” Ecol. Econ. 70, 881–890. https://doi.org/10.1016/j.ecolecon.2010.09.035
von Hippel, P.T., 2005. Mean, median, and skew: Correcting a textbook rule. J. Stat. Educ. 13, 1–13.
https://doi.org/10.1080/10691898.2005.11910556
Weitz, N., Carlsen, H., Nilsson, M., Skånberg, K., 2018. Towards systemic and contextual priority
setting for implementing the 2030 agenda. Sustain. Sci. 13, 531–548.
https://doi.org/10.1007/s11625-017-0470-0
Whiteman, G., Walker, B., Perego, P., 2013. Planetary boundaries: Ecological foundations for
corporate sustainability. J. Manag. Stud. 50, 307–336.
https://doi.org/10.1111/j.1467-6486.2012.01073.x
Witjes, S., Vermeulen, W.J.V., Cramer, J.M., 2017. Assessing corporate sustainability integration for
corporate self-reflection. Resour. Conserv. Recycl. 127, 132–147.
https://doi.org/10.1016/j.resconrec.2017.08.026
Wu, J., Wu, T., 2012. Sustainability indicators and indices: An overview, in: Madu, C.N., Kuei, C.-H.
(Eds.), Handbook of Sustainability Management. World Scientific, Singapore, pp. 65–86.
Yang, Y., Webb, G.I., 2009. Discretization for naive-Bayes learning: Managing discretization bias and
variance. Mach. Learn. 74, 39–74. https://doi.org/10.1007/s10994-008-5083-5
Yu, L., Liu, H., 2004. Efficient feature selection via analysis of relevance and redundancy. J. Mach.
Learn. Res. 5, 1205–1224.
Zhou, P., Ang, B.-W., Poh, K.-L., 2007. A mathematical programming approach to constructing
composite indicators. Ecol. Econ. 62, 291–297. https://doi.org/10.1016/j.ecolecon.2006.12.020
Zhou, P., Ang, B.-W., Poh, K.-L., 2006. Comparing aggregating methods for constructing the
composite environmental index: An objective measure. Ecol. Econ. 59, 305–311.
https://doi.org/10.1016/j.ecolecon.2005.10.018
Zhou, P., Fan, L.-W., Zhou, D.-Q., 2010. Data aggregation in constructing composite indicators: A
perspective of information loss. Expert Syst. Appl. 37, 360–365.
https://doi.org/10.1016/j.eswa.2009.05.039
Zuo, X., Hua, H., Dong, Z., Hao, C., 2017. Environmental Performance Index at the provincial level for
China 2006–2011. Ecol. Indic. 75, 48–56. https://doi.org/10.1016/j.ecolind.2016.12.016
