
DISS. ETH No. 24196

DIGITAL SOIL MAPPING FOR SWITZERLAND
EVALUATION OF STATISTICAL APPROACHES
AND MAPPING OF SOIL PROPERTIES

A thesis submitted to attain the degree of

DOCTOR OF SCIENCES of ETH ZURICH

(Dr. sc. ETH Zürich)

presented by

MADLENE NUSSBAUM

MSc. in Geography, University of Zurich


born on 13 January 1984
citizen of Schlosswil BE

accepted on the recommendation of

Prof. Dr. Dani Or
Dr. Andreas Papritz
Dr. Martin Mächler
Dr. Marco Carizzoni

2017
Citation
Nussbaum, M.: Digital Soil Mapping for Switzerland, Evaluation of Statistical
Approaches and Mapping of Soil Properties, Ph.D. thesis, ETH Zurich, Switzerland,
doi:10.3929/ethz-b-000193154, 2017.
Contents

Abstract
Zusammenfassung
Acknowledgements
List of abbreviations

1. General introduction
1.1. Soil mapping
1.1.1. Conventional soil mapping (CSM)
1.1.2. Digital soil mapping (DSM)
1.2. Research projects
1.3. Research questions and objectives
1.4. Structure of PhD thesis

2. Mapping of soil properties by boosted geoadditive models
2.1. Introduction
2.2. geoGAM modelling framework
2.2.1. Model representation
2.2.2. Model building (selection of covariates)
2.2.3. Predictions and predictive distribution
2.3. Case studies – Materials and Methods
2.3.1. Study regions
2.3.2. Data
2.3.3. Statistical analysis
2.4. Results
2.4.1. ECEC – case study 1
2.4.2. Presence of waterlogged soil horizons – case study 2
2.4.3. Drainage classes – case study 3
2.5. Discussion
2.5.1. Model building and covariate selection
2.5.2. Model structure
2.5.3. Predictive performance of fitted models
2.5.4. Spatial structure of predicted maps
2.6. Summary and conclusion

3. Evaluation of statistical approaches
3.1. Introduction
3.2. Materials
3.2.1. Study regions
3.2.2. Soil data
3.2.3. Covariates for statistical modelling
3.3. Methods
3.3.1. Group lasso
3.3.2. Robust external-drift kriging (georob)
3.3.3. Boosted geoadditive model (geoGAM)
3.3.4. Boosted regression trees (BRT)
3.3.5. Random forest (RF)
3.3.6. Model averaging (MA)
3.3.7. Legacy soil map
3.3.8. Evaluating predictive performance
3.4. Results and Discussion
3.4.1. Model building
3.4.2. Evaluation of model performance
3.4.3. Evaluation of covariate relevance
3.4.4. Mapping
3.4.5. Practical use of statistical methods
3.5. Conclusions

4. Pedotransfer function to predict density of forest soils
4.1. Introduction
4.2. Materials and methods
4.2.1. Data
4.2.2. Statistical analysis
4.3. Results
4.4. Discussion

5. SOC estimation by robust external-drift kriging
5.1. Introduction
5.2. Materials and methods
5.2.1. Study region
5.2.2. Data
5.2.3. Statistical analyses
5.3. Results
5.3.1. Calculated SOC stocks
5.3.2. Models for SOC stocks in 0–30 cm and 0–100 cm depth
5.3.3. Validation of SOC stock predictions with independent data
5.3.4. Prediction of SOC stocks for Swiss forest soils
5.4. Discussion
5.4.1. Model building and covariate selection
5.4.2. Residual spatial autocorrelation
5.4.3. Robust parameter estimation
5.4.4. Predictive performance of fitted models
5.4.5. Spatial structure of SOC stock predictions
5.4.6. Comparison with SOC stock estimates of previous studies
5.5. Summary and conclusion

6. Concluding remarks and outlook
6.1. Concluding remarks
6.2. Outlook – DSM in Switzerland
6.2.1. Map scale and precision
6.2.2. Soil data acquisition
6.2.3. Cost and duration of mapping campaigns
6.2.4. Stakeholders’ objections to DSM
6.2.5. Institutional structures discouraging DSM

References

A. Appendix
A.1. Supplementary material to Chapter 2
A.2. Supplementary material to Chapter 3
A.3. Supplementary material to Chapter 5
A.4. List of publications

Abstract

Soils fulfil many important functions in agriculture, forestry, natural hazards and
resources management. The functionality of soils depends on their properties.
For optimal spatial management of soils, accurate and spatially highly resolved
maps of properties such as texture, soil organic carbon (SOC) content or pH are
required. Such spatial information is missing for many regions, and mapping
soils by conventional field survey is time-consuming and costly. Meanwhile, soil
data from previous surveys (legacy data) and exhaustive environmental datasets
describing soil forming factors are often available for a region of interest.
Digital soil mapping (DSM) is a technique that relates (legacy) soil data to en-
vironmental datasets (covariates). Statistical models exploit correlations between
soil properties (or soil types) and covariates to calculate predictions at locations
where no soil sample has been taken. Large covariate sets have become common
recently, because aerial or satellite imagery and multiple-scale terrain analysis
generate large amounts of data that can be used profitably for DSM. Model building,
especially selecting relevant covariates, is then challenging.
This thesis proposes a new boosted geoadditive modelling framework (geoGAM)
for DSM. A geoGAM additively models nonlinear relations between soil properties
and single covariates and accounts for spatial autocorrelation. Selection of covari-
ates relies on componentwise gradient boosting, a machine learning technique that
builds models in small forward steps. The framework was successfully applied to
soil data sampled in the Canton of Zurich, Switzerland. GeoGAM combines effi-
cient model building from large sets of covariates with ease of effect interpretation
and therefore likely raises acceptance of DSM products by end-users.
In another study, SOC stocks in Swiss forest soils were mapped by robust
external-drift kriging (EDK), because regional and national mean SOC stocks
also had to be estimated. Linear relationships between SOC stocks and
environmental covariates and spatial autocorrelation were estimated by a robust
approach, which is insensitive to outliers in the data. Estimated mean regional and
national SOC stocks showed that previous studies underestimated SOC stocks of
topsoil slightly and those of subsoils strongly. Block kriging allowed accurate
estimation of the standard errors (SE) of regional and national means. The resulting
SE were substantially smaller than SE of previous estimates.
In addition to geoGAM and EDK, many statistical approaches have been used
for DSM, but it is not evident from the DSM literature which method gener-
ally performs best. This thesis evaluated six approaches for DSM with soil data
from three study regions in Switzerland. Models were built from 300–500 environ-
mental covariates by 1) grouped lasso, a shrinkage method for linear models, 2)
robust EDK, 3) geoGAM, and two tree-based machine learning methods 4) boos-
ted regression trees (BRT) and 5) random forest (RF). Lastly, 6) weighted model
averages (MA) from predictions by methods 1–5 were computed. Differences in
predictive performance, tested on independent validation datasets, were mostly
small and did not reveal a single best method. Nevertheless, RF most often
performed best among methods 1–5, but was outperformed by MA for
about half of the modelled responses.
Harmonized and complete soil datasets are a prerequisite for DSM. Soil density,
in particular, is often missing. A new pedotransfer function (PTF) for predicting
soil density of Swiss forest soils was therefore developed in this thesis. The PTF
predicts soil density mainly from field density estimates, and it clearly outperforms
published PTFs applied to the same data.
In summary, this thesis evaluated statistical approaches for DSM and produced
soil property maps for study regions in Switzerland as basis for digital soil function
assessment. Moreover, it contributed to soil data harmonization.

Zusammenfassung

Soil fulfils numerous functions, not only in agriculture and forestry: it filters
our drinking water, protects us from floods and provides habitat for a great
diversity of organisms. These functions depend on the properties of the soil.
To make optimal spatial use of soils and to preserve their functions, accurate
and highly resolved maps of soil properties such as texture, organic carbon
content (Corg) or pH are necessary. However, such maps are missing for many
areas. Mapping by conventional field survey is costly and time-consuming. At
the same time, soil data from earlier surveys (legacy data) as well as spatially
exhaustive environmental geodata sets (e.g. elevation model, geological map) are
available for many areas.
Digital soil mapping (DSM) is an approach for linking (legacy) soil data with
environmental datasets (covariates). Statistical models exploit the correlation
between soil properties (or soil types) and covariates in order to subsequently
compute predictions for locations without a survey. Very large covariate sets
are increasingly used for DSM. Terrain analysis at multiple scales as well as
aerial and satellite images allow numerous derivatives that can be used for DSM.
Building the statistical models, above all selecting relevant covariates, thereby
becomes a challenge.
This doctoral thesis proposes a new DSM tool for building spatial additive
models (geoGAM). A geoGAM models nonlinear relationships between soil properties
and single covariates additively and also accounts for spatial autocorrelation.
Selection of relevant covariates relies mainly on the componentwise gradient
boosting algorithm, a machine learning technique that builds the model in small
forward steps. The approach was successfully applied to legacy soil data
collected in the Canton of Zurich (Switzerland). GeoGAM combines efficient
selection of relevant covariates from large sets with easy interpretation of the
modelled relationships. This possibly improves the acceptance of DSM products
among users of soil maps.
A further part of the thesis deals with modelling Corg stocks in Swiss forest
soils by external-drift kriging (EDK), in order to additionally estimate
regional and national Corg stocks. Linear relationships between Corg stocks and
environmental covariates as well as the spatial autocorrelation were estimated
by a robust approach that reduces the distortion of the model by outliers in the
data. The estimated regional and national Corg stocks showed that previous
studies had underestimated Corg stocks slightly in the topsoil and strongly in
the subsoil. Applying block kriging allowed the standard errors of regional and
national means to be estimated accurately. The resulting standard errors were
considerably smaller than previous estimates.
Besides geoGAM and EDK, numerous statistical approaches are used for DSM.
However, it is not clear from the literature which method generally achieves the
best results. In this doctoral thesis, six DSM methods were tested with soil
data from three study regions in Switzerland. From the 300–500 available
covariates, models were built with 1) group lasso, a shrinkage method for linear
models, 2) robust EDK, 3) geoGAM, and two machine learning techniques based on
decision trees, namely 4) boosted regression trees and 5) random forest (RF).
In addition, weighted means of the predictions of methods 1–5 were computed
(model averaging, MA). Differences in predictive performance, computed for
independent validation data, were mostly small, and no single best method could
be identified. Nevertheless, among methods 1–5 RF most often showed the best
prediction, but was still slightly outperformed by MA for about half of the
modelled soil properties.
Harmonized and complete soil datasets are a precondition for DSM. However, soil
density is often not recorded. Therefore, a new pedotransfer function (PTF) for
predicting the fine earth density of Swiss forest soils was developed in this
doctoral thesis. For the data used, the PTF predicts clearly better than
transfer functions from previous studies.
In summary, this doctoral thesis tested statistical approaches for DSM and
produced digital soil property maps for study regions in Switzerland.
Furthermore, it contributed to data harmonization.

Acknowledgements

Clearly, such an undertaking as presented here cannot be done by one single person
alone. Many individuals contributed directly or indirectly in some way to this work:
My largest thanks go to my thesis supervisor Andreas Papritz: you were always
very supportive of my career, an extremely competent person to ask technical or
mountaineering questions and you always knew whom to consult next in case you
were not sure; project collaboration was very straightforward with clear guidelines
and realistic expectations, often we had a laugh even about work things and last
but not least, proposals were realistic! Only with all this preliminary work,
organization and guidance was it possible for me to venture as far (none of this
can be taken for granted by any means).
Then I would like to thank Dani Or for hosting me in his research group, allowing
my projects to run under the umbrella of STEP, providing all the computing
infrastructure needed and granting as much freedom as was necessary. Many thanks go to
further members of the thesis committee Marco Carizzoni and Martin Mächler for
their time and effort to make sense of all the content of this thesis.
Moreover, I would like to thank the PMSoil project team and contributors: Lucie
Greiner for all the input on how to model waterlogging, Michael Schaepman and
Sanne Diek for preprocessing the countless bands of APEX, Armin Keller, Lucas
Bridler, Micha Lussi, Urs Grob and Lorenz Walthert for their extremely important,
laborious and highly undervalued work of soil data correction and harmonization,
Kay Spiess for his innovative random forest groundwork, Andri Baltensweiler and
Marielle Fraefel for all the terrain attributes processing always finished in very
good time, and again Marco Carizzoni, Urs Zihlmann and Stefan Zimmermann
for giving input in one way or other. I am very grateful to the Soil Protection
Agencies of the Cantons of Zurich and Berne, more specifically to Ubald Gasser,
Alex Lehmann and to Peter Trachsel for letting me work with and making me
understand their soil data, and especially to Res Chervet for being supportive
of whatever soil data analysis I touched. I would like to thank Lorenz Ramseier
and Christine Rupflin for letting me join their soil surveying and Martin Zürrer for
helping me better understand FAL24(+), and again Martin Zürrer, Alex Lehmann
and Peter Schwab for assisting me to figure out Zurich’s legacy soil map.

Then, there are many, many people I cannot list here, because I do not even
know their names: those are soil surveyors that collected all these samples over
the years, lab staff doing soil analysis, persons operating satellites and preprocess-
ing imagery, people collecting elevation or rainfall data, staff updating drainage
networks, geologists who sampled maps and a large guild of unknown open source
programmers (R packages, JabRef, Inkscape, Debian, Gnome, QGis, SAGA, RStudio,
LaTeX, TexStudio, LibreOffice, etc.) most likely offering their spare time for
my benefit.
I would like to thank everybody being around CHN E floor since my Master the-
sis in April 2011, particularly (former) STEP members: Franziska, Minsu, Linfeng,
Stan, Frouke, Dani B, Milad, Ali, Sämi, Ben, Hans, Hannah, Leo, Peter, Robin,
Liliana, Pierre, Sabine, Sami, Siul, Gernot, Marine, Shai, Niels, Christina, Olga,
Elena, Jonas vR, Jonas W, Gang, Yu, Huifang, Roman, Kerstin, Ana, Björn, Su-
san, Elham, Nikos, Gilbert, Rainer, Christine, Michi, Katrin for all kinds of coffee
room discussions, exploring new lunch facilities (thanks Gilbert), for all the swim
trainings and concerts, joining for Snow and Fun or being entertaining office mates
(thanks for the late evening conversations), for joining yoga classes, ice skating,
cutting trees and removing weeds or just enjoying views on Alp Samada, the after-
work pints, eating out, dinner invitations, jumps into the Limmat, hiking trips,
snow shoeing or cross country skiing, cycling tours and what not!
I would like to thank Theresa Moser for being an anatomically very competent
and great yoga teacher (over all these years I did not miss many Wednesday lessons)
saving me from common back troubles in this research group and Sandra Creazione
Blu for regularly tidying up my hair during coffee breaks.
Then very large thanks go to Daniel not so much for detaining my attention in
the statistics class, but more for maintaining our computational infrastructure, for
always readily helping with any software issues or for persuading me what type of
tool “I want” and for generally having a relaxing effect on me.
It was a pleasure doing this research!

List of abbreviations

AIC Akaike information criterion


ALR additive log-ratio transformation
ANN artificial neural networks
APEX Airborne Prism Experiment
a.s.l. above sea level
BD density of fine soil fraction (≤ 2 mm)
BIC Bayesian information criterion
BRT boosted regression trees
BS Brier score
BSS Brier skill score
C carbon
CART classification and regression trees
CM class matching
CRPS continuous ranked probability score
CSM conventional soil mapping
CV cross-validation
DEM digital elevation model
DSM digital soil mapping
ECEC effective cation exchange capacity
EDF estimated degrees of freedom
EDK external-drift kriging
FOEN Swiss Federal Office for the Environment
GAM generalized additive models
geoGAM boosted geoadditive model selection framework
georob robust external-drift kriging
GHG greenhouse gas
GLM generalized linear models
GS Gerrity score
GSS Gilbert skill score
lasso least absolute shrinkage and selection operator
LM linear model
LS ordinary least squares
MA weighted model averaging
MAD median absolute deviation
ME mean error
ML machine learning
MRVBF multi-resolution valley bottom flatness
MSE mean squared error
N nitrogen
NDVI normalized difference vegetation index
NFI national forest inventory
NRP national research programme
OC organic carbon
OK ordinary kriging
PC percentage correctly classified
PIT probability integral transform
PSS Peirce skill score
PTF pedotransfer function
R2 coefficient of determination
REML restricted maximum likelihood
RF random forest
RK regression kriging
RMSE root mean squared error
RPS ranked probability score
RPSS ranked probability skill score
SD effective soil depth available to plants
SE standard error
SOC soil organic carbon
SOM soil organic matter
sqrt square root
SSmse mean squared error skill score
STR stratified random sampling
SVM support vector machines
TPI topographic position index
TWI topographic wetness index
WSL Swiss Federal Institute for Forest, Snow and Landscape Research
ZH Canton of Zurich

1. General introduction

Soils are the weathered top layer of the earth’s surface consisting of minerals and
decayed organic material, gases, liquids and countless organisms. This complex
system sustains terrestrial life with many relevant functions. These depend on
soil properties that are a product of soil formation influenced by relief, climate,
plants, animals, humans and parent material interacting over time. Increasing
claims on soils lead to severe degradation or even complete loss of this
nonrenewable resource. Globally, FAO and ITPS (2015) defined ten major threats to
soils’ function to support life: erosion, organic carbon change, nutrient imbalance,
salinization, sealing, loss of soil biodiversity, acidification, compaction, waterlog-
ging and contamination. Soil functions are relevant to 9 out of 17 UN sustainable
development goals (Wall et al., 2015, box 2). In addition, the European Union is
specifically concerned about loss of soil organic matter in peat soils, desertification,
flooding and landslides (Stolte et al., 2015).
Mitigation of soil threats requires three-dimensional spatial information on state
and functioning of soils (Omuto and Nachtergaele, 2013). The need for spatial soil
information goes far beyond pedological research. Clearly, one main user of soil maps is
the agricultural sector for farm management, soil organic matter preservation or for
advising on erosion (Gisler et al., 2010), for compaction or nutrient loss reduction,
to decelerate degradation of organic soils or for planning irrigation infrastructure
to adapt to climate change (BAFU, 2012). Also for forest management, selecting
tree species well adapted to local conditions becomes more important with rising
temperatures, longer drought periods (Pluess et al., 2016) or increasingly acidified
soils (Zimmermann et al., 2011). Besides farming and forestry, spatial evaluation
of soils’ potential for biomass production is relevant for urban and regional plan-
ning (ARE, 2006). Finding soils with large potential to support rare plant species
results in more efficient biodiversity protection strategies. Maps of soils’ poten-
tial to infiltrate and store water are inputs to model discharge into rivers (Naef
et al., 2007) and form a basis for implementing flood control measures. Moreover,
landslide forecasting can be improved by using soil type information (Fan
et al., 2016). Carbon sequestration of soils can only be accurately quantified with
detailed spatial information on carbon stocks (FOEN, 2016).
The examples above demonstrate that accurate soil maps support a broad
assessment of soil functions (see also Table 3.1) and hence help to protect
soil as a resource. Unfortunately, spatial soil information is missing for many regions,
is not available at the desired resolution or does not provide information on the
required soil properties or functions (Omuto and Nachtergaele, 2013, Tables 2.2–2.7;
for details on Switzerland, see Knecht et al., 2017, Table 13). The remainder
of this chapter briefly explains concepts of conventional (CSM) and digital soil
mapping (DSM), gives some information on research projects in which the current
work was embedded and states general objectives.

1.1. Soil mapping

1.1.1. Conventional soil mapping (CSM)

The following description of conventional soil mapping (CSM) methodology is based
on survey manuals from Switzerland (Ruef and Peyer, 1996; Brunner et al., 1997;
Jäggli et al., 1998; AfUSO, 2014), but guidelines of other countries are very sim-
ilar (U.S. Soil Survey Division Staff, 1993; Ad-hoc Arbeitsgruppe Boden, 2005;
Kempen et al., 2012). Conventional soil maps are created by “free” field survey.
Map acquisition is based on conceptual (mental) soil-landscape models of the sur-
veyor, built on his or her previous experience in similar areas. CSM can roughly
be divided into three main steps:

1. The soil surveyor collects all available (spatial) data such as aerial imagery,
topographical maps and elevation models, and may visit the area to get a general
overview. Based on this initial information, expected areas with similar soil
properties are delimited on a first conceptual map and suitable locations for
profile pits are determined.
2. A complete pedological description of these soil profiles is made, and soil
samples for laboratory analysis are taken.
3. The actual mapping takes place by delineating map units (polygons) in the
field with the aim of obtaining areas with similar soil properties and thus
small within-polygon variation. Polygons are demarcated based on landscape
elements and by using soil property data estimated in the field by augering.
Soil classification qualifiers and a range of soil properties in top- and
subsoil are assigned to each polygon. Topsoil is defined as genetic A horizon
and subsoil as the underlying material. Additionally, soil type and rating
of potentials for soil functions (e.g. agricultural production) are attributed.
Depending on target map scale, polygons with large within-polygon variation are
assigned several soil types along with their probability of occurrence (complex
units).

As reproducibility is a concern of CSM, quality control measures are required
(AfUSO, 2014). CSM results in polygons with crisp boundaries neglecting
within-polygon variation. As pointed out by CSM manuals, delineation of polygons seems
the most difficult task throughout CSM (e.g. Peyer and Frey, 1992, p. 32). CSM
heavily relies on surveyors’ experience that can only be acquired over a long time
period. Future soil mapping projects for large areas most likely will suffer from
lack of trained personnel as education of new surveyors requires time (Knecht
et al., 2017). Further, CSM is (field) work-intensive and therefore costly. Costs
increase roughly linearly with mapped area and possibilities for decreasing costs
by mapping large areas seem very limited (Kempen et al., 2012). Stakeholders’
needs have become more complex and reach beyond the top- and subsoil map content
of current CSM (Knecht et al., 2017). Finally, if new soil data become available or
follow-up surveys are done for monitoring changes of soil properties, direct updates
of conventional soil maps are difficult.

1.1.2. Digital soil mapping (DSM)

Similar to CSM, digital soil mapping (DSM) exploits relationships between soil
properties and environmental soil forming factors. Instead of working with implicit
conceptual models of soil surveyors, DSM attempts to make these relationships
explicit in the form of landscape-soil decision rules or statistical correlations. The
emergence of harmonized legacy soil databases, elevation models and airborne or
satellite-measured spectral information with high spatial resolution, proximally
sensed data, advances in statistical methodology and readily available computing
power to handle large datasets stimulated the development of many variants of
DSM and allowed their application in practice (Sect. 2.1, 3.1 and 5.1). According
to McBratney et al. (2003), DSM models soil type or properties S as a function of
environmental covariates by

S = f (s, c, o, r, p, a, n). (1.1)

s refers to information from previous soil maps or remotely sensed data on bare
soil, c are climate factors such as temperature or rainfall, o are organisms, for
example vegetation maps or man-made soil changes, r characterizes topography, p the
parent material from which soils developed, while a stands for the elapsed time to
form a soil and n for its relative spatial position.
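
As a purely illustrative sketch (not code from this thesis), the scorpan factors of Eq. 1.1 can be thought of as groups of covariate columns that together form the input to f. The covariate names below are hypothetical placeholders, the age factor is simply left empty in this toy example, and the parent material covariate is assumed to be already encoded as a number.

```python
import numpy as np

# Hypothetical covariate names grouped by scorpan factor (Eq. 1.1).
# None of these names come from the thesis; they only illustrate the grouping.
SCORPAN = {
    "s": ["legacy_map_clay"],           # prior soil information
    "c": ["mean_temp", "annual_rain"],  # climate
    "o": ["ndvi"],                      # organisms / vegetation
    "r": ["elevation", "slope"],        # relief
    "p": ["geology_code"],              # parent material (numeric code, assumed)
    "a": [],                            # age: left empty in this toy example
    "n": ["x", "y"],                    # spatial position
}


def design_matrix(covariates, factors=SCORPAN):
    """Stack covariate columns in scorpan order into an (n_sites, n_cov) array."""
    names = [name for group in factors.values() for name in group]
    columns = [np.asarray(covariates[name], dtype=float) for name in names]
    return names, np.column_stack(columns)
```

Any statistical model f can then be fitted to such a matrix; the grouping only documents which soil forming factor each column is meant to represent.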
As outlined in Minasny and McBratney (2016), DSM consists of at least the
following three main components:
1. Input of
a) soil data (laboratory measurements or field estimates) previously (legacy
data) or newly sampled, preferably by a randomized statistical
design,
b) environmental covariates derived from spatially comprehensive geodata
sets,
2. a process in the form of a (spatial) soil inference system; this includes
statistical models relating soil data to environmental covariates (Eq. 1.1), and
3. output in the form of spatial soil information including rasters of predictions
from (2) along with information on their uncertainty. If new information
(1.a and/or 1.b) becomes available this output can readily be updated.
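
The three components above can be sketched as a toy pipeline. This is a minimal illustration under strong assumptions: the data are synthetic, ordinary least squares stands in for the soil inference system (the thesis itself uses much richer models), and the prediction standard error follows the usual linear-model formula, which ignores spatial autocorrelation. All variable names and numbers are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Input: synthetic "soil samples" with two covariates (hypothetical data).
n_sites = 200
X_sites = rng.normal(size=(n_sites, 2))
y_sites = (1.5 + 0.8 * X_sites[:, 0] - 0.4 * X_sites[:, 1]
           + rng.normal(scale=0.3, size=n_sites))

# Covariate rasters on a 50 x 40 grid, flattened to one row per grid cell.
n_rows, n_cols = 50, 40
X_grid = rng.normal(size=(n_rows * n_cols, 2))


def add_intercept(X):
    """Prepend a column of ones to a covariate matrix."""
    return np.column_stack([np.ones(len(X)), X])


# 2) Process: fit f by ordinary least squares (stand-in for any DSM model).
A = add_intercept(X_sites)
beta, *_ = np.linalg.lstsq(A, y_sites, rcond=None)
residuals = y_sites - A @ beta
sigma2 = residuals @ residuals / (n_sites - A.shape[1])  # residual variance
cov_beta = sigma2 * np.linalg.inv(A.T @ A)               # coefficient covariance

# 3) Output: raster of predictions plus standard error of a new observation.
G = add_intercept(X_grid)
pred = (G @ beta).reshape(n_rows, n_cols)
pred_var = sigma2 + np.einsum("ij,jk,ik->i", G, cov_beta, G)
pred_se = np.sqrt(pred_var).reshape(n_rows, n_cols)
```

If new soil samples or covariates become available, rerunning steps 2 and 3 updates both rasters, which is the property highlighted in component 3.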
Final maps are spatial grids of predictions that map gradual changes of soil proper-
ties, in contrast to the crisp polygons in CSM. Digital maps predict soil properties
at predefined soil depth layers of the input soil data. Soil types are represented as
probability of occurrence. Quality of DSM output is verified through comparison
of final maps to soil data not previously used in the DSM process. Ideally, these
samples are selected by a randomized design independently from the input to the
DSM process (Brus et al., 2011). DSM exploits existing soil data and can answer
new questions by reanalysis of available data.
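
The comparison of predictions with independent soil data can be sketched as follows. ME, RMSE and SSmse are listed among the abbreviations of this thesis, but the exact formulas below are common textbook definitions assumed here for illustration: the sign convention of ME (prediction minus observation) and the use of the mean of the validation observations as reference in the skill score are assumptions. The function name is hypothetical.

```python
import numpy as np


def validation_stats(observed, predicted):
    """Compare predictions with independent validation observations."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - observed
    mse = np.mean(error ** 2)
    # Reference MSE: always predicting the mean of the validation observations.
    mse_ref = np.mean((observed - observed.mean()) ** 2)
    return {
        "ME": float(np.mean(error)),         # bias of the predictions
        "RMSE": float(np.sqrt(mse)),         # typical size of prediction errors
        "SS_mse": float(1.0 - mse / mse_ref) # 1 = perfect, 0 = no better than mean
    }
```

With these definitions, a skill score near zero means the map predicts no better than the mean of the validation data, and negative values mean it predicts worse.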

1.2. Research projects

The results presented in my thesis were part of two research projects. First,
Chapters 2 and 3 relate to work done under the umbrella of the Swiss National
Research Programme NRP 68 Sustainable use of soil as a resource in the research


project PMSoil: Predictive mapping of soil properties for the evaluation of soil
functions at regional scale. PMSoil was a joint effort of ETH Zurich, the Swiss
Federal Institute for Forest, Snow and Landscape Research (WSL), Agroscope
(ART), the Swiss Soil Monitoring Network (NABO) and the University of Zurich.
It contained four workpackages: A) harmonization of legacy soil data from different
sources and sampled over a long time period, B) multi-scale terrain modelling
by different approaches and evaluation of hyperspectral remote sensing for DSM,
C) evaluation of statistical modelling approaches for DSM, especially geoadditive
models, and D) pedotransfer functions and soil function maps for comprehensive
soil function assessment. Study regions of PMSoil were determined by availability
of soil and remote sensing data and needs of stakeholders and two other NRP 68
research projects OPSOL: Collaborative decision support system for matching soil
functions and soil uses and iMSoil: Modelling agricultural management and soil
functions. My thesis is mainly concerned with workpackage C of PMSoil.
Second, results in Chapters 4 and 5 evolved within the project Organic carbon
stocks in forest soils of Switzerland which supported the Swiss greenhouse gas
inventory done by the Swiss Federal Office for the Environment (FOEN).

1.3. Research questions and objectives

The work of my thesis was guided by the following questions (details on research
questions and objectives are given in the introduction sections of Chapters 2 to 5
and are not repeated here):

• Spatial generalized additive models (geoGAM, Hothorn et al., 2011) had
not yet been used for DSM. For large sets of covariates geoGAM can be efficiently
built by componentwise gradient boosting (Bühlmann and Hothorn, 2007).
How feasible is such a boosted geoadditive modelling approach to predict soil
properties or classes and to model prediction uncertainty? Do the practical
requirements allow routine use of geoGAM for DSM?
• Do geoGAM outperform other DSM approaches such as external-drift kriging
(EDK), boosted regression trees (BRT) or random forest (RF) to map soil
properties? Can the convenient but rarely used lasso (least absolute shrink-
age and selection operator) compete with geoGAM, EDK, BRT or RF? Is
model averaging better than computing predictions by single models? How
precise are DSM predictions compared to predictions derived from a conven-
tional soil map?


Following from the above questions one can formulate the objectives that guided
my work:

1. Develop a new DSM method based on geoGAM that processes binary, or-
dinal and continuous responses and any kind of covariate data, accounts for
nonlinear and nonstationary relationships between responses and covariates,
automatically builds parsimonious and readily interpretable models for large
sets of covariates and estimates the full predictive distributions at unvisited
sites by bootstrapping.
2. Validate statistical methods (geoGAM, lasso, EDK, BRT, RF) for DSM of
soil properties in three study regions in Switzerland and assess the accuracy
of predictions by a conventional (legacy) soil map.
3. Generate maps of soil properties for study regions in Switzerland driven by
the needs of soil function assessment. Calibrate and validate the selected
DSM methods for the study regions.
4. Provide reliable spatial information on soil organic carbon (SOC) stocks of
forest soils in Switzerland along with reliable estimates of the precision of
predictions. And, in response to specific needs of FOEN, estimate mean
forest soil SOC stocks for the whole country and its main regions, again with
estimates of standard errors.

1.4. Structure of PhD thesis

My thesis is a compilation of four (peer-reviewed) articles forming its main
Chapters 2 to 5. The first article introduces a strategy to build parsimonious geoadditive
models largely automatically if numerous covariates are available. The second ar-
ticle evaluates how well several statistical DSM approaches predict multiple soil
properties in three study regions in Switzerland. The third short article presents
a new pedotransfer function (PTF) to predict bulk density of forest soil horizons
if such measurements are missing. The fourth article describes an application of
robust geostatistics to map soil organic carbon stocks stored in Swiss forest soils.
The last chapter draws a few general conclusions from the presented work and
closes with an outlook how DSM might in the future help to generate spatial soil
information in Switzerland. Finally, the appendix contains additional material to
Chapters 2, 3 and 5.

2. Mapping of soil properties at
high resolution using boosted
geoadditive models

Chapter 2 was published as research article: Nussbaum, M., Walthert, L., Fraefel, M., Greiner,
L., and Papritz, A.: Mapping of soil properties at high resolution in Switzerland using boosted
geoadditive models, SOIL Discussions, 2017, 1–32, doi:10.5194/soil-2017-13, URL
http://www.soil-discuss.net/soil-2017-13/, 2017 (accepted on 06/10/2017 to be published in SOIL).

Abstract

High-resolution maps of soil properties are a prerequisite for assessing soil threats
and soil functions and to foster sustainable use of soil resources. For many regions
in the world accurate maps of soil properties are missing, but often sparsely sam-
pled (legacy) soil data are available. Soil property data (response) can then be
related by digital soil mapping (DSM) to spatially exhaustive environmental data
that describe soil forming factors (covariates) to create spatially continuous maps.
With air- and spaceborne remote sensing and multi-scale terrain analysis large sets
of covariates have become common. Building parsimonious models, amenable to
pedological interpretation, is then a challenging task.
We propose a new boosted geoadditive modelling framework (geoGAM) for
DSM. A geoGAM models smooth nonlinear relations between responses and single
covariates and combines these model terms additively. Residual spatial autocorre-
lation is captured by a smooth function of spatial coordinates, and nonstationary
effects are included by interactions between covariates and smooth spatial func-
tions. The core of fully automated model building for geoGAM is componentwise
gradient boosting.
We illustrate the application of the geoGAM framework by using soil data from


the Canton of Zurich, Switzerland. We modelled effective cation exchange capacity
(ECEC) in forest topsoils as continuous response. For agricultural land we
predicted the presence of waterlogged horizons in given soil depths as binary and
drainage classes as ordinal responses. For the latter we used proportional odds
geoGAM, taking the ordering of the response properly into account. Fitted geoGAM
contained only a few covariates (7 to 17) selected from large sets (333 covariates for
forests, 498 for agricultural land). Model sparsity allowed covariate interpretation
by partial effects plots. Prediction intervals were computed by model-based boot-
strapping for ECEC. Predictive performance of the fitted geoGAM, tested with
independent validation data and specific skill scores (SS) for continuous, binary
and ordinal responses, compared well with other studies that modelled similar
soil properties. SS of 0.23 up to 0.53 (with SS = 1 for perfect predictions and
SS = 0 for zero explained variance) were achieved depending on response and type
of score. geoGAM combines efficient model building from large sets of covariates
with ease of effect interpretation and therefore likely raises the acceptance of DSM
products by end-users.

2.1. Introduction

Soils fulfil many functions important for agriculture, forestry and the management
of soil resources and natural hazards. The functionality of soils depends on their
properties, hence, accurate and spatially highly resolved maps of basic soil prop-
erties such as texture, organic carbon content and pH for specific soil depth are
needed for sustainable management of soils (FAO and ITPS, 2015). Unfortunately,
such soil property maps are often missing and availability of soil information is very
different between nations and continents (Omuto and Nachtergaele, 2013). For ar-
eas where spatially referenced, but sparse (legacy) soil data is available, e.g. soil
datasets consisting of soil profile data and laboratory measurements, these point
data can be linked using digital soil mapping (DSM) techniques (e.g. McBratney
et al., 2003; Scull et al., 2003) to spatial information on soil formation factors to
generate spatially continuous maps.
In the past, many DSM approaches have been proposed to exploit the correla-
tion between soil properties (response Y (s)) and soil forming factors (covariates
x(s)). Linear regression modelling (LM, e.g. Meersmans et al., 2008; Hengl et al.,
2014) and kriging with external drift (EDK), its extension for autocorrelated errors


(Chapt. 5, Bourennane et al., 1996), have often been used. Strengths of LM and
EDK are the ease of interpretation of the fitted models (e.g. by partial residual
plots, Faraway, 2005, p. 73). This is important for checking whether modelled
relations between the target soil property and soil forming factors accord with
pedological expertise and for conveying results of DSM analyses to users of such
products. LM and EDK capture only linear relations between the covariates and
a response. By using interactions between covariates, one can sometimes account
for nonlinear relationships, but this quickly becomes unwieldy for a large number
of covariates (e.g. above 30). Fitting models to (very) large sets of covariates has
become common with the advent of remotely sensed data (Ben-Dor et al., 2009;
Mulder et al., 2011) and novel approaches for terrain analysis (Behrens et al.,
2010a). Model building, i.e. covariate selection, is then a formidable task. Al-
though specialized methods like L2-boosting (Bühlmann and Hothorn, 2007) and
lasso (least absolute shrinkage and selection operator, Hastie et al., 2009, Chapt. 3)
are available, they have not often been used for DSM (Chapt. 5, Liddicoat et al.,
2015; Fitzpatrick et al., 2016). Generalized linear models (GLM, e.g. Dobson, 2002)
extend linear modelling to binary, nominal (e.g. soil taxonomic units, Hengl et al.,
2014; Heung et al., 2016) or ordinal responses (e.g. soil drainage classes, Campling
et al., 2002). Although GLM are nonlinear models, the nonlinearly transformed
conditional expectation g(E[Y (s)|x(s)]) – g(·) is some known link function – still
depends linearly on covariates.
Lately, tree-based machine learning methods have become popular for DSM:
Classification and regression trees (CART, e.g. Liess et al., 2012; Heung et al.,
2016), Cubist, (e.g. Henderson et al., 2005; Adhikari et al., 2013; Lacoste et al.,
2016) and ensemble tree methods like random forest (RF, e.g. Grimm et al., 2008;
Wiesmeier et al., 2011) and boosted regression trees (BRT, e.g. Moran and Bui,
2002; Martin et al., 2011) were used. All tree-based methods easily account for
complex nonlinear relations between responses and covariates. They model con-
tinuous and categorical responses (albeit without making a difference between
nominal and ordinal responses), inherently deal with incomplete covariate data
and allow modelling of spatially changing (nonstationary) relationships. BRT and
RF fit models to large sets of covariates. The structure of the fitted models can
be explored by variable importance and partial dependence plots (Hastie et al.,
2009, Sect. 10.9, and Martin et al., 2011, for an application). Nevertheless, tree-
based ensemble methods remain complex, and results are not as easy to interpret
regarding the relevant soil forming factors as results from (G)LM.


Generalized additive models (GAM, e.g. Hastie and Tibshirani, 1990, Chapt. 6)
offer a compromise between ease of interpretation and flexibility in modelling
nonlinear relationships. GAM expand the (possibly transformed) conditional
expectation of a response given covariates as an additive series
$$g\big(\mathrm{E}[Y(s) \mid x(s)]\big) = \nu + f(x(s)) = \nu + \sum_j f_j(x_j(s)), \tag{2.1}$$

where ν is a constant and fj (xj (s)) are linear terms or unspecified “smooth” non-
linear functions of single covariates xj (s) (e.g. smoothing spline, kernel or any
other scatterplot smoother) and g(·) is again a link function. GAM extend GLM
to account for truly nonlinear relations between Y and x (and not just for non-
linearities imposed by g), but they limit the complexity of the fitted functions to
additive combinations of simple nonlinear terms and thereby avoid the curse of
dimensionality (Hastie et al., 2009, Sect. 2.5). For continuous, ordinal and nom-
inal responses, GAM can be readily fitted to large sets of covariates by boosting
(Hofner et al., 2014; Hothorn et al., 2015). Boosting handles covariate selection
and avoids over-fitting if stopped early (Bühlmann and Hothorn, 2007). Hence,
the structure of boosted GAM can be more easily checked and interpreted than
RF and BRT models. In the past, GAM have occasionally been used for DSM
and only recently became more popular (e.g. Buchanan et al., 2012; Poggio et al.,
2013; Poggio and Gimona, 2014; de Brogniez et al., 2015; Sindayihebura et al.,
2017).
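
As a minimal illustration of Eq. (2.1), the following Python sketch evaluates an additive predictor for one location and maps it back through the inverse of a logit link (a common choice for binary responses). The covariate names, the two partial effects and the value of the constant are hypothetical and serve only to show the structure; they are not fitted effects from this work.

```python
import math

# Hypothetical partial effects f_j of single covariates (illustration only).
def f_slope(x):
    return -0.5 * x      # a linear term

def f_rain(x):
    return math.sin(x)   # stands in for an unspecified smooth function

NU = 0.2  # constant nu (assumed value)

def additive_predictor(covariates):
    """Return nu + sum_j f_j(x_j) for one location s."""
    effects = {"slope": f_slope, "rain": f_rain}
    return NU + sum(f(covariates[name]) for name, f in effects.items())

def inverse_logit(eta):
    """Inverse of the logit link g."""
    return 1.0 / (1.0 + math.exp(-eta))

x_s = {"slope": 0.4, "rain": 1.0}
eta = additive_predictor(x_s)   # identity link: eta is E[Y(s) | x(s)]
prob = inverse_logit(eta)       # logit link: eta is the log-odds of Y(s) = 1
```

Because each covariate enters through its own term, the contribution of a single covariate can be read off directly, which is the interpretability argument made above.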
Besides accurate predictions, sometimes also accurate modelling of prediction
uncertainty matters for DSM (e.g. for mapping temporal changes of soil carbon and
nutrients stocks). Quantile regression forest (Meinshausen, 2006), an extension of
RF, estimates the quantiles of the distributions Y (s)|x(s) and provides prediction
intervals directly. Prediction intervals can also easily be constructed for predictions
by EDK, (G)LM and GAM, as long as the uncertainty arising from model building
is ignored. To take the effect of model building properly into account one resorts
best to bootstrapping (Davison and Hinkley, 1997, Sect. 6.3.3). Bootstrapping is
also useful to model prediction uncertainty for boosted models, which per se do
not qualify the accuracy of predictions, and to account for all sources of prediction
uncertainty of regression kriging approaches (Viscarra Rossel et al., 2014).
In summary, a versatile DSM procedure should
1. model nonlinear relations between Y (s) and x(s), where responses and
covariates may be continuous, binary, nominal or ordinal variables,


2. efficiently build models with good predictive performance for large sets of
covariates (p >> 30),
3. preferably result in parsimonious models with a simple structure that can be
easily interpreted and checked for plausibility, and
4. accurately quantify the accuracy of predictions computed from the fitted
models.
The objective of our work was to develop a DSM framework that meets require-
ments 1–4 based on boosted geoadditive models (geoGAM), an extension of GAM
for spatial data. First, we introduce the modelling framework and describe in
detail the model building procedure. Second, we use the method in three DSM
case studies in the Canton of Zurich, Switzerland, aiming at different types of re-
sponses: Effective cation exchange capacity (ECEC) of forest topsoils (continuous
response), presence/absence of morphological features for waterlogging in agricul-
tural soils (binary response), and drainage classes, characterizing prevalence of
anoxic conditions, again in agricultural soils (ordinal response). To assess the va-
lidity of the modelling results with independent data (obtained by splitting the
original dataset into calibration and validation subsets), we used specific criteria
that take the nature of the various responses properly into account. These crite-
ria are in common use for forecast verification in atmospheric sciences (e.g. Wilks,
2011), but, to our knowledge, have not been much used for (cross-)validating DSM
predictions.

2.2. geoGAM modelling framework

2.2.1. Model representation

A generalized additive model (GAM) is based on the following components (Hastie
and Tibshirani, 1990, Chapt. 6 and Eq. (2.1)): i) Response distribution: Given
$x(s) = (x_1(s), x_2(s), \ldots, x_p(s))^T$, the $Y(s)$ are conditionally independent
observations from simple exponential family distributions. ii) Link function: $g(\cdot)$
relates the expectation $\mu(x(s)) = \mathrm{E}[Y(s) \mid x(s)]$ of the response distribution
to iii) the additive predictor $\sum_j f_j(x_j(s))$.

geoGAM extend GAM by allowing a more complex form of the additive predictor


(Kneib et al., 2009; Hothorn et al., 2011): First, one can add a smooth function
fs (s) of the spatial coordinates (smooth spatial surface) to the additive predictor
to account for residual autocorrelation. More complex relationships between Y
and x can be modelled by adding terms like fj (xj (s)) · fk (xk (s)) – capturing the
effect of interactions between covariates – and fs (s) · fj (xj (s)) – accounting for
spatially changing dependence between Y and x. Hence, in its full generality, a
generalized additive model for spatial data is represented by

$$
g(\mu(x(s))) = \nu + f(x(s)) = \nu
+ \underbrace{\sum_u f_{j_u}(x_{j_u}(s)) + \sum_v f_{j_v}(x_{j_v}(s)) \cdot f_{k_v}(x_{k_v}(s))}_{\text{global marginal and interaction effects}}
+ \underbrace{\sum_w f_{s_w}(s) \cdot f_{j_w}(x_{j_w}(s))}_{\text{nonstationary effects}}
+ \underbrace{f_s(s)}_{\text{autocorrelation}}. \tag{2.2}
$$

Kneib et al. (2009) called Eq. (2.2) a geoadditive model, a name coined before by
Kammann and Wand (2003) for a combination of Eq. (2.1) with a geostatistical
error model.
It remains to specify what response distributions and link functions should be
used for the various response types: For (possibly transformed) continuous
responses one often uses a normal response distribution combined with the identity
link $g(\mu(x(s))) = \mu(x(s))$. For binary data (coded as 0 and 1), one assumes a
Bernoulli distribution and often uses a logit link
$$g(\mu(x(s))) = \log\left(\frac{\mu(x(s))}{1 - \mu(x(s))}\right), \tag{2.3}$$
where
$$\mu(x(s)) = \mathrm{Prob}[Y(s) = 1 \mid x(s)] = \frac{\exp(\nu + f(x(s)))}{1 + \exp(\nu + f(x(s)))}. \tag{2.4}$$
For ordinal data, with ordered response levels, 1, 2, . . . , k, we used the cumulative
logit or proportional odds model (Tutz, 2012, Sect. 9.1). For any given level
r ∈ (1, 2, . . . , k), the logarithm of the odds of the event Y (s) ≤ r | x(s) is then
modelled by
$$\log\left(\frac{\mathrm{Prob}[Y(s) \le r \mid x(s)]}{\mathrm{Prob}[Y(s) > r \mid x(s)]}\right) = \nu_r + f(x(s)), \tag{2.5}$$


with $\nu_r$ a sequence of level-specific constants satisfying $\nu_1 \le \nu_2 \le \ldots \le \nu_r$.
Conversely,
$$\mathrm{Prob}[Y(s) \le r \mid x(s)] = \frac{\exp(\nu_r + f(x(s)))}{1 + \exp(\nu_r + f(x(s)))}. \tag{2.6}$$
Note that Prob[Y (s) ≤ r | x(s)] depends on r only through the constant νr . Hence,
the ratio of the odds of two events Y (s) ≤ r | x(s) and Y (s) ≤ r | x̃(s) is the same
for all r (Tutz, 2012, p. 245).
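
The proportional odds model of Eqs. (2.5) and (2.6) can be illustrated with a short Python sketch. It assumes k − 1 finite level-specific constants (since Prob[Y ≤ k] = 1), and the numerical values of the constants and of f(x(s)) are hypothetical. The last lines check the property stated above: the odds ratio between two covariate settings does not depend on r.

```python
import math

def cumulative_probs(thresholds, f_x):
    """Prob[Y <= r | x] for r = 1..k-1 following Eq. (2.6); thresholds are nu_r."""
    return [1.0 / (1.0 + math.exp(-(nu_r + f_x))) for nu_r in thresholds]

def class_probs(thresholds, f_x):
    """Prob[Y = r | x] for r = 1..k as differences of cumulative probabilities."""
    cum = cumulative_probs(thresholds, f_x) + [1.0]
    return [cum[0]] + [cum[r] - cum[r - 1] for r in range(1, len(cum))]

def odds(p):
    return p / (1.0 - p)

nu = [-1.0, 0.5]                 # hypothetical non-decreasing constants, k = 3 levels
p_a = class_probs(nu, f_x=0.3)   # class probabilities for one covariate setting
p_b = class_probs(nu, f_x=-0.8)  # and for a second setting

# Odds ratio of the events Y <= r for the two settings, one value per r.
ratios = [odds(a) / odds(b)
          for a, b in zip(cumulative_probs(nu, 0.3), cumulative_probs(nu, -0.8))]
```

In this sketch every entry of `ratios` equals exp(0.3 − (−0.8)), because the constants cancel in the ratio.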

2.2.2. Model building (selection of covariates)

To build parsimonious models that can readily be checked for agreement with
pedological understanding, we applied a sequence of fully automated steps 1–6. In
several of these steps we optimised tuning parameters by 10-fold cross-validation
with fixed subsets using either root mean squared error (RMSE, Eq. (2.12), con-
tinuous responses), Brier score (BS, Eq. (2.16), binary responses) or ranked prob-
ability score (RPS, Eq. (2.18), ordinal responses) as optimisation criteria. Model
building aims to optimise accuracy of predictions, hence we did not use equivalent
“goodness-of-fit” statistics. To improve the stability of the algorithm, continuous
covariates were first scaled (by the difference of maximum and minimum value) and
centred.
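
The two preparatory ideas in this paragraph, scaling of covariates and cross-validation with fixed subsets, can be sketched in Python as follows. Centring on the mean and the random assignment of sites to folds are assumptions of this sketch; the covariate values are hypothetical.

```python
import random

def scale_and_centre(values):
    """Divide by the range (max - min), then subtract the mean of the scaled values.

    Centring on the mean is an assumption; a constant covariate (range 0)
    would need to be dropped beforehand.
    """
    value_range = max(values) - min(values)
    scaled = [v / value_range for v in values]
    mean_scaled = sum(scaled) / len(scaled)
    return [v - mean_scaled for v in scaled]

def fixed_folds(n_sites, n_folds=10, seed=1):
    """Assign each site to one of n_folds subsets once, so that every
    tuning step can reuse exactly the same split."""
    fold_ids = [i % n_folds for i in range(n_sites)]
    random.Random(seed).shuffle(fold_ids)
    return fold_ids

elevation = [340.0, 520.0, 780.0, 1170.0]   # hypothetical covariate values
elevation_std = scale_and_centre(elevation)
folds = fixed_folds(n_sites=25)
```

Keeping the fold assignment fixed makes the cross-validated scores of different tuning parameter values directly comparable.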

1. Boosting (see step 2 below) is more stable and converges more quickly when
the effects of categorical covariates (factors) are accounted for as model offset.
We therefore used the group lasso (Breheny and Huang, 2015) – an algorithm
that likely excludes nonrelevant covariates and treats factors as groups – to
select important factors for the offset. For ordinal responses (Eq. (2.6)) we
used stepwise proportional odds logistic regression in both directions with
BIC (e.g. Faraway, 2005, p. 126) to select the offset covariates because lasso
cannot be used for such responses.
2. Next, we selected a subset of relevant factors, continuous covariates and spa-
tial effects by componentwise gradient boosting. Boosting is a slow stagewise
additive learning algorithm. It expands f (x(s)) in a set of base procedures
(baselearners) and approximates the additive predictor by a finite sum of
them as follows (Bühlmann and Hothorn, 2007):

a) Initialize $\hat{f}(x(s))^{[m]}$ by offset of step 1 above and set m = 0.
b) Increase m by 1. Compute the negative gradient vector $U^{[m]}$ (e.g.
residuals) for a loss function l(·).


c) Fit all baselearners $f_j(x_j(s))$, j = 1, ..., p to $U^{[m]}$ and select the
baselearner, say $f_k(x_k(s))^{[m]}$, that minimizes l(·).
d) Update $\hat{f}(x(s))^{[m]} = \hat{f}(x(s))^{[m-1]} + v \cdot f_k(x_k(s))^{[m]}$ with step size v ≤ 1.
e) Iterate steps (b) to (d) until m = mstop (main tuning parameter).

We used the following settings in the above algorithm: As loss functions l(·) we
used L2 for continuous, negative binomial likelihood for binary (Bühlmann
and Hothorn, 2007) and proportional odds likelihood for ordinal responses
(Schmid et al., 2011). Early stopping of the boosting algorithm was achieved
by determining optimal mstop by cross-validation. We used the default step
length (v = 0.1). This is not a sensitive parameter as long as it is clearly
below 1 (Hofner et al., 2014). For continuous covariates we used penalized
smoothing spline baselearners (Kneib et al., 2009). Factors were treated as
linear baselearners. To capture residual autocorrelation we added a bivari-
ate tensor-product P-spline of spatial coordinates (Wood, 2006, pp. 162) to
the additive predictor. Spatially varying effects were modelled by baselearn-
ers formed by multiplication of continuous covariates with tensor-product
P-splines of spatial coordinates (Wood, 2006, pp. 168). Unequal degrees of
freedom of baselearners bias baselearner selection (Hofner et al., 2011).
We therefore penalized each baselearner to 5 degrees of freedom (df ).
Factors with fewer than 6 levels (df < 5) were aggregated to grouped baselearners.
By using an offset, effects of important factors with more than 6 levels were
implicitly accounted for without penalization.
3. At mstop (see step 2 above), many included baselearners had very small
effects only. We fitted generalized additive models (GAM, Wood, 2011)
and included smooth and factor effects only if the effect size ej of the
corresponding baselearner fj (xj (s)) was substantial. Effect size ej of factors
was the largest difference between effects of two levels and for continuous
covariates it was equal to the maximum contrast of estimated partial effects
(after removal of extreme values as in boxplots, Frigge et al., 1989). We
iterated through ej and excluded covariates with ej smaller than a threshold
effect size et . Optimal et was determined by 10-fold cross-validation of GAM.
In these GAM fits smooth effects were penalized by 5 degrees of freedom as
imposed by componentwise gradient boosting (step 2 above). The factors
selected as offset in step 1 were now included in the GAM.
4. We further reduced the GAM by stepwise removal of covariates by
cross-validation. The candidate covariate to drop was chosen by largest p value of
F tests for linear terms and approximate F tests (Wood, 2011) for smooth
terms.
5. Factor levels with similar estimated effects were merged stepwise again by
cross-validation based on largest p values from two sample t-tests of partial
residuals.
6. The final model (used to compute spatial predictions) was a parsimonious
GAM. Because of step 5, factors had possibly a reduced number of coeffi-
cients. Effects of continuous covariates were modelled by smooth functions
and – if at all present – spatially structured residual variation (autocorrela-
tion) was represented by a smooth spatial surface. To avoid over-fitting both
types of smooth effects were penalized by 5 degrees of freedom (as imposed
by step 2).

Model building steps 1 to 6 were implemented in the R package geoGAM (Nussbaum,
2017).
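
The boosting loop of step 2 (sub-steps a to e) can be sketched in plain Python for the L2 loss. This is a simplified illustration and not the implementation of the geoGAM package: it uses simple linear baselearners on centred covariates instead of penalized smoothing splines, takes the mean of the response as offset, and uses a fixed mstop instead of one chosen by cross-validation. The toy data are hypothetical.

```python
def fit_linear_baselearner(x, u):
    """Least-squares fit of u on a single centred covariate x (no intercept).
    Returns the coefficient and the residual sum of squares."""
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * ui for xi, ui in zip(x, u)) / sxx
    rss = sum((ui - beta * xi) ** 2 for xi, ui in zip(x, u))
    return beta, rss

def componentwise_l2_boost(X, y, m_stop=100, step=0.1):
    """X: dict mapping covariate name -> list of centred values; y: responses."""
    n = len(y)
    offset = sum(y) / n                                   # (a) initialise with an offset
    fit = [offset] * n
    coef = {name: 0.0 for name in X}
    for _ in range(m_stop):                               # (e) iterate until m_stop
        u = [yi - fi for yi, fi in zip(y, fit)]           # (b) negative gradient of L2 loss
        best = min(X, key=lambda name: fit_linear_baselearner(X[name], u)[1])  # (c)
        beta, _ = fit_linear_baselearner(X[best], u)
        coef[best] += step * beta                         # (d) update with step size
        fit = [fi + step * beta * xi for fi, xi in zip(fit, X[best])]
    return offset, coef

# Toy data: the response depends on x1 only, x2 is an irrelevant covariate.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [1.0, -1.0, 0.0, -1.0, 1.0]
y = [3.0 + 2.0 * v for v in x1]
offset, coef = componentwise_l2_boost({"x1": x1, "x2": x2}, y)
```

In this toy example only the x1 baselearner is ever selected, so the irrelevant covariate keeps a zero coefficient, which mirrors how boosting performs covariate selection when stopped early.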

2.2.3. Predictions and predictive distribution

Soil properties were predicted for new locations $s_+$ from the final geoGAM by
$\tilde{Y}(s_+) = \hat{f}(x(s_+))$. To model the predictive distributions for continuous
responses we used a nonparametric, model-based bootstrapping approach (Davison
and Hinkley, 1997, pp. 262, 285) as follows:
A. New values of the response were simulated according to $Y(s)^* = \hat{f}(x(s)) + \epsilon$,
where $\hat{f}(x(s))$ are the fitted values of the final model and $\epsilon$ are errors
randomly sampled with replacement from the centred, homoscedastic residuals
of the final model (Wood, 2006, p. 129).
B. The geoGAM was fitted to $Y(s)^*$ according to steps 1–6 of Sect. 2.2.2.
C. Prediction errors were computed according to
$\delta_+^* = \hat{f}(x(s_+))^* - (\hat{f}(x(s_+)) + \epsilon)$, where $\hat{f}(x(s_+))^*$ are predicted values at new
locations $s_+$ of the model built with the simulated response $Y(s)^*$ in step
B above, and the errors $\epsilon$ are again randomly sampled from the centred,
homoscedastic residuals of the final model (see step A).
Prediction intervals were computed according to

$$\left[\hat{f}(x(s_+)) - \delta^{*}_{+\,(1-\alpha)}\,;\; \hat{f}(x(s_+)) - \delta^{*}_{+\,(\alpha)}\right], \tag{2.7}$$


where $\delta^{*}_{+\,(\alpha)}$ and $\delta^{*}_{+\,(1-\alpha)}$ are the α- and (1 − α)-quantiles of $\delta_+^*$, pooled over all
1000 bootstrap repetitions.

Figure 2.1.: Location of the study regions Greifensee and ZH forest on the Swiss
Plateau (map labels: Jura, Plateau, Alps; scale bar 0 to 100 km). Data sources:
Biogeographical regions © 2001 BAFU / Swiss Boundary, Lakes © 2012 BFS GEOSTAT /
Boundaries Europe: NUTS © 2010 EuroGeographics.
Predictive distributions for binary and ordinal responses were directly obtained
from the final geoGAM fit by predicting probabilities of occurrence
$\widetilde{\mathrm{Prob}}(Y(s) = r \mid x(s))$ (Davison and Hinkley, 1997, p. 358).
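
The bootstrap steps A to C and the interval of Eq. (2.7) can be sketched in Python as follows. To keep the example self-contained, an ordinary least-squares line stands in for the full geoGAM model building of Sect. 2.2.2, the quantiles use a simple nearest-rank convention, and the data are hypothetical; all three are assumptions of this sketch.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (stand-in for model building)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def quantile(sorted_values, q):
    """Empirical quantile by nearest rank (an assumed convention)."""
    idx = int(round(q * (len(sorted_values) - 1)))
    return sorted_values[min(len(sorted_values) - 1, max(0, idx))]

def bootstrap_interval(x, y, x_new, alpha=0.05, n_boot=1000, seed=1):
    rng = random.Random(seed)
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    mean_resid = sum(resid) / len(resid)
    resid = [r - mean_resid for r in resid]                  # centred residuals
    pred = a + b * x_new
    deltas = []
    for _ in range(n_boot):
        y_star = [fi + rng.choice(resid) for fi in fitted]   # step A: simulate response
        a_star, b_star = fit_line(x, y_star)                 # step B: refit the model
        pred_star = a_star + b_star * x_new
        deltas.append(pred_star - (pred + rng.choice(resid)))  # step C: prediction error
    deltas.sort()
    # Eq. (2.7): lower limit uses the (1 - alpha)-quantile, upper limit the alpha-quantile.
    return pred - quantile(deltas, 1 - alpha), pred, pred - quantile(deltas, alpha)

x = [float(i) for i in range(10)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.0]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
lower, pred, upper = bootstrap_interval(x, y, x_new=4.5, n_boot=200)
```

Refitting the model in every repetition is what lets the interval reflect the uncertainty of model building in addition to the residual variation.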

2.3. Case studies - Materials and Methods

2.3.1. Study regions

We applied the modelling framework to 3 datasets on properties of forest and
agricultural soils in the Canton of Zurich in Switzerland (Fig. 2.1). Forests (ZH
forest), as defined by the Swiss topographic landscape model (swissTLM3D,
Swisstopo, 2013a), cover an area of 506.5 km², or roughly 30 % of the total area of
the Canton of Zurich. The spatial extent of the agricultural region near the Lake
Greifensee was defined by the availability of imaging spectroscopy data collected
by the APEX spectrometer (Schaepman et al., 2015). Agricultural land was de-
fined as the area not covered by any areal features such as settlements or forests
extracted from the Swiss topographic landscape model (swissTLM3D, Swisstopo,
2013a). Wetlands, forests, parks or city gardens were excluded, resulting in a study
region of 170 km².
In the Canton of Zurich, forests extend across altitudes ranging from 340 to


1170 m above sea level (a.s.l.), and in the Greifensee area elevation ranges from
390 to 840 m a.s.l. (Swisstopo, 2016a). The climatic conditions (period 1961–
1990, Zimmermann and Kienast, 1999) vary accordingly, with mean annual rainfall
between 880–1780 mm for the forested and 1040–1590 mm for the agricultural
study region. Mean annual temperatures range between 6.1–9.1 °C and 7.5–9.1 °C,
respectively. Two thirds of the forested area is dominated by coniferous trees
(FSO, 2000b). Half of the Greifensee study region is covered by crop land and one
third by permanent grassland. The remainder are orchards, horticultural areas
or mountain pastures (Hotz et al., 2005). In the Canton of Zurich, soils formed
mostly from Molasse formations and quaternary sediments dominantly from the
last glaciation (Würm). In the north-eastern part, the limestone Jura hills reach
into the ZH forest study region (Hantke, 1967).

2.3.2. Data

Soil data base

We used legacy soil data collected between 1985 and 2014. Data originates from
long-term soil monitoring of the Canton of Zurich (KaBo), a soil pollutant survey
(Wegelin, 1989), field surveys for creating soil maps of the agricultural land (Jäggli
et al., 1998) or soil investigations in the course of different projects by the Swiss
Federal Institute for Forest, Snow and Landscape Research (WSL, Walthert et al.,
2004). Sites for pollutant surveying were chosen on a regular grid, those for creating
soil maps were determined by purposive sampling (Webster and Lark, 2013, p. 86)
by field surveyors to best represent soils typical for the given landform. The sites of
WSL were chosen by purposive sampling according to the aims of the project. Soil
data was therefore quite heterogeneous, and tailored harmonisation procedures
were required to provide consistent soil datasets. The heterogeneity resulted from
several standards of soil description and soil classification, different data keys,
different analytical methods and in particular, often missing metadata for a proper
interpretation of the datasets. Therefore, we elaborated a general harmonisation
scheme that covers main steps required to merge different soil legacy data into one
common consistent database (Walthert et al., 2016). Sampling sites were recorded
in the field on topographic maps (scale 1:25 000), hence we estimated accuracy of
coordinates to about ± 25 m.


Effective cation exchange capacity (ECEC, forest soils)

After the removal of sites with missing covariate values, we used 1844 topsoil
samples from 1348 sites with data on effective cation exchange capacity (ECEC).
Most measurements refer to composite samples where aliquots were measured in
20 by 20 m squares from 0–20 cm soil depth. For about 100 sites, genetic horizons
of soil profiles were sampled. ECEC [mmolc kg⁻¹] for 0–20 cm was computed
from horizon data by
$$\mathrm{ECEC}_{0-20} = \sum_{i=1}^{h} w_i \, \mathrm{ECEC}_i, \tag{2.8}$$

where ECECi is the value for horizon i, wi is a weight given by soil density ρi and
the fraction of the thickness of horizon i within 0–20 cm and h is the number of
horizons intersecting the 0–20 cm depth. ρi was estimated from soil organic matter
(SOM) and/or sampling depth by a pedotransfer function (Appendix, Table A.4).
Due to a lack of respective data, the volumetric stone content was assumed to be
constant.
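
A small Python sketch of the depth weighting in Eq. (2.8) is given below. It assumes that the weights are proportional to soil density times the horizon thickness inside the 0–20 cm layer and that they are normalised to sum to one; the normalisation and the example profile are assumptions made for illustration.

```python
def ecec_0_20(horizons, top=0.0, bottom=20.0):
    """Depth-weighted ECEC for the layer between top and bottom (in cm).

    horizons: list of (upper_cm, lower_cm, density, ecec) tuples.
    Weights are density times thickness inside the layer, normalised to
    sum to one (the normalisation is an assumption of this sketch).
    """
    raw = []
    for upper, lower, density, ecec in horizons:
        thickness = max(0.0, min(lower, bottom) - max(upper, top))
        if thickness > 0.0:            # only horizons intersecting the layer count
            raw.append((density * thickness, ecec))
    total = sum(w for w, _ in raw)
    return sum(w * e for w, e in raw) / total

# Hypothetical profile: (upper depth, lower depth, density, ECEC).
profile = [(0, 5, 0.8, 200.0), (5, 30, 1.2, 100.0), (30, 60, 1.4, 50.0)]
value = ecec_0_20(profile)
```

For this profile only the first two horizons intersect 0–20 cm, with raw weights 0.8 · 5 = 4 and 1.2 · 15 = 18, so the result is (4 · 200 + 18 · 100) / 22.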
For most soil samples, ECEC was measured after extraction in an ammonium
chloride solution (FAC, 1989; Walthert et al., 2004, 2013). Roughly 5 % of the
samples had only measurements of Ca, Mg, K and Al (extracted by ammonium
acetate EDTA solution, Lakanen and Erviö, 1971; ELF, 1996; Gasser et al., 2011).
For these samples, we estimated ECEC by using a PTF (Nussbaum and Papritz,
2015).
We assigned 293 of 1348 sites (528 samples) to the validation set, which was
used to check the predictive performance of the fitted statistical model, and the
remaining 1055 sites (1316 samples) were used to calibrate the model. The legacy
samples were spatially clustered. To ensure that the validation sites were evenly
spread over the study region, the validation sites were selected by weighted random
sampling. The weight attributed to a site was proportional to the forested area
within its Dirichlet polygon (Dirichlet, 1850).
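
The selection of validation sites by weighted random sampling can be sketched as follows. The sketch assumes sampling without replacement and takes the forested area within each Dirichlet polygon as given; site identifiers and areas are hypothetical.

```python
import random

def weighted_sample_without_replacement(site_ids, weights, n_select, seed=1):
    """Draw n_select distinct sites with probability proportional to weight."""
    rng = random.Random(seed)
    remaining = list(zip(site_ids, weights))
    chosen = []
    for _ in range(n_select):
        total = sum(w for _, w in remaining)
        threshold = rng.uniform(0.0, total)
        cumulative = 0.0
        for index, (site, w) in enumerate(remaining):
            cumulative += w
            # The second condition guards against floating-point round-off.
            if cumulative >= threshold or index == len(remaining) - 1:
                chosen.append(site)
                del remaining[index]
                break
    return chosen

sites = ["a", "b", "c", "d", "e"]
forest_area = [10.0, 0.5, 3.0, 7.5, 1.0]   # hypothetical areas per Dirichlet polygon
validation = weighted_sample_without_replacement(sites, forest_area, n_select=2)
```

Sites in sparsely sampled parts of the region have large polygons and are therefore more likely to be selected, which counteracts the spatial clustering of the legacy samples.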
We found a considerable variation in ECEC values ranging from 17.4 to 780
mmolc kg⁻¹ (median 141.1 mmolc kg⁻¹, Appendix, Table A.1). On average, ECEC
was slightly larger in the calibration than in the validation set.


Presence of waterlogged soil horizons (agricultural soils)

Waterlogging characteristics were recorded in the field at 962 sites within the
Greifensee study region by visual evaluation (Jäggli et al., 1998). Swiss soil clas-
sification distinguishes horizon qualifiers gg (strongly gleyic, predominantly oxi-
dized) and r (anoxic, predominantly reduced) and both are believed to limit plant
growth (Jäggli et al., 1998; Müller et al., 2007; Litz, 1998; Danner et al., 2003;
Kreuzwieser and Rennberg, 2014).
We constructed binary responses for three soil depths 0–30 cm, 0–50 cm and
0–100 cm. If one of the horizon qualifiers gg or r was recorded within the interval,
we assigned 1 = presence of waterlogged horizons and 0 = absence of waterlogged
soil horizons otherwise.
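
A minimal Python sketch of this coding rule is given below. The data layout (a list of horizons with upper and lower depth and recorded qualifiers) and the function name are hypothetical, and the rule that a horizon counts as soon as it intersects the depth interval is our reading of "recorded within the interval".

```python
WATERLOGGING_QUALIFIERS = {"gg", "r"}


def waterlogged_within(horizons, depth_cm):
    """Return 1 if a gg or r horizon intersects 0..depth_cm, else 0.

    horizons: iterable of (upper_cm, lower_cm, qualifiers) tuples, where
    qualifiers is a collection of horizon qualifier strings.
    """
    for upper, lower, qualifiers in horizons:
        intersects = upper < depth_cm and lower > 0
        if intersects and WATERLOGGING_QUALIFIERS & set(qualifiers):
            return 1
    return 0


# Hypothetical profile: gleyic horizon from 40 cm, reduced horizon from 70 cm
profile = [(0, 40, []), (40, 70, ["gg"]), (70, 120, ["r"])]
responses = [waterlogged_within(profile, d) for d in (30, 50, 100)]
```

By construction the three responses are nested: a site coded 1 for a shallow interval is also coded 1 for every deeper interval.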
We chose 198 of 962 sites to form a validation set, again by using weighted
random sampling. The remaining 764 sites were used to build and fit the models.
In the topsoil (0–30 cm) gg or r horizon qualifiers were only observed at 13.4 % of
the 962 sites. Down to 50 cm about twice as many sites (25.9 %) showed signs of
anoxic conditions and down to 1 m already 38.6 % of sites featured an anoxic or
gleyic horizon (Appendix, Table A.2).

Drainage classes (agricultural soils)

Swiss soil classification differentiates hydromorphic features of soils in more detail,
describing the degree, depth and source of waterlogging by 3 supplementary qual-
ifiers for stagnic, gleyic or anoxic profiles (I, G, R; categorical attributes, Brunner
et al., 1997). To reduce complexity of classification, we aggregated these qualifiers
to three ordered levels well drained (qualifiers I1–I2, G1–G3, R1 or no hydromor-
phic qualifier), moderately well drained (I3–I4, G4) and poorly drained (G5–G6,
R2–R5).
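
The aggregation can be written as a simple lookup, sketched here in Python. The names are hypothetical, and the sketch assumes that each site carries at most one supplementary qualifier, with None standing for "no hydromorphic qualifier". Sites with missing records are a different case and would have to be excluded beforehand.

```python
WELL, MODERATE, POOR = 0, 1, 2  # ordered levels of the response

_LEVEL_BY_QUALIFIER = {}
for _q in ("I1", "I2", "G1", "G2", "G3", "R1"):
    _LEVEL_BY_QUALIFIER[_q] = WELL
for _q in ("I3", "I4", "G4"):
    _LEVEL_BY_QUALIFIER[_q] = MODERATE
for _q in ("G5", "G6", "R2", "R3", "R4", "R5"):
    _LEVEL_BY_QUALIFIER[_q] = POOR


def drainage_level(qualifier):
    """Map a supplementary qualifier to the aggregated drainage level."""
    if qualifier is None:  # no hydromorphic qualifier recorded
        return WELL
    return _LEVEL_BY_QUALIFIER[qualifier]
```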
For validation we used the same 198 sites as for presence of waterlogged soil
horizons, but only 732 sites were used for model building due to missing supple-
mentary qualifiers. The majority (66.6 %) of the 930 sites were well drained, only
12.7 % were classified as moderately well drained and 20.7 % as poorly drained
(Appendix, Table A.3).


Covariates for statistical modelling

To represent local soil formation conditions, we used data from 23 sources (Ta-
ble 2.1). For ECEC a total of 333 covariates were used to describe climatic (71
covariates) and topographic conditions (196 covariates). For the agricultural land,
we used in addition 180 spectral bands of the APEX spectrometer, spatial infor-
mation on historic wetlands and agricultural drainage networks resulting in 498
covariates in total.

2.3.3. Statistical analysis

We built models for the five responses according to Sect. 2.2.2 and computed
predictions for new locations at nodes of a 20 m-grid. Predictions were post-
processed in the following way:

Response transformation

ECEC data in 0–20 cm soil depth was positively skewed (Appendix, Table A.1),
hence we fitted the model to the log-transformed data. In full analogy to lognormal
kriging (Cressie, 2006, Eq. (20)), the predictions were backtransformed by
 
\[
E[Y(s_+) \mid x] = \exp\left( \hat{f}(x(s_+)) + \tfrac{1}{2}\,\hat{\sigma}^2 - \tfrac{1}{2}\,\mathrm{Var}[\hat{f}(x(s_+))] \right) \tag{2.9}
\]

with $\hat{f}(x(s_+))$ being the prediction of the log-transformed response, $\hat{\sigma}^2$ the esti-
mated residual variance of the final geoGAM fit and $\mathrm{Var}[\hat{f}(x(s_+))]$ the variance of
$\hat{f}(x(s_+))$ as provided again by the final geoGAM. Limits of prediction intervals
were backtransformed by exp(·) as they are quantiles of the predictive distribu-
tions.
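
Eq. (2.9) amounts to a one-line computation, shown here as a Python sketch with hypothetical argument names. The three inputs are assumed to be available from the fitted model on the log scale.

```python
import math


def backtransform_lognormal(f_hat, sigma2_hat, var_f_hat):
    """Backtransform a prediction on the log scale following Eq. (2.9)."""
    return math.exp(f_hat + 0.5 * sigma2_hat - 0.5 * var_f_hat)


def backtransform_limit(limit_log_scale):
    """Interval limits are quantiles, so they are simply exponentiated."""
    return math.exp(limit_log_scale)
```

The correction term is positive whenever the residual variance exceeds the variance of the fitted value, so the backtransformed mean then lies above the plain exponential of the log-scale prediction.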

Conversion of probabilistic to categorical predictions

For binary and ordinal responses, Eq. (2.4) and (2.6) predict probabilities of the
respective response levels. To predict the “most likely” outcome one has to apply
a threshold to these probabilities. For binary data we predicted presence of water-
logged horizons if the probability exceeded the optimal value of the Gilbert skill
score (GSS, Sect. 2.3.3) that discriminated presence and absence of waterlogged
horizons best in cross-validation of the final geoGAM. GSS was selected because


Table 2.1.: Overview of geodata and derived covariates, for more information see Appendix,
Table A.5 (r: pixel resolution for raster datasets or scale for vector datasets, a: only available for
study region Greifensee (Gr) or ZH forest (Zf), NDVI: normalized differenced vegetation index,
TPI: topographic position index, TWI: topographic wetness index, MRVBF: multi-resolution
valley bottom flatness).
geodata set                                              r            a

Soil (covariate examples: physiographical units, historic wetland presence,
presence of drainage networks or soil ameliorations)
  Soil overview map (FSO, 2000a)                         1:200 000
  Wetlands Wild maps (ALN, 2002)                         1:50 000     Gr
  Wetlands Siegfried maps (Wüst-Galley et al., 2015)     1:25 000     Gr
  Anthropogenic soil interventions (AWEL, 2012)          1:5 000      Gr
  Drainage networks (ALN, 2014b)                         1:5 000      Gr

Parent material (covariate examples: (aggregated) geological units, ice level
during last glaciation, information on aquifers)
  Last Glacial Maximum (Swisstopo, 2009)                 1:500 000
  Geotechnical map (BFS, 2001)                           1:200 000
  Geological map (ALN, 2014a)                            1:50 000
  Groundwater occurrence (AWEL, 2014)                    1:25 000     Gr

Climate (covariate examples: mean annual/monthly temperature, precipitation,
radiation, degree days, NH3 concentration in air)
  MeteoSwiss 1961–1990 (Zimmermann and Kienast, 1999)    25/100 m
  MeteoTest 1975–2010 (Remund et al., 2011)              250 m
  Air pollutants (BAFU, 2011)                            500 m        Zf
  NO2 immissions (AWEL, 2015)                            100 m        Gr

Vegetation (covariate examples: band ratios, NDVI, 180 hyperspectral bands,
aggregated vegetation units, canopy height)
  Landsat7 scene (USGS EROS, 2013)                       30 m
  DMC mosaic (DMC, 2015)                                 22 m
  SPOT5 mosaic (Mathys and Kellenberger, 2009)           10 m         Zf
  APEX spectrometer mosaics (Schaepman et al., 2015)     2 m          Gr
  Share of coniferous trees (FSO, 2000b)                 25 m         Zf
  Vegetation map (Schmider et al., 1993)                 1:5 000      Zf
  Species composition data (Brassel and Lischke, 2001)   25 m         Zf
  Digital surface model (Swisstopo, 2011)                2 m          Zf

Topography (covariate examples: slope, curvature, northness, TPI, TWI, MRVBF
(various radii/resolutions))
  Digital elevation model (Swisstopo, 2011)              25 m
  Digital terrain model (Swisstopo, 2013b)               2 m


absence of waterlogged horizons was more common than presence, especially in
topsoil. To ensure consistency of maps for sequential soil depths we assigned
presence of waterlogged horizons to the lower depth if it was predicted for the
depth above.
For ordinal responses we predicted the level to which the median of the proba-
bility distribution $\widetilde{\mathrm{Prob}}(Y(s) \le r \mid x(s))$ was assigned (Tutz, 2012, p. 475).
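
The three conversion rules described above can be sketched in Python as follows. Function names are hypothetical. The threshold is assumed to have been determined beforehand (by maximising GSS in cross-validation), and the median level is taken as the first level whose cumulative probability reaches 0.5.

```python
def classify_presence(probability, threshold):
    """Binary response: predict presence (1) if the probability exceeds the threshold."""
    return 1 if probability > threshold else 0


def enforce_depth_consistency(presence_by_depth):
    """Propagate predicted presence from shallower to deeper intervals.

    presence_by_depth: 0/1 predictions ordered from shallow to deep.
    """
    consistent = []
    seen_presence = 0
    for presence in presence_by_depth:
        seen_presence = max(seen_presence, presence)
        consistent.append(seen_presence)
    return consistent


def median_level(level_probabilities):
    """Ordinal response: index of the level containing the median.

    level_probabilities: probabilities of the ordered levels (sum to 1).
    """
    cumulative = 0.0
    for index, probability in enumerate(level_probabilities):
        cumulative += probability
        if cumulative >= 0.5:
            return index
    return len(level_probabilities) - 1  # guard against rounding error
```

Using the median rather than the most probable level respects the ordering of the levels: a distribution split between the two extreme levels is not resolved by picking whichever extreme is marginally more probable.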

Evaluating the predictive performance of the statistical models

The predictive performance of the geoGAM, fitted for the continuous response
ECEC, was tested by comparing predictions Ỹ (si ) (Eq. (2.9)) with measurements
Y (si ) of independent validation sets. Marginal bias and overall accuracy were
assessed by
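\begin{align}
\mathrm{BIAS} &= -\frac{1}{n} \sum_{i=1}^{n} \bigl( Y(s_i) - \tilde{Y}(s_i) \bigr), \tag{2.10} \\
\mathrm{robBIAS} &= -\mathrm{median}_{1 \le i \le n} \bigl( Y(s_i) - \tilde{Y}(s_i) \bigr), \tag{2.11} \\
\mathrm{RMSE} &= \Bigl( \frac{1}{n} \sum_{i=1}^{n} \bigl( Y(s_i) - \tilde{Y}(s_i) \bigr)^2 \Bigr)^{1/2}, \tag{2.12} \\
\mathrm{robRMSE} &= \mathrm{MAD}_{1 \le i \le n} \bigl( Y(s_i) - \tilde{Y}(s_i) \bigr), \tag{2.13} \\
\mathrm{SS}_{\mathrm{mse}} &= 1 - \frac{\sum_{i=1}^{n} \bigl( Y(s_i) - \tilde{Y}(s_i) \bigr)^2}{\sum_{i=1}^{n} \bigl( Y(s_i) - \frac{1}{n} \sum_{i=1}^{n} Y(s_i) \bigr)^2}, \tag{2.14}
\end{align}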

where MAD is the median absolute deviation. SSmse was defined as mean squared
error skill score (Wilks, 2011, p. 359) with the sample mean of the measurements
as reference prediction method. Interpretation is similar to R2 with SSmse = 1
for perfect predictions and SSmse = 0 for zero explained variance. SSmse becomes
negative if the root mean squared error (RMSE) exceeds the standard deviation
of the data. To validate the accuracy of the bootstrapped predictive distributions
we plotted the empirical distribution function of the probability integral transform
(Wilks, 2011, p. 375), which is equivalent to a plot of the coverage of one-sided
prediction intervals (0, q̃α (s)) against the nominal probabilities α used to construct
the quantiles q̃α (s).
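
Eqs. (2.10) to (2.14) translate directly into code. The Python sketch below uses only the standard library and hypothetical names. Scaling the MAD by 1.4826, which makes it comparable to a standard deviation for Gaussian errors and is the default of R's mad(), is an assumption, as the text does not state the scaling.

```python
import math
import statistics


def validation_statistics(observed, predicted, mad_scale=1.4826):
    """BIAS, robBIAS, RMSE, robRMSE and SSmse following Eqs. (2.10)-(2.14)."""
    errors = [o - p for o, p in zip(observed, predicted)]
    n = len(errors)
    sum_sq_error = sum(e * e for e in errors)
    mean_observed = sum(observed) / n
    sum_sq_total = sum((o - mean_observed) ** 2 for o in observed)
    median_error = statistics.median(errors)
    mad = statistics.median([abs(e - median_error) for e in errors])
    return {
        "BIAS": -sum(errors) / n,
        "robBIAS": -median_error,
        "RMSE": math.sqrt(sum_sq_error / n),
        "robRMSE": mad_scale * mad,  # scaling constant is an assumption
        "SSmse": 1.0 - sum_sq_error / sum_sq_total,
    }
```

Predicting the sample mean of the observations everywhere gives SSmse = 0, which is the reference against which the skill is measured.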


For binary responses the predictive performance of the fitted geoGAM was eval-
uated with independent validation data by the Brier skill score (BSS, Wilks, 2011,
Eq. (8.37))
\[
\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{ref}}} \tag{2.15}
\]
where the Brier score (BS) is computed by

\[
\mathrm{BS} = \frac{1}{n} \sum_{i=1}^{n} (y_i - o_i)^2 \tag{2.16}
\]

where n is the number of sites, $y_i = \widetilde{\mathrm{Prob}}[Y(s_i) = 1 \mid x(s_i)]$ are the predicted
probabilities and $o_i = I(Y(s_i) = 1)$ the observation. BSref is the BS of a reference
prediction where always the more abundant level (absence of waterlogged hori-
zons) is predicted. After transforming the predicted probabilities to the binary
levels presence or absence of waterlogged horizons (Sect. 2.3.3) we further evalu-
ated the bias ratio, Peirce skill score (PSS) and GSS. Bias ratio is the ratio of the
number of presence predictions to the number of presence observations (Wilks,
2011, Eq. (8.10)). PSS is a skill score based on the proportion of correct presence
and absence predictions where the reference predictions are purely random pre-
dictions that are constrained to be unbiased (Wilks, 2011, Eq. (8.16)). GSS is a
skill score that uses the threat score as accuracy measure (Wilks, 2011, Eq. (8.18))
and again random predictions as reference. Perfect predictions have PSS and GSS
equal to 1, for random predictions the scores are equal to 0 and predictions worse
than the reference receive negative scores. PSS is truly and GSS asymptotically
equitable, meaning that purely random and constant predictions get the same
scores (see Wilks, 2011, p. 316 and 321 for details).
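
A short Python sketch of Eqs. (2.15) and (2.16), with hypothetical function names: the reference always predicts the more abundant level with certainty, as described above, so BSref equals the share of sites with the less abundant level.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    n = len(outcomes)
    return sum((y - o) ** 2 for y, o in zip(probabilities, outcomes)) / n


def brier_skill_score(probabilities, outcomes):
    """BSS relative to always predicting the more abundant level."""
    n = len(outcomes)
    share_present = sum(outcomes) / n
    reference_probability = 1.0 if share_present > 0.5 else 0.0
    bs_ref = brier_score([reference_probability] * n, outcomes)
    if bs_ref == 0.0:
        raise ValueError("reference score is zero: only one level observed")
    return 1.0 - brier_score(probabilities, outcomes) / bs_ref
```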
For the ordinal response drainage classes we tested the fitted geoGAM by eval-
uating the ranked probability skill score (RPSS), computed for the independent
validation data analogously to BSS by

\[
\mathrm{RPSS} = 1 - \frac{\mathrm{RPS}}{\mathrm{RPS}_{\mathrm{ref}}} \tag{2.17}
\]

where RPS is the ranked probability score (RPS, Wilks, 2011, Eq. (8.52)) given
by

\[
\mathrm{RPS} = \sum_{i=1}^{n} \sum_{j=1}^{J} (Y_{i,j} - O_{i,j})^2 \tag{2.18}
\]


with $Y_{i,j} = \widetilde{\mathrm{Prob}}[Y(s_i) \le j \mid x(s_i)]$ being the predicted cumulative probabilities up
to class j and $O_{i,j} = \sum_{r=1}^{j} I(Y(s_i) = r)$ indicating observed absence (0) or presence
(1) up to class j. RPSref is the RPS for a reference that predicts always the most
abundant class (well drained). For predictions of the ordinal outcomes (Sect. 2.3.3)
we also computed mean bias ratio from three bias ratios created analogously to
the binary case. These two-class settings were achieved by stepwise aggregation
of two out of three classes (well vs. moderately well or poorly drained, then well
or moderately well vs. poorly drained, Wilks, 2011, p. 319). In addition, PSS was
computed in its general form (Wilks, 2011, p. 319) together with the Gerrity score
(GS), which applies weights to the joint distribution of predicted and observed
classes to consider their ordering and frequency (Wilks, 2011, p. 322).
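
Eqs. (2.17) and (2.18) can be sketched in Python as follows. Names and the data layout are hypothetical: class probabilities are given per site for the ordered classes, and observations are class indices starting at 0.

```python
def ranked_probability_score(class_probabilities, observed):
    """RPS summed over sites and classes, following Eq. (2.18)."""
    total = 0.0
    for probabilities, observed_class in zip(class_probabilities, observed):
        cumulative_predicted = 0.0
        for j, probability in enumerate(probabilities):
            cumulative_predicted += probability
            cumulative_observed = 1.0 if observed_class <= j else 0.0
            total += (cumulative_predicted - cumulative_observed) ** 2
    return total


def ranked_probability_skill_score(class_probabilities, observed):
    """RPSS with a reference that always predicts the most abundant class."""
    n_classes = len(class_probabilities[0])
    counts = [observed.count(j) for j in range(n_classes)]
    most_abundant = counts.index(max(counts))
    reference = [
        [1.0 if j == most_abundant else 0.0 for j in range(n_classes)]
        for _ in observed
    ]
    rps = ranked_probability_score(class_probabilities, observed)
    rps_ref = ranked_probability_score(reference, observed)
    return 1.0 - rps / rps_ref
```

Because the score is built from cumulative probabilities, a prediction that misses by two classes is penalised more heavily than one that misses by a single class.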

Software

Terrain attributes were computed with ArcGIS (version 10.2, ESRI, 2010) and SAGA
(version 2.1.4, Conrad et al., 2015). All statistical computations were done
in R (version 3.2.2, R Core Team, 2016) using several add-on packages, in par-
ticular grpreg for group lasso (version 2.8-1, Breheny and Huang, 2015), MASS
for proportional odds logit regression (version 7.3-43, Venables and Ripley, 2002),
mboost for componentwise gradient boosting (version 2.5-0, Hothorn et al., 2015),
mgcv for geoadditive model fits (version 1.8-6, Wood, 2011), raster for spatial
data processing (version 2.4-15, Hijmans et al., 2015) and geoGAM for the model
building routine (version 0.1-2, Nussbaum, 2017).


2.4. Results

2.4.1. ECEC – case study 1

Models for ECEC in 0–20 cm depth

Figure 2.2 shows the change of RMSE during model building (10-fold cross-
validation). The small root mean squared error (RMSE) of 0.428 log mmolc kg−1
after the gradient boosting step – with coefficients shrunken by the algorithm –
could further be reduced (cross-validation RMSE 0.422 log mmolc kg−1 ) by re-
moving covariates and by factor aggregation. Aggregating factor levels resembles
shrinking of coefficients of such covariates.
Starting with 333 covariates model building successfully reduced the number of
covariates in the model to 17. The remaining ones characterized geology, vege-
tation and topography (Table 2.2). Effective cation exchange capacity (ECEC)
depended nonlinearly on nearly all continuous covariates, but nonlinearities were
in general rather weak (Appendix, Fig. A.1). No fs (s) term was included in the
model, because residual autocorrelation was very weak (Appendix, Fig. A.2). In-
cluding nonstationary effects in the model would have improved the model only
slightly (cross-validation RMSE 0.406 log mmolc kg−1 ), but would have added con-
siderable complexity to the final model (21 covariates including 8 interactions with
fs (s) terms).
[Figure 2.2: cross-validation RMSE [log mmolc kg−1] (y-axis, 0.42 to 0.46) for step 1: group lasso, step 2: gradient boosting, step 3: full geoGAM, step 4: reduced geoGAM, step 5: geoGAM factors aggregated]

Figure 2.2.: Change of cross-validation root mean squared error (RMSE) in steps
1–5 of model building procedure (Sect. 2.2.2).


Table 2.2.: Covariates contained in final geoGAM for responses ECEC, presence of waterlogged
horizons and drainage classes. More details on covariate effects can be found in Fig. A.1 and
A.4 to A.7 in Appendix (p: number of covariates, SD: standard deviation in a moving window,
RAD: radius of moving window or parameter of terrain attribute algorithm, r: resolution of
elevation model, TPI: topographic position index, TWI: topographic wetness index, MRVBF:
multiresolution valley bottom flatness).
Columns: ECEC 0–20 cm; presence of waterlogged horizons down to 30 cm, 50 cm
and 100 cm; drainage class.

p
  ECEC 0–20 cm: 17
  waterlogged, 30 cm: 7
  waterlogged, 50 cm: 12
  waterlogged, 100 cm: 14
  drainage class: 11

Legacy soil data
  ECEC 0–20 cm: correction factor

Geology, land use
  ECEC 0–20 cm: distance to moraines, aquifer map, overview soil map,
    geological map, geotechnical map
  waterlogged, 30 cm: historic wetlands
  waterlogged, 50 cm: historic wetlands, drainage systems map
  waterlogged, 100 cm: historic wetlands, drainage systems map, anthropogenic
    soil disturbance, extent last glaciation, geological map
  drainage class: historic wetlands, drainage systems map, aquifer map

Climate
  ECEC 0–20 cm: none
  waterlogged, 30 cm: global radiation (r: 250 m), precipitation (r: 250 m)
  waterlogged, 50 cm: global radiation (r: 250 m), precipitation (r: 100 m)
  waterlogged, 100 cm: dew point temperature (r: 250 m)
  drainage class: precipitation (r: 250 m)

Vegetation
  ECEC 0–20 cm: SPOT5 vegetation index (r: 10 m), vegetation map
  waterlogged, 30 cm: none
  waterlogged, 50 cm: DMC green band (r: 22 m)
  waterlogged, 100 cm: none
  drainage class: DMC green band (r: 22 m)

Topography
  ECEC 0–20 cm: SD slope (RAD: 20 m, r: 2 m), smooth northness (RAD: 10 m,
    r: 2 m), ruggedness (RAD: 225 m, r: 25 m), surface convexity (RAD: 450 m,
    r: 25 m), negative openness (RAD: 2 km, r: 25 m), vertical distance to
    rivers (r: 25 m)
  waterlogged, 30 cm: curvature (r: 25 m), smooth eastness (RAD: 3.6 km,
    r: 25 m), roughness (RAD: 50 m, r: 2 m), negative openness (RAD: 1 km,
    r: 2 m)
  waterlogged, 50 cm: SD elevation (RAD: 3.6 km, r: 25 m), SD slope (RAD 50 m,
    r: 2 m), smooth curvature (RAD: 120 m, r: 2 m), negative openness
    (RAD: 1 km, r: 25 m), TPI (RAD: 50 m, r: 2 m), smooth TWI (RAD 14 m,
    r: 2 m), MRVBF (r: 25 m)
  waterlogged, 100 cm: SD elevation (RAD: 3.6 km, r: 25 m), smooth curvature
    (RAD: 120 m, r: 2 m), smooth eastness (RAD: 3.6 km, r: 25 m), convergence
    index (RAD: 250 m, r: 25 m), terrain texture (RAD: 60 m, r: 2 m),
    horizontal distance to rivers (r: 25 m), TWI (RAD: 14 m, r: 2 m),
    MRVBF (25 m)
  drainage class: SD elevation (RAD: 3.6 km, r: 25 m), terrain texture
    (RAD: 60 m, 2 m), TPI (RAD: 300 m and 50 m, r: 2 m), smooth TWI
    (RAD: 14 m, r: 2 m)


[Figure 2.3: scatter plot, n = 528; x-axis: predicted ECEC [mmolc kg−1], y-axis: observed ECEC [mmolc kg−1], axis ticks at 20, 50, 100, 200 and 500]

Figure 2.3.: Scatter plot of measured against predicted ECEC in 0–20 cm mineral
soil depth, computed with geoGAM (Sect. 2.4.1) for the sites of the validation set
(solid line: loess scatter plot smoother).

Validation of predicted ECEC with independent data

Predictive performance, as evaluated at 293 independent validation sites, was
satisfactory. Figure 2.3 shows for the validation set measured ECEC in 0–20 cm
plotted against the predictions. The solid line of the loess scatterplot smoother
(Cleveland, 1979) is close to the 1:1 line indicating absence of conditional bias.
This was confirmed by small marginal BIAS measures (Table 2.3). BIAS²-to-MSE
ratio was small for both log-transformed and original data (1.2 and 0.7 %, re-
spectively). robRMSE (0.411 log mmolc kg−1) was somewhat smaller than RMSE
(0.471 log mmolc kg−1) indicating that a few outlying ECEC observations were not
particularly well predicted. RMSE computed with backtransformed predictions of
the validation set (74.9 mmolc kg−1) was also larger than its robust counterpart
robRMSE (55.3 mmolc kg−1). Judged on SSmse calculated for the independent vali-
dation data, the model explained about 40 % of the variance of the log-transformed
and 37 % of the variance of the original data (Table 2.3).

Table 2.3.: Validation statistics for (a) log-transformed and (b) backtransformed
ECEC 0–20 cm [mmolc kg−1 ] calculated for 528 samples (293 sites) of the indepen-
dent validation set (definition of statistics see Sect. 2.3.3).

       BIAS    robBIAS   RMSE    robRMSE   SSmse
(a)    0.052   0.006     0.471   0.411     0.407
(b)    6.3     8.9       74.9    55.3      0.365


[Figure 2.4: coverage probabilities (y-axis, 0.0 to 1.0) plotted against nominal probabilities (x-axis, 0.0 to 1.0)]

Figure 2.4.: Coverage of one-sided bootstrapped prediction intervals (0, q̃α(s))
for 528 ECEC validation samples, plotted against nominal probability α used to
construct the upper limit q̃α(s) of the prediction intervals (vertical lines mark the 5
and 95 % probabilities).

Figure 2.4 shows somewhat too large coverage for quantiles in the lower tails of
the predictive distributions, hence the extent of lower tails of bootstrapped predic-
tive distributions was underestimated. Upper tails of the predictive distributions
were modelled accurately as the coverage was close to the nominal probability
there. The coverage of symmetric 90 %-prediction intervals was again too small
(84.1 %) because the lower tails were too short. The median width of 90 %-
prediction intervals was equal to 201.8 mmolc kg−1 , demonstrating that prediction
uncertainty remained substantial, in spite of SSmse of nearly 40 %.

Mapping ECEC for ZH forest topsoils

Predictions of ECEC were computed by the final geoGAM for the nodes of a 20 m-
grid (Fig. 2.5). 44 % of the mapped topsoil has large to very large ECEC values.
In contrast, 13 % (∼66 km²) of the forest topsoils in the study region are acidic
with ECEC below 100 mmolc kg−1 . These soils are mostly found in the northern
part of the Canton of Zurich. The spatial pattern of the width of 90 %-prediction
intervals (Appendix, Fig. A.3) and of the mean predictions (Fig. 2.5) was very
similar (Pearson correlation = 0.981), which follows from the lognormal model
that we adopted for this response.
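
The close relation between interval width and mean prediction can be made explicit under a simplifying assumption that is ours and not part of the fitted model: suppose that, on the log scale, the predictive distribution at every location s has the same shape around $\hat{f}(x(s))$, with $e_\alpha$ denoting its $\alpha$-quantile relative to $\hat{f}(x(s))$, and that $\mathrm{Var}[\hat{f}(x(s))]$ varies little in space. Then

```latex
\begin{align*}
\tilde{q}_\alpha(s) &= \exp\bigl(\hat{f}(x(s)) + e_\alpha\bigr), \\
\tilde{q}_{0.95}(s) - \tilde{q}_{0.05}(s)
  &= \exp\bigl(\hat{f}(x(s))\bigr)\,\bigl(\exp(e_{0.95}) - \exp(e_{0.05})\bigr), \\
\tilde{Y}(s)
  &= \exp\bigl(\hat{f}(x(s))\bigr)\,
     \exp\bigl(\tfrac{1}{2}\hat{\sigma}^2 - \tfrac{1}{2}\mathrm{Var}[\hat{f}(x(s))]\bigr).
\end{align*}
```

Both the interval width and the backtransformed mean are then approximately proportional to $\exp(\hat{f}(x(s)))$, so their ratio is nearly constant over the map and a correlation close to 1 is expected.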


[Figure 2.5: map. Legend, ECEC 0–20 cm [mmolc kg−1]: extremely small < 25, very small 25–50, small 51–100, medium 101–200, large 201–300, very large 301–500, extremely large > 500; calibration sites (1055); validation sites (293); lake; no forest. Labelled towns: Winterthur, Zurich, Uster. Scale bar: 0 to 15 km.
Data sources: Soil sampling locations © 2013 FABO Canton of Zurich (TID 22742); Lakes: swissTLM3D © 2013 swisstopo; Relief: DHM25 © 2012 swisstopo. Reproduced with the authorisation of swisstopo (JA100120 / D100042)]

Figure 2.5.: geoGAM predictions of effective cation exchange capacity (ECEC) in 0–20 cm
depth of the mineral soil of forests in the Canton of Zurich, Switzerland (computed on a 20 m-
grid with final geoGAM with covariates according to Table 2.2). Black dots are locations used for
geoGAM calibration, locations with red triangles were used for model validation (ECEC legend
classes according to Walthert et al., 2004).

2.4.2. Presence of waterlogged soil horizons – case study 2

Models for presence of waterlogged horizons

Not surprisingly, the models for presence of waterlogged horizons in the three soil
depths contained similar covariates, characterizing mostly wet soil conditions such
as historic wetland maps, a map of agricultural drainage systems or several climatic
covariates (Table 2.2). The same terrain attributes were repeatedly chosen for the
three depths (Appendix, Figs. A.4 to A.6). For all three depths model selection
resulted in parsimonious sets of only 7 to 14 covariates chosen from a total of 498
covariates. The Brier skill score (BSS), computed using 10-fold cross-validation,
increased from 0.350 for the 0–30 cm depth to 0.704 for the 0–100 cm depth
suggesting that presence of waterlogged horizons can be better modelled when
they occur more frequently. Degree of residual spatial autocorrelation on logit-


Table 2.4.: Observed occurrence of waterlogged horizons at three soil depths against predictions
by geoGAM for the 198 sites of the validation set. Waterlogged soil horizons were predicted to
be present if prediction probabilities were larger than an optimal threshold (30 cm: 0.22, 50
cm: 0.35, 100 cm: 0.51) found by cross-validation with GSS as criterion (#: number of sites per
response level, BSS: Brier skill score, bias: bias ratio, PSS: Peirce skill score, GSS: Gilbert skill
score).

waterlogged   # predicted   # observed          BSS     bias    PSS     GSS
down to                     present   absent

30 cm         present       16        27        0.312   1.720   0.484   0.227
              absent        9         146

50 cm         present       28        25        0.448   1.152   0.444   0.267
              absent        18        127

100 cm        present       43        22        0.526   1.000   0.496   0.330
              absent        22        111

scale was stronger in the 0–30 cm than in 0–100 cm depth (Appendix, Fig. A.2)
confirming that the model performed better for the 0–100 cm depth. Adding a
fs (s) term did not improve cross-validated BSS (30 cm: 0.332, 100 cm: 0.688),
meaning that a penalized tensor product of spatial coordinates was too smooth to
capture short range autocorrelation.

Validation of predicted presence of waterlogged horizons with independent data

Table 2.4 reports contingency tables for predicted outcomes for presence of water-
logged horizons at 198 sites of the validation set. BSS and bias ratio improved
again from the 0–30 cm to the 0–100 cm depth. In 0–30 cm depth presence of
waterlogged horizons was clearly and down to 50 cm slightly over-predicted, while
down to 100 cm there was no bias. Performance evaluated by percentage correct
with the Peirce skill score (PSS) was similar for all three depths (correct predic-
tions being 44 to 50 % more frequent compared to random predictions). Ignoring
correct absence predictions in Gilbert skill score (GSS), the model predicted the
correct level 20–30 % more often than a random prediction scheme. Again, GSS
increased with depth and larger chance of waterlogging occurring.
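
The bias ratio, PSS and GSS in Table 2.4 follow from the four counts of each contingency table. The Python sketch below (hypothetical function name, standard 2 x 2 formulas) reproduces the tabulated values from the counts given in Table 2.4.

```python
def binary_scores(hits, false_alarms, misses, correct_negatives):
    """Bias ratio, Peirce skill score and Gilbert skill score.

    hits: predicted present, observed present
    false_alarms: predicted present, observed absent
    misses: predicted absent, observed present
    correct_negatives: predicted absent, observed absent
    """
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    n = a + b + c + d
    bias_ratio = (a + b) / (a + c)
    pss = (a * d - b * c) / ((a + c) * (b + d))
    hits_random = (a + b) * (a + c) / n  # hits expected by chance
    gss = (a - hits_random) / (a - hits_random + b + c)
    return bias_ratio, pss, gss


# Counts from Table 2.4, keyed by lower depth limit in cm
table_2_4 = {30: (16, 27, 9, 146), 50: (28, 25, 18, 127), 100: (43, 22, 22, 111)}
scores = {depth: binary_scores(*counts) for depth, counts in table_2_4.items()}
```

GSS ignores correct absence predictions in its accuracy measure, which makes it a suitable criterion when absence is far more common than presence.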


Mapping of presence of waterlogged horizons

Presence of waterlogged horizons in 0–30 cm was predicted for 13.8 % of the area
of study region Greifensee (Fig. 2.6). For 0–50 cm this share increased to 27.3 %
and in nearly 40 % of the soils waterlogged horizons were present in 0–100 cm.
Waterlogged horizons were mapped in upper soil depths mainly on the larger plains
to the North and South of Lake Greifensee. Deeper horizons had waterlogging
present mostly in local depressions and comparably smaller valley bottoms in the
hilly uplands to the South of the study region.

2.4.3. Drainage classes – case study 3

Model for drainage classes

The models for the ordinal drainage class data contained about the same covariates
as the models for presence of waterlogged horizons (Table 2.2). Most covariates
had only very weak nonlinear effects (Appendix, Fig. A.7). Residual spatial auto-
correlation was very weak with a short range (Appendix, Fig. A.2) suggesting that
the variation was well captured by the geoGAM. 10-fold cross-validation resulted
in a ranked probability skill score (RPSS) of 0.588.

Table 2.5.: Frequency of drainage class levels and predictions of respective outcomes by geoGAM
for the 198 sites of the validation set (#: number of sites per response level, RPSS: ranked
probability skill score, bias: mean bias ratio, PSS: Peirce skill score, GS: Gerrity score for
ordered responses).

# predicted               # observed                             RPSS    bias    PSS     GS
                          well      moderately     poorly
                          drained   well drained   drained

well drained              129       9              9         0.458   0.985   0.477   0.523
moderately well drained   9         9              3
poorly drained            8         5              17


[Figure 2.6: three map panels, (a) soil depth: 0–30 cm, (b) soil depth: 0–50 cm, (c) soil depth: 0–100 cm. Legend: waterlogged soil horizon absent, present; lake; calibration sites (764); validation sites (198). Labelled town: Zurich. Scale bar: 0 to 10 km.
Data sources: Soil sampling locations © 2013 FABO Canton of Zurich (TID 22742); Lakes: swissTLM3D © 2013 swisstopo; Relief: DHM25 © 2012 swisstopo, reproduced with the authorisation of swisstopo (JA100120 / D100042)]

Figure 2.6: geoGAM predictions of presence of waterlogged horizons between
surface and 3 soil depths (a: 0–30, b: 0–50, c: 0–100 cm) for the agricultural
land in the Greifensee study region (computed on a 20 m-grid with final geoGAM
with covariates according to Table 2.2, smoothed for better display with focal
mean with radius of 3 pixels = 60 m). Black dots in panel (a) are locations used
for geoGAM calibration, locations with red triangles were used for model validation.


[Figure 2.7: map. Legend: calibration sites (732); validation sites (198); lake; drainage class: well drained, moderately well drained, poorly drained. Labels: Zurich, Greifensee, Lake of Zurich. Scale bar: 0 to 10 km.
Data sources: Soil sampling locations © 2013 FABO Canton of Zurich (TID 22742); Lakes: swissTLM3D © 2013 swisstopo; Relief: DHM25 © 2012 swisstopo. Reproduced with the authorisation of swisstopo (JA100120 / D100042)]

Figure 2.7.: geoGAM predictions of drainage classes for the agricultural land
in the Greifensee study region (computed on a 20 m-grid with final geoGAM with
covariates according to Table 2.2, smoothed for better display with focal mean with
radius of 3 pixels = 60 m). Black dots are locations used for geoGAM calibration,
locations with red triangles were used for model validation.

Validation of predicted drainage classes with independent data

Table 2.5 reports the number of correctly classified and misclassified drainage class
predictions for the validation set. False predictions were equally distributed above
and below the diagonal, hence predictions were unbiased with a mean bias ratio
close to 1. Distinguishing moderately well drained soils from the other two classes
remained difficult as this class had been seldom observed. Overall, the model
accuracy was satisfactory, with RPSS of 0.458 being only slightly smaller than
cross-validation RPSS. Hence, the geoGAM was clearly better than predicting
always the most abundant class well drained. Measured by PSS and Gerrity score
(GS), the geoGAM was better than random predictions at every second site, for
which predictions were computed.
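
The categorical scores in Table 2.5 can be recomputed from the 3 x 3 table of predicted against observed classes. The Python sketch below uses hypothetical names and the multi-class forms of PSS and the Gerrity score as given by Wilks (2011). Taking the mean bias ratio as the average of the three per-class ratios (predicted count divided by observed count) is our reading of the description in Sect. 2.3.3; it reproduces the tabulated value.

```python
def multiclass_scores(table):
    """Mean bias ratio, Peirce skill score and Gerrity score.

    table[i][j]: number of sites with predicted class i and observed
    class j, classes ordered from well drained to poorly drained.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    predicted = [sum(row) for row in table]
    observed = [sum(table[i][j] for i in range(k)) for j in range(k)]

    mean_bias = sum(p / o for p, o in zip(predicted, observed)) / k

    share_correct = sum(table[c][c] for c in range(k)) / n
    share_chance = sum(p * o for p, o in zip(predicted, observed)) / n ** 2
    pss = (share_correct - share_chance) / (1.0 - sum(o * o for o in observed) / n ** 2)

    # Gerrity weights: odds ratios of the cumulative observed frequencies
    odds = []
    cumulative = 0.0
    for r in range(k - 1):
        cumulative += observed[r] / n
        odds.append((1.0 - cumulative) / cumulative)

    def weight(i, j):
        lo, hi = min(i, j), max(i, j)
        below = sum(1.0 / odds[r] for r in range(lo))
        above = sum(odds[r] for r in range(hi, k - 1))
        return (below - (hi - lo) + above) / (k - 1)

    gerrity = sum(table[i][j] * weight(i, j) for i in range(k) for j in range(k)) / n
    return mean_bias, pss, gerrity


# Counts from Table 2.5 (rows: predicted, columns: observed)
table_2_5 = [[129, 9, 9], [9, 9, 3], [8, 5, 17]]
mean_bias, pss, gerrity = multiclass_scores(table_2_5)
```

The Gerrity weights reward correct predictions of rare classes more than those of the abundant class and penalise misses by two classes more than misses by one, which is why GS is slightly larger than PSS here.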

Mapping of drainage classes

Drainage classes were again predicted using a 20 m-grid (Fig. 2.7). 73.2 % of
the area of the Greifensee region had well drained soils. Poorly drained soils were
predicted for only 15.6 % of the area. The location of poorly drained soils coincides


with presence of waterlogged horizons in the topsoil (0–30 cm, panel [a] in Fig. 2.6).
The largest contiguous area of poorly drained soils was predicted on accumulation
plains at the lake inflow to the South of Lake Greifensee. The sites misclassified
had TPI values indicating local depressions and had larger erosion accumulation
potential (MRVBF) compared to correctly classified sites, thus predicting correct
drainage classes in valley bottoms seems more difficult. Misclassified sites of the
validation set had on average slightly larger clay and soil organic carbon contents
in topsoil.

2.5. Discussion

2.5.1. Model building and covariate selection

The model building procedure efficiently selected for all responses parsimonious
models with p ≤ 17 covariates. This corresponds to only 5.8 % of the covariates
considered for the effective cation exchange capacity (ECEC) modelling and to
1.4–2.8 % for modelling the binary and ordinal responses describing waterlogging.
The procedure was able to select meaningful covariates, which reveal the influ-
ence of soil forming factors on the response variable, without any prior knowledge
about the importance of a particular covariate. No pre-processing of covariates
was necessary, such as reducing the dimensionality of the covariate set to
deal with multi-collinearity. Especially for terrain covariates this is important.
Elevation data are often available in several resolutions, and various algorithms
can be used to calculate e.g. curvature or topographic wetness indices (TWI),
which all likely produce slightly different results. In addition, radii for computing
e.g. topographic position indices (TPI) have to be specified, and it is often not
a priori clear how these should be chosen (Behrens et al., 2010a; Miller et al.,
2015). Therefore, different algorithms and a range of parameter values are used to
create terrain covariates and the model building process selects the most suitable
covariate to model a particular soil property. Meanwhile, none of the 180 APEX
bands available for the Greifensee region was chosen for the final models. Most
likely, meaningful preprocessing – e.g. based on bare soil areas – could improve
the usefulness of such covariates (Diek et al., 2016). Since we used continuous
reflectance signals, including vegetated and sparsely vegetated areas, the remotely
sensed signal might not have reflected direct relationships to actual soil
properties well.


2.5.2. Model structure

Parsimonious models lend themselves to a verification of fitted effects from a
pedological perspective. Yet, due to multi-collinearities in the covariate set, effects of
selected covariates could be substituted by effects of other covariates (Behrens
et al., 2014).
Although Johnson et al. (2000) did not find strong relationships between terrain
and ECEC, six terrain attributes were selected. Covariates representing geology
were important, too, with e.g. ECEC changing as a function of the distance to two
types of moraines. Also, vegetation provided information on ECEC in the topsoil
because a vegetation index (difference of near infrared to red reflectance) and a
vegetation map were included. Larger values of ECEC were modelled for plant
communities that are characteristic for nutrient-rich soils. The factor distinguish-
ing the origin of soil data either from direct measurement or pedotransfer function
(PTF, legacy data correction, Sect. 2.3.2, Appendix, Fig. A.1) was further relevant
in the ECEC model.
For modelling drainage classes and presence of waterlogged horizons plausible
covariates were selected (Appendix, Figs. A.4 to A.7). Most covariates were ter-
rain attributes derived from the digital elevation model (DEM). This is in accor-
dance with Campling et al. (2002) who found topography important in general
and Lemercier et al. (2012) who showed that a topographic wetness index was
among the most important covariates. Local depression at various scales (concave
curvature, basins in TPI, sites with accumulation by erosion, terrain wetness)
increased the probability for poorly drained soils and presence of waterlogged
horizons. More variable terrain (standard deviation of elevation) also increased
waterlogging probability. Climate covariates also seemed to be important. Rain-
fall pattern in summer (June, July), spring dew point temperature and global
radiation (March, April) correlated most strongly with presence of waterlogged
horizons. Information on human activities related to waterlogged soil amelioration
was included in all four models. Maps of historic wetlands and areas with
drainage systems were most often chosen in combination. Geology was also partly
relevant (presence of waterlogged horizons in 0–100 cm soil depth and drainage
classes).
Overall, nonlinearities in effects were small for drainage classes and presence of
waterlogged horizons. Estimated degrees of freedom (EDF, Wood, 2006, pp. 170)
were generally smaller than 1.5, with some continuous effects even being close to 1
EDF. In contrast, most nonlinear effects of the model for ECEC had EDF around
1.7–1.8 with northness consuming even 2.0 EDF. The large area of the study region
and the response being a chemical property that depends on various combinations
of soil forming factors evidently required the use of a more complex model.

2.5.3. Predictive performance of fitted models

For the final models, cross-validation statistics were similar to results obtained for
the independent validation data. Through repeated cross-validation on the same
subsets the cross-validation statistics can be considered as conservative goodness-
of-fit statistics. Hence, we conclude that geoGAM did not over-fit the calibration
data.
Independently validated model accuracy was satisfactory for ECEC in the present
study (SSmse = 0.37). Compared to the few available studies, the quality of our
maps of ECEC was intermediate. Building a separate model for forest soil ECEC
from a dataset with about 2.1 sites per km² seems to have produced much better
results than the study reported by Vaysse and Lagacherie (2015), who found very
poor model performance for ECEC (R² = 0, equivalently computed as SSmse) for a
dataset with 0.04 sites per km² and a study region with multiple land uses. Mulder et al.
(2016) achieved somewhat better results (R² = 0.24, details on computation not
given) for mapping topsoil ECEC for the whole of France. Hengl et al. (2017) mapped
ECEC globally with a large dataset and obtained a cross-validation R² of 0.65
(computed as SSmse). Viscarra Rossel et al. (2015, Supplement) reported an R² of
0.79 (computed as SSmse) for topsoil ECEC for Australia. In Hengl et al.’s and
Viscarra Rossel et al.’s studies ECEC varied much more than in our study, and
this likely explains the better quality of the predictions.
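
As a reading aid for the comparison above, the following minimal Python sketch computes a mean squared error skill score. It assumes the common definition SSmse = 1 − Σ(y − ŷ)² / Σ(y − ȳ)², which is not spelled out in this section; the function name `ss_mse` and the toy numbers are purely illustrative.

```python
import numpy as np

def ss_mse(observed, predicted):
    """Mean squared error skill score (assumed definition, see lead-in).

    Returns 1 for perfect predictions, 0 if the predictions are no better
    than the mean of the validation observations, and negative values if
    they are worse than that mean.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((observed - predicted) ** 2)        # squared prediction errors
    sst = np.sum((observed - observed.mean()) ** 2)  # spread of the observations
    return 1.0 - sse / sst

# Toy validation data (hypothetical values, not from the thesis)
y_obs = [50.0, 80.0, 120.0, 200.0]
y_hat = [60.0, 90.0, 110.0, 170.0]
print(round(ss_mse(y_obs, y_hat), 3))
```

Under this definition the score depends on the spread of the observations, which is consistent with the remark that studies with more variable ECEC tend to report larger values.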
Our models for presence of wet soils reached similar accuracy as reported in other
studies. Zhao et al. (2013, Table 1) reported that 64 to 87 % of the sites were
correctly classified (percentage correct, PC) in four studies that modelled three
drainage class levels. Three studies with up to seven drainage levels achieved PC
of 52 to 78 %, and Zhao et al. (2013) themselves had 36 % of correctly classified
sites. Kidd et al. (2014) found PC of 53 % and 55 % for two study regions, and
Lemercier et al. (2012) reported PC of 52 % for a four-level drainage response. The
presented models (Tables 2.4 and 2.5) are about as good with PC of 78 % to 82 %
for predicting presence of waterlogged horizons and PC of 78 % for predicting the
three drainage class levels.


Nevertheless, PC is trivial to hedge (Jolliffe and Stephenson, 2012, pp. 46), and
comparisons should be made only with care. Better performance measures are
PSS and Cohen’s kappa (κ), also called Heidke skill score (Wilks, 2011, pp. 347).
Campling et al. (2002) reported a κ of 0.705, Kidd et al. (2014) κ’s of 0.27 and
0.31 for the two study regions, Lemercier et al. (2012) a κ of 0.27 and Peng et al.
(2003) found κ of 0.59 for predictions of three drainage levels. The κ values computed
for the models of this study ranged between 0.37 and 0.5 for modelling the presence of
waterlogged horizons, and κ was 0.48 for predicting the three levels of drainage class.
The unequal distribution of the three drainage classes in the study region (the majority
of soils were well drained) was reflected in the smaller value of κ compared to PC.
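
The difference between PC and the chance-corrected scores can be illustrated with a small Python sketch. It assumes the standard contingency-table formulas for percentage correct, Cohen's κ (Heidke skill score) and the multi-category Peirce skill score (PSS); the function name and the example table are hypothetical and not taken from the thesis.

```python
import numpy as np

def classification_scores(confusion):
    """Return (PC, kappa, PSS) for a square contingency table.

    Rows are predicted classes, columns are observed classes.
    """
    counts = np.asarray(confusion, dtype=float)
    p = counts / counts.sum()       # joint relative frequencies
    pred = p.sum(axis=1)            # marginal distribution of predictions
    obs = p.sum(axis=0)             # marginal distribution of observations
    pc = np.trace(p)                # proportion correctly classified
    chance = np.sum(pred * obs)     # agreement expected by chance
    kappa = (pc - chance) / (1.0 - chance)
    pss = (pc - chance) / (1.0 - np.sum(obs ** 2))
    return pc, kappa, pss

# Always predicting the majority class "well drained" for 80 well drained
# and 20 poorly drained sites: PC looks good, but kappa and PSS are zero.
hedged = [[80, 20],
          [0, 0]]
print(classification_scores(hedged))
```

The example shows why a large PC alone is weak evidence when one class dominates, as was the case for the drainage classes.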

2.5.4. Spatial structure of predicted maps

The spatial distribution of ECEC as shown by Fig. 2.5 aligns well with pedolog-
ical knowledge about soils in the Canton of Zurich. The smallest ECEC (< 50
mmolc kg−1 ) was mapped in the northeast of the study region. The last glaciation
(Swisstopo, 2009) did not reach as far north and, as a consequence, strongly weath-
ered soils on old fluvioglacial gravel-rich sediments developed in this part of the
study region. Soils not covered by ice during the last glaciation have comparably
larger ECEC if they formed on Molasse.
As expected the spatial patterns for the presence of waterlogged soil horizons
and the drainage classes were very similar (Figs. 2.6 and 2.7). Especially soils
on plains to the north and south of Lake Greifensee are often poorly drained,
although at many locations agricultural drainage networks were installed in the
past.

2.6. Summary and conclusion

Effectively building predictive models for digital soil mapping (DSM) becomes
crucial if many soil properties are to be mapped. Selecting only a small set of
relevant covariates renders interpretation of the fitted models easier and allows one to
check whether modelled relations accord with pedological understanding. Parsi-
monious, interpretable DSM models are likely more readily accepted by end-users
than complex black-box models. Moreover, model selection out of a large number
of covariates describing soil forming factors helps to improve knowledge about
relationships at larger scales. In this sense, it is also important that the modelling
approach provides information about covariates which are not relevant for a
certain response, e.g. the large number of APEX bands for presence of waterlogged
horizons and drainage classes.
We developed a model building framework for generalized additive models for
spatial data (geoGAM) and applied the framework to legacy soil data from the
Canton of Zurich (Switzerland). We found that geoGAM

• consistently modelled continuous, binary and ordinal responses and hence
allowed DSM of measured soil properties and soil classification data using one
common approach,
• selected, given the large numbers of covariates, adequately small sets of pe-
dogenetically meaningful covariates without any prior knowledge about their
importance and without prior reduction of the covariate sets,
• required minimal user interaction for model building, which facilitates future
map updates as new soil data or new covariates become available,
• allowed easy interpretation of effects of the included covariates by partial
residual plots,
• modelled predictive distributions for continuous responses by a bootstrap-
ping approach, thereby taking uncertainty of model building into account,
• did not over-fit the calibration data in our applications, and
• predicted soil properties with similar accuracy as other approaches did in
other digital soil mapping studies, when tested with an independent valida-
tion set.

To further assess usefulness of geoGAM for DSM future work should focus on
comparisons of predictive accuracy with commonly used statistical methods (e.g.
geostatistics or tree-based machine learning techniques) on the same soil datasets.
Chapter 3 presents a first such study.

Acknowledgements We thank the Swiss National Science Foundation SNSF for
funding this work in the frame of the National Research Program "Sustainable
Use of Soil as a Resource" (NRP 68) and the "Swiss Earth Observatory Network"
(SEON) for funding aerial surveys with APEX. Special thanks go to WSL and
the soil protection agency of the Canton of Zurich for sharing their soil data with
us. Furthermore, we would like to thank Thorsten Hothorn for advice on model
selection and boosting.

3. Evaluation of digital soil mapping
approaches with large sets of
environmental covariates

Chapter 3 was published as research article: Nussbaum, M., Spiess, K., Baltensweiler, A., Grob,
U., Keller, A., Greiner, L., Schaepman, M. E., and Papritz, A.: Evaluation of digital soil map-
ping approaches with large sets of environmental covariates, SOIL Discussions, 2017, 1–32, doi:
10.5194/soil-2017-14, URL https://www.soil-discuss.net/soil-2017-14/, 2017.

Abstract

Spatial assessment of soil functions requires maps of basic soil properties. Unfortu-
nately, these are either missing for many regions or are not available at the desired
spatial resolution or down to the required soil depth. Field-based generation of
large soil data sets and of conventional soil maps remains costly. Meanwhile, le-
gacy soil data and comprehensive sets of spatial environmental data are available
for many regions.
Digital soil mapping (DSM) approaches – relating soil data (responses) to en-
vironmental data (covariates) – face the challenge of building statistical models
from large sets of covariates originating for example from airborne imaging spec-
troscopy or multi-scale terrain analysis. We evaluated six approaches for DSM in
three study regions in Switzerland (Berne, Greifensee, ZH forest) by mapping ef-
fective soil depth available to plants (SD), pH, soil organic matter (SOM), effective
cation exchange capacity (ECEC), clay, silt, gravel content and fine fraction bulk
density for four soil depths (totalling 48 responses). Models were built from 300–500
environmental covariates by selecting linear models by 1) grouped lasso and by 2)
an ad-hoc stepwise procedure for robust external-drift kriging (georob). For 3)
geoadditive models we selected penalized smoothing spline terms by component-wise
gradient boosting (geoGAM). We further used two tree-based methods: 4)
boosted regression trees (BRT) and 5) random forest (RF). Lastly, we computed
6) weighted model averages (MA) from predictions obtained from methods 1–5.
Lasso, georob and geoGAM successfully selected strongly reduced sets of covariates
(subsets of 3–6 % of all covariates). Differences in predictive performance,
tested on independent validation data, were mostly small and did not reveal a single
best method for the 48 responses. Nevertheless, RF was often best among methods
1–5 (28 of 48 responses), but was outcompeted by MA for 14 of these 28 responses.
RF tended to over-fit the data. Performance of BRT was slightly worse than that of
RF. geoGAM performed poorly on some responses and was best only for 7 of 48
responses. Prediction accuracy of lasso was intermediate. All models generally
had small bias. Only the computationally very efficient lasso had slightly larger
bias because it tended to under-fit the data. Summarizing, although differences
were small, the frequencies of best and worst performance clearly favoured RF if a
single method is applied and MA if multiple prediction models can be developed.

3.1. Introduction

Human well-being depends on numerous services that soils provide in agriculture,
forestry, natural hazards, water protection, resources management and other
environmental domains. The capacity of soil to deliver services is largely determined
by its functions, e.g. regulation of water, nutrient and carbon cycles, filtering of
compounds, production of food and biomass or providing habitat for plants and
soil fauna (Haygarth and Ritz, 2009; Robinson et al., 2013). The assessment of
the multi-functionality of soils depends on availability of datasets on chemical,
physical and biological soil properties (Calzolari et al., 2016). Greiner et al. (2017)
compiled a set of approved assessment methods for soil functions from the applied
soil science community that cover the multi-functionality of soils (Table 3.1). This
set of soil functions can be assessed with 12 basic soil properties (see references
in Table 3.1). Unfortunately, spatial assessment of soil functions is often hindered
because accurate maps of soil properties are missing in many countries of the world
(Hartemink et al., 2013; Rossiter, 2016). However, for many regions legacy data
on soil properties (responses) and comprehensive spatial environmental data (co-
variates) are available and can be linked by digital soil mapping techniques (DSM,
e.g. McBratney et al., 2003; Scull et al., 2003).


Table 3.1.: Basic soil properties needed for spatial soil function assessment in the three study
regions. Most soil functions required data on further, expensive-to-measure soil properties that
were inferred by pedotransfer functions (PTF) from the basic soil properties (see Greiner et al.,
2017; BD: fine fraction bulk density, SOM: soil organic matter, SD: soil depth available to plants,
dw: depth of stagnic or gleyic horizon, dc: drainage class [dw and dc defined in Chapt. 2], BS:
base saturation, ECEC: effective cation exchange capacity, BC/Al: ratio of sum of basic cations
to aluminium).

Column headings: BC/Al, ECEC, gravel, SOM, clay, BD, silt, SD, pH, BS, dw, dc

Soil (sub)function

Regulation function
Capacity for water infiltration and storage (Danner et al., 2003): * * * *³ * * *
Nutrient cycling (Lehmann et al., 2013): * * * *³ * * * *
Binding capacity for inorganic contaminants (DVWK, 1988): * * *³ * * *
Binding and decomposition capacity for organic contaminants (Litz, 1998): * * * *³ * * * * *
Filtering of pollutants and acidity buffering (Bechler and Toth, 2010): * * *³ * * * *
C storage (SOC-stock to 1 m soil depth, Greiner et al. unpublished): * *³ * *
Capacity for plant nutrient retention (against percolation and overland flow, Jäggli et al., 1998): * * * *³ * * * *
Acidity state of forest soils, resilience to acidification and risk of aluminum toxicity (Zimmermann et al., 2011): *¹ * * * *² *²

Habitat function
Soils with extreme properties fostering rare plant communities (Siemer et al., 2014): * * *³ * * * *
Habitat for plants (Greiner et al., unpublished): * * *³ * * *

Production function
Agricultural production (Jäggli et al., 1998): * * * *³ * * * * *

¹ Only 50 sites with gravel estimates available, mean content per soil depth used.
² Limited data for BS and BC/Al (topsoil 300, subsoil 210 sites), no independent validation possible, therefore not
included in this publication.
³ For Berne and Greifensee computed by PTF which used SOM to predict BD.


Many recent DSM studies used relatively small sets of no more than 30 covariates
(e.g. Li et al., 2011; Liess et al., 2012; Adhikari et al., 2013; Vaysse and Lagacherie,
2015; Were et al., 2015; Lacoste et al., 2016; Mulder et al., 2016; Somarathna et al.,
2016; Taghizadeh-Mehrjardi et al., 2016; Yang et al., 2016). Geodata availability
and deemed importance often determine what covariates are used for DSM. How-
ever, Brungard et al. (2015) showed that a priori preselection of covariates using
pedological expertise might result in a decreased accuracy of soil class predictions.
Using comprehensive environmental geodata for DSM improves prediction accu-
racy because soil forming factors are likely better represented by a larger number
of covariates. Derivatives of geological or legacy soil maps (Chapt. 5), multi-scale
terrain analysis (Behrens et al., 2010a,b, 2014; Miller et al., 2015), wide ranges of
climatic parameters (Liddicoat et al., 2015) and (multi-temporal) imaging spec-
troscopy (Mulder et al., 2011; Poggio et al., 2013; Viscarra Rossel et al., 2015;
Fitzpatrick et al., 2016; Hengl et al., 2017; Maynard and Levi, 2017) all contribute
to generate high-dimensional sets of partly multi-collinear covariates. One usually
presumes that DSM techniques benefit from a large number of covariates even if a
method selects only a small subset of relevant covariates for creating the predictions.
DSM model building therefore faces the challenge of dealing with (very)
large covariate sets. If, in addition, many responses have to be mapped, a DSM
approach should
1. efficiently build models without much user interaction, even if there are more
covariates p than observations n (n < p),
2. cope with numerous multi-collinear and likely noisy covariates,
3. result in predictions with good accuracy and
4. avoid over-fitting the calibration data.
Besides, the method should fulfil basic DSM requirements like modelling nonlinear
and nonstationary relations between response and covariates, considering spatial
autocorrelation, allowing the pedological plausibility of the modelled relationships
to be checked and quantifying predictive uncertainty.
DSM approaches used in the past can broadly be grouped into 1) linear regression
models (LM), 2) variants of geostatistical approaches, 3) generalized additive
models (GAM), 4) methods based on single trees like classification and regression
trees (CART), 5) (ensemble) machine learners like boosted regression trees (BRT)
or random forest (RF), and 6) averaging predictions of any of the mentioned methods
(model averaging, MA). LM (e.g. Meersmans et al., 2008; Wiesmeier et al.,
2013) can not be fitted for n < p, and estimates of coefficients become unstable
with collinear covariates. Liddicoat et al. (2015) and Fitzpatrick et al. (2016) used
lasso (least absolute shrinkage and selection operator), a form of penalized LM
suitable for large correlated covariate sets. Fitzpatrick et al. (2016) found that
lasso clearly outperformed different stepwise LM selection procedures. Geostatis-
tical approaches are generally popular in DSM (McBratney et al., 2003), and they
have clear advantages over other methods: They allow change-of-support, and pre-
dictive uncertainty follows straightforwardly from the kriging variances. Similar
to LM, external-drift kriging (EDK) requires a parsimonious linear trend model.
In Chapter 5 we used lasso for initial covariate selection, but subsequent manual
model building steps were needed. Nonlinear additive modelling through GAM
also relies on covariate selection for stable trend estimation. Poggio et al. (2013)
used a covariate selection procedure with a random component, and in Chapter
2 we applied component-wise gradient boosting to preselect relevant covariates.
Unless combined with either lasso or boosting, large sets of covariates are diffi-
cult to process by LM, EDK or GAM. But these methods allow for simple model
interpretation by partial effects plots (Faraway, 2005, p. 73).
Generally, more complex approaches seem to yield more accurate predictions
than simpler DSM methods (Liess et al., 2012; Brungard et al., 2015). Effects
of interactions between covariates on responses can be modelled by tree-based
methods, but single trees (CART, e.g. Liess et al., 2012; Heung et al., 2016)
tend to be noisy (large variance). Cubist, an extension of CART with LM at the
terminal nodes of the tree, was only occasionally applied to large covariate sets
(e.g. Viscarra Rossel et al., 2015; Miller et al., 2015). Forming ensembles of trees
aims to reduce their variance, and an ensemble likely outperforms its single components (Liess
et al., 2012). RF seems stable for large sets of covariates (Behrens et al., 2010a,b,
2014) while BRT, compared to RF by Yang et al. (2016), yields similar model
accuracy. Averaging predictions from different models (MA) follows the strategy
of ensemble learners, possibly reducing prediction variance (Hastie et al., 2009,
pp. 288), but MA has rarely been used for DSM. Malone et al. (2014) explored
different weighting strategies for MA, but it was unclear from the study whether
MA was indeed better than predictions by a single method because predictions
were not validated by independent data. Li et al. (2011) averaged only predictions
computed by the best performing (very similar) models, and this did not result in
any advantage of MA over single models.
The comparative studies mentioned above used only small sets of covariates
(p < 30), tested only few or very similar (Fitzpatrick et al., 2016) approaches or,
with exception of Vaysse and Lagacherie (2015), did not extend the evaluation to
several soil properties or study regions. It is therefore currently unclear how well
models can be built from large covariate sets by popular DSM methods. Empirical
evidence is still too limited to rate DSM methods with respect to the criteria 1–4
listed above. In particular, it is not known whether methods can be identified that
are more prone to over-fit soil data or that yield accurate predictions more often
than others.
The objectives of this study were to evaluate for a broad choice of currently
used DSM methods how well they cope with requirements 1–4 listed above. We
compared in our study a) lasso, b) robust EDK (georob), c) spatial GAM with
model selection based on boosting (geoGAM), two ensemble tree-methods d) BRT
and e) RF as well as f) weighted MA. In more detail, our objectives were to
i) automatically build models by methods a)–e) and compute MA of a)–e) for
numerous responses from large sets of covariates (300–500),
ii) evaluate predictive performance of these models with independent validation
data,
iii) evaluate over-fitting behavior and practical usage of approaches,
iv) briefly compare accuracies of DSM predictions and predictions derived from
a legacy soil map (scale 1:5 000).
We focused on three study regions in Switzerland: A forested region and two
regions with agricultural land, where harmonized legacy soil data and, in the latter
regions, airborne imaging spectrometer data were available. For the agricultural land,
soil properties required for assessing regulation, habitat and production functions
were mapped (Table 3.1). For forests, we had less diverse soil data and mapped
only properties to assess acidification status (Zimmermann et al., 2011).

3.2. Materials

3.2.1. Study regions

We chose three study regions on the Swiss Plateau with contrasting patterns re-
garding land use, geology, soil types and availability of airborne remote sensing
images (Fig. 3.1, Table 3.2). Agricultural land north of the city of Berne and
around Lake Greifensee (Canton of Zurich) was selected within the outline of
imaging spectroscopy data gathered by the APEX spectrometer in the years 2013
and 2014 (Schaepman et al., 2015). Agricultural land was defined as the area
not covered by any areal features extracted from the Swiss topographic landscape
model (swissTLM3D, Swisstopo, 2013a), hence wetlands, forests, parks, gardens
and developed areas were excluded.

Figure 3.1.: Location of study regions Berne and Greifensee (agricultural soils)
and Canton of Zurich (forest soils). Map labels: Jura, Plateau, Alps; ZH forest,
Greifensee, Berne; scale bar 0–100 km.
Data sources: Biogeographical regions © 2001 BAFU / Swiss Boundary, Lakes © 2012
BFS GEOSTAT / Boundaries Europe: NUTS © 2010 EuroGeographics

The majority (80 %) of the study region Berne was covered by crop land and
15 % by permanent grassland. In the Greifensee region crop land covered roughly
half of the area and one third was permanent grassland. The remaining areas were
orchards, vineyards, horticultural areas or mountain pastures (Hotz et al., 2005).
The third study region comprised the forested areas of the Canton of Zurich
(ZH forest), as derived from the forested area of the topographic landscape model
(swissTLM3D, Swisstopo, 2013a). Two thirds of the forested area are dominated
by conifers (FSO, 2000b). In all three study regions soils formed mostly on weathered
Molasse formations and Pleistocene sediments dominantly deposited during
the last glaciation. In the northeastern part, limestone Jura hills reach into ZH
forest (Hantke, 1967). In the western part of the Berne study region, alluvial plains
with silty sediments or peat formations prevail (Swisstopo, 2005).

Table 3.2.: Description of three study regions (a: area, h: elevation, Swisstopo,
2013b; p: mean annual precipitation; t: mean annual temperature, Zimmermann
and Kienast, 1999).

name         land use      a [km²]   h [m]      p [mm]      t [°C]
Berne        agriculture   235       430–910    960–1440    6.8–9.3
Greifensee   agriculture   170       390–840    1040–1590   7.5–9.1
ZH forest    forest        507       340–1170   880–1780    6.1–9.1

Soils are rather young in all study regions (< 20 000 years old) as they mostly
formed after the end of the last glaciation. Typical soils are Cambisols and Luvisols
(calcaric to dystric), Gleysols and Fluvisols (reflecting frequent wet conditions) and
Histosols (on former peatlands). Shallower soils are often Regosols (FSO, 2000a).

3.2.2. Soil data

Origin of soil data and data harmonization

We gathered and harmonized legacy soil data from various soil surveys performed
between 1960 and 2014. Berne data was collected mostly before 1980 in small
soil mapping projects for land improvement. Data for Greifensee and ZH forest
originate from long-term soil monitoring of the Canton of Zurich (KaBo), a soil
pollutant survey (Wegelin, 1989), field surveys for creating soil maps of the agricul-
tural land (scale 1:5 000, Jäggli et al., 1998) or soil investigations in the course of
forest vegetation surveys by the Swiss Federal Institute for Forest, Snow and Land-
scape Research (WSL, Walthert et al., 2004). Hence, the compiled soil database
comprised data on soil properties that were measured or estimated for pedogenic
soil horizons of soil profiles or measured at fixed depth from bulked soil samples.
Sites for pollution surveying were chosen on a regular grid. The remaining sites
were selected by purposive sampling (Webster and Lark, 2013, p. 86) by field
surveyors to best represent soils typical for the given landform. The sites of WSL
were chosen by purposive sampling according to the aims of the project. Collating
the data from the different sources showed that soil data were not directly compa-
rable, and tailored harmonisation procedures were required to provide consistent
soil datasets. The heterogeneity of legacy soil data resulted from several standards
of soil description and soil classification, different data keys, different analytical
methods and particularly, often missing metadata for a proper interpretation of
the datasets. Therefore, we elaborated a general harmonisation scheme that covers
all steps required to merge different legacy soil data into one common consistent
database (Walthert et al., 2016). Sampling sites were recorded in the field on
topographic maps (scale 1:25 000), hence we estimated the accuracy of coordinates to be
about ± 25 m.


Horizon-based (and nonfixed depth) soil property data was converted to fixed-
depth data for 0–10, 10–30, 30–50 and 50–100 cm soil depth for Berne and Greifensee
and 0–20 and 40–60 cm depth for ZH forest. The latter intervals were chosen because
at the majority of forest sites only these depths had been sampled. Values
$o_t$ for depth $t$ were computed from horizon (or fixed-depth) data $o_i$ by

$$o_t = \sum_{i=1}^{h} w_i \, o_i, \qquad (3.1)$$

with $w_i$ given by the product of the fraction of the thickness of horizon/fixed
depth $i$ within $t$ and the bulk density $\rho_i$ of the soil fraction with particle size ≤
2 mm. Because we lacked estimates of volumetric gravel content for the majority
of samples we assumed that it was constant. $\rho_i$ was partly derived by pedotransfer
functions (PTF, Appendix, Table A.4).
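
The conversion in Eq. (3.1) can be sketched in a few lines of Python. The sketch assumes that the weights are rescaled to sum to one, so that the result is a weighted mean; this normalisation is not stated explicitly in the text. The function name, the tuple layout and the numbers are hypothetical.

```python
def fixed_depth_value(horizons, top, bottom):
    """Weighted mean of a soil property for the depth interval [top, bottom].

    horizons: iterable of (upper, lower, value, bulk_density) tuples, with
    depths in cm. The weight of a horizon is the fraction of the target
    interval it covers times its fine earth bulk density; weights are
    rescaled to sum to one (an assumption of this sketch).
    """
    weights, values = [], []
    for upper, lower, value, rho in horizons:
        overlap = min(lower, bottom) - max(upper, top)
        if overlap <= 0:
            continue  # horizon lies outside the target interval
        weights.append(overlap / (bottom - top) * rho)
        values.append(value)
    total = sum(weights)
    if total == 0:
        return None  # no horizon data within the target interval
    return sum(w * v for w, v in zip(weights, values)) / total

# Two hypothetical horizons: 0-15 cm (30 % clay) and 15-40 cm (20 % clay)
profile = [(0, 15, 30.0, 1.2), (15, 40, 20.0, 1.4)]
print(fixed_depth_value(profile, 10, 30))
```

Weighting by bulk density means that denser horizons contribute more fine earth mass, and therefore more to the mean, than their thickness alone would suggest.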
Soil properties were either measured by standard laboratory procedures, esti-
mated in the field or calculated by PTF (see overview in Appendix, Table A.4).
We accounted by statistical modelling for fluctuations of the observations over the
long period during which the data had been collected and for possible differences
between laboratory measurements, field estimates and PTF predictions. We included
categorical covariates (factors) in the statistical models that coded the period when
the data had been gathered, separately for laboratory measurements, field estimates
and PTF predictions. For Berne three periods (years 1968–1974, 1975–1978
and 1979–2010) were coded separately for laboratory measurements and field es-
timates. For Greifensee and ZH forest coding required more care because we had
replicate samples from soil monitoring. Instead of only using mean or median val-
ues per site this coding allowed us to use all individual observations. For Greifensee
we coded the years of 1960–1989, 1990–1994 and 1995–1999 separately for labo-
ratory and field data and 2000–2014 for laboratory measurement only. For ZH
forest we distinguished the periods 1985–1994, 1995–1999, 2000–2004, 2005–2009,
2010–2014 for laboratory measurements and a further two levels for predictions by
PTF or pH measurements on field-moist samples (Appendix, Table A.4). Older (or
newer) data on pH, soil organic matter (SOM) and effective cation exchange ca-
pacity (ECEC) than reported above was discarded. To compute model predictions
for mapping we used the most recent time period and laboratory measurements
as reference level.
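
As an illustration of this coding, the following Python sketch assigns a factor level to a Greifensee observation from its data origin and sampling year, using the periods listed above. The level labels, the function name and the handling of observations outside the listed periods (returned as `None`) are assumptions of this sketch, not the thesis implementation.

```python
# Periods for Greifensee as given in the text; labels are hypothetical.
GREIFENSEE_PERIODS = {
    "lab": [(1960, 1989), (1990, 1994), (1995, 1999), (2000, 2014)],
    "field": [(1960, 1989), (1990, 1994), (1995, 1999)],
}

def origin_level(method, year, periods=GREIFENSEE_PERIODS):
    """Return a factor level such as 'lab_1990-1994', or None if not covered."""
    for start, end in periods.get(method, []):
        if start <= year <= end:
            return f"{method}_{start}-{end}"
    return None

# The most recent laboratory period would serve as the reference level
# when predictions are computed for mapping.
print(origin_level("lab", 2005))
print(origin_level("field", 1992))
```

Because each observation carries its own level, replicate samples from monitoring sites can enter the model individually instead of being averaged per site.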


Soil properties

For the agricultural land (Berne, Greifensee) we modelled clay and silt, gravel
content, pH, SOM and effective soil depth available to plants (SD) and for ZH forest
ECEC, pH and bulk density of the fine soil fraction (≤ 2 mm, BD). For Berne
and Greifensee (possibly incomplete) soil data was available for 1052 and 2050
sites respectively, and for ZH forest we had 2379 sites with soil data (Appendix,
A.8 to A.10). We used roughly 20 % of the sampled sites for independent model
validation. Depending on data availability, this resulted in 120–300 validation sites
that were chosen by weighted random sampling. We ensured an even distribution
of validation sites over the study regions by assigning to each site a sampling weight
that was proportional to the forested or agricultural area, respectively, within its
Dirichlet polygon (Dirichlet, 1850).
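
A minimal Python sketch of such a weighted draw is given below. It assumes that the relevant area within each site's Dirichlet polygon has already been computed and is passed in as a plain array; the function name, the seed and the example areas are hypothetical. Sequential weighted sampling without replacement, as done by NumPy here, only approximates inclusion probabilities proportional to the weights.

```python
import numpy as np

def draw_validation_sites(areas, fraction=0.2, seed=0):
    """Draw validation site indices with probability proportional to area.

    areas: land area (e.g. in km²) within the Dirichlet polygon of each site.
    Sites in sparsely sampled parts of the region have large polygons and
    are therefore more likely to be selected.
    """
    areas = np.asarray(areas, dtype=float)
    rng = np.random.default_rng(seed)
    n_val = int(round(fraction * areas.size))
    probs = areas / areas.sum()
    chosen = rng.choice(areas.size, size=n_val, replace=False, p=probs)
    return np.sort(chosen)

# Ten hypothetical sites; two of them lie in thinly sampled areas
polygon_areas = [0.1, 0.2, 0.1, 1.5, 0.3, 0.2, 2.0, 0.1, 0.2, 0.3]
print(draw_validation_sites(polygon_areas))
```

The remaining sites would form the calibration set.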
Models for properties of agricultural soils were calibrated with data of 700–900
sites. For SOM there were more topsoil sites available (1140), but in the subsoil
we had only data from 400 (Greifensee) and 530 (Berne) sites, respectively. For
ZH forest topsoil chemical properties were available for 1055 (ECEC) to 1470 (pH)
sites, but for subsoils data was again scarce (ECEC 380 and pH 690 sites). For
modelling BD we had only 550 (topsoil) to 370 (subsoil) sites. On average we
calibrated the models with the following spatial data densities: Berne 2.9–3.6,
Greifensee 4.2–5.1 and ZH forest 1.2–1.8 observations per km².
Tables A.6 to A.10 in the Appendix report descriptive statistics of all soil prop-
erties. In general, soils in the Greifensee region were richer in clay (mean clay
content 26 %) than in Berne (17–19 %) and had larger gravel content (8–13 % vs.
3–5 %). In both agricultural study regions, large SOM contents were occasionally
found (> 40 %) as drained organic soils were sampled at some sites. Topsoil pH
showed similar variation in Berne and Greifensee (mean of 6.3–6.7 and standard
deviation of 0.7–0.9), probably because agricultural management evens out pedogenic
differences. ZH forest soils were more acidic (mean topsoil pH 4.7) and pH
varied more strongly (minimum pH 2.6).

3.2.3. Covariates for statistical modelling

To represent soil forming factors we used data from 28 sources, totalling roughly
480 covariates for Berne and Greifensee and 330 for ZH forest where APEX imaging
spectrometer data was not available (Table 3.3 and Appendix, Table A.5). Exact
numbers of covariates used depended on soil properties. When sampling density of
soil data was small we excluded covariates that showed hardly any spatial variation
(e.g. coarse-gridded climate data) or that resulted only in few data points per
factor level. Wherever possible, we aggregated factor levels based on pedological
knowledge to obtain at least 20 observations per level.

3.3. Methods

The large number of responses – 21 for each of Berne and Greifensee, 6 for ZH
forest – and of covariates (Table 3.3) required that statistical models could be
automatically built without user interaction. Hence, we used five approaches:
lasso (Sect. 3.3.1), robust external-drift kriging (georob, Sect. 3.3.2), geoadditive
modelling (geoGAM, Sect. 3.3.3) and two tree-based machine learning
procedures (boosted regression trees [BRT], Sect. 3.3.4 and random forest [RF],
Sect. 3.3.5). The predictions by the five methods were moreover combined by
weighted averaging (MA, Sect. 3.3.6). To create the final maps we predicted each
response at the nodes of a 20 m grid.
For parametric methods (Sect. 3.3.1 to 3.3.3) we transformed strongly positively
skewed responses Y(s) (see Appendix, Tables A.7, A.9, A.10 for skewness).
Transformation by natural logarithm was applied to soil organic matter (SOM) and
effective cation exchange capacity (ECEC) while gravel content was transformed
by square root (sqrt). Predictions of log-transformed data were back-transformed
without bias according to Cressie (2006, Eq. 20; see Chapt. 2 of this thesis) and for
sqrt-transformed data we used

$$\tilde{Y}(s) = \hat{f}(x(s))^2 + \hat{\sigma}^2 - \mathrm{Var}[\hat{f}(x(s))] \qquad (3.2)$$

with f̂(x(s)) being the prediction of the sqrt-transformed response, σ̂² the estimated
residual variance of the fitted model and Var[f̂(x(s))] the variance of f̂(x(s))
as provided again by the final model. Predictions by group lasso (Sect. 3.3.1) were
backtransformed by exp(·) or (·)² because Var[f̂(x(s))] was not known.
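The two back-transformations for sqrt-transformed responses can be sketched as follows. This is only an illustration of Eq. (3.2) in Python, not the code used for this thesis (all computations were done in R); the function names and the example numbers are hypothetical.

```python
def backtransform_sqrt_unbiased(f_hat, sigma2_hat, var_f_hat):
    """Back-transform a prediction made on the square-root scale (Eq. 3.2).

    f_hat      -- prediction of the sqrt-transformed response
    sigma2_hat -- estimated residual variance of the fitted model
    var_f_hat  -- variance of f_hat as provided by the fitted model
    """
    return f_hat ** 2 + sigma2_hat - var_f_hat


def backtransform_sqrt_naive(f_hat):
    """Plain (.)^2 back-transformation, used when var_f_hat is not known."""
    return f_hat ** 2


# Hypothetical numbers on the sqrt scale of gravel content
print(backtransform_sqrt_unbiased(3.0, 0.5, 0.25))
print(backtransform_sqrt_naive(3.0))
```

The naive variant ignores both variance terms, which is why it can only be regarded as an approximation of the first.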
For tree-based models (Sect. 3.3.4 and 3.3.5) responses were not transformed.
Clay and silt were modelled independently. Sand was computed as the remainder
to 100 % because for field estimates – a substantial part of the used texture data
(Appendix, Table A.4) – sand was obtained in the same way (Brunner et al., 1997;
Jäggli et al., 1998). Additive log-ratio transformation (ALR) for compositional

Table 3.3.: Overview of geodata sets and derived covariates (for more information see Table A.5
in Appendix; r: pixel size for raster datasets or scale for vector datasets; a: limited to study
region Be: Berne, Gr: Greifensee or Zf: ZH forest; n: number of covariates per dataset; NDVI:
normalized differenced vegetation index; TPI: topographic position index; TWI: topographic
wetness index; MRVBF: multi-resolution valley bottom flatness).

geodata set                                                 r           a      n
Soil (covariate examples: physiographic units, historic wetlands, presence of drainage
networks or soil amelioration)
  Soil overview map (FSO, 2000a)                            1:200 000          8
  Wetlands Wild maps (ALN, 2002)                            1:50 000    Gr     1
  Wetlands Siegfried maps (Wüst-Galley et al., 2015)        1:25 000    Gr     1
  Agricultural suitability (LANAT, 2015)                    1:25 000    Be     1
  Anthropogenic soil interventions (AWEL, 2012)             1:5 000     Gr     1
  Drainage networks (ALN, 2014b)                            1:5 000     Gr     2
Parent material (covariate examples: (aggregated) geological units, ice level during last
glaciation, aquifers, areas suitable for gravel exploitation)
  Geological overview map (Swisstopo, 2005)                 1:500 000   Be     4
  Map of last glacial maximum (Swisstopo, 2009)             1:500 000          1
  Geotechnical map (BFS, 2001; BAFU and GRID-Europe, 2010)  1:200 000          2
  Geological map (ALN, 2014a)                               1:50 000           7
  Geological maps (Swisstopo, 2016b), roughly harmonized    1:25 000    Be     1
  Groundwater occurrence (AWEL, 2014; AWA, 2014b)           1:25 000    Gr     2
  Hydrogeological infiltration zones (AWA, 2014a)           1:25 000    Be     2
  Mineral raw materials (AGR, 2015)                         1:25 000    Be     1
Climate (covariate examples: mean annual/monthly temperature and precipitation, radiation,
continentality index, site water balance, NH₃ concentration in air)
  MeteoSwiss 1961–1990 (Zimmermann and Kienast, 1999)       25/100 m           33
  MeteoTest 1975–2010 (Remund et al., 2011)                 250 m              38
  Air pollutants (BAFU, 2011)                               500 m       Zf     2
  NO₂ immissions (AWEL, 2015)                               100 m       Gr     3
Vegetation (covariate examples: band ratios, NDVI, imaging spectroscopy bands, aggregated
vegetation units, canopy height)
  Landsat7 scene (USGS EROS, 2013)                          30 m               9
  DMC mosaic (DMC, 2015)                                    22 m               4
  SPOT5 mosaic (Mathys and Kellenberger, 2009)              10 m        Zf     12
  APEX spectrometer mosaics (Schaepman et al., 2015)        2 m         Gr,Be  180
  Share of coniferous trees (FSO, 2000b)                    25 m        Zf     1
  Vegetation map (Schmider et al., 1993)                    1:5 000     Zf     1
  Species composition data (Brassel and Lischke, 2001)      25 m        Zf     1
  Digital surface model (Swisstopo, 2011)                   2 m         Zf     1
Topography (covariate examples: slope, curvature, northness, TPI, TWI, MRVBF (various
radii/resolutions))
  Digital elevation model (Swisstopo, 2011)                 25 m               62
  Digital terrain model (Swisstopo, 2013b)                  2 m                134

data (Aitchison, 1986, p. 113) was tested for geoGAM (Sect. 3.3.3), but as ALR
had no advantage, we preferred to model textural components on their original
scale.
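For readers unfamiliar with the two options for texture data, the sketch below contrasts the ALR transformation with modelling on the original scale and taking sand as the remainder. It is a generic Python illustration, not the thesis code; using sand as the reference part of the ALR and all function names are assumptions made here.

```python
import math


def alr(clay, silt, sand):
    """Additive log-ratio transform with sand as the (assumed) reference part."""
    return math.log(clay / sand), math.log(silt / sand)


def alr_inverse(z_clay, z_silt):
    """Map two log-ratios back to clay, silt and sand percentages summing to 100."""
    e_clay, e_silt = math.exp(z_clay), math.exp(z_silt)
    total = 1.0 + e_clay + e_silt
    return 100.0 * e_clay / total, 100.0 * e_silt / total, 100.0 / total


def sand_as_remainder(clay, silt):
    """Original-scale approach: clay and silt are modelled, sand is derived."""
    return 100.0 - clay - silt


z = alr(20.0, 30.0, 50.0)
print(alr_inverse(*z))                  # recovers the composition up to rounding
print(sand_as_remainder(20.0, 30.0))
```

The inverse ALR always returns a valid composition, whereas the remainder approach relies on the clay and silt predictions summing to at most 100 %.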
To find optimal tuning parameters, we minimized root mean squared error
(RMSE, Eq. (3.4)) in 10-fold cross-validation using the same cross-validation
subsets for all methods in Sect. 3.3.1 to 3.3.4. For RF (Sect. 3.3.5) RMSE
was computed for out-of-bag predictions. All computations were
done in R (R Core Team, 2016) using the functions reported below.
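Using the same cross-validation subsets for all methods means that fold membership is fixed once and then reused for every candidate model. A minimal Python sketch of that idea follows; it is an illustration only (the thesis computations were done in R), and `make_folds`, `cv_rmse` and the mean-only toy model are hypothetical names introduced here.

```python
import math
import random


def make_folds(n, k=10, seed=1):
    """Assign each of n observations to one of k folds, once and reproducibly."""
    fold_ids = [i % k for i in range(n)]
    random.Random(seed).shuffle(fold_ids)
    return fold_ids


def cv_rmse(fit, predict, X, y, fold_ids):
    """Cross-validation RMSE of one method for a given fold assignment."""
    squared_errors = 0.0
    for fold in sorted(set(fold_ids)):
        train = [i for i, f in enumerate(fold_ids) if f != fold]
        test = [i for i, f in enumerate(fold_ids) if f == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        squared_errors += sum((y[i] - predict(model, X[i])) ** 2 for i in test)
    return math.sqrt(squared_errors / len(y))


def fit_mean(X_train, y_train):
    """Hypothetical toy method: the model is the mean of the training responses."""
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


X = [[float(i)] for i in range(25)]
y = [2.0 + 0.5 * i for i in range(25)]
fold_ids = make_folds(len(y))                              # created once ...
print(cv_rmse(fit_mean, predict_mean, X, y, fold_ids))     # ... reused for each method
```

Because `fold_ids` is passed in rather than drawn inside `cv_rmse`, differences in RMSE between methods cannot be caused by different random splits.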

3.3.1. Group lasso

The lasso (least absolute shrinkage and selection operator) is a shrinkage method
that likely excludes nonrelevant covariates and is therefore an attractive framework
for high dimensional covariate selection. Lasso estimates coefficients of a linear
model by minimizing a penalized residual sum of squares, with the penalty being
equal to the weighted sum of absolute values of the estimated coefficients. By
increasing the weight λ of the penalty term, a kind of continuous subset selection
is performed. Covariates with coefficients shrunken exactly to zero are excluded
from the model (Hastie et al., 2009, Sect. 3.4).
We used the grouped lasso which jointly shrinks all coefficients of a factor (R
package grpreg, Breheny and Huang, 2015). The optimal λ was chosen such that
we obtained the least complex model whose cross-validation mean squared error
(MSE) was at most one standard error (SE) larger than the minimum MSE (Hastie
et al., 2009, p. 62).
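The "one standard error" rule can be sketched in a few lines. The Python function below is a generic illustration, not the grpreg interface; the function name and the numbers are hypothetical, and it assumes that a larger λ always means a less complex model.

```python
def lambda_one_se(lambdas, cv_mse, cv_se):
    """Largest lambda whose CV MSE is within one SE of the minimum CV MSE."""
    i_min = min(range(len(cv_mse)), key=lambda i: cv_mse[i])
    threshold = cv_mse[i_min] + cv_se[i_min]
    candidates = [lam for lam, mse in zip(lambdas, cv_mse) if mse <= threshold]
    return max(candidates)


# Hypothetical cross-validation results along a lambda path
lambdas = [1.0, 0.5, 0.1, 0.01]
cv_mse = [2.0, 1.3, 1.1, 1.2]
cv_se = [0.3, 0.25, 0.25, 0.3]
print(lambda_one_se(lambdas, cv_mse, cv_se))
```

In this toy path the minimum MSE is reached at λ = 0.1, but λ = 0.5 is selected because its MSE lies within one SE of that minimum.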

3.3.2. Robust external-drift kriging (georob)

We applied external-drift kriging (EDK) with robustly estimated trend coefficients
and exponential variogram parameters (R package georob, Papritz, 2016, Chapt.
5 of this thesis). Building a parsimonious trend model from a large number p of
covariates was challenging for georob. We built trend models by concatenating
several covariate selection steps. First, we did a pre-selection by finding common
covariates in repeated lasso cross-validation runs (32 repetitions, optimal λ from
argmin_i(MSE_i) + 1 SE, R package glmnet, Friedman et al., 2010). Then we reduced
and expanded this initial covariate set by repeated stepwise covariate selection
(models were reduced by step function minimizing Bayesian information criterion
[BIC], and enlarged by adding covariates with p ≤ 0.05 in Wald tests). Covariates
with inflated coefficients due to multi-collinearity had to be removed manually
from the final models (40 % of responses). We used a robustness parameter ψ
equal to 1.75. When the robust algorithm did not find a root of the estimating
equations, we first increased ψ and fitted the model nonrobustly if this did not
help (8 % of responses, see Tables A.13 and A.14 in Appendix).

3.3.3. Boosted geoadditive model (geoGAM)

Additive models accommodate linear effects as well as smooth nonlinear effects of
continuous covariates. Spatial auto-correlation can be represented in geoGAM by
a smooth function of the spatial coordinates (smooth spatial surface), and nonsta-
tionary effects are modelled by interactions between smooth spatial functions and
covariates. We based model building for geoGAM on component-wise gradient
boosting, a slow stagewise additive model building algorithm. At each stage base
procedures are fitted to residuals of the previous model and the best fitting base
procedure is retained to update the model by a small step size ν. We used
nonparametric penalized smoothing splines for continuous covariates and linear base
procedures for factors. After boosting, further model reduction was achieved by
stepwise removal of covariates and aggregation of factor levels. The optimal number of
boosting iterations mstop and parameters for further model reduction were found
by minimizing cross-validation RMSE. For more details on the model building
procedure, see Chapter 2 and R package geoGAM (Nussbaum, 2017).
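The selection mechanism of component-wise gradient boosting can be illustrated with a deliberately simplified sketch. The Python code below uses one linear base procedure per centred covariate and squared error loss; it does not use penalized smoothing splines, does not handle factors and is not the geoGAM implementation. The function name, the arguments `m_stop` and `nu`, and the toy data are assumptions made for this illustration.

```python
def componentwise_l2_boost(columns, y, m_stop=100, nu=0.1):
    """Component-wise L2 boosting with one linear base procedure per covariate.

    columns -- list of covariates, each a list of centred values
    y       -- list of responses
    Returns the intercept and one coefficient per covariate; covariates that
    were never selected keep a coefficient of exactly zero.
    """
    n = len(y)
    intercept = sum(y) / n
    fitted = [intercept] * n
    coef = [0.0] * len(columns)
    for _ in range(m_stop):
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j, col in enumerate(columns):
            sxx = sum(x * x for x in col)
            if sxx == 0.0:
                continue
            # least-squares fit of this base procedure to the current residuals
            beta = sum(x * r for x, r in zip(col, residuals)) / sxx
            rss = sum((r - beta * x) ** 2 for x, r in zip(col, residuals))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        if best_j is None:
            break
        # update only the best-fitting component, shrunken by the step size nu
        coef[best_j] += nu * best_beta
        fitted = [f + nu * best_beta * x for f, x in zip(fitted, columns[best_j])]
    return intercept, coef


x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]      # relevant covariate
x2 = [1.0, -1.0, 0.0, -1.0, 1.0]      # covariate unrelated to the response
y = [3.0 + 2.0 * v for v in x1]
print(componentwise_l2_boost([x1, x2], y))
```

Stopping after `m_stop` iterations is what performs covariate selection: a covariate that never gives the best fit to the residuals never enters the model.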
Nonstationary effects were added for all continuous covariates, but cross-validation
RMSE did not substantially decrease, and we preferred the simpler stationary
models throughout. The maximum number of boosting iterations mmax was kept at the
default of 300 (geoGAM, Nussbaum, 2017), except if visual inspection of the sequence
of cross-validation RMSE values suggested that RMSE had not yet levelled off
(20 % of the responses).

3.3.4. Boosted regression trees (BRT)

Classification and regression trees (CART) are based on recursive binary partition-
ing of the covariates and can capture complex interaction structures in a dataset.
Generally, single trees tend to be noisy (large variance), but to have small bias.
Combining trees by ensemble methods aims to reduce their variance (Hastie et al.,
2009, Chapt. 9 and 10). One such approach uses regression trees as base proce-
dures in component-wise gradient boosting (Sect. 3.3.3).
The optimal number of trees (= number of boosting iterations) ntrees and the
number of splits per tree id (representing interaction depth) were found by
cross-validation, iterating through a grid of ntrees = 2, 4, 8, .., 200, 210, .., 800 and id =
1, 2, .., 12, 14, .., 50 (R package gbm, Southworth, 2015; optimization done using R
package caret, Kuhn, 2015).
The learning rate was kept as small as for geoGAM with ν = 0.1 (Sect. 3.3.3,
Hastie et al., 2009, Chapt. 10), and the minimal number of observations in each end
node was set to 5 as in RF (Sect. 3.3.5).

3.3.5. Random forest (RF)

RF (Breiman, 2001), another method of balancing instability of CART, averages
a committee of fully grown trees. Two mechanisms are used to de-correlate trees
and, consequently, reduce the variance of the predictions: 1) bootstrap sampling
(bagging) creates a different response vector for each tree and, 2) at each node
only mtry < p randomly selected covariates are tested as candidates for binary
splitting. Predictions are simple means of all ntree fitted trees.
Tuning parameters are the number of trees ntree , the minimal number of obser-
vations at terminal nodes nmin and the number of tested covariates mtry at each
split. Tests with five different responses confirmed that tuning ntree and nmin did
not reduce out-of-bag RMSE substantially (Spiess, 2016). Therefore, we used de-
fault values of ntree = 500 and nmin = 5 for all RF fits (R package randomForest,
Liaw and Wiener, 2002). To find optimal mtry we minimized out-of-bag RMSE by
iterating through mtry = 1, 2, .., p.

3.3.6. Model averaging (MA)

The five methods described above likely represent different aspects of the covariates
and can be seen as different means of reducing the high-dimensional covariate in-
put. Hence, combining predictions by several models possibly improves predictive
performance over single methods as large variance of individual models is reduced
through averaging (Hastie et al., 2009, Sect. 8.8). We computed weighted sums
of the predictions by our five digital soil mapping (DSM) procedures with weights
proportional to the inverse cross-validation or out-of-bag RMSE (Appendix, Tables
A.11, A.13 and A.14).
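The weighting scheme can be written down compactly. The following Python sketch is illustrative only; the function names and all numbers are hypothetical and are not taken from the Appendix tables.

```python
def inverse_rmse_weights(rmse_by_method):
    """Weights proportional to 1/RMSE, normalized to sum to one."""
    inverse = {method: 1.0 / value for method, value in rmse_by_method.items()}
    total = sum(inverse.values())
    return {method: value / total for method, value in inverse.items()}


def model_average(prediction_by_method, weights):
    """Weighted sum of the predictions of the individual methods at one location."""
    return sum(weights[m] * prediction_by_method[m] for m in weights)


# Hypothetical cross-validation / out-of-bag RMSE per method
cv_rmse_by_method = {"lasso": 6.5, "georob": 6.0, "geoGAM": 5.9, "BRT": 6.1, "RF": 6.2}
weights = inverse_rmse_weights(cv_rmse_by_method)

# Hypothetical predictions of the five methods at one grid node
prediction = {"lasso": 21.0, "georob": 24.0, "geoGAM": 23.5, "BRT": 22.0, "RF": 22.5}
print(weights)
print(model_average(prediction, weights))
```

When the RMSE values of the methods are similar, as in this toy example, all weights stay close to 1/5 and the result is close to a simple mean.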

3.3.7. Legacy soil map

For Greifensee region a legacy soil map 1:5 000 was available, which reported classes
of clay and gravel content for top- and subsoil and effective soil depth available to
plants (SD, Jäggli et al., 1998). Experienced soil surveyors assigned to each class
or combination of classes a typical value of these soil properties (Nussbaum and
Papritz, 2017), and we used these as predictions when we computed the statistics
for the validation sets (Sect. 3.3.8).
The map defined topsoil by pedogenetic A horizon without indicating a particular
depth. We therefore compared predictions for topsoil to values observed in
0–10 and 10–30 cm depth and predictions for subsoil to observations in 30–50 and
50–100 cm. Inhomogeneous mapping units (complex polygons with multiple soil
units assigned) were excluded from the validation. Since all the sites in the valida-
tion sets had been used to create the map, validation statistics give goodness-of-fit
instead of rigorous validation measures for the legacy map.

3.3.8. Evaluating predictive performance

The accuracy of predictions by the six statistical DSM approaches and the legacy
soil map was evaluated by comparing predicted Ỹ (si ) with observed Y (si ) soil
properties for all locations si of the validation sets. To rate the methods, we used
bias, RMSE and mean squared error skill score (SSmse , Wilks, 2011, p. 359):

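$$\mathrm{bias} = -\frac{1}{n}\sum_{i=1}^{n}\left(Y(s_i) - \tilde{Y}(s_i)\right), \qquad (3.3)$$

$$\mathrm{RMSE} = \left(\frac{1}{n}\sum_{i=1}^{n}\left(Y(s_i) - \tilde{Y}(s_i)\right)^2\right)^{1/2}, \qquad (3.4)$$

$$\mathrm{SS}_{\mathrm{mse}} = 1 - \frac{\sum_{i=1}^{n}\left(Y(s_i) - \tilde{Y}(s_i)\right)^2}{\sum_{i=1}^{n}\left(Y(s_i) - \frac{1}{n}\sum_{i=1}^{n} Y(s_i)\right)^2}. \qquad (3.5)$$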

SSmse has the same interpretation as the R² which is occasionally reported, with
SSmse = 1 for perfect predictions (RMSE = 0), SSmse = 0 if the mean squared error
of the predictions equals the variance of the data of the validation set and
SSmse < 0 if it exceeds that variance. Note, however, that some DSM studies report
R² values identical to SSmse (e.g. Vaysse and Lagacherie, 2015; Viscarra Rossel
et al., 2015) while others report R², where R is the Pearson correlation coefficient
of Y(si) and Ỹ(si) (e.g. Behrens et al., 2014; Somarathna et al., 2016). Such R²
values differ, except for linear models fitted by ordinary least squares, from SSmse.
Since computation of reported R² is sometimes not clear, we call the statistic SSmse,
which makes it clear that it is a skill score.
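The difference between SSmse and a squared Pearson correlation is easiest to see with a small numerical example. The Python sketch below implements Eqs. (3.3) to (3.5) together with the squared correlation; the function names and the toy data are hypothetical, and the code is an illustration rather than the code used for this thesis.

```python
import math


def bias(obs, pred):
    """Eq. (3.3): mean of predictions minus observations."""
    return -sum(y - p for y, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Eq. (3.4): root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(obs, pred)) / len(obs))


def ss_mse(obs, pred):
    """Eq. (3.5): mean squared error skill score."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((y - p) ** 2 for y, p in zip(obs, pred))
    sst = sum((y - mean_obs) ** 2 for y in obs)
    return 1.0 - sse / sst


def squared_pearson_r(obs, pred):
    """Squared Pearson correlation of observations and predictions."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((y - mean_obs) * (p - mean_pred) for y, p in zip(obs, pred))
    var_obs = sum((y - mean_obs) ** 2 for y in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


# Toy example: predictions are perfectly correlated with the observations
# but shifted by one unit.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
print(bias(obs, pred), rmse(obs, pred))
print(ss_mse(obs, pred), squared_pearson_r(obs, pred))
```

Here the squared correlation equals 1 although the predictions are biased, whereas SSmse is only about 0.2 because it penalizes the systematic shift.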

3.4. Results and Discussion

3.4.1. Model building

Grouped lasso, robust external-drift kriging (georob) and boosted geoadditive
models (geoGAM) successfully selected strongly reduced sets of covariates. On
average, lasso models had 21, georob 27 and geoGAM only 12 covariates in the
final models. This corresponds to only 3–6 % of all covariates. Boosted regression
trees (BRT) performed weak covariate selection. The stagewise forward algorithm
selected on average 43 % of all covariates (covariates with importance > 0) for its
models. Nonetheless, complexity of BRT models varied quite strongly with 12 %
of covariates selected for the smallest and 86 % for the largest model. The num-
ber of covariates in final lasso, geoGAM, georob and BRT models was positively
correlated over the responses (Pearson correlation between methods 0.43–0.58).
Random forest (RF) included all available covariates in its models (all covariates
with importance > 0). Having models that depend only on a reduced set of the
initial input covariates is desirable, because computing predictions is then less
demanding and interpreting modelled effects of covariates is easier. For three
responses we therefore checked whether covariate importance (Hastie et al., 2009,
p. 368) can be used to select covariates for RF. We either selected the q = 10, 20, .., 50
most important covariates or selected covariates by stepwise recursive elimination
of the least important covariate. For given q, both approaches selected similar sets
(correspondence 60–90 %), and root mean squared error (RMSE) computed with
independent validation data did not change much by the selection. For example,
for effective cation exchange capacity (ECEC) 0–20 cm, RMSE increased only by
0.5 mmolc kg⁻¹ for a model with 50 instead of 325 covariates. This increase was
clearly within normal fluctuations of RMSE by bagging and random covariate se-
lection (Spiess, 2016). Brungard et al. (2015) even improved prediction accuracy
of RF by recursive covariate elimination.
Optimal values of mtry were quite large, hence trees were not strongly de-correlated.
Out of 48 models, 32 tuned fits had mtry > p/3, which is the software
default (Liaw and Wiener, 2002). However, the gain obtained by optimizing mtry
was generally small. On average RMSE of models fitted with default mtry were
only 1.015 times as large as the RMSE of models with optimized mtry . The largest
relative benefit of tuning was found for topsoil bulk density of the fine soil fraction
(BD) in ZH forest where optimal mtry reduced out-of-bag RMSE from 0.052 to
0.049 Mg m⁻³.
In contrast, BRT profited more from tuning its parameters ntrees and id . In
particular, optimizing ntrees resulted in some reduction of cross-validation RMSE.
69 % of the fits had smaller optimal ntrees than the software default (100). Tuning
ntrees reduced RMSE on average by a factor of 0.941. Optimizing the interaction
depth id (mean optimal value = 10, default = 1) decreased RMSE on average
by a factor of 0.982. Tuning ntrees and id had the largest effect for subsoil ECEC
and pH in ZH forest where cross-validation RMSE was reduced from 47.3 to 40.9
mmolc kg⁻¹ and 0.91 to 0.75 pH units, respectively.
Residual spatial autocorrelation of georob models was much weaker than auto-
correlation of the original responses (Appendix, Tables A.7, A.9 and A.10). Effec-
tive ranges for Greifensee and ZH forest were less than 300 m for most models, and
for Berne effective ranges varied between 1–10 km with rather large nugget effects
(around 50 % of the total sill). Only 5 of 48 final geoGAM models contained a
smooth spatial surface. They seemed often too smooth to represent small-scale
residual spatial autocorrelation (median of effective range of residual variogram:
270 m).
Since cross-validation and out-of-bag RMSE did not vary much between the
five methods, model averaging (MA) weights did in general not differ much from
1/5 (interquartile range of weights: 0.18–0.21). Only for subsoil soil organic
matter (SOM, 50–100 cm) cross-validation RMSE of parametric models were larger
compared to BRT and RF and resulted in somewhat larger differences between
MA weights. A complete list of model parameters and MA weights is given in
Tables A.13 and A.14 in the Appendix.
Summing up, lasso, georob and geoGAM and partly BRT effectively selected
relevant covariates from a large set. Reduction of covariates in RF seems – tested
on a few responses – promising. The benefit of tuning model parameters was
sometimes only small, but remained relevant when considering all responses.

3.4.2. Evaluation of model performance

General performance

Tables 3.4 and 3.5 report RMSE and mean squared error skill score (SSmse ) of
all models for independent validation data, and Fig. 3.2 summarizes SSmse by
method and study region. Overall, the models accounted only for a moderate part
of the variance of the validation data (median SSmse of best performing method
per response: 0.257). SOM in 10–30 cm soil depth in study region Berne was best
predicted with SSmse of 0.677. Soil properties in Greifensee were in general more
difficult to predict, yielding for some responses negative SSmse for all methods.
For pH in 30–50 cm soil depth, for example, lasso performed “best” with SSmse
of -0.089 which is unquestionably a bad result. In general, topsoil properties were
predicted more accurately than subsoil properties (Fig. 3.3). We are aware of the
limitation that we did not validate the methods with data collected by a random-
ized statistical design (Brus et al., 2011). This is a common drawback if digital
soil mapping (DSM) is based on legacy soil data and thus represents a typical
situation. Other studies that validated DSM methods with independent data on
several soil properties from multiple depths – and likely did not suppress evidence
for poor performance – reported similar R² values: negative up to 0.75 (Vaysse
and Lagacherie, 2015), 0.1 to 0.48 (Mulder et al., 2016), 0.6 to 0.68 (Kempen et al.,
2011), 0.36 to 0.52 (Viscarra Rossel et al., 2015) and 0.26 to 0.55 (Adhikari et al.,
2013). Also, these studies found that R² values of predictions of topsoil properties
were generally larger than R² related to subsoils.

Performance of methods

There was no method that consistently performed best for all soil properties, soil
depths and study regions. Each of the tested methods (lasso, georob, geoGAM,
BRT, RF) performed best for at least one response, and SSmse varied more strongly
between responses than methods. Although no method consistently outperformed
the others, Fig. 3.2 and 3.3 suggest that the tree-based methods BRT and in partic-
ular RF performed on average best. For 28 out of 48 responses, RF had maximum
SSmse , and it never had minimum SSmse . In contrast, georob and geoGAM most
often fared worst (for 15 and 14 out of 48 responses, respectively) and were best
only for two (georob) and five responses (geoGAM). Lasso ranked between these
two methods and BRT. MA further improved on RF: For 14 out of the 28 responses

Table 3.4.: Accuracy of predictions of soil properties by soil depth mapped for Berne study region
computed with independent validation data (RMSE: root mean squared error, SSmse: mean squared
error skill score according to Eq. (3.5), lasso: grouped least absolute shrinkage and selection
operator, georob: robust external-drift kriging, geoGAM: boosted geoadditive model, BRT: boosted
regression trees, RF: random forest, MA: model averaging, NA: no convergence of georob algorithm).

                lasso           georob          geoGAM          BRT             RF              MA
        depth   RMSE    SSmse   RMSE    SSmse   RMSE    SSmse   RMSE    SSmse   RMSE    SSmse   RMSE    SSmse
clay    0–10    6.698   0.230   5.928   0.396   5.776   0.427   5.897   0.403   6.096   0.365   5.838   0.417
        10–30   7.666   0.162   6.974   0.307   7.450   0.209   6.812   0.339   6.638   0.367   6.717   0.352
        30–50   9.056   0.090   8.743   0.152   9.990   -0.108  8.733   0.153   8.619   0.175   8.729   0.154
        50–100  9.001   -0.002  9.706   -0.165  9.458   -0.106  9.050   -0.013  8.922   0.016   8.871   0.027
silt    0–10    12.954  0.001   12.245  0.107   12.031  0.138   12.19   0.115   11.369  0.230   11.644  0.192
        10–30   11.810  0.115   11.606  0.145   11.391  0.176   11.135  0.213   10.493  0.304   10.778  0.266
        30–50   14.231  0.143   14.151  0.153   14.163  0.151   14.263  0.139   13.809  0.193   13.701  0.206
        50–100  15.604  0.081   15.661  0.074   15.829  0.054   15.108  0.139   15.161  0.136   14.923  0.163
gravel  0–10    2.582   0.129   2.595   0.120   2.567   0.139   2.769   -0.002  2.635   0.113   2.522   0.188
        10–30   3.280   0.200   3.277   0.201   3.281   0.199   3.311   0.185   3.299   0.200   3.143   0.274
        30–50   4.846   0.207   4.767   0.232   4.462   0.328   4.852   0.205   4.843   0.224   4.641   0.287
        50–100  6.146   0.144   6.343   0.088   6.582   0.018   6.367   0.081   6.040   0.173   5.992   0.186
SOM     0–10    4.528   0.634   5.456   0.469   5.137   0.529   5.291   0.501   4.698   0.608   4.742   0.601
        10–30   4.167   0.677   4.981   0.539   4.648   0.599   5.235   0.491   4.910   0.554   4.431   0.636
        30–50   7.817   0.096   7.627   0.139   9.167   -0.243  7.174   0.239   8.379   -0.025  6.562   0.371
        50–100  12.871  -0.015  19.284  -1.279  14.518  -0.296  11.817  0.144   10.629  0.308   9.958   0.392
pH      0–10    0.564   0.549   0.569   0.542   0.547   0.577   0.564   0.549   0.554   0.565   0.536   0.593
        10–30   0.601   0.495   0.591   0.511   0.609   0.482   0.616   0.469   0.601   0.494   0.581   0.527
        30–50   0.715   0.408   0.762   0.327   0.725   0.390   0.722   0.395   0.691   0.447   0.690   0.448
        50–100  0.769   0.425   0.811   0.361   0.791   0.392   0.763   0.434   0.761   0.437   0.728   0.484
SD      –       31.413  0.094   32.61   0.023   33.286  -0.017  31.039  0.115   30.543  0.143   31.014  0.117

Table 3.5.: Accuracy of predictions of soil properties by soil depth mapped for Greifensee and ZH
forest computed with independent validation data (for description see Table 3.4, legacy map:
legacy soil map 1:5 000).

                legacy map      lasso           georob          geoGAM          BRT             RF              MA
        depth   RMSE    SSmse   RMSE    SSmse   RMSE    SSmse   RMSE    SSmse   RMSE    SSmse   RMSE    SSmse   RMSE    SSmse
Greifensee
clay    0–10    6.241   0.206   6.208   0.214   6.208   0.214   6.095   0.243   6.296   0.192   6.129   0.234   5.958   0.277
        10–30   6.397   0.293   6.637   0.239   6.662   0.233   6.474   0.276   6.813   0.198   6.575   0.253   6.412   0.289
        30–50   8.478   -0.123  7.651   0.085   7.402   0.144   7.488   0.124   7.286   0.170   7.177   0.195   7.129   0.206
        50–100  8.972   0.037   8.741   0.086   7.944   0.245   9.356   -0.047  8.183   0.199   8.031   0.228   8.048   0.225
silt    0–10                    6.624   0.062   6.375   0.131   7.385   -0.167  6.322   0.145   6.007   0.228   6.225   0.171
        10–30                   6.676   0.047   6.479   0.102   6.785   0.015   6.360   0.135   6.310   0.148   6.309   0.149
        30–50                   7.959   0.021   8.512   -0.120  8.160   -0.030  8.429   -0.099  8.071   -0.007  8.039   0.001
        50–100                  9.189   -0.026  10.006  -0.217  9.817   -0.171  9.253   -0.041  9.091   -0.005  9.251   -0.040
gravel  0–10    6.440   -0.128  5.896   0.059   5.549   0.167   5.431   0.202   5.300   0.240   5.326   0.233   5.218   0.263
        10–30   5.831   0.184   6.086   0.116   6.066   0.121   5.335   0.321   5.454   0.290   5.560   0.264   5.438   0.296
        30–50   8.655   0.049   8.778   0.027   8.346   0.120   8.089   0.173   7.991   0.193   7.887   0.214   7.945   0.203
        50–100  9.811   0.314   11.77   0.018   10.821  0.170   10.402  0.233   10.373  0.237   10.696  0.189   10.407  0.232
SOM     0–10                    3.504   0.078   3.210   0.226   3.219   0.222   3.244   0.209   3.202   0.230   3.158   0.251
        10–30                   3.675   0.028   3.349   0.192   3.455   0.141   3.282   0.224   3.315   0.191   3.258   0.218
        30–50                   5.838   -0.072  5.599   0.014   5.900   -0.095  5.352   0.099   5.259   0.130   5.481   0.055
        50–100                  7.536   -0.223  NA      NA      11.917  -2.058  6.090   0.201   6.512   0.087   6.620   0.056
pH      0–10                    0.714   0.043   0.701   0.077   0.742   -0.035  0.707   0.061   0.700   0.081   0.693   0.097
        10–30                   0.700   0.078   0.720   0.024   0.718   0.031   0.708   0.056   0.691   0.102   0.683   0.121
        30–50                   0.751   -0.089  0.810   -0.266  0.830   -0.332  0.790   -0.205  0.752   -0.092  0.756   -0.103
        50–100                  0.750   -0.085  0.856   -0.412  0.799   -0.228  0.753   -0.092  0.747   -0.075  0.750   -0.083
SD      –       11.076  0.763   19.345  0.278   21.009  0.148   20.928  0.155   19.511  0.265   18.820  0.316   18.858  0.314
ZH forest
ECEC    0–20                    75.382  0.356   83.040  0.261   74.900  0.365   73.378  0.423   72.548  0.436   72.294  0.440
        40–60                   55.926  0.240   83.238  -0.683  69.113  -0.160  54.681  0.274   51.369  0.359   54.531  0.278
pH      0–20                    0.871   0.406   0.913   0.348   0.928   0.325   0.870   0.407   0.856   0.426   0.839   0.448
        40–60                   1.122   0.268   1.248   0.093   1.452   -0.227  1.138   0.246   1.093   0.305   1.107   0.287
BD      0–20                    0.052   0.203   0.048   0.334   0.055   0.128   0.050   0.271   0.047   0.343   0.046   0.389
        40–60                   0.047   0.283   0.061   -0.221  0.051   0.148   0.045   0.336   0.043   0.400   0.044   0.373

Figure 3.2.: Boxplots of SSmse (for independent validation data) grouped by
method and study region. Boxplots summarize SSmse values of n = 21 soil properties
for study regions Berne and Greifensee (20 for georob in Greifensee). For ZH
forest SSmse are individually shown for n = 6 soil properties (lasso: grouped least
absolute shrinkage and selection operator, georob: robust external-drift kriging,
geoGAM: boosted geoadditive model, BRT: boosted regression trees, RF: random
forest, MA: model averaging).

for which RF was best, MA resulted in even larger SSmse , and MA was best for
another 9 of the 20 remaining responses. Hence, for 23 out of 48 responses MA
had overall largest SSmse .
Apart from overall accuracy as captured by RMSE and SSmse, bias also matters
for choosing a DSM method. In general, marginal bias was small (median
bias²-to-MSE-ratio < 6 %, Fig. 3.4, Appendix, Table A.12). Bias contributed more to
mean squared error (MSE) when SSmse was small (methods lasso, georob, geoGAM,
study region Greifensee), except for the tree-based methods RF and BRT
which often had very small bias²-to-MSE-ratios. BRT had slightly lower
bias²-to-MSE-ratios compared to RF, confirming that boosting reduces bias in an adaptive
way while bagging in RF lowers only variance but not bias (Hastie et al., 2009,
p. 588). Largest bias²-to-MSE-ratios were most often found for lasso, and they
were especially large (12 to 17 %, Appendix, Table A.12) for predicting gravel
content in Greifensee and SOM in Berne in 50–100 cm depth. Shrinkage methods
such as lasso trade reduced variance of predictions for increased bias (Hastie
et al., 2009, Chapt. 3). Also RF resulted occasionally in biased predictions, for

Figure 3.3.: Boxplots of SSmse (for independent validation data) grouped by
method and soil depth. Statistics of 0–10 and 0–20 cm soil depths and 20–40
and 30–50 cm were pooled. (SD: effective soil depth available to plants, lasso:
grouped least absolute shrinkage and selection operator, georob: robust external-drift
kriging, geoGAM: boosted geoadditive model, BRT: boosted regression trees,
RF: random forest, MA: model averaging).

example for SOM 30–50 cm in Berne. Conditional bias – distortion of predictions
conditional on the observed values (Wilks, 2011, p. 304) – did not differ between
methods. Predictions were only conditionally biased if overall accuracy was small.
Lastly, we evaluated whether the various methods tended to over-fit the data by
computing differences between cross-validation (CV) or out-of-bag (OOB, RF)
SSmse and independent validation SSmse (Fig. 3.5). Through repeated cross-
validation on the same subsets and choice of tuning parameters with OOB statis-
tics (RF), the cross-validation and OOB SSmse can be considered as conservative
goodness-of-fit SSmse . We interpret positive (negative) differences in the sequel
as indications of over-(under-)fitting, although we cannot exclude that differences
between calibration and validation datasets contributed to discrepancies in SSmse .
In particular, replicated observations from a given site were not always assigned to
the same CV subset, and this possibly contributed to overly optimistic CV or OOB
results. Except for lasso all methods ([b] to [f] in Fig. 3.5) often had larger CV or
OOB than independent validation SSmse . As also found by Liddicoat et al. (2015)
lasso partly under-fitted the data, likely because we penalized the residual sum
of squares by the “optimum plus 1 standard error” rule (Sect. 3.3.1). Why BRT
tended partially to under-fit the data remained unclear. georob and RF tended
to over-fit the data most, and geoGAM was intermediate. For all the methods

Figure 3.4.: Boxplots of bias²-to-MSE-ratio (for independent validation data)
grouped by method and study region. Boxplots summarize ratios of n = 21 soil
properties for study regions Berne and Greifensee (20 for georob in Greifensee). For
ZH forest ratios are individually shown for n = 6 soil properties (lasso: grouped
least absolute shrinkage and selection operator, georob: robust external-drift kriging,
geoGAM: boosted geoadditive model, BRT: boosted regression trees, RF: random
forest, MA: model averaging).

differences in SSmse were largest for poorly performing models (small SSmse in
independent validation). For georob this was most pronounced. For ZH forest
ECEC (40–60 cm) CV yielded SSmse of 0.70 and independent validation SSmse of
-0.683. Hence, repeated covariate selection steps based on BIC and Wald test tend
to over-fit the data when responses only weakly depend on covariates.

Factors controlling predictive performance

We explored whether characteristics of the (spatial) empirical distributions of the
responses were in some way related to variations of predictive performance observed
between responses. We checked whether SSmse and bias²-to-MSE-ratios
depended on spatial sampling density, skewness, (robust) coefficient of variation,
strength of spatial autocorrelation and tuning parameters of methods (Appendix,
Tables A.6 to A.14), but no clear relationships became evident. Particularly, we
could not find any relationships between predictive performance and strength of
autocorrelation as measured by spatially structured variance ratios
(1 − nugget/sill_total, Vaysse and Lagacherie, 2015) or spatial ranges of response variograms.
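The two autocorrelation summaries mentioned here are simple functions of the variogram parameters. The Python sketch below is illustrative only; it assumes that the total sill is the sum of nugget and partial sill and that the exponential variogram is parameterized as nugget + partial sill × (1 − exp(−h/a)), which may differ from the convention of a given software package. Function names and numbers are hypothetical.

```python
import math


def structured_variance_ratio(nugget, partial_sill):
    """1 - nugget / total sill, with total sill = nugget + partial sill (assumed)."""
    return 1.0 - nugget / (nugget + partial_sill)


def effective_range_exponential(range_parameter):
    """Distance at which an exponential variogram reaches 95 % of its partial sill."""
    return -math.log(0.05) * range_parameter


# Hypothetical variogram parameters
print(structured_variance_ratio(nugget=0.5, partial_sill=0.5))
print(effective_range_exponential(100.0))   # roughly three times the range parameter
```

A ratio near 0 indicates that the nugget dominates (weak spatial structure), a ratio near 1 that nearly all variance is spatially structured.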
Only for extremely positively skewed responses (SOM below 30 cm in Greifensee)
we found that BRT and RF were clearly better than lasso, georob and geoGAM,
likely because log-transformation was too weak to fully account for skewness. For
skewness < 2 the advantage of tree-based methods disappeared.
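
The spatially structured variance ratio mentioned above is a one-line computation. The sketch below assumes that the total sill is the sum of the nugget and the partial sill of a fitted variogram; the function and argument names are hypothetical.

```python
def structured_variance_ratio(nugget, partial_sill):
    """Return 1 - nugget / total sill, with total sill = nugget + partial sill (assumption)."""
    total_sill = nugget + partial_sill
    if total_sill <= 0:
        raise ValueError("total sill must be positive")
    return 1.0 - nugget / total_sill

# Hypothetical variogram parameters: 30 % nugget, 70 % spatially structured variance.
ratio = structured_variance_ratio(nugget=0.3, partial_sill=0.7)
```

Values close to 1 indicate strong spatial structure, values close to 0 an almost pure nugget effect.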


Figure 3.5.: Difference of 10-fold cross-validation and independent validation
SSmse plotted against independent validation SSmse, grouped by method (lasso:
grouped least absolute shrinkage and selection operator, georob: robust
external-drift kriging, geoGAM: boosted geoadditive model, BRT: boosted
regression trees, RF: random forest, MA: model averaging; SSmse < −1 were
omitted).

Performance of legacy soil map

RMSE and SSmse of the legacy soil map (Table 3.5) were mostly within the range
of values observed for DSM methods. Only for subsoil gravel (50–100 cm) and
the effective soil depth available to plants (SD) predictions by the legacy soil map
were better than DSM predictions. (Note, however, that RMSE and SSmse of
the legacy soil map are rather goodness-of-fit than rigorous validation measures,
because all data of the validation set had been used to create the soil map.) Vaysse
and Lagacherie (2015) also found that a legacy soil map predicted only SD more
accurately than DSM methods. To create the legacy soil map at a scale of 1:5 000
many auger samples were taken to delineate map units (Jäggli et al., 1998), but
this data was not recorded and therefore unavailable for DSM. This might explain
why the legacy map modelled SD substantially better (SSmse 0.76) than DSM
methods (SSmse 0.15–0.32).


3.4.3. Evaluation of covariate relevance

Covariate importance

To characterize “predictive skill” of covariates by topic, we computed weighted
averages of RF covariate importance (Hastie et al., 2009, p. 368), weighting the
importance of covariates by validation SSmse (Fig. 3.6). Overall, terrain attributes
were important covariates. For Greifensee they were the main source of information
for modelling soil properties. None of the other covariate groups was able to
capture much of the variation of soil properties for this study region. Likely, this
explains why DSM generally performed poorly here (Sect. 3.4.2) and indicates that
the performance of DSM also depends on region-specific conditions. In the study
region Berne climatic covariates were important for chemical but not for physical
soil properties, particularly in topsoil. Additionally, geology, information on soil,
sampling period and type of data had moderate importance for this study region,
also for physical properties. Similarly, for ZH forest covariate importance differed
between chemical (pH, ECEC) and physical properties (BD). Vegetation was very
influential for modelling pH and ECEC whereas for BD spatial location, sampling
period and type of data were important as well.
Sampling period and type of soil data were important for many responses (legacy
data correction, Fig. 3.6). As also mentioned by Mulder et al. (2016), this
emphasizes the necessity to compensate for temporal changes and differences in
analytics when using legacy soil data. Topsoil pH and SOM in Berne – among the
responses predicted best in this study – were mostly “explained” by maps of mean
monthly and yearly precipitation, a geological overview map, an agricultural suit-
ability map and topographic wetness indices smoothed by different radii (7–60 m).
The geological and soil overview maps were also important for modelling SD in
Berne. Unlike Greifensee, terrain attributes did not contribute much to modelling
SD in Berne. In Greifensee, a map of historic wetlands and distances to water
bodies were in addition to terrain attributes (mostly indicating local depressions)
important for modelling SD. Predictions of physical and chemical properties in
Greifensee relied mainly on vertical and horizontal distance to water bodies, local
topographic indices and curvatures (50–90 m radii) as well as the multi-resolution
valley bottom flatness (MRVBF). For ZH forest by far the most important covari-
ate was the vegetation map accounting for nearly half of predictive skill in topsoil
for pH and ECEC and for one third in subsoil. Terrain attributes were important
for ZH forest predictions on both small (variation of slope in 20 m radius) and
large scale (topographic indices in radii 50 and 125 km).
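
The aggregation of covariate importance into "predictive skill" per covariate theme can be sketched as below. This is only one plausible reading of the procedure (share of importance among the 30 most important covariates, summed by theme and scaled by the validation SSmse of the response); the function name, the covariate names and all numbers are hypothetical.

```python
from collections import defaultdict

def predictive_skill_by_theme(importance, theme_of, ss_mse, top_n=30):
    """Sum importance shares [%] by theme and scale them by the skill score.

    importance: dict covariate -> RF importance for one response
    theme_of:   dict covariate -> covariate theme
    ss_mse:     validation skill score of that response
    """
    top = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    total = sum(value for _, value in top)
    skill = defaultdict(float)
    for name, value in top:
        skill[theme_of[name]] += 100.0 * value / total * ss_mse
    return dict(skill)

# Hypothetical response with three covariates and a validation SSmse of 0.5.
importance = {"twi_15m": 6.0, "slope_2m": 2.0, "precip_year": 2.0}
theme_of = {"twi_15m": "topography", "slope_2m": "topography", "precip_year": "climate"}
skill = predictive_skill_by_theme(importance, theme_of, ss_mse=0.5)
```

With this weighting the theme values of one response add up to its SSmse in percent, so poorly predicted responses contribute little to the picture.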


Figure 3.6.: Mean predictive skill [%] of covariates (weighted averages of covariate importance,
Hastie et al., 2009, p. 368) grouped by covariate theme (see Table 3.3, for legacy data correction
see Sect. 3.2.2). Predictive skill is reported separately for study region (Berne, Greifensee and
ZH forest), top- (0–30 cm) and subsoils (30–100 cm) and type of response (physical, chemical
soil properties, effective soil depth). Mean predictive skill was computed from the 30 most
important covariates and summed up by topic. The resulting value was weighted by validation
SSmse and plotted as grey dots for each response. Mean validation SSmse [%] per covariate
theme is given by black horizontal lines (if responses n > 2).
Panels (vertical axis: mean pred. skill [%], horizontal axis: covariate theme A to G):
physical soil properties (Berne, Greifensee: clay, silt, gravel; ZH forest: bulk density)
  topsoil: (a) Berne SSmse = 26.3, (b) Greifensee SSmse = 23.2, (c) ZH forest SSmse = 34.3
  subsoil: (d) Berne SSmse = 15.3, (e) Greifensee SSmse = 12.6, (f) ZH forest SSmse = 40.0
chemical soil properties (Berne, Greifensee: pH, soil organic matter; ZH forest: pH, ECEC)
  topsoil: (g) Berne SSmse = 55.5, (h) Greifensee SSmse = 15.1, (i) ZH forest SSmse = 43.1
  subsoil: (j) Berne SSmse = 29.8, (k) Greifensee SSmse = 5.4, (l) ZH forest SSmse = 33.2
effective soil depth: (m) SSmse = 14.3, (n) SSmse = 31.6
Covariate themes: A: soil, B: climate, C: vegetation, D: topography, E: parent material,
F: spatial coordinates, G: legacy data correction.


Overall, APEX covariates had very small importance (average rank of covariate
importance of 168 for RF and 48 for BRT). Differences of reflectance intensities
between autumn and spring flights and between agricultural land with partly bare
soils and various crops most likely obscured relations between surface reflectance of
vegetation and soil properties. Preprocessing using co-kriging with data from bare
soil areas might improve predictive capabilities for the present study regions
(Lagacherie et al., 2012).

Covariate interpretation

Besides studying covariate importance, we evaluated the effects of single covariates
on the responses by using partial effects or dependence plots. Given the large
number of models and covariates we chose a continuous and a factor covariate to
illustrate the effects for one response. Figure 3.7 shows how MRVBF and the factor
for different sampling period and type of soil data (legacy data correction) affected
topsoil clay content (0–10 cm, Greifensee). The effect of MRVBF on clay content
was similar for all five methods. Large MRVBF values point to accumulation sites
in the landscape (Gallant and Dowling, 2003), and such sites often have larger
clay contents. BRT and RF partial dependence plots suggest that the relation
is nonlinear with a sharp transition at MRVBF equal to 4. Patterns of estimated
differences between sampling periods and type of data were similar for the five
methods, which further strengthens the evidence that such differences should be
compensated when one uses legacy soil data.
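
A partial dependence curve, as used for the tree-based methods, can be computed for any fitted model by fixing one covariate at a grid value for all sites and averaging the predictions. The sketch below illustrates this idea only; the toy model with a step at MRVBF = 4 and all numbers are hypothetical stand-ins for a fitted BRT or RF model.

```python
import numpy as np

def partial_dependence(predict, X, column, grid):
    """Average prediction when covariate `column` is set to each value in `grid`."""
    X = np.asarray(X, dtype=float)
    values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, column] = value                  # fix the covariate of interest
        values.append(predict(X_mod).mean())      # average over all other covariates
    return np.array(values)

def toy_model(X):
    """Hypothetical clay model: jump of 10 % clay where MRVBF (column 0) exceeds 4."""
    return 20.0 + 10.0 * (X[:, 0] > 4.0) + 0.5 * X[:, 1]

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
pdp = partial_dependence(toy_model, X, column=0, grid=[2.0, 6.0])
```

Plotting such values against the grid gives a step-shaped curve for the toy model, similar in spirit to the sharp transition described above.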

3.4.4. Mapping

In addition to the reported analysis, we visually inspected the soil property maps
generated by the six DSM methods. Figure 3.8 shows DSM maps of topsoil clay
content (0–10 cm) for a section of the Greifensee study region along with a map
of clay content derived from the legacy soil map. All methods, including the soil
map, predicted soils rather rich in clay with clay content > 20 % for most sites,
which agrees with available observations (coloured dots, [g] in Fig. 3.8). Modelled
patterns of the maps were similar, but lasso and particularly RF predictions were
very smooth. In contrast, predictions by georob, geoGAM and BRT varied more
with larger clay content on valley bottoms to the east. MA performed best for
this response (SSmse 0.28, Table 3.5) and, being a weighted average of (a) to (e),
showed smoother spatial predictions than georob and geoGAM because RF had


Figure 3.7.: Example partial residual plots (e.g. Faraway, 2005, p. 72) for lasso, georob and
geoGAM (panels (a) to (c), (f) to (h)) and partial dependence plots (e.g. Hastie et al., 2009,
pp. 369) for tree-based methods (panels (d), (e), (i), (j)) for two covariates that were present in
lasso, georob and geoGAM and had large importance in BRT and RF for the response clay 0–10
cm in Greifensee (horizontal axes: MRVBF [-] (DTM 2 m) and legacy data correction, vertical
axes: partial dependence [%]; MRVBF 2 m: multi-resolution valley bottom flatness [Gallant and
Dowling 2003], legacy data correction: factor accommodating sampling period and type of soil
data, see Sect. 3.2.2, L: laboratory measurements, F: field estimates, lasso: grouped least
absolute shrinkage and selection operator, georob: robust external-drift kriging, geoGAM:
boosted geoadditive model, BRT: boosted regression trees, RF: random forest).

highest model averaging weight (0.24, Appendix, Table A.14). The legacy soil
map predicted for most polygons the class sandy loam to loam with 10–30 %
clay to which we assigned a typical clay content of 20 %, which is less than most
DSM predictions. As is typical for polygon maps, small areas with deviating clay
content were delineated. The DSM methods were not able to map clay content
with similar detail because calibration data was too scarce (Sect. 3.4.2). According
to the legacy soil map, there were organic soils in depressions (dark green polygons,
[g] in Fig. 3.8), but these had not been sampled.
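
Model averaging as a weighted average of the predictions of several methods can be sketched as follows. How the weights were derived is not restated in this section, so the weights below are purely hypothetical, as are the function name and the numbers.

```python
import numpy as np

def model_average(predictions, weights):
    """Weighted average of predictions from several methods.

    predictions: dict method -> sequence of predictions (same sites for all methods)
    weights:     dict method -> non-negative weight (normalised here to sum to one)
    """
    methods = sorted(predictions)
    w = np.array([weights[m] for m in methods], dtype=float)
    w = w / w.sum()
    stacked = np.vstack([np.asarray(predictions[m], dtype=float) for m in methods])
    return w @ stacked

# Hypothetical clay predictions [%] at two sites from a smooth and a rough map.
preds = {"RF": [20.0, 50.0], "georob": [10.0, 30.0]}
ma = model_average(preds, weights={"RF": 3.0, "georob": 1.0})
```

Because the method with the largest weight dominates the average, a smooth map with a large weight also smooths the averaged map.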
Maps of georob and BRT predictions showed artefacts (single pixels [georob] or
bands [BRT]) with very large predicted values. In the MA map outlying predictions
were smoothed out. Outlying georob predictions were caused by the multiflow
specific catchment area (2 m resolution), an extremely positively skewed terrain
attribute. This covariate was not chosen for the geoGAM model, in lasso its


Figure 3.8.: Predictions of clay content [%] in 0–10 cm soil depth computed by six DSM methods
(a to f) on a grid of 20 m resolution and by a legacy soil map 1:5 000 (g) for a section of Greifensee
study region. The legacy soil map (g) predicted texture classes to each of which we assigned
a typical clay content displayed here. For complex polygons the texture class of the main unit
is shown. Dots in (g) depict observations of clay content used for calibrating (a) to (f) and for
creating the soil map (g).

coefficient was strongly shrunken, and BRT and RF do not create extrapolation
errors for extreme values of covariates. The cause of the artefact in BRT was
impossible to spot because the BRT model contained 148 covariates (Appendix,
Table A.14).


Besides creating extrapolation errors, parametric methods (lasso, georob,
geoGAM) predicted physically impossible values (e.g. clay content < 0 % or > 100 %)
that we had to eliminate. In contrast, trees do not extrapolate beyond the range
of observed values of the response when computing predictions.
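
How impossible predictions were eliminated is not detailed here. One simple option, shown purely as an assumption, is to count them and truncate them to the physically valid range; the function name and values are hypothetical.

```python
import numpy as np

def restrict_to_valid_range(pred, lower=0.0, upper=100.0):
    """Clip predictions to [lower, upper] and report how many were outside."""
    pred = np.asarray(pred, dtype=float)
    n_outside = int(np.sum((pred < lower) | (pred > upper)))
    return np.clip(pred, lower, upper), n_outside

# Hypothetical clay predictions [%] from a parametric model.
clipped, n_outside = restrict_to_valid_range([-3.0, 25.0, 104.0])
```

Reporting the number of truncated values is useful because many out-of-range predictions point to extrapolation problems rather than to rounding at the boundary.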

3.4.5. Practical use of statistical methods

All tested DSM methods were able to process large sets of factors and continuous
covariates. Although RF more often performed best and MA even improved on
that, the advantage measured in validation SSmse was small (Sect. 3.4.2). Hence
other reasons than accuracy might become more decisive for choosing a particular
approach.
In our study residual spatial autocorrelation was weak or short-ranged. For a
response with strong residual autocorrelation a geostatistical approach might still
offer an advantage. The smooth spatial surface of geoGAM is possibly too coarse
to capture short-ranged autocorrelation. BRT and RF include spatial coordinates
as covariates, but if the response depends only weakly on other covariates, spatial
coordinates become overly important. Repeated recursive splitting on coordinates
likely leads to “chessboard type” artefacts.
All methods allowed interpretation of modelled relationships (Fig. 3.7), but a
large number of remaining covariates in a model hinders interpretation of partial
effects or dependencies. The most parsimonious models were chosen by geoGAM
with only 12 covariates remaining on average (lasso: 21, georob: 27).
For BRT and RF a covariate selection scheme would still need to be implemented
and tuned. Preliminary results suggest that this might be well worth the effort
(Sect. 3.4.1). But even without covariate selection, BRT and RF allowed the
importance of the covariates to be analysed (Fig. 3.6).
R packages are readily available for all methods used in this study. Lasso and
geoGAM optimize their tuning parameters directly without any further input to
the software while RF and BRT require specification of parameter ranges to be
tested. The number of parameters to tune influences computing times consider-
ably. Using default mtry for RF (Sect. 3.4.1) and coarse grids for finding optimal
BRT parameters (Sect. 3.3.4) might be a good compromise to balance comput-
ing efforts with good predictive performance. Computational effort was especially
large for georob, where there is no established efficient procedure for building mod-
els from large sets of covariates. Lasso, based on a coordinate descent algorithm,


built models most quickly (see also Fitzpatrick et al., 2016) while computational
effort for geoGAM model building was quite variable depending mainly on the
number of observations and the number of covariates selected by the boosting step
(Sect. 2.2).
Moreover, ease of modelling predictive uncertainty is another factor relevant for
choice of a DSM method. In georob uncertainties can be directly derived from
the kriging variances. For RF, conditional quantiles of predictive distributions
can be estimated directly at the cost of a larger memory requirement (R package
quantregForest, Meinshausen, 2015). For lasso, geoGAM, BRT and MA model-
based bootstrapping can be used to simulate predictive distributions (see Fig.
A.3 in Appendix for geoGAM uncertainties for topsoil ECEC), but bootstrapping
involves quite some computational effort. Given the large number of responses
and methods requiring bootstrapping we were not able to compute and evaluate
uncertainty for the presented models within reasonable time.
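
The idea of model-based bootstrapping can be sketched for a simple linear model with Gaussian errors: simulate new responses from the fitted model, refit, predict and add a simulated error. This is a generic illustration under these assumptions, not the procedure implemented for lasso, geoGAM, BRT or MA; all names and data are hypothetical.

```python
import numpy as np

def bootstrap_prediction_interval(x, y, x_new, n_boot=500, level=0.9, seed=1):
    """Model-based bootstrap interval for a new observation (one covariate, OLS)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), x])          # design matrix with intercept
    a_new = np.array([1.0, float(x_new)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma = np.sqrt(resid @ resid / (len(y) - A.shape[1]))
    sims = np.empty(n_boot)
    for b in range(n_boot):
        y_star = A @ beta + rng.normal(0.0, sigma, size=len(y))   # simulate from fit
        beta_star, *_ = np.linalg.lstsq(A, y_star, rcond=None)    # refit the model
        sims[b] = a_new @ beta_star + rng.normal(0.0, sigma)      # add a new error
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(sims, [tail, 1.0 - tail])
    return float(a_new @ beta), float(lower), float(upper)

# Hypothetical data: linear trend with unit-variance noise.
rng_demo = np.random.default_rng(0)
x_demo = np.linspace(0.0, 10.0, 40)
y_demo = 2.0 + 3.0 * x_demo + rng_demo.normal(0.0, 1.0, size=40)
point, lower, upper = bootstrap_prediction_interval(x_demo, y_demo, x_new=5.0)
```

The loop refits the model once per repetition, which explains why bootstrapping becomes expensive when model building itself involves covariate selection and tuning.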
Responses for DSM are not always continuous soil properties. Binary, multino-
mial (e.g. soil types) or ordinal (e.g. drainage classes) responses are sometimes
relevant. Grouped lasso is available for binary (R package grpreg, Breheny and
Huang, 2015) and nominal responses (R package glmnet, Friedman et al., 2010).
Logistic geostatistical models could be fitted in the generalized linear mixed model
framework (R package geoRGLM, Christensen and Ribeiro Jr, 2002; Diggle and
Ribeiro Jr, 2002; Pringle et al., 2014), but this is practical only for small datasets.
INLA (Integrated Nested Laplace Approximation, Rue et al., 2009; Lindgren
et al., 2011) could be a viable alternative. geoGAM accommodates binary and
ordinal responses (Nussbaum, 2017), and extension to nominal responses would be
straightforward. Classification models for binary and nominal responses are easily
fitted by RF (R package randomForest, Liaw and Wiener, 2002) and BRT (R package
gbm, Southworth, 2015) while ordinal response BRT could be implemented by R
package mboost (Hothorn et al., 2015) with slightly larger effort on model
specification.

3.5. Conclusions

We applied – to a total of 48 soil responses observed in three study regions in
Switzerland – six statistical digital soil mapping (DSM) methods: grouped lasso
(least absolute shrinkage and selection operator), robust external-drift kriging
(georob), boosted geoadditive models (geoGAM), boosted regression trees (BRT),
random forest (RF) and model averaging (MA). We used 300–500 environmental
covariates as input to each method. Performance was assessed by comparing model
predictions with independent validation data.
From this study we conclude:

• All methods successfully built models automatically from large sets
of covariates. The applied ad hoc procedure to find a parsimonious trend
model for georob was, however, very inefficient.
• Except for lasso, cross-validation and out-of-bag accuracy measures were
sometimes better than actually observed for the validation data. This sug-
gests that the methods partly tended to over-fit the data and underpins the
necessity of model evaluation with independent data.
• The best performing method frequently did not have a much larger mean
squared error skill score (SSmse) than its closest competitors, and the
empirical distributions of SSmse did not differ much for BRT, RF and MA
(Figures 3.2 and 3.3). Nevertheless, the frequencies of best and worst
performance clearly favoured RF if only one method is used. Applying model
averaging (MA) of several approaches even improves on RF.
• Correcting for sampling period and soil data type by adding a factor to the
models turned out to be important. Legacy soil data is inherently heterogeneous
for various reasons, but one can (and should) compensate for this variation
by careful statistical modelling.

Acknowledgements We thank the Swiss National Science Foundation SNSF for
funding this work in the frame of the National Research Program “Sustainable
Use of Soil as a Resource” (NRP 68) and the “Swiss Earth Observatory Network”
(SEON) for funding aerial surveys using APEX and Sanne Diek for preprocessing
the imagery. The contribution of Michael E. Schaepman was supported by the
University of Zurich Research Priority Program on Global Change and Biodiversity
(URPP GCB). Special thanks go to WSL and the cantonal agencies for soil
protection of Zurich and Berne for sharing their soil data to make this study
possible. Moreover, we are grateful to soil surveyors Peter Schwab, Martin Zürrer
and Alexander Lehmann for their support to compare the legacy soil map with DSM
results and to Sudan Tandy for the language improvements.

4. Pedotransfer function to predict
density of forest soils

Chapter 4 was published as short communication: Nussbaum, M., Papritz, A., Zimmermann, S.
and Walthert, L.: Pedotransfer function to predict density of forest soils in Switzerland, Journal
of Plant Nutrition and Soil Science, 179, 321–326, doi: 10.1002/jpln.201500546, 2016.

Abstract

Soil density is an important soil property, but respective measurements are usually
scarce. With data from 559 mineral soil horizons (134 sites) we developed a linear
regression pedotransfer function (PTF) for the density of forest soils (sieved to ≤
2 mm). The field estimate of density was the most important covariate. RMSE
of 0.205 Mg m−3 and R2 of 0.67, calculated on independent data (131 horizons),
were better than the statistics obtained by published, recalibrated PTF (RMSE
0.271–0.324 Mg m−3 ; R2 0.28–0.42).

4.1. Introduction

Data on soil density ρ is required inter alia for soil hydrological modelling (e.g.
Teepe et al., 2003) and for converting gravimetric to volumetric content of soil
constituents (e.g. Baritz et al., 2010). Collecting samples for soil density mea-
surements is cumbersome and expensive. Density data is therefore notoriously
scarce in soil datasets, and many pedotransfer functions (PTF) have been pro-
posed to approximate ρ by other soil properties (see De Vos et al., 2005, for a
review).
Most PTF use soil organic matter (SOM) or carbon content (SOC) and/or parti-
cle size fractions as covariates. Including additional covariates (Martin et al., 2009;


Jalabert et al., 2010) or stratification by land use (Martin et al., 2009; Jalabert
et al., 2010), soil type or geology (Hollis et al., 2012; Katterer et al., 2006; Vasilin-
iuc and Patriche, 2015) often enhances the precision of predictions of ρ. However,
such extended PTF cannot be applied in other regions where specific covariates
are missing. Also simple PTF that depend only on common covariates perform
worse when applied in other regions or for other land uses (De Vos et al., 2005;
Vasiliniuc and Patriche, 2015). De Vos et al. (2005) therefore recommended
re-calibrating PTF with data of the target area. Re-estimation of model parameters
with data of the target area mainly reduces bias, but the precision might still be
poor (Chapter 5, De Vos et al., 2005; Jalabert et al., 2010).
Most PTF for ρ are linear regression models in which nonlinear relationships are
accounted for by transforming response and/or covariates (De Vos et al., 2005).
Ruehlmann and Körschens (2009) modelled a nonlinear relationship. More ad-
vanced approaches to build PTF for a larger set of covariates include Random
Forest (Jalabert et al., 2010), boosted regression trees (Martin et al., 2009) or
neural networks (Al-Qinna and Jaber, 2013). These methods sometimes (Mar-
tin et al., 2009; Al-Qinna and Jaber, 2013) but not always (Tranter et al., 2007)
outperform linear regression. Linear regression often yields parsimonious, pedolog-
ically interpretable models that are straightforward to apply and are hence more
readily accepted by pedologists. This study aims to
1. develop a parsimonious, easily applicable PTF to estimate ρ for Swiss forest
soils, and to
2. evaluate its predictive performance in comparison with published PTF.
The study is confined to mineral forest soils as for the time being harmonized soil
density measurements are only available for this land use in Switzerland. Develop-
ing a separate PTF for forest soils is further justified because forest soils are often
less dense than arable soils (Teepe et al., 2003) and arable topsoils are strongly
influenced by agricultural management (Hollis et al., 2012).


4.2. Materials and methods

4.2.1. Data

Soil database

We used data of 168 forest soil profiles, studied by the Swiss Federal Institute for
Forest, Snow and Landscape Research (WSL) in various surveys over the past 30
years (mostly 1990–2000). These soil profiles range from 300 up to 2’500 m above
sea level and reflect the diversity of forest soils in Switzerland (Walthert et al.,
2004; Blaser et al., 2005; Zimmermann et al., 2006). Soil sampling and laboratory
analyses were done by genetic soil horizons. Data of 134 soil profiles (559 horizons)
were used to develop the PTF (calibration set). The accuracy of the PTF was
then evaluated by comparing predictions of soil density with measurements for
the remaining 34 profiles (131 horizons, validation set). To split the data into
calibration and validation sets, we stratified the data by the five main ecoregions
of Switzerland (Gonseth et al., 2001) and chose the validation sites within these
regions randomly.
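
A stratified random split of sites into calibration and validation sets can be sketched as below. The validation fraction of 0.2 is an assumption (34 of 168 profiles is roughly one fifth), and the site identifiers, stratum labels and function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(site_region, validation_fraction=0.2, seed=42):
    """Randomly choose validation sites within each stratum (e.g. ecoregion).

    site_region: dict site id -> stratum label
    Returns (calibration sites, validation sites) as two disjoint sets.
    """
    rng = random.Random(seed)
    by_region = defaultdict(list)
    for site, region in sorted(site_region.items()):
        by_region[region].append(site)
    validation = set()
    for sites_in_region in by_region.values():
        n_val = max(1, round(validation_fraction * len(sites_in_region)))
        validation.update(rng.sample(sites_in_region, n_val))
    calibration = set(site_region) - validation
    return calibration, validation

# Hypothetical example: 20 sites in two strata of 10 sites each.
sites = {f"site_{i:02d}": ("region_1" if i < 10 else "region_2") for i in range(20)}
calibration, validation = stratified_split(sites)
```

Splitting by site rather than by horizon keeps all horizons of a profile in the same set, so the validation horizons remain independent of the calibration data.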

Soil density

Three soil samples of fixed volume (1000 cm³) were collected from each soil horizon
with steel cylinders. In very stony soils steel cylinders could not be used. Instead,
we collected a single soil sample with a spade and measured the volume of the sample
(around 3000 cm³) by filling the cavity with quartz sand. The weight of the samples
was recorded after drying for 48 hours at 105 °C. The volume of coarse fragments
(diameter > 2 mm) was quantified by the water displacement method. This study
refers to soil density as the density of the fine soil fraction (diameter ≤ 2 mm),
calculated by assuming a density of 2.65 Mg m−3 for the coarse fragments
(Walthert et al., 2004, p. 702). Soil density values were averaged per horizon for
the statistical analyses.
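
The computation of fine-earth density from a dried sample can be sketched as follows, assuming that mass and volume of the coarse fragments are simply subtracted from those of the whole sample. The function name and the example values are hypothetical; note that g cm⁻³ is numerically equal to Mg m⁻³.

```python
def fine_earth_density(dry_mass_g, sample_volume_cm3, coarse_volume_cm3,
                       coarse_density=2.65):
    """Density of the fine soil fraction (<= 2 mm) in g cm-3 (= Mg m-3).

    Assumes the coarse fragments have the given density (2.65 as stated above).
    """
    fine_volume = sample_volume_cm3 - coarse_volume_cm3
    if fine_volume <= 0:
        raise ValueError("coarse fragments fill the whole sample volume")
    fine_mass = dry_mass_g - coarse_density * coarse_volume_cm3
    return fine_mass / fine_volume

# Hypothetical cylinder sample: 1000 cm3, 1200 g dry mass, 100 cm3 coarse fragments.
rho_fine = fine_earth_density(1200.0, 1000.0, 100.0)
```

For this example the fine earth weighs 935 g in 900 cm³, i.e. about 1.04 Mg m⁻³, whereas the density of the whole sample would be 1.2 Mg m⁻³.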

Covariates for statistical modelling

Besides soil density, the following data was available per soil horizon (for details
of field and laboratory methods, see Walthert et al., 2004, 2010, pp. 693): sampling
depth, field estimate of soil density (penetration resistance of blade, ordinal
covariate with 5 levels), field estimate of volumetric content of coarse fragments
(particles with size > 2 mm), pH, sand, silt and clay fractions, and soil organic
carbon (SOC) content.
In addition to soil horizon data, general site information was also available:
hydro-morphologic classification of the soil (ordinal covariate with 9 levels), slope
measured at the profile and — as detailed geological information was missing — an
overview soil map (1:200 000, BFS, 2001), modified for forest soils as in Chapter
5 (nominal covariate with 30 levels). This map defines units with similar geomor-
phological and climatic conditions for soil formation. Other possibly important
covariates (e.g. Munsell soil colour, Oyama and Takehara, 1993) were not used
because they were not available for all soil horizons with soil density measurements.

4.2.2. Statistical analysis

We developed the PTF with the calibration data by the following steps:

1. Positively skewed covariates were transformed by square root or natural
logarithm. Using the regression model that included all covariates (“full” model),
we checked whether a Box-Cox transformation (Box and Cox, 1964) of the
response was needed. This was not the case, and we therefore used the
untransformed density data. Sparsely populated levels of categorical covariates
were merged a priori based on expert knowledge.
2. The “full” model was fitted robustly (M-estimate, e.g. Faraway, 2005, p. 99)
and by ordinary least squares (LS, e.g. Faraway, 2005, p. 11) to the data.
As we found no clear differences between the two fits, we used LS procedures
to select the covariates.
3. Starting from the “full” model we eliminated irrelevant covariates by stepwise
backward selection, thereby accepting a slight increase of the root mean
squared error (RMSE) in 10-fold cross-validation (e.g. Faraway, 2005, p. 139)
in favour of a more parsimonious model (e.g. increase of cross-validation
RMSE from 0.195 to 0.197 by removing a categorical covariate describing
the limitation of rooting depth). We preferred cross-validation over criteria
such as AIC (e.g. Hastie et al., 2009, p. 239) for model selection because
cross-validation gives a direct measure of the precision of predictions for new
data.
4. We used the grouped LASSO (least absolute shrinkage and selection
operator, Hastie et al., 2009, Sect. 3.8), an algorithm that likely finds relevant
covariates for linear models, to check by cross-validation whether adding
first order interactions between covariates improved the fit, which was not
the case.
5. The levels of categorical covariates were merged based on partial residual
plots (e.g. Faraway, 2005, p. 72) and cross-validation to obtain a final par-
simonious linear model.
6. The parameters of the final model were estimated again robustly (M-estimate),
and an optimal value of the robustness tuning constant was chosen by cross-
validation.
7. The relative importance (Groemping, 2006) of each covariate was evaluated
by decomposing the “goodness-of-fit” R². The data were weighted by the
“robustness weights” of the M-estimate (step 6) when computing the
importance measure (Lindeman et al., 1980, p. 119).
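
Step 3 of the list above, backward selection guided by 10-fold cross-validation, can be sketched as follows. This is a simplified illustration for continuous covariates and ordinary least squares; the tolerance of 0.002 mirrors the example increase from 0.195 to 0.197 but is an assumption, as are the function names and the simulated data.

```python
import numpy as np

def cv_rmse(X, y, k=10, seed=0):
    """k-fold cross-validated RMSE of an OLS fit with intercept."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    A = np.column_stack([np.ones(n), X])
    sq_err = np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        sq_err[test] = (y[test] - A[test] @ beta) ** 2
    return float(np.sqrt(sq_err.mean()))

def backward_selection(X, y, names, tolerance=0.002):
    """Drop covariates while CV RMSE stays within `tolerance` of the lowest value seen."""
    keep = list(range(X.shape[1]))
    reference = cv_rmse(X[:, keep], y)
    while len(keep) > 1:
        trials = [(cv_rmse(X[:, [j for j in keep if j != i]], y), i) for i in keep]
        best_rmse, drop = min(trials)
        if best_rmse > reference + tolerance:
            break                                   # every removal hurts too much
        keep.remove(drop)
        reference = min(reference, best_rmse)
    return [names[j] for j in keep], cv_rmse(X[:, keep], y)

# Hypothetical data: the response depends on x0 and x1 but not on x2.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 1] + rng.normal(0.0, 0.1, size=200)
selected, rmse = backward_selection(X_demo, y_demo, ["x0", "x1", "x2"])
```

The same folds are reused for every candidate model, so differences in cross-validation RMSE reflect the covariates and not the random partition.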

We compared the precision of predictions of soil density by our and published
PTF (De Vos et al., 2005; Ruehlmann and Körschens, 2009) with measurements
of the validation set. Coefficients of published PTF were re-estimated with the
calibration data for this comparison. Moreover, we compared our PTF with the
PTF in Chapter 5 that had been developed with a subset (441 horizons, 84 profiles)
of the data used in this study. Mean error (ME) and RMSE were used to rate the
predictions (McBratney et al., 2011). All statistical computations were done in R
(R Core Team, 2015).

4.3. Results

Table 4.1 lists, separately for the calibration and validation sets, descriptive statis-
tics of soil density and of horizon-specific covariates included in the PTF. Soil
samples had been collected predominantly from the topsoil (41 % sampled above
0.2 m). The mean density is therefore quite small (≈ 1 Mg m−3 ) for both datasets.
The standard deviation is similar, while the range of values of the calibration set
is wider than in the validation set, going as low as 0.15 Mg m−3 and as high as
1.84 Mg m−3.
The model building procedure selected six covariates of which two were
categorical (Table 4.2). Our PTF hence predicts soil density ρ of the fine soil fraction


Table 4.1.: Descriptive statistics of bulk density and of covariates used for the PTF, reported
separately for calibration (cal.) and validation (val.) sets.

Country/Area                  Switzerland, forest soils
Soil types                    mineral soils, various types
Depth intervals (applicable)  0-150 m

          response data     covariates
          density           SOC                 sample depth     slope             coarse fragments
          [Mg m−3]          [g kg−1]            [m]              [%]               [vol. %]
          cal.     val.     cal.      val.      cal.    val.     cal.     val.     cal.     val.
N         559      131      559       131       559     131      559      131      559      131
Min       0.15     0.41     0.000     0.000     0.005   0.001    0.000    0.000    1.000    1.000
Max       1.84     1.74     341.560   189.600   2.600   3.100    90.000   92.000   87.500   87.500
Median    0.98     1.00     12.495    12.780    0.300   0.300    30.000   27.000   5.000    5.000
Mean      0.96     1.00     32.005    21.759    0.445   0.456    32.771   36.191   17.241   14.588
St.Dev.   0.35     0.36     46.905    30.599    0.453   0.509    25.902   31.474   23.024   20.545

with particle size ≤ 2 mm by


\[
\rho = 0.948 - 0.002\,c + 0.257\,\sqrt{d} - 0.025\,\sqrt{s} - 0.002\,r + \alpha_k I_k(f) + \beta_i I_i(o) \tag{4.1}
\]

with c denoting SOC content (g kg−1) of a soil horizon, d its mean depth (m), r
its percentage of coarse fragments > 2 mm (vol. %), s the slope at the soil profile
(%), αk the coefficient of category k of the field density estimate f and βi the
coefficient of unit i of the overview soil map o, with coefficients according to Table 4.2.
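
Applying Eq. 4.1 with the coefficients of Table 4.2 can be sketched as follows. The function and dictionary names are hypothetical, the field density codes are represented as strings for convenience, and the example horizon is invented for illustration.

```python
import math

# Coefficients of the categorical covariates, transcribed from Table 4.2.
FIELD_DENSITY_COEF = {"0.6": 0.0, "0.7": 0.0, "1.00": 0.151, "1.35": 0.362, "1.55": 0.362}
SOIL_MAP_GROUPS = [
    (("A", "B", "C", "D", "E", "G"), 0.0),
    (("F", "H", "DEC", "MOR"), 0.055),
    (("J", "Q", "R", "Z"), -0.114),
    (("K", "L", "MOW"), -0.047),
    (("M", "N", "O", "P", "X"), 0.094),
    (("S", "T"), -0.091),
    (("U", "Fly"), -0.033),
    (("V", "Uv", "Y"), -0.131),
    (("W",), -0.199),
]
SOIL_MAP_COEF = {unit: coef for units, coef in SOIL_MAP_GROUPS for unit in units}

def predict_density(soc, depth, slope, coarse, field_code, map_unit):
    """Fine-earth density [Mg m-3] according to Eq. 4.1.

    soc: SOC content [g kg-1], depth: mean horizon depth [m], slope: [%],
    coarse: coarse fragments [vol. %], field_code: field density code,
    map_unit: unit of the overview soil map.
    """
    return (0.948 - 0.002 * soc + 0.257 * math.sqrt(depth)
            - 0.025 * math.sqrt(slope) - 0.002 * coarse
            + FIELD_DENSITY_COEF[field_code] + SOIL_MAP_COEF[map_unit])

# Hypothetical horizon: 20 g kg-1 SOC, 0.25 m depth, 16 % slope, 10 vol. % coarse fragments.
rho = predict_density(soc=20.0, depth=0.25, slope=16.0, coarse=10.0,
                      field_code="1.00", map_unit="A")
```

For this hypothetical horizon the individual terms are 0.948 − 0.040 + 0.1285 − 0.100 − 0.020 + 0.151 + 0, giving a predicted density of about 1.07 Mg m−3.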
The field density estimate f was the most important covariate followed by the
overview soil map o, SOC content c and the mean sampling depth d (Fig. 4.1).
The optimal tuning constant of the robust fit was equal to 3. Thus, a moder-
ately robustified fit of the model was slightly better than a customary LS (cross-
validation RMSE 0.197 Mg m−3 for robust and 0.198 Mg m−3 for LS). Cross-
validation had low bias (ME of 0.04 Mg m−3 ) and R2 was 0.684.
Figure 4.2 shows observed soil density plotted against respective predictions for
the independent validation set. The solid line of the loess scatter plot smoother
(Cleveland, 1979) is close to the 1:1 line over the full range of variation. Hence
our PTF does not suffer from a conditional bias. The marginal bias (ME) is also
small (Table 4.4). For the validation set, our PTF explained 67 % of the variation
of the measurements and had an RMSE of 0.205 Mg m−3. Cross-validation ME,
RMSE and R2 for the calibration data were close to these values. This suggests


Figure 4.1.: Barplot of relative importance of covariates for PTF of Eq. 4.1
(vertical axis: % of response variance; bars: field density, soil map, SOC,
sqrt(depth), sqrt(slope), coarse fragm.; line segments: 95 % confidence
intervals of model-based bootstrap with 1000 repetitions, Groemping 2006,
p. 14; SOC: soil organic carbon; sqrt: square root).

Figure 4.2.: Scatter plot of observed against predicted density [Mg m−3] of the
fine earth in the mineral soil, computed with PTF of Eq. 4.1 for the validation
set (n = 131; solid line: loess scatter plot smoother, n: number of soil samples).

79
4. Pedotransfer function to predict density of forest soils

Table 4.2.: Coefficients β of PTF with standard errors se (SOC: soil organic
carbon, for codes of overview soil map, see Table 4.3 and Appendix, Table
A.16).
covariate β se

Intercept 0.948 0.041


SOC −0.002 0.0002
sqrt(sampling depth) 0.257 0.038
sqrt(slope) −0.025 0.005
coarse fragments (>2 mm) −0.002 0.0005

field density estimate


codes 0.6, 0.7 Mg m−3 (reference) 0.0
code 1.00 Mg m−3 0.151 0.024
codes 1.35, 1.55 Mg m−3 0.362 0.032

overview soil map


A, B, C, D, E, G (reference) 0.0
F, H, DEC, MOR 0.055 0.034
J, Q, R, Z −0.114 0.048
K, L, MOW −0.047 0.036
M, N, O, P, X 0.094 0.057
S, T −0.091 0.047
U, Fly −0.033 0.034
V, Uv, Y −0.131 0.034
W −0.199 0.041

that the model did not over-fit the data.


The ME of predictions of soil density computed by published, re-calibrated PTF for
the validation set was also small (Table 4.4). However, RMSE (0.271–0.324 Mg
m−3) of these PTF were about one third larger than RMSE of our PTF, and only
36 % to 42 % of the variation in the observed data was explained by these models.

4.4. Discussion

Model building resulted in a parsimonious linear regression model whose coefficients
have straightforward interpretation. The field estimate of soil density – a
quantity easy to record – was the most important covariate (relative importance
27 %). On its own it accounted for about 40 % of the variation (see PTF in Chapter 5,
Table 4.4). The coding of this covariate can be further simplified because the
median soil density did not differ for the two lowest levels (0.6 and 0.7 Mg m−3).

Table 4.3.: Description of the aggregated physiographic units of the overview soil map (based
on Frei et al., 1980).

aggregated units | description of the physiographic units | prevalent geology
A, B, C, D, E, G | Table Jura, Plateau Jura, Jura chains including valleys | limestones, marl, valleys partly filled with calcareous tertiary sediments, moraines, loess or recent alluvium
F, H, DEC, MOR | plains on lower Central Plateau, Molasse hills at low altitude partly covered by moraines, old gravel terraces, moraines formed before last glaciation | calcareous sediments of (fluvio)glacial origin or predominantly fine textured tertiary sediments (Molasse) partly covered/mixed with calcareous moraines of different age
J, Q, R, Z | fluvial valleys on Central Plateau, northern and southern Alpine valleys | recent alluvial sediments or post glacial deposits
K, L, MOW | Molasse hills at intermediate altitude partly shaped by glaciers, Drumlin landscapes with marked relief, moraine of last glaciation | predominantly calcareous sandy or silty Molasse, calcareous moraines of different age
M, N, O, P, X | Molasse hills with marked relief shaped by erosion, northern and southern Alpine molassic foothills partly covered by moraines | predominantly calcareous conglomerates highly variable in texture and in stone content
S, T | Alpine Flysch and Buendner schists | mostly calcareous Flysch sediments highly variable in texture and in stone content, calcareous Buendner schists
U, Fly | Alpine limestone mountains, Flysch tending to form wetlands | mostly hard limestone partly layered with marl, soft waterlogged Flysch formations
V, Uv, Y | Alpine mountains of crystalline basement, Permian conglomerates, fluvial valleys in Ticino | crystalline rocks like granite, orthogneiss or gabbro; hard and carbonate free Permian conglomerates (Verrucano); alluvium with carbonate free sediments
W | Alpine mountains of crystalline basement consisting of relatively easily weatherable rock | crystalline rocks like paragneiss or greenschist

Table 4.4.: Mean error (ME), root mean squared error (RMSE) and coefficient of determination
(R2) of predictions of soil density for the data of the validation set. Predictions were computed
by Eq. 4.1, PTF A to E of De Vos et al. (2005) and PTF proposed by Ruehlmann and Körschens
(2009) and Chapter 5 (ρ: soil density of the soil fraction with particle size ≤ 2 mm, ai: coefficients
of published PTF re-estimated by least squares with the calibration dataset n = 559, SOC: soil
organic carbon, ln: natural logarithm).

Independent validation (n = 131):

PTF | Structure | ME [Mg m−3] | RMSE [Mg m−3] | R2 [-]
Eq. 4.1 | | 0.031 | 0.205 | 0.670
A | ρ = ai + bi ln(SOC) | 0.014 | 0.295 | 0.313
B | ρ = ai + bi SOC | −0.008 | 0.293 | 0.325
C | ρ = 1/(ai + bi SOC) | 0.099 | 0.321 | 0.283
D | ρ = ai + aj ln(SOC) + ak ln(SOC)^2 | 0.001 | 0.291 | 0.329
E | ρ = ai + aj SOC + ak clay + al clay^2 + am silt | −0.004 | 0.308 | 0.265
Ruehlmann and Körschens (2009) | ρ = ai e^SOC | 0.000 | 0.324 | 0.376
Chapter 5 | median measured density assigned to levels of field estimate of density | 0.012 | 0.271 | 0.419

As found in other studies (see De Vos et al., 2005), density decreases with increasing
SOC content. The increase of soil density with increasing soil depth aligns with
pedological knowledge. Furthermore, the presence of coarse fragments counteracts
natural compaction of the fine soil material due to their supportive matrix. In steep
terrain the tendency towards less dense soils might partly be caused by the
overestimation of vertically measured sampling depth. Soil density also differed between
the (partly merged) categories of the overview soil map, although the map represents
the very diverse pedogenetic conditions of Swiss forest soils only coarsely
(Table 4.3). Compared to the Jura region (reference category), where soils develop
mostly on limestone, old glacial deposits and tertiary sediments carry denser soils,
whereas glacial deposits from the last glaciation, Alpine limestone and crystalline
bedrock carry less dense soils.
As shown by Martin et al. (2009), Ruehlmann and Körschens (2009), Jalabert
et al. (2010), Hollis et al. (2012) and Vasiliniuc and Patriche (2015), including
additional covariates that are specific to a particular dataset results in more precise
predictions. The trade-off is reduced applicability of the PTF compared to simple
regression functions based on SOC and/or texture only. Considering the improvement
of predictive power (doubled R2 and reduction of RMSE by 1/3), using such
covariates appears justified.
Unlike the PTF by Jalabert et al. (2010), our PTF did not suffer from conditional
bias. The (externally validated) precision of our PTF compares well with the
precision of Jalabert et al.'s PTF for forest soils. For the best performing model,
they found (by external validation) R2 of 0.668 and RMSE of 0.168 Mg m−3.
Their smaller RMSE can be explained by the larger variation of soil density in our
dataset (interquartile ranges of 0.57 and 0.4 in our and in Jalabert et al.'s
dataset, respectively). De Vos et al. (2005) reported "goodness-of-fit" RMSE of 0.16
Mg m−3 and R2 of 0.59 for the best performing re-calibrated PTF for forest soils,
but less favourable results might be expected for an external validation.

Acknowledgements Research reported in this article was supported by the Swiss
National Science Foundation under grant number 406840 143096.

5. Estimating soil organic carbon
stocks of Swiss forest soils by
robust external-drift kriging

Chapter 5 was published as a research article: Nussbaum, M., Papritz, A., Baltensweiler, A. and
Walthert, L.: Estimating soil organic carbon stocks of Swiss forest soils by robust
external-drift kriging. Geoscientific Model Development, 7, 1197-1210, doi: 10.5194/gmd-7-1197-2014,
http://www.geosci-model-dev.net/7/1197/2014/gmd-7-1197-2014.html, 2014.

Abstract

Accurate estimates of soil organic carbon (SOC) stocks are required to quantify
carbon sources and sinks caused by land use change at national scale. This study
presents a novel robust kriging method to precisely estimate regional and national
mean SOC stocks, along with truthful standard errors. We used this new approach
to estimate mean forest SOC stock for Switzerland and for its five main ecoregions.
Using data of 1033 forest soil profiles, we modelled stocks of two compartments
(0–30, 0–100 cm depth) of mineral soils. Lognormal regression models that ac-
counted for correlation between SOC stocks and environmental covariates and
residual (spatial) auto-correlation were fitted by a newly developed robust re-
stricted maximum likelihood method, which is insensitive to outliers in the data.
Precipitation, near-infrared reflectance, topographic and aggregated information
of a soil and a geotechnical map were retained in the models. Both models showed
weak but significant residual autocorrelation. The predictive power of the fitted
models, evaluated by comparing predictions with independent data of 175 soil
profiles, was moderate (robust R2 = 0.34 for SOC stock in 0–30 cm and R2 =
0.40 in 0–100 cm). Prediction standard errors (SE), validated by comparing point
prediction intervals with data, proved to be conservative.


Using the fitted models we mapped forest SOC stock by robust external-drift
point kriging at high resolution across Switzerland. Predicted mean stocks in
0–30 cm and 0–100 cm depth were equal to 7.99 kg m−2 (SE 0.15 kg m−2 ) and
12.58 kg m−2 (SE 0.24 kg m−2 ), respectively. Hence, topsoils store about 64 % of
SOC stocks down to 100 cm depth. Previous studies underestimated SOC stocks
of topsoil slightly and those of subsoils strongly. The comparison further revealed
that our estimates have substantially smaller SE than previous estimates.

5.1. Introduction

Greenhouse gas (GHG) reporting for the sector “LULUCF – Land Use, Land-Use
Change and Forestry” of the United Nations Framework Convention on Climate
Change and the Kyoto Protocol requires national estimates of soil organic carbon
(SOC) stock. SOC stock estimates are needed as baseline and for quantifying car-
bon (C) sources and sinks caused by land use change. Switzerland, as an example,
uses a Tier-2 approach (IPCC, 2003) for SOC stock changes due to conversion
between settlements, wetlands, forest-, crop-, grassland and other land cover types
(FOEN, 2012a). Respective estimates are reported for the five ecoregions Jura,
Central Plateau, Pre-Alps, Alps and Southern Alps and for the whole country
(Fig. 5.1, Brassel and Lischke, 2001).

[Figure 5.1: map of Switzerland, scale 1:4'500'000; legend: ecoregion (Jura, Central Plateau, Pre-Alps, Alps, Southern Alps) and altitude (< 600 m, 600 - 1200 m, > 1200 m). Data sources: LFI ecoregions © 2001 WSL; Forest: SilvaProtect-CH © 2008 BAFU; Swiss Boundary, Lakes © 2012 BFS GEOSTAT; Boundaries Europe: NUTS © 2010 EuroGeographics.]

Figure 5.1.: Ecoregions of Switzerland, stratified by altitudinal class.


Mean SOC stocks were estimated by various approaches in previous studies: the
simplest is to use the arithmetic mean of the available SOC stock data as a national
estimate (Weiss et al., 2000). “Class-matching” (CM) estimates mean stocks for
bioclimatic (Chiti et al., 2012), land use or soil map strata (Xu et al., 2011; Krogh
et al., 2003) or intersections thereof (Arrouays et al., 2001; Lettens et al., 2004,
2005a; Meersmans et al., 2009) and combines them for a national estimate by
formulae for stratified random sampling (STR). Hence, CM capitalizes on benefits
of spatial stratification, but Perruchoud et al. (2000) demonstrated that respective
gains may be small.
Perruchoud et al. (2000) and other authors (e.g. Leifeld et al., 2005; Meersmans
et al., 2008, 2011, 2012a) used linear models (LM) to relate SOC to covariates
characterizing soil formation by climate, vegetation, topography, geology and land
management. More recently, Grimm et al. (2008), Martin et al. (2011) and Wiesmeier
et al. (2011) used nonlinear machine learning (ML) methods to the same
end. Such statistical modelling of SOC needs covariates that are available contiguously
in space because mean stocks are estimated by averaging point predictions
done by LM or ML for the nodes of a fine-meshed grid over a region of interest,
which is equivalent to a discrete approximation of the geostatistical block kriging
approach (e.g. Gotway and Young, 2002). The restriction to contiguous spatial
covariates as the only predictors generally limits the predictive power of fitted models
seriously. Accurate spatial information on important, soil-related, SOC controlling
covariates (pH, clay content, reactive aluminium and iron, mineral surface
charge density, soil temperature and moisture; Schmidt et al., 2011) is commonly
unavailable.
If linear and ML models fit SOC data only poorly, then spatially structured
variation in the data likely becomes apparent as residual spatial autocorrelation.
Besides ordinary kriging (Mishra et al., 2009), regression kriging (RK, Hengl et al.,
2004), a variant of external-drift kriging (EDK), was used by Mishra et al. (2010,
2012) and Kumar et al. (2012) to map SOC stock at regional scale. Mishra et al.
(2010, 2012) demonstrated that RK indeed improves on LM and CM by exploiting
autocorrelation when computing SOC predictions. Considering autocorrelation
is also essential for unbiased significance testing of hypotheses on relations be-
tween SOC and environmental covariates. Many studies that built statistical SOC
models based on significance testing (e.g. Leifeld et al., 2005; Meersmans et al.,
2008, 2012a; Wiesmeier et al., 2013) neglected autocorrelation. The studies by
Perruchoud et al. (2000) and Wiesmeier et al. (2012) are here notable exceptions.


Besides precise SOC estimates, standard errors of national SOC stocks are
needed for GHG inventories, e.g., to test the statistical significance of estimated
stock changes (Lettens et al., 2005a,b; Meersmans et al., 2009, 2011). Quantifi-
cation of uncertainties of spatial mean stock estimates (and changes) is straight-
forward for CM, where STR formulae can be employed. However, care is needed
when mean stock estimates are obtained by averaging LM, EDK or ML point
predictions: the point prediction errors for the nodes of the prediction grid are
then mutually correlated. This is true even if there is no residual autocorrelation
because predictions are computed from the same set of fitted parameters. Thus,
ignoring the correlation of fitted regression coefficients of LM as in Meersmans
et al. (2008, 2011, 2012b, personal communication, 2013) likely biases the stan-
dard errors (SE) of estimated mean stocks. If there is residual autocorrelation
then the correlation of prediction errors at adjacent nodes of the prediction grid
is stronger. Neglecting residual autocorrelation biases SE of estimated mean SOC
stocks even more.
The truthfulness of reported SE is best checked with independent validation
data, along with the actual precision of mean stock estimates. We are currently
not aware of any study that validated modelled SE of stock estimates. As pointed
out by Minasny et al. (2013) only few studies (Mishra et al., 2009, 2010, 2012;
Wiesmeier et al., 2011) tested the precision of the estimates with independent
data. Grimm et al. (2008), Martin et al. (2011) and Meersmans et al. (2012b)
used cross-validation for the same purpose, which is clearly better than merely
reporting notoriously over-optimistic goodness-of-fit R2 values as done in most
studies.
The choice of transformations for SOC data is a further issue that requires
some care. Statistical inference for CM, LM and EDK relies on the assumption of
normally distributed errors with constant variance. Frequently, this assumption
is violated by SOC data as empirical distributions are often positively skewed
(Minasny et al., 2013) and their dispersion increases with the mean (e.g. Mishra
et al., 2009, 2010; Chiti et al., 2012; Kumar et al., 2012; Wiesmeier et al., 2012,
2013). SOC data should then be log-transformed for statistical analyses as in
Mishra et al. (2010); Kumar et al. (2012) and Wiesmeier et al. (2012). Neglecting
data transformations (Meersmans et al., 2009; Chiti et al., 2012) will likely affect
stock estimates only mildly but will invalidate reported SE. Another error is to
fit LM to untransformed SOC stocks, but at the same time to assume that the
prediction errors have constant relative dispersion (Meersmans et al., 2011).


Last but not least, outliers are a common nuisance in SOC datasets (Meersmans
et al., 2008; Mishra et al., 2009; Martin et al., 2011; Chiti et al., 2012; Wiesmeier
et al., 2012, 2013). In most instances, they are genuine observations that do not
follow the “majority pattern” of a dataset. A common but suboptimal recipe
is to exclude such observations (Chiti et al., 2012) from the analyses. Outlier
deletion biases statistical inference if not properly taken into account (Maronna
et al., 2006, chap. 1). A better approach is therefore to use robust methods that
are insensitive to outliers. Excepting Martin et al. (2011) and Wiesmeier et al.
(2011), who used nonparametric tree-based methods, no robust procedures were
used so far to estimate mean SOC stocks.
This review shows that there is scope to improve on previously used statistical
methodology for estimating SOC stocks at the regional and national scales. When
estimating SOC stocks stored in the mineral soil of Swiss forests, our objectives
were therefore,
i. to employ a statistically sound, robust lognormal EDK approach that ac-
counts for dependence of SOC stock on environmental covariates and auto-
correlation;
ii. to fit such models for mapping SOC stocks of two compartments (0–30 cm, 0–
100 cm depth) of the mineral soil with 100 m spatial resolution across Switzer-
land;
iii. to rigorously validate both precision of predictions and truthfulness of mod-
elled SE with independent validation data, and
iv. to compute reliable estimates and associated SE of mean stocks for whole
Switzerland and its ecoregions – stratified further by altitude into the groups
≤ 600 m, 600–1200 m, > 1200 m above sea level – by robust lognormal block
EDK.
The present study is confined to mineral soils under forest. Comprehensive,
harmonized and georeferenced SOC data is for the time being available only for this
land use. Data on SOC stock stored in organic layers of Swiss forest soils is at
present too scarce to allow for a similar analysis, and comprehensive legacy data
on SOC stocks of Swiss crop- and grasslands will become available in the future
only, as this data is currently being digitized and geo-referenced (Nationale
Bodenbeobachtung Schweiz, 2014). Nonetheless, mineral forest soil stocks are important
for GHG reporting because forests cover 45.5 % of the vegetated area of Switzerland
(Hotz et al., 2005). Furthermore, the currently available stock estimates for


topsoils suggest that forest soils store 1.5 times more organic carbon (OC) than
cropland and still 1.2 times more SOC than grassland soils (FOEN, 2012a). Lastly,
Martin et al. (2011) showed for France that forest SOC stocks are more variable
than stocks on cultivated land. These figures underpin the importance of forest
SOC stock, which, in our view, justifies a separate analysis of the respective data.

[Figure 5.2: map of Swiss forest and soil profile locations, scale 1:4'500'000; legend: Swiss forest; soil profiles: calibration set (858 sites), validation set (175 sites). Data sources: Soil profile points © 2011 WSL; Forest: SilvaProtect-CH © 2008 BAFU; Boundary, Lakes © 2012 BFS GEOSTAT.]

Figure 5.2.: Locations of the 1033 soil profiles (subdivided into calibration and
validation sets) and Swiss forest area.

5.2. Materials and methods

5.2.1. Study region

As our study focused on forest soils, we had to delimit the forest area of Switzerland
(Fig. 5.2). We used the same criteria as Giamboni (2008): six categories rendered
by VECTOR25 (Swisstopo, 2011) as forest plus former forest areas, devastated
in 1990 and 1999 by two hurricanes (Bundesamt für Umwelt BAFU, 2010) and
currently not classified as forest by VECTOR25. Areas shared with the National
Mire Inventory (FOEN, 2012b) were excluded. This removed some but not all
organic soils because the inventory does not cover all bogs and fens under forest,
in particular, if these had been drained in the past.
According to this definition, forests cover 11 800 km2 (29 % of total area of
Switzerland). Forests extend over altitudes from 190 m to 2390 m above sea level
(Swisstopo, 2011). Climatic conditions therefore vary notably within this area:
mean annual precipitation ranges from 600 mm to 2900 mm and mean annual
temperature from −1 to 13 °C (MeteoSwiss, 2011). Two thirds of the forested area
is dominated by coniferous trees; deciduous forests prevail only at lower altitudes
in the regions Jura, Central Plateau, Pre-Alps and Southern Alps (FSO, 2000b).
Considerable variation is also found in geologic parent material for soil formation:
predominantly limestones in the Jura and in parts of the Pre-Alps, fluvioglacial
sediments of several quaternary glaciations and of the Tertiary on the Central
Plateau and igneous and metamorphic rocks in the Alps and Southern Alps. This
large variation of pedogenetic factors is reflected in the development of very di-
verse soils (Walthert et al., 2004) with variable conditions for mineralization and
accumulation of SOC.

5.2.2. Data

Soil data

Soil profiles

We used data of 1033 forest soil profiles (Fig. 5.2), studied by the Swiss Federal
Institute for Forest, Snow and Landscape Research (WSL) in various surveys over
the past 30 years (mostly 1990–2000). Use of legacy soil data is typical for many
SOC inventories (e.g. Krogh et al., 2003; Lettens et al., 2004, 2005b; Kumar et al.,
2012; Minasny et al., 2013). Two WSL surveys chose 269 sites on square grids with
1 km and 8 km spacing, respectively. The remaining sites were selected purposively
by field surveyors to best represent soils typical for given vegetation types. The
position of the soil profiles was recorded in the field on topographic maps (scale
1 : 25 000), hence the error in the coordinates is about ±25 m.
We assigned 175 out of the 1033 soil profiles to the validation set, which was used
to check the predictive power of the fitted statistical models; the remaining
858 soil profiles formed the calibration set used to build and fit these models.
All except three sites of the validation set with organic soils lay on the 1 km (38
sites) and 8 km grid (134 sites), respectively. This selection resulted in a fairly even
and spatially representative distribution of the validation sites across Switzerland
(Fig. 5.2). When splitting the data, we strived for a balanced representation of
soil map units and vegetation types between calibration and validation set.
The thickness of all soil horizons was recorded in the field on the faces of soil pits
and subsequent soil sampling and laboratory analyses were all done by pedogenetic
horizons.

Stone content

The volumetric content of stones (particles with size > 2 mm) was estimated visu-
ally on the face of soil profiles, which is a common procedure (Baritz et al., 2010).
These estimates are bound to some error that is very difficult to quantify because
surveys were done by different staff. However, neglecting stone content as in many
other studies (e.g. Krogh et al., 2003; Meersmans et al., 2008; Xu et al., 2011)
would lead to overestimation of SOC stocks as stone content is large for many
Swiss forest soils.

Soil density

The density of the soil fraction with particle size ≤ 2 mm was measured for 440 out
of about 5000 mineral soil horizons with soil samples of fixed volume (Walthert
et al., 2004, p. 702) collected from the soil profiles. In addition, a field estimate
(penetration resistance of blade) of soil density was available for all soil horizons
(ordinal variable with 5 categories, Walthert et al., 2004, p. 695).
We computed the median of the measured densities for each category of this
variable and assigned these medians to all soil horizons without density measure-
ments. The accuracy of this pedotransfer function (PTF) was evaluated by tenfold
cross-validation (by re-estimating and re-assigning the median densities computed
from 9 cross-validation subsets to the 10th subset). The median of the cross-
validation errors was equal to 0.002 g cm−3 and the median absolute deviation
(MAD, see below) was equal to 0.256 g cm−3 . For comparison, we used also the
PTF by Adams (1973) and Honeysett and Ratkowsky (1989), which performed
best for forest soils in the evaluations of De Vos et al. (2005) and Baritz et al.
(2010). Without re-calibration, bias and MAD of the cross-validation errors ranged
between 0.33–0.34 and 0.50–0.52 g cm−3, respectively; when the coefficients of the PTF
were re-estimated with our own data, these measures were 0.06 and 0.30 g cm−3. Hence,
our PTF had better predictive power than the PTF by Adams, but it was worse
than the one by Jalabert et al. (2010) who recalibrated their PTF by ML methods.
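The median-based PTF and its tenfold cross-validation described above can be sketched as follows. This is a minimal illustration, not the code used for the study: the function names, the random fold assignment, the sign convention of the errors (predicted minus observed) and the unscaled MAD are assumptions.

```python
import random
import statistics
from collections import defaultdict


def fit_median_ptf(categories, densities):
    """Median measured density per category of the field density estimate."""
    groups = defaultdict(list)
    for cat, rho in zip(categories, densities):
        groups[cat].append(rho)
    return {cat: statistics.median(vals) for cat, vals in groups.items()}


def cross_validation_errors(categories, densities, k=10, seed=1):
    """k-fold cross-validation errors (assumed: predicted minus observed)."""
    order = list(range(len(densities)))
    random.Random(seed).shuffle(order)
    folds = [order[j::k] for j in range(k)]
    errors = [0.0] * len(densities)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        # Re-estimate the medians from the k - 1 remaining subsets.
        ptf = fit_median_ptf([categories[i] for i in train],
                             [densities[i] for i in train])
        # Assumed fallback for categories absent from the training part.
        fallback = statistics.median([densities[i] for i in train])
        for i in fold:
            errors[i] = ptf.get(categories[i], fallback) - densities[i]
    return errors


def mad(values):
    """Unscaled median absolute deviation from the median."""
    centre = statistics.median(values)
    return statistics.median([abs(v - centre) for v in values])
```

With measured densities and field categories for the sampled horizons, `statistics.median(errors)` and `mad(errors)` would then give summary measures analogous to those reported above.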

SOC content

SOC content was measured for all mineral soil samples by an elemental C/N-
analyser (combustion at 1000 °C, Walthert et al., 2010). When pH of a soil sample
was larger than 6.0 then carbonates were removed by fumigation with hydrochloric
acid prior to measuring C. Below this pH carbonates were assumed to be absent,
and the OC content of the sample was assumed to be equal to its total C content
(Walthert et al., 2010).

SOC stock

The SOC stock $S_i$ stored in horizon i per unit area [kg m−2] was calculated from
the thickness $D_i$ of the horizon [m], its volumetric stone content $G_i$ [m3 m−3], soil
density $\rho_i$ [kg m−3] and its SOC content $C_i$ [kg kg−1] by

$$S_i = D_i (1 - G_i) \rho_i C_i, \qquad (5.1)$$

and the stock S in a given depth compartment was summed by

$$S = \sum_{i=1}^{h} w_i S_i, \qquad (5.2)$$

where h is the number of horizons fully or partly included in the compartment and
wi is the fraction of the thickness of horizon i within the compartment.
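As a worked illustration of Eqs. 5.1 and 5.2, the sketch below computes the stock of a depth compartment from a list of horizons. The data layout (dictionaries with `top`, `bottom`, `stones`, `density` and `soc` keys) and the function names are assumptions made for this example only.

```python
def horizon_stock(thickness, stones, density, soc):
    """SOC stock of one horizon [kg m^-2] (Eq. 5.1).

    thickness [m], stones [m3 m^-3], density [kg m^-3], soc [kg kg^-1].
    """
    return thickness * (1.0 - stones) * density * soc


def compartment_stock(horizons, lower, upper=0.0):
    """SOC stock between depths `upper` and `lower` [m] (Eq. 5.2)."""
    total = 0.0
    for hz in horizons:
        thickness = hz["bottom"] - hz["top"]
        # Part of the horizon that lies inside the compartment.
        overlap = min(hz["bottom"], lower) - max(hz["top"], upper)
        if thickness <= 0.0 or overlap <= 0.0:
            continue
        weight = overlap / thickness  # w_i in Eq. 5.2
        total += weight * horizon_stock(
            thickness, hz["stones"], hz["density"], hz["soc"])
    return total


# Hypothetical profile with two horizons (values invented for illustration).
profile = [
    {"top": 0.0, "bottom": 0.2, "stones": 0.1, "density": 1000.0, "soc": 0.05},
    {"top": 0.2, "bottom": 0.6, "stones": 0.2, "density": 1200.0, "soc": 0.01},
]
stock_0_30 = compartment_stock(profile, lower=0.3)
```

For this invented profile the first horizon contributes 9.0 kg m−2 in full and one quarter of the second horizon (3.84 kg m−2) is included, which gives about 9.96 kg m−2 for 0–30 cm.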

Covariates for statistical modelling

Parent material and soil

Detailed information on soils and parent material is not available for whole
Switzerland. Therefore, an overview soil map discriminating 25 units (map scale
1 : 200 000, FSO, 2000a) was used as a coarse representation of geologic and
pedogenetic conditions. Because the map was mainly designed for agricultural usage,
certain map units did not reflect contrasting conditions of forest soil development
well. To lessen these drawbacks, five additional units were created by intersecting
the soil map with selected polygons of the Geological Map of Switzerland (map scale
1 : 500 000, Swisstopo, 2005) and the maps of the Last Glacial Maximum (map scale
1 : 500 000, Swisstopo, 2009) and of the Biogeographic Regions of Switzerland (see
Table A.16 in Appendix, Gonseth et al., 2001). Besides, we used the geotechnical map
(map scale 1 : 200 000, BFS, 2001) to extrapolate soil information available only at the
1033 soil profile sites (for details on sampling see Appendix 2, Walthert et al.,
2004) to whole Switzerland. Median values of soil properties measured at those
sites that lay within a given geotechnical map unit were assigned to the respective
unit. Then we checked whether these newly generated covariates correlated with
SOC stocks. This was true for cation exchange capacity, iron and calcium stocks
and mass of soil particles < 2 mm that we consequently retained for the statistical
analyses.

Climate

Three climate datasets were available to us with spatial information on mean
annual/monthly temperature and precipitation, cloud cover, sunshine duration,
radiation, degree-days, continentality index (Gams, 1935), temperature variation,
ratio of actual to potential evapotranspiration and site-water-balance (Grier and
Running, 1977). Two datasets contained spatial information (resolution 25 m and
2 km, respectively) on climatic means for the period 1961–1990 (Zimmermann and
Kienast, 1999; MeteoSwiss, 2011) and the third for 1975–2010 (spatial resolution
250 m). Since it was not a priori clear which dataset would be best, we used them
all as covariates in the statistical analyses.

Vegetation

The percentage of coniferous trees was derived from spectral imagery (FSO, 2000a)
and species composition data of the National Forest Inventory (NFI, Brassel and
Lischke, 2001, both covariates rasterized with 25 m resolution). The SPOT5 mosaic
of Switzerland (Mathys and Kellenberger, 2009) with spectral reflectance in green,
red and near-infrared bands, band ratios, IHS colour space transformations and
the Normalized Difference Vegetation Index (NDVI, Kriegler et al., 1969) were
available. Moreover, canopy height (difference of digital surface to digital elevation
model of 2 m resolution, Swisstopo, 2011) was included in the set of covariates.

Topography

Two digital elevation models (DEM, resolution 2 and 25 m, Swisstopo, 2011)
allowed us to compute a broad range of terrain attributes covering multiple scales:
elevation, slope angle, aspect, north- and eastness, planar, profile and combined
curvatures and smoothed versions of these attributes. Furthermore, topographic
position indices were calculated based on Zimmermann (2000) and Jenness (2006)
with radii ranging from 6 m to 2 km. Flow accumulation area and topographic
wetness indices were computed by single and multi-flow algorithms (Tarboton,
1997).


Accounting for errors in locations of soil profiles

We mentioned above that the coordinates of soil profiles had been recorded with
a likely error of about ±25 m, which clearly exceeds the resolution of the finer
DEM (2 m). Therefore, the values of all covariates were summarized over circular
neighbourhoods centred on the recorded profile locations, with radii of
13 m, 19 m or 26 m. Depending on the type of data, different summary statistics
were computed: arithmetic means for real numbers, medians for integers and
the most frequent category for nominal or ordinal variables. However, values of
covariates aggregated with the different radii were highly correlated. Therefore, we
used only the summaries computed with a radius of 26 m for the statistical analyses.
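
The aggregation rule described above can be sketched as follows. This is a minimal Python illustration, assuming that raster cells are given by their centre coordinates; the function name, the `kind` labels and the example values are hypothetical.

```python
import math
from statistics import mean, median, mode


def summarize_neighbourhood(cells, centre, radius, kind):
    """Summarize covariate values of raster cells within `radius` of `centre`.

    cells: iterable of (x, y, value) tuples giving cell-centre coordinates.
    kind:  "real" -> arithmetic mean, "integer" -> median,
           "categorical" -> most frequent category.
    """
    cx, cy = centre
    inside = [v for x, y, v in cells if math.hypot(x - cx, y - cy) <= radius]
    if not inside:
        raise ValueError("no cell centre falls inside the neighbourhood")
    if kind == "real":
        return mean(inside)
    if kind == "integer":
        return median(inside)
    if kind == "categorical":
        return mode(inside)
    raise ValueError(f"unknown kind: {kind}")


# Hypothetical 25 m raster cells around a profile recorded at (0, 0)
slope_cells = [(0, 0, 1.0), (25, 0, 3.0), (0, 25, 5.0), (50, 50, 100.0)]
print(summarize_neighbourhood(slope_cells, (0, 0), 26, "real"))
```

The cell at (50, 50) lies outside the 26 m radius and therefore does not influence the summary.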

Figure 5.3.: Boxplots of calculated soil organic carbon (SOC) stocks in 0–100 cm
depth by ecoregion and altitudinal class (n: number of sites).

5.2.3. Statistical analyses

Model

Given past experiences (Mishra et al., 2010; Kumar et al., 2012; Wiesmeier et al.,
2012) and exploratory analyses (Fig. 5.3), we decided to use a lognormal model
for the SOC stock S(s) at location s:

$$Y(s) = \log(S(s)) = x(s)^\mathrm{T} \beta + Z(s) + \varepsilon(s), \qquad (5.3)$$

where $x(s)^\mathrm{T} \beta$ is the external drift that accounts for the dependence of $S$ on
environmental covariates $x$ (with $\beta$ the regression coefficients and $^\mathrm{T}$ denoting
transpose).

5. SOC estimation by robust external-drift kriging

$Z(s)$ is a stationary autocorrelated Gaussian random field with zero mean and
isotropic exponential variogram with sill $\sigma^2$ and range $\alpha$

$$\gamma(h) = \sigma^2 \left(1 - \exp(-h/\alpha)\right). \qquad (5.4)$$

$\varepsilon(s)$ is a zero mean, spatially uncorrelated variable with nugget variance $\tau^2$. In
our robust geostatistical approach $\varepsilon(s)$ need not be Gaussian, thereby allowing for
outliers in the data. The coefficients $\beta$, the variogram parameters $\theta^\mathrm{T} = (\tau^2, \sigma^2, \alpha)$
and the values $Z^\mathrm{T} = (Z(s_1), Z(s_2), \ldots, Z(s_n))$ at the $n$ soil profile locations $s_i$ are
unknown and must be estimated from the data.
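
For readers who prefer code to formulae, the exponential variogram of Eq. (5.4) can be written as a short function. The sketch below is illustrative only; the function names and the parameter values are hypothetical. It also shows the usual convention that the effective range of an exponential variogram is the distance at which 95 % of the sill is reached, i.e. roughly three times the range parameter.

```python
import math


def exp_variogram(h, sill, alpha, nugget=0.0):
    """Exponential variogram of Eq. (5.4), optionally with a nugget for h > 0."""
    if h == 0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-h / alpha))


def effective_range(alpha):
    """Distance at which the exponential variogram reaches 95 % of its sill."""
    return -alpha * math.log(0.05)  # approximately 3 * alpha


# Hypothetical parameter values on the log scale
sill, alpha, nugget = 0.10, 200.0, 0.06
for h in (0.0, 100.0, 600.0, 2000.0):
    print(h, round(exp_variogram(h, sill, alpha, nugget), 4))
print(round(effective_range(alpha), 1))
```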

Model building

We used only the calibration set for model building, which involved the following
steps:
1. Positively skewed covariates (e.g. some terrain attributes) were transformed
by square root or natural logarithm.
2. Strongly correlated and therefore redundant covariates were eliminated based
on correlation-biplots (Gabriel, 1981).
3. The least absolute shrinkage and selection operator (LASSO, Hastie et al.,
2009, Sect. 3.4) – an algorithm that likely excludes nonrelevant covariates –
was used with various sets of covariates, partly enriched by first-order inter-
actions between pairs of covariates, to find an external drift that minimized
the mean squared error (MSE) in tenfold cross-validation.
4. The parameters of the geostatistical model (Eq. 5.3) were then estimated
by a novel robust restricted maximum likelihood method (REML, Künsch et al., 2011;
Künsch et al., in prep.) for the external drift selected by LASSO.
5. Still using the external drift of the optimal LASSO fit, an optimal value of
the tuning constant c that controls the robustness of REML was chosen by
tenfold cross-validation. We used the continuous ranked probability score
(CRPS, see below and Gneiting et al., 2007) as the main criterion for choosing
the tuning constant (and for selecting covariates in step 6). We further
tested whether other variogram functions (spherical, Whittle-Matérn, etc.,
e.g. Diggle and Ribeiro, 2007) improved the fit, but this was not the case.
6. Nonrelevant covariates were then removed step by step by tenfold
cross-validating the robust REML fit (and added back along with interaction terms
at later stages if cross-validation results justified this).
7. The levels of categorical covariates (in particular of the soil map) were merged
based on partial residual plots (e.g. Faraway, 2005) and cross-validation
CRPS to obtain a final parsimonious geostatistical model.
The improvement of the cross-validation MSE from step 3 to 7 is shown in Fig. A.11
of the Appendix.
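
Step 3 of the procedure above, choosing a LASSO penalty by cross-validated MSE, can be sketched in a few lines. The example below is a simplified illustration, assuming a plain coordinate-descent LASSO and interleaved folds; the function names (`lasso_fit`, `cv_mse`, `select_lambda`) are hypothetical, and the analyses of this chapter were done in R rather than with this code.

```python
import math


def soft_threshold(z, lam):
    """Shrink z towards zero by lam (the LASSO thresholding operator)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_fit(X, y, lam, n_sweeps=200):
    """Coordinate-descent LASSO; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]  # centred covariates
    ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if ss[j] == 0.0:
                continue  # constant covariate carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_mean))
    return intercept, beta


def cv_mse(X, y, lam, k=10):
    """Mean squared prediction error of the LASSO fit in k-fold cross-validation."""
    n = len(y)
    sq_err = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        b0, beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = b0 + sum(b * x for b, x in zip(beta, X[i]))
            sq_err += (y[i] - pred) ** 2
    return sq_err / n


def select_lambda(X, y, lambdas, k=10):
    """Return the penalty with the smallest cross-validation MSE."""
    return min(lambdas, key=lambda lam: cv_mse(X, y, lam, k))
```

In practice the folds would be drawn at random and the penalty grid would be much finer; the interleaved folds used here merely keep the sketch deterministic.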

Evaluating predictive performance of statistical models

The predictive power of the fitted geostatistical models was tested by comparing
predicted (Eq. 5.15) with calculated SOC stocks (Eq. 5.2). The same criteria were
used in model building by cross-validation (see above). Marginal bias and overall
precision were assessed by

$$\mathrm{BIAS} = -\frac{1}{n} \sum_{i=1}^{n} \frac{S(s_i) - \tilde{S}(s_i)}{S(s_i)}, \qquad (5.5)$$

$$\mathrm{robBIAS} = -\operatorname{median}_{1 \le i \le n} \left( \frac{S(s_i) - \tilde{S}(s_i)}{S(s_i)} \right), \qquad (5.6)$$

$$\mathrm{RMSE} = \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{S(s_i) - \tilde{S}(s_i)}{S(s_i)} \right)^2 \right\}^{1/2}, \qquad (5.7)$$

$$\mathrm{robRMSE} = \operatorname{MAD}_{1 \le i \le n} \left( \frac{S(s_i) - \tilde{S}(s_i)}{S(s_i)} \right), \qquad (5.8)$$

where $S(s_i)$ stands for calculated, $\tilde{S}(s_i)$ for predicted SOC stocks and MAD for
median absolute deviation. We computed summaries of the relative prediction
errors because the lognormal model implies constant relative dispersion, i.e., constant
coefficient of variation. We standardized the prediction errors by $S$ instead
of $\tilde{S}$ to have a common standardization when comparing different models.
We also computed a standard and robust $R^2$ by

$$R^2 = \frac{\mathrm{Cov}[S(s_i), \tilde{S}(s_i)]^2}{\mathrm{Var}[S(s_i)]\, \mathrm{Var}[\tilde{S}(s_i)]}, \qquad (5.9)$$

$$\mathrm{rob}R^2 = 1 - \left( \frac{\sum_{i=1}^{n} | S(s_i) - \tilde{S}(s_i) |}{\sum_{i=1}^{n} | S(s_i) - \operatorname{median}_{1 \le i \le n}(S(s_i)) |} \right)^2. \qquad (5.10)$$


Although the latter is tailored for robust L1 regression (Croux and Dehon, 2003),
we found it useful for our approach.
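
The criteria of Eqs. (5.5) to (5.10) translate directly into code. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, and the MAD is scaled by the constant 1.4826 (the default of R's `mad` function), which is an assumption because the text does not specify the scaling.

```python
import math
from statistics import mean, median


def mad(values, constant=1.4826):
    """Median absolute deviation; the scaling constant is an assumption (R default)."""
    m = median(values)
    return constant * median([abs(v - m) for v in values])


def validation_statistics(observed, predicted):
    """BIAS, robBIAS, RMSE, robRMSE of relative errors, plus R2 and robR2."""
    e = [(s - p) / s for s, p in zip(observed, predicted)]  # relative errors
    n = len(e)
    stats = {
        "BIAS": -mean(e),
        "robBIAS": -median(e),
        "RMSE": math.sqrt(sum(v * v for v in e) / n),
        "robRMSE": mad(e),
    }
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    stats["R2"] = cov ** 2 / (var_o * var_p)
    med_o = median(observed)
    num = sum(abs(o - p) for o, p in zip(observed, predicted))
    den = sum(abs(o - med_o) for o in observed)
    stats["robR2"] = 1.0 - (num / den) ** 2
    return stats


# Hypothetical stocks in kg m-2: every prediction is 10 % too large
observed = [4.0, 8.0, 10.0, 20.0]
predicted = [1.1 * s for s in observed]
print(validation_statistics(observed, predicted))
```

With the sign convention of Eq. (5.5), predictions that are systematically too large yield a positive BIAS.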
In addition, we computed the strictly proper scoring criterion CRPS (Gneiting
et al., 2007), which is equal to the integral over the Brier score (BS):

$$\mathrm{CRPS} = \int_{-\infty}^{\infty} \mathrm{BS}(u)\, du \approx \sum_{i=1}^{n} \mathrm{BS}(S_{(i)}) \cdot (S_{(i+1)} - S_{(i-1)})/2, \qquad (5.11)$$

where $S_{(i)}$ is the $i$th order statistic of the calculated stocks (sorted in increasing
order), $S_{(0)} = S_{(1)}$, $S_{(n+1)} = S_{(n)}$ and

$$\mathrm{BS}(u) = \frac{1}{n} \sum_{i=1}^{n} \{\tilde{F}_i(u) - I(S(s_i) \le u)\}^2, \qquad (5.12)$$

with $\tilde{F}_i(u)$ the (estimated) lognormal predictive distribution function of the $i$th
datum and $I(A)$ an indicator equal to one if $A$ is true and zero otherwise. CRPS
measures the sharpness of predictive distributions (smaller values signal sharper
$\tilde{F}_i$), hence depends both on prediction precision and on how well prediction
uncertainty is modelled. Modelling of prediction uncertainty was further tested by
counting how many observations fell into two-sided 95 % prediction intervals and
by checking the empirical distribution of the probability integral transform (PIT,
Gneiting et al., 2007)

$$\mathrm{PIT}_i = \tilde{F}_i(S(s_i)), \qquad (5.13)$$

which should be uniformly distributed.
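
The scoring criteria above can be illustrated with a small Python sketch. It assumes that each predictive distribution is lognormal with a known mean and standard deviation on the log scale (the arguments `mu` and `sd`, which are hypothetical names), and it approximates the integral of the Brier score by a sum over the ordered observations with midpoint widths. It is a simplified illustration, not the code used for the thesis.

```python
import math


def lognormal_cdf(u, mu, sd):
    """Distribution function of a lognormal variable with log-scale mean mu and sd."""
    if u <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(u) - mu) / (sd * math.sqrt(2.0))))


def brier_score(u, observed, mu, sd):
    """Brier score at threshold u, averaged over all sites (Eq. 5.12)."""
    n = len(observed)
    return sum(
        (lognormal_cdf(u, m, s) - (1.0 if obs <= u else 0.0)) ** 2
        for obs, m, s in zip(observed, mu, sd)
    ) / n


def crps(observed, mu, sd):
    """Approximate CRPS by summing Brier scores over the ordered observations."""
    s = sorted(observed)
    n = len(s)
    padded = [s[0]] + s + [s[-1]]  # S_(0) = S_(1) and S_(n+1) = S_(n)
    total = 0.0
    for i in range(1, n + 1):
        width = (padded[i + 1] - padded[i - 1]) / 2.0
        total += brier_score(padded[i], observed, mu, sd) * width
    return total


def pit(observed, mu, sd):
    """Probability integral transform of each observation (Eq. 5.13)."""
    return [lognormal_cdf(o, m, s) for o, m, s in zip(observed, mu, sd)]


def coverage(pit_values, level=0.95):
    """Share of observations inside the two-sided prediction interval."""
    lo, hi = (1.0 - level) / 2.0, 1.0 - (1.0 - level) / 2.0
    return sum(lo <= v <= hi for v in pit_values) / len(pit_values)
```

An observation lies inside the two-sided 95 % prediction interval exactly when its PIT value is between 0.025 and 0.975, which is why `coverage` can work on the PIT values alone.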

Mapping SOC forest soil stocks across Switzerland

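To get better parameter estimates for finally mapping SOC stocks by EDK we
fitted the model to the merged calibration and validation data. This was done
after computing validation statistics (see above). SOC stocks were then predicted
by robust lognormal kriging (Cressie, 2006; Künsch et al., in prep.) for the nodes
of a 100 m grid by

$$\tilde{S}(s) = \exp\left(\tilde{Y}(s) + \tfrac{1}{2}\{\hat{\tau}^2 + \hat{\sigma}^2 - \mathrm{Var}[\tilde{Y}(s)]\}\right), \qquad (5.14)$$

with

$$\tilde{Y}(s) = x(s)^\mathrm{T} \hat{\beta}_{\hat{\theta}} + \gamma_{\hat{\theta}}(s)^\mathrm{T}\, \Gamma_{\hat{\theta}}^{-1}\, \hat{Z}_{\hat{\theta}}, \qquad (5.15)$$

where $\hat{\ }$ denotes robust REML estimates, $\gamma_{\hat{\theta}}(s)$ is the vector with the estimated
covariances between $Z$ and $Z(s)$, $\Gamma_{\hat{\theta}}$ is the estimated covariance matrix of $Z$ and

$$\mathrm{Var}[\tilde{Y}(s)] = \left( \gamma_{\hat{\theta}}(s)^\mathrm{T}\, \Gamma_{\hat{\theta}}^{-1},\; x(s)^\mathrm{T} \right) \cdot \mathrm{Cov}\left[ \begin{pmatrix} \hat{Z}_{\hat{\theta}} \\ \hat{\beta}_{\hat{\theta}} \end{pmatrix}, \left( \hat{Z}_{\hat{\theta}}^\mathrm{T},\; \hat{\beta}_{\hat{\theta}}^\mathrm{T} \right) \right] \begin{pmatrix} \Gamma_{\hat{\theta}}^{-1}\, \gamma_{\hat{\theta}}(s) \\ x(s) \end{pmatrix}. \qquad (5.16)$$

Künsch et al. (2011) give in their Eq. (19) an approximation for the covariance
matrix of $(\hat{Z}_{\hat{\theta}}^\mathrm{T}, \hat{\beta}_{\hat{\theta}}^\mathrm{T})$. Approximate lognormal kriging variances were obtained from

$$\mathrm{Var}[S(s) - \tilde{S}(s)] = \exp\left( 2\, x(s)^\mathrm{T} \hat{\beta}_{\hat{\theta}} + \hat{\tau}^2 + \hat{\sigma}^2 \right) \cdot \left\{ \exp(\hat{\tau}^2 + \hat{\sigma}^2) - 2 \exp(\mathrm{Cov}[\tilde{Y}(s), Y(s)]) + \exp(\mathrm{Var}[\tilde{Y}(s)]) \right\}, \qquad (5.17)$$

where

$$\mathrm{Cov}[\tilde{Y}(s), Y(s)] = b \left( \gamma_{\hat{\theta}}(s)^\mathrm{T}\, \Gamma_{\hat{\theta}}^{-1},\; x(s)^\mathrm{T} \right) \cdot M^{-1} \begin{pmatrix} \gamma_{\hat{\theta}}(s) \\ X^\mathrm{T} \gamma_{\hat{\theta}}(s) \end{pmatrix} \qquad (5.18)$$

and $b$, $X$, $M$ as in Künsch et al. (2011). Since outliers receive small weight when
computing $\hat{\beta}_{\hat{\theta}}$ and $\hat{Z}_{\hat{\theta}}$ by the robust REML algorithm, the prediction of SOC stock
by Eqs. (5.14) and (5.15) is also insensitive to outlying observations.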
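
The back-transformation of Eq. (5.14) is easily illustrated numerically. The sketch below only evaluates the formula for given inputs; the function name and all parameter values are hypothetical, and the log-scale prediction and its variance would in practice come from the kriging step.

```python
import math


def lognormal_prediction(y_pred, var_y_pred, nugget, sill):
    """Back-transform a log-scale kriging prediction as in Eq. (5.14)."""
    return math.exp(y_pred + 0.5 * (nugget + sill - var_y_pred))


# Hypothetical values on the log scale
y_pred = math.log(8.0)   # kriging prediction of log stock
var_y_pred = 0.03        # variance of the log-scale predictor
nugget, sill = 0.06, 0.10

naive = math.exp(y_pred)
corrected = lognormal_prediction(y_pred, var_y_pred, nugget, sill)
print(round(naive, 2), round(corrected, 2))
```

The correction term matters because simply exponentiating the log-scale prediction targets the median rather than the mean of the lognormal distribution.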

Predicting regional and national mean SOC stocks

The mean SOC stocks in the five ecoregions (and for the whole of Switzerland), stratified
by altitude, were computed from the robust lognormal point kriging predictions
at the nodes of the 100 m grid by

$$\tilde{S}(B_k) = \frac{1}{N_k} \sum_{s_i \in B_k} \tilde{S}(s_i), \qquad (5.19)$$

where the notation $\sum_{s_i \in B_k}$ means summation over the $N_k$ nodes of the grid falling
into region $B_k$. Equation (5.19) is a discrete approximation to the lognormal block
kriging predictor (e.g. Cressie, 2006, Eq. 14). The block kriging variance, i.e., the
variance of the prediction error, $S(B_k) - \tilde{S}(B_k)$, for region $B_k$ can similarly be
approximated from the covariances of the point prediction errors (Eq. A.2 in Appendix)

$$\mathrm{Var}[S(B_k) - \tilde{S}(B_k)] = \frac{1}{N_k^2} \sum_{s_i \in B_k} \sum_{s_j \in B_k} \mathrm{Cov}[S(s_i) - \tilde{S}(s_i),\, S(s_j) - \tilde{S}(s_j)]. \qquad (5.20)$$

However, $N_k$ is usually too large (in our case: $10^4$–$10^5$) to evaluate the double
sum of Eq. (5.20) in acceptable computing time. We therefore used a Monte Carlo
approximation for Eq. (5.20), where the covariances were repeatedly computed and
averaged for randomly selected subsets of nodes in $B_k$. Full details can be found in
the Appendix. Of course, this approximation can also
be used if there is no residual autocorrelation, and it is straightforward to derive
analogous expressions for untransformed data. For sufficiently large regions, one
can safely assume, owing to the central limit theorem, that the prediction errors
$S(B_k) - \tilde{S}(B_k)$ are normally distributed, in spite of the fact that point prediction
errors follow lognormal laws.
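
The idea behind the Monte Carlo approximation can be sketched as follows. This is a simplified variant, not the scheme documented in the Appendix: it evaluates the diagonal terms of Eq. (5.20) exactly and estimates the mean off-diagonal covariance from randomly drawn pairs of nodes. The function names and the user-supplied covariance function `error_cov` are hypothetical.

```python
import math
import random


def block_mean(point_predictions):
    """Discrete approximation of the block predictor (Eq. 5.19)."""
    return sum(point_predictions) / len(point_predictions)


def block_variance_mc(nodes, error_cov, n_pairs=2000, seed=1):
    """Monte Carlo approximation of the double sum in Eq. (5.20).

    error_cov(a, b) must return the covariance of the point prediction
    errors at nodes a and b (a hypothetical, user-supplied function).
    """
    rng = random.Random(seed)
    n = len(nodes)
    # diagonal terms: point prediction error variances, evaluated exactly
    diag = sum(error_cov(a, a) for a in nodes) / n
    # off-diagonal terms: mean covariance estimated from random pairs
    off = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct nodes
        off += error_cov(nodes[i], nodes[j])
    off /= n_pairs
    return diag / n + (n - 1) / n * off


# Hypothetical example: 100 m grid, exponentially decaying error covariance
nodes = [(100.0 * i, 100.0 * j) for i in range(30) for j in range(30)]


def error_cov(a, b):
    return 4.0 * math.exp(-math.dist(a, b) / 200.0)


print(round(block_variance_mc(nodes, error_cov), 4))
```

Weighting the diagonal and off-diagonal parts separately keeps the estimate unbiased for the full double sum, whereas averaging all pairs within a small subset would give the variances too much weight.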
All statistical computations were done in R (R Core Team, 2013), using several
add-on packages, in particular georob (Papritz, 2013) for robustly fitting geosta-
tistical models and for robust kriging. Processing and mapping of spatial data was
done in ArcGIS 10.0 (ESRI, 2010).

5.3. Results

5.3.1. Calculated SOC stocks

SOC stock stored in the top 30 cm of the mineral soil at the 1 033 sites varied
considerably from 0.8 to 36.1 kg m⁻² (median 6.3 kg m⁻²), and to 100 cm depth stocks
ranged from 1.0 to 96.4 kg m⁻² (median 9.2 kg m⁻²). Stocks in both depths were
strongly correlated (Spearman correlation 0.91). On average, calculated stocks
were slightly larger for the validation set (Table A.15 of Appendix). Except for
the Central Plateau and the Southern Alps, mean stocks down to 1 m depth
increased with altitude (Fig. 5.3). For most strata, the frequency distribution of
stocks was positively skewed and dispersion increased with the mean, calling for
log-transformation for the statistical analyses.

[Figure 5.4, panels a) 0–30 cm (n = 175) and b) 0–100 cm (n = 175): measured SOC stock [kg m⁻²] against predicted SOC stock [kg m⁻²].]

Figure 5.4.: Scatter plots of measured against predicted soil organic carbon (SOC)
stocks in 0–30 cm (a) and 0–100 cm (b) of the mineral soil, computed with the
calibration data for the sites of the validation set (solid line: loess scatter plot
smoothers, n: number of sites).

5.3.2. Models for SOC stocks in 0–30 cm and 0–100 cm depth

Not surprisingly, given the strong correlation of stocks in the two depths, the
structure of the external drifts did not differ much. Both drifts were parsimonious, with
10 and 12 fitted coefficients, respectively, and included covariates characterizing
soils, vegetation, climate and topography (see Table 5.2 as well as Tables A.17 and
A.18 of Appendix).
Tenfold cross-validation resulted for both depths in similar robR² (0.31). However,
based on CRPS, the fit was better for topsoil stocks (0.238 vs. 0.252). Residuals
of both models were spatially autocorrelated, but spatial dependence was
rather weak with nugget/total-sill ratios and effective ranges of 0.37 and 600 m for
0–30 cm and 0.41 and 660 m for 0–100 cm depth (see Table A.19 of Appendix).
The optimal tuning constant was equal to c = 2 for both models, and robustly
estimated parameters fitted the data slightly better than customary Gaussian REML
estimates (cross-validation CRPS of 0.239 for the nonrobust model fit for 0–30 cm and
of 0.253 for 0–100 cm depth).

Table 5.1.: Statistics of relative prediction errors of soil organic carbon (SOC)
stocks in two depth compartments (0–30 cm, 0–100 cm) for the validation set.

            BIAS    RMSE    R2      robBIAS  robRMSE  robR2   CRPS
0–30 cm     0.135   0.488   0.346   0.070    0.388    0.337   0.221
0–100 cm    0.152   0.556   0.477   0.066    0.420    0.403   0.247


Table 5.2.: Covariates of external drift selected by model building procedure for soil organic
carbon (SOC) stocks in 0–30 cm and 0–100 cm depth.

              SOC stock 0–30 cm                          SOC stock 0–100 cm

soil          categorical covariate with 5 aggregated    categorical covariate with 9 aggregated
              soil map units;                            soil map units
              mass of soil particles < 2 mm assigned
              to geotechnical map units

climate       mean annual precipitation (square root)    mean March precipitation (square root)

vegetation    near-infrared band (SPOT5 mosaic)          near-infrared band (SPOT5 mosaic)

topography    topographic position index with radius     slope (resolution 2 m)
              500 m (Jenness, 2006) for soil map units
              rich and poor in clay

5.3.3. Validation of SOC stock predictions with independent data

Figure 5.4 shows calculated SOC stocks in 0–30 and 0–100 cm of the mineral soil,
plotted against the respective predictions for the independent validation set. The solid
lines of the loess scatterplot smoothers (Cleveland, 1979) are close to the 1:1 lines,
indicating absence of conditional bias. This is confirmed by the BIAS and robBIAS
statistics (Table 5.1). Irrespective of how the statistics were computed, relative
marginal bias did not exceed about 15 %. However, variation of the data around the
1:1 line was quite large, which was reflected in large root mean squared relative errors.
robRMSEs were about 40 %, and nonrobust RMSEs were 49 % for topsoils and 56 % for
stocks down to 100 cm. As seen from the robust R², the models explained about
34 % of the variation of calculated SOC stock in 0–30 cm and 40 % of calculated
SOC stock in 0–100 cm. The kriging variances overestimated the prediction errors
somewhat: only 3.4 % of the validation observations (both models) were outside
of the 95 % prediction intervals (Fig. A.15 in Appendix). Overestimation of prediction
uncertainty was also indicated by convex-shaped PIT histograms (Fig. A.16 in
Appendix), which had more probability mass in the centre than in the tails.

5.3.4. Prediction of SOC stocks for Swiss forest soils

For computing the predictions, the parameters of the final models (Table 5.2)
were estimated with data of 1022 sites (combined calibration and validation sets,
excluding 11 sites with missing covariate information). Robust lognormal kriging
predictions of stocks stored to 100 cm depth are mapped in Fig. 5.5 for the nodes
of the 100 m grid. The map with the predicted topsoil SOC was very similar and
is therefore not shown. Block kriging predictions of the mean stocks for the five
ecoregions and for the entire Swiss forest area are shown in Fig. 5.6.

[Figure 5.5: map of SOC stock 0–100 cm [kg m⁻²], scale 1:2'200'000, legend breaks 0, 3, 6, 9, 12, 15, 18, 21, 24, 40. Data source: Lakes: Vector 200 © 2007 swisstopo (DV033492.2); Relief 1:1'000'000 © 2012 swisstopo; Swiss Boundary: BFS GEOSTAT, swisstopo.]

Figure 5.5.: Robust lognormal kriging prediction of the soil organic carbon (SOC) stock in
0–100 cm of the mineral soil of Swiss forests (computed with best-fit model with covariates
according to Table 5.2 and tuning constant c = 2, smoothed with focal mean with a radius of 1
pixel = 100 m).

Figure 5.6.: Block kriging predictions of the soil organic carbon (SOC) stocks in 0–30 cm and
0–100 cm soil depth in the five ecoregions stratified by altitude asl [m] into three classes (vertical
lines: prediction intervals).
The largest SOC stocks were predicted for higher altitudes and for the Southern
Alps in general. Predicted stocks were smallest for the Central Plateau and
for lower altitudes of the Pre-Alps, where stocks down to a depth of 100 cm
remained below 10 kg m⁻². For the whole of Switzerland, predicted mean SOC stocks in
0–30 cm were equal to 7.99 kg m⁻² (SE 0.15 kg m⁻², 95 % prediction interval [7.69,
8.29] kg m⁻²). Down to 100 cm, a SOC stock of 12.58 kg m⁻² was predicted
(SE 0.24 kg m⁻²), resulting in a 95 % prediction interval of [12.11, 13.05] kg m⁻².
Thus, about 4.5 kg m⁻² SOC are stored in subsoils (30–100 cm) of Swiss forests.
Our estimates do not include carbon stored in forest floor horizons. Spatially
explicit estimation was not possible for this compartment because we largely lacked
C and soil density measurements. Based on the available data, Nussbaum et al.
(2012) estimated that about 1.7 kg m⁻² (SE 0.08 kg m⁻²) of C are stored in forest
floors of Swiss forests.

5.4. Discussion

5.4.1. Model building and covariate selection

The model building procedure effectively reduced the 360 potential covariates and
their first-order interactions to a small and meaningful set. Precipitation was
a covariate of both models (with positive coefficients, Figs. A.12 and A.12 in
Appendix). Perruchoud et al. (2000), Martin et al. (2011), Meersmans et al. (2012b),
Kumar et al. (2012), Chiti et al. (2012) and Wiesmeier et al. (2013) previously
reported that wet climate favours SOC accumulation. Near-infrared reflectance of
the forest canopy was also selected for both models: the smaller reflectance of conifers
at wavelengths of 750 to 1300 nm (Cipar et al., 2004) and the negative regression
coefficients imply larger SOC stocks under conifers than under deciduous trees. Additionally,
information on parent material was important for SOC prediction: aggregated
units of the overview soil map were meaningful covariates despite representing the
heterogeneous pedogenetic conditions typical for Switzerland only coarsely (see
Figs. A.12 and A.12 in Appendix).


5.4.2. Residual spatial autocorrelation

Spatial autocorrelation of residuals remained weak in both models, suggesting that
spatial patterns in calculated SOC stocks were reasonably well modelled by the
external drifts. Due to short-ranged spatial dependence, only 5 % of the nodes of
the prediction grid were within a distance equal to the effective variogram ranges
of the soil profile sites. From the validation set only 14 of 175 sites were within
these zones. Neglecting spatial autocorrelation but using the same set of
covariates would slightly lower the precision of SOC stock estimates for the 0–30 cm and
increase precision for the 0–100 cm depth compartment (Table A.20 in Appendix).
Although kriging predictions differ only within the estimated range of spatial
dependence from predictions obtained by the regression models, consideration of
autocorrelation was important for accurate modelling of prediction uncertainty.

5.4.3. Robust parameter estimation

Moderate robustification of the parameter estimation procedure (tuning constant
c = 2) increased the predictive power of the fitted models in cross-validation
slightly compared to customary REML and kriging. This is reflected in the slight
increase of robRMSE (0.6 % for SOC stock predictions in 0–30 cm and 0.5 %
for stocks in 0–100 cm) compared to a nonrobust fit of the model with the same
covariates (Table A.20 in Appendix). A further advantage of robust estimation
is clear labelling of data that are fitted only poorly by the models. Scrutinizing
environmental conditions for those observations revealed that these were: (i) sites
on calcareous bedrock in inner Alpine valleys where recurring drought hinders OC
mineralization, resulting in thick forest floor and SOC-rich A horizons (Walthert
et al., 2004), and (ii) sites in the Southern Alps with acid podsolic soils that
show pronounced humus translocation down the profile. Moreover, these sites are
influenced by forest fires (leading to accumulation of black carbon) and stabilize
SOC effectively by large content of aluminium and iron weathered from
silicate-rich bedrock (Blaser et al., 1997). Using robust procedures ensured that SOC data
resulting from sites subject to such special conditions did not confound statistical
analyses.


5.4.4. Predictive performance of fitted models

Random dispersion of the prediction errors remained large, as our robRMSEs of
39 % and 42 % demonstrated. This was also reflected in rather modest R² of 0.35
and 0.48 (Table 5.1). Further validation data from Swiss soil monitoring networks
were predicted with somewhat larger errors (Nussbaum et al., 2012, cf. Sect. 3.4).
Other studies found (cross-)validation R² (all nonrobust) of similar magnitude:
Martin et al. (2011) obtained by cross-validation R² = 0.36 for predicting topsoil
SOC stocks of forests in France. Mishra et al. (2009) found R² = 0.46 for stocks in
0–50 cm and R² = 0.56 for 0–100 cm, and Kumar et al. (2012) reported R² = 0.36
for stocks down to 100 cm depth. The latter two studies used independent
data to validate predictions of OC stocks in soils under various land uses in the US states
Indiana and Pennsylvania.
On the one hand, incompleteness and partly insufficient quality of covariates are
likely responsible for the modest predictive power of the fitted models. In particular,
spatial information on soil or vegetation parameters controlling SOC turnover
was completely lacking in our set of potential covariates. Also, data on forest
management and land use history (e.g. stand age), found to be relevant by Schroeder
et al. (2009) and Schulp et al. (2013), were missing.
On the other hand, causes for the moderate precision of our predictions lie in the
missing soil density measurements (Schrumpf et al., 2011). For most horizons soil
density was derived from a PTF, which proved to be unbiased, but nevertheless
added variation to the data.

5.4.5. Spatial structure of SOC stock predictions

So far, no maps of SOC stocks have been published for Swiss forests that could be
used for verification of Fig. 5.5. Nevertheless, several patterns in our SOC stock
map matched our expectations: small SOC stock was predicted for acid soils at
lower altitude on the Central Plateau. Very small total SOC stock was estimated
for areas in the Eastern Pre-Alps and Alps where Permian Verrucano sandstones
form the bedrock. On these sites, SOC accumulates in the forest floor. The map
shows large stocks up to 40 kg m⁻² in parts of the Eastern Pre-Alps where large
annual precipitation and water-logged soils prevail, and also in the Jura region,
where large stocks are likely related to organic matter stabilisation by calcium
(Walthert et al., 2004). Very large SOC stock was predicted for the Southern
Alps, where a combination of forest fires and Al-rich soil on metamorphic parent
material led to an accumulation of organic matter, even in deeper soil horizons.
Excepting the special conditions in the Southern Alps, predictions of the mean
stocks by ecoregion and altitudinal class (Fig. 5.6) reflected the increase of SOC
stock with altitude described by Hagedorn et al. (2010).

5.4.6. Comparison with SOC stock estimates of previous studies

Perruchoud et al. (2000) estimated for the whole of Switzerland a mean SOC stock of
7.59 kg m⁻² (SE 0.30 kg m⁻²) for the top 30 cm of the mineral soil and 9.82 kg m⁻²
(SE 0.53 kg m⁻²) for mineral soils down to bedrock, which are both significantly
smaller than our current estimates (p values of one-sided z tests: 0.004 and < 10⁻¹²,
respectively). The estimate of 11.86 kg m⁻² (SE 0.54 kg m⁻²) by Bolliger et al.
(2008) for total SOC stock (forest floor plus mineral soil down to bedrock) of Swiss
forests is also smaller than our estimate for 0–100 cm. Our standard errors
(0–30 cm: 0.15 kg m⁻²; 0–100 cm: 0.24 kg m⁻²) are smaller (by a factor of about two)
than those of Perruchoud et al. (2000) and Bolliger et al. (2008). Since we validated
uncertainty modelling for point predictions and used a coherent framework to
quantify the uncertainty of our regional and national mean estimates, one can
trust that these figures accurately represent the uncertainty of our estimates.
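
The one-sided z tests quoted above can be illustrated by a short calculation. The sketch assumes that the earlier estimate is treated as a fixed reference value and that only the standard error of the present estimate enters the test statistic; this is an assumption, since the text does not state how the statistic was formed, and the function name is hypothetical. Under this assumption the p values come out close to those quoted.

```python
import math


def one_sided_p(estimate, reference, se):
    """P(Z >= z) for z = (estimate - reference) / se under a standard normal."""
    z = (estimate - reference) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Estimates in kg m-2 as quoted in the text
p_topsoil = one_sided_p(estimate=7.99, reference=7.59, se=0.15)
p_100cm = one_sided_p(estimate=12.58, reference=9.82, se=0.24)
print(round(p_topsoil, 4), p_100cm < 1e-12)
```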
Perruchoud et al. (2000) estimated that about 77 % of SOC stock of Swiss forests
is stored in the mineral topsoil (0–30 cm), whereas we predicted a proportion of
only 64 %, which matches the proportion of 64.3 %, computed directly from the
observed SOC data (n = 1033) very well.

5.5. Summary and conclusion

Greenhouse gas reporting requires estimates of regional or national mean SOC
stocks that are computed from observations with quasi-point support. The
geostatistical block kriging approach is the method of choice for such change-of-support
problems as it guarantees that estimates are unbiased and precise and that prediction
standard errors correctly account for the spatial averaging. Rather surprisingly,
our study seems to be the first to employ such an approach in the context of GHG
reporting.


Based on spatially referenced data on 1 033 soil profiles, we built parsimonious,
pedologically interpretable geostatistical models for SOC stocks in two
depth compartments (0–30 cm, 0–100 cm) of mineral soils of Swiss forests. The
models relate calculated stocks to environmental covariates that characterize the
pedogenetic conditions at the profile sites and account for residual spatial
autocorrelation. The fitted models were rigorously validated by comparing
predictions with independent data. Using the models, we mapped forest SOC stock
across Switzerland by robust external-drift kriging at high spatial resolution and
aggregated the kriging results coherently to come up with reliable block kriging
estimates (and standard errors) of national mean SOC stocks in Swiss forests.
A comparison with earlier studies on SOC in Swiss forests revealed that previous
estimates of SOC stock down to 1 m depth were distinctly smaller than our
estimate. Moreover, our (independently validated) standard errors were only half
as large as the previously reported SE. As we used a substantially larger database
and sound geostatistical methods, we trust our estimate more and conclude that
SOC stocks of Swiss forests have been considerably underestimated in the past.

Acknowledgements We thank the Swiss Federal Office for the Environment
(FOEN) for funding this work. Special thanks go to the WSL for collecting
and analysing innumerable soil samples and to the WSL GIS group for providing
the infrastructure for spatial data processing.

6. Concluding remarks and outlook

Digital soil mapping (DSM) is a mature field of research. Specifications for appli-
cation on a global scale have been developed (Arrouays et al., 2014), and software
manuals specifically targeting DSM have been published (Malone et al., 2017).
Therefore, after some general concluding remarks, I would like to give a personal
and subjective outlook on future DSM application in Switzerland.

6.1. Concluding remarks

Comprehensive soil function assessment relies on knowledge about basic soil prop-
erties. Efficiently building predictive models for DSM becomes crucial if numerous
soil properties are to be mapped. Moreover, DSM approaches need to handle large
sets of environmental covariates that have become increasingly common in recent
years.
In this thesis a model building framework for generalized additive models for
spatial data (geoGAM) was developed. GeoGAM consistently modelled continuous
soil properties and categorical soil classification data, hence allows one to use
the same tool for both data types. Further, a geoGAM models nonlinear and
nonstationary relationships between soil data and covariates and accounts for spatial
autocorrelation. Applications of geoGAM to legacy soil data showed, however,
that the fitted smooth functions of spatial coordinates did not capture the
generally weak, short-ranged spatial autocorrelation of the residuals well. The
applications further showed that nonstationary interaction terms between covariates
and spatial coordinates hardly ever improved the fitted models much. Hence, the
capability of geoGAM to model autocorrelation and nonstationary effects did not
matter much in practice.
GeoGAM could be extended to account for interactions between arbitrary
covariates, but these were not considered in this thesis. Inclusion of first-order
interactions is in principle straightforward, but large sets of covariates would lead to
even larger sets of interactions from which one would have to select the relevant
ones. Without preselection, geoGAM model building might become unstable and
computations quickly too demanding. GeoGAM often selected small models with
on average 12 covariates, which allowed covariate effects to be interpreted in a
simple way. The largest model, however, included 30 (likely interrelated) covariates,
which again made the interpretation of effects difficult.
Computing predictive distributions by bootstrapping was successful, but
computationally demanding. The somewhat too small coverage in the lower tails
observed for effective cation exchange capacity (ECEC) in ZH forest topsoils can be
attributed to the nonnormally distributed residuals. GeoGAM relies on normally
distributed errors, hence one has to find appropriate transformations of the
responses. This raises the question of how to transform predictions back to the
original measurement scale. Parametric formulae exist for logarithmic and square
root transformations; for other transformations one has to resort to simulations
for obtaining unbiased backtransformations.
Predictions computed by geoGAM were often similarly precise as those by other
methods, but sometimes clearly inferior. Comparison of geoGAM with grouped
lasso, robust external drift kriging (georob), boosted regression trees (BRT) and
random forest (RF) on a total of 48 responses from 3 study regions showed that RF
performed best most often and never worst. But differences in precision measures
(root mean squared error RMSE, mean squared error skill score SSmse) between
RF and its closest competitors were small. GeoGAM was best for only 7 out of
48 responses, and it was the poorest of the five methods for 14 responses. Lasso,
which also resulted in strongly reduced covariate sets, had a similar rate of best
performance but fewer very poor models.
RF performed well for non-Gaussian responses, too. In the few cases where it
was tested, quantile regression forest, the extension of RF to estimate quantiles
of predictive distributions, seemed to perform at least as well as the geoGAM
bootstrapping approach. Hence, RF emerged as the overall best single DSM method
in this study. RF as applied in Chapt. 3 performed no covariate selection. The
results of my thesis suggest that there might be a trade-off between predictive
precision and ease of interpretation of fitted models: complex methods like RF
may create better maps, but these may be less readily accepted by end-users because
it is more difficult to understand what information was used to generate them.
Removing covariates by recursive elimination based on their importance could
reduce RF model complexity, but the loss (or even gain) of predictive precision after
such a covariate selection would need to be assessed. Further, illustrating the effect of
important covariates by partial dependence plots or maps and checking whether
effects agree with pedological understanding might help to overcome the black-box
nature of RF predictions and to increase acceptance of respective products by
end-users. When predictions by the five methods were averaged, precision was even
better than that of RF, but model averaging (MA) clearly hampers interpretation
further.
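
Why averaging can improve precision is easy to see in a toy calculation: when the errors of individual methods partly cancel, the averaged prediction is closer to the observations. The sketch below assumes unweighted averaging, which is an assumption because the weighting is not specified here; the function names and all numbers are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error of predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def average_predictions(predictions_by_method):
    """Unweighted mean of the predictions of several methods at each site."""
    methods = list(predictions_by_method.values())
    return [sum(vals) / len(vals) for vals in zip(*methods)]


# Hypothetical observations and predictions of three methods at four sites
observed = [10.0, 12.0, 8.0, 15.0]
predictions = {
    "method_a": [11.0, 11.0, 9.0, 14.0],
    "method_b": [9.5, 13.5, 7.0, 16.5],
    "method_c": [10.5, 12.5, 8.5, 13.0],
}
averaged = average_predictions(predictions)
for name, pred in predictions.items():
    print(name, round(rmse(observed, pred), 3))
print("averaged", round(rmse(observed, averaged), 3))
```

The gain comes at the cost noted above: the averaged prediction no longer corresponds to any single fitted model whose covariate effects could be inspected.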
But model performance and interpretability are not the only factors to be con-
sidered when selecting an approach. When change-of-support is involved, such as
in the soil organic carbon (SOC) study (Chapt. 5), geostatistics is still the method
of choice. It allowed computing point-support predictions and, at the same time,
estimating regional and national means along with their standard errors. Since only two
responses had to be modelled, the trend models could be built manually and no
automatic model building procedures were needed.

The soil property maps created in this thesis will be used to test digital soil
function assessment (project PMSoil, workpackage D). The estimated SOC stocks
in Swiss forest soils were used to improve the Swiss greenhouse gas inventory
(FOEN, 2016). Thus, apart from methodological advances, my thesis made a
very practical contribution to generating spatial information on soils in Switzerland,
which is unfortunately still missing for a large part of the country.

6.2. Outlook – DSM in Switzerland

Besides the work presented here, only a few studies have used DSM in Switzerland so
far (Egli et al., 2005; Steiner et al., 2006; Rehbein and Keller, 2007; Herbst and
Mosimann, 2008, 2010; Margreth, 2015). Apart from an overview soil map for
one canton (scale 1:50’000, Presler et al., 2005) and mapping of physical soil
properties for Swiss forests (Walthert et al., 2015), there has been no attempt
to create a comprehensive set of soil property maps by DSM at regional or even
national scale.
How far DSM will complement or even replace conventional soil mapping (CSM)
in Switzerland is difficult to foresee. Recent reports, commissioned by the Swiss
Federal Office for the Environment (FOEN), analysed users’ needs (Knecht et al.,
2017) and outlined a concept to establish a comprehensive soil information system
for Switzerland (Carizzoni et al., 2017). Based on these reports I discuss open
questions and issues, framing the current situation in Switzerland with regard to
possible future applications of DSM.

6.2.1. Map scale and precision

There is a large demand for spatial soil information for many fields of application.
Most users desire large-scale maps (1:5 000, Knecht et al., 2017) that cover
the complete region of interest to the end-user. Maps available only for scattered
patches would be less useful.
There is common agreement among end-users that maps with a scale of 1:5 000,
generated according to up-to-date CSM standards (manual FAL24+, AfUSO,
2014), allow safe decisions related to landownership (Knecht et al., 2017).
However, the precision of predictions derived from such maps remains unknown. I am
not aware of any attempt to validate the precision of such predictions rigorously
with independently sampled soil data. Hence, the precision of DSM products
of my thesis cannot be compared to current CSM. Comparison of predictions of
texture, gravel and effective soil depth (SD) derived from the (methodologically
outdated) conventional soil map of the Canton of Zurich (Jäggli et al., 1998) with
soil data not used in the survey resulted in about the same root mean squared
errors (RMSE) as obtained by DSM (see Chapt. 3 and Nussbaum and Papritz,
2017). Methodological issues of CSM (e.g. reproducibility, tendency to map contrasts
where there might be none, restriction to reporting top- and subsoil properties
only) were discussed neither by Knecht et al. (2017) nor by Carizzoni et al. (2017).
Meanwhile, DSM still lacks specifications and practical experience for generating
high-resolution maps. In CSM, rules of thumb exist for how many observations
are needed for a given mapping scale and, in turn, which mapping scale allows which
application. For DSM, no accepted notion exists of what RMSE, SSmse or width of
prediction intervals is equivalent to the (presumed) accuracy of the desired scale
of 1:5 000 or would be acceptable for decisions related to landownership. Neither is
there an established connection between soil data density and statistical precision
measures. Denser sampling does not necessarily result in more accurate maps,
especially if soil data was collected by purposive sampling (Kempen et al., 2012).
To allow well-informed methodological decisions on future mapping campaigns,
rigorous validation of CSM and a clear definition of the required precision of CSM and
DSM are prerequisites.

6.2.2. Soil data acquisition

Precision of DSM predictions in my thesis could very likely have been improved by
including more soil observations. The maps were calibrated with much less soil
data than is used in CSM, although the data originated from soil surveys (Chapt. 2
and 3). In CSM, raw data is sampled during polygon demarcation (by augering
roughly every 50 m and obtaining field estimates of effective soil depth available to
plants (SD), texture, inorganic carbon content or pH). The gathered data is only
implicitly digitized in interpreted and aggregated form as map polygons. To my
knowledge, the raw data is neither georeferenced nor digitized and is thus lost for any
subsequent (DSM) analysis to answer future questions.
Further, a large part of the observed data in CSM is estimated in the field
and not measured in a laboratory. Field estimates may agree with laboratory
measurements with smaller or larger discrepancies (for data used in this thesis
see Walthert et al., 2016). For a dataset in France, however, field estimates did
not agree at all with lab measurements (M. Lacoste, oral communication), indicating
that field estimates may be prone to inconsistencies.
Crowdsourcing (e.g. by farmers, gardeners, schools; Rossiter et al.,
2015) could possibly contribute to gathering soil observations. Processing such data would
require a mapping approach that accounts for varying reliability and quality of
the various types of soil data, for example similar to Sect. 3.2.2. Moreover, new
soil analytical methods replacing or complementing current data gathering in CSM
field surveys (e.g. application of proximal sensing methods) should be explored.
In summary, soil sampling, proximal sensing and field estimation strategies should be
evaluated to increase objectivity and efficiency of data acquisition. Further, long-term
availability of all data collected by CSM should be ensured.

6.2.3. Cost and duration of mapping campaigns

The missing specifications make cost estimates for DSM very difficult but not
impossible (see Kempen et al., 2012). However, Knecht et al. (2017) and Carizzoni
et al. (2017) did not even attempt a cost estimate for DSM. Costs for CSM are
clear for small areas, but for the whole of Switzerland the costs estimated by Knecht et al.
(2017) and Carizzoni et al. (2017) differed considerably; hence, cost estimates are also
problematic for CSM.
It appears that lower costs are currently the only driver to implement DSM.
According to Borer and Knecht (2014, p. 28), DSM would be acceptable as a replacement
of CSM if it could produce equally good or better maps at the same or lower
costs (criteria for how this should be measured are not defined). However, advocates
of CSM largely neglect that DSM possibly provides spatial soil information in less
time. Carizzoni et al. (2017) state that comprehensive map generation for Switzerland
by CSM will likely take 30 years. Current issues like planning of irrigation
infrastructure to adapt to climate change (BAFU, 2012) require soil information
much sooner. The conventional preparation of regional conceptual maps for purposively
selecting soil profile locations for CSM could be replaced by a DSM sampling
design that could be quickly provided for the whole of Switzerland. In contrast to CSM
polygon demarcation, the time required for covariate processing and model building does
not increase linearly with the mapped area. If there is a time and cost benefit
for DSM, it most likely exists for large areas (Kempen et al., 2012). Monitoring
changes of soil properties with time and regular updates of the soil maps were
identified as important user needs (Carizzoni et al., 2017; Knecht et al., 2017) and
therefore call for a faster mapping process.
The requirement that DSM maps be at least as accurate as CSM maps (Knecht et al.,
2017) could be put into perspective if DSM products could be generated more quickly and
at lower cost, even if they were less precise (half a loaf is better than no
bread!). For example, the successfully implemented spatial soil protection strategy
in the city of Stuttgart uses a map that is based on an extremely simplified view
of soil (indicator system with levels 0–5, with 0 = completely sealed and 5 = very
good soil quality; Wolff, 2007).
Currently, between the overview soil map (1:200 000, FSO, 2000a) and the large-
scale maps (1:5 000, e.g. Jäggli et al., 1998) hardly any intermediate-scale maps
(1:10 000 or 1:25 000) exist. Due to missing experience, end-users did not ask for
products with intermediate mapping scales, although such scales might satisfy
their needs (Carizzoni et al., 2017).
Summing up, a discussion should be started on whether slightly less precise maps
are tolerable if they become available much sooner and at lower cost.

6.2.4. Stakeholders’ objections to DSM

Scepticism amongst soil scientists and end-users about DSM is widespread (Omuto
and Nachtergaele, 2013). Swiss users do not believe that DSM could satisfy their
needs (Knecht et al., 2017), although they can hardly have much experience with
DSM products considering that only a few publications on DSM in Switzerland are
available so far. As a notable exception, mapping of SOC stocks in Swiss forest
soils (Chapt. 5) was an end-user driven DSM application.
Part of the objection to DSM is possibly caused by misunderstandings: Knecht
et al. (2017, p. 22) and Carizzoni et al. (2017, p. 10) consider the fact that DSM
needs observed soil data as a drawback. This view, however, does not acknowledge
that DSM, just like CSM, is a field-based method in need of ground truth
data. And it ignores that it is still an unresolved question whether CSM is the best
strategy to generate high-resolution spatial information on soils. Similar precisions
of DSM and (outdated) CSM products for the Canton of Zurich (Sect. 6.2.1) show
that not differences in predictive precision but preferences of end-users are decisive
for practical use of CSM instead of DSM products.
End-users are familiar with crisp polygon information provided by CSM, and
they have established procedures (and likely adapted software) to process such
information. DSM products predict gradual changes of soil properties on a grid,
for which end-users have to develop new procedures. In addition, CSM maps
parametrize uncertainty through complex polygon units or less accurate information
attributed to such polygons. DSM, in contrast, delivers uncertainty information in
the form of prediction intervals, which are currently still unfamiliar to end-users.
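
To make the notion of a prediction interval more concrete, the following minimal Python sketch derives a 90 % interval for a single grid cell from draws of its predictive distribution and checks how often validation observations fall inside such intervals. The draws are invented, and the function names and the percentile-based construction are assumptions for illustration, not the procedure implemented in the thesis.

```python
import statistics


def prediction_interval(samples, lower_pct=5, upper_pct=95):
    """Empirical prediction interval from draws of a predictive distribution."""
    # With n=100, statistics.quantiles returns the 1st to 99th percentiles.
    percentiles = statistics.quantiles(samples, n=100, method="inclusive")
    return percentiles[lower_pct - 1], percentiles[upper_pct - 1]


def coverage(observed, intervals):
    """Fraction of validation observations that fall inside their interval."""
    inside = sum(lo <= obs <= hi for obs, (lo, hi) in zip(observed, intervals))
    return inside / len(observed)


# Invented draws for one grid cell (e.g. bootstrap replicates of a prediction).
draws = list(range(101))
lower, upper = prediction_interval(draws)
print(lower, upper)

# Two invented validation observations checked against the same interval.
print(coverage([50, 99], [(lower, upper), (lower, upper)]))
```

For well-calibrated 90 % intervals, the coverage computed on a large independent validation set should be close to 0.9; the interval width then indicates how informative the map is at a given location.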
DSM benefits from techniques of many neighbouring disciplines (e.g. remote
sensing, machine learning). These are mostly new to the applied soil science community,
and persons applying DSM might not have the full required knowledge
for an appropriate application of the methodology including a proper sense of the
limitations of the approach. Gaps in knowledge transfer possibly lead to misuse
of DSM techniques and to meaningless output (Omuto and Nachtergaele, 2013).
Acceptance of DSM could clearly grow with specifications to ensure quality of
DSM products. Careful communication of DSM products including their uncer-
tainty seems crucial for DSM success.

6.2.5. Institutional structures discouraging DSM

Advancing and implementing DSM methods is hindered by an “institutional gap”
(Greiner et al., 2016): On the one hand, developing and establishing local DSM
specifications and working out step-by-step manuals are too applied to obtain grants
from science funding agencies because they cannot be easily published in academic
research literature (lack of global relevance). Cantonal soil agencies on the other
hand mostly operate with very limited finances barely sufficient for their daily
tasks, and they do not have the resources for developing new soil mapping proce-
dures, nor do private companies that are active in this field. Research by Agro-
scope underwent a change of focus in the late 1990s, when the CSM service was
closed (Borer and Knecht, 2014). To date, this gap has not been filled, and an
institution that takes on this matter is urgently needed.
Moreover, cost and time advantages of DSM are likely realized only when mapping
large areas (Sect. 6.2.3). DSM could further benefit from sharing infrastructure
(e.g. already existing geodata or computing facilities at research institutions to
speed up covariate processing). Due to the federal structure of Switzerland, the
Cantons are in charge of soil mapping. But these institutions might be too small
to benefit from DSM advantages in relation to cost and project duration, hence
broad collaborations or a national initiative would be necessary. But reality is
different: Assembling soil legacy data into a national database is already quite
difficult because of data ownership and licences (Carizzoni et al., 2017, p. 24).
Successful implementation of DSM requires cross-institutional and cross-cantonal
collaborations aiming at a nationally coordinated effort.

Specifically in Switzerland, conservation of soils is – compared to protection of
other resources such as water, air or biodiversity – an undervalued topic on the political
agenda. Soil mapping is generally perceived as very expensive, while the much larger
costs for supporting the agricultural sector are covered without
many questions (Borer and Knecht, 2014). A fraction of the annual payments to
the agricultural sector would allow tackling the issues that are listed here and
finally providing the needed soil information. In the mid- and long-term this would
help to better balance society’s interest in preserving soil functions against the interests
of private landowners, who too often maximize profit by sealing the soil at the cost of
complete loss of soil functionality.

References
Ad-hoc Arbeitsgruppe Boden: Bodenkundliche Kartieranleitung, Bundesanstalt für Geowis-
senschaften und Rohstoffe, Hannover, 5. Auflage, 2005.
Adams, W. A.: The effect of organic matter on bulk and true densities of some uncultivated pod-
zolic soils, Journal of Soil Science, 24, 10–17, doi:10.1111/j.1365-2389.1973.tb00737.x, 1973.
Adhikari, K., Kheir, R., Greve, M., Bøcher, P., Malone, B., Minasny, B., McBratney, A., and
Greve, M.: High-resolution 3-D mapping of soil texture in Denmark, Soil Science Society of
America Journal, 77, 860–876, doi:10.2136/sssaj2012.0275, 2013.
AfUSO: Bodenkartierung Kanton Solothurn, Projekthandbuch Teil III, Kartier-
methodik, Tech. Rep. 6. Auflage, Amt für Umwelt des Kantons Solothurn, URL
https://www.so.ch/verwaltung/bau-und-justizdepartement/amt-fuer-umwelt/
boden-untergrund-geologie/bodenkartierung/, 2014.
AGR: Geoprodukt Geologische Rohstoffkarte ADT, Metadaten komplett. Amt für Gemeinden
und Raumordnung des Kantons Bern, www.be.ch/geoportal, last access: 04.04.2017, 2015.
Aitchison, J.: The statistical analysis of compositional data, Chapman & Hall, doi:10.1007/
978-94-009-4109-0, 1986.
Al-Qinna, M. I. and Jaber, S. M.: Predicting soil bulk density using advanced pedotransfer
functions in an arid environment, Transactions of the ASABE, 56, 963–976, 2013.
ALN: Historische Feuchtgebiete der Wildkarte 1850. Amt für Landschaft und Natur des
Kantons Zürich, http://www.aln.zh.ch/internet/baudirektion/aln/de/naturschutz/
naturschutzdaten/geodaten.html, last access 29.03.2017, 2002.
ALN: Geologische Karte des Kantons Zürich nach Hantke et. al 1967, GIS-ZH Nr. 41. Amt für
Landschaft und Natur des Kantons Zürich, http://www.gis.zh.ch/Dokus/Geolion/gds_41.
pdf, last access: 15.02.2015, 2014a.
ALN: Meliorationskataster des Kantons Zürich, GIS-ZH Nr. 148. Amt für Landschaft und Natur
des Kantons Zürich, http://www.geolion.zh.ch/geodatensatz/show?nbid=387, last access
29.03.2017, 2014b.
ARE: Sachplan Fruchtfolgeflächen FFF, Vollzugshilfe 2006, Tech. rep., Bundesamt für Raumen-
twicklung, 2006.
Arrouays, D., Deslais, W., and Badeau, V.: The carbon content of topsoil and its geographical
distribution in France, Soil Use and Management, 17, 7–11, 2001.
Arrouays, D., McBratney, A. B., Minasny, B., Hempel, J. W., Heuvelink, G. B. M., MacMillan,
R. A., Hartemink, A. E., Lagacherie, P., and McKenzie, N. J.: The GlobalSoilMap project
specifications, in: GlobalSoilMap Basis of the global spatial soil information system, pp. 9–12,
CRC Press, doi:10.1201/b16500-4, 2014.
AWA: Geoprodukt Versickerungszonen VSZ, Metadaten komplett. Amt für Wasser und Abfall
des Kantons Bern, www.be.ch/geoportal, last access: 04.04.2017, 2014a.
AWA: Geoprodukt Grundwasserkarte GW25, Metadaten komplett. Amt für Wasser und Abfall
des Kantons Bern, www.be.ch/geoportal, last access: 04.04.2017, 2014b.
AWEL: Hinweisflächen für anthropogene Böden, GIS-ZH Nr. 260. Amt für Abfall, Wasser, En-
ergie und Luft des Kanton Zürich, http://www.geolion.zh.ch/geodatensatz/show?nbid=
985, last access 29.03.2017, 2012.
AWEL: Grundwasservorkommen, GIS-ZH Nr. 327. Amt für Abfall, Wasser, Energie und Luft
des Kanton Zürich, http://www.geolion.zh.ch/geodatensatz/show?nbid=723, last access
29.03.2017, 2014.
AWEL: NO2 -Immissionen, GIS-ZH Nr. 82, Amt für Abfall, Wasser, Energie und Luft des Kanton
Zürich, http://geolion.zh.ch/geodatensatz/show?nbid=783, last access 29.03.2017, 2015.

BAFU: Strukturierung und Adressierung des Gewässernetzes 1:25’000 nach Modell gwn25-07.
Bundesamt für Umwelt, http://www.bafu.admin.ch/wasser/13462/13496/15011, last ac-
cessed 07.06.2016, 2009.
BAFU: Luftbelastung: Karten Jahreswerte, Ammoniak und Stickstoffdeposition, Jahresmit-
tel 2007 (modelliert durch METEOTEST), http://www.bafu.admin.ch/luft/
luftbelas-tung/schadstoffkarten, last access 15.02.2015, 2011.
BAFU: Umgang mit lokaler Wasserknappheit in der Schweiz, Bericht des Bundesrates zum
Postulat “Wasser und Landwirtschaft. Zukünftige Herausforderungen”, Bundesamt für Umwelt,
Abteilung Wasser, 2012.
BAFU and GRID-Europe: Swiss Environmental Domains. A new spatial framework for re-
porting on the environment, Environmental studies 1024, Federal Office for the Envi-
ronment FOEN, Berne, URL http://www.bafu.admin.ch/publikationen/publikation/
01564/index.html?lang=en, 2010.
Baritz, R., Seufert, G., Montanarella, L., and Ranst, E. V.: Carbon concentrations and stocks
in forest soils of Europe, Forest Ecology and Management, 260, 262–277, doi:10.1016/j.foreco.
2010.03.025, 2010.
Bechler, K. and Toth, O.: Bewertung von Böden nach ihrer Leistungsfähigkeit, Leitfaden
für Planungen und Gestattungsverfahren, LUBW Landesanstalt für Umwelt, Messun-
gen und Naturschutz Baden-Württemberg, URL http://www.fachdokumente.lubw.
baden-wuerttemberg.de/servlet/is/99474/Bodenschutz_23_Lesefassung_aktuell.
pdf?command=downloadContent&filename=Bodenschutz_23_Lesefassung_aktuell.pdf&
FIS=199, 2. Auflage, last access: 04.04.2017, 2010.
Behrens, T., Schmidt, K., Zhu, A. X., and Scholten, T.: The ConMap approach for terrain-based
digital soil mapping, European Journal of Soil Science, 61, 133–143, doi:10.1111/j.1365-2389.
2009.01205.x, 2010a.
Behrens, T., Zhu, A., Schmidt, K., and Scholten, T.: Multi-scale digital terrain analysis and
feature selection for digital soil mapping, Geoderma, 155, 175–185, doi:10.1016/j.geoderma.
2009.07.010, 2010b.
Behrens, T., Schmidt, K., Ramirez-Lopez, L., Gallant, J., Zhu, A.-X., and Scholten, T.: Hyper-
scale digital soil mapping and soil formation analysis, Geoderma, 213, 578–588, doi:10.1016/
j.geoderma.2013.07.031, 2014.
Ben-Dor, E., Chabrillat, S., Demattê, J. A. M., Taylor, G. R., Hill, J., Whiting, M. L., and
Sommer, S.: Using imaging spectroscopy to study soil properties, Remote Sensing of the
Environment, 113, S38–S55, doi:10.1016/j.rse.2008.09.019, 2009.
Berry, J.: Use surface area for realistic calculations?, Geoworld, 15, 2002.
BFS: GEOSTAT Benützerhandbuch, Bundesamt für Statistik, Bern, 2001.
Blaser, P., Kernebeek, P., Tebbens, L., van Breemen, N., and Luster, J.: Cryptopodzolic Soils
in Switzerland, European Journal of Soil Science, 48, 411–423, doi:10.1111/j.1365-2389.1997.
tb00207.x, 1997.
Blaser, P., Zimmermann, S., Luster, J., Walthert, L., and Lüscher, P.: Waldböden der Schweiz.
Band 2. Regionen Alpen und Alpensüdseite, Eidg. Forschungsanstalt WSL and Hep Verlag,
Birmensdorf and Bern, 2005.
Bolliger, J., Hagedorn, F., Leifeld, J., Böhl, J., Zimmermann, S., Soliva, R., and Kienast, F.:
Effects of land-use change on carbon stocks in Switzerland, Ecosystems, 11, 895–907, 2008.
Borer, F. and Knecht, M.: Bodenkartierung Schweiz Entwicklung und Ausblick, Bericht,
Arbeitsgruppe Bodenkartierung der Bodenkundlichen Gesellschaft der Schweiz, URL
http://www.soil.ch/cms/fileadmin/Medien/Arbeitsgruppen/Bodenkartierung/
Bericht_BoKa_2014_Schlussfassung_BGS_Web.pdf, 2014.
Bourennane, H., King, D., Chéry, P., and Bruand, A.: Improving the kriging of a soil variable
using slope gradient as external drift, European Journal of Soil Science, 47, 473–483, doi:
10.1111/j.1365-2389.1996.tb01847.x, 1996.
Box, G. E. P. and Cox, D. R.: An Analysis of Transformations, Journal of the Royal Statistical
Society Series B, 26, 211–243, 1964.
Brassel, P. and Lischke, H., eds.: Swiss National Forest Inventory: Methods and models of the
second assessment, Swiss Federal Institute for Forest, Snow and Landscape Research WSL,
Birmensdorf, 2001.

Breheny, P. and Huang, J.: Group descent algorithms for nonconvex penalized linear and logistic
regression models with grouped predictors, Statistics and Computing, 25, 173–187, doi:10.
1007/s11222-013-9424-2, 2015.
Breiman, L.: Random Forests, Machine Learning, 45, 5–32, 2001.
Brungard, C. W., Boettinger, J. L., Duniway, M. C., Wills, S. A., and Edwards Jr., T. C.:
Machine learning for predicting soil classes in three semi-arid landscapes, Geoderma, 239–
240, 68–83, doi:10.1016/j.geoderma.2014.09.019, 2015.
Brunner, J., Jäggli, F., Nievergelt, J., and Peyer, K.: Kartieren und Beurteilen von Land-
wirtschaftsböden, FAL Schriftenreihe 24, Eidgenössische Forschungsanstalt für Agrarökologie
und Landbau, Zürich-Reckenholz (FAL), 1997.
Brus, D. J., Kempen, B., and Heuvelink, G. B. M.: Sampling for validation of digital soil maps,
European Journal of Soil Science, 62, 394–407, doi:10.1111/j.1365-2389.2011.01364.x, 2011.
Buchanan, S., Triantafilis, J., Odeh, I. O. A., and Subansinghe, R.: Digital soil mapping of
compositional particle-size fractions using proximal and remotely sensed ancillary data, Geo-
physics, 77, WB201–WB211, doi:10.1190/geo2012-0053.1, 2012.
Bühlmann, P. and Hothorn, T.: Boosting algorithms: Regularization, prediction and model
fitting, Statistical Science, 22, 477–505, doi:10.1214/07-sts242, 2007.
Bundesamt für Umwelt BAFU: GIS-Daten Wald. Sturmschäden, URL www.bafu.admin.ch/
gis/02911/07405, 2010.
Calzolari, C., Ungaro, F., Filippi, N., Guermandi, M., Malucelli, F., Marchi, N., Staffilani, F.,
and Tarocco, P.: A methodological framework to assess the multiple contributions of soils to
ecosystem services delivery at regional scale, Geoderma, 261, 190–203, doi:10.1016/j.geoderma.
2015.07.013, 2016.
Campling, P., Gobin, A., and Feyen, J.: Logistic modeling to spatially predict the probability
of soil drainage classes, Soil Science Society of America Journal, 66, 1390–1401, doi:10.2136/
sssaj2002.1390, 2002.
Carizzoni, M., Cavelti, G., Hurst, T., and Zürrer, M.: Konzept für ein flächendeckendes
Bodeninformationssystem, Schlussbericht, BHP – Brugger und Partner AG, BABU
GmbH, myx GmbH, URL https://www.bafu.admin.ch/dam/bafu/de/dokumente/boden/
externe-studien-berichte/konzept-flaechendeckendes-bodeninformationssystem.
pdf.download.pdf/170512_KOBI_Schlussbericht_final.pdf, Auftraggeber: Bundesamt
für Umwelt (BAFU), 2017.
Chiti, T., Dı́az-Pinés, E., and Rubio, A.: Soil organic carbon stocks of conifers, broadleaf and
evergreen broadleaf forests of Spain, Biology and Fertility of Soils, 48, 817–826, doi:10.1007/
s00374-012-0676-3, 2012.
Christensen, O. and Ribeiro Jr, P.: geoRglm – A package for generalised linear spatial models,
R-NEWS, 2, 26–28, URL http://cran.R-project.org/doc/Rnews, ISSN 1609-3631, last access
04.04.2017, 2002.
Cipar, J., Cooley, T., Lockwood, R., and Grigsby, P.: Distinguishing between coniferous and de-
ciduous forests using hyperspectral imagery, in: Geoscience and Remote Sensing Symposium,
2004. IGARSS’04. Proceedings. 2004 IEEE International, vol. 4, pp. 2348–2351, 2004.
Cleveland, W. S.: Robust Locally Weighted Regression and Smoothing Scatterplots, Journal of
the American Statistical Association, 74, 829–836, doi:10.2307/2286407, 1979.
Conrad, O., Bechtel, B., Bock, M., Dietrich, H., Fischer, E., Gerlitz, L., Wehberg, J., Wich-
mann, V., and Böhner, J.: System for automated geoscientific analyses (SAGA) v. 2.1.4,
Geoscientific Model Development, 8, 1991–2007, doi:10.5194/gmd-8-1991-2015, URL http:
//www.geosci-model-dev.net/8/1991/2015/, 2015.
Cressie, N.: Block Kriging for Lognormal Spatial Processes, Mathematical Geology, 38, 413–443,
doi:10.1007/s11004-005-9022-8, 2006.
Croux, C. and Dehon, C.: Estimators of the Multiple Correlation Coefficient: Local Robustness
and Confidence Intervals, Statistical Papers, 44, 315–334, 2003.
Danner, C., Hensold, C., Blum, P., Weidenhammer, S., Aussendorf, M., Kraft, M.,
Weidenbacher, A., Holleis, P., and Kölling, C.: Das Schutzgut Boden in der Planung,
Bewertung natürlicher Bodenfunktionen und Umsetzung in Planungs- und Genehmigungsverfahren,
Bayerisches Landesamt für Umweltschutz, Bayerisches Geologisches Landesamt,
URL http://www.lfu.bayern.de/boden/bodenfunktionen/ertragsfaehigkeit/
doc/arbeitshilfe_boden.pdf, last access 29.03.2017, 2003.
Davison, A. C. and Hinkley, D. V.: Bootstrap Methods and Their Applications, Cambridge
University Press, Cambridge, doi:10.1017/cbo9780511802843, 1997.
de Brogniez, D., Ballabio, C., Stevens, A., Jones, R. J. A., Montanarella, L., and van Wesemael,
B.: A map of the topsoil organic carbon content of Europe generated by a generalized additive
model, European Journal of Soil Science, 66, 121–134, doi:10.1111/ejss.12193, 2015.
De Vos, B., Van Meirvenne, M., Quataert, P., Deckers, J., and Muys, B.: Predictive Quality of
Pedotransfer Functions for Estimating Bulk Density of Forest Soils, Soil Science Society of
America Journal, 69, 500–510, doi:10.2136/sssaj2005.0500, 2005.
Diek, S., Schaepman, M., and de Jong, R.: Creating multi-temporal composites of airborne
imaging spectroscopy data in support of digital soil mapping, Remote Sensing, 8, doi:10.
3390/rs8110906, 2016.
Diggle, P. and Ribeiro Jr, P.: Bayesian inference in gaussian model-based geostatistics, Geo-
graphical and Environmental Modelling, 6, 129–146, doi:10.1080/1361593022000029467, 2002.
Diggle, P. J. and Ribeiro, Jr., P. J.: Model-based Geostatistics, Springer, New York, 2007.
Dirichlet, G. L.: Über die Reduction der positiven quadratischen Formen mit drei unbestimmten
ganzen Zahlen, Journal für die reine und angewandte Mathematik, 40, 209–227, doi:10.1017/
cbo9781139237345.005, URL http://eudml.org/doc/147457, 1850.
DMC: Disaster Monitoring Constellation International Imaging, http://www.dmcii.com, last
access: 03.02.2015, 2015.
Dobson, A. J.: An Introduction to Generalized Linear Models, Chapman & Hall/CRC, Boca
Raton, 2002.
DVWK: Filtereigenschaften des Bodens gegenüber Schadstoffen. Teil I: Beurteilung der Fähigkeit
von Böden, zugeführte Schwermetalle zu immobilisieren. DVWK-Merkblätter zur Wasser-
wirtschaft, Bericht, Deutscher Verband für Wasserwirtschaft und Kulturbau (DVWK), 1988.
Egli, M., Margreth, M., Vökt, U., Fitze, P., Tognina, G., and Keller, F.: Modellierung von
Bodentypen und Bodeneigenschaften im Oberengadin (Schweiz) mit Hilfe eines Geographischen
Informationssystems (GIS), Geographica Helvetica, 60, 87–96, doi:10.5194/gh-60-87-2005,
2005.
ELF: Schweizerische Referenzmethoden der Forschungsanstalten Agroscope – Boden- und Sub-
stratuntersuchungen zur Düngeberatung, Loseblattordner E1.011.d 1, Forschungsanstalten
Agroscope ART und ACW, Zürich und Changins, Ausgabe 1996 mit Änderungen von 1997
bis 2009, Version 2015, Methode “AAE-10”, 1996.
ESRI: ArcGIS Desktop: Release 10, ESRI Environmental Systems Research Institute, Redlands,
California, USA., URL www.esri.com, last access 29.03.2017, 2010.
Evans, I. S.: General geomorphometry, derivatives of altitude, and descriptive statistics, in:
Spatial Analysis in Geomorphology, edited by Chorley, R. J., pp. 17–90, Harper & Row, 1972.
Evans, J. S., Oakleaf, J., and Cushman, S.: A Toolbox for Surface Gradient Modeling, http:
//evansmurphy.wix.com/evansspatial, last accessed 04.04.2017, 2014.
FAC: Methoden für Bodenuntersuchungen, no. 5 in Schriftenreihe der FAC, Liebefeld, Eid-
genössische Forschungsanstalt für Agrikulturchemie und Umwelthygiene (FAC), 1989.
Fan, L., Lehmann, P., and Or, D.: Effects of soil spatial variability at the hillslope and catchment
scales on characteristics of rainfall-induced landslides, Water Resources Research, 52, 1781–
1799, 2016.
FAO and ITPS: Status of the World’s Soil Resources (SWSR), Main report, Food and Agriculture
Organization of the United Nations and Intergovernmental Technical Panel on Soils, Rome,
Italy, 2015.
Faraway, J. J.: Linear Models with R, vol. 63 of Texts in Statistical Science, Chapman &
Hall/CRC, Boca Raton, 2005.
Fitzpatrick, B. R., Lamb, D. W., and Mengersen, K.: Ultrahigh Dimensional Variable Selection
for Interpolation of Point Referenced Spatial Data: A Digital Soil Mapping Case Study, PLOS
ONE, 11, 1–19, doi:10.1371/journal.pone.0162489, 2016.
FOEN: Switzerland’s Greenhouse Gas Inventory 1990–2010, National inventory report 2012,
submission of 13 April 2012 under the United Nations Framework Convention on Climate
Change and under the Kyoto Protocol, Federal Office for the Environment FOEN, Climate
Division, Bern, 2012a.
FOEN: GIS-Daten Biodiversität. Federal Office for the Environment FOEN, URL www.bafu.
admin.ch/gis/02911/07403, 2012b.
FOEN: Switzerland’s Greenhouse Gas Inventory 1990–2014, National inventory report 2016,
submission of 15 April 2016 under the United Nations Framework Convention on Climate
Change and under the Kyoto Protocol, Federal Office for the Environment FOEN, Climate
Division, Bern, 2016.
Frei, E., Vökt, U., Flückiger, H., Brunner, H., and Schai, F.: Bodeneignungskarte der Schweiz,
Bundesamt für Raumplanung, Bundesamt für Landwirtschaft und Bundesamt für Forstwesen,
Bern, 1980.
Friedman, J., Hastie, T., and Tibshirani, R.: Regularization paths for generalized linear models
via coordinate descent, Journal of Statistical Software, 33, 1–22, doi:10.18637/jss.v033.i01,
2010.
Frigge, M., Hoaglin, D. C., and Iglewicz, B.: Some implementations of the boxplot, The American
Statistician, 43, 50–54, doi:10.2307/2685173, 1989.
FSO: Swiss soil suitability map. BFS GEOSTAT. Swiss Federal Statistical Office,
http://www.bfs.admin.ch/bfs/portal/de/index/dienstleistungen/geostat/
datenbeschreibung/digitale_bodeneignungskarte.html, last access 15.02.2015, 2000a.
FSO: Tree composition of Swiss forests. BFS GEOSTAT. Swiss Federal Statistical Of-
fice, http://www.bfs.admin.ch/bfs/portal/de/index/dienstleistungen/geostat/
datenbeschreibung/waldmischungsgrad.html, last access 15.02.2015, 2000b.
Gabriel, K. R.: Biplot Display of Multivariate Matrices for Inspection of Data and Diagnostics,
in: Interpreting Multivariate Data, edited by Barnett, V., John Wiley & Sons, Chichester,
proceedings of the Conference Entitled “Looking at Multivariate Data”; Sheffield, 24–27 March
1980, 1981.
Gallant, J., Dowling, T., and Austin, J.: Multi-resolution Ridge Top Flatness (MrRTF, 3”
resolution). v1, CSIRO. Data Collection, doi:10.4225/08/512EEA6332EEB, 2013.
Gallant, J. C. and Dowling, T. I.: A multiresolution index of valley bottom flatness for mapping
depositional areas, Water Resources Research, 39, doi:10.1029/2002WR001426, 2003.
Gams, H.: Zur Geschichte, klimatischen Begrenzung und Gliederung der immergrünen Mit-
telmeerstufe, Veröff. Geobot. Eidgenöss. Tech. Hochsch., Stift. Rübel Zürich, 12, 163–204,
1935.
Gasser, U., Gubler, A., Hincapié, I., Karagiannis, D.-A., Schwierz, C., and Zimmermann, S.:
Bestimmung der Austauschereigenschaften von Waldböden: Kostenoptimierung, Bulletin Bo-
denkundliche Gesellschaft der Schweiz, 32, 51–52, 2011.
Giamboni, M.: SilvaProtect-CH - Phase I, Projektdokumentation, Bundesamt für Umwelt
BAFU, 2008.
Gisler, S., Liniger, H., and Prasuhn, V.: Erosionsrisikokarte der landwirtschaftlichen Nutzfläche
der Schweiz im 2x2-Meter-Raster (ERK2), Technisch-wissenschaftlicher Bericht, CDE Univer-
sität Bern und Agroscope Reckenholz-Tänikon ART, 2010.
Gneiting, T., Balabdaoui, F., and Raftery, A. E.: Probabilistic forecasts, calibration and sharp-
ness, Journal of the Royal Statistical Society Series B, 69, 243–268, 2007.
Gonseth, Y., Wohlgemuth, T., Sansonnens, B., and Buttler, A.: Die biogeographischen Regionen
der Schweiz. Erläuterungen und Einteilungsstandard., Umwelt-Materialien Nr. 137, BUWAL,
Bundesamt für Umwelt, Wald und Landschaft, 2001.
Gotway, C. A. and Young, L. J.: Combining Incompatible Spatial Data, Journal of the American
Statistical Association, 97, 632–648, 2002.
Greiner, L., Nussbaum, M., and Papritz, A.: Protokoll der Diskussion am PMSoil-Workshop
vom 20.5.2016, Tagungsprotokoll, Agroscope, ETH Zürich, 2016.
Greiner, L., Keller, A., Grêt-Regamey, A., and Papritz, A.: Soil function assessment methods
for quantifying the contributions of soils to ecosystems services, Land Use Policy, 68, doi:
10.1016/j.landusepol.2017.06.025, 2017.
Grier, C. G. and Running, S. W.: Leaf Area of Mature Northwestern Coniferous Forests: Relation
to Site Water Balance, Ecology, 58, 893 – 899, doi:10.2307/1936225, 1977.

Grimm, R., Behrens, T., Märker, M., and Elsenbeer, H.: Soil organic carbon concentrations
and stocks on Barro Colorado Island – Digital soil mapping using Random Forests analysis,
Geoderma, 146, 102–113, doi:10.1016/j.geoderma.2008.05.008, 2008.
Groemping, U.: Relative Importance for Linear Regression in R: The Package relaimpo, Journal
of Statistical Software, 17, 1–27, doi:10.18637/jss.v017.i01, URL http://www.jstatsoft.
org/index.php/jss/article/view/v017i01, 2006.
Gurtz, J., Baltensweiler, A., and Lang, H.: Spatially distributed hydrotope based modelling of
evapotranspiration and runoff in mountainous basins, Hydrological Processes, 13, 2751–2768,
1999.
Hagedorn, F., Moeri, A., Walthert, L., and Zimmermann, S.: Kohlenstoff in Schweizer
Waldböden – bei Klimaerwärmung eine potentielle CO2-Quelle, Schweizerische Zeitschrift
für Forstwesen, 161, 530–535, 2010.
Hantke, R. u.: Geologische Karte des Kantons Zürich und seiner Nachbargebiete, Kom-
missionsverlag Leemann, Zürich, Sonderdruck aus Vierteljahrsschrift der Naturforschenden
Gesellschaft in Zürich, 112(2): 91–122, 1967.
Hartemink, A. E., Krasilnikov, P., and Bockheim, J.: Soil maps of the world, Geoderma, 207–208,
256–267, doi:10.1016/j.geoderma.2013.05.003, 2013.
Hastie, T., Tibshirani, R., and Friedman, J.: The Elements of Statistical Learning; Data Mining,
Inference and Prediction, Springer, New York, 2 edn., 2009.
Hastie, T. J. and Tibshirani, R. J.: Generalized Additive Models, vol. 43 of Monographs on
Statistics and Applied Probability, Chapman and Hall, London, 1990.
Haygarth, P. M. and Ritz, K.: The future of soils and land use in the UK: Soil systems for the
provision of land-based ecosystem services, Land Use Policy, 26, Supplement 1, S187–S197,
doi:10.1016/j.landusepol.2009.09.016, Land Use Futures, 2009.
Henderson, B. L., Bui, E. N., Moran, C. J., and Simon, D. A. P.: Australia-wide predictions of soil
properties using decision trees, Geoderma, 124, 383–398, doi:10.1016/j.geoderma.2004.06.007,
2005.
Hengl, T., Heuvelink, G. B. M., and Stein, A.: A Generic Framework for Spatial Prediction of
Soil Variables Based on Regression-Kriging, Geoderma, 120, 75–95, 2004.
Hengl, T., de Jesus, J. M., MacMillan, R. A., Batjes, N. H., Heuvelink, G. B. M., Ribeiro, E.,
and Samuel-Rosa, A.: SoilGrids1km – Global Soil Information Based on Automated Mapping,
PLoS ONE, 9, doi:10.1371/journal.pone.0105992, 2014.
Hengl, T., Mendes de Jesus, J., Heuvelink, G. B. M., Ruiperez Gonzalez, M., Kilibarda, M.,
Blagotić, A., Shangguan, W., Wright, M. N., Geng, X., Bauer-Marschallinger, B., Guevara,
M. A., Vargas, R., MacMillan, R. A., Batjes, N. H., Leenaars, J. G. B., Ribeiro, E., Wheeler,
I., Mantel, S., and Kempen, B.: SoilGrids250m: Global gridded soil information based on
machine learning, PLOS ONE, 12, 1–40, doi:10.1371/journal.pone.0169748, 2017.
Herbst, P. and Mosimann, T.: Prognose der Wasserspeicherfähigkeit von Waldböden in der
Nordwestschweiz, Tech. rep., Physische Geographie und Landschaftsökologie Leibniz Universität
Hannover, Hannover, 2008.
Herbst, P. and Mosimann, T.: Prognose ökologisch wichtiger Waldbodeneigenschaften mit Ran-
dom Forest in der Nordwestschweiz, Geomatik Schweiz, 108, 140–144, 2010.
Heung, B., Ho, H. C., Zhang, J., Knudby, A., Bulmer, C. E., and Schmidt, M. G.: An overview
and comparison of machine-learning techniques for classification purposes in digital soil map-
ping, Geoderma, 265, 62 – 77, doi:10.1016/j.geoderma.2015.11.014, 2016.
Hijmans, R. J., van Etten, J., Cheng, J., Mattiuzzi, M., Sumner, M., Greenberg, J. A.,
Lamigueiro, O. P., Bevan, A., Racine, E. B., and Shortridge, A.: raster: Geographic
Data Analysis and Modeling, R package version 2.4-15, URL http://CRAN.R-project.org/
package=raster, last access 29.03.2017, 2015.
Hobson, R. D.: Surface roughness in topography: a quantitative approach, in: Spatial analysis
in geomorphology, edited by Chorley, R. J., pp. 221–245, Harper & Row, 1972.
Hofner, B., Hothorn, T., Kneib, T., and Schmid, M.: A Framework for Unbiased Model Selection
Based on Boosting, Journal of Computational and Graphical Statistics, 20, 956–971, doi:
10.1198/jcgs.2011.09220, 2011.

Hofner, B., Mayr, A., Robinzonov, N., and Schmid, M.: Model-based boosting in R: A hands-
on tutorial using the R package mboost, Computational Statistics, 29, 3–35, doi:10.1007/
s00180-012-0382-5, 2014.
Hollis, J., Hannam, J., and Bellamy, P.: Empirically-derived pedotransfer functions for predicting
bulk density in European soils, European Journal of Soil Science, 63, 96–109, doi:10.1111/j.
1365-2389.2011.01412.x, 2012.
Honeysett, J. L. and Ratkowsky, D. A.: The use of ignition loss to estimate bulk density of forest
soils, Journal of Soil Science, 40, 299–308, doi:10.1111/j.1365-2389.1989.tb01275.x, 1989.
Hothorn, T., Müller, J., Schröder, B., Kneib, T., and Brandl, R.: Decomposing environmental,
spatial, and spatiotemporal components of species distributions, Ecological Monographs, 81,
329–347, doi:10.1890/10-0602.1, 2011.
Hothorn, T., Buehlmann, P., Kneib, T., Schmid, M., and Hofner, B.: mboost: Model-Based
Boosting, URL http://CRAN.R-project.org/package=mboost, R package version 2.4-2,
last access 29.03.2017, 2015.
Hotz, M.-C., Weibel, F., Ringgenberg, B., Beyeler, A., Finger, A., Humbel, R., and Sager, J.:
Arealstatistik Schweiz Zahlen – Fakten – Analysen, Bericht, Bundesamt für Statistik (BFS),
Neuchâtel, 2005.
Hueni, A., Lenhard, K., Baumgartner, A., and Schaepman, M. E.: Airborne prism experiment
calibration information system, IEEE Transactions on Geoscience and Remote Sensing, 51,
5169–5180, doi:10.1109/tgrs.2013.2246575, 2013.
IPCC: Good Practice Guidance for Land Use, Land-Use Change and Forestry (IPCC GPG
LULUCF), URL http://www.ipcc-nggip.iges.or.jp/public/gpglulucf/gpglulucf.htm,
last access 16.01.2013, 2003.
Iwahashi, J. and Pike, R. J.: Automated classifications of topography from DEMs by an unsu-
pervised nested-means algorithm and a three-part geometric signature, Geomorphology, 86,
409–440, doi:10.1016/j.geomorph.2006.09.012, 2007.
Jäggli, F., Peyer, K., Pazeller, A., and Schwab, P.: Grundlagenbericht zur Bodenkartierung
des Kantons Zürich, Tech. rep., Volkswirtschaftsdirektion des Kantons Zürich und Eidg.
Forschungsanstalt für Agrarökologie und Landbau Zürich Reckenholz FAL, 1998.
Jalabert, S. S. M., Martin, M. P., Renaud, J.-P., Boulonne, L., Jolivet, C., Montanarella, L., and
Arrouays, D.: Estimating forest soil bulk density using boosted regression modelling, Soil Use
and Management, 26, 516–528, doi:10.1111/j.1475-2743.2010.00305.x, 2010.
Jehle, M., Hueni, A., Damm, A., D’Odorico, P., Weyermann, J., Kneubühler, M., and Meule-
man, K.: APEX – current status, performance and validation concept, in: Sensors, 2010
IEEE, pp. 533–537, doi:10.1109/ICSENS.2010.5690122, URL http://ieeexplore.ieee.org/
document/5690122/?reload=true, 2010.
Jenness, J.: Topographic Position Index (TPI) v. 1.2, http://www.jennessent.com, last access:
04.04.2017, 2006.
Johnson, C. E., Ruiz-Méndez, J. J., and Lawrence, G. B.: Forest soil chemistry and terrain
attributes in a Catskills watershed, Soil Science Society of America Journal, 64, 1804–1814,
doi:10.2136/sssaj2000.6451804x, 2000.
Jolliffe, I. T. and Stephenson, D. B., eds.: Forecast verification: A practitioner’s guide in atmo-
spheric science, Wiley-Blackwell, Chichester, 2 edn., 2012.
Kammann, E. E. and Wand, M. P.: Geoadditive models, Journal of the Royal Statistical Society,
Series C: Applied Statistics, 52, 1–18, doi:10.1111/1467-9876.00385, 2003.
Katterer, T., Andren, O., and Jansson, P.-E.: Pedotransfer functions for estimating plant avail-
able water and bulk density in Swedish agricultural soils, Acta Agriculturae Scandinavica
Section B-Soil and Plant Science, 56, 263–276, doi:10.1080/09064710500310170, 2006.
Kempen, B., Brus, D., and Stoorvogel, J.: Three-dimensional mapping of soil organic mat-
ter content using soil type-specific depth functions, Geoderma, 162, 107–123, doi:10.1016/j.
geoderma.2011.01.010, 2011.
Kempen, B., Brus, D. J., Stoorvogel, J. J., Heuvelink, G. B., and de Vries, F.: Efficiency
Comparison of Conventional and Digital Soil Mapping for Updating Soil Maps, Soil Science
Society of America Journal, 76, 2097–2115, doi:10.2136/sssaj2011.0424, 2012.

Kidd, D. B., Malone, B. P., McBratney, A. B., Minasny, B., and Webb, M. A.: Digital mapping of
a soil drainage index for irrigated enterprise suitability in Tasmania, Australia, Soil Research,
52, 107–119, doi:10.1071/SR13100, 2014.
Kiss, R.: Determination of drainage network in digital elevation models, utilities and limitations,
Journal of Hungarian Geomathematics, 2, 2004.
Knecht, M., Lüscher, C., and Borer, F.: Bedürfnisabklärungen Bodeninformationen, Schluss-
bericht Forschungsauftrag Nr. 14.0015.PJ / O453-1313, Schlussfassung Version B, AMBIO
GmbH Arbeitsgemeinschaft in angewandten Umweltwissenschaften, Zürich, URL https:
//www.bafu.admin.ch/dam/bafu/de/dokumente/boden/externe-studien-berichte/
Bed%C3%BCrfnisabkl%C3%A4rungen%20Bodeninformationen.pdf.download.pdf/ambio_
_Bed%C3%BCrfnisabkl%C3%A4rungen_Bodeninformationen_(2016)_.pdf, Auftraggeber:
Bundesamt für Umwelt BAFU, Bern, 2017.
Kneib, T., Hothorn, T., and Tutz, G.: Variable selection and model choice in geoadditive regres-
sion models, Biometrics, 65, 626–634, doi:10.1111/j.1541-0420.2008.01112.x, 2009.
Kreuzwieser, J. and Rennenberg, H.: Molecular and physiological responses of trees to waterlogging
stress, Plant Cell and Environment, 37, 2245–2259, doi:10.1111/pce.12310, 2014.
Kriegler, F. J., Malila, W. A., Nalepka, R. F., and Richardson, W.: Preprocessing Transforma-
tions and Their Effects on Multispectral Recognition, in: Remote Sensing of Environment,
VI, p. 97, 1969.
Krogh, L., Noergaard, A., Hermansen, M., Greve, M. H., Balstroem, T., and Breuning-Madsen,
H.: Preliminary estimates of contemporary soil organic carbon stocks in Denmark using multi-
ple datasets and four scaling-up methods, Agriculture, Ecosystems & Environment, 96, 19–28,
doi:10.1016/s0167-8809(03)00016-1, 2003.
Kuhn, M.: caret: Classification and Regression Training, URL https://CRAN.R-project.org/
package=caret, https://github.com/topepo/caret, R package version 6.0-71, last access:
04.04.2017, 2015.
Kumar, S., Lal, R., and Liu, D.: A geographically weighted regression kriging approach for
mapping soil organic carbon stock, Geoderma, 189–190, 627–634, doi:10.1016/j.geoderma.
2012.05.022, 2012.
Künsch, H. R., Papritz, A., Schwierz, C., and Stahel, W. A.: Robust estimation of the exter-
nal drift and the variogram of spatial data, in: Proceedings of the ISI 58th World Statis-
tics Congress of the International Statistical Institute, Dublin, doi:10.3929/ethz-a-009900710,
URL http://e-collection.library.ethz.ch/eserv/eth:7080/eth-7080-01.pdf, 2011.
Künsch, H. R., Papritz, A., Schwierz, C., and Stahel, W. A.: Robust Geostatistics, in prep.
Lacoste, M., Mulder, V., de Forges, A. R., Martin, M., and Arrouays, D.: Evaluating large-extent
spatial modeling approaches: A case study for soil depth for France, Geoderma Regional, 7,
137–152, doi:10.1016/j.geodrs.2016.02.006, 2016.
Lagacherie, P., Bailly, J. S., Monestiez, P., and Gomez, C.: Using scattered hyperspectral imagery
data to map the soil properties of a region, European Journal of Soil Science, 63, 110–119,
doi:10.1111/j.1365-2389.2011.01409.x, 2012.
Lakanen, E. and Erviö, R.: A comparison of eight extractants for the determination of plant
available micronutrients in soils, Acta Agralia Fennica, 123, 223–232, 1971.
LANAT: Geoprodukt Landwirtschaftliche Eignungskarte LWEK74, Metadaten komplett.
Amt für Landwirtschaft und Natur, Kanton Bern, URL http://files.be.ch/bve/agi/
geoportal/geo/lpi/LWEK74_1974_01_LANG_DE.PDF, last access: 04.04.2017, 2015.
Lehmann, A., David, S., and Stahr, K.: TUSEC — Bilingual-Edition: Eine Methode zur Bewer-
tung natürlicher und anthropogener Böden (Deutsche Fassung), Hohenheimer Bodenkundliche
Hefte 86, Institut für Bodenkunde und Standortslehre, Universität Hohenheim, Stuttgart,
URL https://soil.uni-hohenheim.de/uploads/media/TUSEC_2.Aufl_03.pdf, 2. Auflage,
last access: 07.06.2016, 2013.
Leifeld, J., Bassin, S., and Fuhrer, J.: Carbon stocks in Swiss agricultural soils predicted by
land-use, soil characteristics, and altitude, Agriculture, Ecosystems and Environment, 105,
255–266, doi:10.1016/j.agee.2004.03.006, 2005.
Lemercier, B., Lacoste, M., Loum, M., and Walter, C.: Extrapolation at regional scale of local
soil knowledge using boosted classification trees: A two-step approach, Geoderma, 171–172,
75–84, doi:10.1016/j.geoderma.2011.03.010, 2012.

Lettens, S., Van Orshoven, J., Van Wesemael, B., and Muys, B.: Soil organic and inorganic
carbon contents of landscape units in Belgium derived using data from 1950 to 1970, Soil Use
and Management, 20, 40–47, doi:10.1111/j.1475-2743.2004.tb00335.x, 2004.
Lettens, S., Van Orshoven, J., van Wesemael, B., De Vos, B., and Muys, B.: Stocks and fluxes
of soil organic carbon for landscape units in Belgium derived from heterogeneous data sets for
1990 and 2000, Geoderma, 127, 11–23, doi:10.1016/j.geoderma.2004.11.001, 2005a.
Lettens, S., Van Orshoven, J., Van Wesemael, B., Muys, B., and Perrin, D.: Soil organic carbon
changes in landscape units of Belgium between 1960 and 2000 with reference to 1990, Global
Change Biology, 11, 2128–2140, doi:10.1111/j.1365-2486.2005.001074.x, 2005b.
Li, J., Heap, A. D., Potter, A., and Daniell, J. J.: Application of machine learning methods to
spatial interpolation of environmental variables, Environmental Modelling and Software, 26,
1647–1659, doi:10.1016/j.envsoft.2011.07.004, 2011.
Liaw, A. and Wiener, M.: Classification and Regression by randomForest, R News, 2, 18–22,
URL http://CRAN.R-project.org/doc/Rnews/, last access: 04.04.2017, 2002.
Liddicoat, C., Maschmedt, D., Clifford, D., Searle, R., Herrmann, T., Macdonald, L., and Bal-
dock, J.: Predictive mapping of soil organic carbon stocks in South Australia’s agricultural
zone, Soil Research, 53, 956–973, doi:10.1071/SR15100, 2015.
Liess, M., Glaser, B., and Huwe, B.: Uncertainty in the spatial prediction of soil texture.
Comparison of regression tree and Random Forest models, Geoderma, 170, 70–79, doi:
10.1016/j.geoderma.2011.10.010, 2012.
Lindeman, R., Merenda, P., and Gold, R.: Introduction to bivariate and multivariate analysis,
Scott, Foresman, Glenview, IL, USA, 1980.
Lindgren, F., Rue, H., and Lindström, J.: An explicit link between Gaussian fields and Gaussian
Markov random fields: The stochastic partial differential equation approach, Journal of the
Royal Statistical Society, Series B: Statistical Methodology, 73, 423–498, doi:
10.1111/j.1467-9868.2011.00777.x, 2011.
Litz, N.: Schutz vor Organika, in: Handbuch der Bodenkunde, edited by Blume, H.-P., vol. 5,
chap. 7.6.6, p. 28, Wiley-VCH, Landsberg, 1998.
Malone, B. P., Minasny, B., Odgers, N. P., and McBratney, A. B.: Using model averaging to
combine soil property rasters from legacy soil maps and from point data, Geoderma, 232–234,
34–44, doi:10.1016/j.geoderma.2014.04.033, 2014.
Malone, B. P., Minasny, B., and McBratney, A. B.: Using R for Digital Soil Mapping, Springer
International Publishing, Cham, doi:10.1007/978-3-319-44327-0_1, 2017.
Margreth, M.: Machbarkeitsstudie zur Verknüpfung der feldgestützten Bodenkartierung mit
einer GIS-gestützten Bodenmodellierung, Berichtsentwurf, SoilCom und Umwelt und Energie
(uwe) Boden und Abfall, Kanton Luzern, Zürich und Luzern, 2015.
Maronna, R. A., Martin, R. D., and Yohai, V. J.: Robust Statistics Theory and Methods, John
Wiley & Sons, Chichester, 2006.
Martin, M. P., Lo Seen, D., Boulonne, L., Jolivet, C., Nair, K. M., Bourgeon, G., and
Arrouays, D.: Optimizing Pedotransfer Functions for Estimating Soil Bulk Density Us-
ing Boosted Regression Trees, Soil Science Society of America Journal, 73, 485–493, doi:
10.2136/sssaj2007.0241, 2009.
Martin, M. P., Wattenbach, M., Smith, P., Meersmans, J., Jolivet, C., Boulonne, L., and Ar-
rouays, D.: Spatial distribution of soil organic carbon stocks in France, Biogeosciences, 8,
1053–1065, doi:10.5194/bg-8-1053-2011, 2011.
Mathys, L. and Kellenberger, T.: Spot5 RadcorMosaic of Switzerland, Tech. rep., National Point
of Contact for Satellite Images NPOC: Swisstopo; Remote Sensing Laboratories, University
of Zurich, Zurich, 2009.
Maynard, J. J. and Levi, M. R.: Hyper-temporal remote sensing for digital soil mapping:
Characterizing soil-vegetation response to climatic variability, Geoderma, 285, 94–109, doi:
10.1016/j.geoderma.2016.09.024, 2017.
McBratney, A. B., Mendonça Santos, M. L., and Minasny, B.: On Digital Soil Mapping, Geo-
derma, 117, 3–52, doi:10.1016/S0016-7061(03)00223-4, 2003.
McBratney, A. B., Minasny, B., and Tranter, G.: Necessary meta-data for pedotransfer functions,
Geoderma, 160, 627–629, doi:10.1016/j.geoderma.2010.09.023, 2011.

Meersmans, J., De Ridder, F., Canters, F., De Baets, S., and Van Molle, M.: A multiple regression
approach to assess the spatial distribution of Soil Organic Carbon (SOC) at the regional scale
(Flanders, Belgium), Geoderma, 143, 1–13, doi:10.1016/j.geoderma.2007.08.025, 2008.
Meersmans, J., Van Wesemael, B., De Ridder, F., Dotti, M. F., De Baets, S., and Van Molle, M.:
Changes in organic carbon distribution with depth in agricultural soils in northern Belgium,
1960-2006, Global Change Biology, 15, 2739–2750, doi:10.1111/j.1365-2486.2009.01855.x,
2009.
Meersmans, J., Van Wesemael, B., Goidts, E., Van Molle, M., De Baets, S., and De Ridder,
F.: Spatial analysis of soil organic carbon evolution in Belgian croplands and grasslands,
1960-2006, Global Change Biology, 17, 466–479, doi:10.1111/j.1365-2486.2010.02183.x, 2011.
Meersmans, J., Martin, M. P., De Ridder, F., Lacarce, E., Wetterlind, J., De Baets, S., Bas,
C., Louis, B. P., Orton, T. G., Bispo, A., and Arrouays, D.: A novel soil organic C model
using climate, soil type and management data at the national scale in France, Agronomy for
Sustainable Development, 32, 873–888, doi:10.1007/s13593-012-0085-x, 2012a.
Meersmans, J., Martin, M. P., Lacarce, E., De Baets, S., Jolivet, C., Boulonne, L., Lehmann,
S., Saby, N. P. A., Bispo, A., and Arrouays, D.: A high resolution map of French soil organic
carbon, Agronomy for Sustainable Development, 32, 841–851, doi:10.1007/s13593-012-0086-9,
2012b.
Meinshausen, N.: Quantile regression forests, Journal of Machine Learning Research, 7, 983–999,
2006.
Meinshausen, N.: quantregForest: Quantile Regression Forests, URL https://CRAN.
R-project.org/package=quantregForest, R package version 1.3-5, last access 29.03.2017,
2015.
MeteoSwiss: Mean Monthly and Yearly Mean Norm Values of Precipitation, Temperature and
Relative Sunshine Duration (1961-1990), URL http://www.meteoschweiz.admin.ch/web/
en/services/data_portal/grided_datasets.html, 2011.
Miller, B. A., Koszinski, S., Wehrhan, M., and Sommer, M.: Impact of multi-scale predictor
selection for modeling soil properties, Geoderma, 239–240, 97–106, doi:10.1016/j.geoderma.
2014.09.018, 2015.
Minasny, B. and McBratney, A.: Digital soil mapping: A brief history and some lessons, Geo-
derma, 264, Part B, 301 – 311, doi:10.1016/j.geoderma.2015.07.017, 2016.
Minasny, B., McBratney, A., Malone, B., and Wheeler, I.: Digital Mapping of Soil Carbon,
Advances in Agronomy, 118, 1–47, doi:10.1016/b978-0-12-405942-9.00001-3, 2013.
Mishra, U., Lal, R., Slater, B., Calhoun, F., Liu, D., and Van Meirvenne, M.: Predicting soil
organic carbon stock using profile depth distribution functions and ordinary kriging, Soil
Science Society of America Journal, 73, 614–621, doi:10.2136/sssaj2007.0410, 2009.
Mishra, U., Lal, R., Liu, D., and Van Meirvenne, M.: Predicting the spatial variation of the soil
organic carbon pool at a regional scale, Soil Science Society of America Journal, 74, 906–914,
doi:10.2136/sssaj2009.0158, 2010.
Mishra, U., Torn, M. S., Masanet, E., and Ogle, S. M.: Improving regional soil carbon inventories:
Combining the IPCC carbon inventory method with regression kriging, Geoderma, 189 – 190,
288 – 295, doi:10.1016/j.geoderma.2012.06.022, 2012.
Moran, C. J. and Bui, E. N.: Spatial data mining for enhanced soil map modelling, International
Journal of Geographical Information Science, 16, 533–549, doi:10.1080/13658810210138715,
2002.
Mulder, V., Lacoste, M., de Forges, A. R., and Arrouays, D.: GlobalSoilMap France: High-
resolution spatial modelling the soils of France up to two meter depth, Science of the Total
Environment, 573, 1352–1369, doi:10.1016/j.scitotenv.2016.07.066, 2016.
Mulder, V. L., de Bruin, S., Schaepman, M. E., and Mayr, T. R.: The use of remote sensing in soil
and terrain mapping – A review, Geoderma, 162, 1–19, doi:10.1016/j.geoderma.2010.12.018,
2011.
Müller, L., Schindler, U., Behrendt, A., Eulenstein, F., and Dannowski, R.: The Muencheberg
Soil Quality Rating (SQR): Field manual for detecting and assessing properties and limita-
tions of soils for cropping and grazing, Report, Leibniz-Zentrum für Agrarlandschaftsforschung
(ZALF), Müncheberg, Germany, 2007.

Naef, F., Margreth, M., Schmocker-Fackel, P., and Scherrer, S.: Entwicklung und Anwen-
dung eines Regelwerkes zur automatischen Erstellung von Abflussprozesskarten in einem GIS.
Dezentraler Wasserrückhalt in der Landschaft durch vorbeugende Massnahmen der Wald-
wirtschaft, der Landwirtschaft und im Siedlungswesen, Mitteilungen FAWV, 64/07, 183–191,
2007.
Nationale Bodenbeobachtung Schweiz, N.: NABO-DAT, Aufarbeitung Bodendaten, URL
http://www.nabodat.ch/index.php/aufarbeitung-bodendaten32.html, last access: 22 May 2014, 2014.
Nussbaum, M.: geoGAM: Select Sparse Geoadditive Models for Spatial Prediction, URL https:
//CRAN.R-project.org/package=geoGAM, R package version 0.1-2, last access 29.03.2017,
2017.
Nussbaum, M. and Papritz, A.: Transferfunktionen Nährstoffmesswerte, Bericht, ETH Zürich,
Soil and Terrestrial Environmental Physics, doi:10.3929/ethz-a-010810702, Version 2, mit kl.
Änderung 27. Nov. 2016, 2015.
Nussbaum, M. and Papritz, A.: Validierung von konventionellen Bodenkarten mit unabhängigen
Bodendaten – Methodik mit Fallstudie, unpublished, 2017.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Organic Carbon Stocks of Swiss
Forest Soils, Final report, Institute of Terrestrial Ecosystems, ETH Zürich and Swiss Federal
Institute for Forest, Snow and Landscape Research (WSL), Zürich and Birmensdorf, URL
http://e-collection.library.ethz.ch/eserv/eth:6027/eth-6027-01.pdf, 2012.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Estimating soil organic carbon
stocks of Swiss forest soils by robust external-drift kriging, Geoscientific Model Development,
7, 1197–1210, doi:10.5194/gmd-7-1197-2014, URL http://www.geosci-model-dev.net/7/
1197/2014/, 2014.
Nussbaum, M., Papritz, A., Zimmermann, S., and Walthert, L.: Pedotransfer function to predict
density of forest soils in Switzerland, Journal of Plant Nutrition and Soil Science, 179, 321–326,
doi:10.1002/jpln.201500546, 2016.
Nussbaum, M., Spiess, K., Baltensweiler, A., Grob, U., Keller, A., Greiner, L., Schaepman,
M. E., and Papritz, A.: Evaluation of digital soil mapping approaches with large sets
of environmental covariates, SOIL Discussions, 2017, 1–32, doi:10.5194/soil-2017-14, URL
https://www.soil-discuss.net/soil-2017-14/, 2017a.
Nussbaum, M., Walthert, L., Fraefel, M., Greiner, L., and Papritz, A.: Mapping of soil prop-
erties at high resolution in Switzerland using boosted geoadditive models, SOIL Discussions,
2017, 1–32, doi:10.5194/soil-2017-13, URL http://www.soil-discuss.net/soil-2017-13/,
(accepted on 06/10/2017 to be published in SOIL), 2017b.
Omuto, C., Nachtergaele, F., and Vargas Rojas, R.: State of the Art Report on Global and
Regional Soil Information: Where are we? Where to go?, Tech. rep., Food and Agriculture
Organization of the United Nations, Rome, 2013.
Oyama, M. and Takehara, H.: Revised Standard Soil Color Charts, Fujihara Industry Co., Tokyo,
2 edn., Eijkelkamp Agrisearch Equipment, Art. No. 08.11 Soil Color Chart Book, 1993.
Papritz, A.: georob: Robust Geostatistical Analysis of Spatial Data, R package version 0.1-0,
2013.
Papritz, A.: georob: Robust Geostatistical Analysis of Spatial Data, URL https://cran.
r-project.org/web/packages/georob/index.html, R package version 0.3-1, last access:
04.04.2017, 2016.
Peng, W., Wheeler, D., Bell, J., and Krusemark, M.: Delineating patterns of soil drainage class on
bare soils using remote sensing analyses, Geoderma, 115, 261–279, doi:10.1016/S0016-7061(03)
00066-1, 2003.
Perruchoud, D., Walthert, L., Zimmermann, S., and Lüscher, P.: Contemporary Carbon Stocks
of Mineral Forest Soils in the Swiss Alps, Biogeochemistry, 50, 111–136, 2000.
Peyer, K. and Frey, E.: Klassifikation der Böden der Schweiz, Tech. rep., Eidgenössische
Forschungsanstalt für landwirtschaftlichen Pflanzenbau, Zürich-Reckenholz, 1992.
Pluess, A. R., Brang, P., and Augustin, S.: Wald im Klimawandel: Grundlagen für Adaptationsstrategien,
Haupt, Bern und Birmensdorf, 1. Auflage edn., Bundesamt für Umwelt BAFU,
Eidg. Forschungsanstalt WSL (Hrsg.), 2016.

Poggio, L. and Gimona, A.: National scale 3D modelling of soil organic carbon stocks with
uncertainty propagation – An example from Scotland, Geoderma, 232–234, 284–299, doi:
10.1016/j.geoderma.2014.05.004, 2014.
Poggio, L., Gimona, A., and Brewer, M.: Regional scale mapping of soil properties and their
uncertainty with a large number of satellite-derived covariates, Geoderma, 209–210, 1–14,
doi:10.1016/j.geoderma.2013.05.029, 2013.
Presler, J., Zürrer, M., and Kaufmann, G.: Bodenübersichtskarte Kanton Thurgau, 1:50’000
(BÜK-TG), Schlussbericht, Amt für Umwelt des Kantons Thurgau, Amt für Raumplanung
des Kantons Thurgau, 2005.
Pringle, M., Zund, P., Payne, J., and Orton, T.: Mapping depth-to-rock from legacy data,
using a generalized linear mixed model, in: GlobalSoilMap: Basis of the global spatial soil
information system, edited by Arrouays, D., McKenzie, N., Hempel, J., Richer de Forges, A.,
and McBratney, A., pp. 295–299, CRC Press, doi:10.1201/b16500-55, 2014.
R Core Team: R: A Language and Environment for Statistical Computing, R Foundation for
Statistical Computing, Vienna, Austria, URL http://www.R-project.org/, 2013.
R Core Team: R: A Language and Environment for Statistical Computing, R Foundation for
Statistical Computing, Vienna, Austria, URL http://www.R-project.org/, 2015.
R Core Team: R: A Language and Environment for Statistical Computing, R Foundation
for Statistical Computing, Vienna, Austria, URL http://www.R-project.org/, last access:
29.03.2017, 2016.
Rehbein, K. and Keller, A.: Räumliche Interpolation von Zinkgehalten in den Böden des Kantons
Thurgau, Tech. rep., Forschungsanstalt Agroscope Reckenholz-Tänikon ART, Zürich, 2007.
Remund, J., Frehner, M., Walthert, L., Kägi, M., and Rihm, B.: Schätzung standortspezifischer
Trockenstressrisiken in Schweizer Wäldern, 2011.
Ribi, A.: Messung des pH-Wertes in einer Boden-CaCl2-Suspension, Standard-Arbeitsanweisung
Labor AN 104 (Version 2), Gewässerschutzlabor Amt für Abfall, Wasser, Energie und Luft,
Baudirektion Kanton Zürich (Prüfleiter: Christian Balsiger), Zürich, 2008.
Ribi, A.: Körnung der Feinerde, Standard-Arbeitsanweisung Labor FB 122 (Version 1), Fachstelle
Bodenschutz, Amt für Landschaft und Natur, Baudirektion Kanton Zürich (Prüfleiter: Ubald
Gasser), Zürich, 2014.
Riley, S., De Gloria, S., and Elliot, R.: A Terrain Ruggedness Index that Quantifies Topographic
Heterogeneity, Intermountain Journal of Science, 5, 23–27, 1999.
Robinson, D., Hockley, N., Cooper, D., Emmett, B., Keith, A., Lebron, I., Reynolds, B., Tipping,
E., Tye, A., Watts, C., Whalley, W., Black, H., Warren, G., and Robinson, J.: Natural capital
and ecosystem services, developing an appropriate soils framework as a basis for valuation,
Soil Biology and Biochemistry, 57, 1023–1033, doi:10.1016/j.soilbio.2012.09.008, 2013.
Rossiter, D.: Digital Soil Mapping Across Paradigms, Scales and Boundaries, chap. Digital Soil
Resource Inventories: Status and Prospects in 2015, pp. 275–286, Springer Environmental
Science and Engineering, 2016.
Rossiter, D. G., Liu, J., Carlisle, S., and Zhu, A.-X.: Can citizen science assist digital soil
mapping?, Geoderma, 259–260, 71 – 80, doi:10.1016/j.geoderma.2015.05.006, 2015.
Rue, H., Martino, S., and Chopin, N.: Approximate Bayesian Inference for Latent Gaussian
Models by Using Integrated Nested Laplace Approximations, Journal of the Royal Statistical
Society, Series B: Statistical Methodology, 71, 319–392, doi:10.1111/j.1467-9868.2008.00700.x,
2009.
Ruef, A. and Peyer, K.: Handbuch Waldbodenkartierung, Vollzug Umwelt, Bundesamt für
Umwelt, Wald und Landschaft (BUWAL), Bern, 1996.
Ruehlmann, J. and Körschens, M.: Calculating the Effect of Soil Organic Matter Concentration
on Soil Bulk Density, Soil Science Society of America Journal, 73, 876–885, doi:10.2136/
sssaj2007.0149, 2009.
Schaepman, M., Jehle, M., Hueni, A., D’Odorico, P., Damm, A., Weyermann, J., Schneider, F.,
Laurent, V., Popp, C., Seidel, F., Lenhard, K., Gege, P., Küchler, C., Brazile, J., Kohler,
P., Vos, L., Meuleman, K., Meynart, R., Schläpfer, D., and Itten, K.: Advanced radiometry
measurements and Earth science applications with the Airborne Prism Experiment (APEX),
Remote Sensing of Environment, 158, 207–219, doi:10.1016/j.rse.2014.11.014, 2015.

Schmid, M., Hothorn, T., Maloney, K. O., Weller, D. E., and Potapov, S.: Geoadditive regression
modeling of stream biological condition, Environmental and Ecological Statistics, 18, 709–733,
doi:10.1007/s10651-010-0158-4, 2011.
Schmider, P., Küper, M., Tschander, B., and Käser, B.: Die Waldstandorte im Kanton Zürich
Waldgesellschaften, Waldbau Naturkunde, vdf Verlag der Fachvereine an den schweizerischen
Hochschulen und Techniken, Zürich, 1993.
Schmidt, M., Torn, M., Abiven, S., Dittmar, T., Guggenberger, G., Janssens, I., Kleber, M.,
Kögel-Knabner, I., Lehmann, J., Manning, D., Nannipieri, P., Rasse, D., Weiner, S., and
Trumbore, S.: Persistence of soil organic matter as an ecosystem property, Nature, 478, 49–
56, doi:10.5167/uzh-51257, 2011.
Schroeder, W., Schmidt, G., and Pesch, R.: Regionalising the carbon fixation in forests of
North Rhine-Westphalia using inventory data and digital maps, Umweltwissenschaften und
Schadstoff-Forschung, 21, 516–526, doi:10.1007/s12302-009-0091-z, 2009.
Schrumpf, M., Schulze, E. D., Kaiser, K., and Schumacher, J.: How accurately can soil organic
carbon stocks and stock changes be quantified by soil inventories?, Biogeosciences, 8, 1193–
1212, URL http://www.biogeosciences.net/8/1193/2011/, 2011.
Schulp, C., Verburg, P., Kuikman, P., Nabuurs, G.-J., Olivier, J., Vries, W., and Veldkamp, T.:
Improving National-Scale Carbon Stock Inventories Using Knowledge on Land Use History,
Environmental Management, 51, 709 – 723, doi:10.1007/s00267-012-9975-6, 2013.
Scull, P., Franklin, J., Chadwick, O. A., and McArthur, D.: Predictive Soil Mapping: A review,
Progress in Physical Geography, 27, 171–197, doi:10.1191/0309133303pp366ra, 2003.
Siemer, B., Obmann, L., Hinrichs, U., Penndorf, O., Pohl, M., Schürer, S., Schulze, P., and Seif-
fert, S.: Bodenbewertungsinstrument Sachsen, Tech. rep., Sächsisches Landesamt für Umwelt,
Landwirtschaft und Geologie, Dresden, 2014.
Sindayihebura, A., Ottoy, S., Dondeyne, S., Meirvenne, M. V., and Orshoven, J. V.: Comparing
digital soil mapping techniques for organic carbon and clay content: Case study in Burundi’s
central plateaus, CATENA, 156, 161–175, doi:10.1016/j.catena.2017.04.003, 2017.
Somarathna, P., Malone, B., and Minasny, B.: Mapping soil organic carbon content over New
South Wales, Australia using local regression kriging, Geoderma Regional, 7, 38–48, doi:
10.1016/j.geodrs.2015.12.002, 2016.
Southworth, H.: gbm: Generalized Boosted Regression Models, URL https://CRAN.
R-project.org/package=gbm, R package version 2.1.1, last access: 04.04.2017, 2015.
Spiess, K.: Vorhersage von Bodeneigenschaften mit Quantile Regression Forest, Validierung
und Vergleich mit den Vorhersagen aus geoadditiven Modellen, BSc Thesis, Departement für
Umweltsystemwissenschaften der ETH Zürich, Zürich, 2016.
Steiner, C., Behrens, T., Telse, D., and Stamm, C.: Bodenkarten als Grundlagen für die Festle-
gung des Zuströmbereichs Zo Eine Machbarkeitsstudie, Unpublished report, Eawag, 2006.
Stolte, J., Tesfai, M., Øygarden, L., Kværnø, S., Keizer, J., Verheijen, F., Panagos, P., Ballabio,
C., and Hessel, R.: Soil threats in Europe, JRC Technical Report, EUR 27607 EN, Joint Research
Centre (JRC), doi:10.2788/488054 (print), 10.2788/828742 (online), 2015.
Swisstopo: Geologische Karte der Schweiz 1:500000, URL http://www.swisstopo.admin.ch/
internet/swisstopo/de/home/products/maps/geology/geomaps/gm500.html, last access:
07.06.2016, 2005.
Swisstopo: Switzerland during the Last Glacial Maximum 1:500 000, URL http:
//www.swisstopo.admin.ch/internet/swisstopo/en/home/products/maps/geology/
geomaps/LGM-map500.html, last access: 07.06.2016, 2009.
Swisstopo: Höhenmodelle, URL http://www.swisstopo.admin.ch/internet/swisstopo/de/
home/products/height.html, last access: 07.06.2016, 2011.
Swisstopo: VECTOR25, URL http://www.swisstopo.admin.ch/internet/swisstopo/de/
home/products/landscape/vector25.html, 2011.
Swisstopo: swissTLM3D: Topographic Landscape Model 3D. Version 1.1, http:
//www.swisstopo.admin.ch/internet/swisstopo/de/home/products/landscape/
swissTLM3D.html, last access: 08.03.2016, 2013a.
Swisstopo: swissAlti3D. Das hoch aufgelöste Terrainmodell der Schweiz, http://www.
swisstopo.admin.ch/internet/swisstopo/de/home/products/height/swissALTI3D.
html, last access: 07.06.2016, 2013b.
Swisstopo: swissBoundaries3D, http://www.swisstopo.admin.ch/internet/swisstopo/de/
home/products/landscape/swissBOUNDARIES3D.html, last access: 08.03.2016, 2016a.
Swisstopo: GeoCover, Zugang zu flächendeckende geologische Datensätze für alle, URL
https://shop.swisstopo.admin.ch/de/products/maps/geology/GC_VECTOR, last access:
14.11.2016, 2016b.
Taghizadeh-Mehrjardi, R., Nabiollahi, K., and Kerry, R.: Digital mapping of soil organic carbon
at multiple depths using different data mining techniques in Baneh region, Iran, Geoderma,
266, 98–110, doi:10.1016/j.geoderma.2015.12.003, 2016.
Tarboton, D. G.: A new Method for the Determination of Flow Directions and Upslope Areas in
Grid Digital Elevation Models, Water Resources Research, 33, 309–319, doi:10.1029/
96WR03137, 1997.
Teepe, R., Dilling, H., and Beese, F.: Estimating water retention curves of forest soils from soil
texture and bulk density, Journal of Plant Nutrition and Soil Science, 166, 111–119, 2003.
Tranter, G., Minasny, B., Mcbratney, A., Murphy, B., Mckenzie, N., Grundy, M., and Brough,
D.: Building and testing conceptual and empirical models for predicting soil bulk density, Soil
Use and Management, 23, 437–443, doi:10.1111/j.1475-2743.2007.00092.x, 2007.
Tutz, G.: Regression for Categorical Data, Cambridge University Press, doi:10.1017/
cbo9780511842061, 2012.
U.S. Soil Survey Division Staff, ed.: Soil survey manual, vol. 18 of U.S. Department of Agriculture
Handbook , Soil Conservation Service, URL https://www.nrcs.usda.gov/wps/portal/nrcs/
detail/soils/ref/?cid=nrcs142p2_054262, 1993.
USGS EROS: USGS Land Remote Sensing Program, Landsat 7 Scene 01.09.2013. U.S. Geological
Survey’s Earth Resources Observation and Science Center, 2013.
Van Remortel, R. D., Maichle, R. W., and Hickey, R. J.: Computing the LS factor for the Revised
Universal Soil Loss Equation through array-based slope processing of digital elevation data
using a C++ executable, Computers and Geosciences, 30, 1043–1053, doi:10.1016/j.cageo.
2004.08.001, 2004.
Vasiliniuc, I. and Patriche, C. V.: Validating soil bulk density pedotransfer functions using a
romanian dataset, Carpathian Journal of Earth and Environmental Sciences, 10, 225–236,
2015.
Vaysse, K. and Lagacherie, P.: Evaluating Digital Soil Mapping approaches for mapping Glob-
alSoilMap soil properties from legacy data in Languedoc-Roussillon (France), Geoderma Re-
gional, 4, 20–30, doi:10.1016/j.geodrs.2014.11.003, 2015.
Venables, W. N. and Ripley, B. D.: Modern applied statistics with S-PLUS, Springer-Verlag,
New York, 4 edn., 2002.
Viscarra Rossel, R., Webster, R., and Kidd, D.: Mapping gamma radiation and its uncertainty
from weathering products in a Tasmanian landscape with a proximal sensor and random forest
kriging, Earth Surface Processes and Landforms, 39, 735–748, doi:10.1002/esp.3476, 2014.
Viscarra Rossel, R., Chen, C., Grundy, M., Searle, R., Clifford, D., and Campbell, P.: The
Australian three-dimensional soil grid: Australia’s contribution to the GlobalSoilMap project,
Soil Research, 53, 845–864, doi:10.1071/SR14366, 2015.
Wall, D. H., Nielsen, U. N., and Six, J.: Soil biodiversity and human health, Nature, 528, 69–76,
doi:10.1038/nature15744, 2015.
Walthert, L., Zimmermann, S., Blaser, P., Luster, J., and Lüscher, P.: Waldböden der Schweiz.
Band 1. Grundlagen und Region Jura, Eidg. Forschungsanstalt WSL and Hep Verlag, Bir-
mensdorf and Bern, 2004.
Walthert, L., Graf, U., Kammer, A., Luster, J., Pezzotta, D., Zimmermann, S., and Hagedorn,
F.: Determination of Organic and Inorganic Carbon, δ13C, and Nitrogen in Soils Containing
Carbonates after Acid Fumigation with HCl, Journal of Plant Nutrition and Soil Science, 173,
207–216, 2010.
Walthert, L., Pannatier, E. G., and Meier, E. S.: Shortage of nutrients and excess of toxic
elements in soils limit the distribution of soil-sensitive tree species in temperate forests, Forest
Ecology and Management, 297, 94–107, doi:10.1016/j.foreco.2013.02.008, 2013.
Walthert, L., Scherler, M., Stähli, M., Huber, M., Baltensweiler, A., Ramirez Lopez, L., and
Papritz, A.: Böden und Wasserhaushalt von Wäldern und Waldstandorten der Schweiz unter
heutigem und zukünftigem Klima (BOWA-CH), Schlussbericht, Eidg. Forschungsanstalt für
Wald, Schnee und Landschaft WSL und ETH Zürich, Birmensdorf und Zürich, doi:10.3929/
ethz-a-010658682, URL http://e-collection.library.ethz.ch/view/eth:49159, 2015.
Walthert, L., Bridler, L., Keller, A., Lussi, M., and Grob, U.: Harmonisierung von Boden-
daten im Projekt “Predictive mapping of soil properties for the evaluation of soil functions
at regional scale (PMSoil)” des Nationalen Forschungsprogramms Boden (NFP 68), Bericht,
Eidgenössische Forschungsanstalt WSL und Agroscope Reckenholz, Birmensdorf und Zürich,
doi:10.3929/ethz-a-010801994, 2016.
Webster, R. and Lark, R.: Field Sampling for Environmental Science and Management, Envi-
ronmental science/statistics, Routledge, 2013.
Wegelin, T.: Schadstoffbelastung des Bodens im Kanton Zürich Resultate des kantonalen Bo-
denrasternetzes, Bericht, Amt für Gewässerschutz und Wasserbau Fachstelle Bodenschutz,
Zürich, 1989.
Weiss, P., Schieler, K., Schadauer, K., Radunsky, K., and Englisch, M.: Die Kohlenstoffbilanz
des Österreichischen Waldes und Betrachtungen zum Kyoto-Protokoll, Tech. rep., Umwelt-
bundesamt/Federal Environment Agency – Austria, Vienna, Austria, 2000.
Were, K., Bui, D. T., Dick, Ø. B., and Singh, B. R.: A comparative assessment of support vector
regression, artificial neural networks, and random forests for predicting and mapping soil
organic carbon stocks across an Afromontane landscape, Ecological Indicators, 52, 394–403,
doi:10.1016/j.ecolind.2014.12.028, 2015.
Wiesmeier, M., Barthold, F., Blank, B., and Kögel-Knabner, I.: Digital mapping of soil organic
matter stocks using Random Forest modeling in a semi-arid steppe ecosystem, Plant and Soil,
340, 7–24, doi:10.1007/s11104-010-0425-z, 2011.
Wiesmeier, M., Spörlein, P., Geuss, U., Hangen, E., Haug, S., Reischl, A., Schilling, B., Lützow,
M., and Kögel-Knabner, I.: Soil organic carbon stocks in southeast Germany (Bavaria) as
affected by land use, soil type and sampling depth, Global Change Biology, 18, 2233–2245,
doi:10.1111/j.1365-2486.2012.02699.x, 2012.
Wiesmeier, M., Prietzel, J., Barthold, F., Spörlein, P., Geuss, U., Hangen, E., Reischl, A.,
Schilling, B., von Lützow, M., and Kögel-Knabner, I.: Storage and drivers of organic carbon
in forest soils of southeast Germany (Bavaria) – Implications for carbon sequestration, Forest
Ecology and Management, 295, 162–172, doi:10.1016/j.foreco.2013.01.025, 2013.
Wilks, D. S.: Statistical Methods in the Atmospheric Sciences, Academic Press, 3 edn., 2011.
Wolff, G.: Das Bodenschutzkonzept Stuttgart (BOKS), (Kurzfassung), Schriftenreihe des Amtes
für Umweltschutz 4, Landeshauptstadt Stuttgart, Amt für Umweltschutz, 2007.
Wood, S. N.: Generalized Additive Models: An Introduction with R, Chapman and Hall/CRC,
2006.
Wood, S. N.: Fast stable restricted maximum likelihood and marginal likelihood estimation of
semiparametric generalized linear models, Journal of the Royal Statistical Society, Series B:
Statistical Methodology, 73, 3–36, doi:10.1111/j.1467-9868.2010.00749.x, 2011.
Wüst-Galley, C., Grünig, A., and Leifeld, J.: Locating organic soils for the Swiss green-
house gas inventory, Agroscope Science 26, Agroscope, Zurich, URL https://www.bafu.
admin.ch/dam/bafu/en/dokumente/klima/klima-climatereporting-referenzen-cp2/
wuest-galley_c_gruenigaleifeldj2015.pdf.download.pdf, last access: 29.03.2017, 2015.
Xu, X., Liu, W., Zhang, C., and Kiely, G.: Estimation of soil organic carbon stock and its spatial
distribution in the Republic of Ireland, Soil Use and Management, 27, 156–162, 2011.
Yang, R.-M., Zhang, G.-L., Liu, F., Lu, Y.-Y., Yang, F., Yang, F., Yang, M., Zhao, Y.-G., and Li,
D.-C.: Comparison of boosted regression tree and random forest models for mapping topsoil
organic carbon concentration in an alpine ecosystem, Ecological Indicators, 60, 870–878, doi:
10.1016/j.ecolind.2015.08.036, 2016.
Yokoyama, R., Shirasawa, M., and Pike, R. J.: Visualizing Topography by Openness: A New
Application of Image Processing to Digital Elevation Models, Photogrammetric Engineering
and Remote Sensing, 68, 257–265, 2002.
Zhao, Z., Irfan, A. M., and Fan-Rui, M.: Model prediction of soil drainage classes over a large
area using a limited number of field samples: A case study in the province of Nova Scotia,
Canada, Canadian Journal of Soil Science, 93, 73–83, doi:10.4141/cjss2011-095, 2013.
Zimmermann, N. E.: Calculation of Topographic Position, http://www.wsl.ch/staff/
niklaus.zimmermann/programs/aml4_1.html, last access: 15.02.2015, 2000.
Zimmermann, N. E. and Kienast, F.: Predictive mapping of alpine grasslands in Switzerland:
Species versus community approach, Journal of Vegetation Science, 10, 469–482, doi:10.2307/
3237182, 1999.
Zimmermann, S., Luster, J., Blaser, P., and Walthert, L.: Waldböden der Schweiz. Band 3.
Mittelland und Voralpen, Eidg. Forschungsanstalt WSL and Hep Verlag, 2006.
Zimmermann, S., Widmer, D., and Mathis, B.: Bodenüberwachung der Zentralschweizer Kantone
(KABO ZCH): Säurestatus und Versauerungszustand von Waldböden, Bericht im Auftrag der
Zentralschweizer Umweltdirektionen (ZUDK), Eidg. Forschungsanstalt für Wald, Schnee und
Landschaft WSL, 2011.

A. Appendix

A.1. Supplementary material to Chapter 2

Table A.1.: Descriptive statistics of topsoil effective cation exchange capacity (ECEC)
[mmolc kg−1 ] of ZH forest (0–20 cm) split into calibration and validation sets (N: number of
soil samples, Ns: number of unique sites, min: minimum, max: maximum, stdv: standard
deviation, CV: coefficient of variation).

dataset      N     Ns    min   max    mean   median  stdv   CV
calibration  1316  1055  17.4  780.0  170.2  149.3   100.2  0.59
validation   528   293   17.8  492.4  150.4  121.4   94.0   0.63
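
As a quick consistency check on Table A.1, the coefficient of variation can be recomputed as stdv divided by mean. The following is only an illustrative Python sketch: the function name is hypothetical, and the inputs are the rounded values printed in the table, so the result can only be expected to agree with the tabulated CV to about two decimals.

```python
def coefficient_of_variation(stdv: float, mean: float) -> float:
    """Return the coefficient of variation stdv / mean (dimensionless)."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return stdv / mean


# Rounded values copied from Table A.1 (ECEC, ZH forest, 0-20 cm).
table_a1 = {
    "calibration": {"mean": 170.2, "stdv": 100.2, "cv_reported": 0.59},
    "validation": {"mean": 150.4, "stdv": 94.0, "cv_reported": 0.63},
}

# Recompute CV for each set from the rounded mean and stdv.
cv_check = {
    name: coefficient_of_variation(row["stdv"], row["mean"])
    for name, row in table_a1.items()
}
```

Because the table reports rounded statistics, small deviations from the printed CV are expected and do not indicate an error.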

Table A.2.: Frequency of presence of waterlogged soil horizons (horizon qualifiers "gg" or "r"
according to Swiss soil classification, Jäggli et al., 1998) down to a given soil depth, split into
calibration and validation sets (Ns: number of sites, #: number of sites with waterlogged
horizons absent or present, %: percentage of sites with waterlogged horizons present).

depth [cm]  dataset      Ns   absent [#]  present [#]  present [%]
0–30        calibration  764  662         102          13.4
            validation   198  173         25           12.6
0–50        calibration  764  563         201          26.3
            validation   198  152         46           23.2
0–100       calibration  764  448         316          41.4
            validation   198  133         65           32.8

Table A.3.: Frequency of drainage classes (aggregated profile qualifiers of Swiss soil classifi-
cation, Jäggli et al., 1998), split into calibration and validation sets (Ns: number of sites, #:
number of observations and % percentage of observations per level).

dataset      Ns   well drained   moderately well drained   poorly drained
                  #     %        #     %                   #     %
calibration  732  476   65.0     94    12.8                162   22.1
validation   198  146   73.7     23    11.6                29    14.6


[Figure A.1: panels of partial residual plots. Recoverable panel labels: legacy data correction; aquifer; effective soil depth (overview soil map); carbonate index (geotechnical map); geotechnical map units; geological map; vegetation map; max. possible share of deciduous trees (veg. map).]

Figure A.1.: Partial residuals [log mmolc kg−1 ] for factors and smooth effects (continuous co-
variates) of model for effective cation exchange capacity (ECEC) in 0–20 cm, ZH forests (PTF:
pedotransfer function predicting ECEC, SD: standard deviation in focal window).


[Figure A.2: five sample variogram panels (a)–(e); x-axis: lag distance [km], y-axis: semivariance (panel (a) in [log mmolc kg−1]). Fitted parameters annotated in the panels, in order of appearance: (a) α: 0.36, σn²/σt²: 0.018; (b) α: 1.04, σn²/σt²: 0.584; (c) α: 4.45, σn²/σt²: 0.501; (d) α: 2.19, σn²/σt²: 0.63; (e) α: 1.9, σn²/σt²: 0.458.]

Figure A.2.: Sample variograms of the residuals at scale of additive predictor (grey dots, robust
Qn-estimator) and least squares fit of exponential variogram (dotted line) for (a) ECEC 0–20 cm
depth, (b) drainage classes, presence of waterlogged soil horizons in (c) 0–30 cm, (d) 0–50 cm
and (e) 0–100 cm (α: effective range [km], σn²/σt²: ratio of nugget to total sill).
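
To make the two annotated quantities concrete, the sketch below evaluates an exponential variogram from an effective range and a nugget-to-sill ratio. It is a hedged illustration only: it assumes the common convention that the effective range equals three times the range parameter (the lag at which about 95 % of the partial sill is reached), which may differ from the parameterization used for the fits in the figure. The function names and the total sill of 1.0 are hypothetical; α = 1.04 km and the ratio 0.584 are taken from one of the panel annotations.

```python
import math


def exponential_variogram(lag_km, nugget, partial_sill, effective_range_km):
    """Semivariance of an exponential variogram model at a given lag.

    Assumption: effective range = 3 * range parameter, so the model reaches
    roughly 95 % of the partial sill at the effective range.
    """
    if lag_km < 0:
        raise ValueError("lag distance must be non-negative")
    if lag_km == 0:
        return 0.0  # by definition the semivariance at lag zero is zero
    range_parameter = effective_range_km / 3.0
    return nugget + partial_sill * (1.0 - math.exp(-lag_km / range_parameter))


def structured_variance_ratio(nugget, partial_sill):
    """One minus the nugget-to-total-sill ratio."""
    return 1.0 - nugget / (nugget + partial_sill)


total_sill = 1.0               # hypothetical value, chosen for illustration
nugget = 0.584 * total_sill    # nugget-to-sill ratio from a panel annotation
partial_sill = total_sill - nugget

gamma_at_effective_range = exponential_variogram(1.04, nugget, partial_sill, 1.04)
ssvr = structured_variance_ratio(nugget, partial_sill)
```

A nugget-to-sill ratio close to 1 means that the residuals carry little spatially structured variance, whereas a ratio close to 0 indicates strong residual autocorrelation.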


[Figure A.3: map of ECEC 0–20 cm [mmolc kg−1], width of 90 % prediction intervals; map labels: Winterthur, Zurich, Uster; legend: lake, no forest; scale bar 0–15 km.]
Data sources: Soil sampling locations © 2013 FABO Canton of Zurich (TID 22742); Lakes: swissTLM3D © 2013 swisstopo; Relief: DHM25 © 2012 swisstopo. Reproduced with the authorisation of swisstopo (JA100120 / D100042).

Figure A.3.: Width of 90 % prediction intervals for effective cation exchange capacity (ECEC)
in 0–20 cm depth of ZH forests computed by model based bootstrapping (ECEC legend classes
according to Walthert et al., 2004).


[Figure A.4: partial residual plots; recoverable panel label: historic wetlands on Wild maps (absent, present).]

Figure A.4.: Partial residuals for parametric (binary covariates) and smooth effects (continuous
covariates) of model for presence of waterlogged soil horizons down to 30 cm, Greifensee.

[Figure A.5: partial residual plots; recoverable panel labels: drainage systems (absent, present); historic wetlands on Siegfried maps (absent, present).]

Figure A.5.: Partial residuals for parametric (binary covariates) and smooth effects (contin-
uous covariates) of model for presence of waterlogged soil horizons down to 50 cm, Greifensee
(SD: standard deviation in focal window, MRVBF: multi-resolution valley bottom flatness, TPI:
topographic position index, TWI: topographic wetness index).


[Figure A.6: partial residual plots. Recoverable panel labels: anthropogenic soils (presence assumed, not assumed); drainage systems (absent, present); historic wetlands on Siegfried maps (absent, present); ice depth last glaciation [m] (< 800, 800-1000, 1000-1200); geological map (quarternary sediments, moraines (last glaciation), gravel terraces, old fluvioglacial, upper fresh water Molasse, swamp/peat, alluvials fans); one panel annotated "(60 m)".]

Figure A.6.: Partial residuals for parametric (binary and nominal covariates) and smooth effects
(continuous covariates) of model for presence of waterlogged soil horizons down to 100 cm,
Greifensee (SD: standard deviation in focal window, MRVBF: multi-resolution valley bottom
flatness, TWI: topographic wetness index).


[Figure A.7: partial residual plots. Recoverable panel labels: drainage systems (absent, present); historic wetlands on Siegfried maps (absent, present); coverage aquifer (uncovered, covered); one panel annotated "(60 m)".]

Figure A.7.: Partial residuals for parametric (binary covariates) and smooth effects (continuous
covariates) of model for drainage classes, Greifensee (SD: standard deviation in focal window,
MRVBF: multi-resolution valley bottom flatness, TPI: topographic position index, TWI: topo-
graphic wetness index).

A.2. Supplementary material to Chapter 3

[Figure A.8: map of the Berne study region; legend: calibration sites (750), validation sites (198), lake, agricultural area; map label: Lyss; scale bar 0–10 km.]
Data sources: Soil sampling locations © 2013 LANAT Canton of Berne; Lakes: swissTLM3D © 2013 swisstopo; Relief: DHM25 © 2012 swisstopo. Reproduced with the authorisation of swisstopo (JA100120 / JD100042).

Figure A.8.: Location of sites in the Berne study region shown as an example for topsoil
clay content (0–10 cm). Black dots are locations used for model calibration, locations with red
triangles were used for model validation.

[Figure A.9: map of the Greifensee study region; legend: calibration sites (902), validation sites (194), lake, agricultural area; map label: Zurich; scale bar 0–10 km.]
Data sources: Soil sampling locations © 2013 FABO Canton of Zurich (TID 22742); Lakes: swissTLM3D © 2013 swisstopo; Relief: DHM25 © 2012 swisstopo. Reproduced with the authorisation of swisstopo (JA100120 / JD100042).

Figure A.9.: Location of sites in the Greifensee study region shown as an example for topsoil
clay content (0–10 cm). Black dots are locations used for model calibration, locations with red
triangles were used for model validation.


[Figure A.10: map of the ZH forest study region; legend: calibration sites (1055), validation sites (293), lake, forested area; map labels: Winterthur, Zurich, Uster; scale bar 0–15 km.]
Data sources: Soil sampling locations © 2013 FABO Canton of Zurich (TID 22742); Lakes: swissTLM3D © 2013 swisstopo; Relief: DHM25 © 2012 swisstopo. Reproduced with the authorisation of swisstopo (JA100120 / JD100042).

Figure A.10.: Location of sites in the ZH forest study region shown as an example
for topsoil effective cation exchange capacity (ECEC, 0–20 cm). Black dots are
locations used for model calibration, locations with red triangles were used for
model validation.

Table A.4.: Details on soil analysis methods and remarks on soil data harmonization for modelled soil properties; more information can be found in Walthert et al. (2016). (r: study region with Be: Berne, Gr: Greifensee, Zf: ZH forest; F: percentage of field estimates; PTF: percentage of observations calculated by pedotransfer functions [PTF]; SOM: soil organic matter; SD: soil depth; BD: bulk density of fine soil fraction ≤ 2 mm.)

property: texture
  r: Be, Gr
  soil analysis method: sedimentation (ELF, 1996; Ribi, 2014), field estimates (ordinal data, Brunner et al., 1997; Jäggli et al., 1998)
  soil data harmonization: Accordance of field estimates and pipette measurements: RMSE 7.1–9.8 % for clay and 8.2–13.2 % for silt depending on source/survey (Walthert et al., 2016)
  F [%]: Be: 78, Gr: 48
  PTF [%]: –
  unit: wt.%

property: gravel
  r: Be, Gr
  soil analysis method: volumetric content of coarse fragments > 2 mm estimated in the field (ordinal data, Brunner et al., 1997; Jäggli et al., 1998)
  soil data harmonization: –
  F [%]: Be: 100, Gr: 100
  PTF [%]: –
  unit: vol.%

property: pH
  r: Be, Gr, Zf
  soil analysis method: potentiometric measurements in H2O or CaCl2, partly field-moist samples (Zf), field estimates (ordinal data by pH indicator solution, ELF, 1996; Jäggli et al., 1998; Ribi, 2008; Walthert et al., 2004)
  soil data harmonization: robust linear regression to transfer H2O measurements to CaCl2, calibrated on 227 samples (60 sites): pH_CaCl2 = 1.00 · pH_H2O − 0.46, RMSE 0.195, SSmse 0.945 computed on 55 samples (12 sites) not used for model calibration. Accordance of field estimates and pH_CaCl2: RMSE 0.521, SSmse 0.726.
  F [%]: Be: 72, Zf: 30 (field-moist)
  PTF [%]: Be: 9, Gr: 10
  unit: –

property: SOM
  r: Be, Gr
  soil analysis method: oxidation with K2Cr2O7 / H2SO4 (ELF, 1996), field estimates (ordinal data, Brunner et al., 1997; Jäggli et al., 1998)
  soil data harmonization: accordance of field estimates with measurements: RMSE 6.822 %, SSmse 0.745 for Be, RMSE 3.51 %, SSmse 0.786 for Gr.
  F [%]: Be: 77, Gr: 33
  PTF [%]: –
  unit: wt.%

property: ECEC
  r: Zf
  soil analysis method: 1 M ammonium chloride extraction (FAC, 1989; Walthert et al., 2004, 2013), 0.5 M ammonium acetate and 0.02 EDTA extraction at pH 4.65 (Lakanen and Erviö, 1971; ELF, 1996)
  soil data harmonization: robust linear regression to estimate ECEC from Ca, Mg, K and Al measured by ammonium acetate/EDTA extraction with 604 samples (198 sites), SSmse 0.939, RMSE 2.650 log mmolc, computed on 142 samples (49 sites) not used for calibration (Nussbaum and Papritz, 2015).
  F [%]: –
  PTF [%]: Zf: 9
  unit: mmolc kg−1

property: SD
  r: Be, Gr
  soil analysis method: plant exploitable (effective) soil depth (SD) estimated in field (ordinal data, Jäggli et al., 1998)
  soil data harmonization: SD derived from horizon thickness d_i, presence of waterlogging horizon qualifiers (g: w_i = 0.33, gg: w_i = 0.33, r: w_i = 0.1) and C horizon (parent material, w_i = 0) by $\mathrm{SD} = \sum_{i=1}^{n} d_i \, (1 - s_i) \, w_i$, where w_i = 1 in case of absence of mentioned qualifiers and s_i is the volumetric gravel content [%] of horizon i.
  F [%]: Gr: 100
  PTF [%]: Be: 100
  unit: cm

property: BD
  r: Zf (1)
  soil analysis method: bulk density of fine soil fraction (diameter ≤ 2 mm) sampled by volumetric cores (Brunner et al., 1997; Walthert et al., 2004)
  soil data harmonization: robust linear regression with samples from Be, Gr and Swiss forests (Chapt. 4) with SOM, soil depth and land use as covariates, calibrated on 757 samples (245 sites) with RMSE 0.253 Mg m−3, SSmse 0.508 computed on 279 samples (81 sites) not used for calibration. Where SOM was missing, PTF with sampling depth and land use only, calibrated on 1040 samples (308 sites) with RMSE 0.333 Mg m−3, SSmse 0.147 computed on same 279 samples (latter PTF used only to transfer samples to comparable depth intervals).
  F [%]: –
  PTF [%]: Zf: 98
  unit: Mg m−3

(1) BD was used in Be, Gr and Zf to convert data related to soil horizons to fixed-depth intervals (see Sect. 3.2.2).
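
Two of the harmonization rules in Table A.4 are simple enough to sketch in code: the linear transfer of pH measured in H2O to pH in CaCl2, and the derivation of effective soil depth (SD) from horizon data. The Python below is a minimal illustration, not the implementation used for the thesis. The function names, the input layout and the example profile are hypothetical. The gravel content is passed as a volume fraction between 0 and 1 (an assumption, since the table states it in percent), and any qualifier not listed in the table is treated like an absent qualifier with weight 1.

```python
def ph_h2o_to_cacl2(ph_h2o: float) -> float:
    """Transfer pH measured in H2O to pH in CaCl2 (coefficients from Table A.4)."""
    return 1.00 * ph_h2o - 0.46


# Horizon weights w_i as listed in Table A.4; "C" marks the C horizon.
HORIZON_WEIGHTS = {"g": 0.33, "gg": 0.33, "r": 0.1, "C": 0.0}


def effective_soil_depth(horizons) -> float:
    """Sum d_i * (1 - s_i) * w_i over all horizons of a profile.

    horizons: iterable of (thickness_cm, gravel_fraction, qualifier) tuples,
    where qualifier is None if none of the listed qualifiers applies.
    """
    depth_cm = 0.0
    for thickness_cm, gravel_fraction, qualifier in horizons:
        if not 0.0 <= gravel_fraction <= 1.0:
            raise ValueError("gravel_fraction must lie between 0 and 1")
        weight = HORIZON_WEIGHTS.get(qualifier, 1.0)  # w_i = 1 without qualifier
        depth_cm += thickness_cm * (1.0 - gravel_fraction) * weight
    return depth_cm


# Hypothetical profile: topsoil, a horizon with qualifier "g", and a C horizon.
example_profile = [
    (30.0, 0.05, None),
    (40.0, 0.10, "g"),
    (30.0, 0.20, "C"),
]
example_sd_cm = effective_soil_depth(example_profile)
example_ph_cacl2 = ph_h2o_to_cacl2(6.5)
```

For the hypothetical profile the C horizon contributes nothing and the waterlogged horizon counts for only a third of its gravel-free thickness, which is the intended effect of the weights.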
Table A.5.: Details on geodata sets and derived covariates (r: pixel size for raster datasets or scale for vector datasets, a: limited to study region Be: Berne, Gr: Greifensee or Zf: ZH forest, n: number of covariates per dataset; –: no restriction to a study region given).

Soil
  geodata set                                             r          a       n
  Soil overview map (FSO, 2000a)                          1:200 000  –       8
  Wetlands Wild maps (ALN, 2002)                          1:50 000   Gr      1
  Wetlands Siegfried maps (Wüst-Galley et al., 2015)      1:25 000   Gr      1
  Agricultural suitability (LANAT, 2015)                  1:25 000   Be      1
  Anthropogenic soil interventions (AWEL, 2012)           1:5 000    Gr      1
  Drainage networks (ALN, 2014b)                          1:5 000    Gr      2
  Description and details on covariate derivations: adapted 30 physiographic units of soil overview map (Chapt. 5), evidence on drained wetlands from historic maps, presence of drainage networks on agricultural land, evidence on anthropogenic interventions in soil (e.g. soil ameliorations), potential for agricultural production for different crops.

Parent material
  geodata set                                             r          a       n
  Geological overview map (Swisstopo, 2005)               1:500 000  Be      4
  Map of last glacial maximum (Swisstopo, 2009)           1:500 000  –       1
  Geotechnical map (BFS, 2001)                            1:200 000  –       2
  Geological map (ALN, 2014a)                             1:50 000   –       7
  Geological maps (Swisstopo, 2016b)                      1:25 000   Be      1
  Groundwater map (AWEL, 2014; AWA, 2014b)                1:25 000   Gr      2
  Hydrogeological infiltration zones (AWA, 2014a)         1:25 000   Be      2
  Mineral raw materials (AGR, 2015)                       1:25 000   Be      1
  Description and details on covariate derivations: geological/geotechnical map units (partly aggregated a priori), sheets of geological map (Swisstopo, 2016b) coarsely harmonized through legend matching and aggregation, CaCO3 index based on geotechnical map (BAFU and GRID-Europe, 2010), closest distance between soil sampling point and geological line or polygon object on Zurich geological map (ALN, 2014a, drumlins, moraines of different glacial stages), approximate ice level during last glaciation, presence of aquifer, aquifer covered by impermeable layer, areas suitable for gravel exploitation.

Climate
  geodata set                                             r          a       n
  MeteoSwiss 1961–1990 (Zimmermann and Kienast, 1999)     25/100 m   –       33
  MeteoTest 1975–2010 (Remund et al., 2011)               250 m      –       38
  Air pollutants (BAFU, 2011)                             500 m      Zf      2
  NO2 immissions (AWEL, 2015)                             100 m      Gr      3
  Description and details on covariate derivations: mean annual/monthly temperature and precipitation, cloud cover, sunshine duration, radiation, degree days, continentality index (Gams, 1935), temperature variation, ratio of actual to potential evapotranspiration and site water balance (Grier and Running, 1977; Gurtz et al., 1999), NH3 concentration in air, NO2 immissions of years 2000, 2005, 2010.

Vegetation
  geodata set                                             r          a       n
  Landsat7 scene (USGS EROS, 2013)                        30 m       –       9
  DMC mosaic (DMC, 2015)                                  22 m       –       4
  SPOT5 mosaic (Mathys and Kellenberger, 2009)            10 m       Zf      12
  APEX images (Schaepman et al., 2015)                    2 m        Gr, Be  180
  Share of coniferous trees (FSO, 2000b)                  25 m       Zf      1
  Vegetation map (Schmider et al., 1993)                  1:5 000    Zf      2
  Species composition (Brassel and Lischke, 2001)         25 m       Zf      1
  Digital surface model (Swisstopo, 2011)                 2 m        Zf      1
  Description and details on covariate derivations: vegetation map units (aggregated a priori with ecograms), maximal potential to grow deciduous trees (vegetation map), percentage of coniferous trees derived from spectral imagery (FSO, 2000b) and species composition data of National Forest Inventory (NFI, Brassel and Lischke, 2001), canopy height (difference of digital terrain to surface model), spectral reflectance in green, red, near infrared, band ratios, normalized difference vegetation index (NDVI, Kriegler et al., 1969) of SPOT5 and DMC, bands of Landsat7, 180 hyperspectral bands in solar reflected wavelength from 470 to 2420 nm of two fully processed and mosaicked (Jehle et al., 2010; Hueni et al., 2013) flight lines sampled by the APEX imaging spectrometer (flight campaigns 09/2013, 04/2014, spectral resolution of 0.6–11 nm, Schaepman et al., 2015), with indicator covariate to account for different sampling dates (at overlap only mosaic from 04/2014 with larger contiguous area was used).

Topography
  geodata set                                             r          a       n
  Digital elevation model (Swisstopo, 2011)               25 m       –       62
  Digital terrain model (Swisstopo, 2013b)                2 m        –       134
  Description and details on covariate derivations: elevation, slope angle, northing and easting aspects, planar, profile and combined curvatures (all with smoothed versions and standard deviations in circular neighbourhoods of different radii), topographic position indices (Zimmermann, 2000; Jenness, 2006), terrain ruggedness indices (Riley et al., 1999), roughness (Evans et al., 2014), dissection (Evans, 1972), surface to area ratio (Berry, 2002), multi-resolution valley bottom flatness (Gallant and Dowling, 2003), multi-resolution ridge top flatness (Gallant et al., 2013), positive and negative openness (Yokoyama et al., 2002), convergence indices (Kiss, 2004), LS factor (Van Remortel et al., 2004), vector ruggedness measure (Hobson, 1972), surface convexity (Iwahashi and Pike, 2007), flow accumulation area, flow length, topographic wetness indices by single and multi-flow algorithms (Tarboton, 1997) and vertical/horizontal distance to existing water bodies (BAFU, 2009, scale 1:25 000).
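
Among the derived vegetation covariates in Table A.5, the normalized difference vegetation index (NDVI) has a compact standard definition, (NIR − red) / (NIR + red), which is sketched below in Python for a single pixel. This is an illustration only: the function name and the reflectance values are hypothetical, and the sketch does not reproduce the processing chain applied to the SPOT5 and DMC mosaics.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one pixel.

    nir, red: reflectance in the near-infrared and red bands.
    Returns NaN where both bands are zero, since the ratio is undefined there.
    """
    denominator = nir + red
    if denominator == 0:
        return float("nan")
    return (nir - red) / denominator


# Hypothetical reflectances: a vegetated pixel reflects strongly in the near
# infrared and weakly in the red band, giving an NDVI well above zero.
example_ndvi = ndvi(nir=0.5, red=0.1)
```

Applied band-wise to whole rasters, the same expression yields one NDVI covariate per sensor.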

Table A.6.: Descriptive statistics of soil properties for Berne study region by soil depth [cm]
(N: number of samples, Ns: number of sampling sites, set: calibration dataset (c), independent
validation dataset (v), Min: observed minimum, Max: observed maximum, µ: mean, σ: standard
deviation, SOM: soil organic matter, SD: effective soil depth, units see Table A.4).
response  depth   set  N    Ns   Min     Max      Median  µ       σ
clay      0-10    c    750  750  0.000   65.690   15.000  17.396  7.316
                  v    198  198  2.000   53.930   15.000  17.670  7.650
          10-30   c    771  771  0.000   76.200   15.000  17.919  8.383
                  v    198  198  2.200   53.930   15.750  18.501  8.397
          30-50   c    733  733  0.000   76.200   16.180  18.433  9.145
                  v    198  198  0.000   67.000   16.000  18.794  9.516
          50-100  c    741  741  0.000   76.200   16.000  17.832  9.864
                  v    198  198  0.000   48.550   16.000  17.281  9.016
silt      0-10    c    753  753  2.000   75.000   25.000  27.581  10.531
                  v    198  198  5.000   75.000   25.000  28.953  12.991
          10-30   c    776  776  1.800   75.000   25.000  28.505  11.541
                  v    198  198  4.400   75.000   25.000  29.828  12.583
          30-50   c    736  736  2.000   75.000   26.000  29.807  12.618
                  v    198  198  4.400   75.000   25.000  31.441  15.413
          50-100  c    743  743  2.000   75.000   27.143  31.004  15.261
                  v    198  198  4.400   75.000   25.000  31.334  16.320
gravel    0-10    c    836  836  0.000   25.000   2.000   2.711   3.178
                  v    198  198  0.000   15.000   2.000   2.556   2.773
          10-30   c    836  836  0.000   24.000   2.000   2.960   3.607
                  v    198  198  0.000   29.000   2.000   2.907   3.676
          30-50   c    834  834  0.000   40.000   2.000   3.764   5.359
                  v    198  198  0.000   50.000   2.000   3.705   5.455
          50-100  c    827  827  0.000   45.500   1.600   4.838   7.448
                  v    198  198  0.000   51.000   2.100   4.779   6.660
SOM       0-10    c    788  788  0.300   61.900   4.000   8.470   10.251
                  v    198  198  0.900   57.400   4.000   7.191   7.507
          10-30   c    787  787  0.490   65.400   2.900   8.630   12.785
                  v    198  198  0.875   46.500   3.000   6.018   7.356
          30-50   c    702  702  0.000   81.100   1.000   10.532  18.674
                  v    198  198  0.000   50.000   1.000   4.266   8.242
          50-100  c    480  480  0.000   85.000   1.000   15.711  25.414
                  v    197  197  0.000   72.220   0.800   5.720   12.807
pH        0-10    c    728  661  4.500   8.500    6.300   6.375   0.761
                  v    211  198  4.600   8.600    6.400   6.410   0.842
          10-30   c    723  657  4.400   8.500    6.304   6.371   0.755
                  v    211  198  4.301   8.671    6.390   6.404   0.848
          30-50   c    713  647  4.300   8.700    6.400   6.414   0.803
                  v    211  198  4.300   9.000    6.504   6.482   0.931
          50-100  c    716  650  3.676   9.100    6.500   6.472   0.875
                  v    211  198  4.300   9.000    6.731   6.609   1.016
SD        –       c    838  838  12.750  224.000  70.000  72.946  31.454
                  v    198  198  8.000   170.100  71.536  76.115  33.084


Table A.7.: Descriptive statistics of soil properties for Berne study region by soil depth [cm],
continued (CV: coefficient of variation σ/µ, CVr: robust coefficient of variation [ratio of
interquartile range to median], Skew: coefficient of skewness, α: effective range of experimental
sample variogram [km], SSVR: spatially structured variance ratio 1 − nugget/sill_tot [Vaysse
and Lagacherie, 2015], Tr: transformation of response, sqrt: transformed by square root, ln:
transformed by natural logarithm, SOM: soil organic matter, SD: effective soil depth available
to plants, units see Table A.4).
response depth set CV CVr Skew α SSVR Tr
clay 0-10 c 0.421 0.600 1.723 16.000 0.720
v 0.433 0.667 1.449
10-30 c 0.468 0.600 2.219 8.100 0.708
v 0.454 0.603 1.474
30-50 c 0.496 0.575 1.715 5.700 0.644
v 0.506 0.691 1.356
50-100 c 0.553 0.725 1.290 3.700 0.575
v 0.522 0.666 0.668
silt 0-10 c 0.382 0.354 1.857 1.800 1.000
v 0.449 0.200 2.157
10-30 c 0.405 0.454 1.635 2.300 0.973
v 0.422 0.349 1.727
30-50 c 0.423 0.433 1.231 4.200 0.812
v 0.490 0.445 1.240
50-100 c 0.492 0.568 0.893 3.700 0.786
v 0.521 0.699 1.068
gravel 0-10 c 1.172 1.000 2.545 8.900 0.613 sqrt
v 1.085 1.500 2.205
10-30 c 1.219 1.625 2.377 8.900 0.436 sqrt
v 1.265 1.694 3.212
30-50 c 1.424 2.500 2.543 4.700 0.529 sqrt
v 1.472 2.500 4.286
50-100 c 1.540 4.375 2.421 2.400 0.910 sqrt
v 1.394 3.298 2.771
SOM 0-10 c 1.210 0.750 2.233 4.100 0.888 ln
v 1.044 0.956 2.914
10-30 c 1.481 1.138 2.225 3.800 0.918 ln
v 1.222 1.137 2.533
30-50 c 1.773 6.244 1.894 2.600 0.974 ln
v 1.932 2.250 3.215
50-100 c 1.618 23.420 1.375 2.200 1.000 ln
v 2.239 2.344 3.006
pH 0-10 c 0.119 0.190 0.082 2.900 0.640
v 0.131 0.203 0.081
10-30 c 0.119 0.186 0.047 6.200 0.592
v 0.132 0.216 0.053
30-50 c 0.125 0.188 -0.065 4.500 0.634
v 0.144 0.230 -0.029
50-100 c 0.135 0.185 -0.253 4.500 0.671
v 0.154 0.229 -0.225
SD – c 0.431 0.703 0.464 – –
v 0.435 0.663 0.488
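
The summary measures defined in the caption of Table A.7 can be sketched as follows. This is a minimal illustration, not the code used for the thesis: the function names are hypothetical, the quartile rule and the moment-based skewness estimator are assumptions (the caption does not specify either), and the nugget and total sill passed to ssvr are assumed to come from a variogram fitted elsewhere.

```python
import statistics


def cv_from_moments(mean, sd):
    """Coefficient of variation sigma / mu."""
    return sd / mean


def describe(values):
    """Return CV, robust CV and skewness for a list of observations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample standard deviation
    median = statistics.median(values)
    # Quartiles with the default rule of the statistics module (assumption).
    q1, _, q3 = statistics.quantiles(values, n=4)
    n = len(values)
    # Moment-based skewness, one of several common estimators (assumption).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return {
        "CV": cv_from_moments(mean, sd),
        "CVr": (q3 - q1) / median,         # interquartile range / median
        "Skew": m3 / m2 ** 1.5,
    }


def ssvr(nugget, sill_total):
    """Spatially structured variance ratio 1 - nugget / total sill."""
    return 1.0 - nugget / sill_total
```

As a check against the tabulated values, clay in 0-10 cm (calibration set) has µ = 17.396 and σ = 7.316 in Table A.6, so cv_from_moments(17.396, 7.316) rounds to the CV of 0.421 listed in Table A.7.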


Table A.8.: Descriptive statistics of soil properties for Greifensee study region by soil depth
[cm] (N: number of samples, Ns: number of sampling sites, set: calibration dataset (c),
independent validation dataset (v), Min: observed minimum, Max: observed maximum, µ: mean, σ:
standard deviation, SOM: soil organic matter, SD: effective soil depth, units see Table A.4).
response depth set N Ns Min Max Median µ σ
clay 0-10 c 913 902 8.000 59.520 25.000 26.412 7.402
v 194 194 12.000 48.700 25.000 25.242 7.023
10-30 c 913 902 8.000 58.750 25.150 26.434 7.386
v 194 194 12.000 59.400 25.000 25.505 7.626
30-50 c 864 853 6.400 64.200 25.420 26.637 8.174
v 183 183 9.150 46.259 25.000 25.506 8.021
50-100 c 852 841 2.400 60.400 25.585 25.912 8.970
v 183 183 2.884 47.620 25.160 24.988 9.168
silt 0-10 c 913 902 12.300 60.000 31.990 32.383 6.189
v 198 198 12.250 55.000 32.000 32.664 6.855
10-30 c 913 902 17.000 60.000 32.000 32.524 6.273
v 198 198 12.250 55.000 32.002 32.728 6.854
30-50 c 866 855 15.000 65.900 32.250 33.312 7.365
v 198 198 1.500 61.400 32.000 32.856 8.062
50-100 c 852 841 4.600 71.000 33.000 34.239 9.062
v 198 198 7.700 69.000 32.025 32.867 9.093
gravel 0-10 c 743 743 0.000 35.000 6.000 7.534 6.365
v 193 193 0.000 25.000 8.000 8.580 6.095
10-30 c 744 744 0.000 35.000 7.500 8.120 6.691
v 193 193 0.000 25.000 9.000 9.333 6.489
30-50 c 739 739 0.000 56.250 8.400 9.967 8.795
v 195 195 0.000 41.000 11.000 11.619 8.920
50-100 c 719 719 0.000 65.000 11.240 12.707 10.901
v 195 195 0.000 60.000 14.200 14.952 11.909
SOM 0-10 c 1255 1141 0.900 32.000 4.400 5.285 3.264
v 453 277 1.000 44.700 4.820 5.554 3.653
10-30 c 1165 1051 0.500 49.000 4.000 4.803 3.673
v 453 277 1.000 44.700 4.600 5.315 3.731
30-50 c 723 689 0.000 65.000 1.000 2.278 4.957
v 134 119 0.000 38.600 1.480 2.806 5.659
50-100 c 443 409 0.000 68.188 1.000 2.198 5.935
v 134 119 0.000 51.700 1.400 2.867 6.841
pH 0-10 c 1210 1078 3.286 8.082 6.718 6.535 0.616
v 452 277 4.220 7.800 6.200 6.209 0.730
10-30 c 1121 989 3.286 7.805 6.686 6.521 0.621
v 452 277 4.220 7.642 6.199 6.224 0.730
30-50 c 412 379 4.431 8.137 6.794 6.671 0.720
v 135 119 3.886 8.286 6.531 6.433 0.722
50-100 c 371 338 4.164 8.383 6.775 6.708 0.713
v 135 119 3.886 8.286 6.597 6.494 0.723
SD – c 745 745 19.000 204.000 65.000 67.032 23.270
v 198 198 12.000 184.000 66.000 70.394 22.820


Table A.9.: Descriptive statistics of soil properties for Greifensee by soil depth [cm], continued
(CV: coefficient of variation σ/µ, CVr: robust coefficient of variation [ratio of interquartile range
to median], Skew: coefficient of skewness, α: effective range of experimental sample variogram
[km], SSVR: spatially structured variance ratio 1 − nugget/sill_tot [Vaysse and Lagacherie, 2015],
Tr: transformation of response, sqrt: transformed by square root, ln: transformed by natural
logarithm, SOM: soil organic matter, SD: effective soil depth available to plants, units see
Table A.4).
response depth set CV CVr Skew α SSVR Tr
clay 0-10 c 0.280 0.315 1.007 4.800 0.465
v 0.278 0.349 0.703
10-30 c 0.279 0.330 0.932 5.400 0.475
v 0.299 0.450 0.894
30-50 c 0.307 0.401 0.519 1.500 0.485
v 0.314 0.465 0.278
50-100 c 0.346 0.462 0.166 3.200 0.381
v 0.367 0.517 0.022
silt 0-10 c 0.191 0.217 0.883 3.300 0.425
v 0.210 0.248 0.452
10-30 c 0.193 0.236 0.842 2.600 0.498
v 0.209 0.266 0.439
30-50 c 0.221 0.272 0.808 3.100 0.357
v 0.245 0.290 0.392
50-100 c 0.265 0.328 0.774 2.300 0.515
v 0.277 0.318 0.566
gravel 0-10 c 0.845 1.667 0.939 1.400 0.525 sqrt
v 0.710 1.125 0.474
10-30 c 0.824 1.333 0.799 1.600 0.426 sqrt
v 0.695 1.056 0.436
30-50 c 0.883 1.536 1.115 1.400 0.519 sqrt
v 0.768 1.000 0.767
50-100 c 0.858 1.406 0.927 1.900 0.530 sqrt
v 0.796 1.206 0.879
SOM 0-10 c 0.618 0.550 3.395 2.500 0.611 ln
v 0.658 0.521 5.392
10-30 c 0.765 0.708 4.091 3.400 0.747 ln
v 0.702 0.602 5.207
30-50 c 2.176 1.200 6.289 3.800 0.123 ln
v 2.017 0.579 4.866
50-100 c 2.701 1.641 6.064 1.500 0.998 ln
v 2.386 0.643 5.455
pH 0-10 c 0.094 0.134 -0.855 1.700 0.734
v 0.118 0.198 -0.146
10-30 c 0.095 0.138 -0.813 2.200 0.692
v 0.117 0.198 -0.157
30-50 c 0.108 0.173 -0.505 2.400 0.540
v 0.112 0.158 -0.492
50-100 c 0.106 0.158 -0.536 2.200 0.542
v 0.111 0.153 -0.490
SD – c 0.347 0.569 0.896 3.300 0.267
v 0.324 0.481 0.824

Table A.10.: Descriptive statistics of soil properties of ZH forest by soil depth [cm] (N: number of samples, Ns: number of sampling sites, set:
calibration dataset (c), independent validation dataset (v), Min: observed minimum, Max: observed maximum, µ: mean, σ: standard deviation,
CV: coefficient of variation σ/µ, CVr: robust coefficient of variation [ratio of interquartile range to median], Skew: coefficient of skewness, α:
effective range of experimental sample variogram [km], SSVR: spatially structured variance ratio 1 − nugget/sill_tot [Vaysse and Lagacherie, 2015], Tr:
transformation of response, ln: transformed by natural logarithm, ECEC: effective cation exchange capacity, BD: bulk density of fine soil fraction
[≤ 2 mm], units see Table A.4).

response depth set N Ns Min Max Median µ σ CV CVr Skew α SSVR Tr

ECEC 0-20 c 1316 1055 17.364 780.000 149.333 170.157 100.202 0.589 0.952 1.169 1.100 1.000 ln
v 528 293 17.760 492.385 121.382 150.404 94.018 0.625 1.002 1.077
40-60 c 545 379 22.673 361.191 124.671 129.479 67.608 0.522 0.896 0.491 0.500 0.923 ln
v 269 119 27.318 326.885 113.804 120.929 64.282 0.532 0.831 0.790
pH 0-20 c 1804 1475 2.585 7.650 4.842 5.137 1.352 0.263 0.539 0.262 3.200 0.890
v 561 293 2.786 7.400 4.374 4.781 1.131 0.237 0.395 0.757
40-60 c 953 693 3.050 7.810 5.165 5.269 1.282 0.243 0.480 0.198 5.700 0.491
v 329 119 3.240 7.740 5.221 5.350 1.313 0.245 0.495 0.202

BD 0-20 c 780 551 0.411 1.510 0.870 0.870 0.066 0.076 0.079 0.788 62.700 0.631
v 294 119 0.659 1.129 0.855 0.855 0.059 0.069 0.070 0.203
40-60 c 469 368 0.908 1.550 1.133 1.140 0.060 0.053 0.046 1.885 0.400 0.887
v 201 119 1.028 1.537 1.133 1.138 0.055 0.049 0.035 3.113


Table A.11.: Cross-validation (CV) statistics for models by study region and soil depth
computed on the same CV subsets for all methods, except for RF where out-of-bag predictions were
used. (RMSE: root mean squared error [units see Table A.4], SSmse: mean squared error skill
score, lasso: grouped least absolute shrinkage and selection operator, georob: robust
external-drift kriging, geoGAM: geoadditive model, BRT: boosted regression trees, RF: random forest,
MA: model averaging, NA: no convergence of georob algorithm).
depth lasso georob geoGAM BRT RF MA
[cm] RMSE SSmse RMSE SSmse RMSE SSmse RMSE SSmse RMSE SSmse RMSE SSmse
Berne
clay 0-10 6.418 0.230 5.592 0.415 5.623 0.408 5.814 0.368 4.569 0.609 5.264 0.482
10-30 7.855 0.121 6.762 0.348 7.124 0.277 6.952 0.311 5.333 0.595 6.394 0.417
30-50 8.735 0.087 7.615 0.306 7.530 0.321 7.938 0.246 6.125 0.551 7.189 0.381
50-100 9.848 0.002 8.669 0.227 9.002 0.166 9.085 0.150 6.960 0.501 8.259 0.298
silt 0-10 10.518 0.001 9.286 0.221 9.379 0.206 9.351 0.210 7.444 0.500 8.732 0.311
10-30 11.157 0.064 9.955 0.255 9.996 0.249 10.216 0.215 7.884 0.533 9.342 0.344
30-50 11.387 0.184 10.094 0.359 10.333 0.328 10.546 0.301 8.101 0.587 9.538 0.428
50-100 14.032 0.153 12.826 0.293 13.095 0.263 13.769 0.185 10.165 0.556 12.132 0.367
gravel 0-10 2.997 0.110 2.425 0.417 2.714 0.270 2.848 0.196 2.138 0.547 2.474 0.393
10-30 3.284 0.170 2.703 0.438 2.807 0.393 3.079 0.270 2.322 0.585 2.663 0.454
30-50 4.990 0.132 4.336 0.345 4.613 0.258 4.616 0.257 3.641 0.538 4.246 0.371
50-100 7.220 0.059 6.310 0.281 6.367 0.268 6.785 0.169 5.195 0.513 6.071 0.335
SOM 0-10 6.712 0.571 5.025 0.759 5.472 0.715 5.059 0.756 3.969 0.850 4.662 0.793
10-30 8.507 0.557 6.318 0.755 6.579 0.735 6.669 0.728 5.082 0.842 5.866 0.789
30-50 19.559 -0.099 12.337 0.563 14.695 0.380 9.781 0.725 7.863 0.822 9.229 0.755
50-100 27.293 -0.156 43.426 -1.926 23.872 0.116 14.125 0.690 10.760 0.820 13.879 0.701
pH 0-10 0.533 0.509 0.458 0.637 0.483 0.597 0.500 0.567 0.391 0.736 0.447 0.655
10-30 0.541 0.486 0.467 0.617 0.495 0.570 0.495 0.570 0.389 0.735 0.447 0.649
30-50 0.610 0.423 0.524 0.574 0.535 0.556 0.550 0.531 0.423 0.722 0.488 0.630
50-100 0.681 0.393 0.582 0.557 0.606 0.519 0.613 0.508 0.471 0.710 0.541 0.617
SD – 28.790 0.161 26.782 0.274 27.097 0.257 27.992 0.207 21.012 0.553 25.148 0.360
Greifensee
clay 0-10 6.423 0.246 5.986 0.345 5.985 0.345 6.055 0.330 4.735 0.590 5.564 0.434
10-30 6.438 0.239 5.999 0.340 6.000 0.339 6.258 0.281 4.832 0.571 5.631 0.418
30-50 7.738 0.103 7.047 0.256 6.992 0.267 7.409 0.178 5.555 0.538 6.603 0.347
50-100 8.602 0.079 7.899 0.224 7.976 0.208 8.259 0.151 6.229 0.517 7.376 0.323
silt 0-10 6.001 0.059 5.384 0.242 5.436 0.228 5.663 0.162 4.327 0.511 5.088 0.323
10-30 6.109 0.051 5.590 0.205 5.570 0.211 5.779 0.151 4.429 0.501 5.241 0.301
30-50 7.144 0.058 6.603 0.195 6.725 0.165 6.944 0.110 5.205 0.500 6.210 0.288
50-100 8.877 0.039 8.092 0.202 8.471 0.125 8.714 0.074 6.588 0.471 7.791 0.260
gravel 0-10 5.985 0.115 5.552 0.238 5.575 0.232 5.797 0.169 4.471 0.506 5.182 0.336
10-30 6.229 0.132 5.646 0.287 5.725 0.267 6.145 0.155 4.651 0.516 5.386 0.351
30-50 8.144 0.142 8.045 0.162 7.606 0.251 7.900 0.192 6.112 0.516 7.161 0.336
50-100 10.481 0.074 9.326 0.267 9.708 0.206 9.964 0.163 7.640 0.508 8.944 0.326
SOM 0-10 3.046 0.128 2.630 0.350 3.921 -0.444 2.915 0.202 2.210 0.541 2.588 0.371
10-30 3.468 0.107 3.068 0.302 4.249 -0.339 3.314 0.185 2.396 0.574 2.934 0.361
30-50 4.932 0.009 4.631 0.126 4.915 0.016 4.666 0.113 3.329 0.548 4.290 0.250
50-100 6.206 -0.096 NA NA 12.140 -3.193 5.152 0.245 3.966 0.553 5.023 0.282
pH 0-10 0.549 0.204 0.461 0.438 0.518 0.291 0.500 0.338 0.379 0.620 0.447 0.472
10-30 0.546 0.225 0.462 0.445 0.519 0.302 0.513 0.316 0.386 0.614 0.450 0.475
30-50 0.632 0.228 0.546 0.423 0.581 0.347 0.640 0.208 0.470 0.573 0.533 0.451
50-100 0.623 0.234 0.520 0.468 0.634 0.207 0.604 0.282 0.461 0.580 0.522 0.463
SD – 20.759 0.203 19.560 0.292 18.526 0.365 20.105 0.253 15.433 0.560 17.715 0.420
ZH forest
ECEC 0-20 83.288 0.309 68.589 0.524 73.800 0.431 73.661 0.451 53.449 0.711 66.399 0.554
40-60 50.873 0.433 37.079 0.699 46.296 0.530 40.890 0.634 31.315 0.785 38.965 0.667
pH 0-20 0.939 0.517 0.785 0.663 0.874 0.582 0.841 0.613 0.619 0.790 0.748 0.694
40-60 0.943 0.458 0.719 0.685 0.786 0.624 0.775 0.635 0.581 0.795 0.680 0.719
BD 0-20 0.059 0.205 0.049 0.449 0.053 0.352 0.051 0.415 0.038 0.670 0.046 0.515
40-60 0.051 0.282 0.044 0.462 0.045 0.430 0.053 0.218 0.039 0.575 0.043 0.489
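
The two statistics reported in Table A.11 can be sketched as below. The skill score is written here in its common form, one minus the ratio of the sum of squared prediction errors to the sum of squared deviations of the observations from their mean; that this is exactly the definition used for the table is an assumption, and the function names are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean squared error of predictions against observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def ss_mse(obs, pred):
    """Mean squared error skill score (assumed form).

    1 means perfect prediction, 0 means no better than predicting the
    mean of the observations, negative values mean worse than the mean.
    """
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst
```

Under this definition the negative entries in the table (for example georob for SOM in 50-100 cm in Berne) indicate predictions that are less accurate than the mean of the observations.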


Table A.12.: Ratio [%] of squared bias to mean squared error (MSE) calculated for independent
validation data (leg. map: legacy soil map 1:5 000, lasso: grouped least absolute shrinkage
and selection operator, georob: robust external-drift kriging, geoGAM: geoadditive model, BRT:
boosted regression trees, RF: random forest, MA: model averaging, NA: no convergence of georob
algorithm).
area response depth [cm] leg. map lasso georob geoGAM BRT RF MA
Berne clay 0-10 0.497 2.795 2.396 1.272 0.924 1.695
10-30 0.570 1.556 1.206 0.025 0.280 0.764
30-50 0.045 0.077 0.190 0.022 0.057 0.026
50-100 0.377 0.753 1.198 1.551 1.897 1.305
silt 0-10 1.035 3.832 0.036 0.129 0.184 0.540
10-30 0.736 1.064 0.055 0.090 0.041 0.202
30-50 0.281 0.751 0.000 0.024 0.000 0.017
50-100 0.058 0.444 0.376 0.747 0.938 0.504
gravel 0-10 3.690 0.390 0.233 1.455 2.247 0.053
10-30 3.643 0.573 0.470 0.248 1.145 0.052
30-50 5.153 1.241 0.051 1.053 1.003 0.028
50-100 8.369 1.106 0.103 0.760 1.133 0.057
SOM 0-10 2.042 0.353 0.362 0.704 1.061 0.068
10-30 0.105 1.180 0.032 2.794 5.486 1.609
30-50 8.433 0.934 0.103 8.350 14.196 4.245
50-100 12.137 1.425 2.393 7.269 10.268 1.388
pH 0-10 0.883 0.242 0.562 0.126 0.313 0.402
10-30 0.778 0.194 0.227 0.222 0.379 0.353
30-50 1.515 0.039 0.731 0.019 0.352 0.272
50-100 3.605 0.308 1.155 0.447 3.288 1.667
SD – 2.059 4.671 2.808 3.002 3.003 2.841
Greifensee clay 0-10 10.500 0.897 1.341 1.120 0.497 1.063 1.038
10-30 12.700 0.198 0.177 0.299 0.065 0.234 0.200
30-50 11.500 1.080 0.000 0.020 0.018 0.122 0.117
50-100 14.300 0.495 0.056 0.027 0.001 0.007 0.007
silt 0-10 0.071 0.002 0.036 0.000 0.014 0.001
10-30 0.029 0.042 0.000 0.370 0.007 0.042
30-50 0.300 0.104 0.213 0.488 0.459 0.312
50-100 2.079 0.981 1.265 1.545 1.924 1.594
gravel 0-10 26.900 13.198 5.548 0.200 0.004 0.570 2.240
10-30 19.800 14.220 0.067 0.264 0.687 1.141 1.842
30-50 9.600 14.539 6.562 1.379 1.477 1.519 4.158
50-100 0.400 16.701 7.589 1.385 2.463 2.081 5.164
SOM 0-10 5.954 0.002 3.532 0.766 0.759 1.388
10-30 8.024 0.217 2.789 0.981 0.432 1.181
30-50 6.717 3.632 5.074 0.470 0.719 2.684
50-100 5.775 NA 0.028 1.402 1.291 1.739
pH 0-10 6.285 4.702 4.252 4.931 6.038 5.532
10-30 4.715 4.694 4.600 4.023 5.564 5.084
30-50 6.604 4.615 5.982 5.020 7.798 6.450
50-100 5.570 2.378 2.897 5.097 5.551 4.438
SD – 4.400 0.064 1.064 0.003 0.007 0.798 0.261
ZH forest ECEC 0-20 2.839 1.540 0.707 0.692 0.754 0.243
40-60 0.244 10.645 1.484 4.575 3.755 5.355
pH 0-20 1.116 1.348 0.314 0.460 2.414 1.202
40-60 0.687 3.136 1.623 0.143 0.000 0.094
BD 0-20 4.475 0.043 0.033 0.040 0.407 0.189
40-60 0.181 0.672 0.154 0.198 1.352 0.000
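
The quantity in Table A.12 follows from splitting the mean squared error into squared bias and error variance. A minimal sketch, with a hypothetical function name and the sign convention prediction minus observation as an assumption (the sign does not affect the ratio):

```python
def bias_share_percent(obs, pred):
    """Ratio [%] of squared bias to mean squared error."""
    n = len(obs)
    errors = [p - o for o, p in zip(obs, pred)]
    bias = sum(errors) / n                 # mean error
    mse = sum(e ** 2 for e in errors) / n  # mean squared error
    return 100.0 * bias ** 2 / mse
```

A value near 0 % means that the prediction error is almost entirely random scatter, whereas a value near 100 % means that the predictions are systematically shifted.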

Table A.13.: Optimal model parameters selected by minimizing RMSE in 10-fold cross-validation or out-of-bag for RF for Berne study region by soil depth
[cm] (λ: lasso shrinkage parameter, pl: number of covariates with non-zero coefficients, pg: number of covariates in final georob model, σn²: nugget effect, σp²:
partial sill, α: effective range [m], ψ: robustness tuning parameter, ptot: total number of covariates in final geoGAM, ps: number of smooth effects in final
geoGAM, nt: number of boosting iterations in BRT, i: interaction depth, pp: percentage of covariates with non-zero importance, mtry: number of covariates
randomly sampled as candidates at each split in RF).
lasso georob geoGAM BRT RF model averaging weights
response depth λ pl pg σn² σp² α ψ ptot ps nt i pp mtry lasso georob geoGAM BRT RF
clay 0-10 0.617 16 14 19.26 21.46 26870.2 1.75 17 15 98 5 44.9 273 0.172 0.198 0.197 0.190 0.242
10-30 0.869 12 39 22.55 1.34 × 10^4 1.71 × 10^7 1.75 5 3 40 6 26.3 325 0.170 0.198 0.188 0.193 0.251
30-50 1.204 7 9 33.84 22.90 6998.3 1.75 12 10 48 3 17.7 191 0.171 0.197 0.199 0.189 0.244
50-100 1.969 1 28 37.41 34.66 1073.0 1.75 7 7 14 20 33.4 304 0.175 0.198 0.191 0.189 0.247
silt 0-10 1.303 5 6 28.71 39.30 1.44 × 10^4 1.75 5 3 28 8 27.0 75 0.173 0.196 0.194 0.194 0.244
10-30 1.021 13 7 39.16 47.43 1.25 × 10^4 1.75 7 2 30 11 34.8 181 0.174 0.195 0.194 0.190 0.246
30-50 0.784 16 22 59.69 48.07 7072.1 1.75 7 3 32 11 36.7 113 0.175 0.197 0.193 0.189 0.246
50-100 1.226 15 16 94.59 65.43 1290.7 1.75 8 3 24 20 44.9 274 0.180 0.197 0.193 0.183 0.248
gravel 0-10 0.052 26 28 0.33 0.20 2897.5 1.75 10 2 84 7 47.6 119 0.174 0.212 0.186 0.183 0.244
10-30 0.058 28 22 0.35 0.23 2928.3 1.75 15 9 152 3 44.1 92 0.172 0.206 0.195 0.184 0.243
30-50 0.106 15 21 0.62 0.25 3759.9 1.75 10 4 22 18 39.9 109 0.177 0.200 0.187 0.192 0.243
50-100 0.138 20 26 0.89 0.51 3644.3 1.75 13 6 38 7 30.1 131 0.176 0.197 0.193 0.188 0.245
SOM 0-10 0.033 24 29 0.09 0.10 2385.7 1.75 11 5 184 5 56.5 141 0.153 0.204 0.182 0.203 0.258
10-30 0.039 29 34 0.14 0.09 1857.4 1.75 13 8 106 6 49.5 337 0.153 0.206 0.191 0.195 0.256
30-50 0.537 5 16 0.60 1.78 × 10^4 3.98 × 10^9 1.75 14 10 150 5 53.0 384 0.119 0.189 0.157 0.238 0.296
50-100 0.566 11 11 6.32 2.42 × 10^5 1.52 × 10^9 1.75 11 6 118 5 45.6 116 0.161 0.002 0.118 0.311 0.408
pH 0-10 0.045 20 22 0.12 0.13 1114.7 1.75 13 5 250 1 27.7 232 0.176 0.204 0.194 0.187 0.240
10-30 0.040 30 20 0.13 0.13 1176.6 1.75 7 1 100 6 48.6 121 0.174 0.202 0.191 0.191 0.243
30-50 0.045 30 19 0.20 0.14 3541.3 1.75 10 6 144 9 64.0 391 0.171 0.199 0.195 0.189 0.246
50-100 0.041 26 34 0.12 0.24 416.9 1.75 19 12 190 14 74.8 288 0.171 0.200 0.192 0.190 0.247
SD – 1.782 25 22 619.62 101.10 4634.1 1.75 10 4 28 16 42.3 183 0.181 0.194 0.192 0.186 0.247
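
The model averaging weights in the last five columns can be related to the cross-validation RMSE of Table A.11. For untransformed responses, weights proportional to the inverse RMSE reproduce the tabulated values, for example for clay in 0-10 cm in Berne. The sketch below illustrates this; that the weights were actually computed this way is inferred from the numbers and is an assumption, the match is not exact for the transformed responses (gravel, SOM), and the function names are hypothetical.

```python
def inverse_rmse_weights(rmse_by_model):
    """Weights proportional to 1 / RMSE, normalized to sum to one."""
    inverse = {model: 1.0 / value for model, value in rmse_by_model.items()}
    total = sum(inverse.values())
    return {model: value / total for model, value in inverse.items()}


def model_average(predictions, weights):
    """Weighted mean of the predictions of the individual models."""
    return sum(weights[model] * predictions[model] for model in weights)


# Cross-validation RMSE for clay, 0-10 cm, Berne (Table A.11).
rmse_clay_0_10 = {
    "lasso": 6.418,
    "georob": 5.592,
    "geoGAM": 5.623,
    "BRT": 5.814,
    "RF": 4.569,
}
weights = inverse_rmse_weights(rmse_clay_0_10)
# Rounded to three digits: 0.172, 0.198, 0.197, 0.190, 0.242,
# which agrees with the first row of Table A.13.
```

With such weights the methods contribute almost equally whenever their RMSE values are similar, which is consistent with the narrow range of weights seen for most responses.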
Table A.14.: Model parameters for Greifensee and ZH forest study regions (for description see Table A.13, NA: no convergence of georob algorithm).
lasso georob geoGAM BRT RF model averaging weights
response depth λ pl pg σn² σp² α ψ ptot ps nt i pp mtry lasso georob geoGAM BRT RF
Greifensee
clay 0-10 0.517 21 25 3.29 33.37 279.4 1.75 12 5 134 2 30.0 455 0.180 0.193 0.193 0.191 0.244
10-30 0.514 22 27 4.82 31.90 259.2 3.00 11 5 44 8 37.8 245 0.182 0.195 0.195 0.187 0.242
30-50 0.930 11 26 1.72 48.34 158.1 1.75 10 7 32 14 44.4 273 0.177 0.195 0.196 0.185 0.247
50-100 0.836 13 25 51.10 16.52 7074.3 1.75 17 15 164 1 21.6 225 0.179 0.195 0.193 0.186 0.247
silt 0-10 0.656 6 27 6.02 22.59 279.3 1.75 15 12 28 11 34.1 316 0.177 0.197 0.195 0.187 0.245
10-30 0.662 7 25 1.42 30.36 237.0 1.75 11 8 16 42 51.8 414 0.178 0.194 0.195 0.188 0.245
30-50 0.798 8 37 2.26 40.87 233.0 1.75 7 5 40 9 37.1 256 0.180 0.195 0.192 0.186 0.248
50-100 1.224 4 38 6.40 60.18 270.8 1.75 6 4 12 12 21.8 413 0.181 0.199 0.190 0.185 0.244
gravel 0-10 0.111 20 22 1.19 0.00 3.6 1000.00 10 4 36 10 41.0 212 0.183 0.193 0.191 0.189 0.245
10-30 0.100 22 24 0.11 1.05 191.7 1000.00 11 5 48 3 20.0 284 0.182 0.197 0.193 0.184 0.244
30-50 0.128 21 43 1.45 0.41 2244.8 1.75 10 6 40 2 12.0 410 0.185 0.184 0.193 0.191 0.247
50-100 0.163 19 20 1.56 0.72 1339.3 1.75 11 6 44 2 13.1 53 0.180 0.197 0.188 0.189 0.246
SOM 0-10 0.021 38 48 0.01 0.14 240.8 1.75 27 22 90 20 71.7 168 0.179 0.207 0.180 0.187 0.247
10-30 0.036 21 49 0.01 0.19 279.5 1.75 19 14 24 18 40.2 147 0.175 0.198 0.191 0.183 0.253
30-50 0.077 39 6 0.46 0.16 2739.5 1.75 14 12 12 11 15.7 78 0.178 0.190 0.178 0.189 0.264
50-100 0.596 11 NA NA NA NA NA 11 8 36 16 44.8 382 0.250 0.000 0.057 0.301 0.392
pH 0-10 0.030 35 36 0.04 0.25 214.1 1.75 17 11 160 10 72.1 250 0.173 0.205 0.183 0.189 0.250
10-30 0.031 29 42 0.04 0.24 260.9 1.75 16 9 134 7 62.3 131 0.175 0.207 0.184 0.186 0.248
30-50 0.045 28 25 0.03 0.29 279.8 1.75 11 8 32 6 26.8 50 0.179 0.208 0.195 0.177 0.241
50-100 0.061 25 25 0.04 0.27 319.5 1.75 7 1 86 3 30.3 98 0.180 0.216 0.177 0.186 0.243
SD – 0.712 49 45 32.38 323.81 6.0 1000.00 14 11 44 5 27.9 384 0.180 0.191 0.202 0.186 0.242
ZH forest
ECEC 0-20 0.023 33 45 0.01 0.19 207.4 1.75 11 6 72 42 75.3 128 0.170 0.200 0.185 0.187 0.257
40-60 0.014 53 59 0.01 0.09 211.7 1.75 13 3 124 16 85.7 120 0.158 0.216 0.173 0.196 0.256
pH 0-20 0.029 46 38 0.05 0.79 89.0 1.75 20 16 96 18 81.2 127 0.169 0.203 0.182 0.189 0.257
40-60 0.029 44 41 0.07 0.77 188.8 1.75 30 24 780 2 77.3 138 0.157 0.206 0.189 0.192 0.256
BD 0-20 0.007 8 26 0.00 0.00 276.6 1.75 7 2 36 18 65.0 320 0.166 0.199 0.184 0.193 0.258
40-60 0.009 6 9 0.00 0.00 1048.4 1.75 4 3 24 7 32.1 290 0.181 0.209 0.203 0.173 0.235
A.3. Supplementary material to Chapter 5

Descriptive statistics of calculated SOC stocks

Table A.15.: Descriptive statistics of SOC stock, calculated for the mineral topsoil (0–30 cm),
the mineral soil to 100 cm depth and the subsoil (30–100 cm, cs: calibration set [n = 858], vs:
validation set [n = 175], stdv: standard deviation, MAD: median absolute deviation).

SOC 0–30 cm SOC 0–100 cm SOC 30–100 cm
cs vs cs vs cs vs
[kg m−2] [kg m−2] [kg m−2] [kg m−2] [kg m−2] [kg m−2]
minimum 0.83 2.66 0.99 2.80 0.00 0.00
maximum 36.10 22.60 96.39 56.97 63.17 37.27
mean 7.09 8.09 10.89 13.29 3.80 5.20
median 6.09 7.30 8.94 11.18 2.79 3.56
stdv 4.10 4.12 7.25 8.33 4.06 5.16
MAD 3.03 3.45 4.53 6.05 2.14 3.03

Evolution of root mean squared error in model building

Figure A.11.: Mean squared errors (MSE) plotted against the complexity of the model for
SOC stock in 0–100 cm depth. Shown are MSE of the complete LASSO selection path and of the
robust REML model fits after model reduction by LASSO, backward selection by tenfold cross
validation and of the final model after aggregating levels of categorical covariates (p: number of
covariates in model). The vertical bars give the standard errors computed from the 10 cross-
validation subsets.


Information on covariates representing parent material and soil

Table A.16.: Description of map units used to represent parent material and pedogenetic
conditions. The physiogeographic units A–Z of the soil map (SM, map scale 1:200 000, FSO, 2000a)
were intersected with selected polygons of the Geological Map of Switzerland (GM, map scale
1:500 000, Swisstopo, 2005) and with units of the maps of the Last Glacial Maximum (LGM,
map scale 1:500 000, Swisstopo, 2009) and of the Biogeographic Subregions (Gonseth et al.,
2001).

Label Description
A Tabular Jura
B Basins and valleys in Haute Chaine and Tabular Jura
C Elongated valleys in Haute Chaine Jura
D Plateau Jura
E Ridges in Haute Chaine Jura
F Plains on lower Central Plateau
G Moraine hills
H Lower Molasse hills partly covered by moraines
J Fluvial valleys on Central Plateau
K Molasse hills at intermediate altitude partly shaped by glaciers
L Drumlin landscapes with marked relief
M Higher Molasse hills with marked relief shaped by erosion (Hörnli)
N Higher Molasse hills with marked relief shaped by erosion (Napf)
O Northern Alpine foothills consisting predominantly of sandy Molasse
P Northern Alpine foothills consisting predominantly of Molasse conglomerates
Q Wide Alpine valleys
R Narrow Alpine valleys
S Alpine Flysch and Bündner slate, mainly within the Northern Alps
T Alpine Bündner slate in the upper Rhone valley and Ticino
U Alpine limestone mountains
V Alpine mountains of crystalline basement consisting of hard rocks
W Alpine mountains of crystalline basement consisting of easily weatherable rocks
X Southern Alpine foothills consisting of Molasse sediments and partially covered by moraines
Y Fluvial valleys in Ticino
Z Plains of Magadino and Mendrisio
Additionally created units:
FLY Flysch formations tending to form wetlands (GM units ’ha’, ’hd’, ’hj’, ’ie’, ’ja’, ’jd’, ’jc’)
DEC Old fluvioglacial gravel-rich terraces dominantly with strongly acid soils (GM unit ’an’ within SM units F, G, H, P, A, E and J)
MOR Old glacial till and moraines dominantly with strongly acid soils (GM unit ’al’ outside LGM)
MOW Younger glacial drift with less acid soils (GM unit ’al’ within LGM)
Uv Permian sandstones (Verrucano) often carrying podzols, but intermingled with otherwise calcareous soils (GM units ’jo’, ’fq’ and ’fp’ within SM unit U)
Vsa SM unit V within the Biogeographic Subregion SA1 (Southern Alps including Poschiavo and Val Bregaglia)
Vst SM unit V within the Biogeographic Subregion SA2 (Southern Ticino)
Wsa SM unit W within the Biogeographic Subregion SA1
Wst SM unit W within the Biogeographic Subregion SA2


Estimated parameters of final models for SOC stocks

Table A.17.: Regression coefficients β and standard errors (SE) for each covariate used in the
final model for prediction of SOC stock in 0–30 cm. Soil map units (factor with 34 levels, see
Table A.16) encoded as treatment contrasts with first level of categorical covariate as reference.
TPI500: topographic position index (Jenness, 2006) with radius of 500 m.
Covariate β SE
Intercept 4.1220 0.2317
Mean annual precipitation (square root) [mm] 0.0109 0.0014
Reflection in near-infrared band of SPOT5 mosaic [%] -0.0011 0.0034
TPI500 for soil map units rich in clay (SM units A, E, FLY) [m] -0.0018 0.0005
TPI500 for soil map units poor in clay (remaining SM units) [m] 0.0029 0.0001
Mass of soil particles < 2 mm assigned to geotechnical map units [0.1 kg m−2] -0.0003 0.0001
Soil map units A, E, FLY, Vst, Wsa, Wst (reference units) 0.0000
B, C, D, G, F, H, L, K, M, N, MOW, MOR, DEC -0.3770 0.0623
J, Q, O, P, V, W, X, Y, Z, Vsa -0.4978 0.0574
R, S, T, U -0.2596 0.0555

Table A.18.: Regression coefficients β and standard errors (SE) for each covariate used in the
final model for prediction of the SOC stock in 0–100 cm. Soil map units (Table A.16) encoded
as treatment contrasts with unit A as reference.
Covariate β SE
Intercept 3.8429 0.2244
Mean March precipitation (square root) [mm] 0.0336 0.0049
Reflection in near-infrared band of SPOT5 mosaic [%] -0.0094 0.0036
Slope angle (resolution 2 m) [°] 0.0037 0.0017
Soil map units A (reference unit) 0.0000
B, C, D, G, F, H, L, V, Y, MOR, MOW, DEC -0.2740 0.1440
E, FLY, Vsa, Wsa, Wst 0.1761 0.1504
J, Q, O, P, X, Z -0.4165 0.1501
K -0.3772 0.1589
M, N -0.2905 0.1744
R, S, T, U, W -0.0754 0.1466
Uv -1.1397 0.2262
Vst 0.5959 0.1695

Table A.19.: Estimated parameters of the exponential variograms fitted for the final models
for SOC stocks in 0–30 and 0–100 cm, showing weak spatial autocorrelation. For both models an
effective range of about 600 m (approximately three times the range parameter α) was fitted.
nugget τ² [(log(kg m−2))²] sill σ² [(log(kg m−2))²] range α [m]
0–30 cm 0.0008 0.0013 203.6
0–100 cm 0.0009 0.0013 211.1
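
The relation between the range parameter α in Table A.19 and the effective range of about 600 m stated in the caption can be sketched as follows. For an exponential variogram the sill is only approached asymptotically, so the effective range is conventionally taken as the distance at which 95 % of the sill is reached, which is about 3α. In the sketch the tabulated sill is treated as a partial sill (an assumption), and the function names are hypothetical.

```python
import math


def exp_variogram(h, nugget, partial_sill, alpha):
    """Semivariance of an exponential variogram at lag distance h."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-h / alpha))


def effective_range(alpha, fraction=0.95):
    """Distance at which the given fraction of the partial sill is reached."""
    return -alpha * math.log(1.0 - fraction)


# Range parameters from Table A.19 [m].
range_0_30 = effective_range(203.6)    # roughly 610 m
range_0_100 = effective_range(211.1)   # roughly 632 m
```

Both values are consistent with the effective range of about 600 m given in the caption.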


Partial residual plots for covariates of final models


The partial residual plots (e.g. Faraway, 2005, p. 72) in Figs. A.12 and A.13 reflect
the positive coefficients of precipitation by ascending and the negative coefficients
of near-infrared reflectance by descending curves (solid line). The partial residuals
of the soil map units (panel f in Fig. A.12 and in Fig. A.13) show large SOC stocks
for map units belonging to the Jura mountains with dominantly calcareous soils
(A, E), the Flysch formations (FLY), both rich in clay, and the Southern Alps
(Wsa, Wst, Vsa, Vst). Very small SOC stocks were found on Permian Verrucano
(Uv).
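
For readers unfamiliar with partial residual plots, the plotted quantity can be sketched as follows. For covariate j the partial residual of an observation is the model residual plus the fitted contribution of that covariate, so a plot against the covariate shows a trend with slope equal to its coefficient. The sketch assumes a linear model with already estimated coefficients; the function name and the numbers in the usage example are purely illustrative and not taken from the thesis.

```python
def partial_residuals(residuals, beta_j, x_j):
    """Partial residuals for covariate j: residual + beta_j * x_j."""
    return [r + beta_j * x for r, x in zip(residuals, x_j)]


# Illustrative values only: three observations, coefficient 2.0.
example = partial_residuals([0.1, -0.1, 0.0], 2.0, [1.0, 2.0, 3.0])
```

Some implementations centre the covariate before multiplying by the coefficient, which shifts the points vertically but leaves the slope unchanged.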

[Figure: partial residual plots, panels a) to f), with partial residuals on the y-axis. x-axes: a) square root of mean annual precipitation, b) near-infrared reflectance, c) TPI 500 m * soils poor in clay, d) TPI 500 m * soils rich in clay; the x-axis labels of panels e) and f) are not recoverable from the extracted text.]
● ●
● ●
● ●


● ● ● ●
● ● ● ● ●
● ● ●
● ● ●
● ● ●

● ●

−0.5 ● ●

● ●
● ●

−1.0


● ●

A,E,FLY, B,C,D,G,F, J,Q,O,


1600 1800 2000 2200 2400 Vst,Wsa, H,K,L,M,N, P,V,W,X,
Wst MOW,MOR,DEC Y,Z,Vsa R,S,T,U Uv

soil particles < 2 mm soil map units

Figure A.12.: Partial residual plots for each covariate of the final model for prediction of the
SOC stock in 0–30 cm depth (TPI500: topographic position index with radius of 500 m [> 0 for
mounds and < 0 for depressions]; solid lines: fitted coefficient; dashed lines: fitted coefficients ±
SE, see Table A.16 for an explanation of the labels of the soil map units).

A.3. Supplementary material to Chapter 5

[Figure A.13: scatter plots of partial residuals (y-axis) against each covariate (x-axis). Panel x-axes: a) square root of mean March precipitation; b) near-infrared reflectance; c) slope angle; d) soil map units.]

Figure A.13.: Partial residual plots for each covariate of the final model for prediction of the
SOC stock in 0–100 cm depth. (solid lines: fitted coefficient; dashed lines: fitted coefficients ±
SE, see Table A.16 for an explanation of the labels of the soil map units).

Monte-Carlo approximation of the lognormal block kriging variance

To approximate $\mathrm{Var}[S(B_k) - \tilde{S}(B_k)]$ (Eq. 5.20 in Chapt. 5), we randomly selected, without replacement, $n_k$ nodes from i) the $N_k$ nodes of the 100-m grid falling into region $B_k$, or ii) all $N_k$ grid nodes discretizing the forest area of Switzerland, and computed for each such sample the approximation

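\begin{align}
\mathrm{Var}[S(B_k) - \tilde{S}(B_k)] \approx{} & \frac{1}{N_k^2} \sum_{s_i \in B_k} \mathrm{Var}[S(s_i) - \tilde{S}(s_i)] \tag{A.1} \\
& + \frac{N_k - 1}{N_k \, n_k (n_k - 1)} \sum_{s_i \in \text{sample}} \; \sum_{s_j \in \text{sample},\, s_j \neq s_i} \mathrm{Cov}[S(s_i) - \tilde{S}(s_i), S(s_j) - \tilde{S}(s_j)]. \notag
\end{align}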

For the ecoregions, we computed the above expression for 1 000 independently
chosen samples, each sample consisting of $\max(0.01\,N_k, 500)$ nodes in $B_k$, and
approximated $\mathrm{Var}[S(B_k) - \tilde{S}(B_k)]$ by their mean.
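The sampling scheme behind Eq. A.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the thesis: the function names, the `cov` callback and the seed handling are assumptions introduced here. The point variances enter exactly, whereas the double sum over covariances is evaluated on a random subsample and rescaled.

```python
import random


def mc_block_variance(point_var, cov, n_sample, rng):
    """One Monte-Carlo evaluation of Eq. A.1 for a block with N_k = len(point_var) nodes.

    point_var[i] -- variance of the point prediction error at node i
    cov(i, j)    -- covariance of the prediction errors at nodes i and j
                    (hypothetical callback, e.g. an implementation of Eq. A.2)
    """
    n_nodes = len(point_var)
    # First term: the point variances of all N_k nodes enter exactly.
    var_term = sum(point_var) / n_nodes ** 2
    # Second term: covariances are computed only for n_k nodes drawn
    # randomly without replacement, then rescaled.
    sample = rng.sample(range(n_nodes), n_sample)
    cov_sum = sum(cov(i, j) for i in sample for j in sample if i != j)
    scale = (n_nodes - 1) / (n_nodes * n_sample * (n_sample - 1))
    return var_term + scale * cov_sum


def mc_block_variance_mean(point_var, cov, n_sample, n_rep, seed=0):
    """Average of n_rep independent evaluations of Eq. A.1."""
    rng = random.Random(seed)
    values = [mc_block_variance(point_var, cov, n_sample, rng) for _ in range(n_rep)]
    return sum(values) / n_rep
```

With a constant covariance between all pairs of nodes, the rescaled sample sum reproduces the full double sum exactly, which gives a convenient sanity check for such an implementation.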

A. Appendix

Figure A.14.: Mean (averaged up to i samples) of approximated block kriging variances com-
puted each by Eq. A.1 for random samples of size nk = 500 (solid line) and value of block kriging
variance computed directly by Eq. 5.20 in main article for stratum Alps ≤600 m (Nk = 9 367
grid nodes, grey dashed line).

To predict the mean stocks for the whole of Switzerland, we averaged the approximations
for 2 000 samples, each sample consisting of about 5 500 randomly chosen
grid nodes. Figure A.14 shows that the Monte-Carlo approximation is excellent
for a region with 9 367 grid nodes, for which we could evaluate Eq. 5.20 in the main
article directly. In Eq. A.1, the covariance between the lognormal point prediction
errors at two locations $s_i$ and $s_j$ was computed by

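\begin{align}
\mathrm{Cov}[S(s_i) - \tilde{S}(s_i), S(s_j) - \tilde{S}(s_j)] = \mu_{\hat{\theta}}(s_i) \, \mu_{\hat{\theta}}(s_j) \, \Big\{ & \exp\left( \mathrm{Cov}[Y(s_i), Y(s_j)] \right) \tag{A.2} \\
& - \exp\left( \mathrm{Cov}[Y(s_i), \tilde{Y}(s_j)] \right) - \exp\left( \mathrm{Cov}[\tilde{Y}(s_i), Y(s_j)] \right) \notag \\
& + \exp\left( \mathrm{Cov}[\tilde{Y}(s_i), \tilde{Y}(s_j)] \right) \Big\} \notag
\end{align}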

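with $\mu_{\hat{\theta}}(s_i)$ (and $\mu_{\hat{\theta}}(s_j)$ analogously) approximated by

\[
\mu_{\hat{\theta}}(s_i) \approx \exp\left( x(s_i)^{\mathrm{T}} \hat{\beta}_{\hat{\theta}} + \tfrac{1}{2} \left( \hat{\tau}^2 + \hat{\sigma}^2 \right) \right).
\]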
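As a hedged illustration of Eq. A.2 and the approximation of $\mu_{\hat{\theta}}(s_i)$, the following Python sketch evaluates both expressions for given inputs. The function and argument names are assumptions made for this example; the four log-scale covariances are taken as already computed.

```python
import math


def lognormal_mean(x, beta, tau2, sigma2):
    """Approximate mu(s) = exp(x(s)^T beta + (tau^2 + sigma^2) / 2)."""
    linear_predictor = sum(xk * bk for xk, bk in zip(x, beta))
    return math.exp(linear_predictor + 0.5 * (tau2 + sigma2))


def lognormal_error_cov(mu_i, mu_j, c_y_y, c_y_pred, c_pred_y, c_pred_pred):
    """Covariance of two lognormal point prediction errors (Eq. A.2).

    c_y_y       -- Cov[Y(s_i), Y(s_j)]
    c_y_pred    -- Cov[Y(s_i), Y~(s_j)]
    c_pred_y    -- Cov[Y~(s_i), Y(s_j)]
    c_pred_pred -- Cov[Y~(s_i), Y~(s_j)]
    """
    return mu_i * mu_j * (
        math.exp(c_y_y)
        - math.exp(c_y_pred)
        - math.exp(c_pred_y)
        + math.exp(c_pred_pred)
    )
```

If all four log-scale covariances coincide, the bracket in Eq. A.2 cancels and the error covariance is zero, which serves as a simple check.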

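$\mathrm{Cov}[Y(s_i), Y(s_j)]$ can be computed from the estimated variogram, exploiting the
well-known relation between a weakly stationary variogram and an autocovariance
function (e.g. Diggle and Ribeiro, 2007, p. 47), and the remaining covariance terms
are given by

\begin{equation}
\mathrm{Cov}[\tilde{Y}(s_i), \tilde{Y}(s_j)] =
\left( \gamma_{\hat{\theta}}(s_i)^{\mathrm{T}} \Gamma_{\hat{\theta}}^{-1}, \; x(s_i)^{\mathrm{T}} \right)
\mathrm{Cov}\left[ \begin{pmatrix} \hat{Z}_{\hat{\theta}} \\ \hat{\beta}_{\hat{\theta}} \end{pmatrix}, \left( \hat{Z}_{\hat{\theta}}^{\mathrm{T}}, \hat{\beta}_{\hat{\theta}}^{\mathrm{T}} \right) \right]
\begin{pmatrix} \Gamma_{\hat{\theta}}^{-1} \gamma_{\hat{\theta}}(s_j) \\ x(s_j) \end{pmatrix},
\tag{A.3}
\end{equation}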

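\begin{align}
\mathrm{Cov}[\tilde{Y}(s_i), Y(s_j)]
&= \left( \gamma_{\hat{\theta}}(s_i)^{\mathrm{T}} \Gamma_{\hat{\theta}}^{-1}, \; x(s_i)^{\mathrm{T}} \right)
\mathrm{Cov}\left[ \begin{pmatrix} \hat{Z}_{\hat{\theta}} \\ \hat{\beta}_{\hat{\theta}} \end{pmatrix}, Y(s_j) \right] \notag \\
&= b \left( \gamma_{\hat{\theta}}(s_i)^{\mathrm{T}} \Gamma_{\hat{\theta}}^{-1}, \; x(s_i)^{\mathrm{T}} \right)
M^{-1} \begin{pmatrix} \gamma_{\hat{\theta}}(s_j) \\ X^{\mathrm{T}} \gamma_{\hat{\theta}}(s_j) \end{pmatrix},
\tag{A.4}
\end{align}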

where $b$, $X$, $M$ and the covariance matrix of $(\hat{Z}_{\hat{\theta}}^{\mathrm{T}}, \hat{\beta}_{\hat{\theta}}^{\mathrm{T}})$ are as in Künsch et al.
(2011).
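The relation between a weakly stationary variogram and the autocovariance function mentioned above is $C(h) = C(0) - \gamma(h)$. The Python sketch below illustrates it for an exponential variogram. The exponential form, the parameter names and the decision to count the nugget as part of the total variance are assumptions made for this example only; the variogram model actually fitted is described in Chapter 5.

```python
import math


def exp_variogram(h, nugget, psill, range_par):
    """Exponential variogram with nugget (assumed model); gamma(0) = 0."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / range_par))


def covariance_from_variogram(h, nugget, psill, range_par):
    """Autocovariance C(h) = C(0) - gamma(h), with C(0) = nugget + partial sill."""
    total_sill = nugget + psill
    return total_sill - exp_variogram(h, nugget, psill, range_par)
```

For example, `covariance_from_variogram(0, 0.3, 0.7, 100.0)` returns the total variance, and for any lag `h > 0` the nugget drops out so that only the spatially structured part `psill * exp(-h / range_par)` remains.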

Validation of prediction uncertainty

[Figure A.15: SOC stock 0–100 cm [kg m−2] (y-axis, 0 to 60) against rank of predictions (x-axis, 0 to 150). Legend: 95 %-prediction interval; prediction; calculated stock within interval; calculated stock outside interval.]

Figure A.15.: Ranked predictions of the SOC stock down to 100 cm depth of the mineral soil
for the 175 sites of the validation set, along with 95 %-prediction intervals (vertical grey lines).
Calculated stocks inside the intervals are plotted by open circles, those outside by dark filled
symbols.
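The check visualised in Figure A.15 amounts to counting how many calculated stocks fall inside their prediction intervals. A minimal Python sketch, with hypothetical function and argument names, could look as follows; for nominal 95 % intervals one would expect a coverage close to 0.95.

```python
def interval_coverage(observed, lower, upper):
    """Fraction of observations lying inside their prediction intervals."""
    inside = [lo <= y <= up for y, lo, up in zip(observed, lower, upper)]
    return sum(inside) / len(inside)
```

A coverage clearly above the nominal level would point to intervals that are too wide, a coverage clearly below it to intervals that are too narrow.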


[Figure A.16: two histograms of density (y-axis, 0.0 to 1.5) against PIT (x-axis, 0.0 to 1.0), one for the SOC stock 0–30 cm and one for the SOC stock 0–100 cm, each annotated with a 95 %-prediction interval.]

Figure A.16.: Histograms of probability integral transform (PIT, Gneiting et al., 2007) com-
puted for the calculated SOC stock of the validation set (n = 175). The convex shape of the
histograms indicates slight overestimation of prediction uncertainty.
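The probability integral transform is the predictive cumulative distribution function evaluated at the observed value. The Python sketch below assumes, for illustration only, a lognormal predictive distribution described by its mean and standard deviation on the log scale; the function names and the binning are hypothetical and not taken from the thesis.

```python
import math


def pit_lognormal(y, log_mean, log_sd):
    """PIT of observation y under a lognormal predictive distribution."""
    z = (math.log(y) - log_mean) / log_sd
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pit_histogram(pit_values, n_bins=10):
    """Counts of PIT values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for u in pit_values:
        k = min(int(u * n_bins), n_bins - 1)  # put u = 1.0 into the last bin
        counts[k] += 1
    return counts
```

A roughly flat histogram is what one would expect from well-calibrated predictive distributions (Gneiting et al., 2007).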

Table A.20.: Statistics of relative prediction errors of soil organic carbon (SOC) stocks in two
depth compartments (0–30 cm, 0–100 cm) for the validation set (n = 175). The statistics are
reported for the robust method used in the article (robEDK), for nonrobust external-drift kriging
(EDK), and predictions by nonrobustly (OLS) or robustly fitted (MM estimator) linear regression
models (ignoring residual autocorrelation). The model fits used the same set of covariates as for
the final robEDK.
depth      model     BIAS    RMSE    R2      robBIAS  robRMSE  robR2   CRPS

0–30 cm    robEDK    0.135   0.488   0.346   0.070    0.388    0.337   0.221
           EDK       0.128   0.483   0.349   0.063    0.394    0.342   0.220
           MM est.   0.142   0.519   0.286   0.072    0.407    0.279   0.229
           OLS       0.143   0.500   0.335   0.077    0.389    0.321   0.222

0–100 cm   robEDK    0.152   0.556   0.477   0.066    0.420    0.403   0.247
           EDK       0.147   0.553   0.473   0.067    0.425    0.401   0.248
           MM est.   0.149   0.566   0.482   0.074    0.402    0.408   0.245
           OLS       0.162   0.569   0.468   0.082    0.428    0.391   0.249


A.4. List of publications

Articles in peer-reviewed journals

Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Estimating soil or-
ganic carbon stocks of Swiss forest soils by robust external-drift kriging, Geoscien-
tific Model Development, 7, 1197–1210, doi:10.5194/gmd-7-1197-2014, URL http:
//www.geosci-model-dev.net/7/1197/2014/, 2014.
Nussbaum, M., Papritz, A., Zimmermann, S., and Walthert, L.: Pedotransfer function
to predict density of forest soils in Switzerland, Journal of Plant Nutrition and Soil
Science, 179, 321–326, doi:10.1002/jpln.201500546, 2016.
Nussbaum, M., Spiess, K., Baltensweiler, A., Grob, U., Keller, A., Greiner, L., Schaep-
man, M. E., and Papritz, A.: Evaluation of digital soil mapping approaches with
large sets of environmental covariates, SOIL Discussions, 2017, 1–32, doi:10.5194/
soil-2017-14, URL https://www.soil-discuss.net/soil-2017-14/, 2017a.
Nussbaum, M., Walthert, L., Fraefel, M., Greiner, L., and Papritz, A.: Mapping of soil
properties at high resolution in Switzerland using boosted geoadditive models, SOIL
Discussions, 2017, 1–32, doi:10.5194/soil-2017-13, URL http://www.soil-discuss.
net/soil-2017-13/, (accepted on 06/10/2017 to be published in SOIL), 2017b.

Oral presentations

Nussbaum, M.: Bodenkarten mit statistischen Methoden, oral presentation at PMSoil
Stakeholder Meeting at ETH Zurich, 11 February, 2016a.
Nussbaum, M.: Karten von Bodeneigenschaften mit statistischen Methoden, oral pre-
sentation at PMSoil Stakeholder Meeting at ETH Zurich, 20 May, 2016b.
Nussbaum, M.: Digital Soil Mapping for Switzerland Evaluation of statistical approaches
and mapping of soil properties, oral presentation, PhD Exam 3.3.2017 M. Nussbaum,
Zürich, Switzerland, 2017.
Nussbaum, M. and Papritz, A.: Mapping of soil properties by component wise gradi-
ent boosting, oral presentation at the Conference ”GeoENV 2014”, Mines ParisTech,
Paris, France, 2014.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Modellierung des or-
ganischen Kohlenstoffgehalts und -vorrats in Schweizer Waldböden, oral presentation
at the Annual Meeting of the Swiss Soil Science Society, Bellinzona, Switzerland,
2012a.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Mapping organic car-
bon stocks of Swiss forest soil, Geophysical Research Abstracts, 14, EGU2012–4753,
2012b.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Mapping organic car-
bon stocks of Swiss forest soils, poster presentation at 4th International Congress of
the European Soil Science Societies (Eurosoil 2012), 2012c.


Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Mapping Organic Car-
bon Stocks of Swiss Forest Soils, oral presentation at ASA, CSA & SSSA International
Annual Meeting, Cincinnati, USA, 2012d.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Mapping of Organic
Carbon Stocks in Swiss Forest Soils, oral presentation at Informal Workshop ”Esti-
mating soil carbon stocks and associated uncertainties at field, regional and national
scales”, ETH Zurich, 2013a.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Modellierung organ-
ischer Kohlenstoffvorräte im Schweizer Waldboden, oral presentation at Workshop
”Boden-Projekte im Nationalen Treibhausgasinventar”, Federal Office for the Envi-
ronment FOEN, Bern-Ittigen, 2013b.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Mapping Soil Organic
Carbon Stocks of Swiss Forest Soils by Robust External-drift Kriging, oral presenta-
tion at Soil Science Zvieri Geographical Institute University of Zürich (GIUZ), Zürich,
Switzerland, 2013c.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Modellierung des or-
ganischen Kohlenstoffgehalts und -vorrats in Schweizer Waldböden, oral presentation
at the Annual Meeting of the Swiss Soil Science Society, Zürich, Switzerland, 2013d.
Nussbaum, M., Papritz, A., and Walthert, L.: Predicting density of Swiss forest soils
by component wise gradient boosting, oral presentation at 11th Swiss Geoscience
Meeting, Lausanne, Switzerland, 2013e.
Nussbaum, M., Papritz, A., and Walthert, L.: Modeling of soil density as a pedotransfer
rule for Swiss forest soils, oral presentation at the workshop of the Working Group
Digital Soil Mapping of the German Soil Science Society, Tübingen, Germany, 2013f.
Nussbaum, M., Papritz, A., and Walthert, L.: Modeling of soil density as a pedotransfer
rule for Swiss forest soils, oral presentation at the workshop Digital Mapping and
Water Storage of Forest Soils, 27 June 2013, WSL, Birmensdorf, 2013g.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Digitale Bodeneigen-
schaftskarten mit Boosted Structured Regression am Beispiel des organischen Kohlen-
stoffvorrats im Schweizer Wald, oral presentation at the workshop of the Working
Group Digital Soil Mapping of the German Soil Science Society, Tübingen, Germany,
2014a.
Nussbaum, M., Papritz, A., Walthert, L., and Baltensweiler, A.: Modellierung or-
ganischer Kohlenstoffvorräte im Schweizer Wald, oral presentation at the ”Kollo-
quium Forschungseinheit Waldressourcen und Waldmanagement”, WSL Birmensdorf,
Switzerland, 2014b.
Nussbaum, M., Papritz, A., Fraefel, M., Baltensweiler, A., and Keller, A.: Pre-
dictive mapping of soil properties at high resolution by component wise gradi-
ent boosting, Geophysical Research Abstracts, 17, EGU2015–2106, URL http:
//meetingorganizer.copernicus.org/EGU2015/EGU2015-2106.pdf, 2015.
Nussbaum, M., Papritz, A., Fraefel, M., Baltensweiler, A., and Grob, U.: Kartierung von
Bodeneigenschaften mit statistischen Methoden — Fallstudie Landwirtschaftsfläche
Kanton Zürich, oral presentation at the Annual Meeting of the Swiss Soil Science
Society, Geneva, Switzerland, 2016a.
Nussbaum, M., Papritz, A., Fraefel, M., Baltensweiler, A., and Grob, U.: Bodenkarten
mit statistischen Methoden – Cartes pédologiques avec méthodes statistiques, oral
presentation at ”Aussprache zum Bodenschutz in der Schweiz” of the Swiss Federal
Office of the Environment, Berne, Switzerland, 2016b.
Nussbaum, M., Papritz, A., Fraefel, M., Baltensweiler, A., and Keller, A.: Soil maps
by statistical methods, oral presentation at the IBP PhD Congress 2016, Zürich,
Switzerland, 2016c.
Nussbaum, M., Papritz, A., Fraefel, M., Baltensweiler, A., Walthert, L., Keller, A.,
Grob, U., and Diek, S.: Predictive mapping of soil properties at high resolution by
component wise gradient boosting from legacy data, oral presentation at 7th Global
Digital Soil Mapping Workshop, Aarhus University, Denmark, 2016d.
Nussbaum, M., Spiess, K., Baltensweiler, A., Grob, U., Keller, A., Greiner, L., Schaep-
man, M. E., and Papritz, A.: Evaluation of statistical approaches with large sets of
covariates, oral presentation at Pedometrics 2017 Conference, Wageningen, Nether-
lands, 2017.

Poster presentations

Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Mapping of Organic
Carbon Stocks in Swiss Forest Soils, poster presentation at ITES Research Day 2012,
ETH Zurich, 2012.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Mapping soil or-
ganic carbon stocks by geostatistical and boosted regression models, in: Geophys-
ical Research Abstracts, vol. 15, pp. EGU2013–9287, European Geosciences Union,
General Assembly 2013, Vienna, URL http://meetingorganizer.copernicus.org/
EGU2013/EGU2013-9287.pdf, 2013a.
Nussbaum, M., Papritz, A., Baltensweiler, A., and Walthert, L.: Mapping Soil Organic
Carbon Stocks by Geostatistical and Boosted Regression Models, poster presentation
at ITES Research Day 2013, ETH Zurich, 2013b.
Nussbaum, M., Papritz, A., Fraefel, M., and Baltensweiler, A.: Predictive mapping of
soil pH in forests of Zurich by component wise gradient boosting, poster presentation
at the 12th Swiss Geoscience Meeting, Fribourg, Switzerland, 2014.
Nussbaum, M., Papritz, A., Fraefel, M., Baltensweiler, A., and Gasser, U.: pH-Wert von
Waldböden im Kanton Zürich — räumliche Vorhersage mit statistischen Methoden,
poster presentation at the Annual Meeting of the Swiss Soil Science Society, Basel,
Switzerland, 2015a.
Nussbaum, M., Papritz, A., Fraefel, M., Baltensweiler, A., and Gasser, U.: pH-Wert von
Waldböden im Kanton Zürich — räumliche Vorhersage mit statistischen Methoden,
poster presentation at the workshop of the Working Group Digital Soil Mapping of
the German Soil Science Society, Tübingen, Germany, 2015b.
Nussbaum, M., Papritz, A., Fraefel, M., Baltensweiler, A., and Gasser, U.: Acidic forest
soils — where exactly?, poster presentation at the IBP PhD Congress 2015, Düben-
dorf, Switzerland, 2015c.
Nussbaum, M., Papritz, A., Fraefel, M., Baltensweiler, A., and Keller, A.: Acidic forest
soils — where exactly? Generating spatial soil information by component wise
gradient boosting, poster presentation at Pedometrics 2015 Conference, Córdoba, Spain,
2015d.
Nussbaum, M., Spiess, K., Baltensweiler, A., Grob, U., Keller, A., Greiner, L., Schaep-
man, M. E., and Papritz, A.: Digital soil mapping with a large number of covariates,
poster presentation at the IBP PhD Congress 2017, Dübendorf, Switzerland, 2017.

Other reports and publications

Chervet, A., Hofer, P., Maurer-Troxler, C., Nussbaum, M., Ramseier, L., Schwarz,
R., Sturny, W. G., and Trachsel, P. et al.: Bodenbericht 2009, Tech. rep., Volk-
swirtschaftsdirektion des Kantons Bern, 2009.
Haller, R., Schmidt, R., Nussbaum, M., and Wallner, A.: Geoinformation und Informa-
tionsmanagement in Parks und Parkprojekten in der Schweiz, Vorabklärung für den
Aufbau eines Data Warehouse unter Berücksichtigung der Geoinformationsdaten für
Pärke von nationaler Bedeutung im Auftrag des BAFU. Schlussbericht, Tech. rep.,
Forschungskommission des Schweizerischen Nationalparks, 2008.
Nussbaum, M.: Vorstudie zur Wirkungsbeurteilung des Förderprogramms ”Boden” des
Kantons Bern, durch Analyse der kantonalen Bodenbeobachtung Bern, Certificate
of advanced studies, Diplomarbeit, Institut für mathematische Statistik und Ver-
sicherungslehre der Universität Bern, 2014.
Nussbaum, M.: geoGAM: Select Sparse Geoadditive Models for Spatial Prediction, URL
https://CRAN.R-project.org/package=geoGAM, R package version 0.1-2, last access
29.03.2017, 2017.
Nussbaum, M. and Papritz, A.: Transferfunktionen Nährstoffmesswerte, Bericht, ETH
Zürich, Soil and Terrestrial Environmental Physics, doi:10.3929/ethz-a-010810702,
Version 2, mit kl. Änderung 27. Nov. 2016, 2015a.
Nussbaum, M. and Papritz, A.: Transferfunktion Dichte, Bericht, ETH Zürich, Soil and
Terrestrial Environmental Physics, 2015b.
Nussbaum, M. and Papritz, A.: PTF pH H2O zu CaCl2, Vergleich Feld-Labormessungen,
Bericht, ETH Zürich, Soil and Terrestrial Environmental Physics, 2015c.
Nussbaum, M. and Papritz, A.: Validierung von konventionellen Bodenkarten mit un-
abhängigen Bodendaten – Methodik mit Fallstudie, unpublished, 2017.
Nussbaum, M., Ettlin, L., Cöltekin, A., Suter, B., and Egli, M.: The Relevance of Scale
in Soil Maps, Bulletin BGS, 32, 63–70, 2011.
