Using a 10-Year Radar Archive for Nowcasting Precipitation Growth and Decay:
A Probabilistic Machine Learning Approach
DANIELE NERINI
Federal Office of Meteorology and Climatology, MeteoSwiss, Locarno-Monti, and Institute for
Atmospheric and Climate Science, ETH Zurich, Zurich, Switzerland
LEA BEUSCH
Institute for Atmospheric and Climate Science, ETH Zurich, Zurich, Switzerland
URS GERMANN
Federal Office of Meteorology and Climatology, MeteoSwiss, Locarno-Monti, Switzerland
ABSTRACT
Machine learning algorithms are trained on a 10-yr archive of composite weather radar images in the Swiss
Alps to nowcast precipitation growth and decay in the next few hours in moving coordinates (Lagrangian
frame). The hypothesis of this study is that growth and decay is more predictable in mountainous regions,
which represent a potential source of practical predictability by machine learning methods. In this paper,
artificial neural networks (ANN) are employed to learn the complex nonlinear dependence relating the
growth and decay to the input predictors, which are geographical location, mesoscale motion vectors, freezing
level height, and time of the day. The average long-term growth and decay patterns are effectively reproduced
by the ANN, which allows exploring their climatology for any combination of predictors. Due to the low
intrinsic predictability of growth and decay, its prediction in real time is more challenging, but is substantially
improved when adding persistence information to the predictors, more precisely the growth and decay and
precipitation intensity in the immediate past. The improvement is considerable in mountainous regions, where,
depending on flow direction, the root-mean-square error of ANN predictions can be 20%–30% lower compared
with persistence. Because large uncertainty is associated with precipitation forecasting, deterministic machine
learning predictions should be coupled with a model for the predictive uncertainty. Therefore, we consider a
probabilistic perspective by estimating prediction intervals based on a combination of quantile decision trees and
ANNs. The probabilistic framework is an attempt to address the problem of conditional bias, which often
characterizes deterministic machine learning predictions obtained by error minimization.
Weather and Forecasting, Volume 34 (October 2019)

DOI: 10.1175/WAF-D-18-0206.1

© 2019 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Denotes content that is immediately available upon publication as open access.

Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/WAF-D-18-0206s1.

Corresponding author: Loris Foresti, loris.foresti@meteoswiss.ch

1. Introduction

Forecasting precipitation in the very short range (0–2 h) commonly relies on extrapolation-based nowcasting tools that exploit the persistence of the most recent weather radar observations (see e.g., Germann and Zawadzki 2002). In this time range, many critical decisions are taken to ensure people's safety (e.g., closing of train lines susceptible to debris flow, optimization of airport operations, and evacuation of vulnerable construction zones; see e.g., Germann et al. 2017). Because the costs related to such interruptions are high, these activities need to remain
operational despite the warnings of severe weather issued by forecasters one or more days ahead. A short interruption is considered only when the probability of occurrence or potential damage of a localized hazard is very high. To obtain the best possible prediction skill in the 0–2-h range, one cannot solely rely on numerical weather prediction (NWP) but must also use the available observations in a more direct way. Therefore, improvements of extrapolation-based nowcasting tools can have an important impact on these activities.

a. Sources of uncertainty in persistence-based nowcasting of radar precipitation fields

Sequences of radar precipitation fields exhibit persistence in both the Eulerian and the Lagrangian frame, where the latter assumes persistence in the coordinates moving with the storm (e.g., Zawadzki 1973; Germann and Zawadzki 2002). The forecasting procedure based on Lagrangian persistence involves using an optical flow method to estimate a field of radar echo motion and applying an advection scheme to produce an extrapolation nowcast (e.g., Germann and Zawadzki 2002).

The uncertainty of radar-based extrapolation nowcasts can be categorized into the following main classes (adapted from Germann et al. 2006):

1) Initial condition uncertainty related to radar measurement errors. In a well-maintained and calibrated radar network, the main error sources are related to the space–time variability of the vertical profile of reflectivity (VPR) and the Z–R relationship, partial and total beam blockage, and signal attenuation (e.g., Villarini and Krajewski 2010). The uncertainty also includes spatial and temporal sampling errors of the radar measurements, which may affect the estimation of the radar echo motion (see point 2).

2) Model uncertainty related to imperfections of the nowcasting model and of the selection of model parameters. The main errors are due to inaccuracies of the algorithm for the motion field retrieval, the choice of model parameters and, to a lesser degree, the numerical diffusion of the advection scheme.

3) Model uncertainty related to the assumption of persistence of the atmospheric state. This comprises the unknown future evolution of (i) the precipitation intensity (i.e., its initiation, growth, decay, and termination), (ii) the motion field, and (iii) the statistical properties of precipitation fields (e.g., spatial and temporal autocorrelations, Fourier spectra, degree of intermittency, and probability density function).

The uncertainty of radar-based quantitative precipitation estimation (QPE; point 1 above) can be an important source of nowcast errors in the first hour (e.g., Fabry and Seed 2009). A common approach to characterize this uncertainty is to generate QPE ensembles (e.g., Germann et al. 2009).

For a well-designed nowcasting system starting from a good quality radar QPE product, the main source of uncertainty beyond a lead time of about 30 min arises from precipitation initiation, growth, decay, and termination processes that violate the persistence assumption [point 3(i) above; Bowler et al. 2006; Germann et al. 2006].

The focus of our study is on the predictability of growth and decay (GD), which is the first time derivative of a radar precipitation time series in the Lagrangian frame (see Fig. 1). Precipitation initiation and termination processes are not considered in this study.

b. Predictability of precipitation growth and decay

In this paper, the term predictability refers to the practical predictability of the atmosphere, defined as the extent to which a forecasting technique, in our case, an extrapolation-based or machine learning-based method, provides useful prediction skill (see e.g., Lorenz 1996; Surcel et al. 2015).

The precipitation GD can be decomposed into a predictable and unpredictable component. Most ensemble nowcasting systems do not attempt to predict the GD trend (i.e., the predictable part), but only generate stochastic ensemble members as a way to estimate the forecast uncertainty. Examples of nowcasting systems exploiting the latter
strategy are the Short-Term Ensemble Prediction System (STEPS; Bowler et al. 2006; Seed et al. 2013), the String of Beads Model for Nowcasting (SBMcast; Berenguer et al. 2011), and the stochastic extension of the McGill Algorithm for Precipitation Nowcasting by Lagrangian Extrapolation (MAPLE; Atencia and Zawadzki 2014).

Radhakrishna et al. (2012) studied the scale dependence of the predictability of growth and decay fields by Lagrangian persistence using data from the U.S. national radar composite. Results show that precipitation fields are much more persistent than GD fields, which explains in part why previous attempts of predicting the trend of thunderstorm intensity did not significantly improve the forecast skill (e.g., Tsonis and Austin 1981; Wilson et al. 1998). More precisely, Radhakrishna et al. (2012) found that GD patterns are persistent up to a lead time of 2 h but only for scales larger than 250 km over the continental United States.

The hypothesis underpinning our study is that precipitation GD is more predictable in mountainous regions, which represent a potential source of practical predictability. Compared to the flat continental United States, the predictable spatial scales are expected to be smaller over orography (see e.g., Foresti and Seed 2015; Foresti et al. 2018).

c. Machine learning applications in weather forecasting

First models for statistical weather prediction appeared in the 1950s (e.g., Malone 1955; Lorenz 1956). Machine learning (ML) deals with similar statistical tasks (e.g., classification and regression), but focuses on designing flexible algorithms that maximize predictive power (Breiman 2001b). Statistical weather forecasting with machine learning started in the early 1990s (e.g., McCann 1992; Kuligowski and Barros 1998; Hall et al. 1999) and became more widespread after the year 2000 (see reviews by Haupt et al. 2009; McGovern et al. 2017). The domains of application include, among others, the processing of remote sensing observations (e.g., Marzban and Witt 2001; Foresti et al. 2012; Besic et al. 2016; Beusch et al. 2018), NWP postprocessing (e.g., Kretzschmar et al. 2004; Taillardat et al. 2016; Gagne et al. 2017; Rasp and Lerch 2018), and nowcasting and short-range forecasting (e.g., Manzato 2005; Mecikalski et al. 2015; Han et al. 2017; Sprenger et al. 2017; Ukkonen et al. 2017).

Machine learning surged in popularity in recent years thanks to various advances in computer hardware and algorithms. Processing of large datasets was made possible by the increase in computer memory, storage, and network capabilities. A notable example is the graphics processing unit (GPU) technology, which allows training deeper and more complex artificial neural network (ANN) architectures (e.g., Fukushima and Miyake 1982; Hinton et al. 2006; Goodfellow et al. 2016). In parallel, better training can be obtained by stochastic optimization routines (e.g., Kingma and Ba 2015), while the vanishing gradient problem can be mitigated by using different activation functions [e.g., the rectified linear unit (ReLU); Glorot et al. 2011]. For a historical overview on deep learning, we refer to Schmidhuber (2015).

To our knowledge, the first study that tested the usage of ANNs for precipitation nowcasting is by French et al. (1992). The authors trained an ANN to predict the evolution of synthetic rainfall fields, but did not find significantly higher skill compared to Lagrangian persistence. Grecu and Krajewski (2000) went a step further by separating the prediction problem into two steps: the estimation of the radar echo motion and the use of ANN for statistical prediction of the dynamic precipitation changes (GD). Using radar data from Tulsa, Oklahoma, they did not find a substantial improvement compared to Lagrangian persistence either.

Given the increasing size of radar data archives (Carbone et al. 2002; Fabry et al. 2017; Peleg et al. 2018), it now becomes possible to study the dependence of the predictability of GD on spatial location, time of day, orography, and flow conditions. In short, there is potential to better understand, predict, and correct the forecast error of persistence-based nowcasts.

d. Objectives of this study

The aim of this study is to use machine learning to bring precipitation nowcasting beyond the assumption of Lagrangian persistence. The current paper completes the work of Foresti et al. (2018), who used the same 10-yr archive of radar composite fields in the Swiss Alpine region to derive a climatology of precipitation GD depending on geographical location, freezing level height, mesoscale flow direction, and speed (input predictors).

The first goal of this study is to use ML, more precisely artificial neural networks, to automatically learn the localized dependence of precipitation GD on the input predictors.

The second goal is to estimate the relative importance of input predictors, and evaluate whether the machine learning nowcasts of GD can outperform a reference model based on persistence.

The third goal is to extend ML to give an indication of the forecast uncertainty. This is achieved by computing prediction intervals using a combination of ANN and decision trees (DT).

In this paper, we do not attempt to achieve the best possible predictive performance using the most advanced machine learning methods, but instead we aim to better
FIG. 2. Map of the study domain and the location of weather radars (LEM: Lema, ALB: Albis, DOL: Dôle, PPM: Plaine Morte, WEI: Weissfluhgipfel). The radars covering the dataset period are displayed in red (LEM, ALB, DOL). Two example precipitation boxes at the origin and destination are also shown.
understand the implications of the machine learning approach as a whole. In particular, we focus on the consequences of error minimization and the importance of uncertainty quantification in the context of weather forecasting.

e. Outline of the paper

The paper is structured as follows. Section 2 formulates the statistical learning framework for nowcasting. Section 3 describes the precipitation growth and decay dataset. Section 4 briefly reviews the used machine learning algorithms. Section 5 illustrates the prediction results and their verification. Section 6 introduces the probabilistic machine learning framework and a new method to quantify the prediction uncertainty. Probabilistic GD predictions are shown and verified in section 7. Finally, sections 8 and 9 put the contributions into perspective and conclude the paper.

2. Statistical nowcasting frameworks

a. Nowcasting by persistence

Nowcasting by Lagrangian persistence can be formulated starting from the two-dimensional conservation equation by neglecting the compressibility term (Germann and Zawadzki 2002):

dR/dt = ∂R/∂t + u ∂R/∂x + v ∂R/∂y,   (1)

where R is the rainfall intensity (or radar reflectivity), dR/dt is the source/sink term (growth and decay), ∂R/∂t is the local rate of change, and (u, v) is the flow vector. If we set the source/sink term to zero, we can derive a Lagrangian persistence nowcast by advecting the rainfall R along the motion field (see e.g., Germann and Zawadzki 2002):

R(t + τ, s) = R(t, s − a),   (2)

where τ is the forecast lead time, s = (X, Y) is the spatial location (coordinates), and a = (u, v)τ is the displacement vector. In presence of GD (dR/dt ≠ 0), (2) becomes

R(t + τ, s) = R(t, s − a) + GD,   (3)

where GD is the growth and decay term in moving coordinates (see Figs. 1 and 2).

b. Nowcasting growth and decay with machine learning

Instead of relying only on a short sequence of radar fields and assuming persistence of GD (Radhakrishna et al. 2012), a machine learning approach potentially allows recurrent patterns to be learned from historical archives to be then applied for prediction.

Let us define y as the target variable that we seek to predict (i.e., y = GD). The statistical prediction of GD involves the estimation of a function f as follows:

y(t + τ, s) = f[y(t, s − a_y), x(t, s − a_x)] + ε,   (4)
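As an aside before detailing the terms of (4), the extrapolation nowcast of (2) can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration under simplifying assumptions (a spatially constant motion vector and first-order interpolation); the operational scheme instead uses the full variational echo tracking and semi-Lagrangian advection of Germann and Zawadzki (2002):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def lagrangian_persistence(R, u, v, lead_time):
    """Extrapolation nowcast R(t + tau, s) = R(t, s - a), Eq. (2).

    R         : 2D rainfall field at time t (rows = y, cols = x)
    u, v      : constant motion components (pixels per time unit)
    lead_time : forecast lead time tau (same time unit)
    """
    ny, nx = R.shape
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    # Backward scheme: sample the field at the upstream location s - a,
    # with displacement a = (u, v) * tau.
    coords = np.array([yy - v * lead_time, xx - u * lead_time])
    return map_coordinates(R, coords, order=1, mode="nearest")

# Example: a single rainy pixel advected by 1 pixel/step over tau = 3 steps.
R = np.zeros((8, 8))
R[2, 2] = 1.0
nowcast = lagrangian_persistence(R, u=1.0, v=0.0, lead_time=3)
# In presence of growth and decay (dR/dt != 0), Eq. (3) would add a
# predicted GD term in moving coordinates to this advected field.
```

The backward (semi-Lagrangian) formulation samples upstream values rather than pushing pixels forward, which avoids holes in the advected field.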
where y(t, s − a_y) is the current GD value, x(t, s − a_x) is a vector of external predictors, and ε is the noise term (i.e., the unpredictable part). If a_y ≠ 0 the values are retrieved upstream (Lagrangian frame), while if a_y = 0 they are retrieved at the same location (Eulerian frame); the same can be applied to a_x. Note that one can also use a sequence of k previous y and x values, as well as their temporal increments [e.g., x′ = x(t, s) − x(t − 1, s)]. The sequence of previous GD values exploits persistence and represents an endogenous variable, while the external predictors are exogenous variables. Foresti et al. (2018) provides a comprehensive analysis of the dependence of GD on external predictors over the Swiss Alps, such as the freezing level height, the flow direction, and the geographical location.

In this study, we will train ANNs to predict the GD of the next hour (τ = 1 h) and perform the following experiments:

1) y(t + τ, s) = y(t, s),
2) y(t + τ, s) = f[y(t, s), s],
3) y(t + τ, s) = f[x(t, s − a), s], and
4) y(t + τ, s) = f[y(t, s), x(t, s − a), s].

In summary, experiment 1 assumes Eulerian persistence of the GD, experiment 2 uses ML to improve the persistence nowcast based on the archive and the spatial location, experiment 3 uses ML to predict the GD using only the set of exogenous predictors, and experiment 4 uses ML to predict the GD using both exogenous and endogenous predictors.

Experiment 3 is actually a reformulation of the stratified climatology of Foresti et al. (2018) as a supervised statistical learning problem. In fact, the conditional expectation of GD for a subset of weather conditions characterized by the predictors x is given by

E(y | x ∈ X_v) = (1/N) Σ_i y_i,  y_i ∈ Y_v;
lim_{v→0} E(y | x ∈ X_v) = f(y|x) = f(x) + ε,   (5)

where v is the width of the stratification interval, X_v is the set of predictor values that fall within that interval, and Y_v is the corresponding set of GD values. As will be shown later, it is also possible to compute various moments of the distribution of y. In other words, ML offers the opportunity to estimate the marginal statistics of a variable y for an infinitesimally small interval width.

3. The radar precipitation growth and decay dataset

The radar archive covers the 10-yr period 2005–14 and comprises data from the Swiss C-band radars located at Monte Lema, Albis, and La Dôle, which were completely renewed and upgraded to dual-polarization in 2011 and 2012. The radar network was extended with 2 new radars in 2014 and 2016, respectively (see Germann et al. 2017). Composite radar images have a spatial resolution of 1 km and a temporal resolution of 5 min.

The preparation of the precipitation GD dataset is described in Foresti et al. (2018). In summary, the procedure involves the following steps:

1) Estimating fields of radar echo motion using the MAPLE variational echo tracking (Germann and Zawadzki 2002).
2) Calculating backward trajectories of radar echoes.
3) Defining a regular grid of overlapping boxes of a given size.
4) Computing the mean areal precipitation (MAP; mm h⁻¹) for each box using the destination location at time t + τ and the origin located upstream at time t following the trajectories.

Note that the methods of points 1 and 2 are available in the open-source python library "pysteps" (Pulkkinen et al. 2019, manuscript submitted to Geosci. Model Dev. Discuss.). As in Foresti et al. (2018), we selected a lead time of τ = 1 h and a box size of 64 × 64 km², which are regularly distributed on an 8-km resolution grid. Throughout the paper, all the machine learning predictions and verification are done at these spatial and temporal resolutions. Note that in addition to the MAP, one can also derive other precipitation statistics, such as the fraction of wet pixels.

As explained in section 2, the quantity of interest for nowcasting is the precipitation GD, which is defined here as the multiplicative difference between the MAP at the origin and destination location (Foresti et al. 2018):

GD = y = 10 log10[(MAP_d + c)/(MAP_o + c)]  (dB),   (6)

where c = 0.01 is a small constant offset to avoid the division by 0, MAP_d = MAP(t + 1, s) is the MAP of the destination box at time t + 1 and location s, and MAP_o = MAP(t, s − a) is the MAP of the origin box located upstream following the displacement vector a at time t (see Fig. 2). A backward-in-time semi-Lagrangian scheme is used to advect the origin box to the destination by using all the 5-min motion fields. This procedure helps to isolate the GD error from the one due to the nonstationary motion (Germann et al. 2006).

The multiplicative formulation of GD has two advantages: 1) it makes the distribution of GD symmetric around 0 and close to normal, and 2) it reduces the correction of a persistence nowcast to a summation:
TABLE 1. Structure of data archive for training the machine learning algorithms. On the left is the set of input predictors and on the right the output predictand(s). In a real-time application the destination location [X_d, Y_d] is found by assuming stationarity of the motion vectors [U, V] during the nowcast.
10 log10(MAP_d^pred) = 10 log10(MAP_o) + GD^pred  (dB).   (7)

The radar archive was extended to contain the freezing level height (HZT), which was extracted from the hourly analyses of the COSMO NWP model (Baldauf et al. 2011). In fact, Foresti et al. (2018) found that the spatial distribution of GD depends on HZT, which constitutes a useful proxy of the air stability.

An additional forcing variable that could potentially control the GD is the diurnal cycle (see e.g., Mandapaka et al. 2013; Atencia et al. 2017; Fabry et al. 2017). The time of the day, h ∈ ℝ: 0 ≤ h < 24, is a circular variable. To ensure the continuity of the predictor at midnight, the hour of the day h was coded using the following two variables:

D_sin = sin(2πh/24),  D_cos = cos(2πh/24).   (8)

The season could be characterized in a similar way. However, it was found that seasonal variability is better explained by the freezing level height (Foresti et al. 2018).

The structure of the data archive is presented in Table 1. The main target variable is the GD term, although it is also possible to directly predict the MAP_d(t + 1). We decided to classify MAP_o as an endogenous predictor since it contributes to the definition of GD [see (6)]. Note that the predictor GD_d(t) is in Eulerian coordinates (i.e., it is at the same spatial location). In fact, in the Alpine region the orographic forcing generates precipitation, and consequently GD patterns, that can remain persistent on the same location for several hours (e.g., Panziera et al. 2011). Therefore, we did not perform experiments using the Lagrangian GD [i.e., GD_o(t)].

Over the 10-yr period, we collected more than 21 million boxes (samples) with precipitation at both the origin and destination. Cases of precipitation initiation and termination are discarded to simplify the learning problem. In fact, the choice of predictors was not targeted for nowcasting the initiation of convective cells.

Finally, it is important to mention that, despite using a high-quality radar rainfall product, a certain fraction of GD is related to the variability of radar coverage over the Swiss Alps. This was the main motivation for the installation of the two new radars in the Valais (PPM) and the Grisons (WEI). For a more detailed discussion on radar data uncertainty and GD, we refer to Foresti et al. (2018).

4. Machine learning algorithms and training

Supervised machine learning provides flexible algorithmic tools to solve tasks such as robust nonlinear classification and regression of data in high-dimensional spaces (Haykin 1998; Breiman 2001a; Goodfellow et al. 2016). Compared with traditional statistical data models, the algorithmic approach is fully nonparametric, that is, it does not require making strong assumptions about the data distribution (e.g., Gaussianity) or the form of statistical dependency between variables (e.g., linearity). Instead, it is designed to maximize prediction skill while being robust to the curse of dimensionality (see e.g., Breiman 2001b).

a. Artificial neural networks

In this study, we used a feedforward artificial neural network model known as multilayer perceptron (MLP). The MLP architecture is composed of one input layer, one or more hidden layers, and one output layer, whose neurons are connected by synaptic weights (see an example in Fig. 3). The number of neurons in the input layer is equal to the number of input predictors. The output layer usually contains one single neuron with the target variable to predict (predictand). Alternatively, a multioutput MLP can be designed for joint prediction of multiple target variables, as will be explained in section 6b. The hidden layer(s) contain a set of neurons performing a nonlinear transformation (activation) of the weighted linear summation of values coming from the input neurons.

The MLP training consists of an iterative optimization of the network weights to minimize the error between the predicted and the target values in the output neuron. In this study, we used the mean square error (MSE, L2-norm). Given the nonconvexity of the error function
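The data preparation of section 3 and the MLP regression of section 4a can be sketched end to end as follows. This is a minimal illustration on synthetic stand-in data: the use of scikit-learn, the network size, and all numerical values are our assumptions, not the configuration used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the archived predictors of Table 1.
XY   = rng.uniform(0, 700, size=(n, 2))        # [X, Y] box coordinates (km)
UV   = rng.normal(0, 20, size=(n, 2))          # [U, V] motion vectors (km/h)
HZT  = rng.uniform(500, 4500, size=(n, 1))     # freezing level height (m)
hour = rng.uniform(0, 24, size=n)              # time of the day (circular)
MAPo = rng.gamma(2.0, 1.0, size=n)             # MAP of the origin box (mm/h)
MAPd = MAPo * rng.lognormal(0.0, 0.3, size=n)  # MAP of the destination box

# Eq. (8): continuous encoding of the circular hour-of-day predictor.
D_sin = np.sin(2 * np.pi * hour / 24)
D_cos = np.cos(2 * np.pi * hour / 24)

# Eq. (6): multiplicative growth and decay in dB; c avoids division by 0.
c = 0.01
GD = 10 * np.log10((MAPd + c) / (MAPo + c))

# Predictor matrix combining exogenous and endogenous inputs, in the
# spirit of experiment 4 of section 2b.
X = np.column_stack([XY, UV, HZT, D_sin, D_cos, MAPo])

# MSE-trained multilayer perceptron (section 4a); layer sizes are arbitrary.
mlp = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=500,
                   random_state=0).fit(X, GD)
GD_pred = mlp.predict(X)
```

A persistence-corrected nowcast would then follow Eq. (7), adding the predicted GD (in dB) to 10 log10 of the origin MAP.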
FIG. 4. ANN predictions of the mean GD with different flow directions and freezing level heights for a fixed flow speed of 30 km h⁻¹. NW flows with (a) HZT at 1500 m and (b) HZT at 4000 m. SW flows with (c) HZT at 1500 m and (d) HZT at 4000 m.
practical consequences for the verification of the ANN, and the comparison with forecast systems having a larger variance (e.g., an extrapolation nowcast), which preserves the variance of the observations. Therefore, one should be cautious when comparing extrapolation-based with machine learning-based nowcasts.

The mentioned issues could be overcome either by normalizing the MSE or by comparing the two systems at the same spatial frequencies (e.g., by low-pass filtering the persistence nowcast; Seed 2003; Turner et al. 2004). The latter, however, is not directly applicable to our problem because of the intermittency of GD fields, which arises from the conditionality criterion (precipitation at both origin and destination).

Therefore, in this paper we choose to compare the root-mean-square error (RMSE) to the Pearson correlation coefficient (PCORR) and the regression slope b to gain further insight into how the bias-variance trade-off manifests itself in our problem. In our case, b measures the degree of type 2 conditional bias, that is, with respect to observations (Murphy 1995; Potts 2012):

b = (σ_pred / σ_obs) r,   (9)

where r is the PCORR between predictions and observations, and σ_pred and σ_obs are the corresponding standard deviations. This type of conditional bias occurs when the conditional expectation of the predictions depends on the observations, which is often a consequence of predictions with lower variance.

5. ANN predictions of growth and decay, verification, and predictability

a. Nowcasting of the mean growth and decay

Figure 4 serves as an illustrative example and shows four prediction maps of the mean growth and decay. For demonstration purposes we used an ANN model with 5 input predictors [X, Y, U, V, HZT]. The trained ANN was asked to predict the GD fields for two different flow directions (SW, NW) and two different HZT (1500, 4000 m MSL) for a fixed flow speed of 30 km h⁻¹.

The prediction maps reproduce well-known GD patterns in the Swiss Alps. As expected, precipitation growth is generally located on the northern slopes of the Alpine chain with NW flows, while with SW flows it is located on the southern side. A notable exception is the
FIG. 6. 2D histograms of long-term averages of observed and predicted GD. The example of Fig. 5 with SW flows is shown in the second row, third column. The RMSE, regression slope b, and PCORR are computed by comparing the GD_obs to the GD_pred.
accuracy and performance to indicate forecasts with low RMSE and high PCORR, the latter being a measure of potential skill (Murphy 1995).

The first row in Fig. 7 is a two-input ANN model using only the geographical coordinates as predictors and has a RMSE of 2.8 dB and a PCORR of 0.21 on the test set. This experiment gives an estimation of the baseline performance when no additional information beyond the spatial location is available.

Rows 2–4 illustrate the results by adding the time of the day, the HZT, and the [U, V] flow vectors to the spatial coordinates (one at a time). Out of the three, the [U, V] vectors have the greatest impact on the prediction performance, which is further evidence of the strong dependence of precipitation GD on flow direction and speed in the Alpine region (see Foresti et al. 2018), while the time of the day has the least predictive power.

Rows 5–6 reveal that combining all external predictors, with and without time of the day, leads to negligible differences. The PCORR is around 0.29–0.30, and the RMSE is 2.72–2.73 dB. As expected, the reduction of RMSE is not as substantial as the increase of
FIG. 7. Analysis of the importance of external predictors to predict GD. Verification of ANN predictions on the
training, validation, and test sets using the RMSE and the PCORR. On the y-axis there is the list of input predictors
used, which are sorted from the top by increasing forecast quality.
PCORR, which can be attributed to the large contribution of the variance to the RMSE.

Given the small contribution of the time of day to prediction skill, in the following we will only work with the predictors [X, Y, U, V, HZT].

Using the same set of predictors, we compared the predictive performance of the ANN, DT, and random forests. The skill being very similar, we decided to only include the results of ANN, as it also provides the most realistic growth and decay fields in terms of spatial continuity.

d. Application to nowcasting: Can machine learning improve beyond the persistence assumption?

Figure 8 employs 2D histograms to analyze the persistence of both the precipitation (MAP) and the GD using the whole 10-yr archive. Figures 8a and 8b show that the MAP is more persistent in the Lagrangian than in the Eulerian frame with a PCORR of 0.78 and 0.66, respectively. Figure 8c shows that the Eulerian persistence of GD only has a PCORR of 0.28, much lower than the one of MAP (see also Radhakrishna et al. 2012). The Eulerian persistence of GD reflects the stationary character of GD patterns over orography, which can be exploited to improve the prediction performance of the ANN.

Finally, in Fig. 8d we can observe a relationship between MAP_o and GD_d, which reveals an effect of regression to the mean (Barnett et al. 2005; Pitkänen et al. 2016). In essence, the larger a given MAP, the more likely it is to decay and regress toward smaller values. The same applies to low MAP values but in the reverse sense.

Starting from these findings, we will answer the following questions:

1) Do the machine-learning-based predictions provide better performance than assuming persistence?
2) What is the impact of adding persistence information to the set of external predictors?
3) What is the impact of conditioning the predictions to the mean areal precipitation at the origin?

Figure 9 illustrates the results of the experiments designed to answer these three questions.

Row 1 shows the verification statistics by assuming Eulerian persistence of the GD values (same as Fig. 8c but for the training, validation, and test sets). The PCORR of the different sets is in the range 0.28–0.30, while the RMSE is quite large and around 3.4–3.5 dB.

Row 2 depicts the base machine learning model using the set of 5 external predictors (same as row 5 in Fig. 7 but for a different training run). The PCORR of machine learning is only slightly larger than the one of Eulerian persistence, but the RMSE is substantially smaller (2.7 dB). This lower RMSE partly arises from the smoother machine learning predictions. Without considering PCORR, we would falsely conclude that machine learning provides much better accuracy than Eulerian persistence (question 1). It is worth pointing out that these
Brought to you by SEJONG UNIVERSITY LIBRARY | Unauthenticated | Downloaded 04/16/21 06:06 PM UTC
1558 WEATHER AND FORECASTING VOLUME 34
FIG. 8. 2D histograms to analyze the persistence of MAP and GD as well as their dependence.
(a) Eulerian persistence of MAP, (b) Lagrangian persistence of MAP, (c) Eulerian persistence of GD, and
(d) MAPo vs GD. The regression line is obtained by a classical ordinary least squares fit with errors only in
the y variable.
statements are only valid at the analyzed spatial (64 km) and temporal (1 h) scales.
Row 3 shows an experiment using the spatial coordinates and the current GD as predictors, which reaches a PCORR of 0.34–0.35. Hence, learning the localized dependence structure of the GD based on the radar archive seems to be better than the persistence assumption.
Row 4 helps answer question 2 by using both the external predictors and the current GD to also exploit its persistence. Surprisingly, the PCORR is around 0.37–0.38, which is substantially higher than using either persistence (0.28–0.30) or the set of external predictors (0.30–0.31). Thus, we can enhance a persistence nowcast of GD by learning from the historical radar archive in combination with external predictors.
Row 5 answers question 3 by analyzing the impact of using MAPo as an additional predictor. The increase in PCORR to 0.44–0.45 and the decrease of RMSE benefit from the dependence of GD on MAPo (as shown in Fig. 8d), but the effect of regression to the mean questions whether the increased accuracy is real or merely a statistical artifact. This statistical property also has practical implications for operational nowcasting and warnings, since using the MAPo as predictor will have a tendency to reduce the high MAP values and thus miss the extreme events. In such a setting, it becomes essential to perform probabilistic predictions to allow the GD to increase with a certain probability, also when starting at high MAP values. This can be achieved by using prediction intervals, as will be explained in section 6.
Finally, row 6 includes all the external and endogenous predictors, which brings the PCORR beyond 0.5 and the RMSE below 2.5 dB. This is a remarkable performance considering the fact that we are predicting the first time derivative of moving precipitation fields.
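The RMSE and PCORR scores compared across rows 1–6 can be computed in a few lines. The sketch below (Python with NumPy; the function name is our own) assumes paired arrays of predicted and observed GD values in dB:

```python
import numpy as np

def verify_gd(pred_db, obs_db):
    """Return (RMSE, PCORR) for paired predictions and observations in dB.

    RMSE is the root-mean-square error; PCORR is the Pearson
    correlation between predictions and observations.
    """
    pred = np.asarray(pred_db, dtype=float)
    obs = np.asarray(obs_db, dtype=float)
    rmse = float(np.sqrt(np.mean((pred - obs) ** 2)))
    pcorr = float(np.corrcoef(pred, obs)[0, 1])
    return rmse, pcorr

# Eulerian persistence baseline (row 1): the last observed GD is the
# forecast itself. The arrays here are illustrative placeholders for
# values taken from the Lagrangian radar archive.
gd_now = np.array([1.2, -0.5, 3.0, -2.1])
gd_future = np.array([0.8, -0.2, 2.5, -1.5])
rmse, pcorr = verify_gd(gd_now, gd_future)
```

The same routine applies unchanged whether the forecast comes from persistence or from the ANN, which is what makes the row-by-row comparison in Fig. 9 possible.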
OCTOBER 2019 FORESTI ET AL. 1559
FIG. 9. Analysis of the impact of adding endogenous predictors to predict GD. MAPo is given in dBR units [10 log10(MAPo)].
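The dBR transform in the caption, together with the multiplicative GD formulation [MAPd (dB) = MAPo (dB) + GD], can be sketched as follows. This is our own minimal illustration; in particular, the 0.1 mm/h rain-rate floor is an assumption introduced here only to avoid the logarithm of zero:

```python
import numpy as np

RAIN_FLOOR_MM_H = 0.1  # assumed minimum rain rate to avoid log10(0)

def to_dbr(rate_mm_h):
    """Convert a rain rate R (mm/h) to dBR units: 10 * log10(R)."""
    rate = np.maximum(np.asarray(rate_mm_h, dtype=float), RAIN_FLOOR_MM_H)
    return 10.0 * np.log10(rate)

def growth_decay_db(map_dest_mm_h, map_orig_mm_h):
    """GD as the dB difference between destination and origin MAP,
    so that MAPd (dB) = MAPo (dB) + GD (dB)."""
    return to_dbr(map_dest_mm_h) - to_dbr(map_orig_mm_h)

# A doubling of the mean areal precipitation corresponds to ~+3 dB of growth.
gd = growth_decay_db(2.0, 1.0)
```

Working in dB makes growth and decay additive, which is why predicting GD and adding it to MAPo (dB) is equivalent in form to predicting MAPd directly.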
e. Is it better to predict the growth and decay or to directly predict the precipitation intensity?

This section compares the following two settings:
1) Using ANN to predict the GD and derive the MAPd (dB) as MAPo (dB) + GDpred (dB) [see (7)].
2) Using ANN to directly predict the MAPd (dB) using the MAPo (dB) and the set of external predictors.
Figure 10 illustrates the results of the two experiments.
Row 1 shows the performance of Lagrangian persistence, which has a RMSE of 3 dB and a PCORR of 0.78 (same as Fig. 8b but for training, validation, and test).
Row 2 predicts MAPd by adding the last observed GD to MAPo. The PCORR is slightly reduced from 0.78 to 0.76 and the RMSE increases from 3 to 3.5 dB. The large variance of observed GD values could explain the increase of the RMSE.
Row 3 computes the MAPd by adding to the MAPo the GD predicted using the set of external predictors. With respect to Lagrangian persistence there is now a slight increase in prediction performance, with a PCORR of 0.8 and a RMSE of 2.8 dB.
Row 4 is the same but additionally uses GD(t) as predictor. The RMSE is reduced further to 2.75 dB and the PCORR rises to 0.81–0.82. This represents an ≈8% decrease in RMSE and ≈5% increase in PCORR with respect to Lagrangian persistence.
Figure 11 shows maps of the RMSE reduction by ANN for the NW and SW flows. As expected, the average 8% reduction of RMSE exhibits significant spatial variability depending on flow direction, but the ANN never degrades the persistence-based nowcast. Over orography, the reduction is up to 15–20% in the regions of growth and 20–30% in the regions of decay.
Row 5 in Fig. 10 tests the effect of introducing MAPo as a predictor. The RMSE is reduced further and the PCORR rises to 0.82–0.83. In this case, the skill with respect to Lagrangian persistence is ≈14% for the RMSE and ≈6%–7% for the PCORR.
Finally, row 6 shows the direct prediction of MAPd using the same set of predictors of row 5. The performance is essentially indistinguishable from row 5. Thus, according to our experiments, there is no difference in performance between directly predicting the MAPd or predicting the GD term and adding it to the MAPo. However, there is a practical advantage in predicting GD, as it simplifies the sensitivity analysis of input predictors (section 5a). This analysis would be more difficult by directly predicting MAPd. In fact, it would require choosing an appropriate value for MAPo to avoid confusing the sensitivity analysis with the effect of regression to the mean.

f. Verification scatterplots

Figure 12 shows the 2D verification histograms on the test set to better understand the effect of adding GD to the MAP and the regression to the mean.
Figures 12a and 12b show the verification for GD predictions without and with MAPo as predictor. One can easily recognize that in both cases the range of GD predictions is much smaller than the one of the observations. This behavior is a natural consequence of
FIG. 10. Verification results for direct and indirect prediction of MAPd. To be consistent with the multiplicative formulation of GD, MAPo
and MAPd are in dBR units.
the low predictability of GD and is observed as a conditional bias with respect to observations (regression slope b different from 1). The larger the departure of b is from 1, the stronger is the conditional bias. Such bias is unavoidable for any machine learning or other statistical model that tries to predict highly unpredictable atmospheric variables by minimizing the MSE (e.g., Frei and Isotta 2019). As already mentioned, a possible solution is to leave the deterministic world and perform probabilistic predictions.
Figure 12b also shows that adding MAPo as predictor reduces the RMSE, increases the PCORR, and decreases the conditional bias. However, this conclusion is different when verifying the MAPd prediction obtained by adding the GD to the MAPo (Fig. 12d). In this case, the decrease of RMSE is not followed by a reduction of the conditional bias, and b deteriorates from 0.80 to 0.68. This can be noticed by the lower number of samples in both the lower-left and upper-right corners of the 2D histogram, where the range of MAP predictions is shrunk. This response is a direct consequence of the regression to the mean, which prevents high MAP values from growing further and generally increases the very low MAP values.
In conclusion, the effect of regression to the mean can reduce the RMSE, but can lead to a larger conditional bias. This statement is also valid for the direct prediction of MAPd instead of GD (row 6 in Fig. 10), which yields the same b = 0.68 (not shown).
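The conditional bias discussed above is measured by the slope b of a least squares fit with errors only in the y variable (as in Fig. 8). A minimal sketch of one common convention, assuming the predictions are regressed against the observations so that a shrunk prediction range yields b < 1 (the function name and example data are our own):

```python
import numpy as np

def conditional_bias_slope(obs, pred):
    """OLS slope b of pred ~ obs, with errors assumed only in y.

    b = 1 means no conditional bias; b < 1 indicates that the range
    of the predictions is shrunk relative to the observations, as
    happens when a model minimizes the MSE of a poorly predictable
    variable.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    dx = obs - obs.mean()
    return float(np.dot(dx, pred - pred.mean()) / np.dot(dx, dx))

# A forecast that halves every departure from the mean gives b = 0.5.
obs = np.array([-2.0, -1.0, 1.0, 2.0])
b = conditional_bias_slope(obs, 0.5 * obs)
```

Reporting b alongside RMSE and PCORR is what exposes the regression-to-the-mean artifact: the RMSE can improve while b drifts further from 1.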
FIG. 11. Maps of RMSE reduction (%) by ANN with respect to Lagrangian persistence when predicting the MAPd as MAPo + GDpred
using predictors [X, Y, U, V, HZT, GD(t)] (row 4 in Fig. 10). (a) NW flows and (b) SW flows.
FIG. 12. Verification histograms on the test set for GD predictions: (a) without using MAPo and (b) using
MAPo as predictor. Indirect prediction of MAP by adding the GDpred: (c) without using MAPo and (d) using
MAPo.
6. Probabilistic machine learning and quantification of prediction uncertainty

a. Prediction interval estimation in machine learning

The topic of uncertainty quantification in machine learning has received increasing attention in recent years (see e.g., Ghahramani 2015). The uncertainty can be estimated by computing prediction and confidence intervals (Heskes 1997):
• The prediction interval (PI) estimates the range in which a new observation will fall with a certain probability. It measures the uncertainty of predictions.
• The confidence interval (CI) estimates the range in which a model parameter will fall with a certain probability. It measures the uncertainty of parameters.
The CI is contained in the PI and is usually much smaller. For instance, the spread of an NWP ensemble can be used as an estimate of the PI.
There are several ways to estimate the PI depending on the chosen ML algorithm and adopted philosophy (i.e., Bayesian or frequentist) (see e.g., Heskes 1997; Meinshausen 2006; Khosravi et al. 2011; Ghahramani 2015). ANN-based approaches typically derive the PI by training an ensemble of ANNs on bootstrap replicates of the training set and/or by fitting an ANN model to the squared residuals of the validation set (Heskes 1997; Khosravi et al. 2011). The bootstrapping approach is only feasible with small datasets (see e.g., Khosravi et al. 2011), while fitting the squared residuals implies assuming a Gaussian distribution of the errors. To relax this assumption, one can approximate the distribution by estimating a finite set of quantiles (e.g., Cade and Noon 2003), for example using a dedicated ANN for each quantile (e.g., Cannon 2011).
Decision trees can easily be extended to perform quantile regression (Meinshausen 2006). A naïve quantile decision tree can be devised by computing a set of empirical quantiles from the collection of target values at each leaf. Random forests can be used to further stabilize the quantile estimations. The drawback of
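The naïve quantile decision tree described above can be sketched with scikit-learn (which is cited in the references): fit an ordinary regression tree, then replace each leaf's mean by empirical quantiles of the training targets that fall in that leaf. The class name, quantile levels, and hyperparameters are our own illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class NaiveQuantileTree:
    """Fit a standard regression tree, then store empirical quantiles
    of the training targets collected at each leaf (a naive variant of
    the quantile trees of Meinshausen 2006)."""

    def __init__(self, quantiles=(0.05, 0.5, 0.95), **tree_kwargs):
        self.quantiles = np.asarray(quantiles)
        self.tree = DecisionTreeRegressor(**tree_kwargs)

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.tree.fit(X, y)
        leaf_ids = self.tree.apply(X)  # leaf index of each training sample
        self.leaf_quantiles_ = {
            leaf: np.quantile(y[leaf_ids == leaf], self.quantiles)
            for leaf in np.unique(leaf_ids)
        }
        return self

    def predict(self, X):
        """Return an (n_samples, n_quantiles) array of quantile predictions."""
        leaf_ids = self.tree.apply(X)
        return np.vstack([self.leaf_quantiles_[leaf] for leaf in leaf_ids])
```

Averaging the leaf quantiles over an ensemble of such trees (a random forest) is one way to stabilize the estimates, as noted above.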
FIG. 14. Numerical experiment to test the QANN method. (a) Prediction results for the 90% PI
using (left) the quantile DT and (right) the QANN; the RMSE and PCORR measure the correspondence
between the observed values and the conditional median (red line). (b) Corresponding verification of the
predicted quantiles using a reliability diagram, where the observed frequency below a certain quantile is
plotted against the predicted frequency; the RMSE measures the average error between the target quan-
tiles of the decision tree and the ones predicted by the QANN. The selected hyperparameters are
also shown.
the target variable [e.g., on the right side of the function (x > 1.0)]. This is due to the DT assuming a constant mean value within the leaf. One solution could be to perform a stepwise linear regression of the target values at the cost of additional computational time.
Figure 14b illustrates the verification results for the two approaches with a reliability plot on the training, validation, and test sets. The correspondence of observed and predicted frequencies of QANN is better than the quantile DT. The proportions of observations falling in the 50%, 90%, and 98% PI are also better reproduced by QANN, which are 46.4%, 89.1%, and 94.4%, respectively.

7. Using QANN to estimate the uncertainty of growth and decay predictions

a. Prediction interval estimation of growth and decay

Figure 15 uses the same predictors as Fig. 4, but instead of predicting the mean GD it uses QANN to predict the 90% prediction interval.
The QANN is able to capture the larger GD uncertainty associated with high HZT conditions. In fact, with
FIG. 15. QANN predictions of the growth and decay 90% prediction interval with different flow directions and freezing level heights
for a fixed flow speed of 30 km h−1. NW flows with (a) HZT at 1500 m and (b) HZT at 4000 m. SW flows with (c) HZT at 1500 m and
(d) HZT at 4000 m.
high HZT the 90% PI is in the range 8–15 dB, while with low HZT it is in the range 7–10 dB. It is also interesting to note that the PI is larger on the western side of the domain with high HZT, which represents the lower predictability of growth and decay over the flat areas of France compared to the Alpine region. Also, the spatial patterns of PI display a lower spatial variability and dependence with flow direction compared to the conditional mean GD predictions of the ANN (Fig. 4).
Finally, it is important to mention that even a field with a mean GD ≈ 0 everywhere (e.g., over flat continental regions) can still exhibit variability, and thus predictability, of the prediction interval. Hence, additional information about predictability is gained, which could not have been found by only modeling the conditional mean (e.g., Cade and Noon 2003).

b. Verification of growth and decay quantiles

Figure 16 illustrates the verification of the quantiles predicted by the QANN of the previous section.
Figure 16a shows an almost perfect correspondence between the observed and predicted quantiles, which is even better than the one of the numerical example (section 6c), probably a consequence of the larger sample size. This demonstrates that the predicted quantiles are well calibrated (unbiased). Their discrimination ability, however, is limited by the prediction performance of the decision tree. Finally, no crossing quantiles were found in the training, validation, and test sets, which confirms that the ANN fully preserves the ranking of decision tree quantiles.
In Fig. 16b we rank a random subset of observations of the test set by increasing PI width. The lowest values of the 98% PI are around 10 dB while the highest are around 20 dB, which reflects again the low predictability of precipitation GD and the importance of estimating the forecast uncertainty. On average, we should expect 2% of the observations to fall outside the 98% PI and, respectively, 10% outside the 90% PI. Indeed, there are 9 out of 500 points falling outside the 98% PI, which corresponds to 1.8%.
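The coverage check above (9 of 500 points outside the 98% PI, i.e., 1.8%) amounts to counting observations that fall inside their predicted interval. A minimal sketch, with function and array names of our own choosing:

```python
import numpy as np

def pi_coverage(obs, q_low, q_high):
    """Empirical fraction of observations inside [q_low, q_high].

    For a well-calibrated 98% PI this should be close to 0.98, i.e.,
    about 2% of the observations are expected to fall outside.
    """
    obs = np.asarray(obs, dtype=float)
    inside = (obs >= np.asarray(q_low)) & (obs <= np.asarray(q_high))
    return float(np.mean(inside))

# Synthetic illustration: 9 of 500 observations pushed outside the PI,
# mirroring the 98% PI check of Fig. 16b.
obs = np.zeros(500)
q_low, q_high = np.full(500, -1.0), np.full(500, 1.0)
obs[:9] = 2.0
coverage = pi_coverage(obs, q_low, q_high)  # 0.982
```

Applying this per quantile pair (50%, 90%, 98%) reproduces the calibration percentages reported for the reliability diagrams.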
FIG. 16. Verification of the predicted quantiles by QANN on the growth and decay dataset. (a) Reliability plot
showing the observed vs predicted quantiles and the percentage of values falling within a given PI. (b) Plot dis-
playing a subset of observed GD values on the test set ranked by increasing 98% PI width. The PI values are
centered to remove the variations of the median and improve the clarity of the plot as in Meinshausen (2006).
An interesting question is to understand how to blend the growth and decay of the real-time radar and NWP fields with the one derived from the archive. Machine learning could provide a flexible framework to optimally integrate these different data sources.

9. Conclusions

We presented a machine learning framework for nowcasting precipitation growth and decay in the Swiss Alpine region based on a 10-yr archive of composite radar images. The trained artificial neural networks were able to automatically learn and reproduce the climatological growth and decay patterns, in agreement with the findings of Foresti et al. (2018).
Forecast verification revealed the most relevant predictors, which are, in order of importance: the geographical location, the flow direction and speed, and the freezing level height. The ANN predictions provided similar accuracy as assuming persistence of growth and decay, but when combined with the latter the performance improved substantially. The decrease of RMSE compared with persistence is up to 20%–30% over orography.
Deterministic machine learning predictions are designed to minimize prediction errors, which, however, lead to smooth forecast fields characterized by strong conditional biases. This complicates the comparison of machine learning predictions with the persistence baseline, which, by definition, preserves the variance of observations. To overcome these limitations, we introduced a probabilistic machine learning framework for precipitation nowcasting and presented a novel method to estimate the prediction uncertainty based on a combination of decision trees and ANNs (i.e., QANN). Such uncertainty estimates could be used in combination with stochastic simulation (e.g., Nerini et al. 2017; Frei and Isotta 2019) to generate a realistic ensemble of precipitation fields. Future advances in machine learning should consider extending also the deep convolutional neural networks (e.g., Shi et al. 2015) to estimate the prediction uncertainty.
The analyses and conclusions of the paper are relative to a spatial scale of 64 km and a lead time of 1 h. Thus, an interesting extension could be to study the scale and lead-time dependence of the predictive performance. Another open question concerns the residual radar measurement uncertainty, which locally affects the growth and decay values (see a discussion in Foresti and Seed 2015; Foresti et al. 2018).
Additional radar, satellite, and NWP predictors could also be included to further enhance the prediction performance (e.g., Mecikalski et al. 2015; Han et al. 2017; Zeder et al. 2018). However, as the atmosphere is a chaotic system characterized by intrinsic predictability limits, we do not expect large improvements (i.e., it will remain necessary to estimate the prediction uncertainty, e.g., by using prediction intervals).
The presented machine learning framework could readily be applied to derive a thunderstorm climatology using the large archives of convective cell tracks (e.g., Goudenhoofdt and Delobbe 2013; Meyer et al. 2013; Wapler and James 2015; Nisi et al. 2018). In fact, these datasets contain similar predictors, such as the spatial location of the cell [X, Y], the tracked motion vectors [U, V], and, potentially, NWP, satellite, and lightning variables describing the environmental conditions and life cycle of the storm. This analysis would not only be interesting from a climatological perspective, but could also form a basis to incorporate information about the evolution of individual convective cells into field-based nowcasting systems (Sideris et al. 2018).

Acknowledgments. This study was supported by the Swiss National Science Foundation Ambizione project "Precipitation attractor from radar and satellite data archives and implications for seamless very short-term forecasting" (PZ00P2 161316). We thank Ulrich Hamann, Alan Seed, Luca Panziera, Simona Trefalt, Marco Gabella and Floor van den Heuvel for the useful discussions and feedback on the manuscript. Bertrand Calpini is thanked for his support to the project. We are also grateful to Christoph Frei for the discussion on error minimization and conditional bias.

REFERENCES

Andersen, H., J. Cermak, J. Fuchs, R. Knutti, and U. Lohmann, 2017: Understanding the drivers of marine liquid-water cloud occurrence and properties with global observations using neural networks. Atmos. Chem. Phys., 17, 9535–9546, https://doi.org/10.5194/acp-17-9535-2017.
Atencia, A., and I. Zawadzki, 2014: A comparison of two techniques for generating nowcasting ensembles. Part I: Lagrangian ensemble technique. Mon. Wea. Rev., 142, 4036–4052, https://doi.org/10.1175/MWR-D-13-00117.1.
——, and ——, 2015: A comparison of two techniques for generating nowcasting ensembles. Part II: Analogs selection and comparison of techniques. Mon. Wea. Rev., 143, 2890–2908, https://doi.org/10.1175/MWR-D-14-00342.1.
——, ——, and M. Berenguer, 2017: Scale characterization and correction of diurnal cycle errors in MAPLE. J. Appl. Meteor. Climatol., 56, 2561–2575, https://doi.org/10.1175/JAMC-D-16-0344.1.
Baldauf, M., A. Seifert, J. Förstner, D. Majewski, M. Raschendorfer, and T. Reinhardt, 2011: Operational convective-scale numerical weather prediction with the COSMO model: Description and sensitivities. Mon. Wea. Rev., 139, 3887–3905, https://doi.org/10.1175/MWR-D-10-05013.1.
Barnett, A., J. van der Pols, and A. Dobson, 2005: Regression to the mean: What it is and how to deal with it. Int. J. Epidemiol., 34, 215–220, https://doi.org/10.1093/ije/dyh299.
Berenguer, M., D. Sempere-Torres, and G. G. Pegram, 2011: SBMcast: An ensemble nowcasting technique to assess the uncertainty in rainfall forecasts by Lagrangian extrapolation. J. Hydrol., 404, 226–240, https://doi.org/10.1016/j.jhydrol.2011.04.033.
Besic, N., J. Figueras i Ventura, J. Grazioli, M. Gabella, U. Germann, and A. Berne, 2016: Hydrometeor classification through statistical clustering of polarimetric radar measurements: A semi-supervised approach. Atmos. Meas. Tech., 9, 4425–4445, https://doi.org/10.5194/amt-9-4425-2016.
Beusch, L., L. Foresti, M. Gabella, and U. Hamann, 2018: Satellite-based rainfall retrieval: From generalized linear models to artificial neural networks. Remote Sens., 10, 939, https://doi.org/10.3390/rs10060939.
Bowler, N. E., C. E. Pierce, and A. Seed, 2006: STEPS: A probabilistic precipitation forecasting scheme which merges an extrapolation nowcast with downscaled NWP. Quart. J. Roy. Meteor. Soc., 132, 2127–2155, https://doi.org/10.1256/qj.04.100.
Breiman, L., 2001a: Random forests. Mach. Learn., 45, 5–32, https://doi.org/10.1023/A:1010933404324.
——, 2001b: Statistical modeling: The two cultures. Stat. Sci., 16, 199–231, https://doi.org/10.1214/ss/1009213726.
——, J. H. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. Chapman and Hall/CRC, 368 pp.
Cade, B., and B. Noon, 2003: A gentle introduction to quantile regression for ecologists. Front. Ecol. Environ., 1, 412–420, https://doi.org/10.1890/1540-9295(2003)001[0412:AGITQR]2.0.CO;2.
Cannon, A., 2011: Quantile regression neural networks: Implementation in R and application to precipitation downscaling. Comput. Geosci., 37, 1277–1284, https://doi.org/10.1016/j.cageo.2010.07.005.
Carbone, R., J. Tuttle, D. Ahijevych, and S. Trier, 2002: Inferences of predictability associated with warm season precipitation episodes. J. Atmos. Sci., 59, 2033–2056, https://doi.org/10.1175/1520-0469(2002)059<2033:IOPAWW>2.0.CO;2.
Fabry, F., and A. Seed, 2009: Quantifying and predicting the accuracy of radar-based quantitative precipitation forecasts. Adv. Water Resour., 32, 1043–1049, https://doi.org/10.1016/j.advwatres.2008.10.001.
——, V. Meunier, B. Treserras, A. Cournoyer, and B. Nelson, 2017: On the climatological use of radar data mosaics: Possibilities and challenges. Bull. Amer. Meteor. Soc., 98, 2135–2148, https://doi.org/10.1175/BAMS-D-15-00256.1.
Foresti, L., and A. Seed, 2015: On the spatial distribution of rainfall nowcasting errors due to orographic forcing. Meteor. Appl., 22, 60–74, https://doi.org/10.1002/met.1440.
——, M. Kanevski, and A. Pozdnoukhov, 2012: Kernel-based mapping of orographic rainfall enhancement in the Swiss Alps as detected by weather radar. IEEE Trans. Geosci. Remote Sens., 50, 2954–2967, https://doi.org/10.1109/TGRS.2011.2179550.
——, L. Panziera, P. V. Mandapaka, U. Germann, and A. Seed, 2015: Retrieval of analogue radar images for ensemble nowcasting of orographic rainfall. Meteor. Appl., 22, 141–155, https://doi.org/10.1002/met.1416.
——, I. Sideris, L. Panziera, D. Nerini, and U. Germann, 2018: A 10-year radar-based analysis of orographic precipitation growth and decay patterns over the Swiss Alpine region. Quart. J. Roy. Meteor. Soc., 144, 2277–2301, https://doi.org/10.1002/qj.3364.
Frei, C., and F. Isotta, 2019: Ensemble spatial precipitation analysis from rain gauge data - Methodology and application in the European Alps. J. Geophys. Res. Atmos., 124, 5757–5778, https://doi.org/10.1029/2018JD030004.
French, M., W. Krajewski, and R. Cuykendall, 1992: Rainfall forecasting in space and time using a neural network. J. Hydrol., 137, 1–31, https://doi.org/10.1016/0022-1694(92)90046-X.
Freund, Y., and R. Schapire, 1997: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55, 119–139, https://doi.org/10.1006/jcss.1997.1504.
Fukushima, K., and S. Miyake, 1982: Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognit., 15, 455–469, https://doi.org/10.1016/0031-3203(82)90024-3.
Gagne, D., A. McGovern, S. Haupt, R. Sobash, J. Williams, and M. Xue, 2017: Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Wea. Forecasting, 32, 1819–1840, https://doi.org/10.1175/WAF-D-17-0010.1.
——, S. Haupt, D. Nychka, H. Christensen, A. Subramanian, and A. Monahan, 2018: Generation of spatial weather fields with generative adversarial networks. Fourth Conf. on Stochastic Weather Generators (SWGEN 2018), Boulder, CO, University Corporation for Atmospheric Research, http://opensky.ucar.edu/islandora/object/conference:3343.
Germann, U., and I. Zawadzki, 2002: Scale-dependence of the predictability of precipitation from continental radar images. Part I: Description of the methodology. Mon. Wea. Rev., 130, 2859–2873, https://doi.org/10.1175/1520-0493(2002)130<2859:SDOTPO>2.0.CO;2.
——, ——, and B. Turner, 2006: Predictability of precipitation from continental radar images. Part IV: Limits to prediction. J. Atmos. Sci., 63, 2092–2108, https://doi.org/10.1175/JAS3735.1.
——, M. Berenguer, D. Sempere-Torres, and M. Zappa, 2009: REAL-Ensemble radar precipitation estimation for hydrology in a mountainous region. Quart. J. Roy. Meteor. Soc., 135, 445–456, https://doi.org/10.1002/qj.375.
——, D. Nerini, I. Sideris, L. Foresti, A. Hering, and B. Calpini, 2017: Real-time radar - A new Alpine radar network. Meteorological Technology Int., 4 pp., https://www.meteosuisse.admin.ch/content/dam/meteoswiss/en/Mess-Prognosesysteme/Atmosphaere/doc/MTI-April2017-Rad4Alp.pdf.
Ghahramani, Z., 2015: Probabilistic machine learning and artificial intelligence. Nature, 521, 452–459, https://doi.org/10.1038/nature14541.
Glorot, X., A. Bordes, and Y. Bengio, 2011: Deep sparse rectifier neural networks. Proc. Machine Learning Res., 15, 315–323.
Goodfellow, I., Y. Bengio, and A. Courville, 2016: Deep Learning. Adaptive Computation and Machine Learning Series, F. Bach, Ed., MIT Press, 800 pp., http://www.deeplearningbook.org.
Goudenhoofdt, E., and L. Delobbe, 2013: Statistical characteristics of convective storms in Belgium derived from volumetric weather radar observations. J. Appl. Meteor. Climatol., 52, 918–934, https://doi.org/10.1175/JAMC-D-12-079.1.
Grecu, M., and W. Krajewski, 2000: A large-sample investigation of statistical procedures for radar-based short-term quantitative precipitation forecasting. J. Hydrol., 239, 69–84, https://doi.org/10.1016/S0022-1694(00)00360-7.
Hall, T., H. Brooks, and C. Doswell III, 1999: Precipitation forecasting using a neural network. Wea. Forecasting, 14, 338–345, https://doi.org/10.1175/1520-0434(1999)014<0338:PFUANN>2.0.CO;2.
Hamill, T., and J. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, https://doi.org/10.1175/MWR3237.1.
Han, L., J. Sun, W. Zhang, Y. Xiu, H. Feng, and Y. Lin, 2017: A machine learning nowcasting method based on real-time reanalysis data. J. Geophys. Res. Atmos., 122, 4038–4051, https://doi.org/10.1002/2016JD025783.
Haupt, S., A. Pasini, and C. Marzban, Eds., 2009: Artificial Intelligence Methods in the Environmental Sciences. Springer, 424 pp., https://doi.org/10.1007/978-1-4020-9119-3.
Haykin, S., 1998: Neural Networks: A Comprehensive Foundation. 2nd ed. Prentice-Hall, 842 pp.
Heskes, T., 1997: Practical confidence and prediction intervals. Adv. Neural Info. Process. Syst., 9, 176–182.
Hinton, G. E., S. Osindero, and Y. Teh, 2006: A fast learning algorithm for deep belief nets. Neural Comput., 18, 1527–1554, https://doi.org/10.1162/neco.2006.18.7.1527.
Kanevski, M., V. Timonin, and A. Pozdnoukhov, 2009: Machine Learning for Spatial Environmental Data: Theory, Applications, and Software. EPFL Press, 400 pp.
Khosravi, A., S. Nahavandi, D. Creighton, and A. Atiya, 2011: Comprehensive review of neural network-based prediction intervals and new advances. IEEE Trans. Neural Network, 22, 1341–1356, https://doi.org/10.1109/TNN.2011.2162110.
Kingma, D., and J. Ba, 2015: Adam: A method for stochastic optimization. Third Int. Conf. on Learning Representations (ICLR 2015), San Diego, CA, ICLR, http://arxiv.org/abs/1412.6980.
Kretzschmar, R., P. Eckert, D. Cattani, and F. Eggimann, 2004: Neural network classifiers for local wind prediction. J. Appl. Meteor., 43, 727–738, https://doi.org/10.1175/2057.1.
Kuligowski, R., and A. Barros, 1998: Experiments in short-term precipitation forecasting using artificial neural networks. Mon. Wea. Rev., 126, 470–482, https://doi.org/10.1175/1520-0493(1998)126<0470:EISTPF>2.0.CO;2.
Li, J., and R. Ding, 2011: Temporal-spatial distribution of atmospheric predictability limit by local dynamical analogs. Mon. Wea. Rev., 139, 3265–3283, https://doi.org/10.1175/MWR-D-10-05020.1.
Lorenz, E. N., 1956: Empirical orthogonal functions and statistical weather prediction. Department of Meteorology, Massachusetts Institute of Technology, 52 pp.
——, 1969: Atmospheric predictability as revealed by naturally occurring analogues. J. Atmos. Sci., 26, 636–646, https://doi.org/10.1175/1520-0469(1969)26<636:APARBN>2.0.CO;2.
——, 1996: Predictability—A problem partly solved. Proc. Seminar on Predictability, Vol. 1, Reading, Berkshire, United Kingdom, ECMWF, 18 pp., https://www.ecmwf.int/en/elibrary/10829-predictability-problem-partly-solved.
Malone, T., 1955: Applications of statistical methods in weather
McGovern, A., D. Elmore, K. L. Gagne, S. Haupt, C. Karstens, R. Lagerquist, T. Smith, and J. Williams, 2017: Using artificial intelligence to improve real-time decision-making for high-impact weather. Bull. Amer. Meteor. Soc., 98, 2073–2090, https://doi.org/10.1175/BAMS-D-16-0123.1.
Mecikalski, J., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, https://doi.org/10.1175/JAMC-D-14-0129.1.
Meinshausen, N., 2006: Quantile regression forests. J. Mach. Learn. Res., 7, 983–999.
Meyer, V., H. Höller, and H. Betz, 2013: Automated thunderstorm tracking: Utilization of three-dimensional lightning and radar data. Atmos. Chem. Phys., 13, 5137–5150, https://doi.org/10.5194/acp-13-5137-2013.
Murphy, A. H., 1995: The coefficients of correlation and determination as measures of performance in forecast verification. Wea. Forecasting, 10, 681–688, https://doi.org/10.1175/1520-0434(1995)010<0681:TCOCAD>2.0.CO;2.
Nerini, D., N. Besic, I. Sideris, U. Germann, and L. Foresti, 2017: A non-stationary stochastic ensemble generator for radar rainfall fields based on the short-space Fourier transform. Hydrol. Earth Syst. Sci., 21, 2777–2797, https://doi.org/10.5194/hess-21-2777-2017.
Nisi, L., A. Hering, U. Germann, and O. Martius, 2018: A 15-year hail streak climatology for the Alpine region. Quart. J. Roy. Meteor. Soc., 144, 1429–1449, https://doi.org/10.1002/qj.3286.
Panziera, L., U. Germann, M. Gabella, and P. V. Mandapaka, 2011: NORA—Nowcasting of orographic rainfall by means of analogues. Quart. J. Roy. Meteor. Soc., 137, 2106–2123, https://doi.org/10.1002/qj.878.
Pedregosa, F., and Coauthors, 2011: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.
Peleg, N., F. Marra, S. Fatichi, P. Molnar, E. Morin, A. Sharma, and P. Burlando, 2018: Intensification of convective rain cells at warmer temperatures observed from high-resolution weather radar data. J. Hydrometeor., 19, 715–726, https://doi.org/10.1175/JHM-D-17-0158.1.
Pitkänen, M., S. Mikkonen, K. Lehtinen, A. Lipponen, and A. Arola, 2016: Artificial bias typically neglected in comparisons of uncertain atmospheric data. Geophys. Res. Lett., 43, 10 003–10 011, https://doi.org/10.1002/2016GL070852.
Potts, J., 2012: Basic concepts. Forecast Verification: A Practitioner's Guide in Atmospheric Sciences, I. T. Jolliffe and D. B. Stephenson, Eds., Wiley-Blackwell, 11–29.
Radhakrishna, B., I. Zawadzki, and F. Fabry, 2012: Predictability of
prediction. Proc. Natl. Acad. Sci. USA, 41, 806–815, https:// precipitation from continental radar images. Part V: Growth
doi.org/10.1073/pnas.41.11.806. and decay. J. Atmos. Sci., 69, 3336–3349, https://doi.org/10.1175/
Mandapaka, P., U. Germann, and L. Panziera, 2013: Diurnal cycle JAS-D-12-029.1.
of precipitation over complex alpine orography: Inferences Rasp, S., and S. Lerch, 2018: Neural networks for postprocessing
from high-resolution radar observations. Quart. J. Roy. ensemble weather forecasts. Mon. Wea. Rev., 146, 3885–3900,
Meteor. Soc., 139, 1025–1046, https://doi.org/10.1002/qj.2013. https://doi.org/10.1175/MWR-D-18-0187.1.
Manzato, A., 2005: The use of sounding-derived indices for a neural Schmidhuber, J., 2015: Deep learning in neural networks: An over-
network short-term thunderstorm forecast. Wea. Forecasting, view. Neural Networks, 61, 85–117, https://doi.org/10.1016/
20, 896–917, https://doi.org/10.1175/WAF898.1. j.neunet.2014.09.003.
Marzban, C., and A. Witt, 2001: A Bayesian neural network for Seed, A., 2003: A dynamic and spatial scaling approach to advec-
severe-hail size prediction. Wea. Forecasting, 16, 600–610, tion forecasting. J. Appl. Meteor., 42, 381–388, https://doi.org/
https://doi.org/10.1175/1520-0434(2001)016,0600:ABNNFS. 10.1175/1520-0450(2003)042,0381:ADASSA.2.0.CO;2.
2.0.CO;2. ——, C. E. Pierce, and K. Norman, 2013: Formulation and evalu-
McCann, D., 1992: A neural network short-term forecast of signifi- ation of a scale decomposition-based stochastic precipitation
cant thunderstorms. Wea. Forecasting, 7, 525–534, https://doi.org/ nowcast scheme. Water Resour. Res., 49, 6624–6641, https://
10.1175/1520-0434(1992)007,0525:ANNSTF.2.0.CO;2. doi.org/10.1002/wrcr.20536.
OCTOBER 2019 FORESTI ET AL. 1569
Shi, X., Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Info. Process. Syst., 28, 802–810, https://papers.nips.cc/paper/5955-convolutional-lstm-network-a-machine-learning-approach-for-precipitation-nowcasting.pdf.
Sideris, I., L. Foresti, D. Nerini, and U. Germann, 2018: NowPrecip: An algorithm for localized probabilistic precipitation nowcasting in the complex terrain of Switzerland. Proc. 10th European Conf. on Radar in Meteorology and Hydrology (ERAD), Ede-Wageningen, the Netherlands, ERAD, Abstract number 192, 9 pp., http://projects.knmi.nl/erad2018/ERAD2018_extended_abstract_192.pdf.
Solomatine, D., and L. Shrestha, 2009: A novel method to estimate model uncertainty using machine learning techniques. Water Resour. Res., 45, 1–6, https://doi.org/10.1029/2008WR006839.
Sprenger, S., S. Schemm, R. Oechslin, and J. Jenkner, 2017: Nowcasting Foehn wind events using the AdaBoost machine learning algorithm. Wea. Forecasting, 32, 1079–1099, https://doi.org/10.1175/WAF-D-16-0208.1.
Surcel, M., I. Zawadzki, and M. K. Yau, 2015: A study on the scale dependence of the predictability of precipitation patterns. J. Atmos. Sci., 72, 216–235, https://doi.org/10.1175/JAS-D-14-0071.1.
Taillardat, M., O. Mestre, M. Zamo, and P. Naveau, 2016: Calibrated ensemble forecasts using quantile regression forests and ensemble model output statistics. Mon. Wea. Rev., 144, 2375–2393, https://doi.org/10.1175/MWR-D-15-0260.1.
Toth, Z., 1991a: Circulation patterns in phase space: A multinormal distribution? Mon. Wea. Rev., 119, 1501–1511, https://doi.org/10.1175/1520-0493(1991)119<1501:CPIPSA>2.0.CO;2.
——, 1991b: Estimation of atmospheric predictability by circulation analogues. Mon. Wea. Rev., 119, 65–72, https://doi.org/10.1175/1520-0493(1991)119<0065:EOAPBC>2.0.CO;2.
Tsonis, A., and G. Austin, 1981: An evaluation of extrapolation techniques for the short-term prediction of rain amounts. Atmos.–Ocean, 19, 54–65, https://doi.org/10.1080/07055900.1981.9649100.
Turner, B., I. Zawadzki, and U. Germann, 2004: Predictability of precipitation from continental radar images. Part III: Operational nowcasting implementation (MAPLE). J. Appl. Meteor., 43, 231–248, https://doi.org/10.1175/1520-0450(2004)043<0231:POPFCR>2.0.CO;2.
Ukkonen, P., A. Manzato, and A. Mäkelä, 2017: Evaluation of thunderstorm predictors for Finland using reanalyses and neural networks. J. Appl. Meteor. Climatol., 56, 2335–2352, https://doi.org/10.1175/JAMC-D-16-0361.1.
Van Den Dool, H. M., 1994: Searching for analogs: How long must we wait? Tellus, 46A, 314–324, https://doi.org/10.3402/tellusa.v46i3.15481.
Villarini, G., and W. F. Krajewski, 2010: Review of the different sources of uncertainty in single polarization radar-based estimates of rainfall. Surv. Geophys., 31, 107–129, https://doi.org/10.1007/s10712-009-9079-x.
Wapler, K., and P. James, 2015: Thunderstorm occurrence and characteristics in Central Europe under different synoptic conditions. Atmos. Res., 158–159, 231–244, https://doi.org/10.1016/j.atmosres.2014.07.011.
Weingart, N., 2018: Deep learning based error correction of numerical weather prediction in Switzerland. M.S. thesis, Systems Group, Department of Computer Science, ETH Zurich, 67 pp.
Wilson, J. W., N. A. Crook, C. K. Mueller, J. Sun, and M. Dixon, 1998: Nowcasting thunderstorms: A status report. Bull. Amer. Meteor. Soc., 79, 2079–2099, https://doi.org/10.1175/1520-0477(1998)079<2079:NTASR>2.0.CO;2.
Zawadzki, I., 1973: Statistical properties of precipitation patterns. J. Appl. Meteor., 12, 459–472, https://doi.org/10.1175/1520-0450(1973)012<0459:SPOPP>2.0.CO;2.
Zeder, J., U. Hamann, D. Nerini, L. Foresti, L. Clementi, A. Hering, and U. Germann, 2018: Comparison of thunderstorm characteristics as seen by SEVIRI and radar regarding lightning and hail initiation. EUMETSAT Meteorological Satellite Conf., Tallinn, Estonia, MeteoSwiss, https://doi.org/10.13140/RG.2.2.23453.77285.