Variance Stabilizing Transformations for Electricity Spot Price Forecasting

Article in Power Systems, IEEE Transactions on · July 2017


DOI: 10.1109/TPWRS.2017.2734563

3 authors:

Bartosz Uniejewski, Wroclaw University of Science and Technology
Rafał Weron, Wroclaw University of Science and Technology
Florian Ziel, University of Duisburg-Essen

All content following this page was uploaded by Florian Ziel on 12 February 2018.

Variance Stabilizing Transformations for Electricity Spot Price Forecasting
Bartosz Uniejewski, Rafał Weron and Florian Ziel

Abstract: Most electricity spot price series exhibit price spikes. These extreme observations may significantly impact the obtained model estimates and hence reduce efficiency of the employed predictive algorithms. For markets with only positive prices the logarithmic transform is the single most commonly used technique to reduce spike severity and consequently stabilize the variance. However, for datasets with very close to zero (like the Spanish) or negative (like the German) prices the log-transform is not feasible. What reasonable choices do we have then? To address this issue, we conduct a comprehensive forecasting study involving 12 datasets from diverse power markets and evaluate 16 variance stabilizing transformations. We find that the probability integral transform (PIT) combined with the standard Gaussian distribution yields the best approach, significantly better than many of the considered alternatives.

Index Terms: Electricity spot price, Forecasting, Variance stabilizing transformation, Probability integral transform, Price spike, Diebold-Mariano test

NOMENCLATURE

EPF  Electricity price forecasting
VST  Variance stabilizing transformation
PIT  Probability integral transform
$P_{d,h}$  Electricity price in the day-ahead market for day $d$ and hour $h$
$\hat{P}_{d,h}$  Forecast of $P_{d,h}$ made on day $d-1$
$p_{d,h}$  'Normalized' price, i.e., $p_{d,h} = \frac{1}{b}(P_{d,h} - a)$, where $a$ is the shift and $b$ is the scale
$Y_{d,h}$  VST-transformed price, i.e., $Y_{d,h} = f(P_{d,h})$, where $f(\cdot)$ is a given VST
$F_Z$  Cumulative distribution function (cdf) of random variable $Z$
$\alpha_Z$  Tail index of random variable $Z$
$Y^{\min}_{d-1}$  Yesterday's minimum VST-transformed price
$Y^{\max}_{d-1}$  Yesterday's maximum VST-transformed price
$\beta_{h,i}$  $i$th coefficient of the benchmark forecasting model for hour $h$, defined in Eqn. (1)
$D_i$  Weekday dummy for day of the week $i$
$\varepsilon_{d,h}$  Noise term for day $d$ and hour $h$
$\hat{F}_Z$  Estimate or distributional forecast of $F_Z$
$\hat{\varepsilon}_{X,d}$  Vector of 24 hourly out-of-sample errors for day $d$ of VST $X$, i.e., $(\hat{\varepsilon}_{X,d,1}, \ldots, \hat{\varepsilon}_{X,d,24})'$
$\|\cdot\|_p$  $p$-norm, i.e., $\|\hat{\varepsilon}_{X,d}\|_p = \left(\sum_{h=1}^{24} |\hat{\varepsilon}_{X,d,h}|^p\right)^{1/p}$
$\Delta_{X,Y,d}$  Loss differential series of the multivariate Diebold-Mariano (DM) test, see Eqn. (18)
$\Delta_{X,Y,d,M}$  Loss differential series of the multivariate DM test across 11 European markets ($M_1, \ldots, M_{11}$), see Eqn. (19)

The study was partially supported by the National Science Center (NCN, Poland) through grant no. 2015/17/B/HS4/00334 (to BU and RW).
BU is with the Faculty of Pure and Applied Mathematics and RW is with the Department of Operations Research, Wrocław University of Science and Technology, 50-370 Wrocław, Poland. FZ is with the Faculty of Business Administration and Economics, Universität Duisburg-Essen, D-45141 Essen, Germany. E-mails: uniejewskibartosz@gmail.com, rafal.weron@pwr.edu.pl, Florian.Ziel@uni-due.de.

I. INTRODUCTION

Over the last two decades short-term (also called spot or day-ahead; for a discussion see [1]) electricity price forecasting (EPF) has joined short-term load forecasting as one of the core processes of an energy company's operational activities. The reason is quite simple. Optimal asset scheduling and trading decisions are made through revenue contribution models, and market risk is managed via spot (day-ahead) transactions [2]. The gain from more accurate predictions can be quantified. For instance, a 1% improvement in the mean absolute percentage error (MAPE) in forecasting accuracy would result in about 0.1%–0.35% cost reductions from short-term EPF [3]. In dollar terms, this would translate into savings of ca. 1.5 million USD per year for a typical medium-size utility with a 5-GW peak load [4].

A critical issue in the calibration of electricity price forecasting (EPF) models is their sensitivity to price spikes, see [1] for a recent review. The latter are one of the most pronounced features of deregulated power markets and nearly all spot price time series exhibit them. Spikes come in all sorts and sizes, mostly positive but in some markets also negative [5]–[7]. A statistically appropriate modeling framework would require a dedicated treatment of these 'outliers', either via robust estimation algorithms [8] or models with explicit spike components [9]–[14]. However, robust techniques are not popular in the EPF literature and most studies utilize Ordinary Least Squares (OLS) methods, while non-linear models generally do not outperform linear ones in short-term EPF [1], [15].

As a working remedy some authors suggest filtering electricity prices with a 'reasonable' procedure for outlier detection, then calibrating the model to spike-filtered data, with the spikes replaced by more 'normal' values [16]–[19]. Others advocate transforming the original data, running a model on transformed prices, then applying the inverse transformation to obtain forecasts [20], [21]. For markets with only positive prices, the logarithm is the single most commonly used transform to reduce spike severity and consequently stabilize the variance [1]. However, with the increased market penetration of renewable energy, the price series recorded nowadays quite often include very close to zero or negative
values. Obviously, the log-transform is not a feasible option then. Somewhat surprisingly, not too many viable alternatives have been considered in the EPF literature so far.

With this study we want to fill the gap and conduct a comprehensive forecasting study involving datasets from 12 power markets and evaluate 16 variance stabilizing transformations (VSTs), ranging from simple threshold-type cutoffs, via generalized Box-Cox type transforms to the probability integral transform (PIT) based approaches.

The paper is structured as follows. In Section II we briefly describe the datasets, then in Section III define the notation, the basic forecasting model and discuss the 16 VSTs considered in this study. In Section IV we evaluate their performance across the 12 datasets. We also test for statistically significant differences in their forecasting performance using two variants of the Diebold-Mariano [22] test, and thus provide robust guidelines for preprocessing electricity spot prices prior to fitting time series or computational intelligence models. Finally, in Section V we wrap up the results and conclude.

II. DATASETS

We consider a total of twelve electricity spot price datasets, see Table I. Eleven originate from six major European power markets, including the European Power Exchange (EPEX SPOT) for power trading in Germany, France, Austria, Switzerland and Luxembourg, the Nordic power exchange Nord Pool and the Iberian OMIE (Spain and Portugal). All eleven include day-ahead prices (quoted in EUR/MWh) at hourly resolution and cover a six year period from 30 Jul 2010 to 28 Jul 2016. The 12th dataset comes from the price track of the Global Energy Forecasting Competition 2014 (GEFCom2014) and includes locational marginal prices (LMPs, i.e., zonal prices; quoted in USD/MWh) at an hourly resolution from 1 Jan 2011 to 17 Dec 2013 [15]. The exact origin of the data has never been revealed by the organizers but, given its features, it quite likely comes from one of the U.S. markets. Note, that like Uniejewski et al. [23], we use the terms spot and day-ahead interchangeably, which is in line with the majority of literature on European electricity markets. However, in the U.S., the 'spot' is rather used to refer to the real-time market, while the day-ahead market is usually called the forward market [1], [24].

Furthermore, because of the clock changes to and from the daylight saving time, we have to do minor adjustments to the data to obtain well defined price processes. For the European data we interpolate the missing hour in March and average the doubled hour in October. The GEFCom2014 data was released clock-change adjusted [15].

III. THE MODELING SETUP

First, let us fix the notation. We denote by $P_{d,h}$ the electricity price in the day-ahead market for day $d$ and hour $h$ and by $Y_{d,h}$ the transformed data, i.e., $Y_{d,h} = f(P_{d,h})$, where $f(\cdot)$ is a given variance stabilizing transformation (VST). After computing the forecasts, we apply the inverse transformation to obtain the electricity spot price forecasts, i.e., $\hat{P}_{d,h} = f^{-1}(\hat{Y}_{d,h})$. Note, that within this setup $f^{-1}$ must exist and, hence, we restrict ourselves to monotonically increasing VSTs. Naturally, we evaluate the accuracy of the considered methods using $\hat{P}_{d,h}$, not the forecasts of the transformed series.

The general idea behind a VST is to reduce the variation of price data, so that the variation of $Y_{d,h}$ is smaller than that of $P_{d,h}$. Lower variation and/or less spiky behavior of the input data usually allows the forecasting model to yield more accurate predictions [19]. The latter behavior is closely related to the concept of moment existence and the tail index. Formally, the tail index $\alpha_Z$ of random variable $Z$ is defined via $\lim_{t\to\infty} \frac{1-F_Z(tx)}{1-F_Z(t)} = x^{-\alpha_Z}$, where $F_Z$ is the cumulative distribution function (cdf) of $Z$. For us it is important to know that all moments of order $a < \alpha_Z$ exist. In general, the larger the tail index the better for many estimation, training or forecasting algorithms. For instance, the OLS works properly only if the tail index is larger than 2, i.e., when the input data has a finite variance [25]. If the VST is well chosen, $\alpha_{Y_{d,h}}$ will be larger than $\alpha_{P_{d,h}}$. It can also happen that even though $\alpha_{P_{d,h}} < 2$, i.e., the variance of $P_{d,h}$ is infinite, for the transformed prices $\alpha_{Y_{d,h}} > 2$ and their variance is finite. This also motivates the name variance stabilizing transformation.

A. Two normalization schemes

Moreover, most transformations we consider work on 'normalized' prices: $p_{d,h} = \frac{1}{b}(P_{d,h} - a)$, where $a$ is the shift and $b$ is the scale. Later in the text we report results for two sets of 'normalizing' parameters:

1) set1: $(a, b)$ = (median, MAD), i.e., $a$ is the median of $P_{d,h}$ in the 730-day calibration sample and $b$ is the sample median absolute deviation (MAD) around the sample median adjusted by a factor for asymptotically normal consistency to the standard deviation. This factor is $\frac{1}{z_{0.75}} \approx 1.4826$, where $z_{0.75}$ is the 75% quantile of the standard normal distribution.
2) set2: $(a, b)$ = (mean, std), i.e., $a$ is the mean and $b$ is the standard deviation of $P_{d,h}$ in the 730-day calibration sample.

The (mean, std) normalization is more common in statistics, but the (median, MAD) normalization, which is more robust to outliers, is preferred when working with spiky electricity prices. Note also, that the inverse transformations listed below require inverting the normalization, i.e., $P_{d,h} = b \cdot p_{d,h} + a$, but for notational simplicity we leave them in terms of $p_{d,h}$.

B. The benchmark forecasting model

Our choice of the benchmark forecasting model used in this study is guided by three factors. Firstly, the existing literature on short-term EPF which has generally favored the multivariate framework, with prices for each hour of the day modeled independently by 24 parsimonious models rather than jointly by one large model [1], [26]. Secondly, the desire to perform a comprehensive study of the influence of VSTs on the forecasting performance using a state-of-the-art structure, that builds on the results of the most recent EPF studies on variable selection [21], [23], not of the optimal model structure itself. Thirdly, computational efficiency required to produce forecasts

TABLE I
The twelve considered electricity spot price series and their descriptive statistics. For the European markets the prices are in EUR/MWh, for GEFCom2014 in USD/MWh.

Electricity market & region Acronym Source Mean Std Median MAD Min Max
6-year period (30 Jul 2010 to 28 Jul 2016, 2189×24 hourly observations)
BELPEX, Belgium BELPEX.BE belpex.be 44.49 22.15 44.94 19.54 −200.00 2999.00
EPEX, Switzerland EPEX.CH epexspot.com 44.60 17.68 43.98 20.91 −45.68 300.04
EPEX, Germany & Austria EPEX.DE+AT epexspot.com 38.47 16.71 37.31 19.01 −221.99 210.00
EPEX, France EPEX.FR epexspot.com 41.61 22.04 41.50 20.62 −200.00 1938.50
EXAA, Germany & Austria EXAA.DE+AT exaa.at 38.70 15.65 37.37 18.56 −50.92 175.74
Nord Pool, West Denmark NP.DK1 nordpoolspot.com 35.31 24.23 33.51 17.93 −200.00 2000.00
Nord Pool, East Denmark NP.DK2 nordpoolspot.com 37.23 21.53 34.65 18.96 −200.00 2000.00
Nord Pool, System price NP.SYS nordpoolspot.com 34.07 15.12 31.95 16.26 1.14 224.97
OMIE, Spain OMIE.ES omie.es 45.23 15.72 47.94 17.56 0.00 112.00
OMIE, Portugal OMIE.PT omie.es 45.11 15.97 47.96 17.81 0.00 145.00
OTE, Czechia OTE.CZ ote-cr.cz 38.42 16.16 37.38 18.74 −150.00 170.00
3-year period (1 Jan 2011 to 17 Dec 2013, 1082×24 hourly observations)
GEFCom2014 competition GEFCom2014 Ref. [15] 48.19 26.18 42.87 23.97 12.52 363.80
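
The Median and MAD columns of Table I are the kind of robust location and scale statistics used by the set1 normalization of Section III-A. As a minimal sketch, assuming the calibration prices are held in a plain Python list and that "std" means the sample standard deviation, the two normalization schemes could be computed as follows. The function names are hypothetical and not taken from the paper.

```python
from statistics import NormalDist, mean, median, stdev

# Consistency factor 1/z_0.75 (about 1.4826) that rescales the MAD so that it
# estimates the standard deviation for normally distributed data.
MAD_FACTOR = 1.0 / NormalDist().inv_cdf(0.75)


def normalization_params(prices, scheme="set1"):
    """Return the shift a and scale b for a calibration sample (hypothetical helper)."""
    if scheme == "set1":  # (median, MAD)
        a = median(prices)
        b = MAD_FACTOR * median([abs(p - a) for p in prices])
    elif scheme == "set2":  # (mean, std); using the sample std is an assumption
        a = mean(prices)
        b = stdev(prices)
    else:
        raise ValueError("scheme must be 'set1' or 'set2'")
    return a, b


def normalize(prices, a, b):
    """Normalized prices p = (P - a) / b."""
    return [(p - a) / b for p in prices]


def denormalize(normalized, a, b):
    """Invert the normalization: P = b * p + a."""
    return [b * p + a for p in normalized]


sample = [10.0, 20.0, 30.0, 40.0, 1000.0]  # toy data with one spike
a1, b1 = normalization_params(sample, "set1")
a2, b2 = normalization_params(sample, "set2")
```

On this toy sample the single spike leaves the set1 shift at the median of 30, while the set2 shift is pulled up to the mean of 220, which illustrates why the (median, MAD) scheme is described as more robust to outliers.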

for many VSTs and many datasets within a rolling calibration window scheme, which favors regression over neural network setups. Taking all three factors into account, we have decided to use the expert$_{DoW,nl}$ model of Ziel and Weron [21], which is a parsimonious autoregressive structure estimated using OLS, that outperformed not only 15 other expert¹ models, but also much larger multi- and univariate autoregressive specifications. In this model, the VST-transformed (or original if no transformation is used) day-ahead electricity price for day $d$ and hour $h$ is given by:

$$
Y_{d,h} = \beta_{h,1} + \underbrace{\beta_{h,2} Y_{d-1,h} + \beta_{h,3} Y_{d-2,h} + \beta_{h,4} Y_{d-7,h}}_{\text{autoregressive terms}} + \underbrace{\beta_{h,5} Y^{\min}_{d-1} + \beta_{h,6} Y^{\max}_{d-1}}_{\text{non-linear effects}} + \underbrace{\beta_{h,7} Y_{d-1,24}}_{\text{end-of-day effect}} + \underbrace{\sum_{j=1}^{7} \beta_{h,j+7} D_j}_{\text{weekday dummies}} + \varepsilon_{d,h}, \tag{1}
$$

where $Y_{d-1,h}$, $Y_{d-2,h}$ and $Y_{d-7,h}$ are prices for the same hour yesterday, two days ago and a week ago, $Y^{\min}_{d-1}$ and $Y^{\max}_{d-1}$ are yesterday's minimum and maximum prices (they provide a non-linear link between yesterday's and today's prices), $Y_{d-1,24}$ is the price for midnight (the last known price), $D_1, \ldots, D_7$ are weekday dummies ($D_1 = 1$ for Monday and 0 otherwise, $D_2 = 1$ for Tuesday and 0 otherwise, etc.) and $\varepsilon_{d,h}$ is the noise term (uncorrelated and with finite variance). Note, that we have also considered several other expert models from [21], [23], but they all yielded qualitatively the same results while being outperformed by the expert$_{DoW,nl}$ model.

¹ Since such models are built on some prior knowledge of experts, following Uniejewski et al. [23] and Ziel [27], we refer to them as expert models.

[Figure 1: network diagram with an input layer (1, $Y_{d-1,h}$, $Y_{d-2,h}$, $Y_{d-7,h}$, $Y^{\min}_{d-1}$, $Y^{\max}_{d-1}$, $Y_{d-1,24}$, $D_1, \ldots, D_7$), a hidden layer and an output layer ($Y_{d,h}$).]
Fig. 1. Visualization of the NNET

C. Variance stabilizing transformations (VSTs)

The logarithmic transform is by far the most popular approach to reducing spike severity and stabilizing the variance [25]. However, for datasets with very close to zero (like OMIE.ES and OMIE.PT) or negative electricity prices (like EPEX.DE+AT) the standard log-transform is not feasible. In this Section we discuss eight types of transformations that can be used in such a context. Some have been already utilized in EPF, others are new. The first seven are illustrated in Fig. 2 and come in two variants denoted by a subscript, with 1 standing for set1 and 2 for set2 normalization (see Section III-A). The last class builds on the probability integral transform (PIT) and does not require normalization. Of course, this selection of VSTs does not exhaust all possibilities; in particular, time series filters (like exponential smoothing, wavelets, the Hodrick-Prescott filter [1], [19]) can also be used to stabilize the variance. However, in this paper we are after 'simple' data transformations that work with one observation at a time.

3σ and 3σlog transformations. Some of the earliest suggested solutions to reducing the 'outlier effect' of electricity price spikes include limiting their severity by setting an upper limit on prices (known as clipping in signal processing [28] and winsorizing in statistics [25]) or damping all observations above a certain threshold using the logarithmic function [9], [16]. In this study we consider both variants with the threshold set to the commonly used level of k = 3 standard deviations
of the (transformed) price series in the calibration window [19]; we have also tested variants with k = 2 and 4, but they performed worse. The 'clipping' variant is denoted by 3σ and given by:

$$
Y_{d,h} = \begin{cases} 3\,\operatorname{sgn}(p_{d,h}) & \text{for } |p_{d,h}| > 3, \\ p_{d,h} & \text{for } |p_{d,h}| \le 3, \end{cases} \tag{2}
$$

with 'inverse' $p_{d,h} = Y_{d,h}$, where $\operatorname{sgn}(x)$ is the sign of $x$. The 'log-damping' variant is denoted by 3σlog and given by:

$$
Y_{d,h} = \begin{cases} \operatorname{sgn}(p_{d,h}) \{\log(|p_{d,h}| - 2) + 3\} & \text{for } |p_{d,h}| > 3, \\ p_{d,h} & \text{for } |p_{d,h}| \le 3, \end{cases} \tag{3}
$$

with inverse:

$$
p_{d,h} = \begin{cases} \operatorname{sgn}(Y_{d,h}) \left(e^{|Y_{d,h}| - 3} + 2\right) & \text{for } |Y_{d,h}| > 3, \\ Y_{d,h} & \text{for } |Y_{d,h}| \le 3. \end{cases} \tag{4}
$$

In Figure 2 we can clearly see that 3σ is bounded and not differentiable at $x = \pm 3$, whereas 3σlog is a smooth function with 3σlog$(x) \to \pm\infty$ as $x \to \pm\infty$ and a log-damping effect.

[Figure 2: plot of f(x) against x for x from −10 to 10; curves: original, 3σ, 3σlog, logistic, asinh, boxcox(0.5), poly(0.5,1), mlog(1).]
Fig. 2. Visualization of seven out of eight classes of transformations considered in this study. All are shifted and scaled so that $f(0) = 0$ and the slope at $x = 0$ is 45° to emphasize the differences between them. The effect of applying the 8th class is depicted in Fig. 3.

Logistic transformation. The next transformation class we consider is given by the logistic function that is a sigmoid curve and has many applications in data analytics. For instance, it is used as a link function in generalized linear models (GLM) and is a very popular choice for the activation function in neural nets [29]. However, to the best of our knowledge, it has never been applied as a VST in electricity price forecasting. We denote this transformation by logistic and define as:

$$
Y_{d,h} = \left(1 + e^{-p_{d,h}}\right)^{-1}. \tag{5}
$$

Its inverse is the so-called logit function:

$$
p_{d,h} = \log\left(\frac{Y_{d,h}}{1 - Y_{d,h}}\right). \tag{6}
$$

As visualized in Fig. 2, the logistic transformation is bounded, like 3σ, but is a smooth function. Note, that to avoid numerical issues we clip the forecasts of $Y_{d,h}$ to the interval [0.001, 0.999], i.e., replace all values outside the interval by the lower or upper bound, before applying Eqn. (6).

Asinh transformation. The third class of transformations is built around the area hyperbolic sine, i.e., the inverse of the hyperbolic sine. Interestingly, this transformation has been already used in the context of electricity prices by Schneider [20], but the article went unnoticed. Recently, Ziel and Weron [21] utilized it in an extensive empirical study on multi- and univariate EPF models, motivated by the transformation's ability to preserve unimodality of the sample density. In this paper we denote it by asinh and define as:

$$
Y_{d,h} = \operatorname{asinh}(p_{d,h}) \equiv \log\left(p_{d,h} + \sqrt{p_{d,h}^2 + 1}\right), \tag{7}
$$

with inverse $p_{d,h} = \sinh(Y_{d,h})$. In Figure 2 we can clearly see the damping behavior of the logarithm in Eqn. (7), while Fig. 3 shows the transformation's performance for a sample dataset.

Boxcox transformation. Recall, that the log-transform is a special case of the so-called Box-Cox transform, a very popular VST in time series analysis [30]. Like the logarithm, the standard Box-Cox transform is not defined for non-positive values. However, in this study we consider a robust (to zeros and negative values) variant [31], denoted by boxcox(λ) and defined as:

$$
Y_{d,h} = \operatorname{sgn}(p_{d,h}) \begin{cases} \dfrac{(|p_{d,h}| + 1)^{\lambda} - 1}{\lambda} & \text{for } \lambda > 0, \\ \log(|p_{d,h}| + 1) & \text{for } \lambda = 0, \end{cases} \tag{8}
$$

with inverse:

$$
p_{d,h} = \operatorname{sgn}(Y_{d,h}) \begin{cases} (\lambda |Y_{d,h}| + 1)^{\frac{1}{\lambda}} - 1 & \text{for } \lambda > 0, \\ e^{|Y_{d,h}|} - 1 & \text{for } \lambda = 0. \end{cases} \tag{9}
$$

Obviously, the robust (to zeros and negative values) variant of the log-transform is obtained for λ = 0. However, here we use λ = 0.5, which was selected based on a limited optimization study. With this choice of λ, the boxcox(λ) transformation exhibits a polynomial damping effect, see Fig. 2.

Poly and mlog transformations. Motivated by the robust Box-Cox transform, we introduce two new transforms: the polynomial and the mirror-log. We denote the former by poly(λ, c) and define as:

$$
Y_{d,h} = \operatorname{sgn}(p_{d,h}) \left[ \left( |p_{d,h}| + \left(\frac{c}{\lambda}\right)^{\frac{1}{\lambda - 1}} \right)^{\lambda} - \left(\frac{c}{\lambda}\right)^{\frac{\lambda}{\lambda - 1}} \right], \tag{10}
$$

with inverse:

$$
p_{d,h} = \operatorname{sgn}(Y_{d,h}) \left[ \left( |Y_{d,h}| + \left(\frac{c}{\lambda}\right)^{\frac{\lambda}{\lambda - 1}} \right)^{\frac{1}{\lambda}} - \left(\frac{c}{\lambda}\right)^{\frac{1}{\lambda - 1}} \right]. \tag{11}
$$

The poly(λ, c) transform is a two parameter family. Here we use λ = 0.125 and c = 0.05, which were selected based on a limited optimization study.

The mirror-log is a straightforward generalization of the log-transform with a mirror image of the logarithm for negative values. More precisely, the logarithm is flipped with respect to the origin from the first to the third quadrant, see Fig. 2. We denote it by mlog(c) and define as:

$$
Y_{d,h} = \operatorname{sgn}(p_{d,h}) \left[ \log\left(|p_{d,h}| + \frac{1}{c}\right) + \log(c) \right], \tag{12}
$$

[Figure 3: three rows of panels for EPEX.DE+AT (untransformed, asinh transformed, N-PIT transformed); each row shows a time series plot over 2011–2016 on the left and the corresponding density on the right.]

Fig. 3. Time series plot of the original (in EUR/MWh; top), asinh$_2$-transformed (middle) and N-PIT-transformed (bottom) EPEX.DE+AT spot prices. The marginal densities are depicted in the right panels.
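
The two transformations shown in Fig. 3 can be illustrated with a short sketch based on Eqn. (7) for asinh and Eqns. (13) and (14) for the N-PIT. It uses only the Python standard library; the function names are hypothetical. One detail is an assumption made here and not taken from the paper: the empirical cdf is rescaled to (rank + 0.5)/(n + 1) so that it stays strictly inside (0, 1), where the normal quantile function is finite.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()  # G in Eqn. (13): standard normal distribution


def asinh_vst(p):
    """Eqn. (7): Y = asinh(p) for a normalized price p."""
    return math.asinh(p)


def asinh_inverse(y):
    """Inverse of Eqn. (7): p = sinh(Y)."""
    return math.sinh(y)


def n_pit(price, calibration):
    """Eqn. (13) with an empirical cdf estimated from the calibration sample.

    Assumption: the empirical cdf is rescaled to (rank + 0.5) / (n + 1) so that
    it never reaches 0 or 1, where the normal quantile would be infinite.
    """
    ordered = sorted(calibration)
    rank = bisect_right(ordered, price)  # number of sample values <= price
    u = (rank + 0.5) / (len(ordered) + 1)
    return STD_NORMAL.inv_cdf(u)


def n_pit_inverse(y, calibration):
    """Eqn. (14): map Y back to the matching order statistic of the sample."""
    ordered = sorted(calibration)
    n = len(ordered)
    rank = round(STD_NORMAL.cdf(y) * (n + 1) - 0.5)
    rank = min(max(rank, 1), n)  # clamp to the range of observed prices
    return ordered[rank - 1]


calibration = [30.0, -50.0, 45.0, 200.0, 38.0, 41.0, 0.0]  # toy prices
transformed = [n_pit(p, calibration) for p in calibration]
recovered = [n_pit_inverse(y, calibration) for y in transformed]
```

Note that asinh acts on normalized prices, whereas the N-PIT is applied to the original prices directly. In this sketch the inverse N-PIT can only return values observed in the calibration sample, so back-transformed forecasts are confined to that range; a smoother quantile estimate would behave differently.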

with inverse $p_{d,h} = \operatorname{sgn}(Y_{d,h}) \left[ e^{|Y_{d,h}| - \log c} - \frac{1}{c} \right]$. The mlog(c) transform is a one parameter family. We use $c = \frac{1}{3}$, which again was selected based on a limited optimization study. Similarly to the standard log-transform and asinh, the mirror-log exhibits a log-damping effect, see Fig. 2. Both the mlog and poly are constructed so that they have a slope of c at the origin. Hence, mlog(1) is the same as boxcox(0).

N-PIT and t-PIT transformations. The last class we consider here is based on the so-called probability integral transform: $\text{PIT}(Z) = F_Z(Z)$, where $F_Z$ is the cdf of $Z$. The origin of the PIT is not known, but as Gneiting et al. [32] argue, it can be traced back at least to the works of Karl Pearson in the 1930s. In the empirical context we usually do not know the true distribution of $Z$, hence the PIT is rather defined as: $\text{PIT}(Z) = \hat{F}_Z(Z)$, where $\hat{F}_Z$ is an estimate (e.g., empirical cdf) or a distributional forecast of $F_Z$. If the latter is perfect, i.e., $\hat{F}_Z = F_Z$, then $\text{PIT}(Z)$ is an independent and uniformly distributed variable. This property can be used to evaluate distributional forecasts [32].

In our context, however, the following transformation will be more useful than the PIT itself:

$$
Y_{d,h} = G^{-1}(\text{PIT}(P_{d,h})) = G^{-1}\left(\hat{F}_{P_{d,h}}(P_{d,h})\right), \tag{13}
$$

where $G^{-1}$ is the inverse of some continuous distribution. Note, that we apply the PIT to original prices, $P_{d,h}$, as this transformation does not require normalization of the inputs. We consider two variants: (i) normal or N-PIT, with $G^{-1}$ being the inverse of the standard normal cdf, and (ii) Student-t or t-PIT, with $G^{-1}$ being the inverse of the standard Student-t distribution with ν = 8 degrees of freedom (we have also tried other values of ν, but lower values yielded too heavy tails, while larger ones yielded a behavior too similar to that of N-PIT). The inverse of Eqn. (13) is given by:

$$
P_{d,h} = \hat{F}^{-1}_{P_{d,h}}(G(Y_{d,h})), \tag{14}
$$

with $G$ being either the standard normal or Student-t cdf. The N-PIT transformation, misleadingly called the Nataf transformation², has been used in [34] to normalize Spanish electricity prices. The t-PIT transformation, on the other hand, has not been used in EPF before.

² The Nataf transformation is a two stage procedure, popularized in reliability engineering by [33]. First, correlated multivariate data is N-PIT-transformed independently in each dimension, then the correlation is eliminated by applying the Cholesky factorization.

Diaz and Planas [34] argued that a price cluster of zeros was the reason why N-PIT($P_{d,h}$) was not normally distributed. As a remedy they introduced a 'modified' N-PIT transformation that corrected for the observed zero prices and allowed them to obtain a normal cdf of the transformed prices. However, they have not tested whether the modified transformation led to more accurate forecasts. To address this issue, we have conducted a limited empirical study in which we applied their approach. First, we identified all price value clusters of size exceeding a given threshold (0.5% or 1% of all observations in the calibration sample); if present, the largest identified cluster was typically at $P_{d,h} = 0$ EUR/MWh. Then we applied the 'modified' N-PIT transformation and compared the forecasting results with those obtained from applying Eqn. (13). For most datasets the results were identical up to a few decimal places. Only for EXAA.DE+AT, OMIE.ES and OMIE.PT were there some visible differences. For the Spanish market the gain in MAE from using the 'modified' transformation, which is much more complex to implement, was ca. 0.1% across the whole test period (and averaged for three different expert models). But to our surprise, for the other two markets the 'modified' transformation performed either the same (for the 1% threshold) or worse by ca. 0.05-0.1% (for the 0.5% threshold).

Since the normal distribution has exponential tails and the Student-t has polynomial, the N-PIT has an exponential
spike damping behavior and t-PIT a polynomial one. Hence, extreme price spikes remain larger for t-PIT-transformed data than for N-PIT-transformed. In Figure 3 we visualize the effect of applying the N-PIT compared to that of original (i.e., untransformed) and asinh$_2$-transformed data. We clearly see, that the N-PIT-transformed data histogram looks pretty much like a standard normal density. In contrast, asinh$_2$ preserves the original shape of the histogram of the untransformed data, but damps the price spikes towards the center.

IV. EMPIRICAL STUDY

Like Ziel and Weron [21], we use a 730-day (ca. two-year) rolling calibration window testing scheme. First, all considered models are estimated using data from the initial calibration period (i.e., from 31 Jul 2010 to 30 Jul 2012 for the European datasets and from 1 Jan 2011 to 30 Dec 2012 for GEFCom2014) and forecasts for all 24 hours of 31 Jul 2012 (respectively, 31 Dec 2012) are determined. Then the window is rolled forward by one day, the models are reestimated and forecasts for all 24 hours of the next day are computed. This procedure is repeated until the predictions for the 24 hours of 28 Jul 2016 (respectively, 17 Dec 2013) are computed. Note, that for the European datasets the out-of-sample test period covers roughly four years (1459 days) of data and for the GEFCom2014 dataset only 352 days.

A. MAE, RMSE and m.p.d.f.b. error measures

As the main evaluation criterion we consider the Mean Absolute Error (MAE) for the full out-of-sample test period of D = 1459 days (for GEFCom2014 only 352 days). It is computed for each VST and dataset as:

$$
\text{MAE} = \frac{1}{24D} \sum_{d=1}^{D} \sum_{h=1}^{24} |\hat{\varepsilon}_{d,h}|, \tag{15}
$$

where $\hat{\varepsilon}_{d,h}$ denotes the estimated forecasting error for day $d$ and hour $h$. The MAE errors are reported for the 16 considered VSTs and all 12 datasets in Table II. We have also analyzed Root Mean Square Errors (RMSE):

$$
\text{RMSE} = \sqrt{\frac{1}{24D} \sum_{d=1}^{D} \sum_{h=1}^{24} \hat{\varepsilon}_{d,h}^{2}}, \tag{16}
$$

but the results were similar and hence are not reported nor analyzed here due to space limitations (but are available from the authors upon request). Only in Table II we provide for comparison an aggregate measure of fit based on the RMSE, the m.p.d.f.b., as defined in (17) below. Although there are some changes in the ranking, the overall picture is very similar.

Given the large number of results it is hard to rank the VSTs. To tackle this issue, following [21], we introduce the mean percentage deviation from the best (m.p.d.f.b.) VST, which is inspired by the m.d.f.b. measure used in [9], [35] for comparing models. The m.p.d.f.b. measure for VST $i$ indicates how similar this VST's performance is to the 'optimal VST' composed of the best performing VST for each of the 12 datasets:

$$
\text{m.p.d.f.b.}_i = \frac{1}{12} \sum_{j=1}^{12} \frac{|\text{ERR}_{i,j} - \text{ERR}_{\text{best VST},j}|}{\text{ERR}_{\text{best VST},j}} \times 100\%, \tag{17}
$$

where $\text{ERR}_{\text{best VST},j} = \min_{1 \le i \le 17} \text{ERR}_{i,j}$ and ERR can be the MAE, the RMSE or any other error measure for point forecasts. The m.p.d.f.b. measure is reported for the original data and 16 considered VSTs in the last two columns of Table II.

From Table II we can see the dominance of the N-PIT over the competitors in terms of MAE. It has the lowest m.p.d.f.b., nearly half that of the next best transform, i.e., mlog$_1$. Also for four datasets (EPEX.DE+AT, EXAA.DE+AT, NP.DK2 and OTE.CZ) it is the best performer and second best for another two (BELPEX.BE and NP.DK1). However, for some markets (NP.SYS, OMIE.ES and GEFCom2014) it does not excel. The t-PIT transformation is the best for three markets (OMIE.ES, OMIE.PT and GEFCom2014), but the overall performance does not seem to be robust and the t-PIT forecasting accuracy is poor for some markets (especially EPEX.CH and NP.SYS). The 3σ, 3σlog and asinh transformations show moderate MAE forecasting performance. Still, their m.p.d.f.b. is better than that of the original (i.e., untransformed) data and sometimes even that of the best transformation for a given market (3σ$_1$ for EPEX.CH, 3σ$_2$ for EPEX.FR and asinh$_1$ for NP.DK1).

In terms of RMSE, both N-PIT and t-PIT are on average outperformed by the generalizations of the Box-Cox transform, i.e., poly and mlog, which show in general a very similar behavior across all 12 datasets. The boxcox and PIT transformations follow closely, while the remaining ones lag behind. Interestingly, the t-PIT has a better RMSE forecasting accuracy than the N-PIT. Furthermore, we see that the logistic transformations are the only ones which are worse in terms of m.p.d.f.b. (both MAE and RMSE-based) than the original (i.e., untransformed) data. Still, somewhat surprisingly logistic$_2$ is the best choice in terms of MAE for the Belgian market, which nicely illustrates that the overall behavior can depend on the considered market and data structure. Finally, we want to remark that there is no clear tendency as to whether scaling by set1 (median, MAD) or set2 (mean, std) is preferable, although for the Box-Cox type transformations (boxcox, poly and mlog) set1 leads to marginally better predictions in terms of m.p.d.f.b., see Table II.

To better understand the forecasting performance of the VSTs we consider an evaluation conducted separately for each of three price regimes defined using the 3σ-rule:
• negative spike: $P_{d,h} < \mu - 3\sigma$,
• normal range: $\mu - 3\sigma \le P_{d,h} \le \mu + 3\sigma$,
• positive spike: $\mu + 3\sigma < P_{d,h}$,
where µ is the sample mean and σ is the sample standard deviation of $P_{d,h}$ in the whole considered period, see the 4th and 5th columns in Table I. Note, that across all 12 datasets only 0.08% of prices fall into the negative spike regime, 99.63% into the normal range and 0.29% into the positive spike regime.

Table III contains the aggregate m.p.d.f.b. error measures, see Eqn. (17), for the MAE and the RMSE for the three price regimes and the 16 VSTs. We can observe that the forecasting performance varies between the three price regimes, but the results based on the MAE and the RMSE are qualitatively similar. The m.p.d.f.b. error measures for the N-PIT are

TABLE II
Mean absolute errors (MAE) across the whole test period for the 12 markets (in columns) and the 16 variance stabilizing transformations (VSTs, see Section III-C for details; in rows). A heat map is used to indicate better (→ green) and worse (→ red) performing VSTs. We also report the aggregate m.p.d.f.b. error measure, see Eqn. (17), for the MAE and the RMSE. The subscripts in the VST names refer to the two normalization schemes: set1: (median, MAD) and set2: (mean, std), see Section III-A.

VST BELPEX.BE EPEX.CH EPEX.DE+AT EPEX.FR EXAA.DE+AT NP.DK1 NP.DK2 NP.SYS OMIE.ES OMIE.PT OTE.CZ GEFCom2014 m.p.d.f.b.(MAE) m.p.d.f.b.(RMSE)
original 6.379 4.019 5.370 5.296 4.282 6.200 5.186 1.890 5.825 5.999 4.670 7.472 5.22% 9.46%
3σ 1 6.069 3.991 5.180 4.866 4.249 5.269 4.948 1.834 6.157 6.326 4.592 8.920 3.98% 6.80%
3σ 2 6.088 3.989 5.199 4.880 4.257 5.379 5.005 1.830 5.858 6.013 4.599 7.720 2.04% 3.04%
3σlog 1 6.114 3.993 5.197 4.882 4.270 5.331 5.004 1.829 6.043 6.325 4.605 7.591 2.59% 3.78%
3σlog 2 6.137 3.993 5.221 4.905 4.273 5.434 5.055 1.838 5.865 6.055 4.611 7.517 2.29% 1.94%
logistic 1 6.138 4.133 5.369 5.149 4.486 5.296 4.971 1.980 6.817 6.918 4.740 8.042 7.38% 14.29%
logistic 2 6.047 4.126 5.250 4.946 4.422 5.303 4.928 1.908 6.578 6.721 4.639 7.921 5.24% 10.81%
asinh 1 6.064 4.074 5.226 4.999 4.289 5.197 4.884 1.808 6.134 6.290 4.601 7.022 1.85% 2.73%
asinh 2 6.057 4.080 5.207 4.949 4.288 5.317 4.920 1.803 6.018 6.168 4.590 7.125 1.73% 1.89%
boxcox 1 6.140 4.032 5.231 4.958 4.244 5.372 4.946 1.802 5.845 5.987 4.589 7.074 1.28% 1.08%
boxcox 2 6.153 4.035 5.242 4.967 4.255 5.523 4.988 1.808 5.842 5.989 4.596 7.152 1.80% 1.62%
poly 1 6.113 4.016 5.211 4.923 4.234 5.284 4.939 1.798 5.854 6.001 4.579 7.066 0.93% 0.60%
poly 2 6.134 4.018 5.226 4.933 4.245 5.457 4.987 1.807 5.841 5.993 4.587 7.173 1.54% 0.99%
mlog 1 6.098 4.020 5.206 4.924 4.235 5.257 4.928 1.796 5.870 6.016 4.577 7.049 0.86% 0.60%
mlog 2 6.118 4.023 5.220 4.928 4.246 5.419 4.976 1.803 5.850 6.001 4.584 7.158 1.42% 0.86%
N-PIT 6.054 4.004 5.162 4.890 4.188 5.205 4.847 1.826 5.854 5.993 4.553 7.129 0.45% 1.47%
t-PIT 6.154 4.089 5.197 4.961 4.264 5.264 4.901 1.868 5.824 5.984 4.611 6.991 1.37% 1.13%

Table II, lower panel (ANN)
VST         BELPEX.BE  EPEX.CH    EPEX.DE+AT EPEX.FR    EXAA.DE+AT NP.DK1     NP.DK2     NP.SYS     OMIE.ES    OMIE.PT    OTE.CZ     GEFCom2014 m.p.d.f.b.(MAE)  m.p.d.f.b.(RMSE)
original    6.385      4.027      5.447      5.044      4.374      5.716      5.191      1.830      6.042      6.198      4.698      8.144      3.50%            3.42%
3σ_1        6.203      3.982      5.305      4.970      4.334      5.314      4.984      1.817      6.383      6.559      4.689      9.320      3.97%            5.31%
3σ_2        6.252      3.989      5.329      5.029      4.339      5.444      5.036      1.794      6.065      6.231      4.708      8.287      2.36%            2.80%
3σlog_1     6.250      4.012      5.347      4.992      4.350      5.353      5.036      1.803      6.073      6.211      4.710      7.974      1.92%            2.09%
3σlog_2     6.288      4.011      5.372      5.013      4.357      5.474      5.101      1.818      6.061      6.231      4.709      8.047      2.52%            2.31%
logistic_1  6.205      4.046      5.382      4.998      4.372      5.296      4.972      1.830      6.346      6.502      4.735      8.346      3.21%            5.13%
logistic_2  6.205      4.022      5.338      4.999      4.346      5.463      5.001      1.810      6.287      6.441      4.715      8.322      3.04%            4.30%
asinh_1     6.192      4.011      5.331      4.974      4.310      5.253      4.936      1.775      6.092      6.209      4.672      7.563      0.74%            1.08%
asinh_2     6.189      4.012      5.343      4.956      4.313      5.384      4.989      1.781      6.086      6.214      4.659      7.808      1.31%            1.45%
boxcox_1    6.256      4.008      5.382      5.006      4.320      5.360      5.042      1.785      6.007      6.184      4.697      7.628      1.34%            1.37%
boxcox_2    6.268      4.021      5.400      5.007      4.335      5.526      5.077      1.802      6.033      6.211      4.691      7.783      2.08%            1.98%
poly_1      6.261      4.002      5.355      4.996      4.314      5.341      5.007      1.785      6.019      6.165      4.690      7.665      1.19%            1.37%
poly_2      6.259      4.018      5.378      5.000      4.331      5.485      5.080      1.797      6.035      6.192      4.691      7.799      1.91%            1.68%
mlog_1      6.227      4.013      5.368      4.990      4.332      5.322      4.998      1.784      6.037      6.158      4.693      7.701      1.22%            1.41%
mlog_2      6.240      4.007      5.384      5.009      4.324      5.475      5.062      1.796      6.055      6.171      4.681      7.838      1.86%            1.85%
N-PIT       6.219      4.003      5.308      4.994      4.332      5.264      4.910      1.789      5.989      6.110      4.680      7.401      0.39%            0.36%
t-PIT       6.292      4.071      5.355      5.042      4.395      5.317      4.958      1.834      5.952      6.106      4.721      7.384      1.28%            1.22%
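
The m.p.d.f.b. columns of Table II follow Eqn. (17). A minimal sketch of that computation is given below, assuming the per-dataset errors (MAE or RMSE) are already available. The function name `mpdfb`, the dictionary layout and the toy numbers are illustrative assumptions and are not taken from the study.

```python
def mpdfb(errors):
    """Mean percentage deviation from the best VST, in the spirit of Eqn. (17).

    errors: dict mapping a VST name to a list of error values (MAE or RMSE),
            one per dataset, with all lists in the same dataset order.
    Returns a dict mapping each VST name to its m.p.d.f.b. in percent.
    """
    names = list(errors)
    n_datasets = len(errors[names[0]])
    # Smallest error achieved by any VST on each dataset (the 'optimal VST').
    best = [min(errors[v][j] for v in names) for j in range(n_datasets)]
    return {
        v: 100.0 / n_datasets
        * sum(abs(errors[v][j] - best[j]) / best[j] for j in range(n_datasets))
        for v in names
    }


# Hypothetical errors of three VSTs on two datasets (not values from Table II).
toy_errors = {
    "vst_a": [2.0, 5.0],
    "vst_b": [2.2, 4.0],
    "vst_c": [3.0, 4.4],
}
scores = mpdfb(toy_errors)
```

A VST that is best on every dataset would score 0%. In the toy example no VST is best everywhere, so all three scores are positive, which is also the situation in Table II.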

particularly interesting: the MAE-based one is only 0.24% for the normal price range, clearly lower than the overall value of 0.45% reported in Table II. The performance in the negative and positive spike regimes is quite poor, e.g., for the MAE respectively 42.35% and 14.28%. Thus, all the improvement from using the N-PIT comes from the excellent forecasting accuracy in the normal price range. In contrast, the untransformed (or original) prices lead to a very low accuracy in the normal price range, but the best forecasting performance in the positive spike regime: an m.p.d.f.b. of less than 1%, both for the MAE and the RMSE. This shows that the variance stabilization of the VSTs helps mainly in the most important (common) normal price range, but may decrease the performance for the extreme prices. Still, a VST can help to improve the forecasting accuracy in a spike regime as well, e.g., logistic_1 has clearly the best performance of all considered VSTs in the negative spike regime. Finally, note that the two transformations introduced here, mlog and poly, yield quite robust results across all three regimes. This, together with their good performance overall (especially of mlog_1 and poly_1, as reported in Table II), allows us to suggest them as universal VSTs, slightly worse than N-PIT in the normal price range, but more robust to outliers.
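
The regime-wise evaluation described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `price_regimes` and `mae_by_regime` and the synthetic price series are hypothetical, and µ and σ are taken as the sample mean and sample standard deviation over the whole series, as in the 3σ-rule above.

```python
from statistics import mean, stdev


def price_regimes(prices):
    """Label each price as negative spike, normal range or positive spike (3-sigma rule)."""
    mu = mean(prices)
    sigma = stdev(prices)  # sample standard deviation over the whole period
    lower, upper = mu - 3 * sigma, mu + 3 * sigma
    labels = []
    for p in prices:
        if p < lower:
            labels.append("negative spike")
        elif p > upper:
            labels.append("positive spike")
        else:
            labels.append("normal range")
    return labels


def mae_by_regime(prices, forecasts):
    """Mean absolute error computed separately for each regime that occurs."""
    totals = {}
    for label, p, f in zip(price_regimes(prices), prices, forecasts):
        abs_sum, count = totals.get(label, (0.0, 0))
        totals[label] = (abs_sum + abs(p - f), count + 1)
    return {label: abs_sum / count for label, (abs_sum, count) in totals.items()}


# Synthetic example: 19 ordinary prices and one large positive spike.
prices = [30.0] * 19 + [300.0]
forecasts = [32.0] * 19 + [100.0]
regime_mae = mae_by_regime(prices, forecasts)
```

In this toy series the spike dominates the overall MAE although it is a single observation, which is why the regimes are evaluated separately.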

TABLE III
Aggregate m.p.d.f.b. error measures, see Eqn. (17), for the MAE and the RMSE for three price regimes, 'negative spikes', 'normal range' and 'positive spikes' (in columns), for the 16 VSTs defined in Section III-C (in rows). The subscripts in the VST names refer to the two normalization schemes: set1: (median, MAD) and set2: (mean, std), see Section III-A. A heat map is used to indicate better (→ green) and worse (→ red) performing VSTs. Upper panel: OLS; lower panel: ANN.

Upper panel (OLS)
            m.p.d.f.b. (MAE)                    m.p.d.f.b. (RMSE)
VST         Negative    Normal      Positive    Negative    Normal      Positive
            spike       range       spike       spike       range       spike
original    37.10%      5.84%       0.81%       28.55%      37.91%      0.37%
3σ_1        47.04%      1.64%       32.32%      35.81%      1.55%       19.94%
3σ_2        44.91%      1.22%       13.57%      34.59%      0.52%       9.83%
3σlog_1     38.75%      2.30%       8.66%       29.17%      3.59%       6.24%
3σlog_2     39.27%      2.31%       4.59%       29.29%      2.26%       4.26%
logistic_1  9.50%       6.37%       32.91%      6.76%       13.78%      21.43%
logistic_2  18.78%      4.44%       23.86%      12.65%      10.71%      15.68%
asinh_1     24.10%      1.89%       13.01%      17.95%      3.47%       7.93%
asinh_2     24.96%      1.86%       10.65%      18.83%      2.92%       6.69%
boxcox_1    32.15%      1.62%       4.59%       24.87%      5.47%       2.82%
boxcox_2    32.33%      2.18%       4.17%       24.97%      8.57%       2.63%
poly_1      32.53%      1.21%       5.12%       25.06%      2.34%       3.22%
poly_2      32.79%      1.87%       4.38%       25.22%      4.82%       2.84%
mlog_1      31.95%      1.10%       5.83%       24.59%      1.98%       3.64%
mlog_2      32.26%      1.73%       4.92%       24.80%      3.71%       3.18%
N-PIT       42.35%      0.24%       14.28%      31.93%      1.38%       8.91%
t-PIT       45.29%      1.51%       7.09%       33.70%      1.95%       3.97%

Lower panel (ANN)
            m.p.d.f.b. (MAE)                    m.p.d.f.b. (RMSE)
VST         Negative    Normal      Positive    Negative    Normal      Positive
            spike       range       spike       spike       range       spike
original    6.79%       3.35%       5.19%       4.78%       4.36%       5.19%
3σ_1        15.05%      2.01%       16.23%      10.72%      1.45%       16.23%
3σ_2        12.32%      1.78%       8.58%       9.51%       1.93%       8.58%
3σlog_1     6.38%       1.59%       4.99%       4.96%       1.71%       4.99%
3σlog_2     7.11%       2.71%       1.95%       5.16%       4.36%       1.95%
logistic_1  9.36%       2.22%       13.99%      8.36%       3.37%       13.99%
logistic_2  10.59%      2.95%       10.91%      8.94%       5.80%       10.91%
asinh_1     7.30%       0.63%       6.06%       6.54%       1.31%       6.06%
asinh_2     8.60%       1.34%       4.45%       7.20%       2.83%       4.45%
boxcox_1    7.40%       1.33%       3.63%       4.84%       2.12%       3.63%
boxcox_2    7.21%       2.34%       2.80%       5.25%       4.82%       2.80%
poly_1      7.86%       1.11%       3.87%       6.18%       1.91%       3.87%
poly_2      6.28%       2.22%       2.83%       5.09%       4.26%       2.83%
mlog_1      7.92%       1.16%       3.89%       5.95%       1.98%       3.89%
mlog_2      7.82%       2.08%       2.28%       6.19%       4.29%       2.28%
N-PIT       5.97%       0.41%       5.43%       4.22%       0.84%       5.43%
t-PIT       4.56%       1.38%       3.37%       3.00%       2.10%       3.37%

B. Diebold-Mariano tests
The MAE values analyzed in Section IV-A can be used to provide a ranking of transformations, but not statistically significant conclusions on the outperformance of the forecasts of one transformation by those of another. Therefore, we also computed the Diebold-Mariano (DM) test [22], which takes the correlation structure into account. It tests forecasts of each pair of transformations against each other.

In the EPF literature, the DM test is usually performed separately for each of the 24 hours of the day [1]. However, Ziel and Weron [21] recently introduced a different approach, where only one statistic for each pair of models (here: VSTs) is computed based on the 24-dimensional vector of errors for each day, and called it the multivariate or vectorized DM test. Following [21], denote by $\hat{\varepsilon}_{X,d} = (\hat{\varepsilon}_{X,d,1}, \ldots, \hat{\varepsilon}_{X,d,24})'$ and $\hat{\varepsilon}_{Y,d} = (\hat{\varepsilon}_{Y,d,1}, \ldots, \hat{\varepsilon}_{Y,d,24})'$ the vectors of out-of-sample errors for day d of VSTs X and Y, respectively. Then the multivariate loss differential series:

$$\Delta_{X,Y,d} = \|\hat{\varepsilon}_{X,d}\|_p - \|\hat{\varepsilon}_{Y,d}\|_p, \tag{18}$$

defines the differences of errors in the $\|\cdot\|_p$-norm, i.e., $\|\hat{\varepsilon}_{X,d}\|_p = \left(\sum_{h=1}^{24} |\hat{\varepsilon}_{X,d,h}|^p\right)^{1/p}$ for p = 1, 2. For each model pair and each dataset we compute the p-value of two one-sided DM tests: (i) a test with the null hypothesis $H_0: E(\Delta_{X,Y,d}) \le 0$, i.e., the outperformance of the forecasts of Y by those of X, and (ii) the complementary test with the reverse null $H_0^R: E(\Delta_{X,Y,d}) \ge 0$, i.e., the outperformance of the forecasts of X by those of Y. As in the standard DM test, we assume that the loss differential series is covariance stationary.

To jointly evaluate the performance of the VSTs across all 11 European markets we introduce yet another variant of the DM test, in which the norm is computed not only for all hours but all datasets as well. Note that we exclude GEFCom2014 from this analysis due to a much shorter test period. The computations are analogous, only this time the multivariate loss differential series across all 11 European markets, i.e., M_1 = BELPEX.BE, ..., M_11 = OTE.CZ, is given by:

$$\Delta_{X,Y,d,M} = \|\hat{\varepsilon}_{X,d}\|_p - \|\hat{\varepsilon}_{Y,d}\|_p, \tag{19}$$

where now $\|\hat{\varepsilon}_{X,d}\|_p = \left(\sum_{j=1}^{11} \sum_{h=1}^{24} |\hat{\varepsilon}_{X,d,h,M_j}|^p\right)^{1/p}$ and $\hat{\varepsilon}_{X,d} = (\hat{\varepsilon}_{X,d,1,M_1}, \ldots, \hat{\varepsilon}_{X,d,24,M_{11}})'$ is the vector of out-of-sample errors for all hours and all markets on day d.

In Figure 4 we plot the results for the multivariate DM-test for each market using the $\|\cdot\|_1$-norm, i.e., for p = 1 in Eqn. (18), while in Figure 5 the aggregated DM-test for both norms, i.e., for p = 1 and 2 in Eqn. (19). In both figures we see the corresponding p-values of the conducted pairwise comparisons. Green and yellow squares indicate statistical significance at the 5% level, with the darkest green corresponding to close to zero p-values. Red squares indicate weak significance with a p-value between 5% and 10%, while black denotes no significance, i.e., a p-value of 10% or more. For instance, we see in Fig. 4 for BELPEX.BE that the first row is dark green, so that every transformation significantly improved the forecasting accuracy compared to the original untransformed prices. Similarly for EXAA.DE+AT, the column which corresponds to the N-PIT is dark green, meaning

[Figure 4: twelve heat-map panels of pairwise DM-test p-values, one per dataset, each titled "$\|\cdot\|_1$: over 24h, <dataset>, OLS", for BELPEX.BE, EPEX.CH, EPEX.DE+AT, EPEX.FR, EXAA.DE+AT, NP.DK1, NP.DK2, NP.SYS, OMIE.ES, OMIE.PT, OTE.CZ and GEFCom. Both axes list original, 3σ_1, 3σ_2, 3σlog_1, 3σlog_2, logistic_1, logistic_2, asinh_1, asinh_2, boxcox_1, boxcox_2, poly_1, poly_2, mlog_1, mlog_2, N-PIT and t-PIT; the color bar runs from 0 to 0.1.]

Fig. 4. Results of the 'multivariate' DM test defined by the multivariate loss differential series in Eqn. (18) with p = 1, i.e., in the $\|\cdot\|_1$-norm, for all 12 datasets. Like in Figure 5, we use a heat map to indicate the range of the p-values: the closer they are to zero (→ dark green), the more significant is the difference between the forecasts of a model on the X-axis (better) and the forecasts of a model on the Y-axis (worse).
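
The per-dataset test behind each panel of Fig. 4 can be sketched as below. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the p-values rely on the asymptotic normal approximation, and the variance of the loss differential is estimated with the plain sample variance (no autocorrelation correction), which is an assumption since the text does not state the estimator.

```python
import math


def p_norm(errors, p):
    """p-norm of one day's vector of hourly forecast errors."""
    return sum(abs(e) ** p for e in errors) ** (1.0 / p)


def loss_differential(err_x, err_y, p=1):
    """Eqn. (18): daily difference of error norms of VST X and VST Y.

    err_x, err_y: lists of daily error vectors (24 hourly errors per day).
    """
    return [p_norm(ex, p) - p_norm(ey, p) for ex, ey in zip(err_x, err_y)]


def dm_pvalues(delta):
    """DM statistic and the p-values of the two one-sided tests.

    Returns (stat, p_h0, p_h0_reverse), where p_h0 refers to
    H0: E(delta) <= 0 and p_h0_reverse to H0R: E(delta) >= 0.
    Assumes delta is not constant (non-zero sample variance).
    """
    n = len(delta)
    mean = sum(delta) / n
    var = sum((d - mean) ** 2 for d in delta) / (n - 1)
    stat = mean / math.sqrt(var / n)
    upper_tail = 0.5 * math.erfc(stat / math.sqrt(2.0))  # P(Z >= stat), Z ~ N(0, 1)
    return stat, upper_tail, 1.0 - upper_tail


# Synthetic errors for 50 days: X is consistently less accurate than Y.
err_x = [[2.0 + 0.1 * (d % 5)] * 24 for d in range(50)]
err_y = [[1.0] * 24 for _ in range(50)]
delta = loss_differential(err_x, err_y, p=1)
stat, p_h0, p_h0_reverse = dm_pvalues(delta)
```

Here a small `p_h0` rejects the hypothesis that X is at least as accurate as Y, so in the convention of Fig. 4 the forecasts of Y would be marked as significantly better than those of X.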

[Figures 5 and 6: heat-map panels of pairwise DM-test p-values across the 11 European datasets, titled "$\|\cdot\|_1$: over 24h, 11 datasets, OLS" and "$\|\cdot\|_2$: over 24h, 11 datasets, OLS" (Fig. 5) and "$\|\cdot\|_1$: over 24h, 11 datasets, ANN" and "$\|\cdot\|_2$: over 24h, 11 datasets, ANN" (Fig. 6). Both axes list the same 17 variants as in Fig. 4, from original to t-PIT; the color bar runs from 0 to 0.1.]

Fig. 5. Results of the 'multivariate' DM test defined by the multivariate loss differential series in Eqn. (19) with p = 1 (top) and p = 2 (bottom) for the regression models and across all 11 European datasets. We use a heat map to indicate the range of the p-values: the closer they are to zero (→ dark green), the more significant is the difference between the forecasts of a model on the X-axis (better) and the forecasts of a model on the Y-axis (worse).

Fig. 6. Results of the 'multivariate' DM test defined by the multivariate loss differential series in Eqn. (19) with p = 1 (top) and p = 2 (bottom) for the neural-net models and across all 11 European datasets. We use a heat map to indicate the range of the p-values: the closer they are to zero (→ dark green), the more significant is the difference between the forecasts of a model on the X-axis (better) and the forecasts of a model on the Y-axis (worse).
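
For the aggregated test of Eqn. (19), the only change is that the norm runs over all hours and all markets of a day. The sketch below illustrates this pooling; the function names and the nested-list layout `errors[d][j][h]` (day, market, hour) are assumptions made for the example. It also shows a simple property: for p = 1 the pooled norm equals the sum of the per-market norms, whereas for p = 2 it does not.

```python
def pooled_norm(errors_by_market, p):
    """p-norm over all hours and all markets of a single day, as in Eqn. (19).

    errors_by_market: list over markets, each a list of 24 hourly errors.
    """
    return sum(abs(e) ** p for market in errors_by_market for e in market) ** (1.0 / p)


def pooled_loss_differential(err_x, err_y, p=1):
    """Daily pooled loss differential; err_x[d][j][h] is the error of VST X
    on day d, market j and hour h (same layout for err_y)."""
    return [pooled_norm(dx, p) - pooled_norm(dy, p) for dx, dy in zip(err_x, err_y)]


# One synthetic day with two markets: X errs by 3 and 4 per hour, Y is exact.
err_x = [[[3.0] * 24, [4.0] * 24]]
err_y = [[[0.0] * 24, [0.0] * 24]]

delta_p1 = pooled_loss_differential(err_x, err_y, p=1)
delta_p2 = pooled_loss_differential(err_x, err_y, p=2)

# Per-market 2-norms of X's errors, for comparison with the pooled 2-norm.
per_market_p2 = [pooled_norm([market], 2) for market in err_x[0]]
```

Because the squared errors are pooled before the root is taken, the p = 2 variant gives relatively more weight to days and markets with large errors, which is consistent with treating it as the more spike-sensitive of the two norms.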

that N-PIT leads to significantly better forecasts than all other transformations under consideration.

Regarding the aggregated DM-tests in Fig. 5, we see that all transformations lead to significantly better predictions than the original data, except for logistic. This result holds for the $\|\cdot\|_1$- and the $\|\cdot\|_2$-norm, even though the latter tends to return higher p-values. For the $\|\cdot\|_1$-norm the N-PIT transformation leads to significantly better forecasts than all other options, which emphasizes the findings from Table II. But for the $\|\cdot\|_2$-norm the results are not that clear-cut. Still, poly_1 and mlog_1 lead to forecasts that outperform most of the competitors, except each other, 3σ_2 and N-PIT. Having this in mind, the N-PIT seems to be the overall best performer. It leads to significantly better predictions than all other transformations within the robust evaluation framework with respect to the $\|\cdot\|_1$-norm and not significantly worse than the best transformations in the $\|\cdot\|_2$-norm framework. However, if the evaluation focus is on spike detection then we suggest using mlog_1, as it has a very similar performance to poly_1 but requires only one shape parameter (i.e., c).

V. CONCLUSIONS

We have conducted an extensive EPF study across 12 major markets to evaluate different variance stabilizing transformations. In line with the guidelines set forth in [1], we have used two variants of the Diebold-Mariano (DM) test to formally assess the statistical significance of the forecasting performance. The obtained results suggest that the choice of the optimal transformation depends on the forecasting framework and the considered dataset.

However, while for individual markets specifically tailored transformations can yield better results, due to the increasing demand for joint modeling of multiple markets, robust cross-market transformations may turn out to be very useful. In particular, the probability integral transform-based N-PIT yields very robust results and promising forecasting accuracy in terms of MAE. It leads to forecasts that significantly outperform all other competitors across all markets, according to the $\|\cdot\|_1$-norm based DM test. However, if the forecasting focus is on spike sensitive measures (like the RMSE), then the newly introduced poly and mlog transforms tend to perform better. Given that the mlog requires only one shape parameter (c), while poly needs two (λ, c), we suggest using the former in such a context.

Our study can be further expanded in several directions. In particular, we report results for only one expert, regression-based model. Although we have also considered several other expert and autoregressive models from [21], [23] and the results were qualitatively the same, we can only conjecture that our conclusions will hold for more complex models, like parameter rich structures estimated via the LASSO [21], [23],
[27] or well performing neural network feature selection algorithms [36]. Furthermore, we have focused on the point forecasting framework and ignored the full predictive distribution of electricity prices (or its transformed version). As it holds in general that the back-transformed mean price $f^{-1}(E(Y_{d,h}))$ is not equal to $E(f^{-1}(Y_{d,h})) = E(P_{d,h})$, some subtle mathematical considerations are required to solve this problem properly, especially when it comes to probabilistic forecasting. This, however, is left for future research. Finally, in this study we have restricted ourselves to symmetric transformations. Future research could elaborate on asymmetric functions, which may yield an even better forecasting performance.

REFERENCES

[1] R. Weron, “Electricity price forecasting: A review of the state-of-the-art with a look into the future,” International Journal of Forecasting, vol. 30, no. 4, pp. 1030–1081, 2014.
[2] D. Chen and D. Bunn, “The forecasting performance of a finite mixture regime-switching model for daily electricity prices,” Journal of Forecasting, vol. 33, no. 5, pp. 364–375, 2014.
[3] H. Zareipour, C. A. Canizares, and K. Bhattacharya, “Economic impact of electricity market price forecasting errors: A demand-side analysis,” IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 254–262, 2010.
[4] T. Hong, “Crystal ball lessons in predictive analytics,” EnergyBiz, Spring, pp. 35–37, 2015.
[5] J. H. Zhao, Z. Y. Dong, X. Li, and K. P. Wong, “A framework for electricity price spike analysis with advanced data mining methods,” IEEE Transactions on Power Systems, vol. 22, no. 1, pp. 376–385, 2007.
[6] K. De Vos, “Negative wholesale electricity prices in the German, French and Belgian day-ahead, intra-day and real-time markets,” The Electricity Journal, vol. 28, no. 4, pp. 36–50, 2015.
[7] L. Hagfors, H. Kamperud, F. Paraschiv, M. Prokopczuk, A. Sator, and S. Westgaard, “Prediction of extreme price occurrences in the German day-ahead electricity market,” Quantitative Finance, vol. 16, no. 12, pp. 1929–1948, 2016.
[8] P. J. Huber and E. M. Ronchetti, Robust Statistics, 2nd ed. Wiley, 2009.
[9] R. Weron and A. Misiorek, “Forecasting spot electricity prices: A comparison of parametric and semiparametric time series models,” International Journal of Forecasting, vol. 24, pp. 744–763, 2008.
[10] L. Wu and M. Shahidehpour, “A hybrid model for day-ahead price forecasting,” IEEE Transactions on Power Systems, vol. 25, no. 3, pp. 1519–1530, 2010.
[11] V. Gonzalez, J. Contreras, and D. Bunn, “Forecasting power prices using a hybrid fundamental-econometric model,” IEEE Transactions on Power Systems, vol. 27, no. 1, pp. 363–372, 2012.
[12] P. Dev and M. Martin, “Using neural networks and extreme value distributions to model electricity pool prices: Evidence from the Australian National Electricity Market 1998-2013,” Energy Conversion and Management, vol. 84, pp. 122–132, 2014.
[13] J. Janczura, “Pricing electricity derivatives within a Markov regime-switching model: A risk premium approach,” Mathematical Methods of Operations Research, vol. 79, no. 1, pp. 1–30, 2014.
[14] M. Bessec, J. Fouquau, and S. Meritet, “Forecasting electricity spot prices using time-series models with a double temporal segmentation,” Applied Economics, vol. 48, no. 5, pp. 361–378, 2016.
[15] T. Hong, P. Pinson, S. Fan, H. Zareipour, A. Troccoli, and R. J. Hyndman, “Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond,” International Journal of Forecasting, vol. 32, no. 3, pp. 896–913, 2016.
[16] M. Shahidehpour, H. Yamin, and Z. Li, Market Operations in Electric Power Systems: Forecasting, Scheduling, and Risk Management. Wiley, 2002.
[17] J. Contreras, R. Espínola, F. Nogales, and A. Conejo, “ARIMA models to predict next-day electricity prices,” IEEE Transactions on Power Systems, vol. 18, no. 3, pp. 1014–1020, 2003.
[18] M. Bierbrauer, C. Menn, S. T. Rachev, and S. Trück, “Spot and derivative pricing in the EEX power market,” Journal of Banking & Finance, vol. 31, pp. 3462–3485, 2007.
[19] J. Janczura, S. Trück, R. Weron, and R. Wolff, “Identifying spikes and seasonal components in electricity spot price data: A guide to robust modeling,” Energy Economics, vol. 38, pp. 96–110, 2013.
[20] S. Schneider, “Power spot price models with negative prices,” Journal of Energy Markets, vol. 4, no. 4, pp. 77–102, 2011.
[21] F. Ziel and R. Weron, “Day-ahead electricity price forecasting with high-dimensional structures: Univariate vs. multivariate models,” Energy Economics, 2017, submitted.
[22] F. X. Diebold and R. S. Mariano, “Comparing predictive accuracy,” Journal of Business and Economic Statistics, vol. 13, pp. 253–263, 1995.
[23] B. Uniejewski, J. Nowotarski, and R. Weron, “Automated variable selection and shrinkage for day-ahead electricity price forecasting,” Energies, vol. 9, no. 8, p. 621, 2016.
[24] M. Burger, B. Graeber, and G. Schindlmayr, Managing energy risk: An integrated view on power and other energy markets. Wiley, 2007.
[25] J. S. Armstrong, Ed., Principles of Forecasting: A handbook for researchers and practitioners. Kluwer, 2001.
[26] N. Karakatsani and D. W. Bunn, “Forecasting electricity prices: The impact of fundamentals and time-varying coefficients,” International Journal of Forecasting, vol. 24, pp. 764–785, 2008.
[27] F. Ziel, “Forecasting electricity spot prices using LASSO: On capturing the autoregressive intraday structure,” IEEE Transactions on Power Systems, vol. 31, no. 6, pp. 4977–4987, 2016.
[28] N. Hamdy, Applied Signal Processing: Concepts, Circuits, and Systems. CRC Press, Boca Raton, 2008.
[29] F. Günther and S. Fritsch, “neuralnet: Training of neural networks,” The R journal, vol. 2, no. 1, pp. 30–38, 2010.
[30] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and practice. Online at http://otexts.org/fpp/, 2013.
[31] R. Sakia, “The Box-Cox transformation technique: A review,” The Statistician, vol. 41, no. 2, pp. 169–178, 1992.
[32] T. Gneiting, F. Balabdaoui, and A. Raftery, “Probabilistic forecasts, calibration and sharpness,” Journal of the Royal Statistical Society B, vol. 69, pp. 243–268, 2007.
[33] A. Der Kiureghian and P.-L. Liu, “Structural reliability under incomplete probability information,” Journal of Engineering Mechanics, vol. 112, no. 1, pp. 85–104, 1986.
[34] G. Diaz and E. Planas, “A note on the normalization of Spanish electricity spot prices,” IEEE Transactions on Power Systems, vol. 31, no. 3, pp. 2499–2500, 2016.
[35] J. Nowotarski, E. Raviv, S. Trück, and R. Weron, “An empirical comparison of alternate schemes for combining electricity spot price forecasts,” Energy Economics, vol. 46, pp. 395–412, 2014.
[36] O. Abedinia, N. Amjady, and H. Zareipour, “A new feature selection technique for load and price forecast of electrical power systems,” IEEE Transactions on Power Systems, vol. 32, no. 1, pp. 62–74, 2017.
