
Uncertainty
Concepts, modelling and verification

Pierre Pinson

Technical University of Denmark


DTU Electrical Engineering - Centre for Electric Power and Energy
mail: ppin@elektro.dtu.dk - webpage: www.pierrepinson.com

DTU Summer School 2017 - EES-UETP course, Kgs Lyngby, Denmark – 12 June 2017

(with ackn. to all funding sources, data providers, collaborators and students, for material and ideas)
1 / 67
Outline

Uncertainty and forecasting


input to decision-making
origins of uncertainty: weather forecasts, power curves, etc.
basic characteristics

From deterministic to probabilistic forecasts


what a deterministic forecast really is...
illustration of forecast types: point, quantile, intervals, densities, trajectories

Verifying probabilistic forecasts

Parametric probabilistic forecasts


wind energy example
wave energy example

Nonparametric probabilistic forecasts


“dressing” of point forecasts
ensemble-based approach

2 / 67
1 Uncertainty and forecasting

3 / 67
Why uncertainty?

Forecast information is widely used as input to several decision-making problems:

definition of reserve requirements


unit commitment and economic dispatch
operation of combined wind-hydro
operation of wind associated with storage
design of optimal trading strategies
electricity market design
etc.

Inputs for these methods are:

deterministic forecasts
probabilistic forecasts as quantiles, intervals, and predictive distributions
probabilistic forecasts in the form of trajectories (/scenarios)
risk indices (broad audience applications)

For nearly all of these problems, optimal decisions can only be obtained if forecast
uncertainty is fully accounted for...

4 / 67
Contribution to forecast uncertainty/error

5 / 67
Numerical Weather Prediction

Future values of meteorological


variables (wind, temperature, etc.)
on a grid

Temporal/spatial resolution,
domain, forecast update and
forecast length vary depending
upon the NWP system

Large number of alternative systems today (global, mesoscale, etc.), providing free or commercially available output

Origins of uncertainty in NWPs: initial state, model/physics, numerical aspects (filtering)

6 / 67
The manufacturer power curve

Power curve of the Vestas V44 turbine (600 kW)

Klim wind farm (North of Jutland, Denmark): 35 V44 turbines


Nominal capacity: 21 MW
Easy scaling of the power curve!
7 / 67
The actual power curve looks different!

Origins of uncertainty in the conversion process:

actual meteorological conditions seen by the turbines, aggregation of individual power curves, non-ideal power curves, etc.

8 / 67
Shaping forecast uncertainty

courtesy of Matthias Lange

9 / 67
2 From deterministic to probabilistic forecasts

10 / 67
Point forecasts: what they really are...

Forecasts are commonly provided in the form of point forecasts (i.e., the conditional
expectation for each look-ahead time)

Several studies have established their level of accuracy, which depends on various factors (see other talks, and extensive analysis in, e.g., the EU project Anemos)
11 / 67
For illustration: the Western Denmark dataset

Agg. zone   Orig. zones      % of capacity
1           1, 2, 3          31
2           5, 6, 7          18
3           4, 8, 9          17
4           10, 11, 14, 15   23
5           12, 13           10

Figure: The Western Denmark dataset: original locations for which measurements are available, 15 control zones defined by
Energinet, as well as the 5 aggregated zones, for a nominal capacity of around 2.5 GW.

12 / 67
Point forecast: definition

A point forecast informs of the conditional expectation of power generation

Mathematically:

\hat{y}_{t+k|t} = \mathbb{E}\left[ y_{t+k} \,\middle|\, \Omega, M, \hat{\theta} \right]

given, at time t:
the information set Ω
a model M
its estimated parameters θ̂

(Ω, M, θ̂ omitted in other definitions)

13 / 67
Point forecasting

Figure: Example episode with point forecasts for the 5 aggregated zones of Western Denmark, as issued on
16 March 2007 at 06 UTC, along with corresponding power measurements, obtained a posteriori.
14 / 67
Quantile forecast: definition

A quantile forecast is to be seen as a probabilistic threshold for power generation

Mathematically:

\hat{q}^{(\alpha)}_{t+k|t} = \hat{F}^{-1}_{t+k|t}(\alpha)

with
α: the nominal level (ex: 0.5 for 50%)
F̂: (predicted) cumulative distribution function for Y_{t+k}
Y_{t+k}: the random variable “power generation at time t + k”
15 / 67
Prediction interval: definition

A prediction interval is an interval within which power generation may lie, with a
certain probability

Mathematically:

\hat{I}^{(\beta)}_{t+k|t} = \left[ \hat{q}^{(\underline{\alpha})}_{t+k|t},\, \hat{q}^{(\overline{\alpha})}_{t+k|t} \right]

with
β: nominal coverage rate (ex: 0.9 for 90%)
\hat{q}^{(\underline{\alpha})}_{t+k|t}, \hat{q}^{(\overline{\alpha})}_{t+k|t}: interval bounds
\underline{\alpha}, \overline{\alpha}: nominal levels of the quantile forecasts defining the bounds
16 / 67
Predictive densities: definition

A predictive density fully describes the probability distribution of power generation for every lead time

Mathematically:

Y_{t+k} \sim \hat{F}_{t+k|t}

with
F̂: (predicted) cumulative distribution function for Y_{t+k}
Y_{t+k}: the random variable “power generation at time t + k”
17 / 67
Predictive densities

Figure: Example episode with probabilistic forecasts for the 5 aggregated zones of Western Denmark, as issued on 16 March
2007 at 06UTC. They take the form of so-called river-of-blood fan charts, represented by a set of central prediction intervals with
increasing nominal coverage rates (from 10% to 90%).
18 / 67
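
To make these definitions concrete, here is a minimal Python sketch (not from the original slides) deriving a quantile forecast, a central prediction interval, and the set of central intervals behind such fan charts from a predictive density; the Gaussian shape and all numerical values are purely illustrative assumptions.

```python
# Minimal sketch: from a predictive density to quantile forecasts and central
# prediction intervals, assuming (for illustration only) a Gaussian predictive
# density with hypothetical location/scale values.
import numpy as np
from scipy.stats import norm

mu_hat, sigma_hat = 0.55, 0.12   # hypothetical predictive mean and std (p.u. of capacity)

# Quantile forecast with nominal level alpha: q_hat = F_hat^{-1}(alpha)
alpha = 0.5
q_hat = norm.ppf(alpha, mu_hat, sigma_hat)

# Central prediction interval with nominal coverage beta
beta = 0.9
lo = norm.ppf((1 - beta) / 2, mu_hat, sigma_hat)
up = norm.ppf(1 - (1 - beta) / 2, mu_hat, sigma_hat)

# Fan chart: central intervals for nominal coverage rates from 10% to 90%
for cov in np.arange(0.1, 1.0, 0.1):
    lo_c = norm.ppf((1 - cov) / 2, mu_hat, sigma_hat)
    up_c = norm.ppf(1 - (1 - cov) / 2, mu_hat, sigma_hat)
    print(f"{cov:.0%} interval: [{lo_c:.3f}, {up_c:.3f}]")
```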
The conditional importance of correlation

almost no temporal correlation vs. appropriate temporal correlation

19 / 67
Trajectories (/scenarios): definition

Trajectories are equally-likely samples of multivariate predictive densities for power generation (in time and/or space)

Mathematically:

z^{(j)}_{t} \sim \hat{F}_{t}

with
F̂: multivariate predictive cdf for Y_t
z^{(j)}_t: the j-th trajectory

20 / 67
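
The slides do not detail how such trajectories are generated; one common approach, sketched below under illustrative assumptions, is to sample from the marginal predictive densities through a Gaussian copula whose temporal correlation decays with the lag between lead times. The marginal means and standard deviations, the correlation length and the number of trajectories are all hypothetical.

```python
# Sketch (not necessarily the method behind the slides): equally-likely trajectories
# obtained by coupling Gaussian marginal predictive densities with a Gaussian copula
# whose temporal correlation decays exponentially. All numbers are hypothetical.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
K = 24                                     # number of lead times
mu = 0.5 + 0.2 * np.sin(np.arange(K) / 4)  # hypothetical predictive means
sigma = np.full(K, 0.1)                    # hypothetical predictive std devs

# Exponential temporal correlation between lead times k and k'
tau = 6.0
corr = np.exp(-np.abs(np.subtract.outer(np.arange(K), np.arange(K))) / tau)

# Draw correlated uniforms through the Gaussian copula, then map through the
# marginal predictive cdfs (inverse transform) to obtain trajectories
n_traj = 10
z = rng.multivariate_normal(np.zeros(K), corr, size=n_traj)
u = norm.cdf(z)
trajectories = norm.ppf(u, loc=mu, scale=sigma)   # shape (n_traj, K)

# With corr replaced by the identity matrix, the same marginals would yield
# trajectories with almost no temporal correlation, as in the comparison shown earlier.
```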
Space-time trajectories (/scenarios)

21 / 67
3 Verification of probabilistic forecasts

22 / 67
Properties of probabilistic forecasts

How do you want your forecasts?

Reliable? (also referred to as “calibrated”)

Sharp? (i.e., informative)

Skilled? (all-round performance, and of higher quality than some benchmark)

Of high resolution? (i.e., resolving among situations with various uncertainty levels)

23 / 67
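
As a rough illustration of what "reliable" and "sharp" mean in practice, the following sketch (not from the slides) checks the empirical coverage and average width of 90% central prediction intervals on synthetic data; the Gaussian forecasts and their parameters are assumptions for the example only.

```python
# Sketch: empirical checks of reliability (coverage close to nominal) and
# sharpness (narrow intervals) for 90% central prediction intervals, using
# synthetic Gaussian forecasts and observations for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000
mu = rng.uniform(0.2, 0.8, n)        # hypothetical predictive means
sigma = rng.uniform(0.05, 0.15, n)   # hypothetical predictive std devs
y = rng.normal(mu, sigma)            # observations consistent with the forecasts

beta = 0.9
lo = norm.ppf((1 - beta) / 2, mu, sigma)
up = norm.ppf(1 - (1 - beta) / 2, mu, sigma)

coverage = np.mean((y >= lo) & (y <= up))   # reliability: should be close to beta
width = np.mean(up - lo)                    # sharpness: the smaller, the better
print(f"empirical coverage: {coverage:.3f} (nominal {beta}), mean width: {width:.3f}")
```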
4 Parametric approaches to probabilistic forecasting

24 / 67
The wave energy example

Predicting wave energy around North America... (at 13 buoys)

25 / 67
Data characteristics

Table: Characteristics of the buoys and of the corresponding hourly measurements.

Id. Dates Depth [m] Location (Latitude, Longitude) Operator

41025 1.12.2007-31.03.2010 68 Diamond Shoals, NC (35.001 N, 75.40 W) NDBC
41048 1.12.2007-31.12.2009 5261 West Bermuda (31.98 N, 69.65 W) NDBC
42001 1.12.2007-31.03.2010 3246 Mid-Gulf, Louisiana (25.90 N, 89.67 W) NDBC
42036 1.12.2007-31.03.2010 54 West Tampa, FL (28.50 N, 85.52 W) NDBC
42056 1.12.2007-31.03.2010 4446 Yucatan Basin (19.874 N, 85.06 W) NDBC
44014 1.12.2007-31.03.2010 48 Virginia Beach, VA (36.61 N, 74.84 W) NDBC
46001 1.12.2007-31.03.2010 4206 Gulf of Alaska, AK (56.30 N, 148.02 W) NDBC
46041 1.12.2007-31.03.2010 132 Cape Elizabeth, WA (47.35 N, 124.73 W) NDBC
46047 1.12.2007-31.03.2010 1393 Tanner Banks, CA (34.43 N, 119.533 W) NDBC
46083 1.12.2007-31.03.2010 136 Yakutak, AK (52.24 N, 137.99 W) NDBC
46132 1.12.2007-31.03.2010 2040 South Brooks, BC (49.74 N, 127.93 W) ISDM
46207 1.12.2007-31.03.2010 2125 East Dellwood, BC (50.88 N, 129.92 W) ISDM
51001 21.02.2008-24.12.2009 3430 Kauai, HI (23.45 N, 162.28 W) NDBC

ECMWF WAM forecasts are available for the same period:


issued twice a day (00UTC and 12UTC)
temporal resolution of 3 hours
here, forecast length of 48 hours

26 / 67
Welcome to Kauai

This is also where they filmed part of the Jurassic Park movies...!

27 / 67
Predicting the wave energy flux

Wave energy flux is defined as:

Y_t = \frac{g^2 \rho}{64\pi}\, H_t^2\, T_t, \qquad Y_t \geq 0

where
g is the gravitational acceleration constant
ρ is the density of seawater
H_t and T_t denote the significant wave height and mean wave period

Data analysis showed that probabilistic forecasts could be generated based on a log-Normal assumption:

Y_{t+k} \sim \ln\mathcal{N}(\hat{\mu}_{t+k|t}, \hat{\sigma}^2_{t+k|t})

where μ̂_{t+k|t} and σ̂_{t+k|t} are forecasts of the location and scale parameters.
Note that this is equivalent to \ln(Y_{t+k}) \sim \mathcal{N}(\hat{\mu}_{t+k|t}, \hat{\sigma}^2_{t+k|t})

28 / 67
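
A small worked sketch of the two formulas above, with hypothetical wave conditions and forecast parameters; the values of g and ρ are indicative constants only, and the log-Normal parameters are made up for the example.

```python
# Worked sketch of the wave energy flux and a log-Normal predictive density,
# with hypothetical significant wave height, mean wave period and predicted
# location/scale parameters.
import numpy as np
from scipy.stats import norm

g, rho = 9.81, 1025.0          # gravitational acceleration [m/s^2], seawater density [kg/m^3]
H, T = 2.0, 8.0                # hypothetical significant wave height [m] and mean period [s]

flux = g**2 * rho / (64 * np.pi) * H**2 * T   # wave energy flux [W/m]

# Log-Normal predictive density: ln(Y) ~ N(mu_hat, sigma_hat^2)
mu_hat, sigma_hat = np.log(flux), 0.3          # hypothetical forecast parameters

# Quantiles of Y are obtained by exponentiating quantiles of ln(Y)
q05, q50, q95 = np.exp(norm.ppf([0.05, 0.5, 0.95], mu_hat, sigma_hat))
print(f"flux ≈ {flux/1000:.1f} kW/m, median {q50/1000:.1f}, "
      f"90% interval ≈ [{q05/1000:.1f}, {q95/1000:.1f}] kW/m")
```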
Dynamic models for location and scale
Define x = ln(y), the log of the wave energy flux

A simple model for the location parameter μ̂_{t+k|t} is

\hat{\mu}_{t+k|t} = \theta_t^\top z_t = \theta_0 + \theta_1 \tilde{x}_{t+k|t} + \theta_2 x_t + \ldots + \theta_{l+2} x_{t-l}

with
\theta_t = [\theta_0, \theta_1, \theta_2, \ldots, \theta_{l+2}]^\top
z_t = [1, \tilde{x}_{t+k|t}, x_t, \ldots, x_{t-l}]^\top

and
x_{t-i}: most recent measurements
x̃_{t+k|t}: raw forecasts from ECMWF WAM

A (very) simple model for the scale parameter σ̂_{t+k|t} is

\hat{\sigma}_{t+k|t} := \sqrt{\beta_t}

that is, a constant.
29 / 67
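
A possible (simplified) way to estimate such a model is sketched below: ordinary least squares on synthetic data, for k = 1 and l = 2, whereas an operational implementation would typically rely on adaptive/recursive estimation. The data, lag order and noise levels are illustrative assumptions.

```python
# Sketch: estimating the location-parameter model on the log wave energy flux,
# mu_hat = theta0 + theta1 * x_tilde + theta2 * x_t + ... + theta_{l+2} * x_{t-l},
# here by ordinary least squares on synthetic data (the slides would use an
# adaptive estimation scheme instead). k = 1 is assumed for simplicity.
import numpy as np

rng = np.random.default_rng(2)
n, l = 500, 2
x = rng.normal(9.0, 0.5, n + l + 1)           # synthetic log wave energy flux measurements
x_tilde = x[l + 1:] + rng.normal(0, 0.2, n)   # synthetic raw (WAM-type) forecasts of the target

# Regressors z_t = [1, x_tilde_{t+k|t}, x_t, ..., x_{t-l}]
Z = np.column_stack([np.ones(n), x_tilde] + [x[l - i : n + l - i] for i in range(l + 1)])
y = x[l + 1:]                                  # targets x_{t+1}

theta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
sigma_hat = (y - Z @ theta_hat).std()          # constant scale parameter estimate
```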
Illustration of probabilistic forecasts

Example forecasts for a single buoy (West Tampa, Florida): 1.1.2010 at


00UTC

30 / 67
Reliability of probabilistic wave energy forecasts

Example results for 48-hour ahead probabilistic forecasts

The log-Normal assumption is fine for some buoys, but not for others...

31 / 67
Skill of probabilistic wave energy forecasts

Example results with Ignorance and CRPS for a single buoy

Note the two different benchmarks: climatology and persistence
32 / 67
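
For concreteness, the sketch below shows how the Ignorance score and the CRPS could be computed for a single log-Normal predictive density and one observation; the parameter and observation values are hypothetical, and the CRPS is obtained by direct numerical integration rather than any closed-form expression.

```python
# Sketch: evaluating one log-Normal predictive density with the ignorance score
# (negative log predictive density) and the CRPS (numerical integration of the
# squared difference between predictive cdf and the observation step function).
import numpy as np
from scipy.stats import lognorm

mu_hat, sigma_hat = np.log(15000.0), 0.3   # hypothetical location/scale (log domain)
y_obs = 12000.0                            # hypothetical observed wave energy flux

# scipy's lognorm uses s = sigma and scale = exp(mu)
dist = lognorm(s=sigma_hat, scale=np.exp(mu_hat))

ign = -np.log(dist.pdf(y_obs))             # ignorance (logarithmic) score

z = np.linspace(dist.ppf(1e-6), dist.ppf(1 - 1e-6), 20000)
dz = z[1] - z[0]
crps = np.sum((dist.cdf(z) - (z >= y_obs)) ** 2) * dz

print(f"ignorance: {ign:.3f}, CRPS: {crps:.1f}")
```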
Skill of probabilistic wave energy forecasts

Example results with Ignorance and CRPS as improvements over benchmarks (all buoys)

Very high skill is attained, though results differ depending upon the chosen score...
33 / 67
Bye bye Kauai...!

Let's move on to wind energy and some other exotic case studies...
34 / 67
Horns Rev, Denmark

35 / 67
The wind energy example

Nonlinear and bounded!

36 / 67
Beta or not Beta?

An obvious choice of distribution for a nonlinear and bounded process is the Beta distribution...

(from Wikipedia)

It was shown that we could do better though


37 / 67
The GL transformation (1)

Write {x_t} the original time series, and {y_t} the transformed one

The GL transformation γ is defined by

y_t = \gamma(x_t; \nu) = \ln\left( \frac{x_t^\nu}{1 - x_t^\nu} \right), \qquad x_t \in (0, 1)

while its inverse transformation, referred to as the inverse generalized logit (IGL), is defined as

x_t = \gamma^{-1}(y_t; \nu) = \left\{ 1 + \exp(-y_t) \right\}^{-1/\nu}, \qquad y_t \in \mathbb{R}

ν is a shape parameter

38 / 67
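
A direct Python sketch of the GL transform and its inverse; the shape parameter value is arbitrary, and the round-trip check simply verifies that applying the inverse after the transform recovers the original values.

```python
# Sketch implementation of the generalized logit (GL) transform and its inverse
# (IGL), with a quick round-trip consistency check.
import numpy as np

def gl(x, nu):
    """Generalized logit transform, x in (0, 1)."""
    return np.log(x**nu / (1.0 - x**nu))

def igl(y, nu):
    """Inverse generalized logit transform, y in R."""
    return (1.0 + np.exp(-y)) ** (-1.0 / nu)

nu = 1.6                                   # hypothetical shape parameter
x = np.linspace(0.01, 0.99, 9)             # normalized wind power values
assert np.allclose(igl(gl(x, nu), nu), x)  # round-trip consistency
```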
The GL transformation (2)

For wind power predictive densities, the model linking mean and standard deviation (μ_X and σ_X) could be

\sigma_X = \alpha\, \mu_X^\nu (1 - \mu_X^\nu)

If so, then the variance of Y = γ(X; ν) is α² ...

39 / 67
Why the GL transform?

courtesy of Matthias Lange

40 / 67
The GL-Normal variable

We say that, assuming

Y \sim \mathcal{N}(\mu, \sigma^2)

then X is distributed GL-Normal, i.e.

X \sim L_\nu(\mu, \sigma^2)

Its density function writes

f(x) = \frac{\nu}{\sigma \sqrt{2\pi}\, x (1 - x^\nu)} \exp\left\{ -\frac{1}{2} \left( \frac{\gamma(x;\nu) - \mu}{\sigma} \right)^2 \right\}, \qquad x \in (0, 1)

The interest of this proposal is that one can first transform wind power data and then work in a Gaussian framework!

41 / 67
The form of predictive densities

It is fair to write the predictive density for the wind power generation Xt+k at
time t + k as
X_{t+k} \sim \omega^0_{t+k}\, \delta_0 + \left(1 - \omega^0_{t+k} - \omega^1_{t+k}\right) L_\nu(\mu_{t+k}, \sigma^2_{t+k}) + \omega^1_{t+k}\, \delta_1

Equivalently, if considering Y_{t+k} the GL-transform of X_{t+k}, the form of its predictive density is given by

Y_{t+k} \sim \omega^0_{t+k}\, \delta_{-\infty} + \left(1 - \omega^0_{t+k} - \omega^1_{t+k}\right) \mathcal{N}(\mu_{t+k}, \sigma^2_{t+k}) + \omega^1_{t+k}\, \delta_{+\infty}

Similar to the coarsening idea of Lesaffre et al. (2007), we rewrite it as

Y_{t+k} \sim \omega^0_{t+k}\, \delta_{\gamma(\epsilon;\nu)} + \mathcal{N}(\mu_{t+k}, \sigma^2_{t+k})\, \mathbf{1}_{\bar{D}_y} + \omega^1_{t+k}\, \delta_{\gamma(1-\epsilon;\nu)}

This means that we ‘simply’ use censored Normal distributions, and that our
complex predictive densities are summarized by 2 parameters only (location
and scale)

42 / 67
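
The following sketch illustrates one way to obtain quantiles of power from such a censored predictive density: compute Gaussian quantiles in the transformed domain, censor them at γ(ε; ν) and γ(1−ε; ν), and map back with the IGL transform. The values of ε, ν, μ and σ are hypothetical, and gl/igl are as in the earlier GL-transform sketch (repeated here for self-containment).

```python
# Sketch: quantiles of power from a censored GL-Normal predictive density.
# Gaussian quantiles in the transformed domain are censored (clipped) at the
# bounds gamma(eps; nu) and gamma(1 - eps; nu), then mapped back with the IGL.
import numpy as np
from scipy.stats import norm

def gl(x, nu):  return np.log(x**nu / (1.0 - x**nu))
def igl(y, nu): return (1.0 + np.exp(-y)) ** (-1.0 / nu)

nu, eps = 1.6, 1e-3                        # hypothetical shape and coarsening threshold
mu, sigma = 0.4, 0.8                       # hypothetical location/scale (transformed domain)

alphas = np.arange(0.05, 1.0, 0.05)
q_y = norm.ppf(alphas, mu, sigma)                          # Gaussian quantiles
q_y = np.clip(q_y, gl(eps, nu), gl(1.0 - eps, nu))         # censoring at the bounds
q_power = igl(q_y, nu)                                     # back to the power domain
```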
Autoregressive dynamics

We consider that the location parameter µt+k has autoregressive dynamics


\mu_{t+k} = \theta^\top \tilde{y}_t

with
\theta = [\theta_0\; \theta_1\; \ldots\; \theta_{l+1}]^\top, \qquad \tilde{y}_t = [1\; y_t\; \ldots\; y_{t-l}]^\top

while the scale parameter is defined as a constant, i.e.

\sigma^2_{t+k} = \alpha

A Recursive Least Squares (RLS) type of framework (with exponential forgetting) is employed for estimation of the model parameters

43 / 67
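
A minimal sketch of RLS with exponential forgetting for this AR location model, run on synthetic data; the forgetting factor, number of lags and initialization are illustrative choices, not those of the original study.

```python
# Minimal sketch of recursive least squares with exponential forgetting for the
# AR model of the location parameter, mu_{t+k} = theta' * [1, y_t, ..., y_{t-l}].
import numpy as np

rng = np.random.default_rng(3)
n, l, lambda_ = 2000, 3, 0.999
y = rng.normal(size=n).cumsum() * 0.01 + 0.5    # synthetic series in the transformed domain

theta = np.zeros(l + 2)
P = np.eye(l + 2) * 1e3                          # initial inverse information matrix

for t in range(l, n - 1):
    z = np.concatenate(([1.0], y[t - l : t + 1][::-1]))   # [1, y_t, ..., y_{t-l}]
    err = y[t + 1] - theta @ z                             # one-step-ahead error
    Pz = P @ z
    gain = Pz / (lambda_ + z @ Pz)
    theta = theta + gain * err                             # parameter update
    P = (P - np.outer(gain, Pz)) / lambda_                 # covariance update with forgetting
```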
Conditional Parametric AR dynamics

If considering CP-AR dynamics instead, the model for the location parameter μ_{t+k} is changed to

\mu_{t+k} = \theta(\omega_t)^\top \tilde{y}_t

where
\theta(\omega_t) = [\theta_0(\omega_t)\; \theta_1(\omega_t)\; \ldots\; \theta_{l+1}(\omega_t)]^\top

In parallel, the model for the scale parameter then becomes

\sigma^2_{t+k} = \alpha(\omega_t)

Local kernel regression is used for obtaining local linear models (at a number
of fitting points)
A similar estimation framework is employed for adaptive estimation of local
linear models
An obvious candidate for ω is measured wind direction

44 / 67
Application to offshore wind power data

Location: Horns Rev


raw data from May 2005 to
January 2006
temporal resolution of 10 mins
(averages)
Power normalized by the rated
capacity of the wind farm (160
MW)

The exercise consists of 1-step-ahead point and probabilistic forecasting with:
a classical AR model with a Gaussian assumption
GL-Normal predictive densities with AR dynamics
GL-Normal predictive densities with CP-AR dynamics

Setup:
first 3 months for model identification and batch estimation
optimal models: 3 lags and an intercept
Evaluation is carried out on the remainder of the dataset
45 / 67
Point forecast evaluation

NMAE

Month Jun. Jul. Aug. Sep. Oct. Nov. Dec. Jan. All
Persistence 2.40 2.43 2.50 2.50 2.40 2.59 2.61 2.47 2.48
AR 2.34 2.42 2.48 2.43 2.33 2.64 2.63 2.45 2.45
GL-Normal AR 2.32 2.35 2.43 2.38 2.28 2.47 2.54 2.41 2.38
GL-Normal CP-AR 2.31 2.33 2.42 2.37 2.25 2.45 2.50 2.38 2.36

NRMSE

Month Jun. Jul. Aug. Sep. Oct. Nov. Dec. Jan. All
Persistence 3.98 4.68 4.47 4.66 4.17 6.23 4.76 4.28 4.70
AR 3.80 4.56 4.36 4.46 3.97 6.00 4.62 4.20 4.54
GL-Normal AR 3.81 4.57 4.32 4.43 3.96 5.87 4.55 4.12 4.50
GL-Normal CP-AR 3.79 4.54 4.30 4.43 3.93 5.83 4.52 4.09 4.47

46 / 67
Probabilistic forecast evaluation

The reliability of
GL-Normal
distributions is more
acceptable than that of
Normal and truncated
Normal distributions...
(and of Beta!)

CRPS

Month Jun. Jul. Aug. Sep. Oct. Nov. Dec. Jan. All
Normal AR 1.85 2.01 1.99 1.98 1.87 2.31 2.23 1.97 2.01
GL-Normal AR 1.76 1.80 1.86 1.86 1.81 1.93 1.92 2.03 1.86
GL-Normal CP-AR 1.73 1.78 1.82 1.81 1.71 1.94 1.88 1.79 1.80

47 / 67
Episode

Episode of 375 time-steps (with a time resolution of 10 minutes), with 90% prediction intervals and observations.

48 / 67
A look at model coefficients

One of the AR parameters as a function of wind direction; the variance of predictive densities as a function of wind direction

49 / 67
5 Nonparametric approaches to probabilistic forecasting

50 / 67
Ensemble forecasting

Ensemble forecasting is the mainstream approach to uncertainty estimation developed by the meteorological forecasting community

Ensemble predictions of wind power can be obtained:
from meteorological ensembles as input...
... consequently converted to power with appropriate statistical methods

51 / 67
Probabilistic forecasts from ensembles
It is known that ensemble forecasts of wind power are not correct in a
probabilistic sense, i.e. they are not reliable
They can be converted to reliable probabilistic forecasts with appropriate
recalibration methods (e.g. nonparametric conditional models, smoothed
bootstrap, Bayesian model averaging)

Example from a real application: probabilistic forecast from ensembles for the area
of Jutland (western Denmark)
52 / 67
Application: the case-study

Location: Horns Rev (again!)

Power measurements:
available raw data are from 16th February 2005 to 25th January 2006
Over 8200 hours, 73.4% of hourly averages are considered valid
all power values are normalized by the wind farm nominal capacity

Meteorological ensembles:
MSPEPS from Weprog, available over the same period
originally issued every 6 hours
framed here in order to simulate a real-world application for which forecasts may be delivered every hour...
... and also to increase the size of the dataset !!
53 / 67
Input meteorological ensembles

Meteorological ensembles originate from a multi-parametrization system developed by WEPROG (75 ensemble members)

Downscaled at the location of interest
Wind speed, direction, and other meteorological variables available at various vertical levels

In the present exercise, only wind speed at turbine nacelle level (approx. 105 m a.g.l.) is used as input...

54 / 67
Setup for power curve modeling

A single power curve for the whole wind farm, linking the best available met forecasts and power measurements

The mean of the wind speed ensemble members is used as the best input forecast:

\bar{x}_{t+k|t} = \frac{1}{m} \sum_{j=1}^{m} \hat{x}^{(j)}_{t+k|t}, \qquad \forall t, k

The power curve model then writes

y_{t+k} = g_{t,k}(\bar{x}_{t+k|t}) + \varepsilon_{t+k}, \qquad \forall t, k

The power curve model for each horizon is based on local linear regression, and adaptively estimated with an orthogonal fitting method

55 / 67
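
The sketch below illustrates the general idea with a plain kernel-weighted local linear regression evaluated at a set of fitting points; the adaptive, orthogonal fitting used in the slides is not reproduced, and the wind speed/power data, bandwidth and fitting points are synthetic choices.

```python
# Simplified sketch of a power curve estimated by kernel-weighted local linear
# regression on (ensemble-mean wind speed, power) pairs. Non-adaptive stand-in
# for the scheme described on the slide; data are synthetic.
import numpy as np

rng = np.random.default_rng(4)
ws = rng.uniform(0, 25, 2000)                                           # wind speed [m/s]
power = np.clip((ws / 12.0) ** 3, 0, 1) + rng.normal(0, 0.05, ws.size)  # synthetic normalized power

def local_linear_power_curve(x0, x, y, bandwidth=1.5):
    """Evaluate the locally fitted power curve at wind speed x0."""
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)      # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])
    # Weighted least squares: the intercept is the local estimate at x0
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0]

fitting_points = np.linspace(0, 25, 26)
curve = [local_linear_power_curve(x0, ws, power) for x0 in fitting_points]
```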
Conversion of ensembles

At a given time t and for horizon k, the wind power forecast given by the j-th member is

\hat{y}^{(j)}_{t+k|t} = \hat{g}_{t,k}\left( \hat{x}^{(j)}_{t+k|t} \right)

while the single point forecast that can be extracted from the set of ensemble members is chosen to be their mean, i.e.

\hat{y}_{t+k|t} = \frac{1}{m} \sum_{j=1}^{m} \hat{y}^{(j)}_{t+k|t}

Note that other approaches may also be envisaged, e.g.

a power curve model per ensemble member, and for each forecast horizon (75*48 power curve models!!)
a more advanced combination scheme for extracting the single point forecast

56 / 67
Forecast performance of ensembles

NMAE

NRMSE

57 / 67
The wind power ensembles

Ensembles obtained with an orthogonally fitted power curve model better span the full range of power values...
but they still are far from being reliable from a probabilistic perspective!!

Ensemble forecasts of wind power for Horns Rev

58 / 67
Kernel dressing of ensemble forecasts

Transform ensemble forecasts into nonparametric predictive densities:


for each horizon, ensemble members are ‘dressed’ with kernels
predictive densities are weighted combinations of kernels

\hat{f}_{t+k|t}(y) = \sum_{j=1}^{m} w_j\, \hat{f}^{(j)}_{t+k|t}(y)

with
\sum_{j=1}^{m} w_j = 1, \qquad w_j = 1/m

Several versions of that proposal can be found in the meteorological and forecasting literature

59 / 67
Specifics of the wind power application

Gaussian kernels are employed:


Mean given by the forecast values of the ensemble members
Mean and standard deviation are linked by a model (the same for all ensemble members!), i.e.

\sigma^{(j)}_{t,k} = \tau^0_{t,k} + \tau^1_{t,k} \left( 1 - \hat{y}^{(j)}_{t+k|t} \right) \hat{y}^{(j)}_{t+k|t}

increasing uncertainty in the steep part of the power curve
a low, but non-zero, minimum level of uncertainty otherwise

The model can be summarized by 2 parameters only, whatever the number of ensemble members...
60 / 67
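
Putting the last two slides together, here is a sketch of the dressed predictive density as an equally-weighted mixture of Gaussian kernels, with the member standard deviations given by the τ⁰/τ¹ model; the member forecasts and parameter values are hypothetical.

```python
# Sketch: kernel dressing of m ensemble members into a nonparametric predictive
# density, with equal weights 1/m and member-specific standard deviations
# sigma_j = tau0 + tau1 * (1 - y_j) * y_j. All values are hypothetical.
import numpy as np
from scipy.stats import norm

members = np.array([0.42, 0.45, 0.47, 0.50, 0.52, 0.55, 0.58, 0.60])  # hypothetical member forecasts
tau0, tau1 = 0.02, 0.15                                                # hypothetical parameters
sig = tau0 + tau1 * (1.0 - members) * members                          # kernel standard deviations
w = np.full(members.size, 1.0 / members.size)                          # equal weights

def predictive_cdf(y):
    """Mixture-of-Gaussians predictive cdf at power value y."""
    return np.sum(w * norm.cdf((y - members) / sig))

def predictive_pdf(y):
    """Mixture-of-Gaussians predictive density at power value y."""
    return np.sum(w / sig * norm.pdf((y - members) / sig))

# Example use: probability that power stays below 0.5 p.u.
p_below_half = predictive_cdf(0.5)
```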
Estimation with Ignorance

The model parameters are adaptively estimated


This is done by minimizing the ignorance score of predictive distributions
(equivalent to maximum likelihood estimation)

61 / 67
Now close your eyes(!)

The objective function (negative log-likelihood) to be minimized at time t writes

S_{t,k}(\nu) = -\frac{1}{n_\lambda} \sum_{i=1}^{t-k} \lambda^{t-k-i} \ln\left( u_i(\nu) \right)

where
λ ∈ (0, 1] is the forgetting factor,
u_i(\nu) = P[y_i \mid \nu] = \hat{f}_{i|i-k}(y_i) is the likelihood of the measurements given the probabilistic forecasts

For recursive estimation, one uses the information vector:

h_{t,k} = \frac{\nabla_\nu u_t(\hat{\nu}_{t-1,k})}{u_t(\hat{\nu}_{t-1,k})}

The updating scheme at time t (based on a Newton-Raphson step) consists of:

\hat{\nu}_{t,k} = \hat{\nu}_{t-1,k} + \frac{1}{n_\lambda} R^{-1}_{t,k} h_{t,k}

R_{t,k} = \lambda R_{t-1,k} + \frac{1}{n_\lambda} h_{t,k} h_{t,k}^\top

62 / 67
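
A rough sketch of this adaptive scheme on a synthetic data stream follows. It deviates from the slides in a few places, all assumptions for illustration: a numerical gradient replaces the analytical information vector, n_λ is taken as 1/(1 − λ), and the parameters are clipped to a sensible range for numerical robustness.

```python
# Rough sketch of the recursive maximum-likelihood estimation of the dressing
# parameters nu = (tau0, tau1), with a Newton-Raphson-type update and forgetting.
import numpy as np
from scipy.stats import norm

def likelihood(nu, members, y):
    """Predictive density u_t(nu) of the dressed ensemble, evaluated at observation y."""
    tau0, tau1 = nu
    sig = np.maximum(tau0 + tau1 * (1.0 - members) * members, 1e-4)
    return np.mean(norm.pdf((y - members) / sig) / sig)

def num_grad(nu, members, y, h=1e-5):
    """Central-difference gradient of the likelihood with respect to nu."""
    g = np.zeros_like(nu)
    for i in range(nu.size):
        e = np.zeros_like(nu)
        e[i] = h
        g[i] = (likelihood(nu + e, members, y) - likelihood(nu - e, members, y)) / (2 * h)
    return g

rng = np.random.default_rng(5)
lam = 0.996
n_lambda = 1.0 / (1.0 - lam)        # effective number of observations (assumption)
nu = np.array([0.05, 0.10])         # initial (tau0, tau1)
R = np.eye(2) * 0.1

for t in range(3000):
    # synthetic ensemble members and matching observation for one (t, k) pair
    members = np.clip(rng.normal(0.5, 0.15, 20) + rng.normal(0, 0.05), 0, 1)
    y_obs = float(np.clip(rng.normal(members.mean(), 0.08), 0, 1))

    u = likelihood(nu, members, y_obs)
    h_t = num_grad(nu, members, y_obs) / u               # information vector
    R = lam * R + np.outer(h_t, h_t) / n_lambda
    nu = nu + np.linalg.solve(R, h_t) / n_lambda         # Newton-Raphson-type update
    nu = np.clip(nu, 1e-3, 1.0)                          # keep parameters in a sane range
```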
Application results

Out of the 7200 ensemble forecast series, 1500 are used as a learning and cross-validation period
Evaluation is carried out on the 5700 remaining ones

One needs to:
recursively track 2 model parameters for each forecast horizon (total of 86 parameters)
decide on the forgetting factor that controls adaptivity (optimal: λ = 0.996)

63 / 67
Reliability evaluation

Evaluation with reliability diagrams, depicting deviations from perfect reliability

k = 12 k = 24
Some comments:
Reliability is significantly improved after kernel dressing
There seems to be a remaining (slight) probabilistic bias...

64 / 67
Overall skill evaluation

Evaluation with the ignorance score, as a function of look-ahead time

Good reference: an ignorance score of 0 corresponds to the uniform predictive distribution (no information)

Climatology-based predictive distributions are only slightly better than uniform ones!!
Ensemble-based ones have significantly higher skill

Some comments:
High overall skill comes from acceptable reliability + maximized resolution
Maximum skill is for horizons between 5- and 10-hour ahead

65 / 67
Final remarks

Forecasts have become a key input to many decision-making problems, with the appropriate type of uncertainty information depending upon:

what they are to be used for
the expertise/feeling of the decision-maker

Even though forecasting is receiving increasing attention, many important problems remain:

input data (variety, confidentiality, etc.) → distributed learning?
dimensionality
forecast verification
forecasts in decision-making

66 / 67
Thanks for your attention!

67 / 67
