Extreme Event Statistics with the evd Package
This is an introduction to the basic commands in the evd package to estimate statistical models using
the GEV distribution (block maxima approach) and the GPD (peaks over threshold approach). The full
documentation of the package is available at
[Link]
The package vignette can be accessed at
[Link]
Other packages with extreme value functionality include ismev, evir, extRemes and evdbayes; see the
CRAN website [Link] for more details.
The package evd can be installed with the command
install.packages("evd")
and can be loaded in the current R session with
library(evd)
The package contains several functions for extreme value analysis. For instance, the GEV distribution
can be used through one of the functions
pgev(q, loc = 0, scale = 1, shape = 0, lower.tail = TRUE)
dgev(x, loc = 0, scale = 1, shape = 0, log = FALSE)
qgev(p, loc = 0, scale = 1, shape = 0, lower.tail = TRUE)
rgev(n, loc = 0, scale = 1, shape = 0)
which give respectively the cdf, the pdf, the quantile function and a random number generator for the GEV distribution.
Help for a function can be obtained with
help(pgev)
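As a quick check of the parametrization, the sketch below compares pgev with the closed-form GEV distribution function $G(z) = \exp\{-[1 + \xi (z - \mu)/\sigma]^{-1/\xi}\}$ (with the Gumbel limit $\exp\{-\exp[-(z - \mu)/\sigma]\}$ when $\xi = 0$) and verifies that qgev inverts pgev. The helper gev_cdf and the parameter values are illustrative assumptions, not part of evd.

```r
library(evd)

# Hypothetical helper: closed-form GEV cdf, valid where 1 + xi * (z - mu) / sigma > 0;
# pmax() maps points outside the support to cdf 0 or 1 as appropriate.
gev_cdf <- function(z, mu, sigma, xi) {
  if (xi == 0) {
    return(exp(-exp(-(z - mu) / sigma)))  # Gumbel case
  }
  t <- pmax(1 + xi * (z - mu) / sigma, 0)
  exp(-t^(-1 / xi))
}

z <- c(3.6, 3.9, 4.2, 4.5)
mu <- 3.87; sigma <- 0.2; xi <- -0.05   # illustrative values

# pgev() should agree with the closed form ...
print(all.equal(pgev(z, loc = mu, scale = sigma, shape = xi),
                gev_cdf(z, mu, sigma, xi)))
# ... and qgev() should invert pgev()
p <- pgev(z, loc = mu, scale = sigma, shape = xi)
print(all.equal(qgev(p, loc = mu, scale = sigma, shape = xi), z))
```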
Similarly, for the GPD, we have
pgpd(q, loc = 0, scale = 1, shape = 0, lower.tail = TRUE)
dgpd(x, loc = 0, scale = 1, shape = 0, log = FALSE)
qgpd(p, loc = 0, scale = 1, shape = 0, lower.tail = TRUE)
rgpd(n, loc = 0, scale = 1, shape = 0)
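In the same spirit, a minimal sketch checking pgpd against the closed-form GPD distribution function $H(y) = 1 - [1 + \xi (y - u)/\sigma]^{-1/\xi}$ for $y > u$ (exponential when $\xi = 0$), where the location plays the role of the threshold $u$. The helper gpd_cdf, the seed and the parameter values are illustrative assumptions.

```r
library(evd)

# Hypothetical helper: closed-form GPD cdf for y > u
gpd_cdf <- function(y, u, sigma, xi) {
  if (xi == 0) {
    return(1 - exp(-(y - u) / sigma))  # exponential case
  }
  1 - (1 + xi * (y - u) / sigma)^(-1 / xi)
}

y <- c(31, 35, 45, 60)
u <- 30; sigma <- 7; xi <- 0.2   # illustrative values

print(all.equal(pgpd(y, loc = u, scale = sigma, shape = xi),
                gpd_cdf(y, u, sigma, xi)))

# Draws from rgpd() never fall below the location (threshold) u
set.seed(1)
x <- rgpd(1000, loc = u, scale = sigma, shape = xi)
print(min(x) >= u)
```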
1 Block Maxima Approach
We use the Port Pirie sea level example seen in class. The corresponding data set is contained in the
evd package and can be loaded with
data(portpirie)
portpirie
## [1] 4.03 3.83 3.65 3.88 4.01 4.08 4.18 3.80 4.36 3.96 3.98 4.69 3.85 3.96
## [15] 3.85 3.93 3.75 3.63 3.57 4.25 3.97 4.05 4.24 4.22 3.73 4.37 4.06 3.71
## [29] 3.96 4.06 4.55 3.79 3.89 4.11 3.85 3.86 3.86 4.21 4.01 4.11 4.24 3.96
## [43] 4.21 3.74 3.85 3.88 3.66 4.11 3.71 4.18 3.90 3.78 3.91 3.72 4.00 3.66
## [57] 3.62 4.33 4.55 3.75 4.08 3.90 3.88 3.94 4.33
The GEV can be fitted to the Port Pirie data set via maximum likelihood with the command fgev:
portpirie_gev <- fgev(portpirie)
Results of the estimation can be accessed as follows:
fitted(portpirie_gev)
## loc scale shape
## 3.87475133 0.19804888 -0.05011658
std.errors(portpirie_gev)
## loc scale shape
## 0.02793260 0.02024787 0.09825585
deviance(portpirie_gev)/-2
## [1] 4.339058
vcov(portpirie_gev)
## [,1] [,2] [,3]
## loc 0.0007802302 0.0001970359 -0.0010740413
## scale 0.0001970359 0.0004099761 -0.0007774403
## shape -0.0010740413 -0.0007774403 0.0096542112
giving respectively the estimated parameters, the standard errors, the maximized log-likelihood (note that
deviance returns $-2\ell$, where $\ell$ is the maximized log-likelihood) and the inverse of the observed information matrix.
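To make these relations concrete, the sketch below recomputes the maximized log-likelihood as the sum of the GEV log-densities at the fitted parameters, and the standard errors as the square roots of the diagonal of vcov. It assumes evd's usual conventions (deviance equal to twice the negative log-likelihood at the optimum), which the outputs above are consistent with.

```r
library(evd)
data(portpirie)
portpirie_gev <- fgev(portpirie)
est <- fitted(portpirie_gev)

# Log-likelihood at the MLE: sum of log-densities, should equal -deviance / 2
loglik <- sum(dgev(portpirie, loc = est["loc"], scale = est["scale"],
                   shape = est["shape"], log = TRUE))
print(c(by_hand = loglik, from_deviance = deviance(portpirie_gev) / -2))

# Standard errors: square roots of the diagonal of the estimated covariance matrix
se_by_hand <- sqrt(diag(vcov(portpirie_gev)))
print(rbind(by_hand = unname(se_by_hand), std.errors = std.errors(portpirie_gev)))
```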
Confidence intervals can be directly computed through
confint(portpirie_gev, level = 0.95)
## 2.5 % 97.5 %
## loc 3.8200044 3.9294982
## scale 0.1583638 0.2377340
## shape -0.2426945 0.1424613
and the diagnostic plots with
par(mfrow = c(2, 2))
plot(portpirie_gev)
[Figure: diagnostic plots of the fitted GEV: Probability Plot, Quantile Plot, Density Plot and Return Level Plot (return level against return period).]
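The intervals returned by confint above coincide with estimate ± 1.96 × standard error (for loc, 3.8748 ± 1.96 × 0.0279 gives 3.8200 and 3.9295), i.e. they are Wald intervals based on the asymptotic normality of the maximum likelihood estimator. A short sketch reproducing them by hand:

```r
library(evd)
data(portpirie)
portpirie_gev <- fgev(portpirie)

est <- fitted(portpirie_gev)
se <- std.errors(portpirie_gev)
z <- qnorm(0.975)  # about 1.96 for a 95% interval

# Wald interval: estimate -/+ z * standard error
wald <- cbind(lower = est - z * se, upper = est + z * se)
print(wald)
print(confint(portpirie_gev, level = 0.95))
```

Wald intervals are symmetric by construction; for the shape parameter in particular, the profile likelihood intervals discussed below are usually preferred.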
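The return level $z_p$ associated with the return period $1/p$ is the quantile of the fitted GEV with upper
tail probability $p$, so it can be computed with the qgev function. For instance, the 50-year return level is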
mu <- fitted(portpirie_gev)["loc"]
sigma <- fitted(portpirie_gev)["scale"]
xi <- fitted(portpirie_gev)["shape"]
qgev(1 - 1/50, loc = mu, scale = sigma, shape = xi)
## [1] 4.576661
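Equivalently, for $\xi \neq 0$ the return level has the closed form $z_p = \mu - \frac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right]$ with $y_p = -\log(1 - p)$. The sketch below evaluates this formula for a few return periods and compares it with qgev; the helper return_level is a hypothetical name used only for illustration.

```r
library(evd)
data(portpirie)
portpirie_gev <- fgev(portpirie)
est <- fitted(portpirie_gev)
mu <- est[["loc"]]; sigma <- est[["scale"]]; xi <- est[["shape"]]

# Hypothetical helper: GEV return level for return period 1/p (xi != 0)
return_level <- function(p, mu, sigma, xi) {
  y_p <- -log(1 - p)
  mu - sigma / xi * (1 - y_p^(-xi))
}

periods <- c(10, 50, 100)
by_formula <- return_level(1 / periods, mu, sigma, xi)
by_qgev <- qgev(1 - 1 / periods, loc = mu, scale = sigma, shape = xi)
print(cbind(period = periods, by_formula, by_qgev))
```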
The profile log-likelihood plots can be obtained with
par(mfrow = c(2, 2))
plot(profile(portpirie_gev))
## [1] "profiling loc"
## [1] "profiling scale"
## [1] "profiling shape"
[Figure: profile log-likelihood plots for loc, scale and shape.]
Confidence intervals based on the profile likelihood can be obtained directly with
confint(profile(portpirie_gev))
## [1] "profiling loc"
## [1] "profiling scale"
## [1] "profiling shape"
## lower upper
## loc 3.8211280 3.9312535
## scale 0.1634034 0.2446386
## shape -0.2177982 0.1703843
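The profile interval is the set of parameter values whose profile log-likelihood lies within $\chi^2_{1,0.95}/2 \approx 1.92$ of the overall maximum. As a rough check of the shape interval above, the sketch below builds the profile by hand: it fixes the shape on a grid (fgev accepts fixed parameters, as in the Gumbel fit below) and maximizes over loc and scale. The grid and the starting values are our own assumptions, so the endpoints only match the output above up to the grid resolution.

```r
library(evd)
data(portpirie)

# Grid of fixed shape values (rounded so that 0 is exactly the Gumbel case)
shape_grid <- round(seq(-0.35, 0.35, by = 0.005), 3)

# Profile log-likelihood: for each fixed shape, maximize over loc and scale.
# Starting values are an assumption chosen so that all observations lie
# inside the support for every shape on the grid.
prof_ll <- sapply(shape_grid, function(s) {
  fit <- fgev(portpirie, start = list(loc = 3.87, scale = 0.3),
              shape = s, std.err = FALSE)
  deviance(fit) / -2
})

# Maximized log-likelihood of the full GEV model
full_ll <- deviance(fgev(portpirie)) / -2

# Shape values within qchisq(0.95, 1) / 2 of the maximum
inside <- shape_grid[prof_ll >= full_ll - qchisq(0.95, df = 1) / 2]
print(range(inside))
```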
Finally, the Gumbel model (the GEV with the shape fixed at 0) can be estimated with
portpirie_gum <- fgev(portpirie, shape = 0)
The likelihood ratio test statistic (the difference between the deviances of the two nested models) is then
deviance(portpirie_gum) - deviance(portpirie_gev)
## [1] 0.2427531
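Since the Gumbel model fixes one parameter, the statistic is compared with a $\chi^2_1$ distribution: 0.243 is far below the 5% critical value 3.84, so the Gumbel model is not rejected at the 5% level. A minimal sketch computing the p-value:

```r
library(evd)
data(portpirie)
portpirie_gev <- fgev(portpirie)
portpirie_gum <- fgev(portpirie, shape = 0)

# Likelihood ratio statistic: difference of deviances (nested models)
lr_stat <- deviance(portpirie_gum) - deviance(portpirie_gev)

# Under the Gumbel null the statistic is approximately chi-squared with 1 df
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(statistic = lr_stat, critical = qchisq(0.95, df = 1), p_value = p_value))
```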
Note that the reparametrized GEV model, in which the return level $z_p$ (for a fixed $0 < p < 1$) replaces the
location parameter, can be estimated with the prob argument of fgev (here for the 50-year return level, $p = 1/50$):
portpirie_gev_zp <- fgev(portpirie, prob = 1/50)
fitted(portpirie_gev_zp)
## quantile scale shape
## 4.57670252 0.19812758 -0.05033067
This can be used to construct a profile confidence interval for the return level:
plot(profile(portpirie_gev_zp, which = "quantile"))
## [1] "profiling quantile"
[Figure: profile log-likelihood of the 50-year return level (quantile).]
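The two parametrizations describe the same model: inverting $z_p = \mu - \frac{\sigma}{\xi}[1 - y_p^{-\xi}]$ gives back the location. The sketch below recovers $\mu$ from the reparametrized fit; it agrees with the direct fit up to optimizer tolerance, since the two fits are computed separately.

```r
library(evd)
data(portpirie)
p <- 1 / 50
portpirie_gev_zp <- fgev(portpirie, prob = p)
est <- fitted(portpirie_gev_zp)
z_p <- est[["quantile"]]; sigma <- est[["scale"]]; xi <- est[["shape"]]

# Invert z_p = mu - sigma / xi * (1 - y_p^(-xi)) to recover the location
y_p <- -log(1 - p)
mu <- z_p + sigma / xi * (1 - y_p^(-xi))
print(c(loc_from_zp = mu, loc_direct = fitted(fgev(portpirie))[["loc"]]))
```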
2 Peaks Over Threshold Approach
We use the daily rain example contained in the package ismev:
data(rain, package = "ismev")
str(rain)
## num [1:17531] 0 2.3 1.3 6.9 4.6 0 1 1.5 1.8 1.8 ...
To help choose the threshold, the mean residual life plot can be drawn over different threshold ranges with
par(mfrow = c(2, 2))
mrlplot(rain, tlim = c(0, 80), nt = 200)
mrlplot(rain, tlim = c(0, 60), nt = 200)
mrlplot(rain, tlim = c(20, 50), nt = 200)
mrlplot(rain, tlim = c(30, 60), nt = 200)
[Figure: four mean residual life plots (mean excess against threshold) over the threshold ranges 0 to 80, 0 to 60, 20 to 50 and 30 to 60.]
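The quantity plotted is the empirical mean excess $e(u)$, the average of $x - u$ over the observations $x > u$. If the excesses over some $u_0$ follow a GPD with $\xi < 1$, then for $u > u_0$ we have $e(u) = (\sigma_{u_0} + \xi (u - u_0))/(1 - \xi)$, which is linear in $u$; a suitable threshold is therefore one above which the plot is roughly linear. A minimal sketch of the computation (the helper mean_excess is hypothetical; mrlplot also adds confidence bands):

```r
library(evd)
data(rain, package = "ismev")

# Hypothetical helper: empirical mean excess over threshold u
mean_excess <- function(x, u) {
  exc <- x[x > u] - u
  c(threshold = u, mean_excess = mean(exc), n_exceed = length(exc))
}

thresholds <- c(10, 20, 30, 40, 50)
me <- t(sapply(thresholds, mean_excess, x = rain))
print(me)
```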
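As an alternative threshold choice tool, the GPD can be fitted repeatedly over a range of thresholds and the
estimates of the shape and of the modified scale plotted against the threshold; above a suitable threshold both should be approximately constant: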
par(mfrow = c(2, 1))
tcplot(rain, tlim = c(30, 50), nt = 20)
[Figure: modified scale (top) and shape (bottom) estimates against the threshold, for thresholds from 30 to 50.]
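The modified scale is $\sigma^* = \sigma_u - \xi u$, which, unlike $\sigma_u$ itself, does not change with $u$ when the GPD holds above a lower threshold. The sketch below reproduces the idea of tcplot with a plain loop over a coarser grid of thresholds (our choice), without the confidence intervals.

```r
library(evd)
data(rain, package = "ismev")

# Refit the GPD at each threshold and compute shape and modified scale
thresholds <- seq(30, 50, by = 5)
stability <- t(sapply(thresholds, function(u) {
  est <- fitted(fpot(rain, threshold = u, std.err = FALSE))
  c(threshold = u,
    shape = est[["shape"]],
    modified_scale = est[["scale"]] - est[["shape"]] * u)
}))
print(round(stability, 3))
```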
Fitting the GPD to the daily rain data with the threshold $u_0 = 30$ is done with the function fpot:
u0 <- 30
npy <- length(rain)/(1962 - 1914 + 1)
rain_gpd <- fpot(rain, threshold = u0, npp = npy)
Note that the parameter npp is the number of observations per period (here, per year). For complete daily
records this would be about 365.25; in the present example the 17531 observations cover the 49 years from 1914 to 1962, giving 357.78.
As with fgev, results of the estimation can be accessed with the functions fitted, std.errors, deviance and
vcov. The number and proportion of exceedances of the threshold are
rain_gpd$nat
## [1] 152
rain_gpd$pat
## [1] 0.008670355
As before, plot(rain_gpd) and plot(profile(rain_gpd)) generate the diagnostic and profile log-likelihood plots.
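The fit can be turned into return levels. With $\zeta_u$ the probability of exceeding $u_0$ (estimated by the proportion of exceedances pat), the level exceeded on average once every $N$ years solves $\zeta_u [1 - H(x - u_0)] = 1/(N n_y)$, which gives $x_N = u_0 + \frac{\sigma}{\xi}[(N n_y \zeta_u)^{\xi} - 1]$ for $\xi \neq 0$. A sketch for $N = 100$, computed both from this formula and through qgpd:

```r
library(evd)
data(rain, package = "ismev")
u0 <- 30
npy <- length(rain) / (1962 - 1914 + 1)
rain_gpd <- fpot(rain, threshold = u0, npp = npy)

est <- fitted(rain_gpd)
sigma <- est[["scale"]]; xi <- est[["shape"]]
zeta <- rain_gpd$pat   # estimated probability of exceeding u0

# N-year return level: exceeded on average once every N * npy observations
N <- 100
x_N <- u0 + sigma / xi * ((N * npy * zeta)^xi - 1)

# Same value as a GPD quantile of the excesses over u0
x_N_q <- u0 + qgpd(1 - 1 / (N * npy * zeta), scale = sigma, shape = xi)
print(c(formula = x_N, via_qgpd = x_N_q))
```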
3 Additional Features
The functions fgev and fpot contain extra features such as
• fgev has a parameter nsloc for modelling the location parameter as a linear function of covariates in a non-stationary model.
• fpot can fit stationary but dependent series by declustering the exceedances with the runs method, through
the parameters r and cmax. The clusters of exceedances can be obtained explicitly with the
function clusters.
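Both features can be sketched as follows. The standardized time index used as covariate is our own illustrative choice. For the declustering part, we assume the evd convention that a cluster ends once r consecutive observations fall below the threshold (see help(clusters)); with cmax = TRUE only the cluster maxima are kept and used in the GPD fit.

```r
library(evd)
data(portpirie)
data(rain, package = "ismev")

# (1) Non-stationary GEV: location linear in a standardized time index
trend <- data.frame(trend = as.numeric(scale(seq_along(portpirie))))
pp_ns <- fgev(portpirie, nsloc = trend)
pp_st <- fgev(portpirie)
print(fitted(pp_ns))
# Deviance reduction from the extra location parameter (compare with qchisq(0.95, 1))
dev_gain <- deviance(pp_st) - deviance(pp_ns)
print(dev_gain)

# (2) Runs declustering of the rain series above u0 = 30 with run length r = 1
u0 <- 30
npy <- length(rain) / (1962 - 1914 + 1)
cl_max <- clusters(rain, u = u0, r = 1, cmax = TRUE)  # one value per cluster
print(length(cl_max))
# GPD fitted to the cluster maxima only
rain_gpd_dc <- fpot(rain, threshold = u0, npp = npy, r = 1, cmax = TRUE)
print(fitted(rain_gpd_dc))
```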
The package evd has many other functions, in particular for the estimation of bivariate extreme value
models.
Among the other packages, ismev can fit non-stationary models in which the location, scale and shape parameters
depend on covariates, both in the block maxima and in the peaks over threshold approach; see the functions
gev.fit and gpd.fit. The function rlarg.fit fits the r largest order statistics model, which extends the
block maxima approach by using the r largest observations in each block.
See the CRAN Task View [Link] for a more
complete list of R packages for extreme value theory.