Keywords
homogeneous region; probability distribution function; rainfall; regional frequency analysis

Correspondence
A. P. García-Marín, Department of Rural Engineering, University of Cordoba, PO Box 3048, 14080 Cordoba, Spain. Email: amanda.garcia@uco.es

doi:10.1111/j.1747-6593.2011.00251.x

Abstract
The flood and drought cycles suffered of old by the province of Malaga, the variability in the distribution of rainfall throughout the province and the reduced length of the data series make it of interest to carry out a regional analysis (RA) of the yearly maximum daily precipitation data to obtain appropriate rainfall quantiles. By taking these maximum precipitations values from 72 weather stations, and their physiographic parameters latitude and altitude, four regions with similar rainfall patterns have been determined by the principal component analysis statistical technique. Then, carrying out an RA of the yearly maximum daily precipitations for each of the regions discriminated, it was observed that three of them were homogeneous for the parameter being studied. In those homogeneous regions that grouped data of different stations but close rainfall pattern, frequency curves could be calculated for several return periods by means of the functions that best fit the data of each region. With these regional curves, it has been possible to obtain more accurate values of the maximum daily quantiles for each of the stations analysed than through the conventional local frequency analysis.
study of rain in the province of Malaga. On many occasions, in order to solve certain engineering problems, the maximum intensities of rain for a specific return period are resorted to. In order to estimate the amount of rain with any reliability, some series of maximum rain events have to be available. The records of observation series at most Spanish weather stations are too short to be extrapolated with any confidence. Intuitively, it is obvious that a region that is homogeneous from a climatological, geological and/or geomorphological perspective can permit the transfer of information between different watersheds in that region. However, it is advisable to establish mathematical criteria on which to base the homogeneity of a region in order to fix its limits with regard to the study and analysis of the data from the weather stations to be considered (García 2000).

The inconvenience of not using or possessing very extensive data can be obviated by using relatively recent techniques, such as that of the regional analysis (RA) of frequencies (Hosking & Wallis 1997). This approach alleviates the lack of data in time with the abundance of data in space, this currently being the generalized trend in the frequency analysis of extreme events (Álvarez et al. 1999). Different works back up regionalization as the technique that improves the estimations of the quantiles when working with rain or flows (Sáenz de Ormijana et al. 1991; Hosking & Wallis 1997; Parida et al. 1998; Yun & Chen 1998; Ferrer & Mateos 1999; Chiang et al. 2002a, b). Within regionalization, the determination of homogeneous regions is the most complex step and the continuity of the analysis depends on its result. Thus, the objective of this work is to set up an RA of the daily maximum rainfall in the province of Malaga using the principal component analysis (PCA) technique in order to differentiate homogeneous regions from a maximum precipitation point of view.

Methods

The fundamental objective of the regional frequency analysis is the estimation of extreme events corresponding to different return periods by using probability distribution [...] confidence at the moment of predicting certain hydrological variables (Nathan & McMahon 1990).

The regional frequency analysis enables the calculation of data for a certain site of interest using data from places other than the site in question. If there are N sites or stations, each one of them with n years of maximum event records, it can be assumed that the N × n data of the region will give more precise estimates of quantiles as extreme as $Q_{Nn}$.

Following the methodology proposed by Hosking & Wallis (1997), the RA consists of the following steps.

Analysis of the data available

The measurement of the discordance $D_i$ permits the identification of unusual stations in comparison with the rest of those composing the study region, for which the linear moments of variation (LCv), skewness (LCs) and kurtosis (LCk) of the series of data available in each place considered were calculated. It is considered that the vector of linear moments of a station i constitutes a point in a tridimensional space. A group of stations produces a cloud of points in this space, so that any point located far from the centre of gravity of the whole point set should be considered as being discordant. Numerically, the measure of discordance is given by:

$$D_i = \frac{1}{3} N (u_i - \bar{u})^T A^{-1} (u_i - \bar{u}) \qquad (1)$$

where $A = \sum_{i=1}^{N} (u_i - \bar{u})(u_i - \bar{u})^T$, $\bar{u} = N^{-1} \sum_{i=1}^{N} u_i$ and $u_i = [LCv_i, LCs_i, LCk_i]^T$, with N the number of stations.

Identification of homogeneous regions

To assess when a region proposed can be considered to be homogeneous, a homogeneity test should be carried out to compare the variability of the sample linear moments of the stations considered with that expected after using simulation techniques. This uses the weighted variance of the linear coefficient of variation of each of the stations in accordance with the statistic:

$$V = \left\{ \frac{\sum_{i=1}^{N} N_i (LCv_i - LCv^R)^2}{\sum_{i=1}^{N} N_i} \right\}^{1/2} \qquad (2)$$

where $N_i$ is the record length of station i and $LCv^R$ is the regional average of the $LCv_i$.
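The discordance measure of Eq. (1) and the weighted dispersion of the LCv values used by the homogeneity test can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `discordance` and `weighted_lcv_dispersion` are hypothetical, and the sample L-moment ratios of each station are assumed to have been computed beforehand.

```python
import numpy as np


def discordance(u):
    """Discordance D_i of Eq. (1).

    u: array of shape (N, 3) whose rows are [LCv_i, LCs_i, LCk_i]
       (assumed input layout; one row per station).
    """
    u = np.asarray(u, dtype=float)
    n_sites = u.shape[0]
    dev = u - u.mean(axis=0)      # u_i - u_bar for every station
    a = dev.T @ dev               # A = sum_i (u_i - u_bar)(u_i - u_bar)^T
    a_inv = np.linalg.inv(a)      # requires N > 3 non-degenerate stations
    # Quadratic form (u_i - u_bar)^T A^{-1} (u_i - u_bar), one value per station
    quad = np.einsum("ij,jk,ik->i", dev, a_inv, dev)
    return n_sites * quad / 3.0


def weighted_lcv_dispersion(lcv, record_lengths):
    """Record-length-weighted standard deviation of the station LCv values."""
    lcv = np.asarray(lcv, dtype=float)
    w = np.asarray(record_lengths, dtype=float)
    regional_lcv = np.sum(w * lcv) / np.sum(w)   # weighted regional average
    return np.sqrt(np.sum(w * (lcv - regional_lcv) ** 2) / np.sum(w))
```

A station would then be flagged when its `discordance` value exceeds the critical value adopted for the region. By construction the D_i values average to 1 over the region, which gives a quick sanity check on the input array.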
Selection of a frequency distribution

In the regional frequency analysis, the collection of data existing in all the stations of the region under study has to be fitted to a probability distribution. There is generally more than one possible distribution, and the best one is that which provides the best estimate of the quantiles. The goodness of fit judges to what extent the linear coefficients of skewness and kurtosis of the distribution selected satisfactorily fit the regional average of those coefficients for the data observed. To measure the goodness of fit to a distribution of three parameters (Hosking & Wallis 1997), the following statistic must be calculated:

$$Z^{DIST} = (t_4^{DIST} - t_4^R + B_4)/s_4 \qquad (3)$$

$t_4^{DIST}$ is the linear kurtosis coefficient for the distribution proposed; $t_4^R$ the linear kurtosis coefficient for the region; $s_4$ the standard deviation of $t_4^R$; and $B_4$ a coefficient that depends on the number of simulations used to improve the fit and on $t_4^R$. The fit of a specific distribution is considered to be adequate if the statistic $Z^{DIST}$ is sufficiently close to zero, a reasonable value for this criterion being a degree of significance of 90%, which corresponds to an absolute value of $Z^{DIST}$ below or equal to 1.64.

Fit to a frequency distribution: calculation of quantiles

The objective is to fit the observations in each station to the frequency distributions selected, taking into account a characteristic scale factor for each one of them, i.e. the flood index (Kite 1977). Thus, the quantiles associated with certain return periods can be estimated, both for the region and for each of its stations. This method was applied for the first time to hydrological data of important floods (Dalrymple 1960), hence its name. However, it can be applied to any type of data.

For each of the N stations in the region, the first four linear moments $\lambda_j$ are determined, subsequently dividing each one by the mean of the series $\lambda_1$ to be able to make a comparison with no units. With the values obtained, the regional values $\lambda_j^R$ are calculated:

$$\lambda_j^R = \sum_{s=1}^{N} \lambda_j^{(s)} \, [N_s / L] \qquad (4)$$

The contribution of each station s, with $N_s$ observations, to the average is weighted in terms of the length of its series, by means of L.

Occasionally, the homogeneity test of the regional frequency analysis does not permit the discrimination of homogeneous regions. It is thus necessary to resort to other statistical techniques. One of them is the PCA. This technique is used to determine the factors determining the grouping of similar or different individuals (Taguas et al. 2008).

The nucleus of the PCA consists of transforming the initial quantitative variables into new synthetic noncorrelated ones called principal components, in which the relationships between the variables and between the individuals are efficiently observed. The PCA aims to find the best bidimensional representation of the individuals by the orthogonal projection of the point cloud along the principal directions or axes, which represent the variability of the individuals as well as possible (Crivisqui 1999). The variables considered for the analysis should be important characteristics of the phenomenon analysed so that they permit its modelling (Philippeau 1986).

Let us consider a set of measurements $\{X_i(w_j)\}$ of p variables, $\{X_i\}$, over a set of n subjects $\{w_j\}$. These data allow one to define a matrix of dimension p × n, which corresponds to the initial data matrix. From now on, the steps to be taken are the following.

Calculation of the correlation matrix

It consists of the calculation of the matrix of variances and covariances of the initial data matrix, which, as the data are standardized (typified), constitutes the matrix of correlation coefficients.

Diagonalization of the correlation matrix

The eigenvalues of the matrix of correlations, which determine the degree of goodness of the representation or the percentage of variance explained by each principal axis, are calculated. In addition, the eigenvectors associated with each eigenvalue are calculated; these determine the orthogonal base defining the principal directions of the lengthening of the point cloud of the initial matrix.

Calculation of the projections of the individuals on the principal axes

This is carried out by the linear combination of eigenvectors with the data matrix. The new co-ordinates on the directions obtained from the eigenvectors permit one to observe the relationship of each individual to the principal axes. An individual reaching a high score in a principal axis indicates that it takes on high values in the variables with a strong correlation with that axis. Thus, the similarity between individuals according to the attributes considered can be observed and the groups distinguished (Crivisqui 1999).

Calculation of the principal components or projection of the variables on the principal plane

This consists of the calculation of a new matrix of coefficients of correlation between the matrix of the new
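The PCA steps described in this section (correlation matrix, diagonalization, projection of the individuals, correlation of the variables with the axes) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the procedure actually coded by the authors: the function name `pca_from_correlation` and the keys of the returned dictionary are hypothetical, and the data matrix is assumed to be stored with one row per individual (station) and one column per variable, i.e. the transpose of the p × n layout mentioned in the text. The last two entries are the squared-cosine quality indexes used by the quality test on the principal plane.

```python
import numpy as np


def pca_from_correlation(x):
    """PCA on standardized data; x has shape (n_individuals, p_variables)."""
    x = np.asarray(x, dtype=float)

    # Standardize (typify) each variable: zero mean, unit variance
    z = (x - x.mean(axis=0)) / x.std(axis=0)

    # Correlation matrix = covariance matrix of the standardized data
    r = (z.T @ z) / z.shape[0]

    # Diagonalization; eigh is appropriate because r is symmetric
    eigvals, eigvecs = np.linalg.eigh(r)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    eigvecs = eigvecs[:, order]

    explained = eigvals / eigvals.sum()          # share of variance per axis

    # Projections of the individuals on the principal axes
    scores = z @ eigvecs

    # Correlations between the original variables and the axes
    loadings = eigvecs * np.sqrt(eigvals)

    # Squared cosines on the principal plane (axes 1 and 2).
    # An individual located exactly at the centroid would divide by zero.
    variable_quality = (loadings[:, :2] ** 2).sum(axis=1)
    individual_quality = (scores[:, :2] ** 2).sum(axis=1) / (scores ** 2).sum(axis=1)

    return {
        "eigenvalues": eigvals,
        "explained": explained,
        "scores": scores,
        "loadings": loadings,
        "variable_quality": variable_quality,
        "individual_quality": individual_quality,
    }
```

With four variables per station, `explained[:2].sum()` gives the share of variance captured by the principal plane, and the two quality arrays can be compared against the thresholds chosen for the variables and the individuals.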
Table 1 Identifiers, time period of data and number of complete years of data from each station
Station  ID  Time period (number of complete years)  Station  ID  Time period (number of complete years)  Station  ID  Time period (number of complete years)
Agujero 1 1952–1999 (48) Canillas Aceituno 25 1942–1999 (40) Marbella IL 49 1946–1999 (34)
Alcaucı́n Cjo. 2 1947–1985 (35) Cartajima 26 1943–1999 (48) Mijas Faro 50 1942–1988 (37)
Monjas
Alcaucı́n Forestal 3 1946–1999 (48) Cartama Estación 27 1948–1990 (43) Moclinejo 51 1940–1999 (48)
Alfarnate 4 1941–1999 (47) Casabermeja VP 28 1947–1999 (45) Montejaque CE 52 1942–1998 (47)
Alhaurı́n Grande 5 1962–1999 (38) Casapalma 29 1959–1999 (41) Nerja 53 1947–1999 (44)
Aljaima 6 1946–1999 (48) Casarabonela 30 1955–1999 (45) Ojén 54 1948–1999 (48)
Forestal
Almargen 7 1945–1999 (48) Casares 31 1945–1999 (48) Parauta 55 1947–1991 (36)
Taramal
Almogia Los 8 1949–1999 (48) Chorro Estación 32 1948–1999 (48) Parchite 56 1946–1999 (48)
Llanes
Alora 9 1946–1999 (48) Coı́n 33 1943–1999 (47) Peña 57 1943–1999 (48)
Enamorados
Alozaina 10 1944–1999 (45) Colmenar 34 1943–1999 (46) Periana 58 1939–1998 (45)
Alpandeire 11 1940–1999 (48) Cómpeta 35 1942–1999 (48) Pizarra 59 1946–1998 (48)
Antequera 12 1943–1999 (48) Conde 36 1943–1999 (48) Rincón Victoria 60 1965–1998 (34)
Aguila Guadalhorce
Archidona 13 1936–1999 (30) Contaderas 37 1956–1999 (44) Riogordo 61 1945–1999 (48)
Forestal
Arriate 14 1952–1999 (48) Corchado Central 38 1942–1998 (48) Ronda CE 62 1940–1992 (41)
Benahavis 15 1953–1990 (39) Cuevas Becerro 39 1945–1999 (34) Sierra Yeguas 63 1981–1999 (18)
Benalmádena 16 1964–1999 (30) El Burgo 40 1943–1995 (48) Sierra Caballo 64 1965–1999 (34)
Benamargosa 17 1942–1999 (45) Fte. Piedra Herriza 41 1964–1999 (36) SPAlcántara 65 1946–1999 (47)
Benamocarra 18 1942–1999 (48) Gobantes Vivero 42 1945–1999 (35) Tolox Las Millanas 66 1965–1999 (35)
Benaoján CP 19 1959–1999 (33) Humilladero 43 1946–1998 (47) Torrox 67 1944–1999 (48)
Bobadilla 20 1944–1999 (48) Hundidero Pto. 44 1954–1999 (46) Vegueta Grama 68 1943–1996 (46)
Sapo
Borregos 21 1949–1999 (48) Istán 45 1942–1999 (48) Vélez-Málaga 69 1943–1999 (45)
Buitreras CE 22 1941–1999 (48) Jimena de Lı́bar CE 46 1946–1998 (47) Villanueva de 70 1950–1999 (45)
Tapias
Buitreras Presa 23 1943–1988 (47) Las Mellizas 47 1946–1999 (46) Viñuela 71 1942–1999 (48)
Campillos 24 1945–1999 (48) Málaga Azucarera 48 1982–1999 (17) Yedra (la) 72 1948–1999 (41)
co-ordinates of the individuals on the principal axes and the initial matrix of data typified. The principal components indicate the degree of correlation of the variables with the axes. If a variable takes on a high component in an axis, good correlations will exist with the rest of the variables whose principal components are high for that axis.

Quality test for the representation of the variables and of the individuals, in order to verify the quality of the representation on the principal plane

This is performed by means of the sum of the first two principal components of the variable, showing the squared cosines to be the directors representing each variable on the principal plane. If values of below 0.60 are reached, the variable is not well represented. The sum of the values for the first two components is considered to be another quality index, and values below 0.30 indicate that the projection of the individual is not adequate and the individuals could not be compared because the point had moved away from the plane (Philippeau 1986).

Rainfall data analysed

To carry out this work, data existing in 72 stations in the province of Malaga and supplied by the Hydrographic Confederation of the Southern Basin were processed. The temporal extension of these time precipitation series is summarized in Table 1. Because in some cases data gaps were present, not all the stations were used in this study. Only time series with at least 17 consecutive years were selected. For each one of the stations, the series of maximum annual daily rainfall was extracted. Maximum precipitation time series was analysed by a
regional frequency analysis. Accuracy of RA results depends on quality and integrity of precipitation data used. For this reason, the quality control procedure defined as 'Range Test' (Meek & Hatfield 1994; Shafer et al. 2000) was applied to the rainfall dataset. This validation method is a prerequisite for this analysis, rejecting any observation that occurs outside the allowable range.

Results and discussion

After obtaining the series of yearly maximum daily rainfall for each of the 72 weather stations in the province of Malaga, the data available were analysed and the discordance was calculated. The critical value for this statistic when the number of stations is over 15 is 3.00 (Hosking & Wallis 1997). Only three stations exceeded this value and, therefore, were removed from the analysis. They were stations 10, 39 and 63, with discordance values of 3.23, 10.53 and 11.11, respectively.

To verify if the remaining 69 stations formed a homogeneous region with regard to their maximum daily precipitations, the statistic H was calculated giving a value of 3.85, therefore indicating that they constituted a heterogeneous region.

The PCA is therefore resorted to in order to be able to discriminate different subregions within the province that have common characteristics and that can perform homogeneously from the point of view of their maximum daily rainfall. Hosking & Wallis (1997) advise forming homogeneous regions by considering the characteristics of the site, such as localization, height and rainfall, etc. Following these recommendations, each of the stations
studied was characterized by four representative parameters:
Longitude: the UTM co-ordinate X can be associated with the provenance of the rain fronts.
Latitude: the UTM co-ordinate Y represents the proximity of the sea and its regulatory effect on rainfall.
Altitude: the co-ordinate Z supplies the height over sea level and gives an idea of the orography.
Maximum daily rainfall for a return period of 100 years (P100): calculated after fitting the series of maximum daily data of each station to the Gumbel distribution function, this permits taking into account the precipitation at the moment of grouping the stations and it is a representative parameter of the long-term rainfall behaviour.

The values of these parameters (XUTM, YUTM, Z and P100) for each station are as follows (Table 2). The following Tables 3–7 show the results obtained in the PCA.

Variables used: station (ID), longitude (XUTM), latitude (YUTM), altitude (Z) and maximum daily rainfall for a return period of 100 years (P100).

Table 3 Eigenvalues and percentage of variance of each axis
Axis  Eigenvalue  Variance (%)  Cumulated variance (%)
1     1.6886      42.21         42.21
2     1.1728      29.32         71.53
3     0.7043      17.61         89.14
4     0.4344      10.86         100.00

Table 4 Principal components
Variable  Axis 1  Axis 2  Axis 3  Axis 4
XUTM (m)  0.662   0.473   0.508   0.278
YUTM (m)  0.828   0.286   0.131   0.473
Z (m)     0.310   0.886   0.019   0.341
P100      0.685   0.286   0.655   0.132
The variables with the best correlation with first and second axes are indicated in bold.

The results associated with the eigenvalues (Table 3) indicate that the principal plane is formed by axes 1 and 2, which together explain 71.5% of the total variance.

As can be deduced from the values of the principal components shown in Table 4, the variables most correlated with the first axis are the co-ordinate YUTM and the daily maximum rainfall for a return period of 100 years. As can be observed in Table 4, the distance to the sea (determined by the co-ordinate Y) showed a negative correlation with the maximum daily rainfall for a return period of 100 years, which indicates drier rainfall regimes in inland areas. The co-ordinate XUTM also has a good correlation with these variables, which illustrates greater aridity towards the east. In axis 2, the variable with the best correlation is the altitude, with a notable difference with respect to the rest of the variables, which appear as being poorly correlated with this axis. The impact of the altitude is difficult to explain because the mountain ranges show different aspects that may notably influence the behaviour of rainfall fronts. Authors such as Catalina & Fernández (2002) also found that the precipitation gradient in Malaga declines from west to east, as this is the direction of most of the fronts originating the cyclonal type rainfall, which enters conditioned by the distribution in parallel of the massif, discharging the rain as these mountains appear in its trajectory. Therefore, different groups of stations can be formed as a function of the co-ordinates. This allowed the identification of the rainfall regime and the altitude effect obtained from each one of them on this principal plane (Table 5). Taguas et al. (2008) applied a similar approach to identify close flow patterns in different catchments in an area that included the province of
Malaga where they also evaluated notable differences of rainfall values.

Figure 1 is highly useful when delimiting these stations. Most of them are grouped in the first and the third quadrants, whereas in the two remaining ones there are fewer stations and they are more dispersed. According to the parameters most correlated with the principal axes, four possible regions can be differentiated, one in each quadrant of the principal plane represented. However, the quality tests (Table 6) indicate that eight stations (ID in bold in the table) are not well represented because they only reach quality indexes of below 0.30. Therefore, the four regions are integrated as a whole by 64 stations. The stations belonging to each of the differentiated regions appear in Table 7.

Table 7 Stations of each region
Region  Quadrant  Stations (ID)
PCA1    1         4, 7, 12, 13, 20, 24, 28, 29, 34, 37, 39, 41, 42, 43, 44, 56, 57, 58, 64, 70, 72
PCA2    2         5, 10, 11, 19, 23, 26, 30, 31, 40, 45, 46, 55
PCA3    3         1, 6, 8, 9, 15, 16, 22, 27, 33, 38, 48, 49, 50, 54, 59, 60, 65, 66
PCA4    4         2, 17, 18, 21, 25, 35, 51, 53, 61, 67, 68, 69, 71
PCA, principal component analysis.

On comparing the results obtained by the PCA for the formation of regions with the real characteristics of the stations of each of them, a high concordance was observed (Fig. 2):
PCA1: most of the stations are situated in the northern area of the province of Malaga, at elevated heights and with high maximum rainfall.
PCA2: all the stations were also at elevated heights, in the south-western part of the province and located in rain shadow, which means that their rainfall records were not excessive.
PCA3: composed of stations located in the south of the province, close to the sea and with little rainfall.
PCA4: this region practically coincides with the area known as the Axarquia, with scant precipitations, because this part of the province is preferentially affected by the fronts coming from Levante, which usually bring less rain.

The following step is to determine which of the four groups established form regions that can be considered to be homogeneous with regard to their precipitations. Table 8 shows the values of the H statistic obtained for each of the regions considered.

The regions PCA1, PCA3 and PCA4 are homogeneous. In the first of them, station 39 had to be removed from the analysis as it was discordant. Region PCA2 gave a value of H of above 2, so that it was heterogeneous, and, therefore, it was not possible to carry out a valid RA with it.

The regional frequency analysis was, thus, continued for each of the three homogeneous regions obtained. In each of them, the maximum precipitation data had to be fitted to a probability distribution function. The distribution functions most often used to work with hydrological data are the following:
generalized logistics (LOGGEN);
general extreme value distribution (GEV);
generalized normal distribution (NORGEN);
Pearson type III (PT3); and
generalized Pareto (PARGEN).

Table 9 shows the values of the statistic ZDIST for each distribution function and for each study region. For
Table 8 Results of the homogeneity test
Table 11 Heights of precipitation (mm) for different stations
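The goodness-of-fit statistic ZDIST and its acceptance criterion (absolute value not above 1.64) can be expressed as a short helper when screening the candidate distributions of a region. The sketch below is only illustrative: the function names are hypothetical, and the regional L-kurtosis, its bias correction B4 and its standard deviation s4 are assumed to come from a simulation step that is not shown here.

```python
def z_dist(t4_dist, t4_regional, bias_b4, sigma4):
    """Goodness-of-fit statistic: (t4_DIST - t4_R + B4) / s4."""
    return (t4_dist - t4_regional + bias_b4) / sigma4


def acceptable_distributions(t4_by_dist, t4_regional, bias_b4, sigma4, z_crit=1.64):
    """Return {name: Z} for the candidate distributions with |Z| <= z_crit.

    t4_by_dist maps a distribution label (e.g. "GEV") to the L-kurtosis of
    that distribution fitted to the regional L-moments (assumed input).
    """
    z_values = {
        name: z_dist(t4, t4_regional, bias_b4, sigma4)
        for name, t4 in t4_by_dist.items()
    }
    return {name: z for name, z in z_values.items() if abs(z) <= z_crit}
```

When several distributions pass the criterion, the one with the smallest absolute Z is a natural first candidate, although the final choice rests with the analyst.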