
Hydrologic Regionalization With

Clustering

By

Nirdesh Kumar-06004008
Introduction

Regionalization

 Areas with homogeneous hydrologic response.


 Applications-hydrologic design, planning, management of water resources
systems, regional trend analysis and frequency analysis of floods, low flows and
other variables.
 Attributes-factors influencing hydrology in the area.
• Physiographic-drainage area, slope of the main channel in the drainage
basin, soil runoff coefficient and storage.
• Location-latitude, longitude and elevation.
• Meteorological-specific humidity, temperature, wind velocity, wind
direction and rainfall.

 On basis of attributes, sites are selected-Feature vectors.


 Cluster-Regions containing feature vectors with similar hydrologic response.
 Optimum number of clusters obtained by application of cluster validity indices.
 Tests applied to check the homogeneity of the region-Regional homogeneity test.
 Regions adjusted to improve homogeneity.

Selection of variables influencing the hydrology in a region as attributes

Preparation of feature vectors using selected variables

Formation of clusters by applying clustering algorithm

Identification of optimum number of clusters

Validation of regions to test their homogeneity

Adjustment of heterogeneous regions


Clustering Techniques

 Clustering-Variety of multivariate statistical procedures that are used to
investigate, interpret and classify given data into similar groups or clusters, which
may or may not be overlapping.
 The data points within a cluster should be as similar as possible and the data
points of different clusters should be as dissimilar as possible.
 Various algorithms are used for clustering-K-means algorithm, single linkage,
complete linkage and Ward’s algorithm.

 Clustering Algorithms

 K-Means Algorithm

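• N feature vectors in n-dimensional attribute space; $y_{ij}$ is the value of
attribute j in the ith feature vector.

• Each feature vector represents one of the N sites in the study region.
• Rescaling-process necessary to nullify the effects of the differences in the
variance and relative magnitudes of the attributes:

$$x_{ij} = \frac{y_{ij} - \bar{y}_j}{\sigma_j}$$

$x_{ij}$ denotes the rescaled value of $y_{ij}$.
$\sigma_j$ represents the standard deviation of attribute j.
$\bar{y}_j$ is the mean value of attribute j over all N feature vectors.

• Objective function F of the K-means algorithm:

$$F = \sum_{k=1}^{K} \sum_{j=1}^{n} \sum_{i=1}^{N_k} \left( x_{ij}^{k} - x_{\cdot j}^{k} \right)^2$$

K - number of clusters.
$N_k$ - number of feature vectors in cluster k.
$x_{ij}^{k}$ - rescaled value of attribute j in the feature vector i assigned to cluster k.
$x_{\cdot j}^{k}$ - mean value of attribute j for cluster k, computed as

$$x_{\cdot j}^{k} = \frac{1}{N_k} \sum_{i=1}^{N_k} x_{ij}^{k}$$

• By minimizing F, the distance of each feature vector from the centre of the cluster to
which it belongs is minimized.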
• Steps involved in K-means algorithm to delineate clusters for a given value of K
are:

1- Set ‘‘current iteration number’’ t to 0 and maximum number of iterations to t_max.


2- Initialize K cluster centers to random values in the multidimensional feature vector
space.
3- Initialize the ‘‘current feature vector number’’ i to 1.
4- Determine Euclidean distance of ith feature vector from centers of each of the K
clusters, and assign it to the cluster whose center is nearest to it.
5- If i < N, increment i to i + 1 and go to step 4; otherwise continue with step 6.
6- Update the centroid of each cluster by computing average of the feature vectors
assigned to it. Then compute F for the current iteration t. If t = 0, increase t to t + 1
and go to step 3. If t > 0, compute the difference in the values of F for iterations t and
t - 1. Terminate the algorithm if change in the value of F between two successive
iterations is insignificant; otherwise, continue with step 7.
7- If t < t_max, update t to t + 1 and go to step 3; otherwise, terminate the algorithm.
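
The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: the function names `rescale` and `kmeans`, the tolerance `tol`, and the choice to draw the initial centers from the feature vectors themselves (rather than from arbitrary random values) are all assumptions made here. The assignment loop of steps 3 to 5 is done in one vectorized operation.

```python
import numpy as np

def rescale(y):
    """Standardize each attribute (column): subtract its mean, divide by its std."""
    return (y - y.mean(axis=0)) / y.std(axis=0)

def kmeans(x, k, t_max=100, tol=1e-9, seed=0):
    """K-means on rescaled feature vectors x of shape (N, n).

    Returns cluster labels, cluster centers and the objective function F.
    """
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are K distinct feature vectors picked at random.
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    f_prev = np.inf
    for t in range(t_max + 1):
        # Steps 3-5: assign every feature vector to its nearest center.
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Step 6: update each centroid (an empty cluster keeps its old center).
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
        f = float(((x - centers[labels]) ** 2).sum())
        # Stop when F no longer changes significantly between iterations.
        if abs(f_prev - f) < tol:
            break
        f_prev = f
    return labels, centers, f
```

Because the result depends on the random initial centers, it is common to repeat the run with several seeds and keep the partition with the smallest F.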
Single linkage and complete linkage algorithms

• Single linkage-Distance between the cluster [yi ,yj ], formed by merging clusters yi
and yj ,and yk ,is the smaller of the distances between yi and yk or yj and yk .
• Complete linkage-distance between the new cluster [yi ,yj ] and any other singleton
cluster yk is the greater of the distances between yi and yk or yj and yk .

[Figure: single linkage and complete linkage]
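
The two update rules differ only in whether the smaller or the greater of the two distances is kept after a merge. The sketch below shows this on a small agglomerative loop; the function name `linkage_merges` and the return format are illustrative assumptions, and Euclidean distance between sites is assumed.

```python
import numpy as np

def linkage_merges(points, method="single"):
    """Agglomerate points; return a list of (members_a, members_b, distance)."""
    combine = min if method == "single" else max  # single: smaller, complete: greater
    clusters = {i: (i,) for i in range(len(points))}
    dist = {(i, j): float(np.linalg.norm(points[i] - points[j]))
            for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Merge the two closest clusters.
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        merges.append((clusters[a], clusters[b], d))
        # Distance from the merged cluster [a, b] to every other cluster k.
        new_dist = {}
        for k in clusters:
            if k in (a, b):
                continue
            d_ak = dist[(min(a, k), max(a, k))]
            d_bk = dist[(min(b, k), max(b, k))]
            new_dist[(k, next_id)] = combine(d_ak, d_bk)
        dist = {key: v for key, v in dist.items() if a not in key and b not in key}
        dist.update(new_dist)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges
```

For four sites located at 0, 1, 3 and 7 on a line, single linkage merges at distances 1, 2 and 4, while complete linkage merges at distances 1, 3 and 7.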


 Ward’s algorithm

• Ward’s algorithm minimizes an objective function, W, defined as the sum of squares of
deviations of the feature vectors from the centroid of their respective clusters.

• At each step in the analysis, union of every possible pair of clusters is considered
and two clusters whose fusion results in the smallest increase in W are merged.
• The change depends only on the relationship between the two merged clusters
and not on the relationships with other clusters.
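
The increase in W caused by a merge can be computed from the two clusters alone, which is why the change does not depend on the other clusters. For clusters a and b with sizes n_a and n_b and centroids c_a and c_b, the increase is n_a n_b / (n_a + n_b) times the squared distance between c_a and c_b. The sketch below uses this identity; the function names are illustrative assumptions.

```python
import numpy as np

def within_ss(x):
    """Sum of squared deviations of the feature vectors in x from their centroid."""
    return float(((x - x.mean(axis=0)) ** 2).sum())

def ward_increase(a, b):
    """Increase in W when clusters a and b (arrays of feature vectors) are merged."""
    na, nb = len(a), len(b)
    gap = a.mean(axis=0) - b.mean(axis=0)
    return float(na * nb / (na + nb) * (gap @ gap))

def best_ward_merge(clusters):
    """Indices (i, j) of the pair whose fusion gives the smallest increase in W."""
    pairs = [(i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))]
    return min(pairs, key=lambda p: ward_increase(clusters[p[0]], clusters[p[1]]))
```

The identity can be checked directly: `ward_increase(a, b)` equals `within_ss` of the merged cluster minus the sum of `within_ss(a)` and `within_ss(b)`.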
 Cluster Validity Indices

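 Identification of optimum number of compact and well separated clusters.
Dunn’s index

$$D = \frac{\min_{i \neq j} \delta(C_i, C_j)}{\max_{k} \Delta(C_k)}$$

δ(Ci, Cj) - Distance between clusters Ci and Cj.

Δ(Ck) - Intracluster distance of cluster Ck.

A larger value of D indicates more compact and better separated clusters.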
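
A minimal sketch of Dunn’s index follows. The slide does not say which distance definitions are used, so two assumptions are made here: the distance between two clusters is the smallest distance between a member of one and a member of the other, and the intracluster distance is the cluster diameter (largest distance between two members). The function name `dunn_index` is illustrative.

```python
import numpy as np

def dunn_index(x, labels):
    """Dunn's index: smallest between-cluster distance / largest cluster diameter."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    # Pairwise Euclidean distances between all feature vectors.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    # Assumed intracluster distance: diameter of each cluster.
    diameter = max(dist[np.ix_(labels == k, labels == k)].max() for k in ids)
    # Assumed intercluster distance: closest pair of members across two clusters.
    separation = min(dist[np.ix_(labels == i, labels == j)].min()
                     for i in ids for j in ids if i < j)
    return float(separation / diameter)
```

The index is undefined when every cluster holds a single feature vector, because the largest diameter is then zero.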


 Regional Homogeneity Test

 Heterogeneity of the set of plausible regions obtained from the cluster analysis
is assessed.
 Uses the advantages offered by sampling properties of L-moment ratios.
 Examines whether the between-site dispersion of the sample LMRs for the
group of sites under consideration is larger than the dispersion expected in a
homogeneous region.

tR - Regional average coefficient of L-variation (L-CV).
t3R - Regional average L-skewness.
t4R - Regional average L-kurtosis.

Each regional average is a weighted mean of the sample L-moment ratios, with a weight
applied to the ratios at each site i.
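
The sample L-moment ratios at a site and their regional averages can be sketched as below, using the standard estimators based on probability weighted moments. The function names are illustrative, and weighting each site by its record length is an assumption (a common choice), since the slide does not state the weights.

```python
import numpy as np

def sample_lmoment_ratios(data):
    """Return (L-CV, L-skewness, L-kurtosis) for one site; needs at least 4 values."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    # Unbiased sample probability weighted moments b0..b3.
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((i - 1) * (i - 2) * (i - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    # Sample L-moments.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l2 / l1, l3 / l2, l4 / l2

def regional_average(ratios, record_lengths):
    """Weighted mean of at-site ratios; weights assumed equal to record lengths."""
    w = np.asarray(record_lengths, dtype=float)
    r = np.asarray(ratios, dtype=float)
    return (w[:, None] * r).sum(axis=0) / w.sum()
```

Each row of `ratios` holds the three ratios of one site, so `regional_average` returns the regional L-CV, L-skewness and L-kurtosis in that order.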


• Heterogeneity measures (HM) can be based on three measures of dispersion:

(1) weighted standard deviation of the at-site sample L-CVs (V);

(2) weighted average distance from the site to the group weighted mean in the
two-dimensional space of L-CV and L-skewness (V2);

(3) weighted average distance from the site to the group weighted mean in the
two-dimensional space of L-skewness and L-kurtosis (V3).
• For each simulated realization (homogeneous region), V, V2 and V3 are computed.
• μv, μv2, μv3 are the means and σv, σv2, σv3 are the standard deviations of V, V2 and V3
over the simulated realizations.
• Each heterogeneity measure compares the observed dispersion with the simulated
dispersion, for example HM = (V - μv)/σv, and likewise for V2 and V3.

• HM<1-Acceptably homogeneous.
• 1≤HM<2-Possibly homogeneous.
• HM≥2-Definitely heterogeneous.
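
A minimal sketch of the first heterogeneity measure and the classification rule follows. The simulated values of V are assumed to be supplied by the caller (generating the simulated homogeneous regions is not shown), and the function names and the use of the population standard deviation are assumptions made for illustration.

```python
import numpy as np

def dispersion_v(l_cv, weights):
    """Weighted standard deviation of the at-site sample L-CVs (V)."""
    t = np.asarray(l_cv, dtype=float)
    w = np.asarray(weights, dtype=float)
    t_regional = (w * t).sum() / w.sum()
    return float(np.sqrt((w * (t - t_regional) ** 2).sum() / w.sum()))

def heterogeneity_measure(v_observed, v_simulated):
    """Standardize the observed dispersion by the simulated mean and std."""
    v_sim = np.asarray(v_simulated, dtype=float)
    return float((v_observed - v_sim.mean()) / v_sim.std())

def classify(hm):
    """Map a heterogeneity measure to the region type."""
    if hm < 1:
        return "acceptably homogeneous"
    if hm < 2:
        return "possibly homogeneous"
    return "definitely heterogeneous"
```

For example, `classify(0.75)` gives "acceptably homogeneous" and `classify(23.28)` gives "definitely heterogeneous".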

 Adjustment of the Regions

The regions are adjusted to improve their homogeneity through the following:

(1) Eliminating (or deleting) one or more sites from the data set;
(2) Transferring one or more discordant sites from a region to other regions;
(3) Dividing a region to form two or more new regions;
(4) Allowing a site to be shared by two or more regions;
(5) Dissolving regions by transferring their sites to other regions;
(6) Merging a region with another or others;
(7) Merging two or more regions and redefining groups;
(8) Obtaining more data and redefining regions.

 The first three options are useful in reducing the values of heterogeneity measures of a region.
 Options 4–7 help in ensuring that each region is sufficiently large in terms of collective
data length at all the sites in it.
Hydrologic Regionalization of India
 Description of the Study Region

 The study region India (Figure 2) lies between 8°4′ and 37°6′ north latitude and 68°7′
and 97°25′ east longitude, and has an area of 3,287,263 km².
 Climate-four seasons: winter (January and February), summer (March to May), summer monsoon
(June to September), and post monsoon (October to December).
 Data Used
 Daily gridded rainfall data for the period 1951–2004 procured from the IMD (India
Meteorological Department).
 Based on records at 2140 stations.
 Gridded reanalysis data of the monthly mean atmospheric variables are taken from the
database of the National Centers for Environmental Prediction (NCEP) for 1951–2004.
 Elevation of terrain in each of the NCEP grid boxes is computed from Shuttle Radar
Topography Mission (SRTM) data.
 Maps of the five summer monsoon rainfall (SMR) regions currently in use by the IMD are used.
 Results and Discussion

 The statistical homogeneity of each of the five IMD SMR regions is tested using SMR
data at grid points in the region as shown in the table below.
Serial   Region Name         Number of     Heterogeneity Measures     Region Type
Number                       Grid Points
1        Peninsular          49            23.28    5.93    0.26      Definitely heterogeneous
2        West Central        86            10.89    0.64   -1.33      Definitely heterogeneous
3        Northwest           69            20.96    5.87   -1.08      Definitely heterogeneous
4        Central Northeast   59             4.32   -0.73   -1.90      Definitely heterogeneous
5        Northeast           36             4.44   -0.91    1.06      Definitely heterogeneous

Table 1- Characteristics of the IMD SMR Regions Determined Using Heterogeneity Measures

 The IMD regions are adjusted to improve their homogeneity and tabulated in Table 2.
 Figure 2 shows the number of sites removed to make the regions acceptably
homogeneous.
Figure 2- SMR regions that are considered as homogeneous by IMD
Figure 3- SMR regions after adjusting
Serial   Region Name         Number of     Heterogeneity Measures     Number of Grid
Number                       Grid Points                              Points Eliminated
1        Peninsular          27            0.75   -0.34    1.35       22
2        West Central        62            0.80   -1.17   -2.03       24
3        Northwest           40            0.84   -0.86   -1.90       29
4        Central Northeast   45            0.74   -0.86   -1.47       14
5        Northeast           32            0.45   -1.30   -1.06        4

Table 2-Characteristics of SMR regions after adjusting

 To delineate new homogeneous SMR regions in the study region, 52 out of 60 NCEP
grid boxes covering India are considered.
 The other 8 boxes are discarded because rain gauge density is low in the Himalayan region.
 Mean monthly values of each of the 15 atmospheric variables are considered at each
NCEP grid point for the summer monsoon months.
 960 values (15 variables × 16 grid points × 4 months) are obtained for each grid box.
 To reduce redundancy, principal components extracted from these values, together with
standardized location attributes (latitude, longitude, and average elevation of terrain
in each of the NCEP grid boxes), are used as attributes to form 52 feature vectors for
K-means cluster analysis.
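
The construction of the feature vectors can be sketched as follows. This is an illustration under stated assumptions: `atmos` is a hypothetical 52 × 960 array (one row per grid box), `location` is a hypothetical 52 × 3 array of latitude, longitude and elevation, and the rule of keeping enough components to explain 95% of the variance is an assumption, since the slides do not say how many principal components were retained.

```python
import numpy as np

def build_feature_vectors(atmos, location, var_explained=0.95):
    """Principal component scores of atmos plus standardized location attributes."""
    # Standardize each atmospheric value (column) across the grid boxes.
    z = (atmos - atmos.mean(axis=0)) / atmos.std(axis=0)
    # PCA through the singular value decomposition of the standardized matrix.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Assumed rule: keep the fewest components reaching the variance threshold.
    m = min(int(np.searchsorted(frac, var_explained)) + 1, len(s))
    scores = u[:, :m] * s[:m]
    # Standardize latitude, longitude and elevation.
    loc = (location - location.mean(axis=0)) / location.std(axis=0)
    return np.hstack([scores, loc])
```

The retained scores are mutually uncorrelated, which is how the principal components remove the redundancy among the 960 correlated values.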
Figure 3- Grid boxes covering India.
Figure 4- Identification of optimal partition provided by K-means clustering algorithm

 Atmospheric variables influencing rainfall in the hashed box are considered at the 16
NCEP grid points shown as black dots surrounding the box.
 To determine the number of regions, the K-means algorithm is applied and cluster
validity indices are computed to identify the optimum number of clusters.
 Partition with the minimum value for Davies-Bouldin index and
the maximum value for Dunn’s and Calinski-Harabasz indices is
considered as the optimal partition.
 Several of the clusters obtained using K-means algorithm for
the choice of K greater than 15 are found to be quite small in
size, therefore clusters obtained for K = 15 are selected as
optimal partition.
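
The Davies-Bouldin and Calinski-Harabasz indices named above can be sketched with their usual textbook definitions, which are assumed here to match those used in the study. The function names are illustrative; `x` holds the feature vectors and `labels` the cluster assigned to each.

```python
import numpy as np

def calinski_harabasz(x, labels):
    """Between-cluster dispersion over within-cluster dispersion (larger is better)."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    n, k = len(x), len(ids)
    grand = x.mean(axis=0)
    between = within = 0.0
    for c in ids:
        members = x[labels == c]
        center = members.mean(axis=0)
        between += len(members) * ((center - grand) ** 2).sum()
        within += ((members - center) ** 2).sum()
    return float((between / (k - 1)) / (within / (n - k)))

def davies_bouldin(x, labels):
    """Average worst-case ratio of scatter to separation (smaller is better)."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centers = np.array([x[labels == c].mean(axis=0) for c in ids])
    # Scatter: mean distance of the members of a cluster to its centroid.
    scatter = np.array([np.linalg.norm(x[labels == c] - centers[i], axis=1).mean()
                        for i, c in enumerate(ids)])
    worst = []
    for i in range(len(ids)):
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(centers[i] - centers[j])
                  for j in range(len(ids)) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))
```

Evaluating both functions on the partitions obtained for a range of K, and comparing them as described above, gives the candidate optimal partition.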

Figure 5- Clusters in optimal partition obtained using K-means algorithm.
Cluster   Cluster Size (in Number    Heterogeneity Measures
Number    of IMD Grid Points)
1         15                         -1.56   -0.46    0.03
2         29                         10.91    2.79   -0.83
3         22                         17.27    5.46    1.22
4         25                          9.65    1.03   -0.33
5         38                          5.20   -1.71   -3.11
6         53                          5.43   -0.40   -1.78
7          9                          4.27    0.36   -1.07
8          6                         -0.43   -1.85   -1.55
9         13                          6.08   -1.22   -1.86
10        53                         12.37    0.30   -2.17
11         9                          8.51    6.18    4.55
12         4                          2.45    0.86   -0.14
13        46                         12.67    2.74   -0.16
14        20                         -0.20   -1.24   -1.07
15        11                          2.46   -0.05   -1.06

Table 3- Characteristics of the Clusters in Optimal Partition Obtained Using K-Means Algorithm.

 Table 3 shows that clusters 8 and 14 are acceptably homogeneous, cluster 1 is possibly
homogeneous, whereas the remaining clusters are heterogeneous.
 Overall, 23 out of the 301 IMD grid points considered for regionalization are
unallocated, as they are eliminated from different regions to improve statistical
homogeneity.
 Six sites are transferred to other regions, and 33 sites are separated from clusters to
form new regions.
Table 4- Details of Region Formation From Optimal Partition Obtained Using K-Means
Algorithm.
 The regions are adjusted and all 17 regions are classified as either acceptably
homogeneous or possibly homogeneous.
Table 5-Characteristics of the Regions Formed by Adjusting Clusters Obtained Using
K-Means Algorithm
Figure 6-Homogeneous rainfall regions obtained by adjusting the clusters

 The number of sites that had to be eliminated from the IMD SMR regions to improve their
statistical homogeneity is excessive, indicating that the IMD SMR regions are not useful
as precursors to derive homogeneous SMR regions.
 New SMR regions are delineated using the proposed methodology.
Conclusion
 Existing approaches are based on statistics computed from observed hydrology.

• Independent validation of the delineated regions for homogeneity in hydrology is not
possible.
• There is uncertainty in forming homogeneous regions in areas with limited hydrological
data.

 The proposed method can form regions irrespective of the available data (rain
gauges in this study).
 However, as seen in this study, there is uncertainty in validating homogeneous regions in
areas having few rain gauges.
Thank You
