
WATER RESOURCES ENGINEERING CONCEPTS: STATISTICS

• Population: A population is a collection of persons or objects, e.g., pupils in a school, workers
in a factory, people in a country, or motor cars produced in a factory. Each unit of the
population has many different attributes associated with it. These attributes might be height,
volume or weight, which are measurable with reference to respective scales of measurement;
or colour, condition, etc., which may not be numerically expressible.
• Sample Data: Sample data are the available data from the observations of an event.
• Random Events: Events whose occurrence is not influenced by the occurrence of the same or
similar events earlier.
• Probability Density Function: The probability density function (P.D.F.) describes how probability
is distributed over the values of a variate; for a continuous variate, the probability that the
variate falls within an interval is the area under the P.D.F. over that interval.
• Cumulative Distribution Function: The cumulative distribution function (C.D.F.), also called the
cumulative density function, is the probability of occurrence of all the events that are equal to
or less than a given event.
• Probability Paper: A probability paper is a special graph paper on which the ordinate usually
represents the magnitude of the variate and the abscissa represents the probability P, or the
return period T. The ordinate and abscissa scales are designed so that the distribution plots
more nearly as a straight line, permitting better definition of the upper and lower parts of the
frequency curve. For example, extreme value and log-normal probability papers are used to
linearize the extreme value and log-normal distributions, respectively.
• Plotting Position: Determining the probability to assign to a data point is commonly referred to
as determining its plotting position.

CIV 3204 WATER RESOURCES ENGINEERING I 1


• In sample statistics a number of parameters are measured to determine:
1. the central tendency (location) or the value around which all other
values are clustered,
2. the spread of the sample values around the mean value,
3. the asymmetry or skewness of the frequency distribution, and
4. the flatness of the frequency distribution.

Measures of central tendency (location), the value around which all other values are clustered:
i. Mid-range is the average of the maximum and minimum values of a sample or population.
ii. Mode is the value that occurs most frequently in the sample.
iii. Median is the middle value of the ranked values of a sample or population.
iv. Mean is the ratio of the sum of all values to the number of values.

• Given the annual mean flows (cumecs) below, determine the central tendency parameters.

No.  Flow (cumecs)    No.  Flow (cumecs)
 1.   82.87           14.   95.38
 2.   67.95           15.  137.76
 3.   46.46           16.  121.38
 4.  100.79           17.  148.87
 5.  125.28           18.  121.00
 6.   60.01           19.   47.99
 7.   96.73           20.   92.34
 8.   95.75           21.   90.26
 9.   76.62           22.   77.70
10.  114.70           23.   76.11
11.   92.76           24.   95.09
12.  144.35           25.   95.47
13.   59.83           26.   78.62
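The central tendency parameters for the 26 flows can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; it assumes entry 25 of the list reads 95.47, and the variable names are illustrative only.

```python
import statistics

# Annual mean flows (cumecs) from the list above.
# Assumption: entry 25 is taken as 95.47.
flows = [
    82.87, 67.95, 46.46, 100.79, 125.28, 60.01, 96.73, 95.75, 76.62,
    114.70, 92.76, 144.35, 59.83, 95.38, 137.76, 121.38, 148.87, 121.00,
    47.99, 92.34, 90.26, 77.70, 76.11, 95.09, 95.47, 78.62,
]

mid_range = (max(flows) + min(flows)) / 2  # average of the two extremes
median = statistics.median(flows)          # middle of the ranked values
mean = statistics.fmean(flows)             # sum of values / number of values
modes = statistics.multimode(flows)        # most frequent value(s)

print(f"mid-range = {mid_range:.3f} cumecs")
print(f"median    = {median:.3f} cumecs")
print(f"mean      = {mean:.3f} cumecs")
# Every value occurs exactly once here, so multimode returns all of them;
# the mode only becomes informative once the flows are grouped into classes.
print(f"number of modal values = {len(modes)}")
```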
Measures of spread (dispersion) of the sample values around the mean value:

I. Range is the difference between the maximum and minimum values.
II. Interquartile Range is defined as l3 - l1, where l1 is the value separating the
lowest quarter of the ranked data from the second quarter, and l3 separates
the third and fourth quarters of the ranked data. In other words, the
interquartile range (i.e., between the 25% and 75% cumulative frequency
values) contains 50% of the values.
III. Mean Deviation is the average absolute deviation of the values about the arithmetic mean:
$MD = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$
IV. Variance is the mean squared deviation about the mean ($\sigma^2$ for a population); for a
sample it is estimated by
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
V. Standard Deviation is the square root of the variance; the sample value s is used to
estimate the population standard deviation σ.
VI. Coefficient of Variation is a dispersion parameter equal to the ratio of the
standard deviation to the mean.
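The dispersion measures listed above can be sketched as follows. The function name is hypothetical, the demonstration uses the first eight flows from the earlier list, and the quartiles come from `statistics.quantiles` with its default method, which is only one of several quartile conventions.

```python
import statistics


def dispersion_measures(values):
    """Return the six dispersion measures for a list of sample values."""
    n = len(values)
    mean = statistics.fmean(values)
    # Quartile definitions vary; this uses the library default ("exclusive").
    q1, _, q3 = statistics.quantiles(values, n=4)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    std_dev = statistics.stdev(values)      # square root of the sample variance
    return {
        "range": max(values) - min(values),
        "interquartile_range": q3 - q1,
        "mean_deviation": sum(abs(x - mean) for x in values) / n,
        "variance": variance,
        "std_dev": std_dev,
        "coeff_of_variation": std_dev / mean,
    }


# First eight annual mean flows (cumecs) from the list above.
sample = [82.87, 67.95, 46.46, 100.79, 125.28, 60.01, 96.73, 95.75]
result = dispersion_measures(sample)
for name, value in result.items():
    print(f"{name:>20s} = {value:.4f}")
```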

Measures of asymmetry (skewness) of the frequency distribution:

If data values are dispersed in perfect symmetry about their mean value,
the measure of asymmetry (skewness) of the data is zero. If, with the values
arranged in ascending order, the values falling to the right of the mean are
more spread out than those falling to the left, the data spread is
asymmetric (skewed), and the asymmetry is conventionally termed positive.

i. Interquartile Measure of Asymmetry

ii. Third Central Moment (Skewness)

iii. Skewness Coefficient
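The formulas for these measures are not reproduced in the text, so the following is only a sketch of one commonly used form of the skewness coefficient, Cs = n Σ(x - x̄)³ / [(n - 1)(n - 2) s³], where s is the sample standard deviation. Treat the choice of formula and the function name as assumptions.

```python
import statistics


def skewness_coefficient(values):
    """Sample skewness coefficient with the (n - 1)(n - 2) adjustment.

    Assumed form: Cs = n * sum((x - mean)^3) / ((n - 1) * (n - 2) * s^3).
    """
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are required")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    cubed_deviations = sum((x - mean) ** 3 for x in values)
    return n * cubed_deviations / ((n - 1) * (n - 2) * s ** 3)


symmetric = [1, 2, 3, 4, 5]        # symmetric about the mean
right_skewed = [1, 2, 3, 4, 10]    # longer tail to the right of the mean
print(skewness_coefficient(symmetric))
print(skewness_coefficient(right_skewed))
```

A symmetric sample gives a coefficient of zero, and a sample with a longer right-hand tail gives a positive value, which matches the sign convention described above.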

Measure of flatness of the frequency distribution:

i. Kurtosis Coefficient measures the peakedness or the flatness of the
frequency distribution near its centre. An unbiased estimate of this
coefficient can be computed from the sample values.

Positive values of kurtosis (measured relative to the normal distribution)
indicate that a given frequency distribution is more peaked around its
centre than the normal distribution; such a frequency distribution is
known as Leptokurtic. Negative values of kurtosis indicate that a given
frequency distribution is flatter around its centre than the normal
distribution; such a frequency distribution is known as Platykurtic. The
normal distribution is said to be Mesokurtic.
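Since the estimator itself is not shown in the text, the sketch below assumes the widely used sample excess kurtosis, which is zero for a normal distribution, so its sign matches the leptokurtic/platykurtic description above. The function name is hypothetical.

```python
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis (zero for a normal distribution).

    Assumed form:
      n(n+1)/((n-1)(n-2)(n-3)) * sum(((x - mean)/s)^4) - 3(n-1)^2/((n-2)(n-3))
    """
    n = len(values)
    if n < 4:
        raise ValueError("at least four values are required")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    fourth_powers = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_powers - correction


# Evenly spread values are flatter than a normal sample: negative result.
flat_sample = [1, 2, 3, 4, 5]
print(excess_kurtosis(flat_sample))
```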

• The formulae used in these analyses can also be applied to grouped data by
weighting each class value by its frequency.

FREQUENCY ANALYSIS
• Hydrologic systems are influenced by extreme events, e.g., severe
storms, floods, droughts.
• The magnitude of an extreme event is inversely related to its
frequency of occurrence (i.e., more severe events occur less
frequently).
• Frequency analysis is a procedure for estimating the frequency (or
the probability) of occurrence of extreme events.

• Objective of frequency analysis of hydrologic data is to relate the
magnitude of extreme events to their frequency of occurrence
using probability distributions.
• Hydrologic data to be analyzed are assumed to be independent and
identically distributed, and the hydrologic system is assumed to be
stochastic, space-independent and time-independent.
• The data should be properly selected so that the assumptions of
independence and identical distribution are satisfied.
• The assumption of identical distribution (homogeneity) is satisfied by
selecting the observations from the same population (i.e., no changes are made
to the watershed or the recording gauges).
• The assumption of independence is satisfied by selecting the annual maximum
of the variable being analyzed, as the successive observations from year to year
will be independent.
• The results of flood frequency analysis can be used for many
engineering purposes – e.g.
• flood flow frequency analysis can be used in the design of dams, bridges,
culverts, flood controlling devices
• Urban flooding : design storms
• Drought frequency and magnitude for agricultural planning

FREQUENCY ANALYSIS: Return Period
• An extreme event is defined to have occurred if a random variable
X is greater than (or equal to) a level xT.
• The time between the occurrences of X ≥ xT is called the
“recurrence interval” (τ).
• The expected value of τ, E(τ), is the average number of years in
which the event X ≥ xT returns.
• E(τ) is the return period ‘T’ of the event X ≥ xT.
• The concept of return period is used to describe the likelihood of
occurrences.

Example: Consider the annual maximum discharge Q, (in cumec) of a
river for 45 years

The recurrence intervals are as follows:

Exceeded year    Recurrence interval (years)
1952             -
1956             4
1957             1
1958             1
1972             14
1976             4
1981             5
1987             6
1992             5

• During the period, the recurrence interval ranges from 1 to 14 years; there are 8
recurrence intervals covering a total period of 40 years between the first and last
occurrences of the event.
• The return period for a 1500 cumec annual maximum discharge on the river is equal to
the average recurrence interval, τ = 40/8 = 5 years.
• The return period of a given magnitude is defined as the average recurrence interval
between events equaling or exceeding a specified magnitude.
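The recurrence intervals and the resulting return period can be reproduced from the exceedance years alone. A minimal sketch, with illustrative variable names:

```python
# Years in which the annual maximum discharge equalled or exceeded 1500 cumec.
exceedance_years = [1952, 1956, 1957, 1958, 1972, 1976, 1981, 1987, 1992]

# Recurrence interval = time between successive exceedances.
intervals = [
    later - earlier
    for earlier, later in zip(exceedance_years, exceedance_years[1:])
]

# Return period = average recurrence interval.
return_period = sum(intervals) / len(intervals)

print("recurrence intervals:", intervals)
print("total span (years):", sum(intervals))
print("return period (years):", return_period)
```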

The probability p = P(X ≥ xT) of occurrence of the event X ≥ xT in any
observation is related to the return period as follows:
• For an observation, two outcomes are possible: success or failure.
• Success is the event X ≥ xT, with probability p; failure is the event X < xT, with probability (1 - p).
• Since the observations are independent, the probability of a
recurrence interval of duration τ is the product of the probabilities of
τ - 1 failures followed by a success, i.e., $(1 - p)^{\tau - 1} p$.
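The recurrence interval therefore follows a geometric distribution, whose expected value is E(τ) = Σ τ (1 - p)^(τ - 1) p = 1/p, which is why the return period is T = 1/p. The sketch below checks this numerically by summing the series; the function name and the truncation at `max_terms` are illustrative choices.

```python
def expected_recurrence_interval(p, max_terms=10_000):
    """Approximate E(tau) = sum over tau of tau * (1 - p)^(tau - 1) * p."""
    return sum(
        tau * (1.0 - p) ** (tau - 1) * p
        for tau in range(1, max_terms + 1)
    )


for p in (0.2, 0.05, 0.01):
    approx = expected_recurrence_interval(p)
    print(f"p = {p}: series sum = {approx:.4f}, 1/p = {1.0 / p:.4f}")
```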

• What is the probability that the annual maximum discharge in the river
will equal or exceed 1500 cumec in any year, for the data in the example above?
• The return period of the event [X ≥ 1500] is 5 years, so p = 1/T = 1/5 = 0.2.

• A useful question to be answered is: what is the probability that a T-year
return period event will occur at least once in N years?
• Consider the situation that in N years the event X ≥ xT does not occur
at all.
• The probability of this is (1 - p) × (1 - p) × … (N times) = $(1 - p)^N$, where
(1 - p) is the probability that the event X ≥ xT does not occur in a given year.

• The complementary event is that the T-year event occurs at least once in the N-year
period; hence its probability is $1 - (1 - p)^N$.
• Since p = 1/T, the required probability is $1 - \left(1 - \frac{1}{T}\right)^N$.

• Obtain the probability that the annual maximum discharge in the river will equal
or exceed 1500 cumec at least once in the next five years:
$1 - \left(1 - \frac{1}{5}\right)^5 = 0.672$
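The same calculation can be wrapped in a small helper so that other return periods and planning horizons can be tried. A sketch only; the function name is hypothetical.

```python
def prob_at_least_once(T, N):
    """Probability that a T-year event occurs at least once in N years.

    Uses 1 - (1 - 1/T)^N, which assumes independent years.
    """
    if T <= 1:
        raise ValueError("return period T must be greater than 1 year")
    return 1.0 - (1.0 - 1.0 / T) ** N


# The 5-year event (1500 cumec) in the next five years.
print(round(prob_at_least_once(5, 5), 3))
# A 100-year event over a 50-year design life.
print(round(prob_at_least_once(100, 50), 3))
```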

FREQUENCY ANALYSIS: Hydrological Data Series
• Complete duration series: A series containing all the available data.
• Partial duration series (peaks-over-threshold series): A series of data
selected so that their magnitude is greater than a predefined base value.
• Annual exceedance series: A partial duration series in which the base value is
selected so that the number of values is equal to the number of years of record.
• Extreme value series: A series including the largest or smallest values occurring in
each of the equally long time intervals of the record.
• Annual maximum (or minimum) series: A series of the largest (or smallest) annual
values.
• The return period Te of an event developed from an annual exceedance series is
related to the corresponding return period T derived from an annual maximum
series by (Chow, 1964)
$T_e = \left[\ln\left(\frac{T}{T-1}\right)\right]^{-1}$
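As a sketch, assuming the relation intended here is the one usually attributed to Chow (1964), Te = 1 / ln[T / (T - 1)], the conversion between the two return periods can be tabulated as follows. The function name is hypothetical.

```python
import math


def annual_exceedance_return_period(T):
    """Convert an annual-maximum return period T to the annual-exceedance Te.

    Assumed relation: Te = 1 / ln(T / (T - 1)), valid for T > 1.
    """
    if T <= 1:
        raise ValueError("T must be greater than 1 year")
    return 1.0 / math.log(T / (T - 1.0))


for T in (2, 5, 10, 50, 100):
    print(f"T = {T:>3d} years -> Te = {annual_exceedance_return_period(T):.2f} years")
```

The two return periods differ noticeably for small T and converge as T grows, since T - Te approaches 0.5 year.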

• For an annual exceedance series it may be difficult to verify that all the observations are
independent.
• The occurrence of a large flood could be related to antecedent soil conditions,
a previous flood, or an action in the river channel.
• The annual maximum series is therefore usually preferred.
• As the return period becomes large, the results from the two approaches become
similar, as the chance that two such events will occur within any one year is very small.

FREQUENCY ANALYSIS: Extreme Value Distributions:
• Extreme events:
• Peak flood discharge in a stream
• Maximum rainfall intensity
• Minimum flow
• The study of extreme hydrologic events involves the selection of largest (or smallest)
observations from sets of data e.g.;
• the study of peak flows of a stream uses the largest flow value recorded at gauging station
each year.
• Three types of Extreme Value distributions have been developed, based on limited assumptions
concerning the parent distribution:
• Extreme Value Type-I (EV I) (Double Exponential or Gumbel): parent distribution
unbounded in the direction of the desired extreme, and all moments of the
distribution exist.
• Extreme Value Type-II (EV II): parent distribution unbounded in the direction of the
desired extreme, and not all moments of the distribution exist.
• Extreme Value Type-III (EV III): parent distribution bounded in the direction of the
desired extreme.

FREQUENCY ANALYSIS: Extreme Value (EV1) Distributions:
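• The cumulative probability distribution function is
$F(x) = \exp\left[-\exp\left(-\frac{x-\beta}{\alpha}\right)\right], \quad -\infty < x < \infty$

• Y = (X - β)/α → transformation (reduced variate)

• In terms of the reduced variate, the cumulative probability distribution function is
$F(y) = \exp\left[-\exp(-y)\right]$

• Therefore
$y = -\ln\left[\ln\left(\frac{1}{F(y)}\right)\right]$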

Consider the annual maximum
discharge of a river for 45 years in
a previous example,
1. develop a model for annual
maximum discharge frequency
analysis using Extreme Value
Type-I distribution and
2. calculate the 20 year and 100
year return period maximum
annual discharge values
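The 45-year data table is not reproduced here, so the following sketch works from the sample mean (756.6 cumec) and standard deviation (639.5 cumec) quoted for this series in the frequency factor example. It assumes the standard Gumbel form F(x) = exp[-exp(-(x - β)/α)] and method-of-moments estimates α = √6 s/π and β = x̄ - 0.5772 α; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def fit_gumbel_moments(mean, std_dev):
    """Method-of-moments estimates of the EV I scale (alpha) and location (beta)."""
    alpha = math.sqrt(6.0) * std_dev / math.pi
    beta = mean - EULER_GAMMA * alpha
    return alpha, beta


def gumbel_cdf(x, alpha, beta):
    """F(x) = exp(-exp(-(x - beta) / alpha))."""
    return math.exp(-math.exp(-(x - beta) / alpha))


def gumbel_quantile(T, alpha, beta):
    """Magnitude x_T with return period T, using F(x_T) = 1 - 1/T."""
    y_T = -math.log(math.log(T / (T - 1.0)))  # reduced variate
    return beta + alpha * y_T


# Sample statistics quoted for the 45-year annual maximum series (cumec).
alpha, beta = fit_gumbel_moments(756.6, 639.5)
print(f"alpha = {alpha:.1f}, beta = {beta:.1f}")
for T in (20, 100):
    print(f"T = {T:>3d} years: x_T = {gumbel_quantile(T, alpha, beta):.0f} cumec")
```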

FREQUENCY ANALYSIS:
Other types of distributions
1. Normal (Gaussian)
2. Exponential
3. Gamma
4. Pearson Type III
5. Log-normal
6. Log Pearson Type III
7. General Extreme Value (Jenkinson)
8. Poisson
9. Pareto

FREQUENCY ANALYSIS: Frequency Factors
• Calculating the magnitudes of extreme events by the methods described
above requires the cumulative probability distribution function to be
invertible.
• Some probability distribution functions, such as the normal and Pearson
type-III distributions, are not readily invertible.
• The alternative method of calculating the magnitudes of extreme
events is to use frequency factors.
• The magnitude of an event xT is represented as the mean μ plus the
deviation ΔxT of the variate from the mean: $x_T = \mu + \Delta x_T$
• The deviation is taken as equal to the product of the standard deviation σ
and a frequency factor KT: $\Delta x_T = K_T \sigma$, so that
$x_T = \mu + K_T \sigma$ or, in terms of sample statistics, $x_T = \bar{x} + K_T s$

• The deviation and the frequency factor are functions of the return
period and the type of probability distribution to be used in the
analysis.
• If the variable analyzed is y = log x, then the same method is applied
to the statistics of the logarithms of the data, as
$y_T = \bar{y} + K_T s_y$
• For a given distribution, a K - T relationship can be determined.
• The relationship can be expressed as a mathematical expression
or as a table.
• The procedure is as follows:
1. Obtain the statistical parameters required for the probability distribution from the given data.
2. For a given return period T, determine the frequency factor from the K - T relationship proposed
for the distribution.
3. Compute the magnitude of xT by $x_T = \bar{x} + K_T s$

• When p > 0.5, 1 - p is substituted for p and the computed KT value is
given a negative sign.
• The error in this formula is less than 0.00045 (Abramowitz and
Stegun, 1965).
• The frequency factor KT for the normal distribution is equal to the
standard normal deviate z.
• The same procedure is applied for the lognormal distribution, except
that the logarithms of the data are used to calculate the mean and standard
deviation.

Consider the annual maximum discharge of a river for 45 years in the
previous example.
• The mean of the data is x̄ = 756.6 cumec and the standard deviation is s =
639.5 cumec. Calculate the frequency factor and obtain the
maximum annual discharge value corresponding to a 20-year return
period using the normal distribution.
T = 20; p = 1/20 = 0.05
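The worked solution is not reproduced in the text. The sketch below assumes that the formula referred to is the Abramowitz and Stegun rational approximation for the standard normal deviate, z = w - (2.515517 + 0.802853w + 0.010328w²) / (1 + 1.432788w + 0.189269w² + 0.001308w³) with w = [ln(1/p²)]^(1/2), and then applies xT = x̄ + KT s. The function name is hypothetical.

```python
import math


def normal_frequency_factor(p):
    """Approximate K_T (standard normal deviate) for exceedance probability p.

    Assumed formula: Abramowitz and Stegun rational approximation,
    with 1 - p substituted and the sign reversed when p > 0.5.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = p if p <= 0.5 else 1.0 - p
    w = math.sqrt(math.log(1.0 / q ** 2))
    z = w - (2.515517 + 0.802853 * w + 0.010328 * w ** 2) / (
        1.0 + 1.432788 * w + 0.189269 * w ** 2 + 0.001308 * w ** 3
    )
    return z if p <= 0.5 else -z


mean, std_dev = 756.6, 639.5   # cumec, from the example statement
T = 20
p = 1.0 / T                    # exceedance probability
k_20 = normal_frequency_factor(p)
x_20 = mean + k_20 * std_dev   # x_T = mean + K_T * s

print(f"K_T = {k_20:.3f}")
print(f"x_T = {x_20:.0f} cumec")
```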

FREQUENCY ANALYSIS: PROBABILITY PLOTTING
• To check whether a probability distribution fits a set of data, the data
are plotted on specially designed probability paper.
• The cumulative probability of a distribution is represented graphically on
probability paper designed for the distribution.
• The plot is prepared with exceedance probability or the return period ‘T’
on abscissa and the magnitude of the event on ordinate.
• The scales of abscissa and ordinate are so designed that the data to be
fitted are expected to appear close to straight line
• The purpose of using the probability paper is to linearize the probability
relationship
• The plot can be used for interpolation, extrapolation and comparison
purposes.
• The plot can also be used for estimating magnitudes with other return
periods.
• If the plot is used for extrapolation, the effect of various errors is often
magnified.

FREQUENCY ANALYSIS: PROBABILITY PLOTTING POSITION
• Plotting position is a simple empirical technique.
• It gives the relation between the magnitude of an event and its probability of
exceedance.
• Plotting position refers to the probability value assigned to each of the
data to be plotted.
• There are several empirical methods to determine the plotting positions.
• Arrange the given series of data in descending order.
• Assign an order number to each of the data (termed the rank of the data).
• The first entry is ranked 1, the second 2, etc.
• Let ‘n’ be the total number of values to be plotted and ‘m’ the rank of a
value; the exceedance probability (p) of the mth largest value is obtained
by various formulae.
• The return period (T) of the event is calculated by T = 1/p.
• Compute T for all the events.
• Plot T versus the magnitude of the event on semi-log or log-log paper.

• Formulae for exceedance probability
• California method: p = m/n
• Hazen's formula: p = (2m - 1)/(2n)
• Weibull's method: p = m/(n + 1)
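The ranking procedure and the three formulae can be sketched as below, taking the usual forms California p = m/n, Hazen p = (2m - 1)/(2n) and Weibull p = m/(n + 1). The function name and the four discharge values are hypothetical and serve only to illustrate the steps.

```python
def plotting_positions(values):
    """Rank values in descending order and compute exceedance probabilities."""
    ranked = sorted(values, reverse=True)  # rank m = 1 is the largest value
    n = len(ranked)
    rows = []
    for m, magnitude in enumerate(ranked, start=1):
        p_weibull = m / (n + 1)
        rows.append({
            "rank": m,
            "magnitude": magnitude,
            "p_california": m / n,
            "p_hazen": (2 * m - 1) / (2 * n),
            "p_weibull": p_weibull,
            "T_weibull": 1.0 / p_weibull,  # return period T = 1/p
        })
    return rows


# Hypothetical annual maxima (cumec), for illustration only.
rows = plotting_positions([120.0, 310.0, 95.0, 240.0])
for row in rows:
    print(row)
```

The California formula assigns p = 1 to the smallest value, which cannot be plotted on most probability papers; this is one reason the Weibull form is often preferred.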
