
04/04/2006

Hydrologic Statistics

Reading: Chapter 11, Sections 12-1 and 12-2 of Applied Hydrology
Probability
• A measure of how likely an event is to occur
• A number expressing the ratio of favorable outcomes to all possible (equally likely) outcomes
• Probability is usually represented as P(.)
– P (getting a club from a deck of playing cards) = 13/52 = 0.25 = 25 %
– P (getting a 3 after rolling a die) = 1/6

Random Variable
• Random variable: a quantity used to represent
probabilistic uncertainty
– Incremental precipitation
– Instantaneous streamflow
– Wind velocity
• Random variable (X) is described by a probability
distribution
• Probability distribution is a set of probabilities
associated with the values in a random variable’s sample
space
Sampling terminology
• Sample: a finite set of observations x1, x2, …, xn of the random variable
• A sample comes from a hypothetical infinite population possessing constant statistical properties
• Sample space: set of possible samples that can be drawn from a population
• Event: subset of a sample space
• Example
– Population: streamflow
– Sample space: instantaneous streamflow, annual maximum streamflow, daily average streamflow
– Sample: 100 observations of annual max. streamflow
– Event: daily average streamflow > 100 cfs
Summary statistics
• Also called descriptive statistics
– If x1, x2, …, xn is a sample, then

Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$ for a sample; $\mu$ for continuous data

Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{X}\right)^2$ for a sample; $\sigma^2$ for continuous data

Standard deviation: $S = \sqrt{S^2}$ for a sample; $\sigma$ for continuous data

Coeff. of variation: $CV = \frac{S}{\bar{X}}$

Also included in summary statistics are the median, skewness, and correlation coefficient.
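The summary statistics above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `summary_statistics` and the flow values are hypothetical.

```python
import math

def summary_statistics(sample):
    """Return mean, sample variance (n - 1 divisor), standard deviation and CV."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    std_dev = math.sqrt(variance)
    cv = std_dev / mean
    return mean, variance, std_dev, cv

# Hypothetical annual maximum flows in 10^3 cfs (illustrative values only)
flows = [38.0, 52.0, 45.0, 61.0, 54.0]
mean, variance, std_dev, cv = summary_statistics(flows)
print(mean, variance, std_dev, cv)
```

The `n - 1` divisor matches the sample variance on this slide; dividing by `n` instead would give the population form.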


Graphical display
• Time Series plots
• Histograms/Frequency distribution
• Cumulative distribution functions
• Flow duration curve

Time series plot
• Plot of variable versus time (bar/line/points)
• Example. Annual maximum flow series

[Figure: time series of annual max flow (10^3 cfs) versus year, Colorado River near Austin]
Histogram
• Plots of bars whose height is the number ni, or fraction
(ni/N), of data falling into one of several intervals of
equal width
[Figure: histograms of annual max flow (10^3 cfs) versus no. of occurrences, for intervals of 10,000 cfs, 25,000 cfs and 50,000 cfs]

Dividing the number of occurrences in each interval by the total number of data points gives the probability mass function
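As a rough sketch of the counting behind a histogram, the snippet below bins data into equal-width intervals and divides by the total number of points. The function name `relative_frequencies` and the flow values are hypothetical, chosen only for illustration.

```python
def relative_frequencies(data, width):
    """Count values in equal-width intervals [k*width, (k+1)*width) and normalise."""
    counts = {}
    for x in data:
        k = int(x // width)          # index of the interval containing x
        counts[k] = counts.get(k, 0) + 1
    n = len(data)
    # Map (lower, upper) interval bounds to the fraction of data in that interval
    return {(k * width, (k + 1) * width): c / n for k, c in sorted(counts.items())}

# Hypothetical annual max flows in 10^3 cfs (illustrative values only)
flows = [12, 37, 48, 55, 61, 74, 103, 148, 210, 480]
pmf = relative_frequencies(flows, 50)
for interval, fraction in pmf.items():
    print(interval, fraction)
```

Empty intervals are simply absent from the result; a plotting routine would treat them as zero-height bars.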
Using Excel to plot histograms

1) Make sure the Analysis ToolPak is added in Tools. This will add the Data Analysis command in Tools

2) Fill one column with the data, and another with the intervals (e.g. for a 50 cfs interval, fill 0, 50, 100, …)

3) Go to Tools → Data Analysis → Histogram

4) Organize the plot in a presentable form (change fonts, scale, color, etc.)
Probability density function
• Continuous form of probability mass function is probability
density function
[Figure: no. of occurrences and probability versus annual max flow (10^3 cfs)]

pdf is the first derivative of a cumulative distribution function


Cumulative distribution function
• Integrate (cumulate) the pdf to produce the cdf
• The cdf describes the probability that a random variable is less than or equal to a specified value of x

[Figure: cdf of annual max flow (10^3 cfs), probability from 0 to 1, annotated with P (Q ≤ 50000) = 0.8 and P (Q ≤ 25000) = 0.4]
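For a sample, the cdf can be approximated by the fraction of observations at or below a given value. The sketch below assumes a hypothetical helper `empirical_cdf` and made-up flow values; it is not the fitted curve from the slide.

```python
def empirical_cdf(sample, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)

# Hypothetical annual max flows in 10^3 cfs (illustrative values only)
flows = [12, 37, 48, 55, 61, 74, 103, 148, 210, 480]
for level in (50, 100, 500):
    print(level, empirical_cdf(flows, level))
```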

• Extreme events
– Floods
– Droughts
• Magnitude of extreme events is inversely related to their frequency of occurrence
$\text{Magnitude} \propto \dfrac{1}{\text{Frequency of occurrence}}$
• The objective of frequency analysis is to relate the magnitude of events to their frequency of occurrence through a probability distribution
• It is assumed that the events (data) are independent and come from an identical distribution

Return Period
• Random variable: $X$
• Threshold level: $x_T$
• Extreme event occurs if: $X \ge x_T$
• Recurrence interval: $\tau$ = time between occurrences of $X \ge x_T$
• Return period: $E(\tau)$, the average recurrence interval between events equalling or exceeding a threshold
• If $p$ is the probability of occurrence of an extreme event, then $E(\tau) = T = \frac{1}{p}$, or $P(X \ge x_T) = \frac{1}{T}$
More on return period
• If p is probability of success, then (1-p) is the
probability of failure
• Find probability that (X ≥ xT) at least once in N years.

$p = P(X \ge x_T)$
$P(X < x_T) = 1 - p$
$P(X \ge x_T \text{ at least once in } N \text{ years}) = 1 - P(X < x_T \text{ all } N \text{ years})$
$P(X \ge x_T \text{ at least once in } N \text{ years}) = 1 - (1 - p)^N = 1 - \left(1 - \frac{1}{T}\right)^N$
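The probability of at least one exceedance in N years can be evaluated directly from the return period. The function name `prob_at_least_once` is hypothetical, and the sketch assumes independent years, as the slides do.

```python
def prob_at_least_once(T, N):
    """P(X >= x_T at least once in N years) for an event with return period T years."""
    p = 1.0 / T                      # annual exceedance probability
    return 1.0 - (1.0 - p) ** N      # complement of "no exceedance in all N years"

# Example: a 100-year event over a 30-year period
print(prob_at_least_once(100, 30))
```

Even for N = T the result is well below 1, so a T-year event is not certain to occur within T years.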
Hydrologic data series
• Complete duration series
– All the data available
• Partial duration series
– Magnitude greater than base value
• Annual exceedance series
– Partial duration series with # of
values = # years
• Extreme value series
– Includes largest or smallest values in
equal intervals
• Annual series: interval = 1 year
• Annual maximum series: largest
values
• Annual minimum series : smallest
values
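The series defined above can be extracted from a complete duration series with a few helper functions. This is a sketch under stated assumptions: the record layout `(year, flow)`, the function names and the numbers are all hypothetical, and no check is made that peaks are independent events.

```python
def annual_maximum_series(records):
    """Largest value in each year; records is a list of (year, flow) pairs."""
    best = {}
    for year, flow in records:
        if year not in best or flow > best[year]:
            best[year] = flow
    return dict(sorted(best.items()))

def partial_duration_series(records, base):
    """All values whose magnitude is greater than the base value."""
    return [flow for _, flow in records if flow > base]

def annual_exceedance_series(records):
    """The n largest values overall, where n is the number of years of record."""
    n_years = len({year for year, _ in records})
    return sorted((flow for _, flow in records), reverse=True)[:n_years]

# Hypothetical complete duration series (illustrative values only)
records = [(2001, 30), (2001, 80), (2001, 75), (2002, 20),
           (2002, 40), (2003, 90), (2003, 60)]
print(annual_maximum_series(records))
print(partial_duration_series(records, base=50))
print(annual_exceedance_series(records))
```

Note that the annual exceedance series can take two values from one year and none from another, which is how it differs from the annual maximum series.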
Return period example
• Dataset – annual maximum discharge for 106 years on Colorado River near Austin
[Figure: annual max flow (10^3 cfs) versus year]
• For xT = 200,000 cfs: no. of occurrences = 3, giving 2 recurrence intervals in 106 years, so T = 106/2 = 53 years
• If xT = 100,000 cfs: 7 recurrence intervals, so T = 106/7 ≈ 15.1 years
• P (X ≥ 100,000 cfs at least once in the next 5 years) = 1 − (1 − 1/15.1)^5 ≈ 0.29
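The counting used in this example (record length divided by the number of recurrence intervals) can be sketched as follows. The function name `return_period_from_record` and the short flow list are hypothetical; only the figures 106 years and 7 intervals come from the slide.

```python
def return_period_from_record(flows, threshold):
    """flows holds one annual maximum per year.

    Returns (number of exceedances, number of recurrence intervals, T),
    with T = record length / number of recurrence intervals.
    """
    n_exceed = sum(1 for q in flows if q >= threshold)
    n_intervals = n_exceed - 1       # k exceedances bound k - 1 intervals
    if n_intervals < 1:
        raise ValueError("need at least two exceedances to form an interval")
    return n_exceed, n_intervals, len(flows) / n_intervals

# Hypothetical 10-year record in 10^3 cfs (illustrative values only)
flows = [40, 120, 60, 30, 150, 80, 20, 110, 50, 70]
n_exceed, n_intervals, T = return_period_from_record(flows, threshold=100)
print(n_exceed, n_intervals, T)

# Slide figures: 7 recurrence intervals in 106 years, then 5-year risk
T_slide = 106 / 7
risk_5yr = 1 - (1 - 1 / T_slide) ** 5
print(round(T_slide, 1), round(risk_5yr, 2))
```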


Probability distributions
• Normal family
– Normal, lognormal, lognormal-III
• Generalized extreme value family
– EV1 (Gumbel), GEV, and EVIII (Weibull)
• Exponential/Pearson type family
– Exponential, Pearson type III, Log-Pearson type
III

Normal distribution
• Central limit theorem – if X is the sum of n independent and identically distributed random variables with finite variance, then with increasing n the distribution of X approaches normal regardless of the distribution of the individual random variables
• pdf for normal distribution

$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

$\mu$ is the mean and $\sigma$ is the standard deviation

Hydrologic variables such as annual precipitation, annual average streamflow, or annual average pollutant loadings tend to follow a normal distribution
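The central limit theorem can be checked informally by simulation. The sketch below sums 12 uniform random numbers many times; the choice of uniform terms, the seed, and the name `simulate_sums` are assumptions made for illustration only.

```python
import random
import statistics

def simulate_sums(n_terms, n_trials, seed=0):
    """Return n_trials sums, each of n_terms independent uniform(0, 1) values."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_trials)]

sums = simulate_sums(n_terms=12, n_trials=20000)
mean = statistics.mean(sums)       # theory: 12 * 0.5 = 6
std_dev = statistics.stdev(sums)   # theory: sqrt(12 * 1/12) = 1
# For a normal distribution about 68% of values lie within one standard deviation
within_one_sd = sum(1 for s in sums if abs(s - mean) <= std_dev) / len(sums)
print(mean, std_dev, within_one_sd)
```

The individual terms are far from normal, yet the sums should show a mean, spread and one-sigma fraction close to the normal values.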
Standard Normal distribution
• A standard normal distribution is a normal distribution with mean ($\mu$) = 0 and standard deviation ($\sigma$) = 1
• Normal distribution is transformed to standard normal distribution by using the following formula:

$z = \frac{X - \mu}{\sigma}$

z is called the standard normal variable
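A short sketch of the standardisation and of the normal pdf follows. The cdf is evaluated with the error function from Python's `math` module, which is a standard identity rather than something stated on the slides; the function names and the example numbers are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal pdf with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def standard_normal_cdf(z):
    """P(Z <= z) for the standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X <= x), obtained by transforming X to the standard normal variable z."""
    return standard_normal_cdf((x - mu) / sigma)

# Hypothetical annual precipitation: mean 40 in, standard deviation 8 in
print(normal_cdf(50.0, mu=40.0, sigma=8.0))   # uses z = (50 - 40) / 8 = 1.25
```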
Lognormal distribution
• If the pdf of X is skewed, it's not normally distributed
• If the pdf of Y = log (X) is normally distributed, then X is said to be lognormally distributed.

$f(x) = \frac{1}{x\,\sigma_y\sqrt{2\pi}} \exp\left(-\frac{(y - \mu_y)^2}{2\sigma_y^2}\right), \quad x > 0, \text{ and } y = \log x$

Hydraulic conductivity and the distribution of raindrop sizes in a storm follow the lognormal distribution.
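The lognormal pdf can be written directly from the formula. This sketch assumes natural logarithms for `log`, and the function name `lognormal_pdf` is hypothetical.

```python
import math

def lognormal_pdf(x, mu_y, sigma_y):
    """Lognormal pdf, where mu_y and sigma_y are the mean and std. dev. of y = ln(x)."""
    if x <= 0:
        return 0.0                   # the distribution is defined for x > 0 only
    y = math.log(x)                  # assumption: natural logarithm
    return (math.exp(-((y - mu_y) ** 2) / (2.0 * sigma_y ** 2))
            / (x * sigma_y * math.sqrt(2.0 * math.pi)))

print(lognormal_pdf(1.0, mu_y=0.0, sigma_y=1.0))
```

The factor 1/x is what distinguishes this from simply evaluating a normal pdf at y.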

Extreme value (EV) distributions
• Extreme values – maximum or minimum
values of sets of data
• Annual maximum discharge, annual minimum
discharge
• When the number of selected extreme values
is large, the distribution converges to one of
the three forms of EV distributions called Type
I, II and III

EV type I distribution
• Let M1, M2, …, Mn be a set of daily rainfall or streamflow values, and let X = max(Mi) be the maximum for the year. If the Mi are independent and identically distributed, then for large n, X has an extreme value type I or Gumbel distribution.

$f(x) = \frac{1}{\alpha}\exp\left[-\frac{x-u}{\alpha} - \exp\left(-\frac{x-u}{\alpha}\right)\right]$

$\alpha = \frac{\sqrt{6}\,s_x}{\pi}, \quad u = \bar{x} - 0.5772\,\alpha$

Distribution of annual maximum streamflow follows an EV1 distribution
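The EV1 parameters can be estimated from the sample mean and standard deviation using the two relations above. The sketch also uses the Gumbel cdf, F(x) = exp(−exp(−(x − u)/α)), which is the integral of this pdf but is not shown on the slide; the function names and the sample statistics are hypothetical.

```python
import math

def gumbel_parameters(mean, std_dev):
    """Estimate alpha and u from the sample mean and standard deviation."""
    alpha = math.sqrt(6.0) * std_dev / math.pi
    u = mean - 0.5772 * alpha
    return alpha, u

def gumbel_pdf(x, alpha, u):
    z = (x - u) / alpha
    return math.exp(-z - math.exp(-z)) / alpha

def gumbel_cdf(x, alpha, u):
    """P(X <= x); assumption: standard Gumbel cdf, not given on the slide."""
    return math.exp(-math.exp(-(x - u) / alpha))

def gumbel_quantile(T, alpha, u):
    """Magnitude x_T with P(X >= x_T) = 1/T, found by inverting the cdf."""
    return u - alpha * math.log(-math.log(1.0 - 1.0 / T))

# Hypothetical sample statistics in 10^3 cfs (illustrative values only)
alpha, u = gumbel_parameters(mean=100.0, std_dev=30.0)
x_100 = gumbel_quantile(100, alpha, u)
print(alpha, u, x_100)
```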
EV type III distribution
• If Wi are the minimum streamflows on different days of the year, let X = min(Wi) be the smallest. X can be described by the EV type III or Weibull distribution.

$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}\exp\left[-\left(\frac{x}{\lambda}\right)^{k}\right], \quad x > 0;\ \lambda, k > 0$

Distribution of low flows (e.g. 7-day min flow) follows EV3 distribution.
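A minimal sketch of the Weibull pdf follows, together with its cdf, F(x) = 1 − exp(−(x/λ)^k), which is obtained by integrating the pdf and is not shown on the slide. The function names and parameter values are hypothetical.

```python
import math

def weibull_pdf(x, lam, k):
    """Weibull (EV type III) pdf with scale lam > 0 and shape k > 0."""
    if x <= 0:
        return 0.0                   # defined for x > 0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def weibull_cdf(x, lam, k):
    """P(X <= x); assumption: obtained by integrating the pdf."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / lam) ** k))

# Probability that a low flow falls below 3 units for lam = 4, k = 2 (illustrative)
print(weibull_cdf(3.0, lam=4.0, k=2.0))
```

With k = 1 the expression reduces to an exponential pdf with rate 1/λ.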

Exponential distribution
• Poisson process – a stochastic process in which the numbers of events occurring in two disjoint subintervals are independent random variables.
• In hydrology, the interarrival time (time between stochastic hydrologic events) is described by the exponential distribution

$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0;\ \lambda = \frac{1}{\bar{x}}$

Interarrival times of polluted runoffs, rainfall intensities, etc. are described by the exponential distribution.
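Fitting the exponential distribution amounts to taking the reciprocal of the mean interarrival time. In the sketch below the function names and the interarrival times are hypothetical; the survival probability exp(−λt) follows from integrating the pdf.

```python
import math

def fit_exponential(interarrival_times):
    """Estimate lambda as 1 / (mean interarrival time)."""
    return len(interarrival_times) / sum(interarrival_times)

def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def prob_interarrival_exceeds(t, lam):
    """P(X > t), the integral of the pdf from t to infinity."""
    return math.exp(-lam * t)

# Hypothetical times between storms, in days (illustrative values only)
times = [2.0, 5.0, 1.0, 4.0, 3.0]
lam = fit_exponential(times)
print(lam, prob_interarrival_exceeds(6.0, lam))
```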
Gamma Distribution
• The time taken for a number of events ($\beta$) to occur in a Poisson process is described by the gamma distribution
• Gamma distribution – the distribution of the sum of $\beta$ independent and identical exponentially distributed random variables.

$f(x) = \frac{\lambda^{\beta} x^{\beta-1} e^{-\lambda x}}{\Gamma(\beta)}, \quad x \ge 0;\ \Gamma = \text{gamma function}$

Skewed distributions (e.g. hydraulic conductivity) can be represented using gamma without log transformation.
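The gamma pdf can be evaluated with `math.gamma` from the Python standard library. The function name `gamma_pdf` is hypothetical, and for simplicity the sketch treats x ≤ 0 as outside the support.

```python
import math

def gamma_pdf(x, lam, beta):
    """Gamma pdf with rate lam > 0 and shape beta > 0."""
    if x <= 0:
        return 0.0                   # simplification: skip the boundary point x = 0
    return lam ** beta * x ** (beta - 1) * math.exp(-lam * x) / math.gamma(beta)

# With beta = 1 (waiting for a single event) the pdf equals the exponential pdf
print(gamma_pdf(2.0, lam=0.5, beta=1.0), 0.5 * math.exp(-1.0))
```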
Pearson Type III
• Named after the statistician Pearson, it is also called the three-parameter gamma distribution. A lower bound is introduced through the third parameter ($\epsilon$)

$f(x) = \frac{\lambda^{\beta} (x-\epsilon)^{\beta-1} e^{-\lambda (x-\epsilon)}}{\Gamma(\beta)}, \quad x \ge \epsilon;\ \Gamma = \text{gamma function}$

It is also a skewed distribution, first applied in hydrology for describing the pdf of annual maximum flows.
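The Pearson Type III pdf is the gamma pdf shifted by the lower bound. A minimal sketch, with the hypothetical function name `pearson3_pdf` and illustrative parameter values:

```python
import math

def pearson3_pdf(x, lam, beta, eps):
    """Pearson Type III pdf: a gamma pdf shifted so that its lower bound is eps."""
    if x <= eps:
        return 0.0                   # simplification: skip the boundary point x = eps
    s = x - eps
    return lam ** beta * s ** (beta - 1) * math.exp(-lam * s) / math.gamma(beta)

print(pearson3_pdf(3.0, lam=2.0, beta=2.0, eps=1.0))
```

Setting `eps = 0` recovers the two-parameter gamma pdf.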

Log-Pearson Type III
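• If log X follows a Pearson Type III distribution, then X is said to have a log-Pearson Type III distribution
• With $y = \log x$, the pdf of $y$ is

$f(y) = \frac{\lambda^{\beta} (y-\epsilon)^{\beta-1} e^{-\lambda (y-\epsilon)}}{\Gamma(\beta)}, \quad y = \log x \ge \epsilon$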
