Hydrologic Statistics
04/04/2006
Reading: Chapter 11 in Applied Hydrology
Some slides by Venkatesh Merwade

Hydrologic Models
Classification based on randomness.
• Deterministic (e.g., rainfall-runoff analysis)
– Analysis of hydrological processes using deterministic approaches
– Hydrological parameters are based on physical relations among the various components of the hydrologic cycle
– Does not consider randomness; a given input always produces the same output
• Stochastic (e.g., flood frequency analysis)
– Probabilistic description and modeling of hydrologic phenomena
– Statistical analysis of hydrologic data
Probability
• A measure of how likely an event is to occur
• A number expressing the ratio of favorable outcomes to all possible outcomes
• Probability is usually represented as P(·)
– P(getting a club from a deck of playing cards) = 13/52 = 0.25 = 25%
– P(getting a 3 after rolling a die) = 1/6
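The two examples above can be checked with a short Python sketch that counts favorable and possible outcomes. The helper name `probability` is an illustrative assumption, not something from the slides, and the sketch assumes all outcomes are equally likely.

```python
from fractions import Fraction

def probability(favorable, possible):
    """Ratio of favorable outcomes to all equally likely outcomes."""
    return Fraction(favorable, possible)

p_club = probability(13, 52)   # 13 clubs in a 52-card deck
p_three = probability(1, 6)    # one face showing 3 on a six-sided die

print(float(p_club))   # 0.25
print(p_three)         # 1/6
```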
Random Variable
• Random variable: a quantity used to represent
probabilistic uncertainty
– Incremental precipitation
– Instantaneous streamflow
– Wind velocity
• Random variable (X) is described by a probability
distribution
• Probability distribution is a set of probabilities
associated with the values in a random variable’s
sample space
Sampling terminology
• Sample: a finite set of observations x1, x2, …, xn of the random variable
• A sample comes from a hypothetical infinite population possessing constant statistical properties
• Sample space: set of possible samples that can be drawn from a population
• Event: subset of a sample space
• Example
– Population: streamflow
– Sample space: instantaneous streamflow, annual maximum streamflow, daily average streamflow
– Sample: 100 observations of annual max. streamflow
– Event: daily average streamflow > 100 cfs
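To make the terminology concrete, the sketch below treats a short list of daily average flows as a sample and estimates the probability of the event "daily average streamflow > 100 cfs" as a relative frequency. The flow values are hypothetical, chosen only for illustration.

```python
# Hypothetical sample of daily average streamflow (cfs); not real data.
sample = [85.0, 120.0, 97.0, 143.0, 101.0, 76.0, 110.0, 92.0]

threshold = 100.0  # event: daily average streamflow > 100 cfs

# Observations in the sample that belong to the event.
event = [q for q in sample if q > threshold]

# Relative frequency of the event in this sample.
p_event = len(event) / len(sample)
print(len(event), len(sample), p_event)
```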
Types of sampling
• Random sampling: the likelihood of selection of each member of the population is equal
– Pick any streamflow value from a population
• Stratified sampling: the population is divided into groups, and then random sampling is used within each group
– Pick a streamflow value from the annual maximum series
• Uniform sampling: data are selected such that the points are uniformly far apart in time or space
– Pick streamflow values measured every Monday at midnight
• Convenience sampling: data are collected according to the convenience of the experimenter
– Pick streamflow during summer
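The four sampling types can be sketched in Python on a small synthetic record. The record, the weekly grouping, and the fixed seed are all assumptions made for illustration. They are not taken from the slides.

```python
import random

# Hypothetical daily streamflow record (cfs), four weeks long; not real data.
flows = [50 + 3 * day for day in range(28)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Random sampling: every member of the record is equally likely to be picked.
random_sample = rng.sample(flows, 4)

# Stratified sampling: split the record into weekly groups, then pick one
# value at random from each group.
weeks = [flows[i:i + 7] for i in range(0, len(flows), 7)]
stratified_sample = [rng.choice(week) for week in weeks]

# Uniform sampling: take values a fixed interval apart (every 7th day).
uniform_sample = flows[::7]

# Convenience sampling: take whatever is easiest, here the first week only.
convenience_sample = flows[:7]

print(uniform_sample)
```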
Summary statistics
• Also called descriptive statistics
– If x1, x2, …, xn is a sample, then
Mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i$ (for continuous data)
Variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2$ (for continuous data)
Standard deviation: $S = \sqrt{S^2}$ (for continuous data)
Coeff. of variation: $CV = \frac{S}{\bar{X}}$
Summary statistics also include the median, skewness, and correlation coefficient.
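A minimal Python sketch of the mean, variance (with the n − 1 divisor), standard deviation, and coefficient of variation follows. The data values are hypothetical and serve only to exercise the formulas.

```python
import math

# Hypothetical annual maximum flows (10^3 cfs); not real data.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)

mean = sum(x) / n                                        # sample mean
variance = sum((xi - mean) ** 2 for xi in x) / (n - 1)   # S^2, divisor n - 1
std_dev = math.sqrt(variance)                            # S
cv = std_dev / mean                                      # CV = S / mean

print(mean, variance, std_dev, cv)
```

Python's standard library functions `statistics.mean`, `statistics.variance`, and `statistics.stdev` use the same definitions and can replace the hand-written sums.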

Graphical display
• Time Series plots
• Histograms/Frequency distribution
• Cumulative distribution functions
• Flow duration curve
Time series plot
• Plot of variable versus time (bar/line/points)
• Example. Annual maximum flow series
[Figure: Time series of annual maximum flow (10^3 cfs) versus year, Colorado River near Austin]
Histogram
• Plots of bars whose height is the number ni, or fraction
(ni/N), of data falling into one of several intervals of
equal width
[Figure: Histograms of number of occurrences versus annual max flow (10^3 cfs) for interval widths of 50,000 cfs, 25,000 cfs, and 10,000 cfs]
Dividing the number of occurrences by the total number of points gives the probability mass function.
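The counting behind a histogram can be sketched in a few lines of Python: assign each value to an equal-width interval, count the occurrences n_i, and divide by N. The flow values and the interval width below are hypothetical.

```python
# Hypothetical annual maximum flows (10^3 cfs); not real data.
flows = [12, 48, 55, 61, 99, 102, 140, 151, 160, 240]
width = 50  # interval width (10^3 cfs)

# Count the observations n_i falling into each equal-width interval.
n_bins = int(max(flows) // width) + 1
counts = [0] * n_bins
for q in flows:
    counts[int(q // width)] += 1

# Dividing each count by the total number of points N gives the fraction n_i/N.
N = len(flows)
pmf = [c / N for c in counts]

for i, (c, p) in enumerate(zip(counts, pmf)):
    print(f"{i * width:>3}-{(i + 1) * width:<3} count={c} fraction={p:.1f}")
```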
Using Excel to plot histograms
1) Make sure the Analysis ToolPak is added in Tools. This will add the Data Analysis command to the Tools menu.
2) Fill one column with the data, and another with the intervals (e.g., for a 50 cfs interval, fill 0, 50, 100, …).
3) Go to Tools → Data Analysis → Histogram.
4) Organize the plot in a presentable form (change fonts, scale, color, etc.).
Probability density function
• The continuous form of the probability mass function is the probability density function (pdf)
[Figure: Probability and number of occurrences versus annual max flow (10^3 cfs)]
The pdf is the first derivative of the cumulative distribution function (cdf).
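In symbols, writing f(x) for the pdf and F(x) for the cdf (notation assumed here, not given on the slide), the derivative relationship and its inverse can be stated as:

```latex
f(x) = \frac{dF(x)}{dx},
\qquad
F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du
```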
Cumulative distribution function
• Accumulate the pdf to produce the cdf
• The cdf describes the probability that a random variable is less than or equal to a specified value x
[Figure: Cumulative probability versus annual max flow (10^3 cfs), annotated with P(Q ≤ 25000) = 0.4 and P(Q ≤ 50000) = 0.8]
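For binned data, accumulating the pdf amounts to a running sum of the interval fractions. The sketch below shows this with hypothetical fractions and interval edges, which are assumptions for illustration only.

```python
# Hypothetical relative frequencies (fractions n_i/N) for five flow
# intervals of equal width; not real data.
upper_edges = [50, 100, 150, 200, 250]   # upper edge of each interval (10^3 cfs)
fractions = [0.2, 0.3, 0.2, 0.2, 0.1]

# Accumulate the fractions: cdf[i] estimates P(Q <= upper_edges[i]).
cdf = []
running_total = 0.0
for p in fractions:
    running_total += p
    cdf.append(running_total)

for edge, F in zip(upper_edges, cdf):
    print(f"P(Q <= {edge}) = {F:.1f}")
```

The resulting values never decrease and end at 1, as a cdf must.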
Flow duration curve
• A cumulative frequency curve that shows the percentage of time that specified discharges are equaled or exceeded.
• Steps
– Arrange flows in chronological order
– Find the number of records (N)
– Sort the data from highest to lowest
– Rank the data (m = 1 for the highest value and m = N for the lowest value)
– Compute the exceedance probability (in percent) for each value using the formula $p = 100 \times \frac{m}{N+1}$
– Plot p on the x axis and Q (sorted) on the y axis
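The steps above can be sketched in Python as follows. The flow record is hypothetical, and the output is the list of (p, Q) pairs that would be plotted.

```python
# Hypothetical flow record (1000 cfs) in chronological order; not real data.
flows = [120.0, 340.0, 85.0, 210.0, 500.0, 60.0, 150.0, 275.0, 95.0]

N = len(flows)                           # number of records
sorted_q = sorted(flows, reverse=True)   # highest to lowest

# Rank m = 1 for the highest flow, m = N for the lowest;
# exceedance probability in percent: p = 100 * m / (N + 1).
curve = [(100.0 * m / (N + 1), q) for m, q in enumerate(sorted_q, start=1)]

for p, q in curve:
    print(f"{p:5.1f}%  {q:6.1f}")
```

With these nine values, the flow at p = 50% is 150, which is also the sample median.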
Flow duration curve in Excel
[Figure: Flow duration curve, Q (1000 cfs) versus % of time Q will be exceeded, with the median flow marked]
Statistical analysis
• Regression analysis
• Mass curve analysis
• Flood frequency analysis
• Many more which are beyond the scope of
this class!
Linear Regression
• A technique to determine the relationship between two
random variables.
– Relationship between discharge and velocity in a stream
– Relationship between discharge and water quality constituents
A regression model is given by: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$
$y_i$ = ith observation of the response (dependent) variable
$x_i$ = ith observation of the explanatory (independent) variable
$\beta_0$ = intercept
$\beta_1$ = slope
$\varepsilon_i$ = random error or residual for the ith observation
Least squares regression
• We have x1, x2, …, xn and y1, y2, …, yn observations of the independent and dependent variables, respectively.
• Define a linear model for yi: $\hat{y}_i = \beta_0 + \beta_1 x_i, \quad i = 1, 2, \ldots, n$
• Fit the model (find $\beta_0$ and $\beta_1$) such that the sum of the squares of the vertical deviations is a minimum
– Minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
Regression applet: http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
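Setting the derivatives of the sum of squares with respect to the intercept and slope to zero leads to the standard closed-form estimates: slope = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and intercept = ȳ − slope · x̄. These formulas are not stated on the slides. The sketch below applies them to hypothetical data, and the function name `fit_line` is illustrative.

```python
def fit_line(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # fitted slope
    b0 = y_bar - b1 * x_bar   # fitted intercept
    return b0, b1

# Hypothetical observations; not real data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_line(x, y)
print(b0, b1)
```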
Linear Regression in Excel
• Steps:
– Prepare a scatter plot
– Fit a trend line
[Figure: Scatter plot of TDS (mg/L) versus specific conductance (µS/cm) with fitted trend line. Data are for the Brazos River near Highbank, TX.]
Trend line: TDS = 0.5946 (sp. cond.) − 15.709, R² = 0.9903
• Alternatively, one can use Tools → Data Analysis → Regression
Coefficient of determination (R²)
• It is the proportion of observed y variation that can be explained by the simple linear regression model
$R^2 = 1 - \frac{SSE}{SST}$
$SST = \sum (y_i - \bar{y})^2$: total sum of squares, where $\bar{y}$ is the mean of the $y_i$
$SSE = \sum (y_i - \hat{y}_i)^2$: error sum of squares
The higher the value of R², the more successful the model is in explaining y variation.
If R² is small, search for an alternative model (nonlinear or multiple regression model) that can more effectively explain y variation.
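A minimal Python sketch of the R² calculation follows. The observed and predicted values are hypothetical, and the function name `r_squared` is illustrative.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST for observations y and model predictions y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error sum of squares
    return 1.0 - sse / sst

# Hypothetical observations and model predictions; not real data.
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.5, 7.5]

print(r_squared(y, y_hat))
```

Here SST = 20 and SSE = 1, so R² = 0.95. A model that reproduced every observation exactly would give R² = 1.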
