
04/04/2006

Hydrologic Statistics

Reading: Chapter 11 in Applied Hydrology

Some slides by Venkatesh Merwade


Hydrologic Models
Classification based on randomness.

• Deterministic (e.g., rainfall-runoff analysis)


– Analysis of hydrological processes using deterministic
approaches
– Hydrological parameters are based on physical relations of
the various components of the hydrologic cycle.
– Do not consider randomness; a given input produces the
same output.
• Stochastic (e.g., flood frequency analysis)
– Probabilistic description and modeling of hydrologic
phenomena
– Statistical analysis of hydrologic data.

Probability
• A measure of how likely an event is to occur
• For equally likely outcomes, the ratio of favorable outcomes to all possible outcomes
• Probability is usually written as P(·)
– P(getting a club from a deck of playing cards) = 13/52 = 0.25 = 25%
– P(getting a 3 when rolling a die) = 1/6

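The "favorable over possible outcomes" ratio can be checked by direct counting. This is a minimal sketch that assumes all outcomes are equally likely; the deck representation and the `probability` helper are hypothetical names chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))

def probability(outcomes, is_favorable):
    """Ratio of favorable outcomes to all outcomes (assumes equally likely outcomes)."""
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if is_favorable(o))
    return Fraction(favorable, len(outcomes))

p_club = probability(deck, lambda card: card[1] == "clubs")
p_three = probability(range(1, 7), lambda face: face == 3)

print(p_club, float(p_club))  # 1/4 0.25
print(p_three)                # 1/6
```

Using `Fraction` keeps the result exact, so 13/52 is reported in lowest terms as 1/4.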
Random Variable
• Random variable: a quantity used to represent
probabilistic uncertainty
– Incremental precipitation
– Instantaneous streamflow
– Wind velocity
• Random variable (X) is described by a probability
distribution
• Probability distribution is a set of probabilities
associated with the values in a random variable’s
sample space

Sampling terminology
• Sample: a finite set of observations x1, x2, …, xn of the random variable
• A sample comes from a hypothetical infinite population possessing constant statistical properties
• Sample space: set of possible samples that can be drawn from a population
• Event: subset of a sample space
• Example
– Population: streamflow
– Sample space: instantaneous streamflow, annual maximum streamflow, daily average streamflow
– Sample: 100 observations of annual max. streamflow
– Event: daily average streamflow > 100 cfs
Types of sampling
• Random sampling: each member of the population is equally likely to be selected
– Pick any streamflow value from a population

• Stratified sampling: the population is divided into groups, and then random sampling is used within the groups
– Pick a streamflow value from the annual maximum series

• Uniform sampling: data are selected so that the points are evenly spaced in time or space
– Pick streamflow values measured every Monday at midnight

• Convenience sampling: data are collected according to the convenience of the experimenter
– Pick streamflow during summer

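The four sampling schemes can be contrasted on one record. This is a sketch under stated assumptions: the daily record is synthetic (random placeholder values, not observations), stratification is by year, uniform sampling takes every 7th day (what is often called systematic sampling), and "convenience" is modeled as keeping only summer days. All function names are hypothetical.

```python
import random

# Hypothetical daily streamflow record: (year, day_of_year, flow_cfs).
# Values are synthetic placeholders, not real observations.
rng = random.Random(42)
record = [(year, day, rng.uniform(100.0, 5000.0))
          for year in range(2000, 2003) for day in range(1, 366)]

def random_sample(data, k, rng):
    """Random sampling: every member has an equal chance of selection."""
    return rng.sample(data, k)

def stratified_sample(data, key, k_per_group, rng):
    """Stratified sampling: split into groups, then sample randomly within each group."""
    groups = {}
    for item in data:
        groups.setdefault(key(item), []).append(item)
    return [x for g in sorted(groups) for x in rng.sample(groups[g], k_per_group)]

def uniform_sample(data, step):
    """Uniform sampling: points evenly spaced in time (every `step`-th record)."""
    return data[::step]

def convenience_sample(data, keep):
    """Convenience sampling: whatever subset is easy to collect."""
    return [x for x in data if keep(x)]

rand = random_sample(record, 10, rng)
strat = stratified_sample(record, key=lambda r: r[0], k_per_group=2, rng=rng)
unif = uniform_sample(record, step=7)  # one value per week
summer = convenience_sample(record, lambda r: 152 <= r[1] <= 243)  # June 1 to Aug 31 (non-leap year)
```

Only the random and stratified schemes give every member a known, nonzero chance of selection; the convenience sample here ignores all non-summer flows.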
Summary statistics
• Also called descriptive statistics
– If x1, x2, …, xn is a sample, then

Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (for continuous data)

Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$ (for continuous data)

Standard deviation: $S = \sqrt{S^2}$ (for continuous data)

Coefficient of variation: $CV = \frac{S}{\bar{X}}$

Also included in summary statistics are the median, skewness, and correlation coefficient.


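The formulas for the mean, variance (with the n − 1 divisor), standard deviation, and coefficient of variation translate directly into code. A minimal sketch; the sample values are hypothetical, and the result is cross-checked against Python's `statistics.variance`, which also uses n − 1.

```python
import statistics

def summary_statistics(sample):
    """Mean, sample variance (n - 1 divisor), standard deviation, and CV."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    std = variance ** 0.5
    # CV is undefined when the mean is zero.
    return {"mean": mean, "variance": variance, "std": std, "cv": std / mean}

flows = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values
stats = summary_statistics(flows)
print(stats)

# Cross-check against the standard library.
assert abs(stats["variance"] - statistics.variance(flows)) < 1e-12
assert abs(stats["std"] - statistics.stdev(flows)) < 1e-12
```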
Graphical display
• Time Series plots
• Histograms/Frequency distribution
• Cumulative distribution functions
• Flow duration curve

Time series plot
• Plot of variable versus time (bar/line/points)
• Example: annual maximum flow series

[Figure: time series of annual maximum flow (10^3 cfs) versus year, Colorado River near Austin]


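An annual maximum flow series is typically reduced from a longer record by keeping the largest flow in each year. A minimal sketch, assuming daily (date, flow) pairs; the data are hypothetical, not Colorado River values. U.S. streamflow records are commonly tabulated by water year (October 1 to September 30), whereas this sketch groups by calendar year for simplicity.

```python
from collections import defaultdict
from datetime import date

def annual_maximum_series(daily):
    """Reduce (date, flow) pairs to one value per calendar year: that year's largest flow."""
    peaks = defaultdict(lambda: float("-inf"))
    for day, flow in daily:
        peaks[day.year] = max(peaks[day.year], flow)
    return sorted(peaks.items())

# Hypothetical daily flows in cfs.
daily = [
    (date(1990, 3, 1), 1200.0), (date(1990, 6, 15), 8500.0), (date(1990, 11, 2), 900.0),
    (date(1991, 1, 20), 15000.0), (date(1991, 7, 4), 3000.0),
]
print(annual_maximum_series(daily))  # [(1990, 8500.0), (1991, 15000.0)]
```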
Histogram
• Plots of bars whose height is the number ni, or fraction
(ni/N), of data falling into one of several intervals of
equal width
[Figure: histograms of annual maximum flow (10^3 cfs), number of occurrences per interval, for interval widths of 50,000 cfs, 25,000 cfs, and 10,000 cfs]

Dividing the number of occurrences in each interval by the total number of points gives the probability mass function.
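Counting values per equal-width interval and dividing by N can be sketched as follows. This assumes non-negative values, intervals that start at 0 and are closed on the left, and hypothetical flow data in 10^3 cfs; the helper names are illustrative.

```python
def histogram(data, width, start=0.0):
    """Count values in equal-width intervals [start + k*width, start + (k+1)*width)."""
    counts = {}
    for x in data:
        if x < start:
            raise ValueError("values must be >= start")
        k = int((x - start) // width)  # index of the interval containing x
        counts[k] = counts.get(k, 0) + 1
    n_bins = max(counts) + 1
    return [counts.get(k, 0) for k in range(n_bins)]

def pmf(counts):
    """Relative frequency n_i / N for each interval (sums to 1)."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical annual maximum flows in 10^3 cfs.
flows = [12, 35, 48, 60, 75, 110, 140, 260, 30, 55]
counts = histogram(flows, width=50)
print(counts)  # [4, 3, 2, 0, 0, 1]
print(pmf(counts))
```

Changing `width` reproduces the effect of choosing a coarser or finer interval: narrower intervals show more detail but fewer points per bar.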
Using Excel to plot histograms

1) Make sure the Analysis ToolPak add-in is installed under Tools.
This adds the Data Analysis command to the Tools menu.
2) Fill one column with the data and another with the interval boundaries (e.g., for an interval of 50, fill 0, 50, 100, …)
3) Go to Tools > Data Analysis > Histogram
4) Organize the plot in a presentable form (change fonts, scale, color, etc.)

Probability density function
• The continuous form of the probability mass function is the probability density function (pdf)
[Figure: histogram of annual maximum flow (10^3 cfs), shown both as number of occurrences and as probability]

The pdf is the first derivative of the cumulative distribution function (cdf): $f(x) = \frac{dF(x)}{dx}$


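Two ideas in this slide can be checked numerically: a histogram becomes a pdf estimate once counts are divided by N times the interval width (so the bar areas sum to 1), and the pdf is the derivative of the cdf. A sketch under assumptions: the interval counts are hypothetical, and the exponential distribution with mean `THETA` is an illustrative choice, not a distribution fitted in the slides.

```python
import math

def density_histogram(counts, width):
    """Scale counts so bar areas sum to 1: f_i ~= n_i / (N * width)."""
    total = sum(counts)
    return [c / (total * width) for c in counts]

# Illustrative distribution (assumption): exponential with mean THETA,
# whose cdf is F(x) = 1 - exp(-x / THETA).
THETA = 100.0

def cdf(x):
    return 1.0 - math.exp(-x / THETA)

def pdf(x):
    return math.exp(-x / THETA) / THETA

def numerical_derivative(f, x, h=1e-3):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

counts = [4, 3, 2, 0, 0, 1]  # hypothetical interval counts
dens = density_histogram(counts, width=50)
area = sum(d * 50 for d in dens)  # total area under the bars, approximately 1
print(area)
print(pdf(150.0), numerical_derivative(cdf, 150.0))  # nearly equal
```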
Cumulative distribution function
• Cumulating (integrating) the pdf produces the cdf
• The cdf gives the probability that a random variable is less than or equal to a specified value x

[Figure: cdf of annual maximum flow (10^3 cfs); annotated P(Q ≤ 25000) = 0.4 and P(Q ≤ 50000) = 0.8]

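From a sample, the cdf is commonly estimated by the empirical cdf, F(x) = (number of observations ≤ x) / N, which is the cumulated relative frequency. A minimal sketch with hypothetical flows in 10^3 cfs; the function name is illustrative.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F(x) = (number of observations <= x) / N."""
    data = sorted(sample)
    n = len(data)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(data, x) / n

    return F

flows = [12, 35, 48, 60, 75, 110, 140, 260, 30, 55]  # hypothetical, 10^3 cfs
F = empirical_cdf(flows)
print(F(50))   # 0.4 in this made-up sample
print(F(100))  # 0.7
```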
Flow duration curve

• A cumulative frequency curve that shows the percentage of time that specified discharges are equaled or exceeded
• Steps
– Arrange the flows in chronological order
– Find the number of records (N)
– Sort the data from highest to lowest
– Rank the data (m = 1 for the highest value and m = N for the lowest value)
– Compute the exceedance probability for each value using the formula
$p = 100 \times \frac{m}{N+1}$
– Plot p on the x-axis and the sorted Q on the y-axis


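The flow duration steps (sort high to low, rank, apply p = 100·m/(N + 1)) can be sketched as follows. The flows are hypothetical, and tied values simply receive consecutive ranks in this version.

```python
def flow_duration_curve(flows):
    """Return (p, Q) pairs: exceedance percentage and flow, highest flow first.

    Ranks m = 1..N after sorting from highest to lowest and uses
    p = 100 * m / (N + 1).
    """
    ordered = sorted(flows, reverse=True)  # highest flow gets m = 1
    n = len(ordered)
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ordered, start=1)]

# Hypothetical daily flows in cfs, in chronological order.
flows = [300, 1200, 450, 800, 150, 600, 2500, 90, 1000]
curve = flow_duration_curve(flows)
for p, q in curve:
    print(f"{p:5.1f}% exceeded: {q} cfs")
```

With N = 9, the plotting positions run from 10% to 90%, so the largest and smallest observed flows are never assigned 0% or 100% exceedance.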
Flow duration curve in Excel

[Figure: flow duration curve, Q (1000 cfs) versus % of time Q will be exceeded, with the median flow marked]

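The median flow marked on a flow duration curve is the discharge exceeded 50% of the time. One way to read it, or any other exceedance level, is linear interpolation between plotted points. A sketch with hypothetical data; the choice to return `None` rather than extrapolate beyond the plotted range is an assumption of this version.

```python
def flow_duration_curve(flows):
    """(p, Q) pairs with p = 100 * m / (N + 1), flows sorted high to low."""
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ordered, start=1)]

def flow_at_exceedance(curve, p_target):
    """Linearly interpolate Q at exceedance percentage p_target.

    Returns None outside the range of plotted points (no extrapolation).
    """
    for (p1, q1), (p2, q2) in zip(curve, curve[1:]):
        if p1 <= p_target <= p2:
            t = (p_target - p1) / (p2 - p1)
            return q1 + t * (q2 - q1)
    return None

flows = [300, 1200, 450, 800, 150, 600, 2500, 90, 1000]  # hypothetical, cfs
curve = flow_duration_curve(flows)
print(flow_at_exceedance(curve, 50.0))  # median flow: 600.0
print(flow_at_exceedance(curve, 95.0))  # None: beyond the last plotted point
```

For this odd-sized sample the 50% point falls exactly on the middle observation, so the result matches the ordinary sample median.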
Statistical analysis

• Regression analysis
• Mass curve analysis
• Flood frequency analysis
• Many others that are beyond the scope of this class

Linear Regression
• A technique to determine the relationship between two
random variables.
– Relationship between discharge and velocity in a stream
– Relationship between discharge and water quality constituents

A regression model is given by $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$

$y_i$ = ith observation of the response (dependent variable)
$x_i$ = ith observation of the explanatory (independent) variable
$\beta_0$ = intercept
$\beta_1$ = slope
$\varepsilon_i$ = random error or residual for the ith observation
Least square regression
• We have x1, x2, …, xn and y1,y2, …, yn
observations of independent and dependent
variables, respectively.
• Define a linear model for $y_i$: $\hat{y}_i = \beta_0 + \beta_1 x_i, \quad i = 1, 2, \ldots, n$

• Fit the model (find $\beta_0$ and $\beta_1$) such that the sum of the squares of the vertical deviations is minimized
– Minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$

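Setting the partial derivatives of the sum of squared deviations with respect to β0 and β1 to zero gives the standard closed-form solution β1 = Sxy/Sxx and β0 = ȳ − β1·x̄. A minimal sketch; the data are hypothetical points chosen to lie exactly on y = 0.5 + 0.2x so the recovered coefficients are easy to check.

```python
def least_squares_fit(x, y):
    """Simple linear regression minimizing sum((y_i - b0 - b1*x_i)^2).

    Returns (b0, b1) with b1 = Sxy / Sxx and b0 = ybar - b1 * xbar.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)  # zero if all x are equal
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical (x, y) pairs lying exactly on y = 0.5 + 0.2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.7, 0.9, 1.1, 1.3, 1.5]
b0, b1 = least_squares_fit(x, y)
print(b0, b1)  # approximately 0.5 and 0.2
```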
Regression applet: http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
Linear Regression in Excel
• Steps:
– Prepare a scatter plot
– Fit a trend line
[Figure: TDS (mg/L) versus specific conductance (µS/cm), data for the Brazos River near Highbank, TX, with trend line TDS = 0.5946(sp. cond.) - 15.709, R² = 0.9903]

• Alternatively, use Tools > Data Analysis > Regression
Coefficient of determination (R²)

• It is the proportion of the observed variation in y that can be explained by the simple linear regression model

$R^2 = 1 - \frac{SSE}{SST}$

$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$: total sum of squares, where $\bar{y}$ is the mean of the $y_i$

$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$: error sum of squares

The higher the value of R², the more successful the model is in explaining the variation in y.
If R² is small, search for an alternative model (a nonlinear or multiple regression model) that can explain the variation in y more effectively.

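R² follows directly from the two sums of squares. A minimal sketch; the observations are hypothetical, and the fitted values are the least-squares line for x = 1, 2, 3, 4, 5 (ŷ = 2.2 + 0.6x), giving SST = 6, SSE = 2.4, and R² = 0.6.

```python
def r_squared(y, y_hat):
    """Coefficient of determination R^2 = 1 - SSE / SST."""
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error sum of squares
    return 1.0 - sse / sst

y = [2.0, 4.0, 5.0, 4.0, 5.0]      # hypothetical observations
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # fitted values from y_hat = 2.2 + 0.6x
print(r_squared(y, y_hat))  # approximately 0.6
```

A perfect fit (y_hat equal to y) gives SSE = 0 and R² = 1.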