
04/04/2006

Hydrologic Statistics
Reading: Chapter 11 in Applied
Hydrology
Some slides by Venkatesh
Merwade

Hydrologic Models
Classification based on randomness.

Deterministic (e.g., rainfall-runoff analysis)
Analysis of hydrological processes using deterministic approaches
Hydrological parameters are based on physical relations of the various components of the hydrologic cycle.
Do not consider randomness; a given input always produces the same output.

Stochastic (e.g., flood frequency analysis)
Probabilistic description and modeling of hydrologic phenomena
Statistical analysis of hydrologic data.

Probability
A measure of how likely an event is to occur
A number expressing the ratio of favorable outcomes to all possible outcomes
Probability is usually represented as P(.)
P (getting a club from a deck of playing cards) = 13/52 = 0.25 = 25 %
P (getting a 3 after rolling a die) = 1/6

Random Variable
Random variable: a quantity used to
represent probabilistic uncertainty
Incremental precipitation
Instantaneous streamflow
Wind velocity

A random variable (X) is described by a probability distribution
A probability distribution is a set of probabilities associated with the values in a random variable's sample space

Sampling terminology
Sample: a finite set of observations x1, x2, ..., xn of the random variable
A sample comes from a hypothetical infinite
population possessing constant statistical properties
Sample space: set of possible samples that can be
drawn from a population
Event: subset of a sample space
Example
Population: streamflow
Sample space: instantaneous streamflow,
annual maximum streamflow, daily average
streamflow
Sample: 100 observations of annual max.
streamflow

Types of sampling

Random sampling: the likelihood of selection of each member of the population is equal
Pick any streamflow value from a population

Stratified sampling: the population is divided into groups, and then random sampling is used
Pick a streamflow value from the annual maximum series.

Uniform sampling: data are selected such that the points are uniformly far apart in time or space
Pick streamflow values measured every Monday at midnight

Convenience sampling: data are collected according to the convenience of the experimenter.
Pick streamflow during summer

Summary statistics
Also called descriptive statistics
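If x1, x2, ..., xn is a sample, then

Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (for continuous data, $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$)
Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$ (for continuous data, $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$)
Standard deviation: $S = \sqrt{S^2}$ (for continuous data, $\sigma = \sqrt{\sigma^2}$)
Coeff. of variation: $CV = S / \bar{X}$

Also included in summary statistics are median, skewness, and correlation coefficient.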
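The sample statistics above can be sketched in a few lines of Python. This is only an illustration: the function name summary_statistics and the flow values are made up for the example and are not data from these slides.

```python
import math


def summary_statistics(x):
    """Return (mean, variance, standard deviation, coeff. of variation) of a sample."""
    n = len(x)
    mean = sum(x) / n
    # Sample variance uses n - 1 in the denominator.
    variance = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    std_dev = math.sqrt(variance)
    cv = std_dev / mean
    return mean, variance, std_dev, cv


# Hypothetical annual maximum flows (10^3 cfs), invented for illustration only.
flows = [120.0, 85.0, 240.0, 60.0, 150.0]
mean, variance, std_dev, cv = summary_statistics(flows)
print(f"mean={mean:.1f}  variance={variance:.1f}  std={std_dev:.1f}  CV={cv:.2f}")
```

The n - 1 denominator matches the sample variance formula; dividing by n instead would give the population form.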

Graphical display

Time Series plots


Histograms/Frequency distribution
Cumulative distribution functions
Flow duration curve


Time series plot


Plot of variable versus time (bar/line/points)
Example. Annual maximum flow series

[Figure: time series plot of annual max flow (10^3 cfs) versus year, Colorado River near Austin]

Frequency Histogram
Plots of bars whose height is the number ni of
data falling into one of several intervals of
equal width
[Figure: frequency histograms of annual max flow (10^3 cfs) versus number of occurrences, for interval widths of 10,000 cfs, 25,000 cfs, and 50,000 cfs]

Dividing the number of occurrences by the total number of points gives the probability mass function.
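As a rough sketch of the counting behind a frequency histogram and its probability mass function, the Python below bins non-negative values into equal-width intervals. The function names and the sample values are hypothetical, and each interval is assumed to be closed on the left and open on the right.

```python
def frequency_histogram(data, width):
    """Count non-negative values falling in intervals [k*width, (k+1)*width)."""
    n_bins = int(max(data) // width) + 1
    counts = [0] * n_bins
    for value in data:
        counts[int(value // width)] += 1
    return counts


def probability_mass(counts):
    """Divide each count by the total number of points."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical annual maximum flows (10^3 cfs), invented for illustration only.
flows = [10, 20, 60, 70, 80, 120]
counts = frequency_histogram(flows, width=50)
pmf = probability_mass(counts)
print(counts, pmf)
```

Changing width shows how the shape of the histogram depends on the chosen interval.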

Probability density function


Continuous form of probability mass function is
probability density function
[Figure: annual max flow (10^3 cfs) versus number of occurrences and probability]

The pdf is the first derivative of the cumulative distribution function

Using Excel to plot histograms


1) Make sure the Analysis ToolPak is added in Tools. This will add the Data Analysis command in Tools.
2) Fill one column with the data, and another with the intervals (e.g., for a 50 cfs interval, fill 0, 50, 100, ...)
3) Go to Tools → Data Analysis → Histogram
4) Organize the plot in a presentable form (change fonts, scale, color, etc.)

Cumulative distribution function

Cumulate the pdf to produce a cdf
The cdf describes the probability that a random variable is less than or equal to a specified value x
[Figure: cdf of annual max flow (10^3 cfs) versus probability, marking P(Q ≤ 50,000) = 0.8 and P(Q ≤ 25,000) = 0.4]
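For discrete intervals, cumulating the probabilities gives the cdf at the upper edge of each interval. A minimal Python sketch follows; the function names and the numbers are hypothetical and not taken from the figure.

```python
from itertools import accumulate


def cumulative_distribution(pmf):
    """Running sum of interval probabilities: P(X <= upper edge of each interval)."""
    return list(accumulate(pmf))


def empirical_cdf(data, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for value in data if value <= x) / len(data)


# Hypothetical interval probabilities, invented for illustration only.
pmf = [0.25, 0.5, 0.25]
print(cumulative_distribution(pmf))

# Hypothetical flows (10^3 cfs): fraction of values not exceeding 60.
print(empirical_cdf([10, 20, 60, 70, 80, 120], 60))
```

The last cumulated value should equal 1, which is a quick check that the probabilities were normalized correctly.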

Flow duration curve


A cumulative frequency curve that shows the percentage of
time that specified discharges are equaled or exceeded.

Steps

Arrange flows in chronological order
Find the number of records (N)
Sort the data from highest to lowest
Rank the data (m = 1 for the highest value and m = N for the lowest value)
Compute the exceedance probability (p) for each value from its rank m and the number of records N
Plot p on the x axis and Q (sorted) on the y axis
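The steps can be sketched in Python as below. The plotting position used here, p = 100 m / (N + 1) (the Weibull formula), is an assumption made for the example and is not confirmed by the text above; the function name and the flow values are also hypothetical.

```python
def flow_duration_curve(flows):
    """Return (p, Q) pairs: percent of time each flow is equaled or exceeded."""
    n = len(flows)
    # Sort from highest to lowest so that rank m = 1 is the largest flow.
    ranked = sorted(flows, reverse=True)
    # Assumed plotting position (Weibull): p = 100 * m / (N + 1).
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ranked, start=1)]


# Hypothetical flows (10^3 cfs), invented for illustration only.
for p, q in flow_duration_curve([300, 100, 200, 400]):
    print(f"{p:5.1f} %  {q}")
```

Plotting the first element of each pair on the x axis and the second on the y axis gives the curve; the flow at p = 50 % is the median flow.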

Flow duration curve in Excel

[Figure: flow duration curve, Q (1000 cfs) versus % of time Q will be exceeded, with the median flow marked]

Statistical analysis

Regression analysis
Mass curve analysis
Flood frequency analysis
Many more which are beyond the
scope of this class!


Linear Regression
A technique to determine the relationship between
two random variables.
Relationship between discharge and velocity in a stream
Relationship between discharge and water quality
constituents

A regression model is given by:

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$

$y_i$ = ith observation of the response (dependent variable)
$x_i$ = ith observation of the explanatory (independent) variable
$\beta_0$ = intercept
$\beta_1$ = slope
$\varepsilon_i$ = random error or residual for the ith observation

Least squares regression

We have x1, x2, ..., xn and y1, y2, ..., yn observations of independent and dependent variables, respectively.
Define a linear model for $y_i$: $\hat{y}_i = \beta_0 + \beta_1 x_i, \quad i = 1, 2, \ldots, n$
Fit the model (find b0 and b1, the estimates of $\beta_0$ and $\beta_1$) such that the sum of the squares of the vertical deviations is minimum:
Minimize $\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$

Regression applet: http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
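Setting the derivatives of the sum of squares with respect to the intercept and slope to zero gives the standard closed-form estimates b1 = Sxy / Sxx and b0 = ybar - b1 xbar. A minimal Python sketch follows; the function name and the data are hypothetical.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared vertical deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy and Sxx are the centered cross-product and sum of squares.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


# Hypothetical observations, invented for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
b0, b1 = least_squares_fit(x, y)
print(f"y = {b0:.3f} + {b1:.3f} x")
```

The fit requires at least two distinct x values; otherwise Sxx is zero and the slope is undefined.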

Linear Regression in Excel


Steps:
Prepare a scatter plot
Fit a trend line
[Figure: scatter plot with fitted trend line of TDS (mg/L) versus specific conductance (µS/cm). Data are for Brazos River near Highbank, TX. Trend line: TDS = 0.5946(sp. cond) - 15.709, R2 = 0.9903]

Alternatively, one can use Tools → Data Analysis → Regression

Coefficient of determination ($R^2$)
It is the proportion of observed y variation that can be explained by the simple linear regression model

$R^2 = 1 - \frac{SSE}{SST}$

$SST = \sum (y_i - \bar{y})^2$: total sum of squares, where $\bar{y}$ is the mean of $y_i$
$SSE = \sum (y_i - \hat{y}_i)^2$: error sum of squares

The higher the value of $R^2$, the more successful the model is in explaining y variation.
If $R^2$ is small, search for an alternative model (nonlinear or multiple regression model) that can more effectively explain y variation.
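The definition can be checked numerically with a short Python sketch. The function name r_squared and the values are hypothetical; y_hat stands for the values predicted by whatever model was fitted.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
    return 1.0 - sse / sst


# Hypothetical observed and predicted values, invented for illustration only.
y = [1.1, 2.9, 5.2, 6.8]
y_hat = [1.0, 3.0, 5.0, 7.0]
print(f"R2 = {r_squared(y, y_hat):.4f}")
```

A model that always predicts the mean of y gives SSE = SST and therefore an R2 of 0, while a perfect fit gives 1.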
