
The "Bijlmer disaster"; an El-Al freight plane crashed into a block of flats in the Bijlmermeer (6-10-'92).


PROBABILITY IN CIVIL ENGINEERING

Chapter 2: Probability calculus


2.1 THE CONCEPT OF PROBABILITY

2.1.1 LAPLACE'S DEFINITION OF PROBABILITY

Laplace's (1812) classic definition of probability reads:


$\text{probability} = \dfrac{\text{number of outcomes for which the event } E \text{ occurs}}{\text{number of possible results}}$   (2.1)

"If during an experiment in total n different and equally probable results are possible, and if for precisely m of the n outcomes the event E occurs, then the probability of this event equals m/n. Laplace's definition mentions "equally probable results. This concept actually implies that all results should have an equal probability. This means that in this definition of probability the concept of probability has already been used. Furthermore, the probability of an event is not defined if all outcomes are not equally probable. Thus it will be clear that there is a need for a more specific formulation of the concept of probability.

2.1.2 EXPERIMENTAL DEFINITION OF PROBABILITY

When an experiment is repeated n times, in which the event E occurs m times, the frequency quotient of the event equals m/n. The experimental law of large numbers states that, for increasing n, the value of the frequency quotient m/n of a certain event converges to a fixed value. In formula:

$P(E) = \lim_{n \to \infty} \dfrac{m}{n}$   (2.2)

The larger the number of experiments n, the better the probability of event E is estimated. An objection to the use of this definition for the determination of probabilities is the large number of experiments needed for a reliable estimate.
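The convergence of the frequency quotient is easy to illustrate by simulation. A minimal sketch (Python with NumPy; the die experiment is merely an illustrative choice, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Event E: a fair die shows a six; P(E) = 1/6 ≈ 0.1667.
for n in (100, 10_000, 1_000_000):
    rolls = rng.integers(1, 7, size=n)  # n independent experiments
    m = np.count_nonzero(rolls == 6)    # number of outcomes for which E occurs
    print(f"n = {n:>9}: m/n = {m / n:.4f}")
```

The slow convergence (the error decreases roughly with $1/\sqrt{n}$) illustrates the stated objection: a reliable estimate requires very many experiments.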

2.1.3 DEFINITION OF THE CONCEPT OF PROBABILITY BY MEANS OF AXIOMS

The concept of probability can be introduced according to a number of basic rules, which are imposed as axioms. Prior to naming the probability axioms, a number of notions have to be amplified. Consider e.g. the wave height at sea. Measuring these heights leads to a number of distinguishable results. The set of all possible outcomes of the wave height is called the solution space $\Omega$. An event E occurs when the outcome of an experiment meets the description of E, e.g. E is all the wave heights between 0 and 1 m. The results corresponding to E form a subset of $\Omega$. The result of the experiment is certainly an element of $\Omega$; therefore $\Omega$ is called the certain event. The empty set $\emptyset$, which does not contain a result of the experiment, is called the impossible event. The complementary event $\overline{E}$ is the event that E does not occur.


The intersection

$\bigcap_{i=1}^{n} E_i$ of $E_1, E_2, \ldots, E_n$

is the event that each of these events occurs. In the Venn diagram in figure 2.1 the intersection of six events is indicated by the hatched area.

Figure 2.1  Intersection.
Figure 2.2  Union.

The union

$\bigcup_{i=1}^{n} E_i$ of $E_1, E_2, \ldots, E_n$

is the event that at least one of these events occurs (see the hatched part of figure 2.2). If $E_1 \cap E_2 = \emptyset$, then E1 and E2 are called mutually exclusive or disjoint events. The definition of probability by means of axioms reads: "The probability is a function, symbolised by the letter P, defined for events Ei in the total set of all possible events. It is a measure for the probability of the occurrence of the events Ei." The function has to satisfy three probability axioms, namely:

$P(E) \ge 0$ for every event $E \subseteq \Omega$   (2.3)

$P(\Omega) = 1$   (2.4)

$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$, if the events $E_i$ are disjoint.   (2.5)
The rules for probability calculations are defined with the help of these probability axioms. These mathematical rules are stated in appendix A.

2.1.4 SUBJECTIVE CONCEPT OF PROBABILITY

In many cases a mathematical substantiation of the probability of a certain event E is not possible, owing to a lack of relevant statistical data. In such a situation the determination of the probability of such an event will generally be a matter of instinctive considerations. In that case a subjective probability is involved.


Even in the case where a probability is determined by means of statistics, subjectivity may be involved: for example, when someone does not have all relevant information at his disposal, or when merely a part of the available information is used for the sake of simplicity. The subjective concept of probability is usually a controversial subject in discussions among mathematicians and users of statistics. For risk analysts it is usually unavoidable to make use of the subjective concept of probability. Instinctive considerations are then combined with available statistical information. The foundation for such a combination was laid by Thomas Bayes; the method is thus known as the Bayesian method (see appendix C).

2.2 RANDOM VARIABLES

2.2.1 TYPES OF RANDOM VARIABLES

If the outcomes of an experiment are uncertain, one speaks of a random variable. The random variable, denoted by a letter such as X or Y, maps the results of the experiment to real numbers, so that probabilities can be assigned to them. This is shown schematically in fig. 2.3.

Figure 2.3  Schematic representation of a random variable as an uncertain outcome (solution space → event → random variable → real numbers/interval).

As an example consider again the wave height at sea. The value of the wave height, H, has units of metres. A possible definition of the random variable might simply be Y(H) = H (m). In this case, the parameter which is measured is itself called the random variable; this is common in many engineering applications. A random variable can be either continuous or discrete. The wave height H above is a continuous random variable, because the variable can attain any value between 0 and $\infty$. An example of a discrete random variable arises when we classify the continuous wave height as follows:

$Y(H) = \begin{cases} 1 & \text{if } 0 \le H < 0.5 \text{ m} \\ 2 & \text{if } 0.5 \le H < 1.0 \text{ m} \\ 3 & \text{if } 1.0 \le H < 1.5 \text{ m} \\ 4 & \text{if } H \ge 1.5 \text{ m} \end{cases}$
In this case, the random variable can attain only four discrete integer values.
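A minimal sketch of this classification and the resulting empirical probability mass function (Python; the measured wave heights below are hypothetical numbers):

```python
import numpy as np

def classify(H):
    """Map a continuous wave height H (m) to the discrete classes 1..4."""
    if H < 0.5:
        return 1
    if H < 1.0:
        return 2
    if H < 1.5:
        return 3
    return 4

# Hypothetical wave height observations (m).
heights = np.array([0.3, 0.7, 1.2, 0.4, 1.8, 0.9, 1.1, 0.6])
Y = np.array([classify(H) for H in heights])

# Relative frequencies as an estimate of the probability mass function p_Y.
for y in (1, 2, 3, 4):
    print(f"p_Y({y}) ≈ {np.mean(Y == y):.3f}")
```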


2.2.2 PROBABILITY MASS FUNCTION, PROBABILITY DISTRIBUTION FUNCTION AND PROBABILITY DENSITY FUNCTION

For discrete random variables it is possible to define the probability that the variable will attain a certain value. This is done with the probability mass function:

$p_X(X) = P(\mathrm{X} = X)$   (2.6)
What is actually stated is that pX(X) is the probability that the variable X will attain the specified value X, where X is a real number. Note that the random variable (with an uncertain value) is denoted by an upright capital letter, whereas a specific value of the variable is denoted by a capital italic letter. Figure 2.4 shows a probability mass function. The variable X can attain only four discrete values. The probability axioms directly lead to the following:

$0 \le p_X(X) \le 1$   (2.7)

$\sum_{i=1}^{\infty} p_X(X_i) = 1$   (2.8)

$P(a < X \le b) = \sum_{X_i \le b} p_X(X_i) - \sum_{X_i \le a} p_X(X_i)$   (2.9)
The probability distribution function of a random variable gives the probability that this variable is smaller than or equal to a certain value (see fig. 2.4). The probability distribution function of X is related to the probability mass function by:

$P_X(X) = P(\mathrm{X} \le X) = \sum_{X_i \le X} p_X(X_i)$   (2.10)

Figure 2.4  Probability mass function and probability distribution function.

A continuous random variable can attain an infinite number of values (even within a small domain). The probability that the variable will attain a certain value exactly is therefore zero, so the probability mass function of a continuous random variable is not defined. The probability distribution function, however, remains defined as:


$F_X(X) = P(\mathrm{X} \le X)$   (2.11)

The probability axioms lead to:

$F_X(\infty) = 1$   (2.12)

$F_X(-\infty) = 0$   (2.13)

$F_X(X)$ is monotonically non-decreasing.

In general it is useful to know the derivative of the probability distribution function. This derivative is called the probability density function and is defined as (see fig. 2.5):

$f_X(X) = \dfrac{dF_X(X)}{dX}$   (2.14)

Figure 2.5  Probability density function and probability distribution function.

The mathematical rules for probabilities (probability mass functions) are also valid for the derived function. The following applies for the probability density function:

$f_X(-\infty) = 0$   (2.15)

$f_X(\infty) = 0$   (2.16)

$f_X(X) \ge 0$   (2.17)

$\int_{-\infty}^{\infty} f_X(X)\,dX = 1$   (2.18)

There are probability distributions for which the probability density function is defined, but for which the probability distribution function can only be presented as:

$F_X(X) = \int_{-\infty}^{X} f_X(t)\,dt$


because the integral does not have an analytical solution. This underlines the importance of the probability density function. Appendix B gives some common probability distribution types with corresponding probability density functions.
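The normal distribution is the best-known case in which the distribution function exists only as this integral. A minimal numerical sketch (Python; the trapezoidal rule and the lower integration bound are implementation choices, not from the text):

```python
import numpy as np

def f_X(t):
    """Standard normal probability density function."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def F_X(x, lower=-10.0, n=20_000):
    """F_X(x) = integral of f_X from -infinity to x, approximated
    with the trapezoidal rule on [lower, x]."""
    t = np.linspace(lower, x, n)
    y = f_X(t)
    dt = t[1] - t[0]
    return float(np.sum(0.5 * (y[:-1] + y[1:]) * dt))

print(F_X(0.0))    # ≈ 0.5
print(F_X(1.645))  # ≈ 0.95
```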

2.2.3 CHARACTERISTICS OF A RANDOM VARIABLE

A random variable is defined by its probability distribution. For the evaluation of and calculation with random variables it is often useful to also know a couple of characteristics that can be derived from the probability distribution. A random variable is completely determined by the type of probability distribution and these characteristics. In practice it is often simpler to estimate these characteristics than to determine the exact probability distribution. The expected value of a random variable X is defined by:

$E(X) = \dfrac{\int_{-\infty}^{\infty} X f_X(X)\,dX}{\int_{-\infty}^{\infty} f_X(X)\,dX}$   (2.19)

This is the determination of the weighted average of X, or the X co-ordinate of the centre of mass of the probability density function. The expected value of X is therefore also called the average and is denoted as $\mu_X$. As:

$\int_{-\infty}^{\infty} f_X(X)\,dX = 1$,

$E(X) = \int_{-\infty}^{\infty} X f_X(X)\,dX$   (2.20)

applies, which corresponds to the determination of the static moment of the probability density function relative to the axis X = 0 (see fig. 2.6). For this reason the expected value of X is also referred to as the first moment. $E(X^2)$ can be determined in the same way. This corresponds to the moment of inertia of the probability density function relative to the axis X = 0 and is therefore referred to as the second moment. In general the kth moment is defined by:

$E(X^k) = \int_{-\infty}^{\infty} X^k f_X(X)\,dX$   (2.21)

Analogously, the moments can be calculated relative to the axis $X = \mu_X$ (see fig. 2.7). These moments are called the central moments; the general formulation reads:


$m_k = E\left[(X - \mu_X)^k\right] = \int_{-\infty}^{\infty} (X - \mu_X)^k f_X(X)\,dX$   (2.22)

Figure 2.6  First moment.
Figure 2.7  First central moment.

By definition the first central moment equals zero. The second central moment is known as the variance and is denoted as Var(X) or $\sigma_X^2$:

$\mathrm{Var}(X) = \sigma_X^2 = E\left[(X - \mu_X)^2\right]$   (2.23)

The positive square root of the variance, $\sigma_X$, is called the standard deviation and is a measure of the spread around the average. For the evaluation of the spread of a random variable around the average, not so much the absolute value of the standard deviation as its value relative to the average is of importance. This relative value is represented by the coefficient of variation $V_X$, which is defined as:

$V_X = \dfrac{\sigma_X}{\mu_X}$   (2.24)

In the case $\mu_X = 0$ the coefficient of variation is not defined; however, there is also no need for a relative value of the standard deviation in this case. For negative mean values, the absolute value of the mean is used to calculate the coefficient of variation. "Standardised" central moments are also defined; note the standard deviation in the denominator. The third standardised central moment $\beta_1 = m_3/\sigma_X^3$ is a measure of the asymmetry or skewness, and the fourth standardised central moment $\beta_2 = m_4/\sigma_X^4$ is a measure of the kurtosis or peakedness of the probability density function. Figure 2.8 gives three different probability density functions, drawn with the same average and the same standard deviation.


Figure 2.8  Skewness and kurtosis.

The standardised third central moment equals zero for probability density function 1; this function is symmetrical relative to the average. Probability density function 2 has its tail to the right of the average and its third central moment is thus positive. For probability density function 3 the tail lies to the left of the average and the third central moment is negative. The standardised fourth central moments of probability density functions 2 and 3 are larger than that of function 1.

2.3 RANDOM VECTORS

Frequently, observations contain pairs of random variables, e.g. simultaneously measured humidity X1 and temperature X2. Such an observation is expressed as a random vector:

$\mathbf{X} = (X_1, X_2)$   (2.25)

X is a two-dimensional random variable. Like the one-dimensional random variable, the random vector is defined by a probability distribution function. The probability distribution function for this vector reads:
$F_{\mathbf{X}}(\mathbf{X}) = F_{X_1,X_2}(X_1, X_2) = P\left((\mathrm{X}_1 \le X_1) \cap (\mathrm{X}_2 \le X_2)\right)$   (2.26)

From this it is easy to determine the joint probability density function of this random vector:

$f_{\mathbf{X}}(\mathbf{X}) = \dfrac{\partial^2 F_{\mathbf{X}}(\mathbf{X})}{\partial X_1\,\partial X_2}$   (2.27)

This function is depicted in figure 2.9 by means of contour levels. On the edges the single or marginal density functions $f_{X_1}(X_1)$ and $f_{X_2}(X_2)$ are given.


Figure 2.9  Joint probability density function.

The joint probability density function $f_{\mathbf{X}}(\mathbf{X})$ has the shape of a hill. It reveals that there is a certain correlation between the humidity X1 and the temperature X2. This correlation can be expressed as a number. First, analogous to equation (2.20), the expected values of the variables of a random vector are defined by:

$E(X_i) = \mu_{X_i} = \int_{-\infty}^{\infty} X_i f_{X_i}(X_i)\,dX_i$   (2.28)

and the variances can be found with:

$\sigma_{X_i}^2 = E\left[(X_i - \mu_{X_i})^2\right] = \int_{-\infty}^{\infty} (X_i - \mu_{X_i})^2 f_{X_i}(X_i)\,dX_i$   (2.29)

Beside the variances of the single random variables, the mixed central moment or covariance also plays an important part. This is defined by:

$\mathrm{Cov}(X_1, X_2) = E\left[(X_1 - \mu_{X_1})(X_2 - \mu_{X_2})\right] = E(X_1 X_2) - \mu_{X_1}\mu_{X_2}$   (2.30)

The correlation coefficient is a parameter derived from the covariance and the variances. It reads:

$\rho_{X_1,X_2} = \dfrac{\mathrm{Cov}(X_1, X_2)}{\sigma_{X_1}\sigma_{X_2}}, \qquad -1 \le \rho_{X_1,X_2} \le 1$   (2.31)

This coefficient is a measure of the linear dependence between two random variables. If $\rho_{X_1,X_2} = 0$, the variables are linearly uncorrelated. However, this tells nothing about a possible non-linear


correlation. The variables are fully correlated if $|\rho_{X_1,X_2}| = 1$. Physical relations may exist between the random variables of a vector. Such a case involves so-called dependent variables. If the relation is known exactly, it is possible to write a vector with dependent variables as a vector with independent variables by substituting the physical relations in the vector. In appendix A, the probability that two events both occur is described with:

$P(E_1 \cap E_2) = P(E_2)\,P(E_1 \mid E_2)$   (2.32)

According to this formulation the joint probability density function of a vector of two random variables X1 and X2 can be written as:

$f_{X_1,X_2}(X_1, X_2) = f_{X_1}(X_1)\, f_{X_2|X_1}(X_2 \mid X_1)$   (2.33)

in which $f_{X_2|X_1}(X_2 \mid X_1)$ is the conditional probability density function of X2, given that $\mathrm{X}_1 = X_1$:

$f_{X_2|X_1}(X_2 \mid X_1) = \dfrac{f_{X_1,X_2}(X_1, X_2)}{f_{X_1}(X_1)}$   (2.34)

If the variables X1 and X2 are statistically independent, then:

$f_{X_2|X_1}(X_2 \mid X_1) = f_{X_2}(X_2)$   (2.35)

applies.

In that case, the joint probability density function is defined by the product of the marginal probability density functions:

$f_{X_1,X_2}(X_1, X_2) = f_{X_1}(X_1)\, f_{X_2}(X_2)$   (2.36)

Given that the variables X1 and X2 are statistically independent, the following applies:

$E(X_1 X_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} X_1 X_2\, f_{\mathbf{X}}(X_1, X_2)\,dX_1\,dX_2$
$\qquad\quad\;\, = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} X_1 X_2\, f_{X_1}(X_1)\, f_{X_2}(X_2)\,dX_1\,dX_2$   (2.37)
$\qquad\quad\;\, = \int_{-\infty}^{\infty} X_2 f_{X_2}(X_2) \left[\int_{-\infty}^{\infty} X_1 f_{X_1}(X_1)\,dX_1\right] dX_2$
$\qquad\quad\;\, = E(X_1) \int_{-\infty}^{\infty} X_2 f_{X_2}(X_2)\,dX_2 = E(X_1)\,E(X_2)$
From this it follows that the covariance, according to equation (2.30), is zero. For vectors of n marginal random variables the functions can easily be extended. The probability


distribution function for a vector reads:

$F_{\mathbf{X}}(\mathbf{X}) = F_{X_1,X_2,\ldots,X_n}(X_1, X_2, \ldots, X_n) = P\left(\bigcap_{i=1}^{n} (\mathrm{X}_i \le X_i)\right)$   (2.38)

The joint probability density function of the random vector is:

$f_{\mathbf{X}}(\mathbf{X}) = \dfrac{\partial^n F_{\mathbf{X}}(\mathbf{X})}{\partial X_1\,\partial X_2 \cdots \partial X_n}$   (2.39)

Obviously, the reverse applies:

$F_{\mathbf{X}}(\mathbf{X}) = \int_{-\infty}^{X_1}\int_{-\infty}^{X_2} \cdots \int_{-\infty}^{X_n} f_{\mathbf{X}}(\mathbf{x})\,dx_n \cdots dx_2\,dx_1$   (2.40)

From this the probability distribution function of the single random variable X1 can be determined by multiple (n-fold) integration:

$F_{X_1}(X_1) = \int_{-\infty}^{X_1}\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x})\,dx_n \cdots dx_2\,dx_1$   (2.41)

The marginal probability density function of X1 can be found by differentiation:

$f_{X_1}(X_1) = \dfrac{dF_{X_1}(X_1)}{dX_1} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x})\,dx_n \cdots dx_3\,dx_2$   (2.42)

The preceding is based on a known probability distribution function or probability density function of the random vector, from which the marginal probability density functions can be determined. In practice, however, the interest is often in a random vector consisting of a number of random variables for which only the marginal probability density functions are known; the question is then how the distribution of the random vector is related to these marginals, and conversely. In addition to eq. (2.33), the joint probability density function of a vector of n random variables can be written as:

$f_{\mathbf{X}}(\mathbf{X}) = f_{X_1}(X_1)\, f_{X_2|X_1}(X_2 \mid X_1) \cdots f_{X_n|X_{n-1}\ldots X_2 X_1}(X_n \mid X_{n-1}, \ldots, X_2, X_1)$   (2.43a)


When all n random variables of the vector are independent, the joint probability density function is given by (extending equation 2.36):

$f_{\mathbf{X}}(\mathbf{X}) = f_{X_1}(X_1)\, f_{X_2}(X_2) \cdots f_{X_n}(X_n) = \prod_{i=1}^{n} f_{X_i}(X_i)$   (2.43)

For dependent random variables the marginal probability density functions offer insufficient information to determine the joint probability density function. For random vectors with n variables the mutual linear correlations between the variables are described with the covariance matrix, given by:

$$C_{XX} = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Var}(X_n) \end{pmatrix}$$   (2.44)

in which:

$\mathrm{Var}(X_i) = \mathrm{Cov}(X_i, X_i)$
If the covariance matrix only has values on the diagonal, it involves so-called uncorrelated base variables. With the help of linear algebra it is possible to transform a set of correlated variables into a set of uncorrelated variables. This transformation reads:

$\mathbf{Y} = A^{T}\mathbf{X}$   (2.45)

in which:

X is the vector with the correlated base variables;
A is the matrix with the orthonormal eigenvectors of $C_{XX}$ as column vectors;
$C_{XX}$ is the covariance matrix of X;
Y is the vector with uncorrelated base variables.

The expected values of the uncorrelated base variables are determined with:

$E(\mathbf{Y}) = A^{T} E(\mathbf{X})$   (2.46)

On the diagonal, the covariance matrix of Y contains the variances that equal the eigenvalues of CXX.
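A small sketch of this transformation (Python with NumPy; the covariance matrix and mean vector below are hypothetical numbers):

```python
import numpy as np

# Hypothetical covariance matrix and means of two correlated base variables.
C_XX = np.array([[4.0, 1.5],
                 [1.5, 1.0]])
mu_X = np.array([10.0, 5.0])

# eigh returns the eigenvalues and the orthonormal eigenvectors
# (as columns) of the symmetric matrix C_XX.
eigvals, A = np.linalg.eigh(C_XX)

C_YY = A.T @ C_XX @ A   # covariance matrix of Y = A^T X: diagonal
mu_Y = A.T @ mu_X       # expected values of Y, eq. (2.46)

print(np.round(C_YY, 12))  # off-diagonal terms vanish
print(eigvals)             # equal to the diagonal of C_YY
```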


2.4 FUNCTIONS OF RANDOM VARIABLES

2.4.1 FUNCTIONS OF ONE RANDOM VARIABLE

In practice, the risk analyst often encounters functions of random variables. For example, consider the wind velocity causing a pressure p. The velocity v is a random variable. For the pressure, $p = \frac{1}{2}\rho v^2$ applies, where $\rho$ is the density. Because v is a random variable, p is as a result also a random variable.

A random variable is characterised by the average and the variance, and it may also be necessary to know its probability distribution. In this subsection it is explained how to determine the average and the variance by determining the moments of the random variable Y. Subsequently the determination of the probability functions is explained. In 2.2.3 the kth moments are defined as the expected values of $X^k$. If in equation (2.21) $X^k$ is replaced by an arbitrary function Y = g(X), the expected value of the variable Y is defined as:

$E(Y) = \mu_Y = \int_{-\infty}^{\infty} Y f_X(X)\,dX = \int_{-\infty}^{\infty} g(X)\, f_X(X)\,dX$   (2.47)

For linear functions g(X):

$\mu_Y = g(\mu_X)$   (2.48)

applies. The kth central moment of the function Y is determined by:

$E\left[(Y - \mu_Y)^k\right] = \int_{-\infty}^{\infty} (Y - \mu_Y)^k f_X(X)\,dX$   (2.49)

The variance of Y is therefore:

$\mathrm{Var}(Y) = E\left[(Y - \mu_Y)^2\right] = \int_{-\infty}^{\infty} (Y - \mu_Y)^2 f_X(X)\,dX$   (2.50)

For the linear function Y = aX + b:

$\mathrm{Var}(Y) = E\left[\left(g(X) - g(\mu_X)\right)^2\right] = E\left[(aX + b - a\mu_X - b)^2\right] = a^2 \sigma_X^2$   (2.51)

For non-linear functions, the function can be approximated by means of a Taylor polynomial around the average of X. This is denoted as:


$g(X) \approx g(\mu_X) + \dfrac{dg(\mu_X)}{dX}(X - \mu_X) + \dfrac{1}{2!}\dfrac{d^2 g(\mu_X)}{dX^2}(X - \mu_X)^2 + \ldots + \dfrac{1}{n!}\dfrac{d^n g(\mu_X)}{dX^n}(X - \mu_X)^n$   (2.52)

Subsequently, the expected value of the polynomial can be determined as an approximation of the expected value of Y:
$E(Y) \approx \int_{-\infty}^{\infty} \left[g(\mu_X) + \dfrac{dg(\mu_X)}{dX}(X - \mu_X) + \dfrac{1}{2!}\dfrac{d^2 g(\mu_X)}{dX^2}(X - \mu_X)^2 + \ldots + \dfrac{1}{n!}\dfrac{d^n g(\mu_X)}{dX^n}(X - \mu_X)^n\right] f_X(X)\,dX$
$\qquad\; = g(\mu_X) + \dfrac{1}{2!}\dfrac{d^2 g(\mu_X)}{dX^2}\, E\left[(X - \mu_X)^2\right] + \ldots + \dfrac{1}{n!}\dfrac{d^n g(\mu_X)}{dX^n}\, E\left[(X - \mu_X)^n\right]$   (2.53)

The approximation of the expected value of Y is therefore a function of the average and the central moments of X. When a function is approximated by the first two terms of the Taylor polynomial, one speaks of a linearised function:

$g(X) \approx g(\mu_X) + \dfrac{dg(\mu_X)}{dX}(X - \mu_X)$   (2.54)

The expected value of Y can then be approximated by:

$E(Y) \approx g(\mu_X)$   (2.55)

This approximation is known as the Mean Value approximation. By substituting the derivative of g(X) for a in equation (2.51) an approximation for the variance of Y is found:

$\mathrm{Var}(Y) \approx \left(\dfrac{dg(\mu_X)}{dX}\right)^2 \sigma_X^2$   (2.56)

In the preceding, the average and the variance of Y are determined exactly or by approximation.
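A sketch of the Mean Value approximation for the wind pressure example $p = \frac{1}{2}\rho v^2$ (Python; the numerical values of $\rho$, $\mu_v$ and $\sigma_v$ are hypothetical, and the Monte Carlo check additionally assumes v to be normally distributed):

```python
import numpy as np

rho = 1.25                   # air density (kg/m^3), assumed value
mu_v, sigma_v = 20.0, 3.0    # hypothetical mean and std of the wind velocity (m/s)

g  = lambda v: 0.5 * rho * v**2   # p = 1/2 rho v^2
dg = lambda v: rho * v            # dg/dv

# Mean Value approximation, eqs. (2.55) and (2.56):
E_p     = g(mu_v)
sigma_p = abs(dg(mu_v)) * sigma_v
print(f"approximation: E(p) ≈ {E_p:.1f}, sigma_p ≈ {sigma_p:.1f}")

# Monte Carlo check of the approximation:
v = np.random.default_rng(2).normal(mu_v, sigma_v, 1_000_000)
print(f"simulation:    E(p) ≈ {g(v).mean():.1f}, sigma_p ≈ {g(v).std():.1f}")
```

The exact expected value is $\frac{1}{2}\rho(\mu_v^2 + \sigma_v^2)$, slightly above the approximation $g(\mu_v)$; the difference is the neglected second-order Taylor term.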

Hereafter it is explained how to determine the probability functions. If the function g(X) is monotonic, the probability distribution function is given by:

$F_Y(Y) = P(\mathrm{Y} \le Y) = P\left(\mathrm{X} \le g^{-1}(Y)\right) = F_X\left(g^{-1}(Y)\right)$   (2.57)

The probability density function of Y can be derived by differentiating equation (2.57), using definition (2.14). This gives:
$f_Y(Y) = \dfrac{dF_Y(Y)}{dY} = \dfrac{dF_X\left(g^{-1}(Y)\right)}{dY} = f_X\left(g^{-1}(Y)\right)\dfrac{dg^{-1}(Y)}{dY} = f_X(X)\dfrac{dX}{dY}$   (2.58)

This is illustrated in figure 2.10 for a function $Y = X^n$.


Figure 2.10  Probability density functions of the random variables X and Y = g(X).

The properties of a random variable Y, which is a function of another random variable X, can be determined using these equations.
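A minimal sketch of equations (2.57) and (2.58) (Python; the choice $Y = X^2$ with an exponentially distributed X is an illustrative assumption, not from the text):

```python
import numpy as np

# Monotonic function Y = X^2 on X >= 0, with f_X(x) = exp(-x) (exponential).
f_X   = lambda x: np.exp(-x)
g_inv = lambda y: np.sqrt(y)          # X = g^{-1}(Y)
dXdY  = lambda y: 0.5 / np.sqrt(y)    # dX/dY

f_Y = lambda y: f_X(g_inv(y)) * dXdY(y)   # eq. (2.58)

# Monte Carlo check of the density near y = 1:
rng = np.random.default_rng(3)
y_samples = rng.exponential(1.0, 1_000_000) ** 2
h = 0.01
print(f_Y(1.0))                                        # analytical, ≈ 0.184
print(np.mean(np.abs(y_samples - 1.0) < h) / (2 * h))  # empirical estimate
```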

2.4.2 FUNCTIONS OF SEVERAL RANDOM VARIABLES (FROM $\mathbb{R}^n$ TO $\mathbb{R}^n$)

A given random vector Y = (Y1, Y2, ..., Yn) is a function g = (g1, g2, ..., gn) of X = (X1, X2, ..., Xn), so that:

$Y_i = g_i(X_1, X_2, \ldots, X_n)$   (2.59)

In 2.4.1 formulae are given for the expected value and the probability density function of a function of a random variable. Completely analogously, the expected value and probability density function for a function of a random vector can be determined. The expected value of Y is determined by:

$E(\mathbf{Y}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \mathbf{g}(\mathbf{X})\, f_{\mathbf{X}}(\mathbf{X})\,dX_1 \cdots dX_n$   (2.60)

The probability density function of Y is given by:


$$f_{\mathbf{Y}}(\mathbf{Y}) = f_{\mathbf{X}}(\mathbf{X}) \left| \begin{matrix} \dfrac{\partial X_1}{\partial Y_1} & \cdots & \dfrac{\partial X_1}{\partial Y_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial X_n}{\partial Y_1} & \cdots & \dfrac{\partial X_n}{\partial Y_n} \end{matrix} \right|$$   (2.61)

The determinant with the partial derivatives is known as Jacobi's determinant or the Jacobian and is denoted as J. Equation (2.61) can now simply be written as:

$f_{\mathbf{Y}}(\mathbf{Y}) = f_{\mathbf{X}}(\mathbf{X})\,|J|$   (2.62)

Though the formulation seems relatively simple, the determination of $f_{\mathbf{Y}}(\mathbf{Y})$ usually requires a lot of calculation in practice. The foregoing will be clarified with an example.

EXAMPLE 2.1

The question is to determine the probability density function of the function Y1 = X1 + X2, in which X1 and X2 are independent, uniformly distributed random variables in the interval (0,1). X1 and X2 are graphically represented in figure 2.11.

Figure 2.11  Random variables X1 and X2.

The auxiliary variable Y2 is defined as Y2 = X1. X1 and X2 can be written as functions of Y1 and Y2, namely:

$X_1 = Y_2$
$X_2 = Y_1 - Y_2 = Y_1 - X_1$

The Jacobian is:

$J = \begin{vmatrix} 0 & 1 \\ 1 & -1 \end{vmatrix} = -1$, thus $|J| = 1$

The probability density function of Y1 is found by integration:


$f_{Y_1}(Y_1) = \int_{-\infty}^{\infty} f_{\mathbf{X}}(X_1, Y_1 - X_1)\,|J|\,dX_1$

As it is given that X1 and X2 are independent:

$f_{\mathbf{X}}(X_1, Y_1 - X_1) = f_{X_1}(X_1)\, f_{X_2}(Y_1 - X_1)$
From which follows:

$f_{Y_1}(Y_1) = \int_{-\infty}^{\infty} f_{X_1}(X_1)\, f_{X_2}(Y_1 - X_1)\,dX_1$

In a couple of cases this integral can be solved analytically. The solvability depends on the marginal probability density functions of X1 and X2. In this example it has been assumed that both X1 and X2 are uniformly distributed in the interval (0,1), so:

if $0 \le X_1 \le 1$ then $f_{X_1}(X_1) = 1$, otherwise $f_{X_1}(X_1) = 0$;
if $0 \le Y_1 - X_1 \le 1$ then $f_{X_2}(Y_1 - X_1) = 1$, otherwise $f_{X_2}(Y_1 - X_1) = 0$.


When integrating, these limits, within which the probability density functions are defined, have to be observed. This leads to the following solution:

if $0 \le Y_1 \le 1$ then: $f_{Y_1}(Y_1) = \int_{0}^{Y_1} 1 \cdot 1\,dX_1 = Y_1$

if $1 \le Y_1 \le 2$ then: $f_{Y_1}(Y_1) = \int_{Y_1 - 1}^{1} 1 \cdot 1\,dX_1 = 2 - Y_1$

and otherwise: $f_{Y_1}(Y_1) = 0$


See figure 2.12.

Figure 2.12  Probability density function of Y1.
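A quick Monte Carlo check of this triangular result (Python; the sample size and bandwidth are arbitrary choices):

```python
import numpy as np

# Y1 = X1 + X2 with X1, X2 independent and uniform on (0,1).
rng = np.random.default_rng(4)
y1 = rng.uniform(0, 1, 1_000_000) + rng.uniform(0, 1, 1_000_000)

h = 0.01  # half-width of the window used to estimate the density
for y in (0.25, 1.0, 1.75):
    empirical  = np.mean(np.abs(y1 - y) < h) / (2 * h)
    analytical = y if y <= 1 else 2 - y
    print(f"Y1 = {y}: empirical ≈ {empirical:.3f}, analytical = {analytical:.3f}")
```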


2.4.3 FUNCTIONS OF A RANDOM VECTOR (FROM $\mathbb{R}^n$ TO $\mathbb{R}$)

In many cases the risk analyst is interested in the probability distribution of a random variable Y which is defined as a function of a random vector X:

$Y = g(\mathbf{X}), \quad g: \mathbb{R}^n \to \mathbb{R}$   (2.61)

Often this form applies for the limit state function Z (see chapter 5). The probability density function can be found by first considering a function from $\mathbb{R}^n$ to $\mathbb{R}^n$. Suppose:

$Y_1 = g(\mathbf{X})$ and $Y_i = X_{i-1}$ for $i = 2, 3, \ldots, n$   (2.62)

If the function $g(\mathbf{X})$ is monotonic, it is possible to define the following inverse functions:

$X_i = Y_{i+1}$ for $i = 1, 2, \ldots, n-1$
$X_n = g^{-1}(\mathbf{Y}) = h(\mathbf{Y}) = h(Y_1, Y_2, \ldots, Y_n) = h(Y_1, X_1, X_2, \ldots, X_{n-1})$   (2.63)

The Jacobian can be defined by partial differentiation of the functions of $X_i$:

$$J = \begin{vmatrix} 0 & \dfrac{\partial X_1}{\partial Y_2} & 0 & \cdots & 0 \\ 0 & 0 & \dfrac{\partial X_2}{\partial Y_3} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ \dfrac{\partial X_n}{\partial Y_1} & \dfrac{\partial X_n}{\partial Y_2} & \cdots & & \dfrac{\partial X_n}{\partial Y_n} \end{vmatrix}$$   (2.64)

The probability density function of Y is:

$f_{\mathbf{Y}}(\mathbf{Y}) = f_{\mathbf{X}}(\mathbf{X})\,|J| = f_{\mathbf{X}}(X_1, X_2, \ldots, X_n)\,|J| = f_{\mathbf{X}}\left(X_1, X_2, \ldots, h(Y_1, X_1, \ldots, X_{n-1})\right)|J|$   (2.65)

From the probability density function of Y, the marginal probability density function of Y1 can be determined by means of integration:


$f_{Y_1}(Y_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{Y}}(\mathbf{Y})\,dY_n \cdots dY_3\,dY_2$   (2.66)
$\qquad\quad\, = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}\left(X_1, X_2, \ldots, h(Y_1, X_1, X_2, \ldots, X_{n-1})\right)|J|\,dX_{n-1} \cdots dX_2\,dX_1$

It is also possible to determine the probability density function of Y1 by first calculating the probability distribution function and subsequently differentiating it. Calculating the probability distribution function can be done with the help of the theorem of total probability (see appendix A5). The formulation of the probability distribution function is:

$F_{Y_1}(Y_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} P(\mathrm{Y}_1 < Y_1 \mid \mathbf{X})\, f_{\mathbf{X}}(X_1, X_2, \ldots, X_n)\,dX_1\,dX_2 \cdots dX_n$
$\qquad\quad\;\, = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \mathbf{1}\left(g(\mathbf{X}) - Y_1\right) f_{\mathbf{X}}(X_1, X_2, \ldots, X_n)\,dX_1\,dX_2 \cdots dX_n$   (2.67)

in which:

$\mathbf{1}\left(g(\mathbf{X}) - Y_1\right) = 1$ if $g(\mathbf{X}) - Y_1 < 0$;
$\mathbf{1}\left(g(\mathbf{X}) - Y_1\right) = 0$ if $g(\mathbf{X}) - Y_1 > 0$.
The probability density is now found by differentiation. This method for determining the probability density function and the probability distribution function of Y1 is particularly suitable for application in numerical methods.

Using the marginal probability density function the expected value and the variance of Y1 can be calculated. In the case of a linear function it is possible to calculate these values without determining the probability density. In this case the function Y1 is:

$Y_1 = g(\mathbf{X}) = a_1 X_1 + a_2 X_2 + \ldots + a_n X_n + b$   (2.68)

The expected value of Y1 is then:

$E(Y_1) = E(a_1 X_1 + a_2 X_2 + \ldots + a_n X_n + b) = a_1 E(X_1) + \ldots + a_n E(X_n) + b = a_1 \mu_{X_1} + \ldots + a_n \mu_{X_n} + b = g(\boldsymbol{\mu}_X)$   (2.69)

and the variance of Y1 is:


$\mathrm{Var}(Y_1) = E\left[(Y_1 - \mu_{Y_1})^2\right]$
$\qquad\quad\;\, = E\left[\left((a_1 X_1 + \ldots + a_n X_n + b) - (a_1 \mu_{X_1} + \ldots + a_n \mu_{X_n} + b)\right)^2\right]$
$\qquad\quad\;\, = E\left[\left(a_1(X_1 - \mu_{X_1}) + \ldots + a_n(X_n - \mu_{X_n})\right)^2\right]$   (2.70)
$\qquad\quad\;\, = E\left[\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j (X_i - \mu_{X_i})(X_j - \mu_{X_j})\right]$
$\qquad\quad\;\, = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}(X_i, X_j)$
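In matrix form, eqs. (2.69) and (2.70) read $E(Y_1) = \mathbf{a}^T\boldsymbol{\mu}_X + b$ and $\mathrm{Var}(Y_1) = \mathbf{a}^T C_{XX}\,\mathbf{a}$. A sketch with hypothetical numbers (Python):

```python
import numpy as np

a    = np.array([2.0, -1.0, 0.5])   # coefficients a_i (hypothetical)
b    = 3.0
mu_X = np.array([1.0, 4.0, 2.0])    # expected values of X (hypothetical)
C_XX = np.array([[1.0, 0.3, 0.0],   # covariance matrix of X (hypothetical)
                 [0.3, 2.0, 0.5],
                 [0.0, 0.5, 1.5]])

E_Y1   = a @ mu_X + b   # eq. (2.69)
Var_Y1 = a @ C_XX @ a   # eq. (2.70): double sum a_i a_j Cov(X_i, X_j)
print(E_Y1, Var_Y1)
```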
If the function is non-linear, it can be approximated around an arbitrary point by the first two terms of the Taylor polynomial:

$Y_1 = g(\mathbf{X}) \approx g(\mathbf{X}_0) + \sum_{i=1}^{n} \dfrac{\partial g(\mathbf{X}_0)}{\partial X_i}(X_i - X_{0i})$   (2.71)

The expected value of Y1 can then be approximated by:

$E(Y_1) = E(g(\mathbf{X})) \approx g(\mathbf{X}_0) + \sum_{i=1}^{n} \dfrac{\partial g(\mathbf{X}_0)}{\partial X_i}\left(\mu_{X_i} - X_{0i}\right)$   (2.72)

and the variance by:

$\mathrm{Var}(Y_1) \approx \sum_{i=1}^{n}\sum_{j=1}^{n} \dfrac{\partial g(\mathbf{X}_0)}{\partial X_i}\,\dfrac{\partial g(\mathbf{X}_0)}{\partial X_j}\,\mathrm{Cov}(X_i, X_j)$   (2.73)

If the expected value of X is chosen for $\mathbf{X}_0$, a so-called Mean Value approximation is used.

2.4.4 CENTRAL LIMIT THEOREM

This theorem concerns a special case of a function of a random vector as in section 2.4.3, namely:

$Y = \sum_{i=1}^{n} X_i$

An important property of random variables is given by the central limit theorem: "When a large number of independent random variables, of which none dominates, are added up, this results in a random variable that is normally distributed, irrespective of the starting distributions of the added variables." In figure 2.13 this is demonstrated for the sum of respectively 2, 3 and 4 independent random variables that are uniformly distributed between 0 and 1. Already the sum of 4 variables results in a distribution that is fairly similar to a normal distribution (except for the tails).


A consequence of the central limit theorem is that the sum of two normally distributed variables is once again normally distributed.

Figure 2.13  Central limit theorem.
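The convergence in figure 2.13 can be reproduced by simulation. A sketch (Python; comparing one tail probability with its standard normal value is merely one way of showing the slow convergence in the tails):

```python
import numpy as np

rng = np.random.default_rng(5)
for n in (2, 3, 4):
    # Sum of n independent U(0,1) variables: mean n/2, variance n/12.
    y = rng.uniform(0, 1, (1_000_000, n)).sum(axis=1)
    z = (y - n / 2) / np.sqrt(n / 12)   # standardised sum
    # For a standard normal variable, P(Z > 2) ≈ 0.0228.
    print(f"n = {n}: P(Z > 2) ≈ {np.mean(z > 2):.4f}")
```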

Analogously to the central limit theorem for the sum of a large number of independent random variables, the product of a large number of independent random variables is lognormally distributed.

2.5 EXTREME VALUE DISTRIBUTIONS

Many applications in civil engineering concern the largest or smallest value of a group of random variables. For instance, a designer would like to know the maximum wind load on a structure during the design period, and not per storm. Also, for a structure consisting of a number of elements, where the weakest element determines the strength of the structure, he would like to know the minimum strength. In the case of the wind velocity the group of random variables can be defined as:

X1: wind velocity in year 1
X2: wind velocity in year 2
X3: wind velocity in year 3
...
Xn: wind velocity in year n

From this group of variables the maximum or minimum value is extracted; these functions are the so-called extreme value functions. They are written as:

$Y = \min_{i=1}^{n} X_i = \text{smallest value of } X_1, X_2, \ldots, X_n$   (2.74)

$Y = \max_{i=1}^{n} X_i = \text{largest value of } X_1, X_2, \ldots, X_n$



If the variables X1 up to and including Xn are random, the extreme values are random variables too. The probability distributions of the variables X1 up to and including Xn are called the mother distributions. Those of the largest and smallest values are known as the extreme value distributions. This paragraph focuses on the extreme value distribution of a number of identically distributed random variables. This corresponds to the probability distribution of the largest or smallest value the variable can attain in a number of realisations. Using the mathematical laws for calculus of probability it is possible to determine the extreme value distributions from the mother distributions. Suppose that the mother distribution of a random variable X is known. The question is now what is the probability distribution of the maximum and minimum value of X for n realisations, given that the realisations do not influence each other. This is actually about n independent random variables with the same probability distribution. The probability that all realisations deliver values for X which are smaller than or equal to X, is given by:

$P(X_1 \le X \cap X_2 \le X \cap \ldots \cap X_n \le X) = P(X_1 \le X)\,P(X_2 \le X) \cdots P(X_n \le X)$   (2.75)

Because $P(X_1 \le X) = P(X_2 \le X) = \ldots = P(X_n \le X) = F_X(X)$:

$P(X_1 \le X \cap X_2 \le X \cap \ldots \cap X_n \le X) = \left(F_X(X)\right)^n$   (2.76)

This defines the probability that the values of X for all n realisations are smaller than or equal to X. The probability distribution for maxima is usually written as:

$F_{X_{\max}}(X) = \left(F_X(X)\right)^n$   (2.77)

Analogously, the probability that all realisations deliver values that are larger than X is determined by:

$P(X_1 > X \cap X_2 > X \cap \ldots \cap X_n > X) = \left(1 - F_X(X)\right)^n$   (2.78)

The probability that at least one of the realisations gives a value that is smaller than or equal to X, is complementary to the foregoing. The probability distribution of the minimum value of X with n samples is therefore:

$F_{X_{\min}}(X) = 1 - \left(1 - F_X(X)\right)^n$   (2.79)

From the probability distributions of the extreme values for maxima and minima, the probability density functions can be determined by differentiating with respect to X. The result is:

$f_{X_{\max}}(X) = \dfrac{dF_{X_{\max}}(X)}{dX} = n\, f_X(X)\left(F_X(X)\right)^{n-1}$ (extreme value distribution for the maximum)   (2.80)

$f_{X_{\min}}(X) = \dfrac{dF_{X_{\min}}(X)}{dX} = n\, f_X(X)\left(1 - F_X(X)\right)^{n-1}$ (extreme value distribution for the minimum)   (2.81)

In figures 2.14 and 2.15 the probability density functions are given of respectively the maxima and minima of X for a number of values of n. The mother probability density function of X is also drawn in both figures. The extreme value distribution does not have to be of the same type as the mother


distribution.

Figure 2.14  Probability density functions of the maxima of X.
Figure 2.15  Probability density functions of the minima of X.

For large values of n the extreme value distributions of the random variable approach a limited number of possible distribution types. These distributions are known as the asymptotic extreme value distributions (see appendix B6) and are subdivided in three main types. For the theoretical substantiation one is referred to [2.1] and [2.7]. It is noted that the convergence to the asymptotic extreme value distributions is very slow for increasing n, much slower than for example the convergence to a normal distribution. For the middle area of the mother distribution there usually is convergence, but for the tail of the distribution there is often hardly any convergence to speak of. Therefore one has to be cautious with assumptions concerning the type of the extreme value distribution.
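A sketch of eqs. (2.77) and (2.80) for a standard normal mother distribution (Python; the choice of mother distribution and the grid are illustrative assumptions):

```python
import numpy as np
from math import erf

f_X = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)  # mother density
F_X = lambda x: 0.5 * (1.0 + erf(x / np.sqrt(2)))         # mother distribution

def f_max(x, n):
    """Density of the maximum of n independent realisations, eq. (2.80)."""
    return n * f_X(x) * F_X(x) ** (n - 1)

xs = np.linspace(-3, 5, 8001)
for n in (1, 10, 100, 1000):
    mode = xs[np.argmax([f_max(x, n) for x in xs])]
    print(f"n = {n:>4}: mode of the maximum ≈ {mode:.2f}")
```

The mode shifts to the right and the density narrows for increasing n, as in figure 2.14.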

2.6 ESTIMATING DISTRIBUTIONS

2.6.1 INTRODUCTION

An important element in a risk analysis is the determination of the distributions of the random variables. The selection of the distribution type and the distribution parameters largely determines the outcome of the analysis. If a large amount of statistical data is available, for example hourly observations of the wind velocity over a long period, frequentist methods from classical statistics can be used. In many cases the amount of statistical data is inadequate. There are almost no statistical data available when, for example, the probability of human failure is considered for the pilot of a Boeing 747 landing at Schiphol airport in the dark in bad weather. Then one has to resort to more subjective methods for the estimation of distribution types and parameters. Prior to making an estimate of the distribution of a random variable, a number of considerations have to be contemplated. In the first place, distributions often have a theoretical basis, which makes them suited or not to describe a certain phenomenon. The most well-known are:
- the central limit theorem, according to which the sum of a large number of random variables is normally distributed and the product of the variables is lognormally distributed;
- the maximum/minimum of a large number of independent random variables is often distributed according to one of the asymptotic extreme value distributions.


In some cases the sought random variables are functions of other random variables, of which the distribution types are known. For example, the Rayleigh distribution applies for the height of sea waves. Then one can derive a distribution theoretically. In other cases there are considerations that exclude certain distributions. For example, in theory, if a variable can only attain positive values, a normal distribution does not qualify. However, one must not be too strict in maintaining this argument and should only apply it when the coefficient of variation is large: for example, when the probability of a negative value is in the order of 10⁻⁸, there is no reason to dismiss the normal distribution for most applications. In general, however, these considerations will not suffice, or will at least require verification. This can be done in two ways, namely the classical and the Bayesian way. Both procedures will be described. Besides following the formal procedure it is always helpful to draw the found distribution and the observation material in a figure. Preferably, both the distribution and the density function are drawn. At first sight some conclusions are then already possible, and considerations can be involved that are difficult to formalise. For example, sometimes the right or left tail is important and sometimes only the middle area. In 2.6.2 up to and including 2.6.4 several methods for the estimation of the parameters of a known distribution type are discussed, and in 2.6.5 methods for choosing or rejecting a distribution type are mentioned. The estimators are denoted with a ^; thus $\hat{p}$ is an estimator of p. The estimators for the average and the standard deviation are usually denoted by m and s respectively.

2.6.2 SUBJECTIVE PARAMETER ESTIMATION

Frequently there is a lack of statistical data concerning random variables, as in the earlier mentioned landing of a Boeing. In such cases one relies on the experience and intuition of experts, supported by data from literature. The estimate of the properties of random variables gathered in this way is subjective and often liable to discussion. But even when a lot of statistical material is available, more knowledge than just the statistical data is necessary to estimate the probability distribution of a variable. Some subjectivity can therefore not be ruled out. In general one attempts to define a lower and an upper limit, within which the value of the variable most probably lies, and to define a most likely value. In such cases, "most probably" is usually understood to mean "with a probability of 95 %". This gives two points of the probability distribution, namely:

F(upper limit) = 97.5 %
F(lower limit) = 2.5 %   (2.82)

Obviously, other limits can be chosen too. If the distribution type of the variable is known, the parameters of the probability distribution can be estimated using the chosen values. In most cases, the selection of the distribution type is also based on experience, intuition and literature. It is advisable to base the choice of the distribution type on an analysis of the physical factors that influence the value of the random variable.
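As a sketch: if a normal distribution is assumed, the two points of eq. (2.82) fix both parameters (Python; the expert limits below are hypothetical values for some physical quantity):

```python
from statistics import NormalDist

# Hypothetical expert estimates: the variable lies between these limits
# with 95 % probability, so F(lower) = 2.5 % and F(upper) = 97.5 %.
lower, upper = 25.0, 35.0

z = NormalDist().inv_cdf(0.975)        # ≈ 1.96
mu    = 0.5 * (lower + upper)          # symmetry of the normal distribution
sigma = (upper - lower) / (2.0 * z)
print(mu, sigma)                        # 30.0 and ≈ 2.55
```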

2.6.3 FREQUENTIST PARAMETER ESTIMATION (SEE ALSO APPENDIX C)

If the distribution type of a random variable is known and observations of the variable are available, the parameters of the distribution can be estimated by means of frequentist methods. The estimator of a parameter is a function of the observations and can be written as:

$\hat{p} = g(\mathbf{X}) = g(X_1, X_2, \ldots, X_n)$   (2.83)


in which:

X is a vector with the results of random sample surveys (observations) of X.

Because X is a random variable, the observations can also be considered random variables in advance. The estimator $\hat{p}$ is therefore a function of n random variables. Based on the expected value and the standard deviation of $\hat{p}$, several properties can be defined. The estimator $\hat{p}$ for a parameter p is called unbiased if the expected value of $g(\mathbf{X})$ equals p, so if $E(g(\mathbf{X})) = E(\hat{p}) = p$.

EXAMPLE 2.2

From a random sample of n observations the average mX is determined as an estimator of the average of the distribution of X, thus:

$m_X = \dfrac{X_1 + X_2 + \ldots + X_n}{n}$

The expected value for mX is:

$E(m_X) = \int_{-\infty}^{\infty} \dfrac{X_1 + X_2 + \ldots + X_n}{n}\, f_{\mathbf{X}}(\mathbf{X})\,d\mathbf{X}$
$\qquad\;\;\, = \dfrac{1}{n}\left[\int_{-\infty}^{\infty} X_1 f_{\mathbf{X}}(\mathbf{X})\,d\mathbf{X} + \int_{-\infty}^{\infty} X_2 f_{\mathbf{X}}(\mathbf{X})\,d\mathbf{X} + \ldots + \int_{-\infty}^{\infty} X_n f_{\mathbf{X}}(\mathbf{X})\,d\mathbf{X}\right]$
$\qquad\;\;\, = \dfrac{1}{n}\left(\mu_{X_1} + \mu_{X_2} + \ldots + \mu_{X_n}\right) = \mu_X$

From this it follows that mX is an unbiased estimator of $\mu_X$. If $E(\hat{p}) = p$ is only valid for large values of n, then $\hat{p}$ is an asymptotically unbiased estimator. Sometimes several unbiased estimators can be defined for one parameter of a probability distribution, not just one. The average of a normal distribution, for instance, can be estimated with:
- the average of the random sample;
- the median of the observations;
- the average of the highest and the lowest observation.

The difference between these estimators lies in their standard deviations. The average of the random sample has the smallest standard deviation and is therefore the most efficient estimator. It is illustrative to consider two often used estimators for the standard deviation:


$s_X = \sqrt{\dfrac{\sum (X_i - m_X)^2}{n}}$   (2.84)

$s_X = \sqrt{\dfrac{\sum (X_i - m_X)^2}{n-1}}$   (2.85)

If the average is unknown, the first-mentioned estimator is biased and the latter is unbiased. However, the biased estimator does have the smallest expected value of the mean square error. Some methods of frequentist parameter estimation are:
- the method of moments;
- the maximum likelihood method;
- the least squares method.

Appendix C elaborates on these methods.
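The bias of eq. (2.84) is easy to demonstrate by simulation (Python; the population parameters and sample size are arbitrary choices):

```python
import numpy as np

# 100 000 random samples of size n = 5 from a population with sigma = 2.
rng = np.random.default_rng(6)
samples = rng.normal(10.0, 2.0, size=(100_000, 5))

s_n  = samples.std(axis=1, ddof=0)   # divisor n,   eq. (2.84)
s_n1 = samples.std(axis=1, ddof=1)   # divisor n-1, eq. (2.85)

# The n-1 version is unbiased for the variance sigma^2 = 4:
print(np.mean(s_n**2))    # ≈ 3.2 (biased low)
print(np.mean(s_n1**2))   # ≈ 4.0 (unbiased)
```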

2.6.4 BAYESIAN PARAMETER ESTIMATION

Bayesian parameter estimation is a mix of subjective and frequentist parameter estimation. First, one determines the so-called a priori probabilities that a number of hypotheses concerning the parameters to be estimated are true. These hypotheses reflect the knowledge available when no statistical data are at hand yet. These probabilities are subjective and thus form the debatable side of the Bayesian analysis. For example, statistical data of a Dutch laboratory are not available yet, but statistical data from a German laboratory are used to formulate hypotheses concerning the requested parameters, and the a priori probabilities are estimated. When the statistical data from the Dutch laboratory become available (the objective material), they are combined with the a priori parameters by means of a standard procedure, to arrive at the so-called a posteriori parameters. This procedure is described in appendix C. The a posteriori probabilities give the likelihood of the posed hypotheses concerning the parameters to be estimated.

2.6.5 SELECTION OF DISTRIBUTION

In the foregoing it was presumed that the probability distribution type of the considered random variable was already known. Usually, however, this probability distribution type is not known. The choice of the type depends on the knowledge concerning the random variable. Often a subjective estimate of the type will have to be based on the intuition and knowledge of experts and on data from literature. If statistical data are available, an estimate of the distribution type can be made by employing frequentist methods. One of these methods uses the estimates of the standardised asymmetry or skewness and the standardised kurtosis or peakedness. The central moments of the distribution of a random variable can be estimated by determining the moments of the statistical material:

$\hat{m}_k = \dfrac{1}{n}\sum_{i=1}^{n}\left(X_i - \dfrac{1}{n}\sum_{i=1}^{n} X_i\right)^k$   (2.86)


in which:

$\hat{m}_k$ is the estimate of the kth central moment;
$X_i$ is observation i of X;
n is the number of observations of X.

With the help of these estimates of the central moments it is possible to estimate the so-called standardised skewness and the standardised kurtosis of the distribution:

$\hat{\beta}_1 = \dfrac{\hat{m}_3}{(\hat{m}_2)^{3/2}}$ respectively $\hat{\beta}_2 = \dfrac{\hat{m}_4}{(\hat{m}_2)^2}$   (2.87)

For various known distribution types the relations between the standardised skewness ($\beta_1$) and the standardised kurtosis ($\beta_2$) have been investigated by Pearson. Some of these relations are given in figure 2.16. Based on the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$, this figure can help select a distribution type.

Figure 2.16  Relations between $\beta_1$ and $\beta_2$ of different distribution types (after Professor E.S. Pearson, University College, London).

Practice has revealed that, even if many observations are available, it is almost impossible to find the exact distribution. Using figure 2.16 a number of possible distributions are usually selected. From these chosen distributions a further selection can be made by means of a number of tests. Appendix C describes two tests, namely the chi-square test and the Kolmogorov-Smirnov test. These tests are based purely on the statistical material; they do not take other knowledge, relevant for the choice of the distribution type, into account. A method with which this knowledge can be taken into account is the Bayesian procedure for the selection of the distribution type. This procedure is also described in appendix C.
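A sketch of eqs. (2.86) and (2.87) applied to observation material (Python; the two simulated data sets merely illustrate typical values of $\hat{\beta}_1$ and $\hat{\beta}_2$):

```python
import numpy as np

def standardised_moments(x):
    """Estimates of the standardised skewness and kurtosis, eqs. (2.86)-(2.87)."""
    d  = x - x.mean()
    m2 = np.mean(d**2)
    m3 = np.mean(d**3)
    m4 = np.mean(d**4)
    return m3 / m2**1.5, m4 / m2**2

rng = np.random.default_rng(7)
print(standardised_moments(rng.normal(0.0, 1.0, 100_000)))   # ≈ (0, 3): normal
print(standardised_moments(rng.exponential(1.0, 100_000)))   # ≈ (2, 9): skewed
```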


LITERATURE

Recommended literature:
2.1 GUMBEL, E.J., Statistics of Extremes. Columbia University Press, 1958.

Consulted literature:
2.2 BENJAMIN, J.R. and C.A. CORNELL, Probability, Statistics and Decision for Civil Engineers. McGraw-Hill, 1970.
2.3 BOLOTIN, V.V., Statistical Methods in Structural Dynamics. Holden-Day, San Francisco, 1989.
2.4 GROENEBOOM, P. et al., Statistics and Operational Analysis (in Dutch: "Statistiek en operationele analyse"). Delft University of Technology, Delft, 1993.
2.5 NOWAK, A.S. and R.K. COLLINS, Reliability of Structures. McGraw-Hill, 2000.
2.6 SCHNEIDER, J. and H.P. SCHLATTER, Sicherheit und Zuverlässigkeit im Bauwesen. Verlag der Fachvereine an den schweizerischen Hochschulen und Techniken AG, Zürich, and B.G. Teubner Verlag, Stuttgart, 1994.
2.7 VRIJLING, J.K. and P.H.A.J.M. VAN GELDER, Probabilistic Design in Hydraulic Engineering. Lecture notes CT5310, Delft University of Technology, 2002.
