Reliability Engineering
Chapter 4: Sampling
Sampling techniques, concepts of sampling, sampling plans, decisions based on sampling.
Learning objectives
1. Understanding various terms connected with uncertainty
2. Understanding the significance of incorporating uncertainties in design
3. Understanding the use of probability considerations to solve some problems
Motivation:
Importance of understanding the role of uncertainties in
engineering design
Highlights
Knowledge of terms such as uncertainty, randomness, modeling, probability
concepts and Bayes' theorem
1.1 Introduction
The trend in civil engineering today, more than ever before, is to
1. provide economical yet robust designs at specified levels of safety,
2. use new materials in construction (when newer materials are used in civil
engineering design, there is a need to understand to what extent the structure is
safe), and
3. consider uncertainties in design. One has to recognize that many processes in
civil engineering systems, such as data collection, analysis and design, are random
in nature. The design of facilities such as buildings, foundations, bridges, dams,
highways, airports, seaports, offshore structures, tunnels, sanitary landfills and
excavations needs to address these issues rationally.
The loading in civil engineering systems is never completely known; only some of its
features are known. Examples of such loading are the frequency and occurrence of
earthquakes, movement of ground water, rainfall patterns, and wind and ice loadings.
All these loadings are random in nature, and at times they create overloading
situations. What we have been doing so far can be shown schematically as follows:
Sampling → Testing → Formula (Assumption?) → Experience
At all stages indicated above, there is an element of uncertainty with regard to the
suitability of the site in terms of soils, construction materials, which we transfer to a
different level using a set of expressions to obtain the desired quantities such as the floor
capacity, allowable loads in buildings etc.
(a) The first group consists of uncertainties in material parameters, such as strength in
flexure, soil unit weight, cohesion, angle of internal friction, pore water pressure,
compressibility and permeability.
For example, in a so-called homogeneous soil each parameter may vary significantly.
Moreover, natural media, i.e. earth masses, are often heterogeneous and anisotropic, and
the soil profile is complex due to discontinuities and minor geological details.
(b) The second group consists of uncertainties in loads. Under static loading conditions,
one is concerned with dead and live loads, and there are usually more uncertainties in
relation to live loads. Structures and soil masses may also be subjected to dynamic loads
from earthquakes, wind and waves. Significant uncertainties are associated with such
random loads. Often the uncertainties associated with static loads may be negligible in
comparison to those associated with material parameters. On the other hand, uncertainties
associated with dynamic loads may be of the same order of magnitude or even greater
than those associated with material parameters. It should also be noted that under
dynamic loads, the magnitude of material parameters may change significantly.
For example, the shear strength of a soil decreases during cyclic loading and, as such,
there are additional uncertainties concerning geotechnical performance.
(c) The third group consists of uncertainties in mathematical modeling and methods of
analysis. Each model of soil behavior is based on some idealization of real situations.
Each method of analysis or design is based on simplifying assumptions, and arbitrary
factors of safety are often used.
In a deterministic method of analysis or design, one is concerned with relatively simple
cause-and-effect relationships. For each situation it is assumed that there is a single
outcome; for each problem, a single and unique solution. Of course, one may not be able
to arrive at the exact solution, and a unique solution may not even exist. In such
circumstances a deterministic approach aims at obtaining an approximate solution.
Empirical and semi-empirical methods have always been used in civil engineering,
although with varying degrees of success. Finally, in a deterministic method of analysis,
uncertainty is not formally recognized or accounted for; one is not concerned with the
probabilistic outcome but with well-defined outcomes which may or may not occur, that
is, either a 100% probability of occurrence or 0%, with no intermediate value.
For example, one may arrive at the conclusion that a foundation will be safe on the basis
that the safety factor, F, has a magnitude greater than one. On the other hand, one may
conclude that a foundation or a slope is not safe on the basis that the magnitude of the
factor of safety F is less than one. A given magnitude of F, e.g. F = 2.5, represents a
unique answer to a problem posed in specific terms with certain unique values of loads
and of shear strength parameters. In conventional analysis one is not concerned with the
reliability associated with this unique value.
In a probabilistic approach, on the other hand, one is concerned with the probability of
the outcomes that may occur. The probability may be any percentage between p = 0% and
p = 100%, or any fraction between p = 0 and p = 1. In a specific problem the number of
likely outcomes may be limited, and it may be possible to consider the probability of
each outcome.
1.4 Risk and reliability
In engineering practice, we routinely encounter situations that involve some event
that might occur and that, if it did, would bring with it some adverse consequence. We
might be able to assign probability to the occurrence of the event and some quantified
magnitude or cost to the adversity associated with its occurrence. This combination of
uncertain event and adverse consequence is the determinant of risk. In engineering
practice to assess risk, three things need to be defined.
1. Scenario,
2. Range of consequences,
3. Probability of the events leading to the consequences.
Based on the above, the risk analysis attempts to answer three questions:
1. What can happen?
2. How likely is it to happen?
3. Given that it occurs, what are the consequences?
Thus, in engineering, risk is usually defined as the product of the probability of an
adverse event and its consequence:

Risk = p c    -------------------(1)

When more than one event may lead to an adverse outcome, the above equation is
extended to the expectation of consequence over that set of events:

Risk = Σi pi ci    -------------------(2)
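The expectation in Eq. (2) is straightforward to compute. A minimal sketch in Python; the event list and its probabilities and costs below are hypothetical illustrations, not values from the text:

```python
# Risk as the expectation of consequence over a set of adverse events,
# Risk = sum_i p_i * c_i  (Eq. 2)

def risk(events):
    """events: list of (annual probability, consequence cost) pairs."""
    return sum(p * c for p, c in events)

# Hypothetical example: three failure scenarios for a facility
scenarios = [
    (1e-3, 2_000_000),    # minor damage
    (1e-4, 20_000_000),   # major damage
    (1e-6, 100_000_000),  # collapse
]
annual_risk = risk(scenarios)
print(annual_risk)  # expected annual consequence cost
```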
Table 1 Average risk of death of an individual from various human-caused and
natural accidents (US Nuclear Regulatory Commission 1975)

Accident Type              Total Number    Approximate Individual Risk
Motor Vehicles             55,791          1 in 4,000
Falls                      17,827          1 in 10,000
Fires and Hot Substances   7,451           1 in 25,000
Drowning                   6,181           1 in 30,000
Firearms                   2,309           1 in 100,000
Air Travel                 1,778           1 in 100,000
Falling Objects            1,271           1 in 160,000
Electrocution              1,148           1 in 160,000
Lightning                  160             1 in 2,500,000
Tornadoes                  91              1 in 2,500,000
Hurricanes                 93              1 in 2,500,000
All Accidents              111,992         1 in 1,600
Table 2 Average risk to society of multiple injuries or deaths from various human-caused
and natural accidents (US Nuclear Regulatory Commission 1975)

Type of event        Probability of 100      Probability of 1000
                     or more fatalities      or more fatalities
Human-caused:
  Airplane crash     1 in 2 yrs              1 in 2,000 yrs
  Fire               1 in 7 yrs              1 in 200 yrs
  Explosion          1 in 16 yrs             1 in 120 yrs
  Toxic gas          1 in 100 yrs            1 in 1,000 yrs
Natural:
  Tornado            1 in 5 yrs              Very small
  Hurricane          1 in 5 yrs              1 in 25 yrs
  Earthquake         1 in 20 yrs             1 in 50 yrs
  Meteorite impact   1 in 100,000 yrs        1 in 1,000,000 yrs
Risks are also classified by factors affecting how they are perceived:

Factor I : Voluntary vs. Involuntary

Factor II : Observable vs. Unobservable
  Observable                 Unobservable
  Known to those exposed     Unknown to those exposed
  Effect immediate           Effect delayed
  Old risk                   New risk
  Risk known to science      Risk unknown to science
1.5 F-N Charts

In a simple form, quantitative risk analysis involves identification of risks and of
damages/fatalities. It is recognized that in many cases the idea of an annual probability
of failure, based on F-N relationships (frequency of fatalities, F, versus number of
fatalities, N), is a useful basis. In the UK, risk criteria for land-use planning based on
annual F-N curves suggest upper and lower limits of 10^-4 and 10^-6 per annum for the
probability of failure or risk. Guidance on risk assessment is reasonably well developed
in many countries, such as the USA, Canada and Hong Kong. Whitman (1984), based on
collected data pertaining to the performance of different engineering systems, categorized
these systems in terms of annual probability of failure and their associated failure
consequences, as shown in Fig.1.
[Figure 2. F-N diagram adopted by the Hong Kong Planning Department for planning
purposes: annual frequency of N or more fatalities (from 1.E-02 down to 1.E-09) versus
number of fatalities N (1 to 10,000), with the chart divided into Unacceptable, ALARP
and Acceptable regions.]
The slope of the lines dividing the regions of acceptability expresses a policy decision
on the relative acceptability of low-probability/high-consequence risks versus high-
probability/low-consequence risks. The steeper the boundary lines, the more averse the
policy is to the former. The boundary lines in the Hong Kong guidelines are twice as
steep (in log-log space) as the slopes in the Dutch case. Note also that in the Hong Kong
case there is an absolute upper bound of 1000 on the number of deaths, no matter how
low the corresponding probability.
The role of probabilistic considerations is recognized by the US Army Corps of
Engineers, and guidelines on reliability-based design of structures have been suggested.
Fig.3 presents the classification. The annual probability of failure corresponds to an
expected factor of safety E(F), which is variable, the variability being expressed in
terms of the standard deviation of the factor of safety, σF. If the factor of safety is
assumed to be normally distributed, the reliability index (β) is expressed by

β = (E(F) − 1.0) / σF    (3)
Fig. 3 Relationship between reliability index (β) and probability of failure (pf)
(Phoon 2002; adapted from US Army Corps of Engineers 1997).
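For a normally distributed factor of safety, Eq. (3) and the corresponding probability of failure pf = Φ(−β) can be evaluated directly. A sketch; the E(F) and σF values used below are illustrative assumptions:

```python
from math import erf, sqrt

def reliability_index(mean_F, sigma_F):
    # Eq. (3): beta = (E(F) - 1.0) / sigma_F
    return (mean_F - 1.0) / sigma_F

def prob_failure(beta):
    # For a normally distributed F, pf = P(F < 1) = Phi(-beta),
    # using the standard normal CDF written via erf
    return 0.5 * (1.0 + erf(-beta / sqrt(2.0)))

beta = reliability_index(mean_F=2.0, sigma_F=0.5)  # assumed values
pf = prob_failure(beta)
print(beta, pf)  # beta = 2.0, pf about 0.023
```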
1.6 Consequence costs

The role of consequence costs has been realised in recent times and has been receiving
considerable attention in the geotechnical profession. Recently, the Joint Committee on
Structural Safety (JCSS 2000) presented relationships between the reliability index (β),
the importance of the structure and the consequences of failure. The committee divided
consequences into three classes based on risk to life and/or economic loss, and they are
presented in Tables 4 and 5 respectively.
From these tables it can be inferred that if the failure of a structure is of minor
consequence (i.e., C* ≤ 2), then a lower reliability index may be chosen. On the other
hand, if the consequence costs are higher (i.e., C* = 5 to 10) and the relative cost of
safety measures is small, higher reliability index values should be chosen. It can also
be noted from the tables that a reliability index in the range of 3 to 5 can be considered
acceptable in design practice.
Table 4. Relationship between reliability index (β), importance of structure and
consequences (JCSS 2000)

Relative cost of    Minor consequence   Moderate consequence   Large consequence
safety measure      of failure          of failure             of failure
Large               β = 3.1             β = 3.3                β = 3.7
Normal              β = 3.7             β = 4.2                β = 4.4
Small               β = 4.2             β = 4.4                β = 4.7
Table 5. Consequence classes based on risk to life and/or economic loss (JCSS 2000)

Consequences   C*             Risk to life and/or economic loss
Minor          C* ≤ 2         Small or negligible
Moderate       2 < C* ≤ 5     Medium or considerable
Large          5 < C* ≤ 10    High or significant

where C* is the normalized consequence cost (normalized with respect to initial cost).
From Tables 4 and 5 the following points are clear.
1. The targeted reliability indices vary from 3 to 5, depending on the expected level of
performance.
2. Consequence costs can also be considered in the analysis. If the consequence costs
are not significant compared to initial costs (C* ≤ 2), for example a slope design in
a remote area, a lower reliability index can be allowed, whereas a higher reliability
index is required where the consequence costs are high, for example a slope in an
urban locality.
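The selection of a target reliability index from Table 4 amounts to a simple lookup; a sketch, with keys following the table's row and column labels:

```python
# Target reliability indices from Table 4 (JCSS 2000):
# rows = relative cost of safety measure, cols = consequence of failure
TARGET_BETA = {
    ("large",  "minor"): 3.1, ("large",  "moderate"): 3.3, ("large",  "large"): 3.7,
    ("normal", "minor"): 3.7, ("normal", "moderate"): 4.2, ("normal", "large"): 4.4,
    ("small",  "minor"): 4.2, ("small",  "moderate"): 4.4, ("small",  "large"): 4.7,
}

def target_beta(relative_cost, consequence):
    return TARGET_BETA[(relative_cost, consequence)]

print(target_beta("normal", "large"))  # 4.4
```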
Axioms of probability

A popular definition of probability is in terms of relative frequency: if an outcome A
occurs T times in N equally likely trials,

P[A] = T / N

This implies that if a large number of trials were conducted, this is the probability
that would be observed. As the concept of repeated trials often does not exist in civil
engineering, a subjective interpretation is adopted, in which probability is a measure of
information as to the likelihood of occurrence of an outcome. The three axioms of
probability are:

Axiom I: 0 ≤ P[A] ≤ 1

Axiom II: P[S] = 1, where S is the certain event comprising the entire sample space

Axiom III: Outcomes are mutually exclusive if they cannot occur simultaneously. For
mutually exclusive outcomes A1, A2, ..., AN, the axiom states that

P[A1 + A2 + ... + AN] = P[A1] + P[A2] + ... + P[AN]    (4)
As an example, consider the design of a structure. After construction, only two
outcomes are possible: success or failure. They are mutually exclusive and, as no other
outcome is possible, also exhaustive. As per Axiom III,

P[Success] + P[Failure] = 1

The probability of success of the structure is its reliability R, given by

R + P[Failure] = 1, or
R = 1 − P[Failure]    (5)
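Eq. (5) in code form, with Axiom I used as a sanity check; a trivial but illustrative sketch:

```python
def reliability(p_failure):
    # Success and failure are mutually exclusive and exhaustive, so
    # P[Success] + P[Failure] = 1 and R = 1 - P[Failure]  (Eq. 5)
    if not (0.0 <= p_failure <= 1.0):  # Axiom I
        raise ValueError("probability must lie in [0, 1]")
    return 1.0 - p_failure

print(reliability(0.02))  # 0.98
```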
Venn diagram

A sample space is represented by a rectangle, and an event (E) is represented by a
closed region within it. The part outside E represents the complementary event Ē.

Combinations of events
Example: Let E1 denote the occurrence of rainfall and E2 the occurrence of an
earthquake; E1E2 denotes their intersection (joint occurrence).

If an event E1 occurs n1 times out of n trials, it does not occur n2 = n − n1 times, and
the probability of non-occurrence is

P[Ē1] = n2 / n = 1 − P[E1]    (6)

If E1 and E2 are mutually exclusive events occurring n1 and n2 times respectively out
of n trials, then

P[E1 + E2] = n1/n + n2/n = P[E1] + P[E2]    (7)
By the multiplication rule,

P[A Ei] = P[A] P[Ei|A] = P[Ei] P[A|Ei]    (8)

The above equation gives the probability P[Ei|A] when P[A|Ei] is known:

P[Ei|A] = P[A|Ei] P[Ei] / P[A]    (9)

If E1 and E2 are statistically independent, then P[E1|E2] = P[E1], and hence

P[E1 E2] = P[E1] P[E2]    (10)
Conditional Probability

The occurrence of an event may depend on the occurrence (or non-occurrence) of another
event. If this dependence exists, the associated probability is called a conditional
probability. The conditional probability of E1, given that E2 has occurred, is

P[E1|E2] = P(E1 E2) / P(E2)
For a set of N mutually exclusive and collectively exhaustive events Ai,

Σ (i = 1 to N) P[Ai] = 1

For a sample space with N = 5, suppose there is another event B which intersects A2,
A3, A4 and A5 but not A1. For example, the probability of the joint occurrence of B
and A2 is

P[B A2] = P[A2] P[B|A2]

The probability of the joint occurrence of B is dependent on the outcome of A2 having
occurred. Since A2 precipitates the part of B that it overlaps, it is said to be a prior
event; the occurrence of the part of B that overlaps A2 is called posterior. Now,
considering that we need to determine the occurrence of B as a joint event with A2, A3,
A4 and A5, one can write

P[B] = Σ P[Ai] P[B|Ai]    (11)

where i runs from 2 to 5. Hence

P[Ai|B] = P[Ai] P[B|Ai] / Σ (i = 1 to N) P[Ai] P[B|Ai]    (12)

This is called Bayes' theorem. This equation is very useful in civil engineering and
science, where, based on initial estimates, estimates of the outcome of an event can be
made. Once the results of the outcome are known, they can be used to determine revised
estimates. Here the probability of the event B is estimated knowing that its signatures
are available in the events Ai.
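Eqs. (11) and (12) can be sketched as short functions. The five-event sample space below mirrors the description above, but the numeric priors and likelihoods are made-up illustrations:

```python
def total_probability(priors, likelihoods):
    # Eq. (11): P[B] = sum_i P[Ai] * P[B|Ai]
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes(priors, likelihoods, i):
    # Eq. (12): P[Ai|B] = P[Ai] P[B|Ai] / sum_j P[Aj] P[B|Aj]
    return priors[i] * likelihoods[i] / total_probability(priors, likelihoods)

# Five mutually exclusive, exhaustive events A1..A5; B does not intersect A1
priors      = [0.1, 0.2, 0.3, 0.2, 0.2]   # hypothetical P[Ai]
likelihoods = [0.0, 0.5, 0.4, 0.3, 0.2]   # hypothetical P[B|Ai]

pB = total_probability(priors, likelihoods)
posteriors = [bayes(priors, likelihoods, i) for i in range(5)]
print(pB)               # 0.32
print(sum(posteriors))  # the posteriors sum to 1
```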
Example 1

Three contractors A, B and C are bidding for a project. A has half the chance that B
has, and B is two-thirds as likely as C to win the award. What is the probability of
each contractor getting the contract, if only one can get it?

Answer:

There are three contractors and exactly one will be successful:

P[A + B + C] = 1

As the outcomes are mutually exclusive, P[A] + P[B] + P[C] = 1.

But P[A] = (1/2) P[B] and P[B] = (2/3) P[C], i.e. P[C] = (3/2) P[B]. Therefore

(1/2) P[B] + P[B] + (3/2) P[B] = 1, i.e. 3 P[B] = 1

P[B] = 1/3, P[A] = (1/2)(1/3) = 1/6, and P[C] = (3/2)(1/3) = 1/2
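The system of relations in Example 1 can be checked numerically:

```python
# P[A] = P[B]/2, P[B] = (2/3) P[C], and P[A] + P[B] + P[C] = 1
pB = 1.0 / 3.0
pA = pB / 2.0
pC = 1.5 * pB
print(pA, pB, pC)  # 1/6, 1/3, 1/2
```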
Example 2

A load of 100 kg can be anywhere on a beam; consequently the reactions RA and RB can
be anywhere between 0 and 100 kg depending on the position of the load.
1. Identify the possibilities (10 ≤ RA ≤ 20 kg) and (RA > 50 kg).
2. Find the probability of each event of interest.

Answer:

Possibilities: 1. (10 ≤ RA ≤ 20 kg)   2. (RA > 50 kg)

Events of interest:
1. P(10 ≤ RA ≤ 20 kg) = 10/100 = 0.1
2. P(RA > 50 kg) = 50/100 = 0.5
Example 3

A contractor is planning the purchase of equipment, including bulldozers, needed for a
project in a remote area. The chances that a bulldozer will remain operational for six
months without breakdown are even. If he has 3 bulldozers, what is the probability that
exactly one bulldozer will still be in good condition after six months?

Answer:

Denoting the condition of each bulldozer after six months as G (good) or B (bad), the
eight equally likely combinations are:

GGG, GGB, GBG, GBB, BGG, BGB, BBG, BBB

Three combinations out of 8 (GBB, BGB, BBG) leave exactly one bulldozer in good
condition after six months. The probability of having exactly one bulldozer in good
condition after six months is therefore 3/8.
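The stated answer of 3/8 corresponds to exactly one bulldozer remaining in good condition; the count can be checked by enumeration, assuming the eight G/B combinations are equally likely:

```python
from itertools import product

# Each bulldozer is Good (G) or Bad (B) after six months, equally likely
states = list(product("GB", repeat=3))  # all 8 combinations
exactly_one_good = [s for s in states if s.count("G") == 1]
p = len(exactly_one_good) / len(states)
print(len(exactly_one_good), p)  # 3 combinations, probability 3/8
```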
Example 4
In a bore log of interest,the soil profile the results are as follows; sand = 30% silty sand
= 25%, silty clay = 25% and clay = 20%. This is the arrived result of bore logs at many
places. However there is some doubt that not all soil sampling is reliable. The adequacy
is represented by sand = 27%, silty sand = 10%, silty clay = 30% and clay = 50%. What
is the reliability of information from sampling at one of the random points?
Answer
30% sand
27% sand
2 5 % s ilt y
sand
1 0 % s ilt y
sand
2 5 % s ilty
c la y
3 0 % s ilty
c la y
2 0 % c la y
5 0 % c la y
P[ASC ] = 0.25
P[AC ] = 0.20
B
P = 0.27
AS
B
P
= 0.10
A
SS
B
P
= 0.75
ASC
B
P = 0.50
AC
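The total probability computation for Example 4 can be sketched directly, using the percentages from the problem statement:

```python
# Soil-type proportions (priors) and sampling reliability for each type
priors      = {"sand": 0.30, "silty sand": 0.25, "silty clay": 0.25, "clay": 0.20}
reliability = {"sand": 0.27, "silty sand": 0.10, "silty clay": 0.30, "clay": 0.50}

# Total probability theorem: P[B] = sum_i P[Ai] * P[B|Ai]
p_reliable = sum(priors[s] * reliability[s] for s in priors)
print(p_reliable)  # about 0.281
```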
Example 5

A fair coin (P[Head] = P[Tail] = 1/2) is tossed three times. What is the probability of
getting three heads?

Answer

The eight equally likely outcomes of three tosses are:

HHH, HHT, HTH, HTT, THH, THT, TTH, TTT

Only one of these (HHH) consists of three heads, so

P[three heads] = (1/2)(1/2)(1/2) = 1/8
Example 6

A sample from a pit containing sand is to be examined to decide whether the pit can
furnish adequate fine aggregate for making a road. Specifications suggest that the pit
should be rejected if at least one of four lots made from the pit is not satisfactory.
What is the probability that sand from the pit will be accepted if 100 bags are
available, of which 5 contain fine aggregate of poor quality?

Answer

The pit is accepted only if all four sampled lots are satisfactory. Each time sampling
is done, the bag is not replaced. After four trials,

P[acceptance] = (95/100)(94/99)(93/98)(92/97) = 0.812

For 50 bags with 5 of poor quality:

P[acceptance] = (45/50)(44/49)(43/48)(42/47) = 0.647

For 25 bags with 5 of poor quality:

P[acceptance] = (20/25)(19/24)(18/23)(17/22) = 0.383

Hence it can be noted that the population size has an influence on the acceptance
probability.
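The acceptance probabilities in Example 6 follow from sampling four bags without replacement; a sketch:

```python
def p_acceptance(total_bags, poor_bags, n_sampled=4):
    # Probability that none of the sampled bags is of poor quality,
    # sampling without replacement
    good = total_bags - poor_bags
    p = 1.0
    for k in range(n_sampled):
        p *= (good - k) / (total_bags - k)
    return p

for total in (100, 50, 25):
    print(total, round(p_acceptance(total, poor_bags=5), 3))
# 100 -> 0.812, 50 -> 0.647, 25 -> 0.383
```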
Example 7

Consider a 100 km highway; assume that the road and traffic conditions are uniform
throughout, so that accidents are equally likely anywhere on the highway.

(a) Find the probabilities of events A and B if A denotes an accident occurring between
km 0 and km 30, and B denotes an accident occurring between km 20 and km 60.

Answer

Since accidents are equally likely along the highway, it may be assumed that the
probability of an accident in a given interval of the highway is proportional to the
length of the interval. For an accident occurring along the 100 km highway,

P[A] = 30/100 and P[B] = 40/100

(b) If an accident occurs in the interval (20, 60), what is the probability that it
occurred between km 20 and km 30?

The overlap of A and B is the interval (20, 30), of length 10 km, so

P(A|B) = P(A ∩ B) / P(B) = (10/100) / (40/100) = 0.25
Example 8

Consider a chain system of two links. The applied force is 100 kg, and if the strength
of a link is less than 100 kg, it will fail. The probability of this happening to any
one link is 0.05. What is the probability of failure of the chain?

Answer

If E1 and E2 denote the failure of links 1 and 2, then failure of the chain is

P[E1 ∪ E2] = P[E1] + P[E2] − P[E1E2]
           = 0.05 + 0.05 − P[E2|E1] P[E1]
           = 0.1 − 0.05 P[E2|E1]

This depends on the mutual dependence of E1 and E2. For example, if the links are
randomly selected from two suppliers, E1 and E2 may be assumed to be statistically
independent. In such a case,

P[E2|E1] = P[E2] = 0.05, and hence P[E1 ∪ E2] = 0.1 − 0.05 × 0.05 = 0.0975

If the links are from the same manufacturer, P[E2|E1] = 1.0, and

P[E1 ∪ E2] = 0.1 − 0.05 = 0.05, the failure probability of a single link

Hence the failure probability of the chain system ranges from 0.05 to 0.0975, depending
on the conditional probability P[E2|E1].
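The bounds in Example 8 can be sketched as a function of the conditional probability P[E2|E1]:

```python
def chain_failure(p_link, p_e2_given_e1):
    # P[E1 U E2] = P[E1] + P[E2] - P[E2|E1] P[E1], with P[E1] = P[E2] = p_link
    return 2.0 * p_link - p_e2_given_e1 * p_link

p_independent = chain_failure(0.05, 0.05)  # links from different suppliers
p_dependent   = chain_failure(0.05, 1.0)   # links from the same manufacturer
print(p_independent, p_dependent)  # 0.0975 and 0.05
```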
Example 9

The following problem demonstrates the use of Bayes' theorem in updating information.
Consider that Mohammad Gazini, while planning his attacks on India from the west coast,
assumed that he had a 1% chance of winning (probability of win) if he attacked. This
represents the level of confidence in the first trial. The probability of win in
subsequent trials can be obtained from Bayes' theorem, taking P(T|A) = 1.0 and
P(T|Ā) = 0.5:

Ans:

P(A|T) = P(T|A) P(A) / [P(T|A) P(A) + P(T|Ā) P(Ā)]

The probability of win improves in the second and subsequent attacks, and at the end of
17 trials it becomes approximately 0.999, as shown below.

Trial   Prior P(A)   Computation                              Posterior P(A|T)
1       0.01         (1 × 0.01)/((1 × 0.01) + (0.5 × 0.99))   0.02
2       0.02         0.02/(0.02 + 0.5 × 0.98)                 0.04
3       0.04         0.04/(0.04 + 0.5 × 0.96)                 0.08
4       0.08         0.08/(0.08 + 0.5 × 0.92)                 0.15
5       0.15         0.15/(0.15 + 0.5 × 0.85)                 0.26
6       0.26         0.26/(0.26 + 0.5 × 0.74)                 0.41
7       0.41         0.41/(0.41 + 0.5 × 0.59)                 0.58
8       0.58         0.58/(0.58 + 0.5 × 0.42)                 0.73
9       0.73         0.73/(0.73 + 0.5 × 0.27)                 0.84
10      0.84         0.84/(0.84 + 0.5 × 0.16)                 0.91
11      0.91         0.91/(0.91 + 0.5 × 0.09)                 0.95
12      0.95         0.95/(0.95 + 0.5 × 0.05)                 0.974
13      0.974        0.974/(0.974 + 0.5 × 0.026)              0.987
14      0.987        0.987/(0.987 + 0.5 × 0.013)              0.993
15      0.993        0.993/(0.993 + 0.5 × 0.007)              0.9965
16      0.9965       0.9965/(0.9965 + 0.5 × 0.0035)           0.9983
17      0.9983       0.9983/(0.9983 + 0.5 × 0.0017)           0.9991
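The updating in Example 9 can be reproduced with a short loop, taking P(T|A) = 1.0 and P(T|Ā) = 0.5 as in the table:

```python
def update(prior, p_t_given_a=1.0, p_t_given_not_a=0.5):
    # Bayes' theorem: P(A|T) = P(T|A)P(A) / (P(T|A)P(A) + P(T|~A)P(~A))
    num = p_t_given_a * prior
    return num / (num + p_t_given_not_a * (1.0 - prior))

p = 0.01                 # initial confidence: 1% chance of winning
for trial in range(17):  # 17 successive updates
    p = update(p)
print(round(p, 4))       # close to 1 after 17 trials
```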
Learning objectives
1. Understanding various graphical methods of analysis like
histogram and frequency diagram.
2. Understanding the significance of probability
distributions
3. Understanding the importance of moments of distribution.
Motivation:
Importance of understanding the function of probability theory and
conditional probability in engineering design
Highlights
Knowledge of graphical representation such as Histogram and
frequency diagram, Moments of distribution, Bayesian Theorem
2.1 Introduction
Civil engineering systems deal with variable quantities, which are described in terms of
random variables and random processes. Once the objective of a system design, such as the
design of a building, is identified in terms of providing a safe and economical structure,
the variable resistances of members such as columns, beams and foundations, which must
sustain variable loads, can be analyzed within the framework of probability theory. The
probabilistic description of variable phenomena requires the use of appropriate measures.
Table 1 Total unit weight data (left: in order of depth; right: the same unit weights
sorted in ascending order)

No.  Depth (m)  Total unit      (x-x̄)²     (x-x̄)³     |  Depth (m)  Total unit
                weight (Kg/m³)  (Kg/m³)²   (Kg/m³)³    |             weight (Kg/m³)
1    0.15       1681.94         115.33     -310.76     |  52.43      1521.75
2    0.30       1906.20         2050.36    23189.92    |  2.29       1537.77
3    0.46       1874.16         1388.80    12936.51    |  1.52       1585.83
4    1.52       1585.83         1209.39    -10503.30   |  6.71       1585.83
5    1.98       1617.86         716.03     -4791.12    |  13.72      1585.83
6    2.29       1537.77         2188.12    -25573.47   |  31.09      1585.83
7    5.03       1826.10         637.53     4028.64     |  5.79       1601.85
8    5.79       1601.85         946.69     -7277.19    |  8.38       1601.85
9    6.71       1585.83         1209.39    -10503.30   |  11.43      1601.85
10   7.62       1633.88         517.40     -2947.40    |  15.24      1601.85
11   8.38       1601.85         946.69     -7277.19    |  24.84      1601.85
12   9.45       1617.86         716.03     -4791.12    |  37.03      1601.85
13   10.52      1617.86         716.03     -4791.12    |  1.98       1617.86
14   11.43      1601.85         946.69     -7277.19    |  9.45       1617.86
15   12.19      1617.86         716.03     -4791.12    |  10.52      1617.86
16   13.72      1585.83         1209.39    -10503.30   |  12.19      1617.86
17   15.24      1601.85         946.69     -7277.19    |  18.90      1617.86
18   18.44      1649.90         352.41     -1649.90    |  37.19      1617.86
19   18.90      1617.86         716.03     -4791.12    |  40.23      1617.86
20   21.79      1697.96         44.85      -76.89      |  7.62       1633.88
21   21.95      1746.01         27.23      36.84       |  27.89      1633.88
22   24.84      1601.85         946.69     -7277.19    |  34.14      1633.88
23   24.99      1665.92         217.85     -802.52     |  46.48      1633.88
24   27.89      1633.88         517.40     -2947.40    |  18.44      1649.90
25   30.94      1697.96         44.85      -76.89      |  24.99      1665.92
26   31.09      1585.83         1209.39    -10503.30   |  43.43      1665.92
27   37.03      1633.88         517.40     -2947.40    |  98.15      1665.92
28   37.19      1601.85         946.69     -7277.19    |  0.15       1681.94
29   40.23      1617.86         716.03     -4791.12    |  49.38      1681.94
30   43.43      1617.86         716.03     -4791.12    |  21.79      1697.96
31   46.48      1665.92         217.85     -802.52     |  30.94      1697.96
32   49.38      1633.88         517.40     -2947.40    |  82.91      1697.96
33   52.43      1681.94         115.33     -310.76     |  61.42      1713.98
34   58.37      1521.75         2578.97    -32714.50   |  85.80      1729.99
35   61.42      1858.14         1106.88    9201.00     |  21.95      1746.01
36   64.47      1713.98         8.01       -4.81       |  76.66      1746.01
37   73.61      1794.07         297.94     1284.68     |  82.75      1746.01
38   76.66      1826.10         637.53     4028.64     |  79.71      1762.03
39   79.80      1746.01         27.23      36.84       |  89.00      1778.05
40   82.75      1762.03         84.90      198.63      |  64.47      1794.07
41   82.91      1746.01         27.23      36.84       |  94.95      1794.07
42   85.80      1697.96         44.85      -76.89      |  104.09     1794.07
43   89.00      1729.99         1.60       0.00        |  125.43     1794.07
44   91.90      1778.05         176.20     581.47      |  131.67     1794.07
45   94.95      2002.31         4800.73    83118.19    |  101.04     1810.09
46   98.15      1794.07         297.94     1284.68     |  104.24     1810.09
47   101.04     1665.92         217.85     -802.52     |  5.03       1826.10
48   104.09     1810.09         451.72     2401.17     |  73.61      1826.10
49   104.24     1794.07         297.94     1284.68     |  113.23     1826.10
50   107.29     1810.09         451.72     2401.17     |  119.33     1826.10
51   110.19     1858.14         1106.88    9201.00     |  122.53     1826.10
52   110.34     1986.29         4262.51    69531.33    |  116.28     1842.12
53   113.23     1874.16         1388.80    12936.51    |  119.48     1842.12
54   116.28     1826.10         637.53     4028.64     |  125.58     1842.12
55   119.33     1842.12         856.99     6263.22     |  128.47     1842.12
56   119.48     1826.10         637.53     4028.64     |  134.72     1842.12
57   122.53     1842.12         856.99     6263.22     |  58.37      1858.14
58   125.43     1826.10         637.53     4028.64     |  107.59     1858.14
59   125.58     1794.07         297.94     1284.68     |  0.46       1874.16
60   128.47     1842.12         856.99     6263.22     |  110.34     1874.16
61   131.67     1842.12         856.99     6263.22     |  0.30       1906.20
62   131.67     1794.07         297.94     1284.68     |  137.62     1906.20
63   134.72     1842.12         856.99     6263.22     |  110.19     1986.29
64   137.62     1906.20         2050.36    23189.92    |  91.90      2002.31
The histogram conveys important information about variability in the data set. It shows the range
of the data, the most frequently occurring values, and the amount of scatter about the middle
values in the set.
There are several issues to consider in determining the number of intervals for a histogram.
1. The number of intervals should depend on the number of data points. As the number of
data points increases, the number of intervals should also increase.
2. The number of intervals can affect how the data are perceived. If too few or too many
intervals are used, then the distribution of scatter in the data will not be clear.
Experimentation with different intervals is one approach; in addition, the following
equation provides an empirical guide:

k = 1 + 3.3 log10(n)

where k is the number of intervals and n is the number of data points. As an example, k
is equal to 7 for the unit weight data set, with n equal to 64.
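The empirical guide above can be sketched as:

```python
from math import log10

def n_intervals(n):
    # Empirical guide for histogram intervals: k = 1 + 3.3 log10(n)
    return round(1 + 3.3 * log10(n))

print(n_intervals(64))  # 7, matching the unit weight data set
```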
Table 2 Frequency Plot Data for Total Unit Weight

Lower bound   Upper bound   Number of     Frequency of      Frequency         Cumulative
(Kg/m³)       (Kg/m³)       occurrences   occurrences (%)   density           frequency (%)
(a)           (b)           (c)           (d)               (% / Kg/m³) (e)   (f)
1441.66       1505.74       0             0.00              0.00              0
1505.74       1569.81       2             3.13              0.78              3
1569.81       1633.88       21            32.81             8.20              36
1633.88       1697.96       9             14.06             3.52              50
1697.96       1762.03       6             9.38              2.34              59
1762.03       1826.10       13            20.31             5.08              80
1826.10       1890.18       9             14.06             3.52              94
1890.18       1954.25       2             3.13              0.78              97
1954.25       2018.33       2             3.13              0.78              100
2018.33       2082.40       0             0.00              0.00              100
Total                       64            100
[Figure 1. Histogram (number of occurrences) and frequency plot (frequency of
occurrence, %) for the total unit weight data, n = 64; unit weight intervals from 1442
to 2018 Kg/m³.]
Note that the histogram and the frequency plot have the same shape and convey the same
information. The frequency plot is simply a normalized version of the histogram; because
it is normalized, the frequency plot is useful in comparing different data sets. Example
frequency plots are shown in Figs. 2 through 5. Fig. 2 shows the unit weight data, which
vary spatially. Fig. 3 shows an example of data that vary with time: monthly average
pumping rate measurements versus time for the leak detection system in a hazardous waste
landfill. The data vary from month to month due to varying rates of leachate generation
and waste placement.
[Figure 3. Frequency plot of monthly average flow rates for the leak detection system,
n = 23; flow rate from 0 to 10.52 cc/sec.]
Fig. 4 shows an example of data that vary between construction projects. The data are
the ratios of actual to estimated cost for the remediation of superfund (environmentally
contaminated) sites. The data vary between sites due to variations in site conditions,
weather, contractors, technology and regulatory constraints. Note that the majority of
projects have cost ratios greater than 1.0.

[Figure 4. Frequency plot of the ratio of actual to estimated remediation cost, n = 102.]
Fig.5 shows an example of data that vary between geotechnical testing laboratories. The data are
the measured friction angles for specimens of loose Ottawa sand. Although Ottawa sand is a
uniform material and there were only minor variations in the specimen densities, there is
significant variability in the test results. Most of this variability is attributed to differences in test
equipment and procedures between the various laboratories.
[Figure 5. Frequency plot of measured friction angles for loose Ottawa sand, n = 28;
friction angle from 10° to 50°.]
A plot in which the frequency of occurrence in each interval is divided by the interval
width to give a frequency density is called the frequency density plot. The objective in
dividing the frequency by the interval width is to normalize the histogram further: the
area below the frequency density plot (obtained by multiplying the bar heights by their
widths) is equal to 100%. This normalization will be useful in fitting theoretical random
variable models to the data. The frequency densities for the unit weight data are
calculated in Table 2; the units of frequency density are % per the units of the data,
i.e. % per Kg/m³ for the unit weight data.

[Frequency density plot for the total unit weight data, n = 64; unit weight from 1441.7
to 2082.4 Kg/m³, frequency density in %/(Kg/m³).]
[Cumulative frequency plot for the total unit weight data, marking the 50th and 90th
percentiles; total unit weight from 1441 to 2082 Kg/m³, cumulative frequency from 0 to
100%.]
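Percentiles like those marked on the cumulative frequency plot can be read directly from sorted data; a sketch using the nearest-rank definition on a small subset of the unit weight values:

```python
import math

def percentile(sorted_data, p):
    # Value below which p percent of the observations fall
    # (nearest-rank definition on already-sorted data)
    rank = max(1, math.ceil(p / 100.0 * len(sorted_data)))
    return sorted_data[rank - 1]

data = sorted([1521.75, 1585.83, 1601.85, 1617.86, 1633.88,
               1697.96, 1794.07, 1826.10, 1842.12, 1906.20])
print(percentile(data, 50))  # 50th percentile of this sample
print(percentile(data, 90))  # 90th percentile of this sample
```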
A second example of data for which a transformation is useful is undrained shear
strength data for a normally consolidated clay. A frequency plot of these data, from an
offshore boring in the Gulf of Mexico, is shown in Fig. 10. The data exhibit substantial
variability, ranging from 2,000 to 20,000 Kg/m². This frequency plot is misleading,
however, because much of the variability can be attributed to the shear strength
increasing with depth. To demonstrate this trend, a scatter plot of the undrained shear
strength versus depth is shown in Fig. 10. A more useful measure of undrained strength is
to normalize it by depth, as shown in Fig. 11. This scatter plot shows that the trend
with depth has now been removed from the data, and the variability in the shear strength
to depth ratio is much smaller than that in the undrained shear strength alone. A
frequency plot of the shear strength to depth ratio is shown in Fig. 12.
[Frequency plot of the undrained shear strength data; shear strength from 0 to 29,295
Kg/m².]
The central tendency of a random variable X is commonly measured by its mean or
expected value,

E[X] = ∫ x f(x) dx

which is essentially a weighted average. Other quantities used to denote the central
tendency include the mode and the median.

The mode is the most probable value of a random variable, the value that has the
maximum probability or the highest probability density.

The median is the value of the random variable at which values above and below it are
equally probable.

In general the mean, median and mode of a random variable are different if the density
function is not symmetric. However, if the Probability Density Function (PDF) is
symmetric and unimodal, these quantities coincide.
2.6 Variance and Standard deviation

For a discrete random variable, the variance is

Var(X) = Σ (xi − μx)² px(xi), summed over all xi

For a continuous random variable,

Var(X) = ∫ (x − μx)² fx(x) dx

Expanding the square,

Var(X) = ∫ (x² − 2 μx x + μx²) fx(x) dx = E(X²) − 2 μx E(X) + μx² = E(X²) − μx²

The standard deviation is the positive square root of the variance.

The mode is the value which occurs most frequently. A sample may have more than one
mode, in which case it is said to be multimodal. A cumulative distribution may show a
point of inflexion at the mode.
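The two equivalent variance forms above can be checked numerically on a small discrete distribution (the one with values 1, 2, 3 and probabilities 0.2, 0.3, 0.5 used later in the text):

```python
def mean(xs, ps):
    # E[X] = sum x * p(x)
    return sum(x * p for x, p in zip(xs, ps))

def variance(xs, ps):
    # Var(X) = sum (x - mu)^2 p(x)
    mu = mean(xs, ps)
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))

xs = [1, 2, 3]
ps = [0.2, 0.3, 0.5]
mu = mean(xs, ps)
var = variance(xs, ps)
ex2 = sum(x * x * p for x, p in zip(xs, ps))
print(mu, var, ex2 - mu ** 2)  # the two variance forms agree
```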
2.7. Moments
= fi
i =1
x=
fi
Suppose the discrete forces represent the probability of all possible occurrences, of N
=1
N
central tendency of the distribution .It is different from arithmetic mean (It may be equal in the
case of normal distribution, where in the expected value is obtained from probability of a random
variable) The following rules are the operation for expectation, since expectation is a linear
operator
1. E[ax + b] = aE[ x] + b
2. If x = x1 + x2 + x3 + ............xn
E[ x] = E[ x1 ] + E[ x2 ] + ........ + E[ xn ]
3. If Z = f1(X) + f2(Y), then E[f1(X) + f2(Y)] = E[f1(X)] + E[f2(Y)]
Again from statics, the moment of inertia about the centroid is

    M.I. = Iy = Σ (xi − x̄)² fi

which corresponds to the variance

    V[x] = Σ (xi − x̄)² fi
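The statics analogy above can be sketched numerically. The outcomes and probabilities below are illustrative assumptions, not values from the text:

```python
# Discrete "forces" fi acting as probabilities of outcomes xi (illustrative)
xs = [1, 2, 3, 4]
fs = [0.1, 0.2, 0.3, 0.4]

assert abs(sum(fs) - 1.0) < 1e-12  # exhaustive: M = sum(fi) = 1

mean = sum(x * f for x, f in zip(xs, fs))               # E[x]
var = sum((x - mean) ** 2 * f for x, f in zip(xs, fs))  # V[x], like Iy

# Linearity rule E[a x + b] = a E[x] + b, with a = 2, b = 5
lin = sum((2 * x + 5) * f for x, f in zip(xs, fs))
```

The last line illustrates rule 1 from the list above: `lin` equals `2 * mean + 5` exactly.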
To use probability or a probabilistic model for formulating and solving a given problem,
one accepts the view that the problem is concerned with a random phenomenon or phenomena.
Significant parameters influencing the problem are random variables or are regarded as such.
A random variable is a function whose value identifies an outcome or an event. A random variable may be discrete, continuous, or a combination of the two.
Each numerical value of a random variable is associated with a probability measure. For example, if A is a discrete random variable with values 1, 2 and 3, then a value of A = 1 may have a probability of 0.2, a value of A = 2 a probability of 0.3, and a value of A = 3 a probability of 0.5. (Note that the sum of these probabilities is unity.)
For a continuous random variable X, probabilities are associated with intervals on the real
line (abscissa). At a specific value of X (say X = x) only the density of the probability is defined.
The probability law or probability distribution is therefore defined in terms of a probability
density function denoted by PDF. Let f_x(x) be the PDF of X; then the probability that X lies in the interval (a, b] is

    P(a < X ≤ b) = ∫ from a to b of f_x(x) dx
The cumulative distribution function (CDF) is F_x(x) = P(X ≤ x).
The CDF is extremely useful as we obtain a measure of probability directly, whereas to
obtain the probability measure from the PDF the area under the PDF has to be calculated. For
the continuous random variable we can write:
    F_x(x) = ∫ from −∞ to x of f_x(y) dy

and conversely

    f_x(x) = dF_x(x)/dx

The probability that the value of X lies in the interval (x, x + dx] is given by f_x(x) dx, that is,

    P(x < X ≤ x + dx) = f_x(x) dx = dF_x(x)
Figure 11 shows an example of a continuous random variable with PDF and CDF.
A function used to describe a probability distribution must be positive and the probabilities
associated with all possible values of the random variable must add up to unity. Therefore
    F_x(−∞) = 0,   F_x(+∞) = 1.0,   F_x(x) ≥ 0
Note also that Fx(x) will never decrease with increasing x and that it is continuous with x.
Obviously the CDF is a continuous curve, the magnitude of the ordinate at the end of the curve
being unity. This represents the total area under PDF which is also unity, the total probability
associated with a random variable.
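The PDF–CDF relationship can be illustrated by accumulating the area under a density numerically. The exponential density and the rate v = 2.0 below are assumed purely for illustration:

```python
import math

# Exponential density f(x) = v * exp(-v * x); v = 2.0 is an assumed value
v = 2.0
f = lambda x: v * math.exp(-v * x)

def cdf(x, steps=10000):
    # Trapezoidal rule: area under the PDF from 0 to x
    h = x / steps
    area = 0.5 * (f(0) + f(x)) * h
    area += sum(f(i * h) for i in range(1, steps)) * h
    return area

# F is non-decreasing and approaches 1.0 (the total area) for large x
print(cdf(1.0), 1 - math.exp(-v * 1.0))
```

For this density the exact CDF is 1 − exp(−v x), so the numerical result can be compared directly.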
[Figure 11: PDF f_x(x) and CDF F_x(x) of a continuous random variable]
Consider now the corresponding terms with respect to a discrete random variable. The CDF has the same meaning as for a continuous variable and the
same equation applies. However, instead of the PDF, the alternative to CDF is a probability
mass function denoted by PMF. The PMF gives the probability of the random variable for all its
discrete values (as stated for the variable earlier in this section).
Let X be a discrete random variable with PMF px(x1) = p(X=x1) in which x represents all the
discrete values of X, that is x1, x2, x3 etc. Then its CDF Fx(x) is given by
    F_x(x) = P(X ≤ x) = Σ p_x(xi),   summed over all xi ≤ x
[Figure: PMF p_x(xi) and CDF F_x(x) of a discrete random variable X]
The cumulative distribution function for the discrete case must satisfy the same conditions as in the case of the continuous random variable; here, however, the CDF is a staircase (step) curve, non-decreasing with increasing x.
The simplest continuous distribution is the uniform distribution, a line parallel to the horizontal axis (abscissa) as shown in the figure above. Another relatively simple distribution is the triangular distribution; a modification of the triangular distribution is the trapezoidal distribution. It is useful to consider some of these as examples before proceeding to well-known and widely used distributions such as the normal (or Gaussian) distribution, the lognormal distribution, the beta distribution and others.
The mean value is called the expected value and is also referred to as the first moment of a random variable. The mean value of X is denoted by E(X), μ_x or x̄. For a continuous random variable with PDF f_x(x) we have

    x̄ = E(X) = ∫ from −∞ to +∞ of x f_x(x) dx
Other descriptors such as the mode and the median may also be used to designate the
central value of a random variable. The mode is the most probable value of a random variable
and the median is the value of the random variable at which the cumulative probability is 0.50 (i.e. 50%). For a symmetric PDF with a single mode the mean, the median and the mode are identical.
But, in general, the values of all three may be different from one another.
The variance of a random variable is a measure of its dispersion and is defined as follows for a
continuous random variable
    V(X) = ∫ (x − x̄)² f_x(x) dx

Noting the form of the expression, the variance is also called the second central moment of a random variable, as it is the expectation of (x − x̄)², i.e. E[(x − x̄)²]. By expanding the right-hand side it can be shown that V(X) = E(X²) − x̄².
In practice the standard deviation Sx (also denoted x) of a random variable is used in preference
to the variance primarily because it has the same units as the mean. We recall that:
    S_x = √V(X)
An equation similar to the above may be written for a discrete random variable.
A relative measure of dispersion of a random variable is its coefficient of variation Vx which is
the ratio of the standard deviation to the mean that is:
    V_x = S_x / x̄
The coefficient of variation is a good parameter for comparing the spread or dispersion of different random variables. In other words, it is useful for comparing the variability or uncertainty associated with different quantities.
The third central moment of a random variable is a measure of the asymmetry or skewness of its distribution; it may be negative or positive. For a continuous random variable the expression is

    E[(x − x̄)³] = ∫ from −∞ to +∞ of (x − x̄)³ f_x(x) dx
Perhaps the best known and most used distribution is the normal distribution, also known as the Gaussian distribution. The normal distribution has the probability density function

    f_X(x) = (1/(σ√(2π))) exp[ −(1/2) ((x − μ)/σ)² ],   −∞ < x < ∞

where μ = mean and σ = standard deviation of the variate are the parameters of the distribution.
The standard normal distribution: a Gaussian distribution with parameters μ = 0 and σ = 1.0 is known as the standard normal distribution, denoted N(0, 1). The density function accordingly becomes

    f_s(s) = (1/√(2π)) exp(−s²/2)
Models arise as a result of relationships between the phenomenon of interest and its many causes. The uncertainty in a physical variable may be the combined effect of many contributing causes. The individual small contributions are difficult to quantify, yet the overall behaviour can be studied. The ability of the normal distribution to approximate the distribution of the sum of a number of uniformly distributed random variables is not coincidental: it is due to the central limit theorem. Under very general conditions, as the number of variables in the sum becomes very large, the distribution of the sum of random variables approaches the normal distribution.
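The central limit theorem can be observed directly by simulation. The choice of n = 12 uniform variates and 20000 trials below is an illustrative assumption:

```python
import random
import statistics

random.seed(1)

# Sum n independent Uniform(0, 1) variates; by the central limit theorem
# the sum tends to a normal with mean n/2 and variance n/12.
n, trials = 12, 20000
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

print(statistics.mean(sums), statistics.pvariance(sums))  # near 6.0 and 1.0
```

With n = 12 the theoretical mean is 6.0 and the variance 1.0; the simulated values land close to these, even though each summand is far from normal.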
Continuous distribution
[Figure: a continuous density f(x) between the limits x(a) and x(b)]
For a continuous distribution bounded between x(a) and x(b):

    ∫ from x(a) to x(b) of f(x) dx = 1

    E[x] = ∫ from x(a) to x(b) of x f(x) dx

    V[x] = ∫ from x(a) to x(b) of (x − x̄)² f(x) dx

Expanding,

    V[x] = E[(x − x̄)²]
         = E[x²] − 2 x̄ E[x] + x̄²
         = E[x²] − x̄²
Uniform distribution

    f(y) = c = 1/(b − a),   a ≤ y ≤ b

    E[y] = (a + b)/2

    variance = (b − a)²/12

The uniform distribution is useful when all outcomes are equally likely and no further information is available.
Cumulative distribution
The cumulative distribution is helpful in determining the probability that a random variable will take a value less than or equal to a particular numerical value, or within a range of values.
2.10.1. The standard normal variate
In terms of the mean x̄ and standard deviation S_x, the normal density is

    f_x(x) = (1/(S_x√(2π))) exp[ −(1/2) ((x − x̄)/S_x)² ],   −∞ < x < ∞

[Figure: normal density N(x̄, S_x); the shaded area between a and b is P(a < X < b)]
A short notation
A very useful form of the distribution is one with a zero mean and unit standard deviation and is
referred to as the standard normal distribution. Thus if S is the standard normal random
variable (or simply variate), its PDF is (Figure 14)
    f_s(s) = (1/√(2π)) exp(−s²/2),   −∞ < s < ∞
This is also denoted by N (0, 1) and is symmetrical about zero. Its cumulative distribution
function or CDF is often denoted by Φ(s), that is:

    Φ(s) = F_s(s) = p

[Figure: standard normal density N(0, 1); the shaded area to the left of s is the probability p]

where p is the probability p = P(S ≤ s). This probability p represents the area under the standard normal distribution to the left of s, that is, from −∞ to s. This distribution is available in tables
and often values are given only for positive values of the standard normal variate. Thus values
will start from 0.5 for s =0 and approach unity for increasing positive values of s.
For negative values of s, the probability is obtained by subtraction from unity (the total area under the distribution being unity). Hence we have

    Φ(−s) = 1 − Φ(s)
This is obviously correct because the standard distribution is symmetrical about s=0.
The reverse calculation, that is determination of the value of the variate s for a given cumulative
probability p is often important and one may write
    s = Φ⁻¹(p)
Returning to tabulated values: as noted earlier, the tables usually contain the CDF for positive values of the variate s. Because of symmetry, the CDF for negative values of s can be simply obtained using the equation Φ(−s) = 1 − Φ(s). Positive values of the variate are associated with CDF > 0.5, i.e. p > 0.5. For values of p < 0.5, the variate is given by

    s = Φ⁻¹(p) = −Φ⁻¹(1 − p)
[Note:
In some tables the values of cumulative probability start from zero even though only
positive values of the variate are considered. In such cases the user should add 0.5 to the
tabulated value for the left symmetrical half of the area].
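In place of tables, the standard normal CDF Φ can be computed from the error function. A minimal sketch; the helper name `phi` is ours, not from the text:

```python
import math

def phi(s):
    """CDF of the standard normal variate, via the error function."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))

# The symmetry property used with the tables: phi(-s) = 1 - phi(s)
assert abs(phi(-1.0) - (1.0 - phi(1.0))) < 1e-12

print(round(phi(0.0), 3), round(phi(1.28), 4))
```

As the text notes, the values start from 0.5 at s = 0 and approach unity for increasing positive s.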
2.10.2. Application of standard normal variate
The first step is to obtain the standard variate s from the given mean and standard deviation of the random variable X. The relationship between x and s is obvious from the corresponding expressions for the PDF:

    s = (x − x̄) / S_x
The probability that the random variable X lies between two limits a and b is given by the probability that the standard normal variate lies between the corresponding limits s1 and s2:

    P(a ≤ X ≤ b) = Φ(s2) − Φ(s1)

where s1 = (a − x̄)/S_x and s2 = (b − x̄)/S_x.
2.10.3. Logarithmic normal distribution
Consider a random variable X which does not follow a normal distribution but whose natural logarithm (ln X) has a normal distribution. The variable X is then said to have a logarithmic normal or lognormal probability distribution, and its density function is

    f_x(x) = (1/(ζ x √(2π))) exp[ −(1/2) ((ln x − λ)/ζ)² ],   0 < x < ∞

in which
    λ = E(ln X)

and

    ζ = √Var(ln X)
are respectively the mean and standard deviation of ln X and are the parameters of the
distribution. Assumption of a lognormal distribution is often preferred to the assumption of a
normal distribution for random variables which must have a positive value. For instance the
factor of safety F is, by definition, a positive quantity. Therefore it appears desirable to model F as a lognormal variate rather than a normal variate. Figure 16 shows a typical lognormal distribution.
It is easy to show that the tabulated values of the CDF of a standard normal distribution can be
used for a lognormal distribution as well. The probability of X being in the interval (a,b) is
    P(a < X ≤ b) = Φ((ln b − λ)/ζ) − Φ((ln a − λ)/ζ)

[Figure 16: a typical lognormal density f_x(x)]
A random variable X has a lognormal probability distribution if ln X is normal; the density function, similar in form to the normal distribution, is written as

    f_x(x) = (1/(G x √(2π))) exp[ −(1/2) ((ln x − λ)/G)² ],   0 < x < ∞

where

    λ = E(ln X) = mean,   G = √Var(ln X)

λ and G are the parameters of the distribution. Probabilities associated with a lognormal variate can be determined from standard normal probabilities.
The probability that the variable X will assume values in an interval (a, b] is

    P(a < X ≤ b) = ∫ from a to b of (1/(G x √(2π))) exp[ −(1/2) ((ln x − λ)/G)² ] dx

Let

    s = (ln x − λ)/G

then dx = x G ds and

    P(a < X ≤ b) = ∫ from (ln a − λ)/G to (ln b − λ)/G of (1/√(2π)) exp(−s²/2) ds
                 = Φ((ln b − λ)/G) − Φ((ln a − λ)/G)
Since the lognormal distribution can be evaluated using the normal distribution itself, and since the values of the random variable are always positive, the lognormal distribution is useful where the values of the variate are strictly positive, e.g. the intensity of rainfall.
If the lognormal variate is bounded between y(a) and y(b), then y(a) = exp(λ − ζ) and y(b) = exp(λ + ζ). If x̄ and V are the mean and CoV of a lognormal variate, then

    x̄ = exp(λ + ζ²/2)

    ζ² = ln(1 + V²)

and, since ζ = √Var(ln X),

    ζ ≈ V = CoV   if V ≤ 0.3
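The parameter relations above can be sketched as follows. The mean of 60 and CoV of 0.25 are illustrative values (they match the rainfall example later in these notes), and `phi` is a helper built on the error function:

```python
import math

# Lognormal parameters from mean and CoV (illustrative: mean 60, CoV 0.25),
# using zeta^2 = ln(1 + V^2) and lambda = ln(mean) - zeta^2 / 2
mean, V = 60.0, 0.25
zeta = math.sqrt(math.log(1 + V ** 2))
lam = math.log(mean) - 0.5 * zeta ** 2

# For V <= 0.3, zeta is approximately equal to V, as the text states
print(round(zeta, 4), round(lam, 3))

def phi(s):
    return 0.5 * (1 + math.erf(s / math.sqrt(2)))

# P(a < X <= b) = phi((ln b - lam)/zeta) - phi((ln a - lam)/zeta)
p = phi((math.log(70) - lam) / zeta) - phi((math.log(40) - lam) / zeta)
```

Note how close ζ = 0.2462 is to V = 0.25, confirming the ζ ≈ V approximation for small CoV.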
The beta distribution has the density

    f_x(x) = (1/B(q, r)) (x − a)^(q−1) (b − x)^(r−1) / (b − a)^(q+r−1),   a ≤ x ≤ b

in which q and r are the parameters of the distribution and B(q, r) is the beta function

    B(q, r) = ∫ from 0 to 1 of x^(q−1) (1 − x)^(r−1) dx = Γ(q) Γ(r) / Γ(q + r)
Depending on the parameters q and r, the density function of the beta distribution takes different shapes. If the values of the variate are limited between 0 and 1.0 (i.e. a = 0 and b = 1.0), the above equation for f_x(x) becomes

    f_x(x) = (1/B(q, r)) x^(q−1) (1 − x)^(r−1),   0 ≤ x ≤ 1.0
The shape of the distribution becomes important when one is concerned with very low probabilities of failure. For example, when one is concerned with high probabilities of safety or high reliability (> 0.99), the choice of distribution plays a dominant role. The required reliability also depends on the type of problem: for highway shallow cuttings and mine works a reliability of 0.90 to 0.95 is acceptable, whereas for earth dams and multi-storeyed buildings the reliability should be in the range of 0.999 to 0.9999.
26
Q=2.0
10
r=6.0
12
The mean and the variance of the beta distribution are given by

    x̄ = a + (q/(q + r)) (b − a)

    σ_x² = (q r / ((q + r)² (q + r + 1))) (b − a)²
Example: a = 5, b = 10 and the mean is 7. Then

    5 + (q/(q + r)) (10 − 5) = 7,   giving q = 2r/3

Using the variance condition

    (q r / ((q + r)² (q + r + 1))) (b − a)² = (0.1 × 7)²

we obtain q = 3.26 and r = 4.89.
The beta distribution (Figure 17) is appropriate for random variables which have a finite range.
The uniform distribution, already considered earlier, is the simplest example of a beta
distribution. Considering the limits a and b for a random variable X. the PDF of a beta
distribution may be written in the form
fx (X ) =
( q,r )
(x a )q 1 (b x )r 1
(b a) q + r 1
a xb
in which q and rare the two parameters which determine the shape of the distribution and B(q,r)
is the beta function
[Figure 17: beta density f_x(x) with q = 2.0, r = 6.0]

    B(q, r) = ∫ from 0 to 1 of x^(q−1) (1 − x)^(r−1) dx
The standard beta distribution may be defined as one which has a=0 and b=1. The
standard PDF is symmetrical for q=r=3 and it is uniform with a density of unity for q=r=1.0
(Figure 18)
[Figure 18: standard beta densities f_x(x) for (q, r) = (1.0, 4.0), q = r = 3.0, (4.0, 2.0) and the uniform case q = r = 1.0]
Probability calculations are facilitated by the incomplete beta function, which is defined as

    B_x(q, r) = ∫ from 0 to x of y^(q−1) (1 − y)^(r−1) dy,   0 < x < 1.0

Probabilities over an interval (c, d) of the variate are obtained from the ratio of incomplete to complete beta functions, evaluated at the normalized limits

    u = (d − a)/(b − a)   and   v = (c − a)/(b − a)
Assume that repeated trials of an event are made, that the probability p of occurrence in each trial is constant, and that the trials are statistically independent. Then, considering that each trial has only two outcomes, occurrence or non-occurrence, the problem may be modeled as a Bernoulli sequence. One may apply this to rainfall, flooding, earthquakes etc., which affect the performance of geotechnical structures.
The probability of exactly x occurrences among n trials in a Bernoulli sequence is given by the binomial distribution, the equation for the PMF being:

    P(X = x) = C(n, x) p^x (1 − p)^(n−x),   x = 0, 1, 2, ..., n

where

    C(n, x) = n!/(x!(n − x)!)
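The binomial PMF can be written directly from the equation above; `binom_pmf` is a hypothetical helper name, and n = 10, p = 0.3 are illustrative values:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a Bernoulli sequence of n trials with success prob p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# The PMF sums to 1 over x = 0..n, as any probability distribution must
n, p = 10, 0.3
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
print(total, binom_pmf(0, n, p))
```

For example, binom_pmf(0, 10, 0.3) is simply (1 − p)^n = 0.7^10, the probability of no occurrence in ten trials.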
The number of trials until an event occurs for the first time is governed by the geometric distribution. Let T be the random variable denoting the number of trials to first occurrence, and let this occur at T = t. Then we have

    P(T = t) = p q^(t−1),   t = 1, 2, ...

where q = 1 − p. This is known as the geometric distribution, obtained by substituting x = 1 and n = t in the binomial equation. The first occurrence time is equal to the recurrence time, which also has a geometric distribution; the mean recurrence time, also known as the return period, is:

    T̄ = E(T) = Σ (t = 1 to ∞) t p q^(t−1) = p (1 + 2q + 3q² + ...)

Since q < 1.0, the series summation gives T̄ = 1/p: the average return period is the reciprocal of the probability of the event within one time unit.
So far we have considered the number of trials or time units until the first occurrence. The time until the kth occurrence is governed by the negative binomial distribution. The probability of (Tk = t), where Tk is the number of trials until the kth occurrence, is

    P(Tk = t) = C(t − 1, k − 1) p^k q^(t−k)
A basic situation is that the outcomes of an experiment can be separated into two exclusive categories: good or bad, failure or success, etc. We are interested in the simplest kind of experiment, the outcome of which is either failure or success; the two events are mutually exclusive and collectively exhaustive. Such an experiment is called a Bernoulli trial.
The binomial distribution is given by

    b(x, N, R) = C(N, x) R^x P^(N−x) = (N!/(x!(N − x)!)) R^x P^(N−x)
The binomial distribution models the outcomes of experiments for which the following properties hold:
1. The experiment is repeated N times, with the outcome of each trial independent of the others.
2. Only two mutually exclusive outcomes are possible, called success or failure; x is the number of successes.
3. The probability of success in each trial, denoted by R, remains the same for all trials; P is the probability of non-occurrence, with R + P = 1.
4. The experiment is performed under the same conditions for all N trials.
5. The interest is in the number of successes x in the N trials, not in the order in which they occur.
Geometric distribution
Assuming independence of trials and a constant value of P, the distribution of N, the number of trials to the first success, can be found. The first success occurs on the nth trial only if the first (n − 1) trials are failures:

    P[N = n] = (1 − P)^(n−1) P

[Figure: geometric distribution]

    E[N] = 1/P

    Var[N] = E[N²] − E²[N] = (1 − P)/P²
Design values and return periods: civil engineering systems must withstand the effects of rare events such as large floods or high winds, so it is necessary to consider the risks involved in the choice of design capacity. In a design, one estimates the maximum magnitude of the rare event which the structure can withstand (e.g. maximum wind velocity).
Average return period
The expected value of the geometric distribution is

    E[N] = Σ (n = 1 to ∞) n P (1 − P)^(n−1)
         = P [1 + 2(1 − P) + 3(1 − P)² + ...]
         = P / [1 − (1 − P)]²
         = 1/P
The probability that there will be no event greater than the m-year event in m years is

    (1 − 1/m)^m = 1 − m(1/m) + [m(m − 1)/2!](1/m²) − ...
                ≈ 1 − 1/1! + 1/2! − 1/3! + ... = e⁻¹ = 0.368   for large m

The probability that one or more such events will occur in m years is therefore (1 − e⁻¹) = 0.632; thus the probability that a system will be affected by a rare event within its return period is 0.632.
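The convergence of (1 − 1/m)^m to e⁻¹ = 0.368 can be checked numerically:

```python
import math

# Probability of no exceedance of the m-year event within m years is
# (1 - 1/m)^m, which approaches e^-1 = 0.368 as m grows
for m in (10, 50, 100, 1000):
    print(m, round((1 - 1 / m) ** m, 4))

p_no = (1 - 1 / 1000) ** 1000
p_one_or_more = 1 - p_no  # approaches 1 - e^-1 = 0.632
```

Even for m as small as 10 the value (0.3487) is already close to the limit 0.368.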
Many physical processes, such as the occurrence of fatigue cracks, earthquakes occurring at any time in an earthquake-prone region, or accidents on a highway, may be modeled using a Bernoulli sequence by dividing the time interval or space into smaller intervals and considering whether the event occurs in each. If the event can occur at any instant, and can occur again, it may be modeled as a Poisson process.
Assumptions:
1. An event can occur at random at any time or any point in space.
2. The occurrence of an event in a given time (or space) interval is independent of its occurrence in any other interval.
3. The probability of occurrence of an event in a small interval Δt is proportional to Δt and can be given by ν Δt, where ν is the mean rate of occurrence.
The number of occurrences of the event in time t is given by the Poisson distribution, i.e. if X_t is the number of occurrences in time (or space) t, then

    P(X_t = x) = ((νt)^x / x!) e^(−νt),   x = 0, 1, 2, ...

where ν is the mean occurrence rate, and

    E(X_t) = νt,   Var(X_t) = νt
Poisson distribution (derivation from the binomial distribution)

    P_X(x) = C(n, x) P^x (1 − P)^(n−x)

When the time subdivisions are made smaller and smaller, the number of trials n increases and the probability P of success decreases, but the expected number of events nP remains fixed. Say nP = μ as n → ∞, P → 0. Then

    P_X(x) = [n!/((n − x)! x!)] (μ/n)^x (1 − μ/n)^(n−x)
           = (μ^x/x!) [n(n − 1)...(n − x + 1)/n^x] (1 − μ/n)^n (1 − μ/n)^(−x)

For large n the term n(n − 1)...(n − x + 1)/n^x is nearly 1, (1 − μ/n)^(−x) → 1, and (1 − μ/n)^n → e^(−μ). Hence

    P_X(x) = (μ^x/x!) e^(−μ),   x = 0, 1, 2, ...
The mean is

    E[X] = Σ (x = 0 to ∞) x (μ^x/x!) e^(−μ) = μ Σ (x = 1 to ∞) (μ^(x−1)/(x − 1)!) e^(−μ)

Putting y = x − 1,

    E[X] = μ Σ (y = 0 to ∞) (μ^y/y!) e^(−μ) = μ

Similarly, V[X] = μ. We can also put μ = νt and write

    P_X(x) = ((νt)^x/x!) e^(−νt)
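The limiting argument above can be verified numerically by comparing the binomial PMF with p = μ/n against the Poisson PMF; μ = 4.0 and x = 2 are illustrative choices:

```python
from math import comb, exp, factorial

mu = 4.0  # expected number of events, mu = n*p held fixed

def poisson_pmf(x, mu):
    return mu ** x * exp(-mu) / factorial(x)

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# As n grows with p = mu/n, the binomial PMF approaches the Poisson PMF
for n in (10, 100, 10000):
    print(n, abs(binom_pmf(2, n, mu / n) - poisson_pmf(2, mu)))
```

The printed gap shrinks by roughly two orders of magnitude each time n is increased by a factor of 10 or more, as the derivation predicts.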
The Poisson process is useful where an event may occur anywhere in a space and time framework, and is based on the following assumptions:
(1) The occurrence of an event in a given interval is independent of its occurrence in other intervals.
(2) The probability of occurrence of an event in a small interval is proportional to that interval.
(3) The proportionality constant ν, which describes the mean rate of occurrence, is assumed constant.
(4) The probability of two or more occurrences in the chosen small time interval Δt is negligible.
If X_t is the number of occurrences in the time (or space) interval t,

    P(X_t = x) = ((νt)^x / x!) e^(−νt),   x = 0, 1, 2, ...
It is obvious that

    E(X_t) = X̄_t = νt

and it can be shown that the variance is the same as the expectation, i.e.

    S_x² = νt

Let T1 be the time to the first occurrence. The probability that no event occurs in time t is

    P(T1 > t) = P(X_t = 0) = e^(−νt)

The PDF is

    f_T1(t) = ν e^(−νt),   t ≥ 0

The CDF is

    F_T1(t) = P(T1 ≤ t) = 1 − e^(−νt)
If ν is independent of t and hence constant, the mean of T1 is

    E(T1) = T̄1 = 1/ν
This is the mean recurrence time or return period and may be compared to 1/p for the Bernoulli
sequence. For small time interval the two are nearly equal.
If it is desired that the PDF for an exponential distribution should start at a value greater than zero (i.e. not pass through the origin), we may use the shifted exponential distribution.
The time until the kth occurrence is described by the gamma distribution. If Tk denotes the time
until the kth event
    f_Tk(t) = (ν (νt)^(k−1) / (k − 1)!) e^(−νt)
The exponential and gamma distributions are related to the Poisson process in the same way
that the geometric and negative binomial distributions are related to Bernoulli sequence.
Exponential distribution
The exponential distribution is related to the Poisson distribution as follows. If events occur as a Poisson process, then the time T1 until the first occurrence of the event has an exponential distribution. That no event occurs in the interval (0, t] means

    P(T1 > t) = P(X_t = 0) = e^(−νt)

This is the first occurrence time in a Poisson process. The distribution function of T1 is

    F_T1(t) = P(T1 ≤ t) = 1 − e^(−νt)

and the density is

    f_T1(t) = dF/dt = ν e^(−νt),   t ≥ 0
Gamma Distribution
If the occurrences of an event constitute a Poisson process, then the time until the kth occurrence of the event is described by the gamma distribution. Let Tk denote the time until the kth event; then

    f_Tk(t) = (ν (νt)^(k−1) / (k − 1)!) e^(−νt)

The mean time until the occurrence of the kth event is

    E(Tk) = k/ν

and the variance is k/ν².
The hypergeometric distribution arises when a sample from a finite population (for example, one consisting of two types of elements, such as good and bad) is examined. Consider a lot of N items, m of which are defective and (N − m) good. If a sample of n items is taken at random from this lot, the probability of x defective items in the sample is given by the hypergeometric distribution:

    P(X = x) = C(m, x) C(N − m, n − x) / C(N, n),   x = 0, 1, 2, ..., m

The number of samples of size n in the finite population N is C(N, n); assuming that all samples of size n are equally likely to be chosen, the above equation follows.
If X and Y are two random variables, then probabilities associated with any pair of values x and y may be described by a joint distribution function, e.g.

    F_X,Y(x, y) = P(X ≤ x, Y ≤ y)

For discrete random variables the joint PMF (Figure above) may be used:

    p_X,Y(x, y) = P(X = x, Y = y)
[Figure: (a) joint PDF f_x,y(x, y); (b) joint PMF p_x,y(x, y)]
For continuous random variables the joint PDF (Figure 19) may be defined by

    f_X,Y(x, y) dx dy = P(x < X ≤ x + dx, y < Y ≤ y + dy)

with marginal densities

    f_X(x) = ∫ f_X,Y(x, y) dy,   f_Y(y) = ∫ f_X,Y(x, y) dx

The CDF is given by the volume under the surface f_X,Y(x, y):

    F_X,Y(x, y) = ∫ from −∞ to x ∫ from −∞ to y of f_X,Y(u, v) du dv
The correlation coefficient is the covariance cov(X, Y) divided by the product of the standard deviations:

    ρ_X,Y = cov(X, Y)/(S_x S_y)

The covariance itself is defined as the joint central second moment, that is, the expectation of the product (X − x̄)(Y − ȳ):

    cov(X, Y) = E[(X − x̄)(Y − ȳ)]

If two variables are statistically independent then cov(X, Y) = 0 and the variables are uncorrelated; however, the reverse is not true: if the variables are uncorrelated, they may not be statistically independent.
The correlation coefficient ρ may vary from −1 to +1 and may be regarded as a normalized covariance. It is a measure of the linear relationship between the two random variables (Figure 20).
If X and Y (e.g. cohesion and friction) are two random variables, then the probability associated with any pair of values x and y may be described by a joint distribution function, e.g.

    F_X,Y(x, y) = P(X ≤ x, Y ≤ y)

For discrete random variables the joint PMF is defined as

    p_X,Y(x, y) = P(X = x, Y = y)

The marginal densities are

    f_x(x) = ∫ f_x,y(x, y) dy   at any given x

    f_y(y) = ∫ f_x,y(x, y) dx   at any given y

The CDF of X and Y is the total volume under the joint density. Where the two variables are dependent on each other, a quantity linking them comes into the picture:

    ρ_x,y = Cov(x, y)/(σ_x σ_y)

    Cov(X, Y) = E[(X − x̄)(Y − ȳ)] = E(XY) − E(X)E(Y)
[Figure: common distribution families — uniform, exponential, normal, beta, Poisson]
Pearson's system
In the binomial model it is assumed that all outcomes are equally likely, which means that sampling is done with replacement from a finite population of N:

    b(x, N, R) = C(N, x) R^x P^(N−x) = (N!/(x!(N − x)!)) R^x P^(N−x)
Suppose sampling is done without replacement from a collection of N items, of which k have a particular characteristic and (N − k) do not. If a sample is selected from this collection, it is either from the k or from the (N − k). Suppose r random samples are drawn without replacement from the N items.
The probability that x of the r samples are of the type k is given by

    h(x, N, r, k) = C(k, x) C(N − k, r − x) / C(N, r)
This is called the hypergeometric distribution.
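A sketch of the hypergeometric PMF from the formula above; the lot sizes (N = 20, k = 5 defective, r = 4 drawn) are assumed illustrative values:

```python
from math import comb

def hypergeom_pmf(x, N, r, k):
    """P(x of the r items drawn without replacement are of type k)."""
    return comb(k, x) * comb(N - k, r - x) / comb(N, r)

# Illustrative lot: N = 20 items, k = 5 of the special type, draw r = 4
N, r, k = 20, 4, 5
total = sum(hypergeom_pmf(x, N, r, k) for x in range(0, min(r, k) + 1))
print(total, hypergeom_pmf(0, N, r, k))
```

Summing over all feasible x gives 1, confirming the C(N, r) denominator counts every equally likely sample.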
In the limit, the binomial distribution yields the normal distribution. The limit of the hypergeometric distribution must produce a more flexible continuous probability distribution, capable of better representing skewed variation. For symmetrical distributions all moments of odd order about the mean are zero, which means that any odd-ordered moment may be used as a measure of the degree of skewness.
Moments
Consider a system of discrete parallel forces f1, f2, ..., fN acting on a rigid beam at the respective distances x1, x2, ..., xN. From statics,

    M = Σ fi,   x̄ = Σ xi fi / Σ fi

We can consider that f1, f2, ..., fN represent the probabilities of all possible occurrences of the N outcomes x1, x2, ..., xN. Since the distribution is exhaustive, M = 1, so

    x̄ = E[x] = Σ (i = 1 to N) xi fi

where E[x] is the expected value of x. In general, it denotes the central tendency of the distribution. The expected value can be considered the first moment of the distribution, and the concept can be generalized to the kth moment:

    E[x^k] = Σ (i = 1 to N) xi^k fi

The moment of inertia corresponds to the variance:

    Iy = Σ (xi − x̄)² fi,   V[x] = Σ (xi − x̄)² fi

Expanding,

    V[x] = E[(xi − x̄)²]
         = E[xi²] − 2 x̄ E[xi] + x̄²
         = E[x²] − x̄²
         = E[x²] − (E[x])²
    σ[xi] = √V[xi]

5. If xi is a random variable and a and b are constants,

    V[a xi + b] = a² V[xi] = a² (σ[xi])²
For a symmetrical distribution all moments of odd order about the mean must be zero. Consequently, an odd-ordered moment may be used as a measure of the degree of skewness. The third central moment E[(xi − x̄)³] provides such a measure, but its units are the cube of the units of the variable. To provide a dimensionless measure of skewness, Pearson defined the coefficient of skewness

    β(1) = E[(xi − x̄)³] / (σ[xi])³

and the corresponding fourth-order coefficient

    β(2) = E[(xi − x̄)⁴] / (σ[xi])⁴
The density of the standard normal variate is

    f_s(s) = (1/√(2π)) e^(−s²/2),   −∞ < s < ∞

[Figure: standard normal density N(0, 1)]

with

    Φ(s_P) = P,   s_P = Φ⁻¹(P),   Φ(−s) = 1 − Φ(s)

For a normal variate X with mean μ and standard deviation σ,

    P(a ≤ x ≤ b) = ∫ from a to b of (1/(σ√(2π))) exp[ −(1/2)((x − μ)/σ)² ] dx

Substituting S = (x − μ)/σ, so that dx = σ dS,

    P(a ≤ x ≤ b) = ∫ from (a − μ)/σ to (b − μ)/σ of (1/√(2π)) e^(−S²/2) dS

Because of its wide usage, a special notation Φ(s) is commonly used to designate the distribution function of the standard normal variate:

    Φ(S) = P,   where S = (x − μ)/σ

The tables give probabilities only for positive values of the variate; the probability for negative values can be obtained as

    Φ(−S) = 1 − Φ(S),   S = Φ⁻¹(P) = −Φ⁻¹(1 − P)
Consider a function Y which depends on two random variables X1 and X2 (a1 and a2 are constants):

    Y = a1 X1 + a2 X2

Then it can be shown that

    ȳ = E(Y) = a1 x̄1 + a2 x̄2

and

    V(Y) = a1² V(x1) + a2² V(x2) + 2 a1 a2 cov(x1, x2)

Now if Y = a1 X1 − a2 X2, then

    ȳ = a1 x̄1 − a2 x̄2

and

    V(Y) = a1² V(x1) + a2² V(x2) − 2 a1 a2 cov(x1, x2)
In general, if

    Y = Σ (i = 1 to n) ai Xi

then

    ȳ = Σ ai x̄i

    V(Y) = Σ ai² V(xi) + Σ Σ (i ≠ j) ai aj cov(xi, xj)

Suppose Z is another linear function of the random variables Xi, i.e.

    Z = Σ (i = 1 to n) bi Xi

Then

    cov(Y, Z) = Σ ai bi V(xi) + Σ Σ (i ≠ j) ai bj cov(xi, xj)
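The formula for the variance of a linear combination can be checked by Monte Carlo simulation; the coefficients, means and standard deviations below are illustrative assumptions (the variates are independent, so the covariance term vanishes):

```python
import random
import statistics

random.seed(2)

# Y = a1*X1 + a2*X2 with independent normal variates:
# check V(Y) = a1^2 V(x1) + a2^2 V(x2)  (assumed illustrative values)
a1, a2 = 2.0, 3.0
m1, s1, m2, s2 = 5.0, 1.0, 1.0, 0.5

ys = [a1 * random.gauss(m1, s1) + a2 * random.gauss(m2, s2)
      for _ in range(50000)]

mean_y = statistics.mean(ys)      # theory: a1*m1 + a2*m2 = 13
var_y = statistics.pvariance(ys)  # theory: 4*1 + 9*0.25 = 6.25
```

The simulated mean and variance land close to the theoretical 13 and 6.25, confirming the propagation formulas.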
Consider now a general function Y = g(X) of a single random variable X. By expanding g(X) in a Taylor series about the mean value x̄, the following first-order approximations can be made:
    ȳ = E(Y) ≈ g(x̄) + (1/2) V(x) (d²g/dx²)

    V(Y) ≈ V(x) (dg/dx)²

with the derivatives evaluated at x̄.
Good approximation of exact moments is obtained if g(X) is approximately linear for the entire
range of values of X (even if the second term in the expression for the mean is neglected; this is
generally done)
Now if Y is a function of several variables X1, X2, X3 etc.
Y = g ( X 1 , X 2 , X 3 ,...... X n )
the corresponding first-order approximations are

    ȳ = E(Y) ≈ g(x̄1, x̄2, x̄3, ..., x̄n) + (1/2) Σ Σ (∂²g/∂xi∂xj) cov(xi, xj)

    V(Y) ≈ Σ ci² V(xi) + Σ Σ (i ≠ j) ci cj cov(xi, xj)

where ci and cj are the partial derivatives (∂g/∂xi) and (∂g/∂xj) evaluated at x̄1, x̄2, x̄3, ..., x̄n.
The second term of the first equation is generally omitted. The second term of the second
equation is not omitted but will vanish if x1,x2..xn are uncorrelated or statistically independent.
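A minimal sketch of the single-variable approximations, taking g(X) = X² with assumed moments x̄ = 3, S_x = 0.3 (for a quadratic g the second-order mean correction happens to be exact):

```python
# First-order (second-moment) approximation for Y = g(X) = X^2
# with assumed mean and standard deviation of X
xbar, sx = 3.0, 0.3
vx = sx ** 2

g = lambda x: x ** 2
dg = 2 * xbar   # g'(x) = 2x evaluated at the mean
d2g = 2.0       # g''(x) = 2, constant

mean_y = g(xbar) + 0.5 * vx * d2g  # E(Y) ~ g(x̄) + 0.5 V(x) g''(x̄) = 9.09
var_y = vx * dg ** 2               # V(Y) ~ V(x) g'(x̄)^2 = 3.24

print(mean_y, var_y)
```

As the text notes, the second term of the mean expression (here 0.09) is often omitted in practice; for this g it equals the exact correction E(X²) − x̄² = V(x).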
[Figure: a uniform density f_x(x) of height h = 1/200 with CDF F_x(x) = x/200 for 0 ≤ x ≤ 200]

Example: the undrained strength cu is uniformly distributed between 25 and 50 kN/m², so the density has the constant value 1/25.
(a) p1 = P(cu ≤ 40) = (40 − 25) × (1/25) = 0.6. This probability is the area of the rectangle between the ordinates at 25 and 40.
(b) An interval of width 5 kN/m² gives 5 × (1/25) = 0.2.
(c) p3 = P(cu ≤ 10) = 0, since 10 kN/m² lies below the range.
(d) 55 kN/m² is outside the range 25–50. Accordingly, p4 = P(cu > 55) = 0.
Example 1 (normal variate with x̄ = 60 inches and S_x = 15 inches)
a.
    P(a ≤ X ≤ b) = ∫ from a to b of (1/(S_x√(2π))) exp[ −(1/2)((x − x̄)/S_x)² ] dx

Substituting S = (x − x̄)/S_x, dx = S_x dS,

    P(40 ≤ X ≤ 70) = Φ((70 − 60)/15) − Φ((40 − 60)/15)
                   = Φ(0.67) − Φ(−1.33)
                   = Φ(0.67) − [1 − Φ(1.33)]
                   = 0.748571 − (1 − 0.908241)
                   ≈ 0.6568   (from tables)
b.
    P(X ≥ 30) = 1 − Φ((30 − 60)/15)
              = 1 − Φ(−2.0)
              = 1 − [1 − Φ(2.0)]
              = 0.9772
c.
    P[X ≤ x0.1] = 0.10

    (x0.1 − 60)/15 = Φ⁻¹(0.10) = −Φ⁻¹(0.90) = −1.28

    x0.1 = 60 − 1.28 × 15 = 60 − 19.2 = 40.8 inches
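Parts (a)–(c) can be verified with Python's `statistics.NormalDist` in place of tables (small differences arise from the rounding of 0.67 and 1.33 in the hand calculation):

```python
from statistics import NormalDist

# Checking Example 1 numerically: normal model, mean 60 in, SD 15 in
rain = NormalDist(mu=60, sigma=15)

p_ab = rain.cdf(70) - rain.cdf(40)  # part (a)
p_b = 1 - rain.cdf(30)              # part (b)
x10 = rain.inv_cdf(0.10)            # part (c): the 10-percentile

print(round(p_ab, 4), round(p_b, 4), round(x10, 1))
```

The results agree with the tabulated answers 0.6568, 0.9772 and 40.8 inches to the precision of the tables.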
Example 2.
A structure rests on three supports A, B and C. Even though the loads from the roof supports can be estimated accurately, the soil conditions under A, B and C are not completely predictable. Assume that the settlements ρA, ρB and ρC are independent normal variates with means of 2, 2.5 and 3 cm and CoVs of 20%, 20% and 25% respectively.
a. What is the probability that the maximum settlement will exceed 4 cm?
b. If it is known that A and B have settled 2.5 cm and 3.5 cm respectively, what is the probability that the maximum differential settlement will not exceed 0.8 cm, and what is it if the limit is 1.5 cm?
Answer
a. With SD = mean × CoV: σA = 2 × 0.2 = 0.4, σB = 2.5 × 0.2 = 0.5, σC = 3 × 0.25 = 0.75. By independence,

    P(ρmax > 4 cm) = 1 − P(ρmax ≤ 4 cm)
                   = 1 − P(ρA ≤ 4) P(ρB ≤ 4) P(ρC ≤ 4)
                   = 1 − Φ((4 − 2)/0.4) Φ((4 − 2.5)/0.5) Φ((4 − 3)/0.75)
b. The differential settlement between A and B, ΔAB = 3.5 cm − 2.5 cm = 1 cm, has already occurred. Since 1 cm > 0.8 cm, P(Δmax ≤ 0.8 cm) = 0 irrespective of ρC. For P(Δmax ≤ 1.5 cm), however, ρC matters: with ρA = 2.5 cm, ρC must lie between 1 and 4 cm, and with ρB = 3.5 cm, ρC must lie between 2 and 5 cm. The acceptable region is therefore 2 cm ≤ ρC ≤ 4 cm.
P(Δmax ≤ 1.5 cm) = P(2 ≤ ρC ≤ 4)
= Φ((4 − 3)/0.75) − Φ((2 − 3)/0.75)
= Φ(1.333) − Φ(−1.333)
= 0.9088 − 0.0912
= 0.8176
(As an aside, for any normal variate X, P(μ − 1.96σ < X ≤ μ + 1.96σ) = 95% and P(μ − 2.58σ < X ≤ μ + 2.58σ) = 99%, which is useful when data with 95% or 99% coverage are required.)
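Both parts can be sketched with Python's NormalDist under the independence assumption of the example:

```python
from statistics import NormalDist

# Independent normal settlements; SD = mean * CoV.
A = NormalDist(2.0, 0.4)
B = NormalDist(2.5, 0.5)
C = NormalDist(3.0, 0.75)

# (a) P(max settlement > 4 cm) = 1 - P(all three <= 4 cm)
p_a = 1 - A.cdf(4) * B.cdf(4) * C.cdf(4)

# (b) with rho_A = 2.5 cm and rho_B = 3.5 cm known, the maximum differential
# settlement stays within 1.5 cm only if rho_C lies between 2 and 4 cm
p_b = C.cdf(4) - C.cdf(2)
```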
Example 3
The total load on the footing is the sum of the dead load of the structure and the live load. Since each load is itself a sum of several components, the total dead load (X) and the total live load (Y) can be considered normally distributed. The data from the building suggest that μx = 100 ton, σx = 10 ton, μy = 40 ton, σy = 10 ton, and that X and Y are uncorrelated. What is the total design load that has only a 5% probability of being exceeded?
The mean total load is μz = μx + μy = 100 + 40 = 140 ton, and
σz = √(σx² + σy²) = √(10² + 10²) = 10√2 = 14.1 ton
We need z such that it has only a 5% probability of being exceeded:
1 − Φ((z − 140)/14.1) = 0.05
Φ((z − 140)/14.1) = 0.95
(z − 140)/14.1 = 1.65
z = 163.3 ton
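The 5% design load can be obtained directly from the inverse standard normal CDF (a sketch of the computation above):

```python
from statistics import NormalDist

# Z = X + Y with X, Y independent normals: means 100 t and 40 t, SDs 10 t each.
mu_z = 100 + 40
sd_z = (10**2 + 10**2) ** 0.5                      # 14.1 t
z_design = NormalDist(mu_z, sd_z).inv_cdf(0.95)    # about 163 t
```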
Example 4
The mean and coefficient of variation of the angle of internal friction φ of a soil supporting a multi-storeyed structure are μ = 20° and V = 30%. What is the probability that φ will be less than (a) 16°, (b) 10°, (c) 5°? Assume that φ has a normal distribution.
Solution
V = σ/μ = 0.3, so σ = 0.3 × 20 = 6
(a) s = (16 − 20)/6 = −4/6 = −0.666
P(φ ≤ 16) = Φ(−0.666) = 1 − Φ(0.666) = 0.253
where Φ(·) is obtained from tabulated values.
(b) s = (10 − 20)/6 = −10/6 = −1.666
P(φ ≤ 10) = Φ(−1.666) = 1 − Φ(1.666) = 1 − 0.952 = 0.048
(c) s = (5 − 20)/6 = −15/6 = −2.5
P(φ ≤ 5) = Φ(−2.5) = 1 − Φ(2.5) = 1 − 0.994 = 0.006 = 6 × 10⁻³
It should be noted that φ denotes the friction angle and Φ(s) is the cumulative distribution of the standard normal variate.
Example on cumulative distribution
Example 1
From the left-turn probabilities tabulated later, 0.006 + 0.04 + 0.12 + 0.21 + 0.25 = 0.626 is the probability that four or fewer cars take a left turn.
If X is a random variable and r is a real number, then the CDF, designated F(r), is the probability that X takes a value equal to or less than r:
F(r) = P[X ≤ r]
The annual rainfall has a lognormal distribution with mean and SD of 60 inches and 15 inches.
a. Calculate the probability that a future annual rainfall lies between 40 and 70 inches.
b. The probability that the annual rainfall is at least 30 inches.
c. What is the 10-percentile annual rainfall?
ζ ≈ CoV = 15/60 = 0.25
λ = ln 60 − (1/2)ζ² = 4.09 − 0.03 = 4.06
a.
P(40 < x ≤ 70) = Φ((ln 70 − 4.06)/0.25) − Φ((ln 40 − 4.06)/0.25)
= Φ(0.75) − Φ(−1.48)
= 0.773 − 0.069
= 0.704
b.
P(x ≥ 30) = 1 − Φ((ln 30 − 4.06)/0.25)
= 1 − Φ(−2.64)
= Φ(2.64)
= 0.9959
c. 10-percentile rainfall:
Φ((ln x0.1 − 4.06)/0.25) = 0.10
(ln x0.1 − 4.06)/0.25 = −1.28
ln x0.1 = 4.06 − 1.28 × 0.25 = 3.74
x0.1 = e^3.74 ≈ 42 inches
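Since ln x of a lognormal variate is normal, the example can be sketched by working with ln x (standard library only):

```python
import math
from statistics import NormalDist

zeta = 0.25                              # approximately the CoV of rainfall
lam = math.log(60) - 0.5 * zeta**2       # lambda = 4.06
lnX = NormalDist(lam, zeta)              # distribution of ln(rainfall)

p_40_70 = lnX.cdf(math.log(70)) - lnX.cdf(math.log(40))   # about 0.70
p_ge_30 = 1 - lnX.cdf(math.log(30))                       # about 0.996
x_10 = math.exp(lnX.inv_cdf(0.10))                        # about 42 inches
```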
Example 2
For a lognormal variate, ζ²[x] = ln[1 + (CoV)²] ≈ (CoV)², so with CoV = 0.25, ζ ≈ 0.25
λ = E[ln x] = ln 20 − ζ²/2 = 3.00 − 0.03 ≈ 2.96
s = (ln 40 − λ)/ζ = (3.69 − 2.96)/0.25 = 2.92
The friction angle φ of the soil samples in a locality varies between 20° and 40°, with a mean value of 30° and a standard deviation of 3° (CoV of 10%); φ is modelled as a beta variate. The design value is 35°. What is the probability that φ ≥ 35°?
The mean of a beta variate on [a, b] with parameters q and r is
x̄ = a + [q/(q + r)](b − a)
30 = 20 + [q/(q + r)] × 20
[q/(q + r)] × 20 = 10
10q + 10r − 20q = 0
10r − 10q = 0, so r = q
The variance is
σx² = qr(b − a)²/[(q + r)²(q + r + 1)]
9 = q² × 20²/[(2q)²(2q + 1)] = 400/[4(2q + 1)]
2q + 1 = 100/9 = 11.1
q = 5.05
r = 5.05
[Figure: CDF of the beta-distributed φ over the range 20°–40°]
The mode is
x̃ = a + [(q − 1)/(q + r − 2)](b − a)
= 20 + (4.05/8.1) × 20
= 30
The coefficient of skewness of a beta variate is
2(r − q)√(q + r + 1) / [(q + r + 2)√(qr)]
If q < r, the skewness is positive and the long tail is on the right of the mean; if q > r, it is negative and the long tail is on the left. Typical shapes of the beta density for a = 20, b = 40 include q = 1.0, r = 4.0; q = r = 3.0; q = 4, r = 2; and q = r = 1.0.
For a second case with a = 20, b = 40, mean x̄ = 26 and CoV = 20% (so σ = 0.2 × 26 = 5.2):
26 = 20 + [q/(q + r)] × 20, i.e. q/(q + r) = 6/20
6q + 6r − 20q = 0
6r − 14q = 0, so q = 6r/14
σx² = qr(b − a)²/[(q + r)²(q + r + 1)]
5.2² = qr × 20²/[(q + r)²(q + r + 1)]
0.0676 = qr/[(q + r)²(q + r + 1)]
Substituting q = 6r/14, so that q + r = 20r/14 and qr = 6r²/14:
0.0676 = 2.94/(20r + 14)
20r + 14 = 43.5
r ≈ 1.47, q = 6r/14 ≈ 0.63
Over a period of time it has been observed that 40% of the automobiles traveling along a road take a left turn at a given intersection. What is the probability that, in a traffic stream of 10 automobiles, exactly 2 will take a left turn?
N = 10; x = 2; p = 0.40
b(2; 10, 0.4) = C(10, 2)(0.4)²(0.6)⁸ = 0.12
The full distribution is:

No. of left turns x   Probability
0    0.006
1    0.04
2    0.12
3    0.21
4    0.25
5    0.20
6    0.11
7    0.04
8    0.01
9    0.002
10   0.0001

[Figure: bar chart of probability vs. number of left turns]
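The tabulated values follow from the binomial pmf; a sketch using math.comb:

```python
from math import comb

# Binomial pmf: probability of exactly x successes in n independent trials.
def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

p2 = binom_pmf(2, 10, 0.40)   # about 0.12, as in the table
```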
Example 2
A dam has a projected life of 50 years. What is the probability that a 100-year flood will occur during the lifetime of the dam?
p = 1/100 = 0.01, N = 50 years
b(1; 50, 0.01) = 50 × (0.01)¹ × (0.99)⁴⁹ = 0.31
The variation with design life N:

N     b(1; N, 0.01)                Probability
10    10 × (0.01)¹ × (0.99)⁹       0.091
20    20 × (0.01)¹ × (0.99)¹⁹      0.165
30    30 × (0.01)¹ × (0.99)²⁹      0.224
40    40 × (0.01)¹ × (0.99)³⁹      0.270
50    50 × (0.01)¹ × (0.99)⁴⁹      0.305
60    60 × (0.01)¹ × (0.99)⁵⁹      0.332
70    70 × (0.01)¹ × (0.99)⁶⁹      0.350
80    80 × (0.01)¹ × (0.99)⁷⁹      0.362
90    90 × (0.01)¹ × (0.99)⁸⁹      0.368
100   100 × (0.01)¹ × (0.99)⁹⁹     0.370
Example 3
In a flood control system for a river, the yearly maximum flood is of concern. The probability of the annual maximum flood exceeding some specified design level is 0.1 in any year. What is the probability that this level will be exceeded exactly once in five years?
Assuming a binomial distribution means that in each year the event either occurs or does not, each occurrence or non-occurrence is independent of the other events, and the probability of occurrence is the same in every trial.
b(x; n, p) = C(n, x) pˣ q⁽ⁿ⁻ˣ⁾ = [n!/(x!(n − x)!)] pˣ q⁽ⁿ⁻ˣ⁾
b(1; 5, 0.1) = [5!/(4! 1!)] (0.1)¹ (0.9)⁴
= 5 × (0.1)(0.9)⁴
= 0.328
The variance of a random variable is
V[xi] = E[(xi − x̄)²] = E[xi² − 2xi x̄ + x̄²]
= E[xi²] − 2x̄ E[xi] + x̄²
= E[xi²] − 2x̄² + x̄²
= E[xi²] − x̄²
Variance has the dimension of the square of the random variable; a more meaningful measure of dispersion is the positive square root of the variance, called the standard deviation:
σ[xi] = √V[xi]
The coefficient of variation is
V(x) = (σ[x]/E[x]) × 100%
The coefficient of skewness is
E[(xi − x̄)³] / (σ[xi])³
If it is (1) positive, the long tail is on the right side of the mean; if (2) negative, the long tail is on the left side.
The coefficient of kurtosis is
E[(xi − x̄)⁴] / (σ[x])⁴
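These moment measures can be estimated from a sample as follows (a sketch; the population-style n divisor is used for the central moments):

```python
def moments(xs):
    """Mean, SD, CoV (%), and skewness/kurtosis coefficients of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n     # E[x^2] - mean^2
    sd = var ** 0.5
    cov = sd / mean * 100
    skew = sum((x - mean) ** 3 for x in xs) / n / sd**3
    kurt = sum((x - mean) ** 4 for x in xs) / n / sd**4
    return mean, sd, cov, skew, kurt
```

A symmetric sample gives zero skewness, consistent with the definitions above.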
A structure is designed for a height of 8 m above mean sea level. This height corresponds to a 10% probability of being exceeded by sea waves in a year. What is the probability that the structure will be subjected to waves exceeding 8 m within the return period of the design wave?
The return period is
T = 1/p = 1/0.1 = 10 years
P[H > 8 m in 10 years] = 1 − (0.9)¹⁰ = 1 − 0.3487 = 0.6513
The number of trials (t) until the specified event occurs for the first time is given by the geometric distribution:
P(T = t) = p q⁽ᵗ⁻¹⁾
Example 2
A transmission tower is designed for a 50-year period, i.e. for a wind velocity having a return period of 50 years.
What is the probability that the design wind velocity will be exceeded for the first time in the 5th year after the completion of the structure?
Each year is considered one trial, and the probability of the 50-year wind occurring in any year is p = 1/50 = 0.02.
P(T = 5) = p q⁴ = 0.02 × (0.98)⁴ = 0.018
What is the probability that the first such wind velocity will occur within 5 years after the completion of the structure?
P(T ≤ 5) = Σ (t = 1 to 5) p q⁽ᵗ⁻¹⁾ = 1 − (0.98)⁵ = 0.096
Records of rainstorms in a town indicate that, on average, there have been four rainstorms per year over the last years. If the occurrence is assumed to follow a Poisson process, what is the probability that there is no rainstorm next year?
P(Xt = 0) = (4⁰/0!) e⁻⁴ = 0.018
P(Xt = 4) = (4⁴/4!) e⁻⁴ = 0.195
The above results indicate that although the average yearly occurrence of rainstorms is 4, the probability of having exactly 4 storms in a year is only 0.195.
The probability of 2 or more rainstorms in a year is
P(Xt ≥ 2) = Σ (x = 2 to ∞) (4ˣ/x!) e⁻⁴
= 1 − Σ (x = 0 to 1) (4ˣ/x!) e⁻⁴ = 1 − 0.018 − 0.073 = 0.909
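The rainstorm probabilities can be sketched with the Poisson pmf:

```python
import math

# Poisson pmf with mean rate nu (here 4 storms per year).
def poisson_pmf(x, nu):
    return nu**x * math.exp(-nu) / math.factorial(x)

p0 = poisson_pmf(0, 4)                              # about 0.018
p4 = poisson_pmf(4, 4)                              # about 0.195
p_ge2 = 1 - poisson_pmf(0, 4) - poisson_pmf(1, 4)   # about 0.91
```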
Example 2
The probability that a structure fails is P(f) = 0.001. Of 1000 such structures built, what is the probability that two will fail?
b(x; N, P(f)) = [N!/(x!(N − x)!)] P(f)ˣ (1 − P(f))⁽ᴺ⁻ˣ⁾
b(2; 1000, 0.001) = [1000!/(2! 998!)] (0.001)² (0.999)⁹⁹⁸
= [(999 × 1000)/2] (0.001)² (0.999)⁹⁹⁸ = 0.184
Large factorials can be evaluated with Stirling's formula:
N! ≈ √(2πN) Nᴺ e⁻ᴺ, i.e. ln(N!) ≈ (1/2) ln(2πN) + N ln N − N
The same result follows from the Poisson approximation with ν = N P(f) = 1:
f(2) = (1² e⁻¹)/2! = 0.184 ≈ 0.18
and similarly f(0) = 0.37, f(1) = 0.37, f(3) = 0.06, f(4) = 0.02.
Example 3
During World War II the Germans dropped a large number of bombs on London, and an analysis was conducted to determine whether the bombs were guided or not. It was reasoned that if the bombs lacked guidance, the hits should follow a Poisson distribution. To check this, London was divided into 580 regions of approximately equal area and the number of hits in each region was recorded.

No. of hits xi   Observed no. of areas   Poisson probability   Theoretical no. of areas
0                229                     0.389                 226
1                211                     0.367                 213
2                93                      0.173                 100
3                39                      0.054                 32
4                –                       0.013                 –
5 or more        –                       0.004                 –
Total            580                     1.00                  580
Historical records of earthquakes in San Francisco, California show that during the period 1836–1961 there were 16 earthquakes of intensity VI or more. If the occurrence of such high-intensity earthquakes in the region is assumed to follow a Poisson process, what is the probability that such an earthquake will occur within the next 2 years?
ν = 16/125 = 0.128 quakes per year
P(T1 ≤ 2) = 1 − e^(−0.128 × 2) = 0.226
The probability that no earthquake of this high intensity will occur in the next 10 years is
P(T1 > 10) = e^(−10 × 0.128) = 0.278
The mean recurrence time is
E(T1) = 1/0.128 = 7.8 years
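The exponential inter-arrival computations can be sketched directly:

```python
import math

nu = 16 / 125                          # 0.128 quakes per year

p_within_2 = 1 - math.exp(-nu * 2)     # about 0.226
p_none_10 = math.exp(-nu * 10)         # about 0.278
mean_return = 1 / nu                   # about 7.8 years
```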
A box contains 25 strain gauges and 4 of them are known to be defective. If 6 gauges are drawn for use in an experiment, what is the probability that one of the drawn gauges is defective?
N = 25, m = 4, n = 6
The probability that one gauge is defective:
P(X = 1) = [C(4, 1) C(21, 5)] / C(25, 6) = 0.46
The probability that no gauge is defective:
P(X = 0) = [C(4, 0) C(21, 6)] / C(25, 6) = 0.31
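A sketch of the hypergeometric pmf with math.comb:

```python
from math import comb

# Hypergeometric pmf: x defectives among n drawn from N items, m defective.
def hypergeom_pmf(x, N, m, n):
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)

p1 = hypergeom_pmf(1, 25, 4, 6)   # about 0.46
p0 = hypergeom_pmf(0, 25, 4, 6)   # about 0.31
```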
Example 2
An inspector on a highway project finds that two substandard tests per 10 samples are a good measure of a contractor's ability to perform. Find the probability that, among five samples selected at random,
a. there is one substandard test
b. there are two such results
The hypergeometric distribution gives
P(X = x) = [C(m, x) C(N − m, n − x)] / C(N, n)
a. With N = 10, m = 2, n = 5, x = 1:
P(X = 1) = [C(2, 1) C(8, 4)] / C(10, 5) = (2 × 70)/252 = 0.56
b. With N = 10, m = 2, n = 5, x = 2:
P(X = 2) = [C(2, 2) C(8, 3)] / C(10, 5) = 56/252 = 0.22
Example 3
In an area chosen for a foundation, samples at 50 locations were collected and the shear strength determined. Of the 50 locations, 10 are unsuitable from shear strength considerations. To improve the shear strength, in-situ grouting is proposed for 10 locations. What is the probability that
a. the present location is unsuitable
b. two of the selected locations were initially unsuitable?
Answer
N = 50, m = 10, n = 10, x = 1 and 2
f(x) = [C(m, x) C(N − m, n − x)] / C(N, n)
a. The present location is unsuitable:
f(1) = [C(10, 1) C(40, 9)] / C(50, 10) = 0.267
b.
f(2) = [C(10, 2) C(40, 8)] / C(50, 10) = 0.34
If the sampling is such that one or two locations can be discarded and replaced by others in the unsuitable area, it can be treated as a distribution with replacement, and the binomial distribution can be chosen for the purpose:
f(x) = C(n, x) (M/N)ˣ (1 − M/N)⁽ⁿ⁻ˣ⁾
f(1) = C(10, 1) (10/50)¹ (1 − 10/50)⁹ = 0.268
f(2) = C(10, 2) (10/50)² (1 − 10/50)⁸ = 0.30
Example 4
Records collected by a contractor over a 40-year period indicate that there have been 240 days of inclement weather, because of which operations were closed down. On the basis of the past record, what is the probability that no days will be lost next year?
Answer
Mean rate of occurrence ν = 240/40 = 6 per year
f(x) = (νˣ e⁻ᵛ)/x!
f(0) = (6⁰ e⁻⁶)/0! = 2.48 × 10⁻³
The full distribution:

No. of days x   Probability
0    0.0025
1    0.0149
2    0.0446
3    0.0892
4    0.1339
5    0.1606
6    0.1606
7    0.1377
8    0.1033
9    0.0688
10   0.0413
11   0.0225
12   0.0113
For the exponential distribution, the PDF of the time to the first occurrence is
fT(t) = ν e^(−νt)
FT(t) = ∫₀ᵗ ν e^(−νt) dt = [−e^(−νt)]₀ᵗ = 1 − e^(−νt)
The mean or expected value and variance of a continuous variate are
E(x) = ∫ x f(x) dx and Var(x) = ∫ (x − x̄)² f(x) dx
hence the mean recurrence time is
E(T) = ∫₀^∞ t ν e^(−νt) dt = 1/ν
Example 5
FX(x) = 1 − e^(−ax), x ≥ 0, a > 0
FY(y) = 1 − e^(−by), y ≥ 0, b > 0
Assuming X and Y are independent, the joint distribution is
FX,Y(x, y) = (1 − e^(−ax))(1 − e^(−by))
= 1 − e^(−ax) − e^(−by) + e^(−(ax+by))
Differentiating,
∂FX,Y/∂x = a e^(−ax) − a e^(−(ax+by))
∂²FX,Y/∂x∂y = a b e^(−(ax+by)) = fX,Y(x, y)
The marginal density of X is
fX(x) = ∫₀^∞ a b e^(−(ax+by)) dy = a e^(−ax) ∫₀^∞ b e^(−by) dy = a e^(−ax)
since ∫₀^∞ b e^(−by) dy = [−e^(−by)]₀^∞ = 1. Similarly, fY(y) = b e^(−by).
The joint PDF for the concentrations of two pollutants (X, Y), in ppm, is
f(x, y) = 2 − x − y, 0 ≤ x, y ≤ 1
a. Show that f(x, y) is a valid probability distribution.
b. Determine the CDF.
c. Find the joint probability that X ≤ 1/2 and Y ≤ 3/4.
d. Are X and Y statistically independent?

a. ∫₀¹∫₀¹ (2 − x − y) dx dy = ∫₀¹ [2x − x²/2 − xy]₀¹ dy
= ∫₀¹ (2 − 1/2 − y) dy = [1.5y − y²/2]₀¹
= 1
b. F(x, y) = ∫₀ʸ∫₀ˣ (2 − u − v) du dv = 2xy − x²y/2 − xy²/2
c. F(1/2, 3/4) = 2(1/2)(3/4) − (1/2)²(3/4)/2 − (1/2)(3/4)²/2
= 0.75 − 0.094 − 0.141
= 0.52
The marginal densities are
fX(x) = ∫₀¹ (2 − x − y) dy = [2y − xy − y²/2]₀¹ = 1.5 − x
fY(y) = ∫₀¹ (2 − x − y) dx = [2x − x²/2 − xy]₀¹ = 1.5 − y
d. Independence requires P(A ∩ B) = P(A) P(B), i.e. fX(x) fY(y) should equal f(x, y):
fX(x) fY(y) = (1.5 − x)(1.5 − y) = 2.25 − 1.5x − 1.5y + xy ≠ 2 − x − y
so X and Y are not statistically independent. For the exponential variates of Example 5, by contrast,
fX(x) = a e^(−ax), fY(y) = b e^(−by), fX fY = a b e^(−(ax+by)) = fX,Y(x, y)
so those variates are independent.
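Parts (a) and (c) can be verified from the closed-form CDF (a sketch):

```python
# Joint CDF of f(u, v) = 2 - u - v on the unit square, integrated in closed form.
def F(x, y):
    return 2 * x * y - x**2 * y / 2 - x * y**2 / 2

total = F(1.0, 1.0)    # integrates to 1 over the unit square
Fc = F(0.5, 0.75)      # part (c): about 0.52
```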
MODULE - 3
Learning objectives
1. Understanding the concept of random field theory.
2. Understanding the significance of the autocovariance function.
3. Understanding the various functions of random fields.
Motivation:
Importance of understanding random field theory
Highlights:
Knowledge of stationary processes and the autocovariance function
The inherent variability of a soil parameter is attributed to the natural geological processes, which are responsible for the depositional behaviour and stress history of the soil under consideration. The fluctuations of a soil property about the mean can be modelled using a zero-mean stationary random field (Vanmarcke 1977). A detailed list of the fluctuations in terms of coefficients of variation for some of the laboratory and in-situ soil parameters, along with the respective scales of fluctuation in the horizontal and vertical directions, is presented in Baecher and Christian (2003).
ξd = T(ξm, ε)   (1)
where ξm is the measured property of the soil parameter obtained from either a laboratory or in-situ test. The measured property can be represented as the algebraic sum of a non-stationary trend, t, a stationary fluctuating component, w, and a measurement uncertainty, e. ε is the transformation uncertainty, which arises due to the uncertainty in transforming the in-situ or laboratory measured soil property to the design parameter using a transformation equation of the form shown in Equation 1. Hence, the design property can be represented by Equation 2.
ξd = T(t + w + e, ε)   (2)
Phoon and Kulhawy (1999b) expressed the above equation in terms of a Taylor series. Linearizing the series by truncating the higher-order terms at the mean values of the soil parameters leads to Equation 3 for the soil design property; the mean and variance of the design property are then expressed as given in Equations 4 and 5.
ξd ≈ T(t, 0) + w (∂T/∂w) + e (∂T/∂e) + ε (∂T/∂ε)   (3)
mξd ≈ T(t, 0)   (4)
SD²ξd = (∂T/∂w)² SD²w + (∂T/∂e)² SD²e + (∂T/∂ε)² SD²ε   (5)
where all derivatives are evaluated at (t, 0). The resulting variance of the design parameter after incorporating the spatial average is given by
SD²ξa = (∂T/∂w)² Γ²(L) SD²w + (∂T/∂e)² SD²e + (∂T/∂ε)² SD²ε   (6)
where Γ²(L) is the variance reduction function over the averaging length L.
Of the above, the treatment and evaluation of inherent soil variability assumes considerable
importance as the uncertainties from measurements and transformation process can be
handled if proper testing methods are adopted and transformation errors are quantified.
Approaches for evaluation of inherent soil variability are developed based on random fields
and a brief description of the theory and its relevance to characterisation of soil spatial
variability is described in the following sections.
The scale of fluctuation accounts for the distance within which the soil property shows relatively strong correlation from point to point.
Figure 3.1(a) shows a typical spatially variable soil profile showing the trend, fluctuating
component, and vertical scale of fluctuation. Small values of scale of fluctuation imply rapid
fluctuations about the mean, whereas large values suggest a slowly varying property, with
respect to the average.
Figure 3.1(a). Definition of various statistical parameters of a soil property (Phoon and
Kulhawy 1999a); (b) approximate definition of the scale of fluctuation (Vanmarcke
1977a)
Vanmarcke (1977a) demonstrated a simple procedure to evaluate an approximate value of the
vertical scale of fluctuation, as shown in Figure 3.1(b), which shows that the scale of
fluctuation is related to the average distance between intersections, or crossings, of the soil
property and the mean.
A random field is a conceivable model to characterize continuous spatial fluctuations of a soil
property within a soil unit. In this concept, the actual value of a soil property at each location
within the unit is assumed to be a realization of a random variable. Usually, parameters of the
random field model have to be determined from only one realization. Therefore the random
field model should satisfy certain ergodicity conditions, at least locally. If a time average does not give a complete representation of the full ensemble, the system is non-ergodic. The random field is
fully described by the autocovariance function, which can be estimated by fitting empirical
autocovariance data using a simple one-parameter theoretical model. This function is
commonly normalized by the variance to form the autocorrelation function. Conventionally,
the trend function is approximately removed by least square regression analysis. The
remaining fluctuating component, x(z), is then assumed to be a zero-mean stationary random
field. When the spacing between two sample points exceeds the scale of fluctuation, it can be
assumed that little correlation exists between the fluctuations in the measurements. Fenton
(1999a & b) observed that the scale of fluctuation often appears to increase with sampling
domain.
The entire soil profile within the zone of influence is divided into a number of statistically homogeneous or stationary sections, and the data within each layer has to be analysed separately for further statistical analysis. Hence, the partition of the soil profile into stationary sections plays a crucial role in the evaluation of soil statistical parameters such as the variance.
Kendall's tau is defined as
τ = S / [n(n − 1)/2]   (7)
where n is the number of (X, Y) observations. To obtain S, and consequently τ, the following procedure is followed.
1. Arrange the observations (Xi, Yi) in a column according to the magnitude of the Xs,
with the smallest X first, the second smallest second, and so on. Then the Xs are said
to be in natural order.
2. Compare each Y value, one at a time, with each Y value appearing below it. In
making these comparisons, it is said that a pair of Y values (a Y being compared and
the Y below it) is in natural order if the Y below is larger than the Y above.
Conversely, a pair of Y values is in reverse natural order if the Y below is smaller
than the Y above.
3. Let P be the number of pairs in natural order and Q the number of pairs in reverse
natural order.
4. S is equal to the difference between P and Q; that is, S = P − Q.
A total of n(n − 1)/2 possible comparisons of Y values can be made in this manner. If all the Y pairs are in natural order, then P = n(n − 1)/2, Q = 0, S = n(n − 1)/2, and hence
τ = [n(n − 1)/2] / [n(n − 1)/2] = +1
indicating perfect direct correlation between the observations of X and Y. On the other hand, if all the Y pairs are in reverse natural order, we have P = 0, Q = n(n − 1)/2, S = −n(n − 1)/2, and
τ = [−n(n − 1)/2] / [n(n − 1)/2] = −1
indicating a perfect inverse correlation.
Hence τ cannot be greater than +1 or smaller than −1, and τ can be taken as a relative measure of the agreement between the observed orders of the Y values. The strength of the correlation is indicated by the magnitude of the absolute value of τ.
(8)
The sampling window is divided into two equal segments and the sample variance (s1² or s2²) is calculated from the data within each segment separately. For the case of two sample variances, s1² and s2², the Bartlett test statistic is calculated as
Bstat = [2.30259(m − 1)/C] [2 log s² − (log s1² + log s2²)]   (9)
where m is the number of data points used to evaluate s1² or s2². The total variance, s², is defined as
s² = (s1² + s2²)/2   (10)
and C is a correction factor:
C = 1 + 1/[2(m − 1)]   (11)
While choosing the segment length, it should be remembered that m ≥ 10 (Lacasse and Nadim 1996). In this technique, the Bartlett statistic profile for the whole data set within the zone of influence is generated by moving the sampling window over the soil profile under consideration. In the continuous Bartlett statistic profile, the sections between the significant peaks are treated as statistically homogeneous or stationary layers, and each layer is treated separately for further analysis.
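A sketch of the Bartlett statistic for two equal segments of m points each, assuming the correction factor C = 1 + 1/(2(m − 1)) given above:

```python
import math

def bartlett_stat(s1_sq, s2_sq, m):
    """Bartlett statistic for two sample variances from m-point segments."""
    s_sq = (s1_sq + s2_sq) / 2                 # pooled total variance
    C = 1 + 1 / (2 * (m - 1))                  # correction factor (assumed form)
    return (2.30259 * (m - 1) / C) * (
        2 * math.log10(s_sq) - (math.log10(s1_sq) + math.log10(s2_sq))
    )
```

Equal segment variances give a statistic of zero; unequal variances give a positive value, flagging a potential non-stationary boundary.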
The critical values are established from simulated soil profiles covering all the possible ranges of autocorrelation functions applicable to soil. In this procedure, the test statistic used to reject the null hypothesis of stationarity is the peak value of the Bartlett statistic profile. The critical value of the modified Bartlett statistic is chosen at the 5% significance level and is calculated from simulated soil profiles using a multiple regression approach, following five different autocorrelation functions, viz., single exponential, double exponential, triangular, cosine exponential, and second-order Markov. The data within each layer between the peaks in the Bartlett statistic profile are checked for the existence of a trend. A particular trend is selected by comparing the correlation lengths obtained by fitting a theoretical function to the sample autocorrelation data. If the correlation lengths for two trends of consecutive order are identical, it is not necessary to go for a higher-order detrending process. In general, no more than a quadratic trend is required to transform a non-stationary data set into a stationary one (Jaksa et al. 1999).
The following dimensionless factors are obtained from the data within each layer:
Number of data points in one scale of fluctuation, k = δ/Δz   (12)
Normalized record length, I1 = nz/kz = n/k   (13)
Normalized segment length, I2 = mz/kz = m/k   (14)
where δ is the evaluated scale of fluctuation, Δz is the sampling interval, and n is the total number of data points in a soil record of length T. The Bartlett statistic profile is computed from the sample variances computed in two contiguous windows. Hence, the total soil record length, T, should be greater than 2W. To ensure that m ≥ 10, the normalized segment length should be chosen as I2 = 1 for k ≥ 10 and I2 = 2 for 5 ≤ k < 10 (Phoon et al. 2003a).
Equations 15 and 16 show the typical results obtained from regression analysis for I2 equals
to 1 and 2 respectively for the single exponential simulated profiles. Similar formulations
have also been developed for other commonly encountered autocorrelation functions and
reported in Phoon et al. (2003a).
Bcrit=(0.23k+0.71) ln(I1)+0.91k+0.23
for I2=1
(15)
Bcrit=(0.36k+0.66) ln(I1)+1.31k-1.77
for I2=2
(16)
A comparison is made between the peaks of the Bartlett statistic within each layer with Bcrit
obtained from the respective layer. If Bmax<Bcrit, the layer can be treated as statistically
homogeneous and hence, accept the null hypothesis of stationarity. Otherwise, if Bmax>Bcrit,
reject the null hypothesis of stationarity, and treat the sections on either side of the peaks in
the Bartlett statistic profile as stationary and repeat the above steps and evaluate whether
these sections satisfy the null hypothesis of stationarity. However, while dividing the sections
on either side of the peaks in the Bartlett statistic profile, it should be checked that m ≥ 10,
where m is the number of data points in a segment.
The computed distance BC is noted each time at the elevation coinciding with the centre of the window (i.e., the intersection of the two contiguous sub-windows). The length of the sub-window is selected based on the premise that at least 10 data points are available within the sub-window.
The data within the two sub-windows are treated separately, and each set of 10 points is checked for a linear trend. The reason for checking only for a linear trend is that within a 0.2 m profile, higher-order trends are rarely encountered. In addition, in normally consolidated soils, the overburden stress follows a linear trend with depth. Kulathilake and
Um (2003) suggested that the demarcation between existence and non-existence of a linear
trend in the data be assumed at a determination coefficient (R²) of 0.9. This means that if the R² value of the theoretical linear fit is greater than 0.9, the data set is said to have a linear trend in it; if not, the mean value is taken to be constant throughout the sub-window.
Hence, within a window length (i.e., two contiguous sub-windows) there exist four possible combinations of trend in the mean values. They are
1. Constant trend in both the contiguous sub-windows
2. Constant trend in upper sub-window and a linear trend in the lower sub-window
3. Linear trend in the upper sub-window and constant trend in the lower sub-window,
and
4. Linear trend in both the contiguous sub-windows.
The above four possibilities of trend within the contiguous windows are shown in Figure 3.2. As the distance BC increases, the heterogeneity of qc at the intersection between the two sub-sections increases.
[Figure 3.2: the four trend possibilities in contiguous sub-windows, with linear trends identified by R² for the linear fit > 0.9]
Different detrending procedures will in most cases result in different values of the random fluctuating components and different shapes of the autocorrelation function.
Baecher (1987) commented that the selection of a particular trend function is a decision on
how much of the spatial variability in the measurements is treated as a deterministic function
of space (i.e., trend) and how much is treated statistically and modelled as random processes.
However, the detrending process cannot be entirely arbitrary. After all, the fluctuating
components remaining in the detrended soil records must be stationary for meaningful
statistical analyses to be undertaken on limited data points. Clearly, the chosen trend function
should be reasonable in view of this stationary constraint. The scale of fluctuation or
autocorrelation distance evaluated from non-stationary data is always higher than that evaluated from the corresponding stationary data. In other words, trend removal invariably reduces the scale
of fluctuation of the soil properties. One of the simplest methods to evaluate whether a linear or 2nd-order polynomial trend is sufficient to be removed from the experimental data is to calculate the scale of fluctuation for both detrended data sets. If the evaluated scales of fluctuation are close to each other, the detrending process using the lower-degree polynomial is chosen. In the limit, the scale of fluctuation is zero when the entire profile is treated as a trend with zero random variation (Phoon et al. 2003a).
If a trend is evident in the measurements, it should be decided whether or not it should be
removed before statistical analysis of a set of raw data. An observed trend that has no
physical or geological basis or is not predictable must not be removed prior to statistical
analysis, since it is a part of the uncertainty to be characterized (Fenton 1999b). After
selecting a proper trend function for the data, the residuals off the trend are calculated. Phoon
et al. (2004) pointed out that trend removal is a complex problem, and there is at present no
fully satisfactory solution to it. The identified trend in the data is removed by employing any
of the following three widely used detrending methods.
A second difference is rarely needed to transform non-stationary data to stationary data (Jaksa et al. 1999).
x = z + e   (17)
where x is the measurement at a given location, z is the trend component, and e is the residual (deviation about the trend). The residuals off the trend tend to exhibit spatial correlation. The degree of spatial correlation among the residuals can be expressed through an autocovariance function:
c(k) = E{[P(Zi) − t(Zi)][P(Zj) − t(Zj)]}   (18)
(18)
where k is the vector of separation distance between point i and j, E[.] is the expectation
operator, P(Zi) is the data taken at location i, and t(Zi) is the value of the trend at location i.
The normalized form of the autocovariance function given in Equation 19 is known as the autocorrelation function:
ρ(k) = c(k)/c(0)   (19)
where c(0) is the autocovariance function at zero separation distance, which is nothing but the variance of the data.
It is not possible to evaluate c(k) or ρ(k) with certainty, but only to estimate them from samples obtained from a population. As a result, the sample autocovariance at lag k, c*(k), and the sample autocorrelation at lag k, rk, are generally evaluated. The sample autocorrelation function (ACF) is the graph of rk for lags k = 0, 1, 2, …, h, where h is the maximum number of lags allowed. Generally, h is taken as a quarter of the total number of data points in time series analysis of geotechnical data (Box and Jenkins 1970; Lumb 1975a). Beyond this number, the number of pairs contributing to the autocorrelation function diminishes and produces unreliable results. The sample ACF at lag k, rk, is generally evaluated using
rk = {[1/(N − k − 1)] Σ (i = 1 to N − k) (Xi − X̄)(Xi+k − X̄)} / {[1/(N − 1)] Σ (i = 1 to N) (Xi − X̄)²}   (20)
The scale of fluctuation, δ, is related to the autocorrelation function ρ(τ) by
δ = 2 ∫₀^∞ ρ(τ) dτ   (21)
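A sketch of the sample ACF of Equation 20:

```python
def sample_acf(xs, k):
    """Sample autocorrelation at lag k (Equation 20 normalisation)."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + k] - mean) for i in range(n - k)) / (n - k - 1)
    den = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return num / den

# a smoothly varying record correlates positively at short lags
record = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0]
r0 = sample_acf(record, 0)   # 1.0 by construction
r1 = sample_acf(record, 1)
```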
More generally, the scale of fluctuation is defined via an equivalent unit-step correlation function, i.e., ρ(τ) = 1 for τ ≤ δ/2 and ρ(τ) = 0 for τ > δ/2, τ being the Euclidean lag (JCSS 2000). The autocorrelation distance (or scale of fluctuation) is evaluated from the sample autocorrelation function using the method of fitting or based on Bartlett limits, which are described in the following sections.
3.4.4.1.1 Method of fitting
Analytical expressions are fitted to the sample autocorrelation functions using regression
analysis based on least square error approach. The least square error is generally
characterised by the determination coefficient of the fit. Frequently used single-parameter
theoretical auto-correlation functions are exponential, squared exponential, though models
such as triangular, second order auto-regressive, spherical, etc. are also not uncommon to fit
the sample autocorrelation data in geotechnical engineering. Some of these models are given
in Table 3.1.
Table 3.1. Theoretical autocorrelation functions used to determine the autocorrelation distance and scale of fluctuation, δ (Jaksa et al. 1999)

Model No.   Name                  Autocorrelation function                        Scale of fluctuation, δ
1           Triangular            ρ(τ) = 1 − |τ|/a for |τ| ≤ a; 0 for |τ| > a     a
2           Single exponential    ρ(τ) = exp(−τ/b)                                2b
3           Double exponential    ρ(τ) = exp[−(τ/c)²]                             √π c
4           Second-order Markov   ρ(τ) = exp(−τ/d)(1 + τ/d)                       4d
5           Cosine exponential    ρ(τ) = exp(−τ/e) cos(τ/e)                       e

(The autocorrelation distance is the model parameter a, b, c, d or e in each case.)
Table 3.1 shows the autocorrelation distance and the corresponding scale of fluctuation for common theoretical autocorrelation functions. A small scale of fluctuation (δ) implies rapid fluctuations about the mean and a large reduction in variance over any failure plane; this results in a small spread of the performance function. Conversely, a large δ means much slower variation about the mean and results in a smaller reduction in variance over a failure plane (Mostyn and Soo 1992).
3.4.4.1.2 Bartlett limits
In the field of time series analysis, the most commonly used method to compute the autocorrelation distance is by Bartlett's approximation. In this method the computed scale of fluctuation corresponds to two standard errors of the estimate, i.e., the lag distance at which the positive Bartlett limit given by Equation 22, superimposed on the autocorrelation plot, crosses the autocorrelation function (Jaksa et al. 1999).
rh = ±1.96/√N   (22)
The scale of fluctuation of cone tip resistance varies from site to site. Moreover, it also varies with the type of soil: Jaksa et al. (2004) report smaller scales of fluctuation in sands than in clays due to their nature of formation. Further, Fenton and Vanmarcke (1998) argue that the scale of fluctuation depends largely on the geological processes of transport of raw materials, layer deposition, and common weathering rather than on the actual property studied. Nonetheless, DeGroot and Baecher (1993) observed that the scale of fluctuation is also a function of the sampling interval of the in-situ measured property.
Soil deposits exhibit two types of anisotropy. Inherent or initial anisotropy manifests itself in soil deposits as a result of the stresses applied at the time of formation, in the form of structure on a macroscopic scale or fabric orientation on the microscopic scale. Stress-induced anisotropy arises from changes in the effective stress state produced by the subsequent loading history. This anisotropy can cause the elastic, strength and compressibility parameters of the soil deposits to vary with direction, and hence cannot be ignored.
Soil properties exhibit large variations, and their directional behaviour has been observed by many researchers (Vanmarcke 1983; Jaksa et al. 1999; Phoon and Kulhawy 1999a; Griffiths and Fenton 2000; Nobahar and Popescu 2002; Fenton and Griffiths 2003; Jaksa et al. 2004; Sivakumar Babu and Mukesh 2004; Uzielli et al. 2005; Wei et al. 2005). The autocorrelation distances in the vertical and horizontal directions are never the same but, in general, differ by an order of magnitude, with the horizontal scale of fluctuation being higher than that in the vertical direction (Uzielli et al. 2005). Attempts have been made in the literature to
formulate autocorrelation models for 1, 2, and 3-dimensional soil space (Vanmarcke 1977a;
and Kulathilake and Miller 1987). The effect of anisotropy of soil properties on the bearing
capacity in a probabilistic framework has not been studied extensively in the literature. Many
times, due to economic feasibility, speed of exploration, availability of equipment and time
constraints vertical cone penetration data alone is obtained and used in the evaluation of
strength properties (Wei et al. 2005).
The autocovariance structure is called isotropic if the normalized autocovariance depends only on the Euclidean distance between field points, rather than on the axis-directional distance components, i.e.,

ρ(Δx, Δy, Δz) = ρ(√(Δx² + Δy² + Δz²))        (23)
If the structure is isotropic only in the horizontal plane, the autocovariance takes the form

ρ(Δx, Δy, Δz) = ρ(√(Δx² + Δy²), Δz)        (24)

A separable exponential autocorrelation function in three dimensions is

ρ(Δx, Δy, Δz) = exp[−(|Δx|/Dx + |Δy|/Dy + |Δz|/Dz)]        (25)
If isotropy in the horizontal direction is assumed, the exponential correlation function shown in Equation 2.25 reduces to

ρ(Δx, Δy, Δz) = exp[−(√(Δx² + Δy²)/Dh + |Δz|/Dz)]        (26)
Similar theoretical autocorrelation functions in a 3-D field for other distributions can also be formulated along the lines shown above.
influence. Both the conventional analyses based on the factor of safety and the simplified probabilistic analyses fail to address this issue of the scale of failure. Over a depth interval Z, the spatial average of the soil property u is given as

ū(Z) = (1/Z) ∫₀^Z u(z) dz        (27)
The spatial average of the soil property u(x, y, z) over a volume V is given in the same way as

ū_V = (1/V) ∫_V u(x, y, z) dx dy dz        (28)
The averaging distance depends on the nature of the problem in hand. For the design of shallow foundations against shear failure, this distance is equal to the extent of the shear failure zone within the soil mass (Cherubini 2000). For shallow foundations in cohesionless soil subjected to vertical loading, this distance is approximately taken as 2B below the base of the footing in the vertical direction and 3.5B from the centre of the footing in the horizontal direction, where B is the width of the footing.
(or the volume) over which the property is averaged, the higher is the fluctuation of uᵢ that tends to cancel out in the process of spatial averaging. This causes a reduction in the standard deviation as the size of the averaging length or volume increases, which is given by

σ_ū(Z) = Γ(Z)·σ_u        (29)

where Γ(Z) is the variance reduction function.
A simple relationship of the variance reduction function in terms of the scale of fluctuation δ and the averaging distance Z is given in Equation 2.30 (Vanmarcke 1977a):

Γ²(Z) = 1.0        for Z/δ ≤ 1.0
Γ²(Z) = δ/Z        for Z/δ > 1.0        (30)
Equation 2.30 indicates that with a decrease in the scale of fluctuation and an increase in the averaging distance, the value of the variance reduction function reduces, which in turn reduces the standard deviation of the spatially averaged soil property. In other words, the more erratic the variation of the soil property with distance (i.e., the less correlated the soil property) and the larger the soil domain considered, the larger will be the reduction in the variability of the average property.
This phenomenon is a result of the increasing likelihood that unusually high property values at some points will be balanced by low values at other points (Vanmarcke 1977a). However,
Vanmarcke (1983) emphasized that the variance reduction function Γ²(T) is related to the autocorrelation function ρ(τ) as given by

Γ²(T) = (1/T²) ∫₀^T ∫₀^T ρ(t₁ − t₂) dt₁ dt₂        (31)
which reduces to

Γ²(T) = (2/T) ∫₀^T (1 − τ/T) ρ(τ) dτ        (32)
From Equation 2.32, the variance reduction functions for triangular, exponential, and squared
exponential autocorrelation functions can be worked out as given in Equations 2.33 to 2.35,
respectively.
Γ²(T) = 1 − T/(3a)                  for T ≤ a
Γ²(T) = (a/T)·[1 − a/(3T)]          for T ≥ a        (33)

Γ²(T) = 2·(b/T)²·[T/b − 1 + exp(−T/b)]        (34)

Γ²(T) = (d/T)²·[√π·(T/d)·E(T/d) + exp(−(T/d)²) − 1]        (35)
where a, b, d are referred to as the autocorrelation distances, T is the averaging length (the distance over which the geotechnical properties are averaged over a failure surface), and E(·) is the error function, which increases from 0 to 1 as its argument increases from 0 to ∞. In terms of the standard Gaussian cumulative distribution function, E(u) = 2[F_U(u) − 0.5].
As the averaging length T → ∞, the variance reduction function Γ²(T) → 0. In other words, the chance of failure of a very large volume of soil is very small. In addition, Γ²(T) is inversely proportional to T at very large values of T.
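The limiting behaviour described above can be checked numerically. The following sketch (ours, not from the text) evaluates the variance reduction functions of Equations 2.33 and 2.34, showing Γ² ≈ 1 for short averaging lengths and Γ² falling like 1/T for long ones:

```python
import math

def gamma2_triangular(T, a):
    """Variance reduction for a triangular autocorrelation function (Eq. 2.33)."""
    if T <= a:
        return 1.0 - T / (3.0 * a)
    return (a / T) * (1.0 - a / (3.0 * T))

def gamma2_exponential(T, b):
    """Variance reduction for an exponential autocorrelation function (Eq. 2.34)."""
    return 2.0 * (b / T) ** 2 * (T / b - 1.0 + math.exp(-T / b))

# Averaging lengths spanning short to very long, with unit autocorrelation distance
for T in (0.1, 1.0, 10.0, 100.0):
    print("T = %6.1f  triangular: %.4f  exponential: %.4f"
          % (T, gamma2_triangular(T, a=1.0), gamma2_exponential(T, b=1.0)))
```

For T = 100 and unit autocorrelation distance, both functions give values near 0.01-0.02, i.e., the standard deviation of the spatial average is reduced by roughly an order of magnitude.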
The variance reduction factor for averaging over a 1, 2 or 3-D random field may be approximated as

Γ²(L₁, …, L_n) = 1                          for L₁⋯L_n ≤ θ₁⋯θ_n
Γ²(L₁, …, L_n) = (θ₁⋯θ_n)/(L₁⋯L_n)          for L₁⋯L_n > θ₁⋯θ_n        (36)
where n = 1, 2, 3, and L₁, L₂ and L₃ are the lengths over which averaging takes place and θ₁, θ₂ and θ₃ are the correlation radii. In the case of separable autocorrelation functions, i.e., those which can be expressed as a product of one-dimensional autocorrelation functions, the variance reduction factor is the product of the one-dimensional factors:

Γ²(L₁, …, L_n) = Γ²(L₁)·Γ²(L₂)⋯Γ²(L_n)        (37)
Similar to the above, Vanmarcke (1977a) also proposed an approximate and simplified resultant variance reduction factor in 2-D space as the product of the individual variance reduction factors in the vertical and horizontal directions, in terms of the scale of fluctuation (δ) and the spatial averaging distance (L) in the respective directions, as shown in Equation (38):

Γ²_A = Γ²_v · Γ²_h        (38)
The above propositions have been used in the analysis of spatial variability of soils, and the influence of spatial variability on foundation design is presented in Sivakumar Babu et al. (2005), Dasaka et al. (2005) and Sumanta Haldar and Sivakumar Babu (2006).
MODULE 4

Learning objectives
1. Understanding the concept of sampling
2. Understanding the various sampling techniques and sampling plans.
3. Understanding the significance of making decisions based on samplings.

Motivation:
Importance of understanding the various sampling techniques

Highlights:
Knowledge of sampling techniques, decisions based on sampling

4. Sampling
4.1. Concepts of Sampling
Many variables in civil engineering are spatially distributed. For example, the concentration of pollutants and the variation of material properties such as strength and stiffness in concrete and soils are spatially distributed. The purpose of sampling is to obtain estimates of population parameters (e.g. means, variances, covariances) to characterize the entire population distribution without observing and measuring every element in the sampled population. Sampling theory for spatial processes principally involves the evaluation of estimators, sampling distributions and confidence limits. A very good introduction to these methods and the uses and advantages of sampling is provided by Cochran (1977) and Baecher and Christian (2003).
An estimate is the realization of a particular sample statistic for a specific set of sample observations. Estimates are not exact, and this uncertainty is reflected in the variance of their distribution about the true parameter value they estimate. This variance is, in turn, a function of both the sampling plan and the sampled population. By knowing this variance and making assumptions about the distribution shape, confidence limits on the true population parameters can be set.
A sampling plan is a program of action for collecting data from a sampled population. Common plans are grouped into many types: for example, simple random, systematic, stratified random, cluster, traverse, line intersect, and so on. In deciding among plans, or in designing a specific program once the type of plan has been chosen, one attempts to obtain the highest precision for a fixed sampling cost, or the lowest sampling cost for a fixed precision or a specified confidence interval.
4.2. Common Spatial Sampling Plans
Statistical sampling is a common activity in many human enterprises, from the national
census, to market research, to scientific research. As a result, common situations are
encountered in many different endeavors, and a family of sampling plans has grown up to
handle these situations. Simple random sampling, systematic sampling, stratified random
sampling, and cluster sampling are considered in the following section.
4.2.1. Simple random sampling
The characteristic property of simple random sampling is that individual are chosen at
random from the sampled population, and each element of population has an equal
probability of being observed. An unbiased estimator of the population mean from a
simple random x={x1..xn} is the sample mean
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ        (1)

The sampling variance of this estimator is

Var(x̄) = (σ²/n)·(N − n)/N        (2)
where σ² is the (true) variance of the sampled population and N is the total sampled population size. The term (N − n)/N is called the finite population factor, which, for n less than about 10% of N, can safely be ignored. However, since σ² is usually unknown, it is estimated by the sample variance
s² = (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)²        (3)
in which the denominator is taken as n − 1 rather than n, reflecting the loss of a degree of freedom due to estimating the mean from the same data. The estimator is unbiased but does not have minimum variance. The only choice (i.e. allocation) to be made in simple random sampling is the sample size n. Since the sampling variance of the mean is inversely proportional to sample size, Var(x̄) ∝ n⁻¹, a given estimator precision can be obtained by adjusting the sample size, if σ is known or assumed. A sampling plan can be optimized for total cost by assuming some relationship between Var(x̄) and cost in
Confidence limits for the population mean and total are:

Mean:
Y_L = ȳ − (t·s/√n)·√(1 − f),    Y_U = ȳ + (t·s/√n)·√(1 − f)

Total:
Y_L = N·ȳ − (t·N·s/√n)·√(1 − f),    Y_U = N·ȳ + (t·N·s/√n)·√(1 − f)

where f = n/N is the sampling fraction.
The symbol t is the value of the normal deviate corresponding to the desired confidence
probability. The most common values are tabulated below:
Confidence probability (%)    50      80      90      95      99
Normal deviate, t             0.67    1.28    1.64    1.96    2.58
If the sample size is less than 60, the percentage points may be taken from Student's t table with (n − 1) degrees of freedom, these being the degrees of freedom in the estimate s². The t distribution holds exactly only if the observations yᵢ are themselves normally distributed and N is infinite. Moderate departures from normality do not affect it greatly. For small samples with very skew distributions, special methods are needed. An example of the application is as follows.
Example.
In a site investigation, the number of borehole data sheets used to characterize the substrata and obtain design parameters is 676. In each borehole data sheet, 42 entries reflecting the various characteristics of the soils, viz. compressibility, shear strength, compaction control, permeability, etc., are indicated. An audit revealed that in some data sheets not all of the entries were filled in. The audit party verified a random sample of 50 sheets (≈7% sample), and the results are indicated in Table 1.
Table 1. Results for a sample of 50 borehole data sheets

Number of entries, yᵢ    Frequency, fᵢ
42                       23
41                       4
36                       1
32                       1
29                       1
27                       2
23                       1
19                       1
16                       2
15                       2
14                       1
11                       1
10                       1
9                        1
7                        1
6                        3
5                        2
4                        1
3                        1
Total (Σfᵢ)              50
We find

n = Σfᵢ = 50,    Σfᵢyᵢ = 1471,    Σfᵢyᵢ² = 54,497

s² = (1/(n − 1))·Σfᵢ(yᵢ − ȳ)² = (1/(n − 1))·[Σfᵢyᵢ² − (Σfᵢyᵢ)²/Σfᵢ]
   = (1/49)·[54,497 − (1471)²/50] = 229.0

The estimated total number of entries is N·ȳ = 676 × (1471/50) ≈ 19,888, and the 80% confidence limits are N·ȳ ± (t·N·s/√n)·√(1 − f) with t = 1.28. This gives 18,107 and 21,669 for the 80% limits. A complete count showed 21,045 entries, which is close to the upper estimate.
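The arithmetic in this example can be verified directly from the frequencies in Table 1 (a quick check script, ours):

```python
import math

# (entries per sheet, frequency) pairs from Table 1
data = [(42, 23), (41, 4), (36, 1), (32, 1), (29, 1), (27, 2), (23, 1),
        (19, 1), (16, 2), (15, 2), (14, 1), (11, 1), (10, 1), (9, 1),
        (7, 1), (6, 3), (5, 2), (4, 1), (3, 1)]

N = 676                                   # total number of borehole sheets
n = sum(f for _, f in data)               # sample size: 50
sum_y = sum(y * f for y, f in data)       # 1471
sum_y2 = sum(y * y * f for y, f in data)  # 54,497

ybar = sum_y / n
s2 = (sum_y2 - sum_y ** 2 / n) / (n - 1)  # sample variance, Eq. (3)
s = math.sqrt(s2)

total = N * ybar                          # estimated total number of entries
f = n / N                                 # sampling fraction
t = 1.28                                  # normal deviate for 80% confidence
half_width = t * N * s / math.sqrt(n) * math.sqrt(1 - f)

print("s^2 = %.1f" % s2)
print("estimated total = %.0f" % total)
print("80%% limits: %.0f to %.0f" % (total - half_width, total + half_width))
```

The computed limits agree with the quoted 18,107 and 21,669 to within a unit of rounding.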
4.2.2. Systematic sampling

In systematic sampling, every kth element of the sampled population is observed, starting from a randomly chosen starting point. The variance of the sample mean is

Var(x̄) = ((N − 1)/N)·σ² − (k·(n − 1)/N)·σ_w²        (4)

where k = N/n is the interval between samples and σ_w² is the variance of elements within the same systematic sample, estimated by

s_w² = (1/k)·Σᵢ₌₁ᵏ [ Σⱼ₌₁ⁿ (x_ij − x̄ᵢ)²/(n − 1) ]        (5)
in which x_ij is the jth member of the ith systematic sample. When only one systematic sample has been taken (i.e. one set of n observations at spacing k), the variance of the mean cannot be evaluated unless an assumption is made about the nature of the sampled population. The conventional assumption is that the population can be modeled by an expression of the form xᵢ = μ + eᵢ, in which μ is the mean and eᵢ is a zero-mean random perturbation. For a constant mean, this leads to (Cochran 1977)
Var(x̄) = (1/n)·((N − n)/N)·Σᵢ (xᵢ − x̄)²/(n − 1)        (6)

and, if the estimate is based on successive differences,

Var(x̄) = (1/n)·((N − n)/N)·Σᵢ (xᵢ − 2x_{i+k} + x_{i+2k})²/(6(n − 2))        (7)
One must ensure that the chosen sampling interval does not hide a pattern. Any pattern
would threaten randomness. A random starting point must also be selected. Systematic
sampling is to be applied only if the given population is logically homogeneous, because
systematic sample units are uniformly distributed over the population.
Example: Suppose the auditor in the previous example wants to use systematic sampling; he can then select every kth sheet and conduct the study on this sample. A starting point is chosen at random, and thereafter sheets are selected at regular intervals. For example, suppose you want a sample of about 25 sheets from 676 sheets: 676/25 ≈ 27, so every 27th sheet is chosen after a random starting point between 1 and 27. If the random starting point is 11, then the sheets selected are 11, 38, 65, 92, etc.
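The selection rule in this example can be sketched as follows (a minimal illustration, ours; the helper name is hypothetical):

```python
import random

def systematic_sample(N, n, seed=None):
    """Select roughly n of N elements by systematic sampling."""
    k = N // n                      # sampling interval, e.g. 676 // 25 = 27
    rng = random.Random(seed)
    start = rng.randint(1, k)       # random starting point in 1..k
    return list(range(start, N + 1, k))

sheets = systematic_sample(676, 25, seed=1)
print(sheets[:4], "... total selected:", len(sheets))
```

Every selected sheet is exactly k = 27 positions after the previous one, so any periodicity in the population at that spacing would bias the sample, which is why the interval must not hide a pattern.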
4.2.3. Stratified random sampling
When sub-populations vary considerably, it is advantageous to sample each
subpopulation (stratum) independently. Stratification is the process of grouping members
of the population into relatively homogeneous subgroups before sampling. The strata
should be mutually exclusive: every element in the population must be assigned to only
one stratum. The strata should also be collectively exhaustive: no population element can
be excluded. Then random or systematic sampling is applied within each stratum. This
often improves the representativeness of the sample by reducing sampling error. It can
produce a weighted mean that has less variability than the arithmetic mean of a simple
random sample of the population.
There are several possible strategies:
1) Proportionate allocation uses a sampling fraction in each of the strata that is
proportional to that of the total population. If the soil sample consists of 60% of boulders
(boulder stratum) and 40% sand (sand stratum), then the relative size of the two types of
samples should reflect this proportion.
2) Optimum allocation (or Disproportionate allocation) - Each stratum is proportionate to
the standard deviation of the distribution of the variable. Larger samples are taken in the
strata with the greatest variability to generate the least possible sampling variance.
Estimates of the total population characteristics can be made by combining the individual
stratum estimates. For certain populations, stratifying before sampling is more efficient
than taking samples directly from the total population. Sampling plans that specify a
simple random sample in each stratum are called stratified random sampling plans.
x̄ = (1/N)·Σ_h N_h·x̄_h        (8)

where x̄ is the estimate of the population mean, N is the total population size, and h denotes the stratum (i.e., N_h is the size of the hth stratum, and x̄_h is the corresponding stratum mean). The variance of this estimate is
Var(x̄) = Σ_h w_h²·(1 − f_h)·s_h²/n_h        (9)

where w_h = N_h/N is the stratum weight, f_h = n_h/N_h is the sampling fraction, and s_h² and n_h are the sample variance and sample size within stratum h.
The total population variance can be estimated by combining the within-stratum variances with the variance among the stratum means:

s² ≈ Σ_h w_h·s_h² + s²_among means        (10)
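A short numerical sketch of stratified estimation (ours, not from the text; the two-stratum "boulder"/"sand" data are invented for illustration) combines the stratum means as in Equation (8) and the variance as in Equation (9):

```python
import math

# Hypothetical two-stratum site: stratum size N_h and sampled values per stratum
strata = {
    "boulder": (600, [31.0, 28.5, 30.2, 29.8, 32.1, 30.5]),
    "sand":    (400, [18.2, 19.5, 17.8, 18.9, 19.1, 18.4]),
}

N = sum(N_h for N_h, _ in strata.values())

# Stratified estimate of the population mean, Eq. (8)
xbar = sum(N_h * sum(x) / len(x) for N_h, x in strata.values()) / N

# Variance of the estimate, Eq. (9), with w_h = N_h/N and f_h = n_h/N_h
var = 0.0
for N_h, x in strata.values():
    n_h = len(x)
    mean_h = sum(x) / n_h
    s2_h = sum((xi - mean_h) ** 2 for xi in x) / (n_h - 1)
    var += (N_h / N) ** 2 * (1 - n_h / N_h) * s2_h / n_h

print("stratified mean:", round(xbar, 2))
print("standard error:", round(math.sqrt(var), 3))
```

Because each stratum is internally homogeneous, the standard error here is much smaller than a simple random sample of the same size drawn from the mixed population would give.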
4.2.4. Cluster sampling

In cluster sampling, the sampled population is divided into M clusters, of which m are selected at random, with f₁ = m/M called, as before, the sampling fraction. Each cluster contains the same number of elements, N, some number n of which are selected for measurement (f₂ = n/N). An unbiased estimate of the average of each cluster is

x̄ᵢ = (1/n)·Σⱼ x_ij        (11)

where x_ij is the jth element of the ith cluster. An unbiased estimate of the average of the total population is
x̄ = (1/m)·Σᵢ x̄ᵢ = (1/(mn))·Σᵢ Σⱼ x_ij

with variance

Var(x̄) = ((1 − f₁)/m)·s₁² + (f₁(1 − f₂)/(mn))·s₂²        (12)

where

s₁² = Σᵢ (x̄ᵢ − x̄)²/(m − 1),    s₂² = Σᵢ Σⱼ (x_ij − x̄ᵢ)²/(m(n − 1))
In the more general case, not all of the clusters are of equal size. For example, the numbers of joints appearing in different outcrops are different. With unequal-sized clusters the selection plan for clusters is not as obvious as it was previously. The relative probability z_j of selecting different-sized clusters is now a parameter of the plan. Commonly, the z_j are either taken all equal (simple random sampling of the clusters) or proportional to size. The precisions of these two plans are different. For selection with equal probability an unbiased estimate of the true total population mean is
x̄ = Σᵢ Nᵢ·x̄ᵢ / (m·N̄)        (13)

where N̄ is the average cluster size, with variance

Var(x̄) = ((1 − f₁)/(m·N̄²))·Σᵢ (Nᵢ·x̄ᵢ − N̄·x̄_m)²/(m − 1) + (f₁/(m²·N̄²))·Σᵢ Nᵢ²·(1 − f₂ᵢ)·s₂ᵢ²/nᵢ

If the same subsampling fraction is used in every cluster (nᵢ ∝ Nᵢ, so that f₂ᵢ = f₂), this reduces to

Var(x̄) = ((1 − f₁)/(m·N̄²))·Σᵢ (Nᵢ·x̄ᵢ − N̄·x̄_m)²/(m − 1) + (f₁(1 − f₂)/(m·n·N̄²))·Σᵢ Nᵢ·s₂ᵢ²        (14)
This assumption (nᵢ ∝ Nᵢ) is frequently valid: for example, proportionally more joints are typically sampled from larger outcrops than from smaller outcrops.
For selection with probability proportional to size, an unbiased estimate of the total
population mean is
x̄ = (1/m)·Σᵢ x̄ᵢ

with variance

Var(x̄) = Σᵢ (x̄ᵢ − x̄)²/(m(m − 1))        (15)
In all cases, the variance of the total population can be estimated from the variances between elements within clusters and the variance between the means of the clusters:
Joint surveys and many other types of civil engineering sampling may be based on cluster
plans because the cost of sampling many individual joints on one outcrop (i.e. a cluster)
is less than the cost of traveling between outcrops.
The variance of geological populations is often a function of spatial extent. Indeed, this is the principal argument in the geostatistical literature for favoring variograms over autocovariance functions. If we consider the strength of soil specimens taken close together, the variance among specimens is usually smaller than the variance among specimens taken from many locations in one area of the site, which, in turn, is smaller than the variance among specimens taken from all across the site. Cluster techniques allow us to evaluate variance as a function of the extent of the distribution in space by nesting the variances.
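The equal-cluster estimator of Equations (11) and (12) can be sketched numerically (ours, not from the text; the outcrop/joint-spacing data are invented):

```python
import math

# m sampled clusters (e.g. outcrops), n measured elements (e.g. joints) in each;
# the joint-spacing values are hypothetical.
clusters = [
    [1.2, 1.5, 1.1, 1.4],
    [2.1, 1.9, 2.3, 2.0],
    [1.6, 1.8, 1.7, 1.5],
]
M, Ncl = 20, 10          # assumed totals: clusters in population, elements per cluster
m, n = len(clusters), len(clusters[0])
f1, f2 = m / M, n / Ncl  # first- and second-stage sampling fractions

cluster_means = [sum(c) / n for c in clusters]   # Eq. (11)
xbar = sum(cluster_means) / m                    # overall mean

# Between-cluster and within-cluster variance components
s1 = sum((xm - xbar) ** 2 for xm in cluster_means) / (m - 1)
s2 = sum((x - xm) ** 2
         for c, xm in zip(clusters, cluster_means) for x in c) / (m * (n - 1))

var = (1 - f1) / m * s1 + f1 * (1 - f2) / (m * n) * s2   # Eq. (12)
print("mean:", round(xbar, 3), " standard error:", round(math.sqrt(var), 3))
```

The two variance components make the nesting explicit: s₁ captures outcrop-to-outcrop variation and s₂ the joint-to-joint variation within an outcrop.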
MODULE 5

Learning objectives
1. Understanding the importance of reliability analysis.

Motivation and Highlights

Motivation:
Importance of understanding the concept of reliability analysis

Highlights
Knowledge of terms such as levels of reliability, FOSM.
Figure 1 - Safe domain and failure domain in two dimensional state spaces.
Z_R = (R − μ_R)/σ_R,    Z_Q = (Q − μ_Q)/σ_Q        (1)
The variables Z_R and Z_Q are sometimes called reduced variables. By rearranging Equation no. 1, the resistance R and the load Q can be expressed in terms of the reduced variables as follows:
R = μ_R + Z_R·σ_R
Q = μ_Q + Z_Q·σ_Q        (2)
The limit state function g(R, Q) = R-Q can be expressed in terms of the reduced variables
by using Eqs. 2. The result is
g(Z_R, Z_Q) = μ_R + Z_R·σ_R − μ_Q − Z_Q·σ_Q = (μ_R − μ_Q) + Z_R·σ_R − Z_Q·σ_Q        (3)
For any specific value of g(Z_R, Z_Q), Equation no. 3 represents a straight line in the space of the reduced variables Z_R and Z_Q. The line corresponding to g(Z_R, Z_Q) = 0 separates the safe and failure domains in the space of reduced variables. The loads Q and resistances R are sometimes denoted in the literature in terms of capacity C and demand D.
Design for a given reliability index

If the capacity C and demand D are normally distributed and uncorrelated, the probability of survival is

Ps = 1 − Pf = Φ(β),    β = (μ_C − μ_D)/√(σ_C² + σ_D²)        (1)

If C and D are lognormally distributed, then

Ps = 1 − Pf = Φ( ln[(μ_C/μ_D)·√((1 + V_D²)/(1 + V_C²))] / √(ln[(1 + V_C²)(1 + V_D²)]) )        (2)

If we define the central factor of safety CFS = μ_C/μ_D and write (1) and (2) in terms of CFS:

β = (CFS − 1)/√(CFS²·V_C² + V_D²)

β = ln[CFS·√((1 + V_D²)/(1 + V_C²))] / √(ln[(1 + V_C²)(1 + V_D²)])
A version of the reliability index was defined as the inverse of the coefficient of variation of the safety margin. The reliability index is the shortest distance from the origin of the reduced variables to the line g(Z_R, Z_Q) = 0, as illustrated in Figure 3. This definition was introduced by Hasofer and Lind (1974). Using geometry, we can calculate the reliability index (shortest distance) from the following formula:

β = (μ_R − μ_Q)/√(σ_R² + σ_Q²)        (4)
where β is the inverse of the coefficient of variation of the function g(R, Q) = R − Q. When R and Q are uncorrelated normally distributed random variables, it can be shown that the reliability index is related to the probability of failure by

β = −Φ⁻¹(Pf)    or    Pf = Φ(−β)
Figure 3 - Reliability index defined as the shortest distance in the space of reduced variables.
Probability of failure, Pf    Reliability index, β
10⁻¹                          1.28
10⁻²                          2.33
10⁻³                          3.09
10⁻⁴                          3.71
10⁻⁵                          4.26
10⁻⁶                          4.75
10⁻⁷                          5.19
10⁻⁸                          5.62
10⁻⁹                          5.99
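The tabulated pairs follow directly from the relation Pf = Φ(−β); a quick check (ours), using the standard normal CDF written in terms of the error function:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pf_from_beta(beta):
    """Probability of failure corresponding to a reliability index beta."""
    return phi(-beta)

for pf_exp, beta in [(1, 1.28), (2, 2.33), (3, 3.09), (4, 3.71), (5, 4.26),
                     (6, 4.75), (7, 5.19), (8, 5.62), (9, 5.99)]:
    print("beta = %4.2f  ->  Pf = %.2e  (table: 1e-%d)" % (beta, pf_from_beta(beta), pf_exp))
```

Each computed Pf matches the tabulated order of magnitude, confirming the table entries.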
The definition for the two-variable case can be generalized for n variables as follows. Consider a limit state function g(X₁, X₂, …, X_n).
1. Define the set of reduced variables as
Zᵢ = (Xᵢ − μ_Xᵢ)/σ_Xᵢ        (5)
2. Redefine the limit state function by expressing it in terms of the reduced variables (Z₁, Z₂, …, Z_n).
3. The reliability index is the shortest distance from the origin in the n-dimensional space of reduced variables to the surface described by g(Z₁, Z₂, …, Z_n) = 0.
[Figure: probability density f_Z(z) of the safety margin Z, showing the limit surface z = 0]

β = μ_Z/σ_Z = (μ_R − μ_S)/√(σ_R² + σ_S²)        (6)
On the other hand, if the joint probability density function f_X(x) is known for the multi-variable case, then the probability of failure pf is given by

pf = ∫_L f_X(x) dx        (7)

where L denotes the failure domain g(X) ≤ 0.
where αᵢ is the direction cosine of random variable Xᵢ, β is the distance between the origin and the hyperplane g(X) = 0, n is the number of basic random variables X, and Φ is the standard normal distribution function.
The above formulations can be generalized for many random variables denoted by the vector X. Let the performance function be of the form

Z = g(X) = g(X₁, X₂, …, X_n)        (9)
A Taylor series expansion of the performance function about the mean values is given by

Z = g(μ_X) + Σᵢ₌₁ⁿ (∂g/∂Xᵢ)·(Xᵢ − μ_Xᵢ) + (1/2)·Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ (∂²g/∂Xᵢ∂Xⱼ)·(Xᵢ − μ_Xᵢ)(Xⱼ − μ_Xⱼ) + …        (10)

where the derivatives are evaluated at the mean values of the random variables (μ_X₁, μ_X₂, …, μ_Xn) and μ_Xᵢ is the mean value of Xᵢ. Truncating the series at the linear terms, the first-order mean and variance of Z can be obtained as
μ_Z ≈ g(μ_X₁, μ_X₂, …, μ_Xn)

σ_Z² ≈ Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ (∂g/∂Xᵢ)(∂g/∂Xⱼ)·var(Xᵢ, Xⱼ)        (11)

where var(Xᵢ, Xⱼ) is the covariance of Xᵢ and Xⱼ. If the variables are uncorrelated, then the variance of Z is given as

σ_Z² ≈ Σᵢ₌₁ⁿ (∂g/∂Xᵢ)²·σ_Xᵢ²        (12)
The reliability index β can be calculated by taking the ratio of the mean and standard deviation of Z as in Equation 2.66:

β = μ_Z/σ_Z
Consider a linear performance function

g(X₁, X₂, …, X_n) = a₀ + a₁X₁ + a₂X₂ + … + a_nX_n = a₀ + Σᵢ₌₁ⁿ aᵢXᵢ

where the aᵢ terms (i = 0, 1, 2, …, n) are constants and the Xᵢ terms are uncorrelated random variables. If we apply the three-step procedure outlined above for determining the Hasofer-Lind reliability index, we would obtain the following expression for β:
β = (a₀ + Σᵢ₌₁ⁿ aᵢ·μ_Xᵢ) / √( Σᵢ₌₁ⁿ (aᵢ·σ_Xᵢ)² )        (13)
Observe that the reliability index β in Eq. 5.18 depends only on the means and standard deviations of the random variables. Therefore β is called a second-moment measure of structural safety, because only the first two moments (mean and variance) are required to calculate β. There is no explicit relationship between β and the distribution types of the variables. If the random variables are all normally distributed and uncorrelated, then this formula is exact in the sense that β and Pf are related by Eq. 5.15. Otherwise, Eq. 5.15 provides only an approximate estimate of the probability of failure.
The method discussed above has some limitations and deficiencies. It does not use the distribution information about the variables, and the function g( ) is linearized at the mean values of the Xᵢ variables. If g( ) is nonlinear, neglecting the higher-order terms in the Taylor series expansion introduces significant error in the calculation of the reliability index. The more important observation is that Equations 2.58 and 2.64 do not give a constant value of the reliability index for mechanically equivalent formulations of the same performance function. For example, the safety margins R − S < 0 and R/S < 1 are mechanically equivalent, yet they will not lead to the same value of the probability of failure. Moreover, the MVFOSM approach does not use the distribution information about the variables even when it is available.
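The invariance problem is easy to demonstrate numerically. The sketch below (ours, not from the text; the statistics of R and S are invented) applies the MVFOSM ratio β = μ_Z/σ_Z to the mechanically equivalent margins R − S and R/S − 1 and obtains different indices:

```python
from math import sqrt

# Hypothetical statistics for resistance R and load S (uncorrelated)
muR, sigR = 150.0, 20.0
muS, sigS = 100.0, 15.0

# Formulation 1: Z = R - S (exact first-order moments for a linear function)
beta1 = (muR - muS) / sqrt(sigR ** 2 + sigS ** 2)

# Formulation 2: Z = R/S - 1, linearized at the means (MVFOSM):
# dZ/dR = 1/muS, dZ/dS = -muR/muS**2
muZ = muR / muS - 1.0
sigZ = sqrt((sigR / muS) ** 2 + (muR * sigS / muS ** 2) ** 2)
beta2 = muZ / sigZ

print("beta (R - S)   :", round(beta1, 3))
print("beta (R/S - 1) :", round(beta2, 3))
```

Both margins describe the same failure event, yet the two MVFOSM indices (about 2.0 and 1.66 for these numbers) differ, which is precisely the lack of invariance that the Hasofer-Lind definition removes.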
Nonlinear limit state functions
Now consider the case of a nonlinear limit state function. When the function is nonlinear,
we can obtain an approximate answer by linearizing the nonlinear function using a Taylor
series expansion. The result is
g(X) ≈ g(x₁*, x₂*, …, x_n*) + Σᵢ₌₁ⁿ (∂g/∂Xᵢ)·(Xᵢ − xᵢ*)        (5.19)

where (x₁*, x₂*, …, x_n*) is the point about which the expansion is performed and the derivatives ∂g/∂Xᵢ are evaluated at that point. One choice for this linearization point is the point corresponding to the mean values of the random variables. Thus Eq. 5.19 becomes

g(X) ≈ g(μ_X₁, μ_X₂, …, μ_Xn) + Σᵢ₌₁ⁿ (∂g/∂Xᵢ)·(Xᵢ − μ_Xᵢ)        (5.20)
Since Eq. 5.20 is a linear function of the Xᵢ variables, it can be rewritten to be exactly like Eq. 5.17. Thus Eq. 5.18 can be used as an approximate solution for the reliability index β. After some algebraic manipulations, the following expression for β results:

β = g(μ_X₁, μ_X₂, …, μ_Xn) / √( Σᵢ₌₁ⁿ (aᵢ·σ_Xᵢ)² )        (5.21)

where aᵢ = ∂g/∂Xᵢ evaluated at the mean values.
The reliability index defined in Eq. 5.21 is called a first-order second-moment mean value reliability index. It is a long name, but the underlying meaning of each part is important: first order because we use first-order terms in the Taylor series expansion; second moment because only means and variances are needed; mean value because the Taylor series expansion is about the mean values.
5.4.2. Comments on the First-Order Second-Moment Mean Value Index
Consider again the simple case in which g(R, Q) = R − Q. The method has both advantages and disadvantages in structural reliability analysis.
Among its advantages:
1. It is easy to use.
2. It does not require knowledge of the distributions of the random variables.
Xᵢ' = (Xᵢ − μ_Xᵢ)/σ_Xᵢ,    i = 1, 2, …, n        (14)
where Xᵢ' is a random variable with zero mean and unit standard deviation. The above equation is used to transform the original limit state g(X) = 0 to the reduced limit state g(X') = 0. X is referred to as the original coordinate system and X' as the reduced coordinate system. Note that if Xᵢ is normal in the original coordinate system, it will be standard normal in the reduced coordinate system.
The Hasofer-Lind reliability index (β_HL) can be defined as the minimum distance from the origin of the axes in the reduced coordinate system to the limit state surface. The minimum-distance point on the limit state surface is called the design point or checking point. Considering the limit state function in two variables given in Equation 2.68, wherein R and S should be normal variables, the reduced variables can be written as given in the following equations.
14
Z = R − S = 0

R' = (R − μ_R)/σ_R,    S' = (S − μ_S)/σ_S

Substituting the values of R and S in the above equation, the limit state equation in the reduced coordinate system can be written as

g( ) = σ_R·R' − σ_S·S' + μ_R − μ_S = 0        (15)
The position of the limit state surface relative to the origin in the reduced coordinate
system is a measure of the reliability of the system. By simple trigonometry, the distance
of the limit state line from the origin can be calculated and it will give the reliability
index value.
β_HL = (μ_R − μ_S)/√(σ_R² + σ_S²)        (16)
This is the same as the reliability index defined by the MVFOSM method, if both R and S are normal. In this definition the reliability index is invariant, because regardless of the form in which the limit state equation is written, its geometric shape and the distance from the origin remain constant.
To be specific, β is the first-order second-moment reliability index, defined as the minimum distance from the origin of the standard, independent normal variable space to the failure surface, as discussed in detail by Hasofer and Lind (1974). Figure 2.5 shows the plot depicting the functional relationship between the probability of failure (pf) and the reliability index (β), and classifies the performance of designs based on these two values. As seen from the figure, the performance is high if the reliability index is equal to 5, which corresponds to a probability of failure of approximately 3×10⁻⁷.
5.5.1. Hasofer-Lind Reliability Index
Hasofer and Lind proposed a modified reliability index that did not exhibit the invariance
problem illustrated in Example 5.3. The correction is to evaluate the limit state function
at a point known as the design point instead of the mean values. The design point is a
point on the failure surface g = 0. Since this design point is generally not known a priori,
an iteration technique must be used (in general) to solve for the reliability index.
5.5.2. AFOSM Method for Normal Variables
The Hasofer-Lind (H-L) method is applicable for normal random variables. It first
defines the reduced variables as
Xᵢ' = (Xᵢ − μ_Xᵢ)/σ_Xᵢ    (i = 1, 2, …, n)        (17)
where Xᵢ' is a random variable with zero mean and unit standard deviation. The above equation is used to transform the original limit state g(X) = 0 to the reduced limit state g(X') = 0. The X coordinate system is referred to as the original coordinate system. The X' coordinate system is referred to as the transformed or reduced coordinate system. The Hasofer-Lind reliability index β_HL is defined as the minimum distance from the origin of the axes in the reduced coordinate system to the limit state surface (failure surface). It can be expressed as

β_HL = √( (x'*)ᵀ (x'*) )        (18)
The minimum distance point on the limit state surface is called the design point or
checking point. It is denoted by vector x* in the original coordinate system and by vector
x`* in the reduced coordinate system. These vectors represent the values of all the
random variables, that is, X1, X2. ..., Xn, at the design point corresponding to the
coordinate system being used.
This method can be explained with the help of the figure shown. Consider the linear limit state equation in two variables:

Z = R − S = 0

This equation is similar to the above equation. Note that R and S need not be normal variables. A set of reduced variables is introduced as
R' = (R − μ_R)/σ_R        (19)

and

S' = (S − μ_S)/σ_S        (20)
If we substitute these into the above equation, the limit state equation in the reduced coordinate system becomes

g( ) = σ_R·R' − σ_S·S' + μ_R − μ_S = 0        (21)
The transformation of the limit state equation from the original to the reduced coordinate
system is shown in above figure. The safe and failure regions are also shown. From the
figure, it is apparent that if the failure line (limit state line) is closer to the origin in the
reduced coordinate system the failure region is larger and if it is farther away from the
origin, the failure region is smaller. Thus, the position of the limit state surface relative to
the origin in the reduced coordinate system is a measure of the reliability of the system.
The coordinates of the intercepts of the above equation on the R' and S' axes can be shown to be (−(μ_R − μ_S)/σ_R, 0) and (0, (μ_R − μ_S)/σ_S), respectively. Using simple trigonometry, we can calculate the distance of the limit state line from the origin as
β_HL = (μ_R − μ_S)/√(σ_R² + σ_S²)        (22)
This distance is referred to as the reliability index or safety index. It is the same as the reliability index defined by the MVFOSM method in the above equation if both R and S are normal variables. However, it is obtained in a completely different way, based on geometry. It indicates that if the limit state is linear and the random variables R and S are normal, both methods will give an identical reliability or safety index.
In general, for many random variables represented by the vector X = (X₁, X₂, …, X_n) in the original coordinate system and X' = (X₁', X₂', …, X_n') in the reduced coordinate system, g(X) > 0 denotes the safe state and g(X) < 0 denotes the failure state. The limit state g(X') = 0 is, in general, a nonlinear function, as shown in the reduced coordinates for two variables in the figure. At this stage, the Xᵢ' are assumed to be uncorrelated. Again, the Hasofer-Lind reliability index is defined as the minimum distance from the origin to the design point on the limit state in the reduced coordinates and can be expressed by the above equation, where x'* represents the coordinates of the design point, the point of minimum distance from the origin to the limit state. In this definition the reliability index is invariant, because regardless of the form in which the limit state equation is written, its geometric shape and the distance from the origin remain constant. For a limit state surface whose failure region is away from the origin, along the ray joining the origin and x'*, it is obvious that the nearer x'* is to the origin, the larger is the failure probability. Thus the minimum-distance point on the limit state surface is also the most probable failure point. The point of minimum distance from the origin to the limit state surface, x'*, represents the worst combination of the stochastic variables and is appropriately named the design point or the most probable point (MPP) of failure.
For nonlinear limit states, the computation of the minimum distance becomes an
optimization problem:
Minimize D = √(x'ᵀ x')
subject to g(x') = 0
The solution can be written as
β_HL = - Σᵢ xᵢ'* (∂g/∂Xᵢ')* / √( Σᵢ ((∂g/∂Xᵢ')*)² )    -----------------------------------(23)
where the summations run over i = 1, ..., n and the asterisk denotes evaluation at the
design point. The design point in reduced coordinates is
xᵢ'* = -αᵢ β_HL    (i = 1, 2, ..., n)
where
αᵢ = (∂g/∂Xᵢ')* / √( Σₖ ((∂g/∂Xₖ')*)² )    -----------------------------------(24)
are the direction cosines along the coordinate axes Xᵢ'. In the space of the original
coordinates, the design point is found to be
xᵢ* = μ_Xi - αᵢ σ_Xi β_HL
An algorithm was formulated by Rackwitz (1976) to compute β_HL and xᵢ'* as follows:
Step 1. Define the appropriate limit state equation.
Step 2. Assume initial values of the design point xᵢ'*, i = 1, 2, ..., n. Typically, the
initial design point may be assumed to be at the mean values of the random variables.
Step 3. Evaluate the direction cosines αᵢ at the design point using Eq. (24).
Step 4. Express the new design point as xᵢ'* = -αᵢ β_HL in terms of β_HL.
Step 5. Substitute the new xᵢ'* into the limit state equation g(x'*) = 0 and solve for β_HL.
Step 6. Repeat Steps 3 to 5 until the β_HL value converges.
Ditlevsen (1979a) showed that for a nonlinear limit state surface, β_HL lacks
comparability: the ordering of β_HL values may not be consistent with the ordering of the
actual reliabilities. An example of this is shown in the figure with two limit state
surfaces, one flat and the other curved. The shaded region to the right of each limit state
represents the corresponding failure region. Clearly, the structure with the flat limit
state surface has a different reliability than the one with the curved limit state surface;
however, the β_HL values are identical for both surfaces and suggest equal reliability. To
overcome this inconsistency, Ditlevsen (1979a) introduced the generalized reliability
index, β_g, defined as
β_g = Φ⁻¹( ∫ ... ∫_{g(X')>0} φ(x₁') φ(x₂') ... φ(xₙ') dx₁' dx₂' ... dxₙ' )    -----------------------------------(25)
where Φ and φ are the cumulative distribution function and the probability density
function of a standard normal variable, respectively. Because the reliability index in this
definition includes the entire safe region, it provides a consistent ordering of second-moment
reliability. The integral in this equation looks similar to that in the earlier equation and is
difficult to compute directly. Hence, Ditlevsen (1979a) proposed approximating the
nonlinear limit state by a polyhedral surface consisting of tangent hyperplanes at
selected points on the surface.
Consider a limit state function g(X1, X2, . . . , Xn) where the random variables Xi are all
uncorrelated. (If the variables are correlated, then a transformation can be used to obtain
uncorrelated variables. See Example 5.15.) The limit state function is rewritten in terms
of the standard form of the variables (reduced variables) using
Zᵢ = (Xᵢ - μ_Xi) / σ_Xi
As before, the Hasofer-Lind reliability index is defined as the shortest distance from the
origin of the reduced variable space to the limit state function g = 0.
Thus far nothing has changed from the previous presentation of the reliability index. In
fact, if the limit state function is linear, then the reliability index is still calculated as in
Eq 5.18
β = ( a₀ + Σᵢ aᵢ μ_Xi ) / √( Σᵢ (aᵢ σ_Xi)² )    -------------------------------- (26)
If the limit state function is nonlinear, however, iteration is required to find the design
point {Z₁*, Z₂*, ..., Zₙ*} in reduced variable space such that β still corresponds to the
shortest distance. This concept is illustrated in Figures 5.17 through 5.19 for the case of
two random variables.
Figure 10 - Design point on the failure boundary for the linear limit state
function g = R - Q
Figure 11 - Design point and reliability index for highly nonlinear limit state
function.
The iterative procedure requires us to solve a set of (2n + 1) simultaneous equations with
(2n + 1) unknowns: β, α₁, α₂, ..., αₙ, Z₁*, Z₂*, ..., Zₙ*, where
αᵢ = -(∂g/∂Zᵢ) / √( Σₖ (∂g/∂Zₖ)² )    (5.23a)
∂g/∂Zᵢ = (∂g/∂Xᵢ)(∂Xᵢ/∂Zᵢ) = (∂g/∂Xᵢ) σ_Xi    (5.23b)
Σᵢ αᵢ² = 1    (5.23c)
zᵢ* = αᵢ β    (5.24)
g(z₁*, z₂*, ..., zₙ*) = 0    (5.25)
with the derivatives in Eq. 5.23a evaluated at the design point.
Equation 5.23b is just an application of the chain rule of differentiation. Equation 5.23c is
a requirement on the values of the αᵢ variables, which can be confirmed by looking at
Eq. 5.23a. Equation 5.25 is a mathematical statement of the requirement that the design point
must be on the failure boundary.
There are two alternative procedures for performing the iterative analysis: the
simultaneous equation procedure and the matrix procedure. The steps in the simultaneous
equation procedure are as follows:
1. Formulate the limit state function and appropriate parameters for all random variables.
2. Express the limit state function in terms of the reduced variates Zᵢ.
3. Use Eq. 5.24 to express the limit state function in terms of β and the αᵢ.
4. Calculate the αᵢ values. Use Eq. 5.24 here also to express each αᵢ as a function of β
and all the αᵢ values.
5. Conduct the initial cycle: assume numerical values of β and the αᵢ, noting that the αᵢ
values must satisfy Eq. 5.23c.
6. Use the numerical values of β and the αᵢ on the right-hand sides of the equations formed in
Steps 3 and 4 above.
7. Solve the n + 1 simultaneous equations in Step 6 for β and the αᵢ.
8. Go back to Step 6 and repeat. Iterate until the β and αᵢ values converge.
The steps in the matrix procedure are as follows:
1. Formulate the limit state function and appropriate parameters for all random variables Xᵢ.
2. Obtain an initial design point {xᵢ*} by assuming values for n - 1 of the random variables
(mean values are a reasonable initial choice) and solving the limit state equation g = 0 for
the remaining variable, so that the design point lies on the failure boundary.
3. Determine the reduced variates {zᵢ*} corresponding to the design point {xᵢ*} using
zᵢ* = (xᵢ* - μ_Xi) / σ_Xi
4. Determine the partial derivatives of the limit state function with respect to the reduced
variates using Eq. 5.23b. For convenience, define a column vector {G} as the vector
whose elements are these partial derivatives multiplied by -1:
{G} = {G₁, G₂, ..., Gₙ}ᵀ where Gᵢ = -∂g/∂Zᵢ evaluated at the design point    -------------------- (27)
5. Calculate an estimate of β using
β = {G}ᵀ{z*} / √( {G}ᵀ{G} )    where {z*} = {z₁*, z₂*, ..., zₙ*}ᵀ    -------------------------(28)
The superscript T denotes transpose. If the limit state equation is linear, then Eq. 5.28
reduces to Eq. 5.18.
6. Calculate a column vector containing the sensitivity factors using
{α} = {G} / √( {G}ᵀ{G} )    (29)
7. Determine a new design point in reduced variates for n - 1 of the variables using
zᵢ* = αᵢ β
8. Determine the corresponding design point values in original coordinates for the n - 1
values in Step 7 using
xᵢ* = μ_Xi + zᵢ* σ_Xi
9. Determine the value of the remaining random variable (i.e., the one not found in Steps
7 and 8) by solving the limit state function g = 0.
10. Repeat Steps 3 to 9 until β and the design point {xᵢ*} converge.
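As an illustration, the matrix procedure above can be sketched in Python for the nonlinear limit state g = 7.5 X1 X2 - X3 used in the examples that follow, with X1 ~ N(22, 4.4), X2 ~ N(5.2, 0.78) and X3 ~ N(600, 60). This is a sketch, not a general-purpose implementation.

```python
import math

# Matrix procedure (Steps 1-10 above) for g = 7.5*X1*X2 - X3.
mu = [22.0, 5.2, 600.0]
sigma = [4.4, 0.78, 60.0]

def grad_g(x):
    # partial derivatives dg/dX1, dg/dX2, dg/dX3
    return [7.5 * x[1], 7.5 * x[0], -1.0]

# Step 2: start at the mean values of X1 and X2; X3 is obtained from
# g = 0 so that the initial design point lies on the failure boundary.
x = [mu[0], mu[1], 7.5 * mu[0] * mu[1]]

for _ in range(20):
    # Step 3: reduced variates z_i* = (x_i* - mu_i) / sigma_i
    z = [(x[i] - mu[i]) / sigma[i] for i in range(3)]
    # Step 4: G_i = -(dg/dX_i) * sigma_i  (Eq. 5.23b with the sign of Eq. 27)
    G = [-d * s for d, s in zip(grad_g(x), sigma)]
    # Step 5: beta = {G}^T {z*} / sqrt({G}^T {G})  (Eq. 28)
    norm = math.sqrt(sum(gi * gi for gi in G))
    beta = sum(gi * zi for gi, zi in zip(G, z)) / norm
    # Step 6: sensitivity factors (Eq. 29)
    alpha = [gi / norm for gi in G]
    # Steps 7-8: new design point for X1 and X2 in original coordinates
    for i in (0, 1):
        x[i] = mu[i] + alpha[i] * beta * sigma[i]
    # Step 9: solve g = 0 for the remaining variable X3
    x[2] = 7.5 * x[0] * x[1]

print(round(beta, 2))   # converges to about 1.23
```

The converged value agrees with the hand iteration for this problem carried out in the worked examples below.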
Problems
1. Most phytoplankton in lakes are too small to be individually seen with the unaided eye.
However, when present in high enough numbers, they may appear as a green
discoloration of the water due to the presence of chlorophyll within their cells. Climate
and water quality are among the factors influencing the quantity of phytoplankton in
shallow lakes. Assume that the rate of increase of phytoplankton can be expressed as a
linear function g(X1, X2, X3) of three variables, namely X1, the temperature of the water;
X2, the global radiation; and X3, the concentration of nutrients. X1, X2 and X3 can be
modeled as normal random variables. Positive growth rates must be avoided.
Although it is observed that temperature and radiation have no effect on the
concentration of nutrients, so that ρ13 = ρ23 = 0, they are mutually highly correlated with
ρ12 = 0.8. The equilibrium function is given by g(X1, X2, X3) = a0 + a1 X1 + a2 X2 +
a3 X3,
where a0 = -1.5 mg/m³, a1 = 0.08 mg/(m³ · °C), a2 = 0.01 mg/mW
and a3 = 0.05.
Solution.
Equilibrium (limit state): g(X1, X2, X3) = 0

Random variable    Mean    Coefficient of variation    Standard deviation
X1 (°C)            16      0.5                         8
X2 (W/m²)          150     0.3                         45
X3 (mg/m³)         100     0.7                         70

Mean of g: μ_g = a0 + a1 μ1 + a2 μ2 + a3 μ3 = -1.5 + 0.08×16 + 0.01×150 + 0.05×100 = 6.28
Variance of g, including the correlation between X1 and X2:
σ_g² = Σ aᵢ² σᵢ² + 2 a1 a2 ρ12 σ1 σ2 = (0.64)² + (0.45)² + (3.5)² + 2×0.8×0.64×0.45 = 13.32
β = μ_g / σ_g = 6.28 / √13.32 = 1.72
Reliability = Φ(1.72) = 0.957
There is a 96 percent chance that the equilibrium situation will be maintained. Hence the
risk that algal biomass will increase is only 4 percent.
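The computation above can be checked with a short Python sketch; the correlation enters only through the cross term between X1 and X2.

```python
import math

# Mean, variance and reliability index of the linear function
# g = a0 + a1*X1 + a2*X2 + a3*X3 with rho12 = 0.8, rho13 = rho23 = 0.
a = [-1.5, 0.08, 0.01, 0.05]
mu = [16.0, 150.0, 100.0]
sd = [8.0, 45.0, 70.0]
rho12 = 0.8

mu_g = a[0] + sum(a[i + 1] * mu[i] for i in range(3))
var_g = sum((a[i + 1] * sd[i]) ** 2 for i in range(3))
var_g += 2.0 * rho12 * (a[1] * sd[0]) * (a[2] * sd[1])   # correlation term
beta = mu_g / math.sqrt(var_g)
reliability = 0.5 * (1.0 + math.erf(beta / math.sqrt(2.0)))
print(round(mu_g, 2), round(var_g, 2), round(beta, 2), round(reliability, 3))
# prints 6.28 13.32 1.72 0.957
```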
2. The economic performance of an irrigation barrage located along a river
could be improved by installing a hydropower station to meet the local energy demand.
An engineer estimates the power demand X3 to be 600 kW on average, with a standard
deviation of 60 kW. If a standard axial turbine is installed, the power output can be
estimated as 7.5 X1 X2, where the discharge X1 is measured in m³/s and the hydraulic
head X2 in m; 7.5 is a coefficient accounting for gravity, the density of water, and the
overall efficiency of the installed equipment. Accordingly, power is given in units of kW.
Although an average discharge of 22 m³/s and an average head of 5.2 m are available,
discharge and head availability depend on natural flow variability; they are also subject
to the operation of the barrage, which gives priority to the irrigation demand. Discharge
and head can be assumed to be independent normal variables, X1 and X2, with coefficients
of variation 0.2 and 0.15 respectively. Assuming that the demand X3 is normal and
independent of discharge and head, evaluate the reliability of the plant.
Random variable      Distribution   Unit    Mean   Coefficient of variation   Standard deviation
Discharge X1         Normal         m³/s    22     0.20                       4.4
Hydraulic head X2    Normal         m       5.2    0.15                       0.78
Power demand X3      Normal         kW      600    0.10                       60
The partial derivatives of the performance function are
∂g/∂X1 = 7.5 X2
∂g/∂X2 = 7.5 X1
∂g/∂X3 = -1
The iteration cycles of the design-point search are summarized below. In each cycle the
derivatives are multiplied by the corresponding standard deviations, the direction cosines
αᵢ are evaluated, and the new value of x3* is obtained by solving
g(·) = 7.5 x1* x2* - x3* = 0.

Quantity                        Cycle 1     Cycle 2     Cycle 3     Cycle 4
x1*                             22.0        17.8        17.7        17.7
x2*                             5.2         4.6         4.7         4.7
x3*                             600         620         623         623
σ1 ∂g/∂X1                       171.6       153.2       154.6       154.8
σ2 ∂g/∂X2                       128.7       104.2       103.7       103.6
σ3 ∂g/∂X3                       -60.0       -60.0       -60.0       -60.0
Σ (σk ∂g/∂Xk)²                  49,610      37,922      38,254      38,276
α1                              0.770       0.787       0.790       0.791
α2                              0.578       0.535       0.530       0.529
α3                              -0.269      -0.308      -0.307      -0.307
β                               1.24        1.23        1.23        1.23
New x1*                         17.8        17.7        17.7        17.7
New x2*                         4.6         4.7         4.7         4.7
New x3*                         620         622.8       622.7       622.7
g(·) = 7.5 x1* x2* - x3*        -4.5×10⁻⁵   7.5×10⁻⁶    1.7×10⁻⁵    1.8×10⁻⁵
It can be noted that the last two iterations give identical values. This corresponds to a
reliability of 89% and a risk of failure of 11%.
3. Consider a harbor breakwater constructed with massive concrete tanks filled with sand.
It is necessary to evaluate the risk that the breakwater will slide under the pressure of a
large wave during a major storm.
The resultant horizontal force, Rh, depends on the balance between the static and dynamic
pressure components and is proportional to the height of the design wave, Hb, when the
slope of the sea bottom is known. The resultant vertical force is Rv = X2 - Fv, where X2
is the weight of the tank reduced for buoyancy and Fv is the vertical component of the
dynamic uplift pressure due to the breaking wave. The coefficient of friction, cf, can be
interpreted as a random variable, X1, which represents the inherent uncertainty associated
with its field evaluation. Sliding is avoided if
Rh / Rv < cf    -----------------------------------(1)
Random variable    Mean         Coefficient of variation    Standard deviation
X1                 0.64         0.15                        0.096
X2                 3400 kN/m    0.05                        170
X3                 1            0.20                        0.2
X4                 5.16         0.18                        0.93
The limit state function, written in terms of the four random variables, is
g(X) = X1 (X2 - 70 X3 X4) - X3 (17 X4² + 145 X4)
The partial derivatives, multiplied by the corresponding standard deviations, are
σ1 ∂g/∂X1 = (X2 - 70 X3 X4) σ1
σ2 ∂g/∂X2 = X1 σ2
σ3 ∂g/∂X3 = -(70 X1 X4 + 17 X4² + 145 X4) σ3
σ4 ∂g/∂X4 = -(70 X1 X3 + 34 X3 X4 + 145 X3) σ4
Evaluated at x1 = 0.64, x2 = 3400, x3 = 1.0 and x4 = 5.09 (the mean of the equivalent
normal distribution of X4, with equivalent normal standard deviation 0.889):
σ1 ∂g/∂X1 = (3400 - 70×1.0×5.09) × 0.096 = 292.19
σ2 ∂g/∂X2 = 0.64 × 170 = 108.80
σ3 ∂g/∂X3 = -(70×0.64×5.09 + 17×5.09² + 145×5.09) × 0.2 = -283.97
σ4 ∂g/∂X4 = -(70×0.64×1.0 + 34×1.0×5.09 + 145×1.0) × 0.889 = -322.67
The direction cosines are
αᵢ = (σᵢ ∂g/∂Xᵢ) / √( Σₖ (σₖ ∂g/∂Xₖ)² )
with Σₖ (σₖ ∂g/∂Xₖ)² = 2.8×10⁵, so that
α1 = 292.19 / √(2.8×10⁵) = 0.550
α2 = 108.80 / √(2.8×10⁵) = 0.205
α3 = -283.97 / √(2.8×10⁵) = -0.535
α4 = -322.67 / √(2.8×10⁵) = -0.608
The new design point is expressed in terms of the unknown reliability index β:
x1(new) = μ1 - α1 σ1 β = 0.64 - 0.053 β    -----------------------------------(2)
x2(new) = μ2 - α2 σ2 β = 3400 - 34.9 β    -----------------------------------(3)
x3(new) = μ3 - α3 σ3 β = 1 + 0.107 β    -----------------------------------(4)
x4(new) = μ4 - α4 σ4 β = 5.09 + 0.541 β    -----------------------------------(5)
By substituting (2), (3), (4) and (5) in the limit state equation g = 0 derived from (1),
we get the solution for the reliability index, β = 1.379.
Iteration process

Quantity                             I           II          III         IV          V
Initial x1*                          0.64        0.576       0.603       0.594       0.597
Initial x2*                          3400        3352        3378        3370        3373
Initial x3*                          1.00        1.147       1.088       1.105       1.099
Initial x4*                          5.16        5.825       5.637       5.704       5.681
F(x4*)                               0.570       0.799       0.784       0.767       0.760
f(x4*)                               4.4×10⁻¹    2.5×10⁻¹    3.0×10⁻¹    2.8×10⁻¹    2.9×10⁻¹
Φ⁻¹[F(x4*)]                          0.177       0.838       0.668       0.729       0.708
φ{Φ⁻¹[F(x4*)]}                       0.393       0.281       0.319       0.306       0.311
Equivalent normal mean of X4         5.090       5.590       5.424       5.481       5.461
Equivalent normal std. dev. of X4    0.899       1.136       1.065       1.090       1.081
σ1 ∂g/∂X1                            292.19      278.69      284.59      282.86      283.47
σ2 ∂g/∂X2                            108.80      96.42       102.58      100.94      101.53
σ3 ∂g/∂X3                            -283.97     -312.40     -311.15     -314.95     -313.6
σ4 ∂g/∂X4                            -322.67     -488.34     -430.68     -449.30     -442.7
Σ (σk ∂g/∂Xk)²                       2.8×10⁵     2.8×10⁵     2.8×10⁵     2.8×10⁵     2.8×10⁵
α1                                   0.550       0.525       0.536       0.533       0.534
α2                                   0.205       0.182       0.193       0.190       0.191
α3                                   -0.535      -0.605      -0.586      -0.593      -0.590
α4                                   -0.608      -0.920      -0.811      -0.846      -0.831
New x1*                              0.567       0.603       0.594       0.597       0.591
New x2*                              3352        3378        3370        3373        3372
New x3*                              1.147       1.088       1.105       1.099       1.101
New x4*                              5.836       6.348       6.201       6.525       6.234
g(·)                                 4×10⁻⁵      6.3×10⁻⁵    -3×10⁻⁵     -2.1×10⁻⁵   -2.5×10⁻⁵
β                                    1.379       0.726       0.899       0.837       0.83
Reliability Φ(β) = 0.805
Risk 1 - Φ(β) = 0.195
Learning objectives
1. Understanding the importance of sampling in simulation methods.
2. Understanding the method of generation of random numbers.
3. Understanding various variance reduction methods.
MODULE 6
Motivation:
Importance of understanding simulation methods
Highlights
Knowledge of terms such as simulation methods, random
numbers and variance reduction methods.
6. Simulation Methods
It is well established that simulation techniques have proven their value especially for
problems where the representation of the state function is associated with difficulties.
Such cases are e.g. when the limit state function is not differentiable or when several
design points contribute to the failure probability.
The basis for simulation techniques is well illustrated by rewriting the probability integral
Pf = ∫_{g(x)≤0} f_X(x) dx
as
Pf = ∫_{g(x)≤0} f_X(x) dx = ∫ I[g(x) ≤ 0] f_X(x) dx
where the integration domain is changed from the part of the sample space of the vector
X = (X1, X2, ..., Xn)ᵀ for which g(x) ≤ 0 to the entire sample space of X, and where
I[g(x) ≤ 0] is an indicator function equal to 1 if g(x) ≤ 0 and otherwise equal to zero. The
above equation is in this way seen to yield the expected value of the indicator function
I[g(x) ≤ 0].
Therefore, if N realizations of the vector X, i.e. x_j, j = 1, 2, ..., N, are sampled, then
Pf = (1/N) Σⱼ I[g(x_j) ≤ 0]
is an unbiased estimator of the failure probability Pf. In practice, the realizations
x_j, j = 1, 2, ..., N, are generated, and for each outcome x_j it is checked whether the
value of the limit state function g(x_j) is positive.
All the simulations for which this is not the case are counted (n_f), and after N simulations
the failure probability pf may be estimated through
pf = n_f / N
which then may be considered a sample expected value of the probability of failure. In
fact, for N → ∞, the estimate of the failure probability becomes exact. However,
simulations are often costly in computation time, and the uncertainty of the estimate is thus of
interest. It is easily realized that the coefficient of variation of the estimate is proportional
to 1/√n_f, so if the failure probability is of
the order of 10⁻⁶, it must be expected that approximately 10⁸ simulations are necessary to
achieve an estimate with a coefficient of variation of the order of 10%. A large number of
simulations is thus required using crude Monte Carlo simulation, and all refinements of this
crude technique have the purpose of reducing the variance of the estimate. Such methods are
for this reason often referred to as variance reduction methods.
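The crude Monte Carlo estimator can be sketched in Python for the power-plant limit state g = 7.5 X1 X2 - X3 used in the earlier examples; the sample size and seed below are arbitrary choices.

```python
import random

# Crude Monte Carlo estimate of Pf = P[g(X) <= 0] for
# g = 7.5*X1*X2 - X3 with the normal variables used earlier:
# X1 ~ N(22, 4.4), X2 ~ N(5.2, 0.78), X3 ~ N(600, 60).
rng = random.Random(1)
N = 200_000   # number of simulations (arbitrary choice)
n_f = 0       # counter of failure outcomes

for _ in range(N):
    x1 = rng.gauss(22.0, 4.4)
    x2 = rng.gauss(5.2, 0.78)
    x3 = rng.gauss(600.0, 60.0)
    if 7.5 * x1 * x2 - x3 <= 0:   # indicator I[g(x) <= 0]
        n_f += 1

pf = n_f / N
print(pf)   # estimated failure probability, of the same order as the FORM result
```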
Simulation of the N outcomes of the joint density function is in principle simple and may
be seen as consisting of two steps. Here we will illustrate the steps assuming that the
components of the random vector X are independent.
In the first step, a pseudo-random number z_ji between 0 and 1 is generated for each of the
components x_ji, i = 1, ..., n, of x_j. In the second step, the generated pseudo-random
numbers are transformed into outcomes of the random variables Xᵢ by
x_ji = F_Xi⁻¹(z_ji)
where F_Xi is the probability distribution function for the random variable Xᵢ. The
principle is illustrated in the figure.
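The inverse-transform step can be sketched in Python; the exponential distribution is used here purely as an illustration, since its distribution function inverts in closed form.

```python
import math
import random

# Inverse-transform sampling sketch: a standard uniform number z is
# mapped through the inverse distribution function, x = F^{-1}(z).
def sample_exponential(lam, rng):
    z = rng.random()                  # pseudo-random number in [0, 1)
    return -math.log(1.0 - z) / lam   # inverse of F(x) = 1 - exp(-lam*x)

rng = random.Random(7)
xs = [sample_exponential(2.0, rng) for _ in range(50000)]
m = sum(xs) / len(xs)
print(round(m, 2))   # close to the exact mean 1/lam = 0.5
```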
If the sampling could instead be concentrated in the region of the sample space that
contributes most to the failure probability, e.g. around the design point, the variance of
the estimated failure probability would be reduced. Sampling centered on an important
point may be accomplished by rewriting the probability integral in the following way
Pf = ∫ I[g(x) ≤ 0] f_X(x) dx = ∫ I[g(x) ≤ 0] (f_X(x) / f_S(x)) f_S(x) dx
in which f_S(x) is denoted the importance sampling density function. It is seen that the
integral in the above equation represents the expected value of the term
I[g(x) ≤ 0] f_X(x)/f_S(x), where the outcomes s_j are now sampled from f_S(x). An
unbiased estimator of the failure probability is therefore
Pf = (1/N) Σⱼ I[g(s_j) ≤ 0] f_X(s_j) / f_S(s_j)
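An importance sampling sketch for the power-plant limit state g = 7.5 X1 X2 - X3 is shown below; the sampling density f_S is the same normal model shifted to the design point found earlier (about (17.7, 4.7, 622.7)). Sample size and seed are arbitrary choices.

```python
import math
import random

# Importance sampling with f_S centred at the design point.
# X1 ~ N(22, 4.4), X2 ~ N(5.2, 0.78), X3 ~ N(600, 60).
mu = [22.0, 5.2, 600.0]
sigma = [4.4, 0.78, 60.0]
nu = [17.7, 4.7, 622.7]   # centre of the importance sampling density

rng = random.Random(3)
N = 50_000
total = 0.0
for _ in range(N):
    s = [rng.gauss(nu[i], sigma[i]) for i in range(3)]
    if 7.5 * s[0] * s[1] - s[2] <= 0:
        # weight f_X(s)/f_S(s): ratio of two normal densities with equal sigmas
        log_w = sum(((s[i] - nu[i]) ** 2 - (s[i] - mu[i]) ** 2)
                    / (2.0 * sigma[i] ** 2) for i in range(3))
        total += math.exp(log_w)

pf = total / N
print(pf)   # comparable to the crude Monte Carlo estimate, at lower variance
```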
Pseudo-random numbers are typically produced by the linear congruential method, in
which each new number k_{i+1} in the sequence is obtained as the remainder of the
division of (a kᵢ + c) by m, where a, c and m are constants of the generator. Also shown
in the figure are results from the generator a = 2⁷ + 1, c = 1 and m = 2³⁵, which
yields a much larger period of cyclicity. This choice gives satisfactory results for binary
computers; a = 101, c = 1, and m = 10ᵇ are suitable for a decimal computer with a word
length of b digits.
The advantage of the linear congruential method when implemented on a digital
computer is the speed of execution. Because only a few operations are required
each time, its use has become widespread. A disadvantage is that once the seed is
specified, the entire series is predictable.
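A minimal sketch of such a generator follows; the constants are illustrative (ANSI-C style), not the ones quoted in the text.

```python
# Linear congruential generator: k_{i+1} = (a*k_i + c) mod m,
# scaled to a standard uniform number in [0, 1).
def lcg(seed, a=1103515245, c=12345, m=2**31):
    k = seed
    while True:
        k = (a * k + c) % m
        yield k / m

gen = lcg(seed=42)
u = [next(gen) for _ in range(5)]   # five pseudo-random uniforms
print(u)
```

Re-creating the generator with the same seed reproduces the identical series, which is exactly the predictability mentioned above.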
The pseudo-random numbers generated by these procedures can be tested for uniform
distribution and for statistical independence. Goodness-of-fit tests, such as the chi-squared
and the Kolmogorov-Smirnov tests, can be used to verify that these numbers are
uniformly distributed. Both parametric and nonparametric methods, such as the runs test,
can be used to check for randomness between successive numbers in a sequence, in spite
of the fact that these procedures are essentially deterministic. Pseudo-random numbers
generated with large m and accurate choices of a and c generally appear to be uniformly
distributed and stochastically independent, so that they can be properly used to perform
Monte Carlo simulations. Algorithms to generate pseudo-random numbers which closely
approximate mutually independent standard uniform variates are a standard feature in
statistical software. Standard uniform random numbers are available as a system-supplied
function in digital computers, as well as in most customary computational and data
management facilities such as spreadsheets and databases. Nowadays, software tools such as
MATLAB, MS Excel and Mathcad have built-in options to generate random numbers and
evaluate the probability of failure. Problems solved using the Hasofer-Lind approach and
FORM methods earlier can be reworked using simulation procedures.
1. The power plant reliability problem solved earlier can be solved using simulation as
follows. The variables with the values of their parameters are listed below, along with
the performance function.
Performance function: g(X1, X2, X3) = 7.5 X1 X2 - X3.
Random variable      Distribution   Unit    Mean   Coefficient of variation   Standard deviation
Discharge X1         Normal         m³/s    22     0.20                       4.4
Hydraulic head X2    Normal         m       5.2    0.15                       0.78
Power demand X3      Normal         kW      600    0.10                       60
The MATLAB program with the following commands is used to evaluate the probability of
failure. The number of simulations is taken as 100000 and all three variables are normally
distributed.
n = 100000;                      % number of simulations
x1 = (randn(n,1) * 4.4) + 22;    % discharge X1 ~ N(22, 4.4)
x2 = (randn(n,1) * 0.78) + 5.2;  % hydraulic head X2 ~ N(5.2, 0.78)
x3 = (randn(n,1) * 60) + 600;    % power demand X3 ~ N(600, 60)
y = 7.5 .* x1 .* x2 - x3;        % performance function g
hist(y,100);                     % histogram of g over 100 intervals
y_mean = mean(y)
y_std = std(y)
beta1 = y_mean / y_std           % reliability index
pf = normcdf(-beta1, 0, 1)       % probability of failure
Answer:
The probability density distribution of G is shown in the figure. The mean value of the
performance function is G_mean ≈ 258 and G_std ≈ 224, giving a reliability index of
approximately 1.15 and a probability of failure of approximately 0.125, which is
close to the result obtained in the previous case.
Figure - Histogram of the performance function G obtained from the simulation.
Consider the harbor breakwater problem in the previous case. It is necessary to evaluate
the risk that the breakwater will slide under pressure of a large wave during major storm.
Resultant horizontal force, Rh, depends on the balance between the static and dynamic
pressure components, and it can be taken as quadratic function of Hb under simplified
hypothesis on the depth of the breakwater.
The random deep-water wave height X4 = Hs is found from frequency analysis of extreme
storms in the area.
The resultant vertical force is Rv = X2 - Fv,
where X2 is the weight of the tank reduced for buoyancy and
Fv is the vertical component of the dynamic uplift pressure due to the breaking wave. It
is proportional to the height of the design wave, Hb, when the slope of the sea
bottom is known.
The coefficient of friction, cf, can be interpreted as a random variable, X1, which represents
the inherent uncertainty associated with its field evaluation. Sliding is avoided
if Rh / Rv < cf
Random variable    Mean         Coefficient of variation    Standard deviation
X1                 0.64         0.15                        0.096
X2                 3400 kN/m    0.05                        170
X3                 1            0.20                        0.2
X4                 5.16         0.18                        0.93
Mean of y =340.4513
Standard deviation of y =327.4135
Reliability index =1.0398
Risk =0.1492
Learning objectives
1. Understanding the importance of logical trees, fault tree
analysis and event tree analysis.
2. Understanding the symbols used in fault tree analysis
and fault tree representation.
3. Understanding the symbols used in event tree analysis
and event tree representation.
4. Understanding the representation of decision trees.
MODULE 7
Motivation:
Importance of understanding the use of fault tree analysis.
Highlights
Knowledge of terms such as logical tree, fault tree analysis,
event tree, decision tree and cause consequence charts.
Logical Trees
Different sources of risk for an engineering system and/or activity can be analyzed with respect
to their chronological and causal components using logical trees. They are useful for the analysis
of the overall risk as well as for the assessment of the risk contribution from the individual
components.
Fault tree and event tree diagrams are the most well-known and most widely applied types of
logical trees in both qualitative and quantitative risk analysis. Even though more modern risk
analysis techniques, such as e.g. Bayesian probabilistic nets, have been developed over the last
years, fault trees and event trees are still the main methods recommended (for US nuclear safety
studies). Fault trees and event trees are in many ways similar, and the choice of using one or the
other, or a combination of both, in reality depends more on the traditions/preferences within a
given industry than on the specific characteristics of the logical tree.
A significant difference between the two types of trees is, though, that fault trees take basis in
deductive (looking backwards) logic whereas event trees are inductive (looking forward). In
practical applications, a combination of fault trees and event trees is typically used. In this case,
the fault tree part of the analysis is concerned with the representation of the sequences of
failures which may lead to events with consequences, and the event tree part of the analysis is
concerned with the representation of the subsequent evolution of the consequence-inducing
events.
The division between the fault tree part and the event tree part is in reality a matter of
preference of the engineer performing the study. Small event tree / large fault tree and large
event tree / small fault tree techniques may be applied to the same problem to supplement each
other and provide additional insight with regard to the reliability of the considered system.
Decision trees are often seen as a special type of event tree, but they may be seen in a much
wider perspective and, if applied consistently within the framework of decision theory, provide
the theoretical basis for risk analysis. The detailed analysis of the various types of logical trees
requires that the performance of the individual components of the trees has already been
assessed in terms of failure rates and/or failure probabilities.
In a fault tree, the top event (the system failure of interest) is decomposed into combinations of
lower-level events connected by logical operators (logical gates), and the fault tree forms in effect
a tree-like structure with the top event at the top and the basic events at its extremities. The basic
events are those events for which failure rate data or failure probabilities are available and which
cannot be dissected further. Sometimes the events are differentiated into initiating (or triggering)
events and enabling events, where the initiating events are always the first event in a sequence,
and the enabling events are events which may increase the severity of the initiated failure.
The fault tree is a Boolean logical diagram comprised primarily of AND and OR gates. The
output event of an AND gate occurs only if all of the input events occur simultaneously, and the
output event of an OR gate occurs if any one of the input events occurs. Figure 1 illustrates
commonly used symbols for AND and OR gates.
Several other types of logical gates exist, such as QUANTIFICATION and COMPARISON gates;
however, these will not be elaborated in the present text.
Top events and basic events also have their own specific symbols, as shown in the figure below.
It is noted that a fault tree comprising an AND gate represents a parallel system, i.e. all
components must fail for the system to fail. Such a system thus represents some degree of
redundancy, because the system will still function after one component has failed. Fault trees
comprising an OR gate, on the other hand, represent a series system, i.e. a system without any
redundancy in the sense that it fails as soon as any one of its components has failed. Such a
system is often denoted a weakest-component system. Systems may alternatively be represented
by reliability block diagrams, see Figure 4.
For an AND gate, the probability of the output event is
P = ∏ᵢ₌₁ⁿ pᵢ
and for an OR gate
P = 1 - ∏ᵢ₌₁ⁿ (1 - pᵢ)
where n is the number of ingoing events to the gate, pᵢ are the probabilities of failure of the
ingoing events, and it is assumed that the ingoing events are independent.
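The two gate formulas above translate directly into code; the probabilities passed in the example call are illustrative.

```python
import math

# Output-event probabilities for AND and OR gates with independent
# ingoing events of probabilities p_i, as in the two equations above.
def p_and(probs):
    return math.prod(probs)   # all ingoing events must occur

def p_or(probs):
    # at least one ingoing event occurs
    return 1.0 - math.prod(1.0 - p for p in probs)

print(p_and([0.1, 0.2]), p_or([0.1, 0.2]))
```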
System failure modes are defined by so-called cut sets, i.e. combinations of basic events which
with certainty will lead to the top event. The number of such combinations can be rather large:
several hundreds for a logical tree with about 50 basic events. It is important to note that the top
event may still occur even though not all basic events in a cut set occur. A minimal cut set is the
cut set that represents the smallest combination of basic events leading to the top event,
sometimes denoted the critical path. The top event will only occur if all events in the minimal cut
set occur. An important aspect of fault tree analysis is the identification of the minimal cut sets, as
this greatly facilitates the numerical evaluations involved.
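Once the minimal cut sets are identified, a simple first-order upper bound on the top-event probability is the sum over the minimal cut sets of the products of the basic-event probabilities (independent basic events assumed). The cut sets and probabilities in the sketch below are hypothetical.

```python
import math

# Upper-bound approximation for the top-event probability from
# minimal cut sets; each cut set fails the system with certainty.
p_basic = {"A": 0.01, "B": 0.02, "C": 0.05}
minimal_cut_sets = [{"A"}, {"B", "C"}]   # hypothetical cut sets

bound = sum(math.prod(p_basic[e] for e in cs) for cs in minimal_cut_sets)
print(bound)   # 0.01 + 0.02*0.05 = 0.011
```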
Decision trees may in addition support the evaluation of how to reduce the risks most
efficiently. An example of a decision tree is shown in Figure 7.
Example 1
A power supply system is composed of an engine, a main fuel supply for the engine and
electrical cables distributing the power to the consumers. Furthermore, as a backup, a
reserve fuel supply with limited capacity is installed. The power supply system
fails if the consumer is cut off from the power supply. This in turn will happen if either the
power supply cables fail or the engine stops, which in turn is assumed only to occur if the
fuel supply to the engine fails.
Solution
A fault tree system model for the power supply is illustrated in the figure below; the
probabilities for the basic events are also indicated.
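The fault tree of Example 1 can be evaluated numerically with the gate formulas given earlier. The basic-event probabilities below are hypothetical placeholders, since the values shown in the original figure are not reproduced here.

```python
# Top event: consumer cut off from the power supply.
# Structure: OR(cable failure, engine stops); the engine stops only if
# both the main and the reserve fuel supplies fail (AND gate).
# The numerical probabilities are hypothetical.
p_cable = 0.01
p_main_fuel = 0.02
p_reserve_fuel = 0.05

p_engine_stops = p_main_fuel * p_reserve_fuel            # AND gate
p_top = 1.0 - (1.0 - p_cable) * (1.0 - p_engine_stops)   # OR gate
print(p_top)
```

With these placeholder values the cable failure dominates the system failure probability, reflecting the redundancy provided by the reserve fuel supply.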
Failure events occurring after repair events may often be omitted, at least for systems with
highly reliable components.
In Figure 6.14, a combined fault tree and event tree is illustrated, showing how fault trees
often constitute the modeling of the initiating event for the event tree.
Learning objectives
1. Understanding the importance of system reliability assessment.
2. Understanding the various steps involved in system
reliability assessment.
3. Understanding the methods adopted in system reliability
assessment.
MODULE -8
Motivation:
Importance of understanding the concept of system reliability.
Highlights
Knowledge of terms such as system reliability.
References
Ang, A.H.-S. and Tang, W.H. (1975). Probability Concepts in Engineering Planning
and Design, Vol. 1, Basic Principles, John Wiley, New York.
Ang, A.H.-S. and Tang, W.H. (1984). Probability concepts in engineering planning
and design. Volume II decision, risk and reliability, John Wiley & Sons, Inc., New
York.
Baecher, G.B. and Christian, J.T. (2003). Reliability and Statistics in Geotechnical
Engineering, John Wiley and Sons, London and New York.
Becker, D.E. (1996a). Eighteenth Canadian Geotechnical Colloquium: Limit states
Design for foundations, Part I. An overview of the foundation design process,
Canadian Geotechnical Journal, Vol. 33, pp.956-983.
Becker, D.E. (1996b). Eighteenth Canadian Geotechnical Colloquium: Limit States
Design for Foundations. Part II. Development for the National Building Code of
Canada, Canadian Geotechnical Journal, Vol. 33, pp. 984-1007.
Benjamin, J. R. and Cornell, C. A. (1970). Probability, Statistics, and Decision for
Civil Engineers. McGraw-Hill Book Co., New York.
Cherubini, C. (2000). Reliability evaluation of shallow foundation bearing capacity on
c′, φ′ soils.
Christian, J.T., Ladd, C.C., and Baecher, G.B. (1992). Reliability and probability in
stability analysis, In Proceedings of Stability and Performance of Slopes and
Embankments, ASCE, GSP No.31, Vol. 2, pp. 1071-1111.
Christian, J. T., Ladd, C. C., and Baecher, G. B. (1994). Reliability applied to slope
stability analysis. Journal of Geotechnical Engineering, ASCE, Vol. 120, No. 12, pp.
2180-2207.
Christian, J.T. and Urzua, A. (1998). Probabilistic evaluation of earthquake-induced
slope failure, Journal of Geotechnical and Geoenvironmental Engineering, ASCE,
Vol. 124, No.11, pp.1140-1143.