
Uncertainty Modelling and Risk Analysis

Xavier Romão
Assistant Professor
Civil Engineering Department – FEUP

March 2022
Expressing uncertainty

Quantitative methods
The usual approach in traditional science fields where sufficient hard data is available for numerical treatment:
- descriptive statistics
- probabilistic models

Review of probability theory and related basic concepts

3
Review of probability theory and related basic concepts

• Set Theory
• Sample Space and Probability
• Axioms of Probability
• Conditional Probability
• Total Probability Theorem
• Bayes Theorem
• Independence
• Discrete & Continuous Distributions of Random Variables
• Moments and other Descriptors of Random Variables
• Common Probability Distribution Models

4
Review of probability theory and related basic concepts

• Return Period
• Confidence Intervals
• Building Probabilistic Models (distribution fitting and parameter
estimation)

5
Set theory
Review of probability theory and related basic concepts

Set Theory: some concepts

A set is a collection of well-defined objects


- Example 1: a group of large numbers → that’s not well defined
- Example 2: a group of numbers larger than 50 → that’s well defined

The objects of a set are called elements

Usually, sets are denoted by upper-case letters and elements by lower-case letters:
x ∈ A → x is an element of set A
x ∉ A → x is not an element of set A
7
Review of probability theory and related basic concepts

Set Theory: some concepts

A set with no elements is a Null Set or an Empty Set. This set is represented by ∅.

A finite set is a set with a finite number of elements.

An infinite set is a set with an infinite number of elements:

- An infinite set is said to be countable if its elements can be counted, and uncountable otherwise (the set of points in a line segment is uncountable)

If all the elements of set A are also elements of set B, then A is a subset of B, which is represented by

A ⊂ B
Review of probability theory and related basic concepts

Set Theory: some concepts

If A ⊂ B and B ⊂ A, this means that A = B


The set containing all the elements of all the sets being considered is
the space and it is denoted by S

If A is a set in S, the set of the elements that are in S but are not in A is the complement of A (also called "not A") and it is denoted by Ā.

Other relations involving the complement of sets:

$\bar{S} = \emptyset$    $\bar{\emptyset} = S$    $\bar{\bar{A}} = A$
Review of probability theory and related basic concepts

Set Theory: some concepts

The union of sets A and B represents all the elements that belong to A or B or both, and is represented by A ∪ B.
The intersection of sets A and B represents all the elements that belong to both A and B, and is represented by A ∩ B.

(Venn diagrams of A ∪ B and A ∩ B.)
Review of probability theory and related basic concepts

Set Theory: some concepts

Two sets A and B are said to be exhaustive of another set C if A ∪ B = C

Two sets A and B are called disjoint or mutually exclusive if A ∩ B = ∅

(Venn diagrams: exhaustive sets with A ∪ B = C; disjoint sets with A ∩ B = ∅.)
Review of probability theory and related basic concepts

Set Theory: some concepts


The union and intersection of sets can be generalised to n sets:

$A_1 \cup \ldots \cup A_n = \bigcup_{i=1}^{n} A_i$    (all the elements belonging to one or more of the sets $A_i$)

$A_1 \cap \ldots \cap A_n = \bigcap_{i=1}^{n} A_i$    (all the elements common to all the sets $A_i$)

The difference of sets A − B contains the elements of A that do not belong to B → $A - B = A \cap \bar{B}$
Review of probability theory and related basic concepts

Set Theory: some concepts

Other relations: the distributive laws $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ and $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

De Morgan's laws:

$\overline{A \cup B} = \bar{A} \cap \bar{B}$
$\overline{A \cap B} = \bar{A} \cup \bar{B}$

$\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \bar{A}_i$    $\overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \bar{A}_i$
Review of probability theory and related basic concepts

Set Theory: some examples


A ∪ ∅ = ? → A
A ∪ Ā = ? → S
A ∩ ∅ = ? → ∅
A − Ā = ? → A
A ∩ Ā = ? → ∅
A ∪ S = ? → S
A ∩ S = ? → A
if A ⊂ B then A ∩ B = ? → A
if A ⊂ B then A ∪ B = ? → B
Sample Space and Probability
Review of probability theory and related basic concepts

Sample Space and Probability: some concepts

Random variable - a numerical quantity that is the result of an experiment and


takes on different values depending on chance
- Continuous random variables can take any value in a given range (data that
you measure)
- Discrete random variables can take only discrete values (data that you count)
Sample Point or Realization - an outcome of the random variable
Sample – a set of outcomes for a random variable
Population or Sample Space - the set of all possible outcomes for a random variable
Event - a statement about the possible outcome(s) of a random variable

16
Review of probability theory and related basic concepts

Sample Space and Probability: some concepts

Probability - there are several definitions


The classical or Laplace definition
The ratio between the number of outcomes favourable to the event and the total number of possible outcomes of the process (i.e. the population), assuming all outcomes are equally probable
The frequentist definition
The relative frequency of an event (i.e. a favourable outcome) after a large number of repetitions of the process, which may not cover the whole population

Sample Probability - the same as Probability but relative to the sample instead of
the population

17
Review of probability theory and related basic concepts

Sample Space and Probability: some concepts

Example: rolling of a die

Random variable - the numerical value of the die


Population or Sample Space - values 1 to 6
Sample Point or Realization - the value of the die when it is rolled once
Sample - the value of the die after ten rolls: 3, 5, 4, 3, 2, 6, 3, 5, 2, 3… for example
Event - the value of the die is 3
Probability - the (classical) probability of the outcome 3 is 1/6 ≈ 17%
Sample Probability - the relative frequency of the outcome 3 is 4/10 ≈ 40% … this
means the sample is not representative enough (the size is too small)!!!
18
Review of probability theory and related basic concepts

Sample Space and Probability: correspondence with set theory

Set theory → Probability theory

space S → sample space, certain event
empty set ∅ → impossible or null event
elements → sample points
sets → events
A → event A occurs
Ā → event A does not occur
A ∪ B → at least one of A and B occurs
A ∩ B → both A and B occur
A ⊂ B → A is a subevent of B, i.e. the occurrence of A implies that of B
A ∩ B = ∅ → A and B are mutually exclusive
19
Axioms of Probability
Review of probability theory and related basic concepts

Axioms of Probability
The probability function associated to the occurrence of event A, P(A),
is a number assigned to this event that represents its likelihood and is
called the probability of A.

The probability has the following properties (axioms):

1. $P(A) \ge 0$ (the probability is non-negative)

2. $P(S) = 1$

3. For a group of mutually exclusive events $A_1, A_2, \ldots$

$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$
Review of probability theory and related basic concepts

Axioms of Probability: some consequences

$P(\emptyset) = 0$    $P(\bar{A}) = 1 - P(A)$ (for any event A)

$0 \le P(A) \le 1$ (for any event A)    If $A \subseteq B$ then $P(A) \le P(B)$

For any events A and B:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

For any events A, B and C:

$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$

For any number n of events $A_i$ (inclusion-exclusion):

$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{i=1}^{n}\sum_{j>i} P(A_i \cap A_j) + \sum_{i=1}^{n}\sum_{j>i}\sum_{k>j} P(A_i \cap A_j \cap A_k) - \ldots + (-1)^{n+1} P\left(\bigcap_{i=1}^{n} A_i\right)$
Review of probability theory and related basic concepts

Axioms of Probability: some consequences

For any disjoint (or mutually exclusive) events A and B:

Since $A \cap B = \emptyset$ (so $P(A \cap B) = 0$):    $P(A \cup B) = P(A) + P(B)$

For any disjoint (or mutually exclusive) events A, B and C:

Since $A \cap B = A \cap C = B \cap C = A \cap B \cap C = \emptyset$:    $P(A \cup B \cup C) = P(A) + P(B) + P(C)$

For any number n of disjoint (or mutually exclusive) events $A_i$:

$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$
Review of probability theory and related basic concepts

Axioms of Probability: some consequences

Example: television sports viewing habits of people

A survey of people's television viewing habits of car racing (A), golf (B) and football (C) revealed that:
- 28% watched A, 29% watched C, 19% watched B
- 14% watched A and C
- 12% watched C and B
- 10% watched A and B
- and 8% watched all three sports

What percentage of the people watched none of the three sports?

The event representing none of the three sports being watched is $\overline{A \cup B \cup C}$
Review of probability theory and related basic concepts

Axioms of Probability: some consequences

Example: television sports viewing habits of people

$P\left(\overline{A \cup B \cup C}\right) = 1 - P(A \cup B \cup C)$

(Venn diagram with the percentages of each region.)

$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$

$P(A \cup B \cup C) = 0.28 + 0.19 + 0.29 - 0.10 - 0.14 - 0.12 + 0.08 = 0.48$

$P\left(\overline{A \cup B \cup C}\right) = 1 - 0.48 = 0.52$
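A minimal sketch (plain Python, not part of the original slides) that reproduces the inclusion-exclusion calculation of this example; the variable names are just illustrative:

```python
# Inclusion-exclusion for the sports-viewing example
pA, pB, pC = 0.28, 0.19, 0.29          # car racing, golf, football
pAB, pAC, pBC = 0.10, 0.14, 0.12       # pairwise intersections
pABC = 0.08                            # all three sports

# P(A U B U C) by inclusion-exclusion
p_union = pA + pB + pC - pAB - pAC - pBC + pABC
p_none = 1.0 - p_union                 # complement: none of the three sports

print(round(p_union, 2))  # 0.48
print(round(p_none, 2))   # 0.52
```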
Conditional Probability
Review of probability theory and related basic concepts

Conditional Probability
Given two arbitrary events A and B, the probability P(A|B) is defined as
the conditional probability of event A given that event B has occurred.

$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0$

Note that if event B is S:

$P(A \mid S) = \dfrac{P(A \cap S)}{P(S)} = \dfrac{P(A)}{1} \;\Leftrightarrow\; P(A \mid S) = P(A)$, which is obvious.

It helps to see the conditional probability P(A|B) as the probability of A with respect to a reduced sample space defined by the outcomes of event B.
Review of probability theory and related basic concepts

Conditional Probability

Consider a 1×1 square (sample space S) and events A and B. The areas of squares A and B are P(A) = 0.25 and P(B) = 0.375.

We can see that P(A ∩ B) = 0.25/4 = 0.0625

$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.0625}{0.375} = \dfrac{1}{6} \approx 0.17$

which means that the area (probability) of events A and B occurring at the same time is about 17% of the area (probability) of event B.
Review of probability theory and related basic concepts

Conditional Probability

Example: picking 2 aces out of a deck of cards

Two cards are drawn in succession without replacement from an ordinary deck of
cards (52 cards). What is the probability that both cards are aces?
A is the event corresponding to the first card being an ace
B is the event corresponding to the second card being an ace
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} \;\Rightarrow\; P(A \cap B) = P(B) \times P(A \mid B)$

$P(A \cap B) = P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$    ← let's focus on this

$P(A \cap B) = P(A) \times P(B \mid A) = \dfrac{4}{52} \times \dfrac{3}{51} = 0.0045 = 0.45\%$
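A minimal sketch (plain Python, not part of the original slides) of the multiplication rule used in this example, drawing two aces without replacement:

```python
# P(A ∩ B) = P(A) * P(B | A) for two aces drawn in succession
p_first_ace = 4 / 52           # P(A): 4 aces among 52 cards
p_second_given_first = 3 / 51  # P(B | A): 3 aces left among 51 cards

p_both_aces = p_first_ace * p_second_given_first
print(round(p_both_aces, 4))   # ≈ 0.0045, i.e. about 0.45%
```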
Review of probability theory and related basic concepts

Conditional Probability: multiplication rules

General Multiplication Rule – For arbitrary events A and B

$P(A \cap B) = P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$

General Multiplication Rule – For arbitrary events A, B and C (treating A ∩ B as a single event D)

$P(A \cap B \cap C) = P(C \mid A \cap B) \times P(A \cap B) = P(C \mid A \cap B) \times P(B \mid A) \times P(A)$

General Multiplication Rule – For n arbitrary events

$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1 \cap A_2 \cap \ldots \cap A_{n-1}) \times P(A_n \mid A_1 \cap A_2 \cap \ldots \cap A_{n-1})$

which then turns into

$P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P\left(A_i \,\middle|\, \bigcap_{j=1}^{i-1} A_j\right)$
Total Probability Theorem
Review of probability theory and related basic concepts

Total Probability Theorem

From the General Multiplication Rule we define the Total Probability Theorem → considering n disjoint and collectively exhaustive events A1, …, An whose probabilities sum to 1.0, the probability of an event B is:

$P(B) = P(B \cap A_1) + \ldots + P(B \cap A_n) = P(B \mid A_1) \times P(A_1) + \ldots + P(B \mid A_n) \times P(A_n) = \sum_{i=1}^{n} P(B \mid A_i) \times P(A_i)$

where $P(A_i)$ is the probability of event $A_i$ and $P(B \mid A_i)$ is the probability of event B given that event $A_i$ occurred.

Example (S partitioned into four equally likely events A1, …, A4, with B overlapping each of them so that $P(B \mid A_i) = 0.25$):

$P(A_1) = P(A_2) = P(A_3) = P(A_4) = 0.25$

$P(B \cap A_1) = P(B \mid A_1) \times P(A_1) = 0.25 \times 0.25 = 0.0625$

$P(B) = \sum_{i=1}^{n} P(B \mid A_i) \times P(A_i) = 4 \times (0.25 \times 0.25) = 0.25$
Bayes Theorem
Review of probability theory and related basic concepts

Bayes Theorem

General Multiplication Rule – For arbitrary events A and B

$P(A \cap B) = P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$

Rearranging gives Bayes' Theorem:

$P(A \mid B) = \dfrac{P(A) \times P(B \mid A)}{P(B)}$

where P(A) is the prior probability (before the new data), P(B | A) is the support that B provides for A, and P(A | B) is the posterior probability (after getting the new data).

Bayes' Theorem reflects how the probability of an event (in this case A) is affected by new data (in this case the occurrence of event B).
Review of probability theory and related basic concepts

Bayes Theorem
Considering a more complex case with n disjoint and collectively exhaustive events A1, …, An:

$P(A_i \mid B) = \dfrac{P(A_i) \times P(B \mid A_i)}{P(B)}$

The probability P(B) can be expanded using the Total Probability Theorem:

$P(A_i \mid B) = \dfrac{P(A_i) \times P(B \mid A_i)}{\sum_{j=1}^{n} P(B \mid A_j) \times P(A_j)}$
Review of probability theory and related basic concepts

Bayes Theorem

Example: rain on a wedding day

Marie is getting married tomorrow at an outdoor ceremony in the desert.


In recent years, it has rained only 5 days each year. Unfortunately, the weatherman
has predicted rain for tomorrow.
When it actually rains, the weatherman correctly forecasts rain 90% of the time.
When it doesn't rain, he incorrectly forecasts rain 10% of the time.
What is the probability that it will rain on the day of Marie's wedding?
The sample space is defined by two mutually-exclusive events - it rains on Marie's
wedding (event A1) or it does not rain on Marie's wedding (event A2). Additionally,
the third event is the weatherman predicting rain (event B).

36
Review of probability theory and related basic concepts

Bayes Theorem

Example: rain on a wedding day

What we know:

$P(A_1) = 5/365 = 0.0137$    (it rains 5 days during the year)

$P(A_2) = 360/365 = 0.9863$    (it does not rain 360 days during the year)

$P(B \mid A_1) = 0.9$    (when it rains, the weatherman predicts rain 90% of the time)

$P(B \mid A_2) = 0.1$    (when it does not rain, the weatherman predicts rain 10% of the time)

We want to know P(A1|B), the probability that it will rain on the day of
Marie's wedding, given a forecast for rain by the weatherman. 37
Review of probability theory and related basic concepts

Bayes Theorem

Example: rain on a wedding day

$P(A_1 \mid B) = \dfrac{P(A_1) \times P(B \mid A_1)}{P(A_1) \times P(B \mid A_1) + P(A_2) \times P(B \mid A_2)}$

$P(A_1 \mid B) = \dfrac{0.0137 \times 0.9}{0.0137 \times 0.9 + 0.9863 \times 0.1} = 0.111$

This result may seem somewhat contradictory!!
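A minimal sketch (plain Python, not part of the original slides) of this Bayes Theorem calculation, with the total probability theorem used in the denominator:

```python
# Posterior probability of rain given a rain forecast
p_rain = 5 / 365                  # P(A1): prior probability of rain
p_dry = 360 / 365                 # P(A2)
p_forecast_given_rain = 0.9       # P(B | A1)
p_forecast_given_dry = 0.1        # P(B | A2)

# Total probability theorem for P(B), then Bayes
p_forecast = p_forecast_given_rain * p_rain + p_forecast_given_dry * p_dry
p_rain_given_forecast = p_forecast_given_rain * p_rain / p_forecast

print(round(p_rain_given_forecast, 3))  # ≈ 0.111
```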
Statistical independent events
Review of probability theory and related basic concepts

Statistical independent events


Two events A and B are independent if the occurrence or non-occurrence
of one of the events does not change the probability of the other event

$P(A \mid B) = P(A)$  and  $P(B \mid A) = P(B)$

Based on the General Multiplication Rule

$P(A \cap B) = P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$

we get

$P(A \cap B) = P(A) \times P(B)$

In a more general form we get

$P(A_1 \cap A_2 \cap \ldots \cap A_n) = \prod_{i=1}^{n} P(A_i)$
Review of probability theory and related basic concepts

Statistical independent events

If two events are mutually exclusive (or disjoint), this does not imply that they are independent events.
If two events are independent, this does not imply that they are mutually exclusive (or disjoint) events.

$P(A \cap B) = P(A) \times P(B)$ for independent events;    $A \cap B = \emptyset$, i.e. $P(A \cap B) = 0$, for disjoint events.

If two events are uncorrelated, this does not imply that they are independent events**.
If two events are independent, this implies that they are uncorrelated events.

(** except in one case we'll see later)
Discrete & Continuous Distributions
of Random Variables
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables

Random variable (RV) - a numerical quantity that is the result of an experiment and takes on different values depending on chance
- Continuous random variables can take any value in a given range (data that you measure) (the data can take an infinite number of values that cannot be counted)
- Discrete random variables can take only discrete values (data that you count) (the data can take a finite or an infinite number of values that can be counted)

The behaviour of a random variable is defined by its probability distribution (it defines how probabilities are distributed across the different values of the random variable)

Probability distributions are defined by 2 functions:
- the probability distribution function FX(x) (usually called the cumulative distribution function)
- the probability density function fX(x) for continuous RVs, or the probability mass function pX(x) for discrete RVs
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables


The cumulative distribution function (cdf) is defined as $F_X: \mathbb{R} \to [0, 1]$ such that

$F_X(x) = P(X \le x)$

The cdf gives the probability of the RV taking any value between its lowest possible value (which could be $-\infty$) and x.
The cdf has the following properties:

• $0 \le F_X(x) \le 1$
• $\lim_{x \to -\infty} F_X(x) = 0$
• $\lim_{x \to +\infty} F_X(x) = 1$
• $x \le y \Rightarrow F_X(x) \le F_X(y)$
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables


(Figure: example cdfs of a continuous RV and of a discrete RV.)

The cdf of a continuous RV is a smooth function; the cdf of a discrete RV is a stepwise function.

For continuous RVs we have $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$ and $f_X(x) = \dfrac{dF_X(x)}{dx}$

where $f_X(x)$ is the probability density function (pdf)
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables


(Figure: example cdfs of a continuous RV and of a discrete RV.)

The cdf of a continuous RV is a smooth function; the cdf of a discrete RV is a stepwise function.

For discrete RVs we have $F_X(x) = \sum_{x_j \le x} f_X(x_j)$ and $f_X(x_j) = F_X(x_j) - F_X(x_{j-1})$

where $f_X(x)$ is the probability mass function (pmf)… usually also called pdf
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables


(Figure: examples of discrete (stepwise) pdfs (pmfs) and of continuous (smooth) pdfs.)
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables

Important things to remember

Consistency condition:

$\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$    for continuous RVs

$\sum_{\text{all } x_j} f_X(x_j) = 1$    for discrete RVs

From the cdf it is possible to obtain the probability of the RV taking values between a lower bound a and an upper bound b:

$P(a < X \le b) = F_X(b) - F_X(a)$
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables

Important things to remember

$f_X(x_j) = F_X(x_j) - F_X(x_{j-1})$ → for a discrete RV, the pmf defines the probability of occurrence of each value of the RV!!

$P(a < X \le b) = F_X(b) - F_X(a)$

For a continuous RV, the values of the pdf are not probabilities and have no meaning on their own!!

For a continuous RV, the probability of occurrence of a specific value of the RV is zero!!
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables


(Figure: a pdf $f_X$ and the corresponding cdf $F_X = P(X \le x)$, with the area between x = 3 and x = 4 highlighted and the values $F_X(3)$ and $F_X(4)$ marked.)

$P(3 < X \le 4) = \text{area} = \int_{3}^{4} f_X(x)\,dx = \int_{0}^{4} f_X(x)\,dx - \int_{0}^{3} f_X(x)\,dx = F_X(4) - F_X(3)$

$P(X = 4) = \text{area?} = \int_{3.9999999999999999999}^{4} f_X(x)\,dx = F_X(4) - F_X(3.9999999999999999999) \approx 0$
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables

Important things to remember

A histogram and a pmf may look similar, but they're not the same:
- A histogram is a discrete version of the pdf of a continuous RV (because usually we don't have the full population, just a representative sample). The pmf is the pdf of a discrete RV
- The vertical axis of the histogram represents the number of times a value of the RV falls within a certain interval (bin). The vertical axis of the pmf represents the probability of each value of the discrete RV
(Figure: example histogram of a continuous RV.)
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables

Example: levels of earthquake damage in 10.000 buildings

• X is the (variable) level of damage of buildings hit by an earthquake
• X is a discrete random variable that can be either 0, 1, 2, 3, or 4
• The distribution of the probability of the buildings being damaged by a certain level X is defined by the probability mass function (pmf)

X: 0 = None, 1 = Light, 2 = Moderate, 3 = High, 4 = Collapse

x: 0 / 1 / 2 / 3 / 4
P(X=x): 0.0039 / 0.0469 / 0.2109 / 0.4219 / 0.3164
nº of buildings: 39 / 469 / 2109 / 4219 / 3164
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables

Example: levels of earthquake damage in 10.000 buildings


x: 0 / 1 / 2 / 3 / 4
P(X=x): 0.0039 / 0.0469 / 0.2109 / 0.4219 / 0.3164

Graphically: (bar chart of the pmf, probability vs. level of damage.)
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables

Example: levels of earthquake damage in 10.000 buildings

What was the probability of a building not collapsing?

Let A be the event corresponding to buildings damaged by level 4 (collapse). Then, Ā = "damage level 3 or less".

Since A and Ā are disjoint and collectively exhaustive events:

P(Ā) = 1 − P(A) = 1 − 0.3164 = 0.6836

68.36% is the probability of a building not collapsing
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables

Example: levels of earthquake damage in 10.000 buildings

What was the probability of a building having a damage level 3 or 4?

Let A be the event corresponding to having a damage level 4 and B the event corresponding to having a damage level 3.

Since A and B are disjoint events:

P(A ∪ B) = P(A) + P(B) = 0.3164 + 0.4219 = 0.7383

73.83% is the probability of a building having a damage level of 3 or 4
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables


Example: random numbers between 0 and 1 from a spinner

Random variable - the numerical value of the spinner


(continuous random variable)
Sample Space - any number between 0 and 1
(there is an infinite number of outcomes)

• The probabilities for continuous random variables


are assigned by a probability density function (pdf)
• Since no value has a larger chance of occurring than
another, the distribution of values is uniform
• The probability of any exact value is 0!!
• What we can determine is the probability of a range
of values of the random variable
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables


Example: random numbers between 0 and 1 from a spinner

• The probability of a range of values of the random


variable is the area under the pdf function
• For example:
The probability of a value between 0 and 0.5
P(0 ≤ X ≤ 0.5)
= shaded rectangle
= height × base
= 1 × 0.5 = 0.5
• Areas = Probabilities!!
• Areas = Integral of the pdf function!!
58
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables


Example: random numbers between 0 and 1 from a spinner

• Let’s try to deduce the cdf and the pdf of the RV

Let the RV be X: the number obtained by the spinner


Considering that any number between 0 and 1 has the
same likelihood of occurring
$F_X(x) = P(X \le x) = \dfrac{x}{1 - 0} = \dfrac{\text{any possible number}}{\text{sample space}} = x$

$f_X(x) = \dfrac{dF_X(x)}{dx} = 1$    (only valid for values of X between 0 and 1)
Review of probability theory and related basic concepts

Discrete & Continuous Distributions of Random Variables


Example: random numbers between 0 and 1 from a spinner

• Let’s compare it with the cdf and the pdf


of the uniform distribution

$F_X(x) = \begin{cases} 0 & \text{for } x < a \\ \dfrac{x-a}{b-a} & \text{for } a \le x < b \\ 1 & \text{for } x \ge b \end{cases}$    (where a = 0 and b = 1 in our case)

$f_X(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{for } x < a \text{ or } x > b \end{cases}$
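A minimal sketch (not part of the original slides, assuming SciPy is available) checking the hand-derived spinner cdf/pdf against the library's uniform distribution; loc and scale are SciPy's parametrisation of the [a, b] interval:

```python
from scipy.stats import uniform

a, b = 0.0, 1.0
U = uniform(loc=a, scale=b - a)   # uniform distribution on [0, 1]

print(U.cdf(0.5))                 # 0.5 -> F_X(x) = x on [0, 1]
print(U.pdf(0.3))                 # 1.0 -> f_X(x) = 1/(b - a) = 1 on [0, 1]
print(U.cdf(0.5) - U.cdf(0.0))    # P(0 <= X <= 0.5) = 0.5
```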
Moments as Descriptors of Random Variables
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables


The probabilistic model of a RV can be completely described if the cdf or
pdf are specified along with their parameters.

In practice, the exact form of the probabilistic model may not be known.
In those cases, a RV can be defined using its moments.

A moment is a quantitative measure of the shape of a set of points, used in


both mechanics and statistics. Moments are defined using the expectation
operator.
Considering a real function g(X), with X being a RV, the expectation E[g(X)] is

$E[g(X)] = \int_{-\infty}^{+\infty} g(x) f_X(x)\,dx$    for a continuous RV

$E[g(X)] = \sum_i g(x_i) f_X(x_i)$    for a discrete RV
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables


The function g(X) usually takes the general form:

$g(x) = (x - c)^k$

where k is an integer defining the kth order of the moment:

$E\left[(X - c)^k\right] = \int_{-\infty}^{+\infty} (x - c)^k f_X(x)\,dx$

When k = 0, the zeroth order moment gives the consistency condition:

$E\left[(X - c)^0\right] = \int_{-\infty}^{+\infty} (x - c)^0 f_X(x)\,dx = 1$
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables


When c = 0, the moments are called raw moments. The 1st order moment is a raw moment called the expected value or mean value:

$\mu_X = E[X] = \int_{-\infty}^{+\infty} x \cdot f_X(x)\,dx$

which corresponds to the centroid of the area under the curve of $f_X(x)$.

In statistics, the 2nd order moment, the variance, is called a central moment since c = μX, and measures the dispersion of the RV X around its mean:

$\sigma_X^2 = V(X) = E\left[(X - \mu_X)^2\right] = \int_{-\infty}^{+\infty} (x - \mu_X)^2 \cdot f_X(x)\,dx$

This can be re-written as $V(X) = E\left[X^2\right] - \left(E[X]\right)^2$
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables


From the variance, we can define alternative measures of dispersion such as the standard deviation σ, which has the same units as X, and the coefficient of variation CoV, which is unitless:

$\sigma_X = \sqrt{\sigma_X^2}$    $CoV_X = \sigma_X / \mu_X$

The 3rd order moment is also a central moment that, when divided by σ³, is called the skewness coefficient and measures the asymmetry of the RV X with respect to its mean:

$E\left[(X - \mu_X)^3\right] = \int_{-\infty}^{+\infty} (x - \mu_X)^3 \cdot f_X(x)\,dx$

$\gamma_1 = \dfrac{E\left[(X - \mu_X)^3\right]}{\sigma_X^3}$
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables


The 4th order moment is also a central moment that, when divided by σ⁴, is called the kurtosis coefficient and measures the flatness/peakedness of the RV X around its mean:

$E\left[(X - \mu_X)^4\right] = \int_{-\infty}^{+\infty} (x - \mu_X)^4 \cdot f_X(x)\,dx$

$\gamma_2 = \dfrac{E\left[(X - \mu_X)^4\right]}{\sigma_X^4}$

The kurtosis coefficient is usually compared to the value 3 (the kurtosis coefficient of a RV that follows a Normal distribution).
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables

Kurtosis above (or below) the normal value of 3 is called "excess kurtosis".
Review of probability theory and related basic concepts

Other Descriptors of Random Variables: quantiles (or fractiles or


percentiles)
The pth level quantile of a RV X that has a cdf $F_X(x)$ is the value $x_p$ such that:

$F_X(x_p) = p$, with $0 \le p \le 1$

For example, the median is the quantile for level p = 0.50. Quantiles are often used in civil engineering to set the value of loads and material properties.

The pth level quantile $x_p$ of a RV is the value of the RV that has a probability 1 − p of being exceeded:

$P(X > x_p) = 1 - F_X(x_p) = 1 - p$
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables

Example: consider two discrete RVs, (1) a RV with pmf P(X=2) = 0.1, P(X=3) = 0.4, P(X=4) = 0.4, P(X=5) = 0.1, and (2) a fair die with P(X=x) = 1/6 for x = 1, …, 6.

The mean of X is defined by:

(1) $\mu_X = E[X] = \sum x \cdot f_X(x) = 2 \times 0.1 + 3 \times 0.4 + 4 \times 0.4 + 5 \times 0.1 = 3.5$

(2) $\mu_X = E[X] = \sum x \cdot f_X(x) = (1 + 2 + 3 + 4 + 5 + 6) \times 1/6 = 3.5$
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables

Example (continued): the same two discrete RVs.

The variance of X is defined by:

(1) $V(X) = E\left[(X - \mu_X)^2\right] = \sum (x - \mu_X)^2 \cdot f_X(x) = (2-3.5)^2 \times 0.1 + (3-3.5)^2 \times 0.4 + (4-3.5)^2 \times 0.4 + (5-3.5)^2 \times 0.1 = 0.65$
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables

Example (continued): the same two discrete RVs.

The variance of X is defined by:

(2) $V(X) = E\left[(X - \mu_X)^2\right] = \sum (x - \mu_X)^2 \cdot f_X(x) = \dfrac{(1-3.5)^2 + (2-3.5)^2 + (3-3.5)^2 + (4-3.5)^2 + (5-3.5)^2 + (6-3.5)^2}{6} \approx 2.9$
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables

Example (continued): the same two discrete RVs.

The standard deviation and CoV of X are defined by:

(1) $\sigma_X = \sqrt{0.65} = 0.81$    $CoV_X = \sigma_X / \mu_X = 0.81 / 3.5 = 0.23$

(2) $\sigma_X = \sqrt{2.9} = 1.7$    $CoV_X = \sigma_X / \mu_X = 1.7 / 3.5 = 0.49$
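A minimal sketch (not part of the original slides, assuming NumPy is available) computing the mean, variance, standard deviation and CoV of the first discrete RV directly from its pmf:

```python
import numpy as np

x = np.array([2, 3, 4, 5])
p = np.array([0.1, 0.4, 0.4, 0.1])    # pmf values, sum to 1

mean = np.sum(x * p)                  # E[X]
var = np.sum((x - mean) ** 2 * p)     # E[(X - mu)^2]
std = np.sqrt(var)
cov = std / mean                      # coefficient of variation

print(mean, var, round(std, 2), round(cov, 2))  # 3.5 0.65 0.81 0.23
```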
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables

Example: lifespan of a structural component

The lifespan of a structural component X is a RV described by the pdf:

$f_X(x) = a \cdot e^{-ax}$, with $x \ge 0$, $a \ge 0$

The mean of X is obtained by integration by parts and using limits:

$\mu_X = E[X] = \int_0^{\infty} x \cdot f_X(x)\,dx = \int_0^{\infty} x \cdot a \cdot e^{-ax}\,dx = \left[x \cdot a \cdot \dfrac{e^{-ax}}{-a}\right]_0^{\infty} - \int_0^{\infty} -e^{-ax}\,dx$

$= \left[-x\,e^{-ax}\right]_0^{\infty} + \left[\dfrac{e^{-ax}}{-a}\right]_0^{\infty} = 0 - \left(-\dfrac{1}{a}\right) = \dfrac{1}{a}$
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables

Example: lifespan of a structural component

The variance of X is defined by:

$V(X) = E\left[(X - \mu_X)^2\right] = \int_0^{\infty} (x - \mu_X)^2 \cdot f_X(x)\,dx = \int_0^{\infty} \left(x - \dfrac{1}{a}\right)^2 a\,e^{-ax}\,dx$

Integration by parts (using limits) gives:

$= \left[-\left(x - \dfrac{1}{a}\right)^2 e^{-ax}\right]_0^{\infty} + \int_0^{\infty} 2\left(x - \dfrac{1}{a}\right) e^{-ax}\,dx = \left(0 + \dfrac{1}{a^2}\right) + \dfrac{2}{a}\int_0^{\infty} a\left(x - \dfrac{1}{a}\right) e^{-ax}\,dx$

Integrating by parts once more (and using limits):

$= \dfrac{1}{a^2} + \dfrac{2}{a}\left\{\left[-\left(x - \dfrac{1}{a}\right) e^{-ax}\right]_0^{\infty} + \int_0^{\infty} e^{-ax}\,dx\right\} = \dfrac{1}{a^2} + \dfrac{2}{a}\left\{-\dfrac{1}{a} + \left[\dfrac{e^{-ax}}{-a}\right]_0^{\infty}\right\} = \ldots$
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables

Example: lifespan of a structural component

The variance of X (continued):

$\ldots = \dfrac{1}{a^2} + \dfrac{2}{a}\left(-\dfrac{1}{a} + \left(0 + \dfrac{1}{a}\right)\right) = \dfrac{1}{a^2}$

To obtain the quantiles we need to define the cdf:

$f_X(x) = a \cdot e^{-ax} \;\Rightarrow\; F_X(x) = \int_0^x f_X(u)\,du = \int_0^x a \cdot e^{-au}\,du = 1 - e^{-ax}$
Review of probability theory and related basic concepts

Moments as Descriptors of Random Variables

Example: lifespan of a structural component

The pth level quantiles are then obtained by:

$F_X(x_p) = p \;\Leftrightarrow\; 1 - e^{-a x_p} = p$

$x_p = \dfrac{-1}{a}\ln(1 - p)$
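A minimal sketch (not part of the original slides, assuming SciPy is available) checking the lifespan results with the library's exponential distribution, whose scale parameter corresponds to 1/a; the rate value is purely illustrative:

```python
from scipy.stats import expon

a = 0.5                  # hypothetical rate, just for illustration
X = expon(scale=1 / a)   # exponential RV with pdf a*exp(-a*x)

print(X.mean())          # 1/a = 2.0
print(X.var())           # 1/a^2 = 4.0
print(X.ppf(0.5))        # median x_0.5 = -ln(1 - 0.5)/a ≈ 1.386
```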
Common Probability Distribution Models
(including the Return Period and the
Central Limit Theorem)
Review of probability theory and related basic concepts

Common Probability Distribution Models


There are dozens of probability distribution models. Many of them were
developed for specific applications or to model very specific phenomena.

There are some probability distribution models that are more general and that appear in many situations. We'll focus on models representing the behaviour of RVs that are:
- the result of independent events
- the sum of different effects
- the product of different effects
- the extremes of different effects

… and we’ll also address a few other useful probability distribution models

78
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
• Bernoulli Process is a sequence of binary RVs, so it is a discrete-time
stochastic (i.e. random) process that takes only two values. The Bernoulli
variables Xi are identical and independent. Every variable Xi in the
sequence is associated with a Bernoulli trial (a random experiment with
exactly two possible outcomes, "success" and "failure", in which the
probability of success is the same every time the experiment is
conducted)... So it’s basically a repeated coin toss

• Poisson Process is a continuous-time stochastic process in which events


occur continuously and independently of one another. It is a collection of
RVs that represent the number of events and the time points at which they
occur in a given time interval (starting from time 0)... So let’s imagine the
number and time of page view requests of a website… or the number and
time of earthquake occurrences in a given region 79
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events

Question → Bernoulli Process / Poisson Process

How many successes or events in a fixed time interval or number of trials? → Binomial distribution B(n,p) / Poisson distribution P(ν)

How long (time or trials) until the first success/event? → Geometric distribution G(p) / Exponential distribution Exp(ν)

How long (time or trials) until the kth success/event? → Negative Binomial distribution NB(r,p) / Gamma distribution Γ(k, 1/ν)
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Binomial distribution B(n,p) is a discrete probability distribution that gives the
probability of k successes in n independent yes/no experiments, each having a
probability p of success.

n k
f B ( k )   p (1 − p )
n−k
=
k 
k
n j
FB ( k ) ∑   p (1 − p )
n− j

j =1  j 

n n!
with   = Mean: µ K = np
 k  k !( n − k )! Variance: V (=
K ) np (1 − p )
Binomial coefficient
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Example: operative bulldozers
A contractor has 3 independent bulldozers, each with a probability of being
operative of 0.90. What is the probability of 2 bulldozers being inoperative?

Possible combinations of events Let X = number of operative bulldozers


(G: good bulldozer; B: bad bulldozer) (operative = “success”)

GGG → X = 3:    $P(X=3) = p \times p \times p$
GGB, GBG, BGG → X = 2:    $P(X=2) = 3\left[p \times p \times (1-p)\right]$
BBG, BGB, GBB → X = 1:    $P(X=1) = 3\left[p \times (1-p) \times (1-p)\right]$
BBB → X = 0:    $P(X=0) = (1-p) \times (1-p) \times (1-p)$
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Example: operative bulldozers
A contractor has 3 independent bulldozers, each with a probability of being
operative of 0.90. What is the probability of 2 bulldozers being inoperative?

Possible combinations of events Let X = number of operative bulldozers


(G: good bulldozer; B: bad bulldozer) (operative = “success”)
Generalised to $P(X=x) = \dbinom{3}{x} p^x (1-p)^{3-x}$

Event A = "2 bulldozers are inoperative" ≡ Event B = "1 bulldozer is operative"

$P(B) = \dfrac{3!}{1!\,(3-1)!} \times 0.9^1 \times (1-0.9)^{3-1} = 0.027$
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Example: operative bulldozers
A contractor has 3 independent bulldozers, each with a probability of being
operative of 0.90. What is the probability of 2 bulldozers being inoperative?

Alternatively, if we consider the RV Y = number of inoperative bulldozers (inoperative = "success") and the probability of success p = 0.10, we have event A = "2 bulldozers are inoperative":

$P(A) = \dfrac{3!}{2!\,(3-2)!} \times 0.1^2 \times (1-0.1)^{3-2} = 0.027$
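A minimal sketch (not part of the original slides, assuming SciPy is available) of the bulldozer example using the library's binomial pmf, with both equivalent formulations:

```python
from scipy.stats import binom

# 3 machines, p = 0.10 probability of a machine being inoperative ("success")
print(binom.pmf(2, 3, 0.10))   # P(exactly 2 inoperative) = 0.027

# Equivalent view: exactly 1 operative machine when P(operative) = 0.90
print(binom.pmf(1, 3, 0.90))   # also 0.027
```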
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Geometric distribution G(p) is a discrete probability distribution that gives the
probability of the number x of experiments, each having a probability p of
success, that are needed to get the first success.

$f_G(x) = p(1-p)^{x-1}$

$F_G(x) = 1 - (1-p)^x$

Mean: $\mu_X = \dfrac{1}{p}$    Variance: $V(X) = \dfrac{1-p}{p^2}$
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
the Return Period
The number x of experiments until the first success can be seen as discrete and
independent time intervals. In this case, the number of time intervals x (e.g. the
number of years, of days, of weeks, etc) until the first success is the first
occurrence time. Since the time intervals are independent, the time until the first
success must be the same as the time between 2 consecutive successes. The mean
of x, μX, can then be seen as the mean time between 2 consecutive successes (or
events) which is usually called Return Period.

(Figure: timeline of tornado occurrences; x = time to the first tornado = time between two consecutive tornadoes = Return Period.)
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Geometric distribution G(p) is a discrete probability distribution that gives the
probability of the number x of experiments, each having a probability p of
success, that are needed to get the first success.

$f_G(x) = p(1-p)^{x-1}$

$F_G(x) = 1 - (1-p)^x$

Mean: $\mu_X = \dfrac{1}{p}$ → the Return Period    Variance: $V(X) = \dfrac{1-p}{p^2}$
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Example: excessive wind speed

A wind tower was designed to be operational for wind speeds up to 100km/h. This
wind speed has a 5% annual probability of exceedance. What is the probability of
exceeding this wind speed during the lifetime of the tower which is 100 years?

Consider the event X “wind speed > 100km/h” = “success”


The Return Period of this event = 1/0.05 = 20 years
The probability of having one success during a period up to 100 years is

FG (100 ) =P ( X ≤ 100 ) =1 − (1 − 0.05 )


100
=1 − 0.95100 =0.994

88
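A minimal sketch (not part of the original slides, assuming SciPy is available) of the wind-speed example with the library's geometric distribution, i.e. the number of yearly trials until the first exceedance:

```python
from scipy.stats import geom

p = 0.05                   # annual probability of exceedance
print(1 / p)               # return period = 20 years
print(geom.cdf(100, p))    # P(first exceedance within 100 years) ≈ 0.994
```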
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Negative Binomial distribution NB(r,p) is a discrete probability distribution that
gives the probability of getting the rth success in a sequence of experiments,
each having a probability p of success.

$f_{NB}(x) = \dbinom{x-1}{r-1} p^r (1-p)^{x-r}$, with $x \ge r$

$F_{NB}(x) = \sum_{j=r}^{x} \dbinom{j-1}{r-1} p^r (1-p)^{j-r}$

Mean: $\mu_X = \dfrac{r}{p}$    Variance: $V(X) = \dfrac{r(1-p)}{p^2}$

There are alternative definitions for this distribution: Y = number of failures before the rth success. This formulation is equivalent to the one in terms of X = trial at which the rth success occurs, since Y = X − r.
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Poisson distribution P(ν) is a discrete probability distribution that gives the
probability of k events occurring in a fixed interval of time and/or space if
these events occur with a known mean rate of occurrence ν and independently
of the time since the last event.

$f_P(k) = \dfrac{(\nu t)^k}{k!}\,e^{-\nu t}$

$F_P(k) = \sum_{j=0}^{k} \dfrac{(\nu t)^j}{j!}\,e^{-\nu t}$

Mean: $\mu_K = \nu t$    Variance: $V(K) = \nu t$

($\lambda = \nu t$ is the mean number of events that occur in a specified time t)
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Exponential distribution Exp(ν) is a continuous probability distribution that
gives the probability of the time t between events in a Poisson process with a
known mean rate of occurrence ν

$f_{Exp}(t) = \nu e^{-\nu t}$, with $t \ge 0$

$F_{Exp}(t) = 1 - e^{-\nu t}$

Mean: $\mu_T = \dfrac{1}{\nu}$    Variance: $V(T) = \dfrac{1}{\nu^2}$
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Revisiting the Return Period
In the Poisson distribution, νt is the mean number of events that occur in a
specified time (or reference period) t. If t is set to 1 unit (year, day, etc), 1/ν is
the mean time between events, which coincides with the mean value of the
exponential distribution and corresponds also to the Return Period.

What’s the return period


of a boomerang?
92
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Alternative explanation for the Return Period
… still considering the Poisson distribution, the return period (RP) can also be
established by the following reasoning

Consider the particular case where we need the probability Pt of any non-zero
number of events occurring during a reference period of time t (in years)
$P_t = p(1) + p(2) + \ldots = 1 - p(0) = 1 - f_P(0) = 1 - \dfrac{(\nu t)^0}{0!}\,e^{-\nu t} = 1 - e^{-\nu t} = 1 - e^{-t/RP}$

$RP = \dfrac{-t}{\ln(1 - P_t)}$ → for low probability events → $RP \approx \dfrac{t}{P_t}$

In earthquake engineering, $P_t$ is called the seismic hazard $H_t$. Hence $RP \approx \dfrac{1}{H_1}$
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Example: earthquake occurrences #1

What is the return period (RP) of an earthquake with a 2% probability of occurrence


in 50 years?
$RP = \dfrac{-50}{\ln(1 - 0.02)} = 2474.9$ years

$RP \approx \dfrac{50}{0.02} = 2500$ years

The annual seismic hazard can then be seen to be

$H_1 \approx \dfrac{1}{RP} = \dfrac{1}{2500} = 0.04\%$

$H_1 = 1 - e^{-T/RP} = 1 - e^{-1/2474.9} = 0.0404\%$
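A minimal sketch (plain Python, not part of the original slides) of the return period calculation for an event with a 2% probability of occurrence in 50 years:

```python
import math

t, Pt = 50.0, 0.02
rp_exact = -t / math.log(1 - Pt)    # Poisson-based definition
rp_approx = t / Pt                  # low-probability approximation

print(round(rp_exact, 1))           # ≈ 2474.9 years
print(rp_approx)                    # 2500.0 years
print(1 - math.exp(-1 / rp_exact))  # annual hazard H1 ≈ 0.000404 (0.0404%)
```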
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Example: earthquake occurrences #2

Consider that in the last 50 years 2 large earthquakes (MW > 6) occurred in a given
region and that such occurrences can be modelled by a Poisson process. What is
the probability of occurrence of such earthquakes within the next 2 years?
Consider the event “occurrence of a MW > 6 earthquake”
The mean rate of occurrence of this event = 2/50 = 0.04/year and the return
period is 1/0.04 = 25 years

The probability of having an event within the next 2 years is

$F_{Exp}(2) = P(t \le 2) = 1 - e^{-0.04 \times 2} = 0.077$

We can also determine this probability using the Poisson distribution
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Example: earthquake occurrences #2

For the case where we use the probability Pt of any non-zero number of events
occurring during a reference period of time t (in years)
$P_t = p(1) + p(2) + \ldots = 1 - p(0) = 1 - f_P(0) = 1 - \dfrac{(\nu t)^0}{0!}\,e^{-\nu t} = 1 - e^{-\nu t} = 1 - e^{-0.04 \times 2} = 0.077$
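A minimal sketch (not part of the original slides, assuming SciPy is available) checking the 2-year probability both with the exponential time-between-events model and with the Poisson count model:

```python
from scipy.stats import expon, poisson

nu = 2 / 50   # mean rate of occurrence = 0.04 events / year
t = 2         # reference period in years

print(expon(scale=1 / nu).cdf(t))      # P(time to next event <= 2) ≈ 0.077
print(1 - poisson(mu=nu * t).pmf(0))   # P(at least one event in 2 years) ≈ 0.077
```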
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from


independent events
Gamma distribution Γ(k, 1/ν) is a continuous probability distribution that gives the probability distribution of the time t needed for k events to occur, when the events occur with a known mean rate of occurrence ν (here ν = 1/θ):

$f_\Gamma(t) = \dfrac{\nu\,(\nu t)^{k-1}}{\Gamma(k)}\,e^{-\nu t}$, with $t \ge 0$

$F_\Gamma(t) = \int_0^{\nu t} \dfrac{y^{k-1}}{\Gamma(k)}\,e^{-y}\,dy$

with the Gamma function $\Gamma(k) = \int_0^{\infty} e^{-y}\,y^{k-1}\,dy$

Mean: $\mu_T = \dfrac{k}{\nu}$    Variance: $V(T) = \dfrac{k}{\nu^2}$
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from the sum of
different effects
• Central Limit Theorem: this theorem states that if we consider Sn to be the
sum (or average) of n independent RVs, each with an arbitrary probability
distribution, under certain conditions (the Lindeberg condition: the variance
of each RV divided by the sum of the variances of all the RVs tends to zero as
n tends to ∞), the distribution of Sn is well-approximated by a certain type of
continuous function known as a normal density function.

(Figure: the distribution of Sn approaching a normal density function.)
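A rough illustration of the theorem above (not part of the original slides, assuming NumPy is available): summing n = 30 independent uniform RVs and comparing the simulated mean and standard deviation of Sn with the values predicted by the normal approximation.

```python
import numpy as np

rng = np.random.default_rng(0)    # hypothetical seed, only for reproducibility
n, n_samples = 30, 100_000

# S_n for each of the simulated samples
sums = rng.uniform(0, 1, size=(n_samples, n)).sum(axis=1)

# Compare the simulated moments with the theoretical values
print(sums.mean(), n * 0.5)            # ~15.0 vs 15.0
print(sums.std(), np.sqrt(n / 12.0))   # ~1.58 vs 1.58 (uniform variance = 1/12)
```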
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from the sum of
different effects
Normal (or Gaussian) distribution N(μ,σ) is the continuous probability
distribution that is most used in statistics and probability analysis across all areas
of research (especially due to the Central Limit Theorem and its implications)
$f_N(x) = \varphi(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

$F_N(x) = \Phi(x) = \int_{-\infty}^{x} \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}\,dy$

Mean: $\mu_X = \mu$    Variance: $V(X) = \sigma^2$

(when μ = 0 and σ = 1, we have a standard normal distribution)
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from the product
of different effects
Lognormal distribution LN(λ,β) is the continuous probability distribution of a RV
X when its natural logarithm ln(X) = Y follows a normal distribution
$f_{LN}(x) = \dfrac{1}{\beta x \sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{\ln(x)-\lambda}{\beta}\right)^2}$

$F_{LN}(x) = \int_{0}^{x} \dfrac{1}{\beta z \sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{\ln(z)-\lambda}{\beta}\right)^2}\,dz$

where $\lambda = \mu_{\ln(X)}$ is the mean value of the log of the data and $\beta = \sigma_{\ln(X)}$ is the standard deviation of the log of the data

Mean: $\mu_X = e^{\lambda + \frac{\beta^2}{2}}$    Variance: $V(X) = \mu_X^2\left(e^{\beta^2} - 1\right)$
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from extremes of


different effects
• Extreme distributions: Let Y1, Y2, …, Yn be a series of identically distributed independent RVs. Considering that Xmax and Xmin represent the maximum and minimum values among the variables Yi, it can be proven that the distributions of Xmax and Xmin must be one of the following models:

type I or Gumbel distribution function for maxima

Xmax type II or Fréchet distribution function for maxima


type III or Weibull distribution function for maxima

type I or Gumbel distribution function for minima


Xmin type II or Fréchet distribution function for minima
type III or Weibull distribution function for minima
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from extremes of


different effects
Gumbel distribution for maximum values (the "Extreme Value" distribution in Matlab):

$F_I(x) = e^{-e^{-\alpha(x-u)}}$, with $-\infty \le x \le \infty$

Mean: $\mu_X = u + \dfrac{\gamma}{\alpha}$    Variance: $V(X) = \dfrac{\pi^2}{6\alpha^2}$

γ is the Euler constant (0.57721566490…)

Fréchet distribution for maximum values (in Matlab, define 1/x and use the Weibull distribution):

$F_{II}(x) = e^{-\left(\frac{u}{x}\right)^k}$, with $0 \le x \le \infty$

Mean: $\mu_X = u\,\Gamma\left(1 - \dfrac{1}{k}\right)$    Variance: $V(X) = u^2\left[\Gamma\left(1 - \dfrac{2}{k}\right) - \Gamma^2\left(1 - \dfrac{1}{k}\right)\right]$
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from extremes of


different effects
Weibull distribution for maximum values:

$F_{III}(x) = e^{-\left(\frac{\varepsilon - x}{\varepsilon - u}\right)^k}$, with $x \le \varepsilon$

Mean: $\mu_X = \varepsilon + (u - \varepsilon)\,\Gamma\left(1 + \dfrac{1}{k}\right)$

Variance: $V(X) = (u - \varepsilon)^2\left[\Gamma\left(1 + \dfrac{2}{k}\right) - \Gamma^2\left(1 + \dfrac{1}{k}\right)\right]$

There are alternative definitions for this distribution (e.g. in many cases the distribution is presented for ε = 0)

The distributions for the minimum values can be derived by noting that min(Yi) = −max(−Yi)
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from extremes of


different effects

How to determine which extreme model is best? Although data fitting


techniques are usually used for this, there are a few other aspects to
also account for:

- If the tail decay of the data is exponential (e.g. as in the Gamma,


Gaussian and Exponential models), the maxima distribution converges
towards the Type I (Gumbel)

- If the tail decay of the data is proportional to x-k as x → ∞, the maxima


distribution converges towards the Type II (Fréchet)

- If the tail decay of the data is proportional to xk as x → 0-, the maxima


distribution converges towards the Type III (Weibull)
104
Review of probability theory and related basic concepts

Common Probability Distribution Models: RVs that come from extremes of


different effects
Generalized extreme value distribution: it is a family of continuous probability
distributions that combines the Gumbel, Fréchet and Weibull families
$f_{GExt}(x) = \dfrac{1}{s}\left[1 + \zeta\left(\dfrac{x-u}{s}\right)\right]^{-1/\zeta - 1} \cdot e^{-\left[1 + \zeta\left(\frac{x-u}{s}\right)\right]^{-1/\zeta}}$

$F_{GExt}(x) = e^{-\left[1 + \zeta\left(\frac{x-u}{s}\right)\right]^{-1/\zeta}}$

with $1 + \zeta\left(\dfrac{x-u}{s}\right) > 0$, where ζ, u, s are the parameters

by setting ζ = 0, > 0 or < 0, the Gumbel, Fréchet and Weibull families are obtained, respectively
Review of probability theory and related basic concepts

Common Probability Distribution Models: other distributions

Uniform distribution: useful to model data with values that are equally probable

 1
 for a ≤ x ≤ b
fU ( x ) =  b − a
0 for x < a or x > b

0 for x < a
 x − a
FU ( x ) 
= for a ≤ x < b
b − a
1 for x ≥ b
( b − a)
2
a+b
Mean: µ X = Variance: V ( X ) =
2 12
Review of probability theory and related basic concepts

Common Probability Distribution Models: other distributions

χ² distribution with k degrees of freedom χ²(k) is the distribution of a sum of the squares of k independent standard normal random variables, $X_1^2 + X_2^2 + \ldots + X_k^2$

$f_{\chi^2}(y) = \dfrac{y^{k/2-1}\,e^{-y/2}}{2^{k/2}\,\Gamma\left(\frac{k}{2}\right)}$

$F_{\chi^2}(y) = \dfrac{\gamma\left(\frac{k}{2}, \frac{y}{2}\right)}{\Gamma\left(\frac{k}{2}\right)}$    (γ is the lower incomplete Gamma function)

Mean: $\mu_Y = k$    Variance: $V(Y) = 2k$
Review of probability theory and related basic concepts

Common Probability Distribution Models: other distributions

Student's t distribution with ν degrees of freedom t(ν): if we have a random sample of size n drawn from a Normal distribution N(μ,σ) and denote the sample mean $\bar{x}$ and the sample standard deviation s, the quantity

$z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$    (don't forget that $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$)

has a t distribution with ν = n − 1 degrees of freedom

$f_t(z) = \Gamma\left(\dfrac{\nu+1}{2}\right)\left[\sqrt{\nu\pi}\,\Gamma\left(\dfrac{\nu}{2}\right)\right]^{-1}\left(1 + \dfrac{z^2}{\nu}\right)^{-\frac{\nu+1}{2}}$

Mean: $\mu_Z = 0$ (for ν > 1); for ν = 1 → Cauchy distribution, which has no mean and no variance

Variance: $V(Z) = \dfrac{\nu}{\nu - 2}$; for 1 < ν ≤ 2 it is ∞
Review of probability theory and related basic concepts

Common Probability Distribution Models: other distributions

Beta distribution Beta(α,β) is a family of continuous distributions defined over the interval [a, b]:

$f_{Beta}(x) = \dfrac{(x-a)^{\alpha-1}(b-x)^{\beta-1}}{B(\alpha,\beta)\,(b-a)^{\alpha+\beta-1}}$, with $\alpha > 0$, $\beta > 0$

$B(\alpha,\beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx$

With $Y = \dfrac{X-a}{b-a}$:    $F_{Beta}(y) = I_y(\alpha,\beta)$, the regularized incomplete beta function

Mean: $\mu_X = a + \dfrac{\alpha(b-a)}{\alpha+\beta}$    Variance: $V(X) = \dfrac{\alpha\beta(b-a)^2}{(\alpha+\beta)^2(\alpha+\beta+1)}$
Review of probability theory and related basic concepts

Common Probability Distribution Models: other distributions

Triangular distribution T(a,b,c) is a simple continuous probability distribution defined over the interval [a, b], where c is the mode (c = (a+b)/2 gives a symmetric distribution):

$f_T(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)} & \text{for } a \le x \le c \\ \dfrac{2(b-x)}{(b-a)(b-c)} & \text{for } c \le x \le b \end{cases}$

$F_T(x) = \begin{cases} \dfrac{(x-a)^2}{(b-a)(c-a)} & \text{for } a \le x \le c \\ 1 - \dfrac{(b-x)^2}{(b-a)(b-c)} & \text{for } c \le x \le b \end{cases}$

Mean: $\mu_X = \dfrac{a+b+c}{3}$    Variance: $V(X) = \dfrac{a^2+b^2+c^2-ab-ac-bc}{18}$
Confidence Intervals
Review of probability theory and related basic concepts

Confidence Intervals
Remember this…
• An estimator of a Population Parameter is a Sample Statistic used to estimate or
predict that Population Parameter.
• An estimate is a particular numerical value of a Sample Statistic obtained through
sampling.

Considering a single sample of data and an estimate obtained from that sample
to establish a population parameter, a confidence interval for that population
parameter corresponds to a range of values bounding the estimate with a
certain probability of containing the true value of the population parameter

This is the single sample interpretation for what is a confidence interval... There is also the
repeated sample interpretation.
112
Review of probability theory and related basic concepts

Confidence Intervals

More formally, consider that θ is a population parameter and that $\hat{\theta}$ is used as an estimate for θ (and was obtained using an estimator T). An interval estimate for θ has the form

$\hat{\theta}_L \le \theta \le \hat{\theta}_U$

where $\hat{\theta}_L$ and $\hat{\theta}_U$ are the lower and upper bounds of the interval, and are computed from the sample data (so the bounds $\hat{\theta}_L$ and $\hat{\theta}_U$ are a function of $\hat{\theta}$, which means that different samples will produce different values for $\hat{\theta}_L$ and $\hat{\theta}_U$).

If $\hat{\theta}_L$ is set to be $-\infty$ or if $\hat{\theta}_U$ is set to be $+\infty$, a one-sided bound (or interval) is obtained.
Review of probability theory and related basic concepts

Confidence Intervals

If the probabilistic distribution of T is known, it is possible to define values for $\hat{\theta}_L$ and $\hat{\theta}_U$ such that

$P\left(\hat{\theta}_L \le \theta \le \hat{\theta}_U\right) = 1 - \alpha$    (with $0 \le \alpha \le 1$)

This expression states that the interval defined by the bounds $\hat{\theta}_L$ and $\hat{\theta}_U$ has a (1 − α) probability of containing θ.

(1 − α) is called the confidence level and α is called the significance level.

So we need the probabilistic distribution of the estimator T to be able to construct a confidence interval...
Review of probability theory and related basic concepts

Confidence Intervals
Remember this…
• Central Limit Theorem: if we consider Sm to be the sum (or average) of m independent
RVs, … , the distribution of Sm is well-approximated by … a normal density function

If we extract m independent samples from a certain population and calculate the mean $\bar{X}$ for each sample, the data $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_m$ will follow a normal distribution.

In particular, it can be proven that if we extract m independent samples of size n from a certain distribution with a true mean μ and a true standard deviation σ, the distribution of the sample mean $\bar{X}$ follows a normal distribution $N(\mu, \sigma/\sqrt{n})$.

So we can construct a confidence interval for the mean of a distribution. Now we need to define $\hat{\theta}_L$ and $\hat{\theta}_U$ based on this information...
Review of probability theory and related basic concepts

Confidence Intervals

If the sample mean $\bar{X}$ follows a normal distribution $N(\mu, \sigma/\sqrt{n})$ we can define the standard normal variable Z by

$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

Variable Z follows a standard normal distribution N(0,1).
Considering that we want to construct a confidence interval with a (1 − α) confidence level of (1 − 0.05) = 0.95 (i.e. an interval that has a 95% probability of containing μ), what is the value of Z that has a:

• (1 − α/2) probability of not being exceeded → P(Z ≤ ?) = 1 − α/2 = 97.5% → Z = 1.96
• (α/2) probability of not being exceeded → P(Z ≤ ?) = α/2 = 2.5% → Z = −1.96
(Standard normal distribution table: cumulative probabilities Φ(x), with rows indexed by the first digits of x and columns by the second digit of x.)
Review of probability theory and related basic concepts

Confidence Intervals

(Figure: standard normal density with the central area 1 − α = 0.95 and α/2 = 0.025 in each tail; z = −1.96 and z = 1.96 in Z units; lower confidence limit, point estimate and upper confidence limit in X units.)

The values of $Z = z_{\alpha/2} = -1.96$ and $Z = z_{1-\alpha/2} = 1.96$ are found using the cdf of a standard normal distribution and looking for the values of Z corresponding to the required probabilities.
Review of probability theory and related basic concepts

Confidence Intervals
Using the values of $Z = z_{\alpha/2} = -1.96$ and $Z = z_{1-\alpha/2} = 1.96$, we can say that

$P(-1.96 \le Z \le 1.96) = P\left(-1.96 \le \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$

which states that there is a 95% probability that the value of Z is between −1.96 and 1.96.
We can rewrite this expression as:

$P\left(z_{\alpha/2} \le Z \le z_{1-\alpha/2}\right) = P\left(z_{\alpha/2} \le \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le z_{1-\alpha/2}\right) = P\left(\bar{X} - z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
Review of probability theory and related basic concepts

Confidence Intervals
Since $z_{\alpha/2} = -z_{1-\alpha/2}$, we get the more general form of the confidence interval of the mean:

$\bar{X} - z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

where $\bar{X}$ is the sample mean, μ is the true value of the mean, σ is the true value of the standard deviation, n is the sample size, $z_{1-\alpha/2}$ is the value of a standard normal variable with a 1 − α/2 probability of not being exceeded, and $z_{1-\alpha/2}\,\sigma/\sqrt{n}$ is the margin of error.
Review of probability theory and related basic concepts

Confidence Intervals
The one-sided versions of this interval are then:

$\mu \le \bar{X} + z_{1-\alpha}\dfrac{\sigma}{\sqrt{n}}$    (upper-bounded one-sided interval)

$\bar{X} - z_{1-\alpha}\dfrac{\sigma}{\sqrt{n}} \le \mu$    (lower-bounded one-sided interval)
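A minimal sketch (not part of the original slides, assuming NumPy and SciPy are available) of the two-sided confidence interval for the mean when σ is known; the data values and the assumed σ are hypothetical, purely for illustration:

```python
import numpy as np
from scipy.stats import norm

sample = np.array([9.8, 10.2, 10.1, 9.7, 10.4, 10.0, 9.9, 10.3])  # hypothetical data
sigma = 0.25          # assumed known true standard deviation (illustrative)
alpha = 0.05          # 95% confidence level

x_bar = sample.mean()
n = sample.size
z = norm.ppf(1 - alpha / 2)          # z_{1-alpha/2} = 1.96
margin = z * sigma / np.sqrt(n)      # margin of error

print(x_bar - margin, x_bar + margin)  # lower and upper confidence limits
```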
Review of probability theory and related basic concepts

Confidence Intervals
How large should n be? (According to the Central Limit Theorem, the larger the sample size, the better the normal approximation to the sampling distribution of $\bar{X}$.)
For practical applications, n = 30 is usually seen as a minimum

How to reduce the margin of error?


Decreasing the margin of error = a narrower confidence interval
The margin of error can be reduced if
• the standard deviation is lower: σ ↓
• The sample size is increased: n ↑
• The confidence level is decreased: (1 – α) ↓

A lower margin of error = less uncertainty!! 122


Review of probability theory and related basic concepts

Confidence Intervals
How to choose or define the confidence level?
There are no specific answers for this… 95% is perhaps the most common
value… other common values are 99% and 90%. Values lower than 75% are
essentially never used. The higher the desired confidence level, the wider the
confidence interval will have to be.

Confidence Level    z_{1−α/2} value
80%                 1.28
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.58
99.8%               3.09
99.9%               3.29 123
Review of probability theory and related basic concepts

Confidence Intervals
The confidence interval assumes that σ is known, but this parameter might
not be known for most cases. We usually only have its sample estimate s.

In this case, the normal distribution can no longer be used and it can be
proven that the variable T defined by

    T = (X̄ − μ) / (s/√n)

follows a t distribution with n − 1 degrees of freedom, t(n−1).

124
Review of probability theory and related basic concepts

Confidence Intervals
The t distribution looks like a normal distribution, but has “thicker” tails. The
tail thickness is controlled by the degrees of freedom
[Figure: densities of the standard normal distribution and of t distributions with df = 5 and df = 1, showing progressively thicker tails]

• The smaller the degrees of freedom, the thicker the tails of the t
distribution
• If the degrees of freedom is large (if we have a large sample size),
then the t distribution approaches the standard normal distribution
Review of probability theory and related basic concepts

Confidence Intervals
Variable T follows a t distribution t(n−1).
Considering that we want to construct a confidence interval with a (1 − α)
confidence level of (1 − 0.05) = 0.95 (i.e. an interval that has a 95% probability
of containing μ), what is the value of T that has a:

• (1 − α/2) probability of occurrence for a given sample size n: t_{n−1,1−α/2}

• (α/2) probability of occurrence for a given sample size n: t_{n−1,α/2}

The values of T = t_{n−1,1−α/2} and T = t_{n−1,α/2} are found using the cdf of a t
distribution with n − 1 degrees of freedom and looking for the values of T
corresponding to the required probabilities.

126
t-Student distribution table:

    1 − F_X(t_a) = 1 − P(X ≤ t_a) = P(X > t_a) = a

    table values = t_a
127
Review of probability theory and related basic concepts

Confidence Intervals
Using the values of T = t_{n−1,1−α/2} and T = t_{n−1,α/2}, we can say that

    P(t_{n−1,α/2} ≤ T ≤ t_{n−1,1−α/2}) = P(t_{n−1,α/2} ≤ (X̄ − μ)/(s/√n) ≤ t_{n−1,1−α/2}) = 1 − α

We can rewrite this expression as:

    P(X̄ − t_{n−1,1−α/2}·s/√n ≤ μ ≤ X̄ − t_{n−1,α/2}·s/√n) = 1 − α

or

    P(X̄ − t_{n−1,1−α/2}·s/√n ≤ μ ≤ X̄ + t_{n−1,1−α/2}·s/√n) = 1 − α

to get the confidence interval

    X̄ − t_{n−1,1−α/2}·s/√n ≤ μ ≤ X̄ + t_{n−1,1−α/2}·s/√n 128
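As an illustrative sketch (not part of the original slides), this t-based interval can be computed with a few lines of Python/scipy; the function name is arbitrary and the data reuses the concrete compressive strength sample introduced later in these notes.

import numpy as np
from scipy import stats

def t_confidence_interval(x, confidence=0.95):
    # Two-sided confidence interval for the mean when sigma is unknown
    x = np.asarray(x, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = x.std(ddof=1)                                         # sample standard deviation
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)  # t_{n-1,1-alpha/2}
    margin = t_crit * s / np.sqrt(n)                          # margin of error
    return x_bar - margin, x_bar + margin

strengths = [24.4, 27.6, 27.8, 27.9, 28.5, 30.1, 30.3, 31.7, 32.2, 32.8,
             33.3, 33.5, 34.1, 34.6, 35.8, 35.9, 36.8, 37.1, 39.2, 39.7]
print(t_confidence_interval(strengths))   # roughly (30.7, 34.6)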
Review of probability theory and related basic concepts

Confidence Intervals
A correction for the case of finite populations

Normally the size N of the population is assumed to be ∞:

    X̄ − t_{n−1,1−α/2}·s/√n ≤ μ ≤ X̄ + t_{n−1,1−α/2}·s/√n

When the size N of the population is assumed to be a finite number:

    X̄ − t_{n−1,1−α/2}·(s/√n)·√((N−n)/(N−1)) ≤ μ ≤ X̄ + t_{n−1,1−α/2}·(s/√n)·√((N−n)/(N−1))

129
Review of probability theory and related basic concepts

Confidence Intervals
Assessing the sample size needed to estimate the mean within a certain
margin of error and with a certain confidence level
Starting from

    X̄ − t_{n−1,1−α/2}·s/√n ≤ μ ≤ X̄ + t_{n−1,1−α/2}·s/√n

Dividing by X̄ (with cov = s/X̄, the sample coefficient of variation)

    1 − t_{n−1,1−α/2}·cov/√n ≤ μ/X̄ ≤ 1 + t_{n−1,1−α/2}·cov/√n

Setting a certain margin of error ME (e.g. ±10%)

    1 − t_{n−1,1−α/2}·cov/√n ≤ X̄·(1 ± ME)/X̄ ≤ 1 + t_{n−1,1−α/2}·cov/√n 130
Review of probability theory and related basic concepts

Confidence Intervals
Assessing the sample size needed to estimate the mean within a certain
margin of error and with a certain confidence level
Separating into 2 parts

    1 − t_{n−1,1−α/2}·cov/√n ≤ 1 − ME            1 + ME ≤ 1 + t_{n−1,1−α/2}·cov/√n

    t_{n−1,1−α/2}·cov/√n ≥ ME                    ME ≤ t_{n−1,1−α/2}·cov/√n

leads to only one expression

    t_{n−1,1−α/2}·cov/√n = ME
131
Review of probability theory and related basic concepts

Confidence Intervals
Assessing the sample size needed to estimate the mean within a certain
margin of error and with a certain confidence level
Replacing the value of the t distribution (which depends on the sample size)
by its standard normal approximation

    t_{n−1,1−α/2}·cov/√n = ME   ⇒   z_{1−α/2}·cov/√n ≈ ME

which leads to

    n = (z_{1−α/2}·cov / ME)²

Setting ME, defining z_{1−α/2} and "guessing" cov leads to the value of n
132
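A minimal Python sketch of this sample-size estimate (my own illustration; the 10% margin of error and the guessed cov of 0.15 are assumed values, not values from the slides):

import math
from scipy import stats

def sample_size_for_mean(cov, margin_of_error, confidence=0.95):
    # n such that z_{1-alpha/2} * cov / sqrt(n) = ME (rounded up)
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return math.ceil((z * cov / margin_of_error) ** 2)

print(sample_size_for_mean(cov=0.15, margin_of_error=0.10))   # 9 for these assumed inputs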
Review of probability theory and related basic concepts

Confidence Intervals
It is possible to construct confidence intervals for other parameters

For the confidence interval for the variance σ² of a normal distribution it is
possible to define the following relation

    P( (n−1)·s²/χ²_{n−1,1−α/2} ≤ σ² ≤ (n−1)·s²/χ²_{n−1,α/2} ) = 1 − α

and the following confidence interval (which is asymmetric!!)

    (n−1)·s²/χ²_{n−1,1−α/2} ≤ σ² ≤ (n−1)·s²/χ²_{n−1,α/2}
This interval can also provide good estimates for the variance of other distributions 133
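A short Python sketch of this chi-square-based interval (an illustration added here, not part of the slides), again using the concrete strength data from the later example:

import numpy as np
from scipy import stats

def variance_confidence_interval(x, confidence=0.95):
    # Two-sided CI for the variance of a normal sample
    x = np.asarray(x, dtype=float)
    n = x.size
    s2 = x.var(ddof=1)
    alpha = 1 - confidence
    chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)   # larger quantile
    chi2_lo = stats.chi2.ppf(alpha / 2, df=n - 1)       # smaller quantile
    return (n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo

strengths = [24.4, 27.6, 27.8, 27.9, 28.5, 30.1, 30.3, 31.7, 32.2, 32.8,
             33.3, 33.5, 34.1, 34.6, 35.8, 35.9, 36.8, 37.1, 39.2, 39.7]
print(variance_confidence_interval(strengths))   # roughly (10, 37) MPa²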
Review of probability theory and related basic concepts

Confidence Intervals
A correction for the case of finite populations¹

    [ (n−1)/(N−1) + (N−n)/(N−1) × 1/F_{1−α/2,n−1,N−n} ]·s²  ≤  σ²  ≤  [ (n−1)/(N−1) + (N−n)/(N−1) × 1/F_{α/2,n−1,N−n} ]·s²

where F_{1−α/2,n−1,N−n} is the value with a 1 − α/2 probability of occurrence of a variable
that follows an F distribution with parameters n−1 and N−n.

This interval can also provide good estimates for the variance of other distributions

¹ O'Neill, B. (2014) Some useful moment results in sampling problems. The American Statistician, 68(4), 282-296. 134
Review of probability theory and related basic concepts

Confidence Intervals
Assessing the sample size needed to estimate the variance within a certain
margin of error and with a certain confidence level
Starting from

    (n−1)·s²/χ²_{n−1,1−α/2} ≤ σ² ≤ (n−1)·s²/χ²_{n−1,α/2}

Dividing by s²

    (n−1)/χ²_{n−1,1−α/2} ≤ σ²/s² ≤ (n−1)/χ²_{n−1,α/2}

Setting a certain margin of error ME (e.g. 1.10 or 0.90)

    (n−1)/χ²_{n−1,1−α/2} ≤ s²·ME/s² ≤ (n−1)/χ²_{n−1,α/2} 135
Review of probability theory and related basic concepts

Confidence Intervals
Assessing the sample size needed to estimate the variance within a certain
margin of error and with a certain confidence level
That simplifies to

    (n−1)/χ²_{n−1,1−α/2} ≤ ME ≤ (n−1)/χ²_{n−1,α/2}

Since this interval is asymmetric, both sides need to be analysed (a short
numerical sketch of this search is given below):

- Select a confidence level
- Define multiple values of n and compute χ²_{n−1,α/2} and χ²_{n−1,1−α/2}
- Compute the bounds of the interval until they match the desired value of ME

136
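A possible numerical implementation of this search in Python (an added sketch; the 90% confidence level and the target ME of 1.5 for the upper bound are assumed example values):

from scipy import stats

def variance_ratio_bounds(n, confidence=0.90):
    # Bounds of sigma^2 / s^2 for a sample of size n (chi-square based)
    alpha = 1 - confidence
    lower = (n - 1) / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
    upper = (n - 1) / stats.chi2.ppf(alpha / 2, df=n - 1)
    return lower, upper

# Scan sample sizes until the upper bound drops below the target ME
for n in range(5, 201):
    lower, upper = variance_ratio_bounds(n)
    if upper <= 1.5:
        print(n, round(lower, 3), round(upper, 3))
        break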
Review of probability theory and related basic concepts

Confidence Intervals
Assessing the sample size needed to estimate the variance within a certain
margin of error and with a certain confidence level
Considering a significance level of 10% (i.e. a 90% confidence level):

[Figure: upper and lower bounds of ME as a function of the sample size (0 to 200); the interval is widest for small samples and narrows towards 1 as the sample size increases] 137
Building Probabilistic Models (distribution
fitting and parameter estimation)
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)


Let’s consider a set of (variable) data that was generated by a certain process
and let’s assume that a RV following a certain (unknown) statistical distribution
is able to represent the variability of this data.

Based on the available data, we want to fully characterize the statistical


distribution of this RV

    INPUT: X = {X₁, X₂, ..., Xₙ}   →   [MAGICAL PROCESS]   →   PERFECT OUTPUT: f(x)
139
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Real (continuous) data will hardly ever follow an exact theoretical statistical
model

Usually, available samples of real data are not entirely representative of the
true population

    INCOMPLETE INPUT: X = {X₁, X₂, ..., Xₙ}   →   [MAGICAL PROCESS]   →   IMPERFECT/APPROXIMATED OUTPUT: f(x)
140
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Let’s focus on the MAGICAL PROCESS

First, analyse the shape of the data by visualising the data


• Use histograms to check the symmetry/asymmetry of the data

141
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Let’s focus on the MAGICAL PROCESS

First, analyse the shape of the data by visualising the data


• Use boxplots to check for outliers in the data (and to see if the use of
robust statistics is needed)

142
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Second, analyse the distribution type using a probability plot

A probability plot is not a p-p plot or a q-q plot!!

For a probability plot, we don’t need to estimate parameters for the


distribution type we are testing (unlike for the p-p plot or q-q plot)

The probability plot assumes the data is from a location-scale family of
distributions formed by a translation and rescaling of a standard
distribution in that family. So, if X is an RV with a distribution of that family,
the RV Z will also have a distribution of that family if it can be written as:

    Z = (X − m)/s

where m is a location parameter (not necessarily the mean) and s is a scale
parameter (not necessarily the standard deviation). 143
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Based on this assumption, it can also be seen that

    F(x) = G((x − m)/s) = G(z)

where F is the cdf of the data and G is the cdf of the standardized variable.

By inverting this relation we get

    z = G⁻¹[F(x)] = (x − m)/s = x/s − m/s

We see there is a linear relation between x and G⁻¹[F(x)] (note that G⁻¹[F(x)]
are the percentile values of x) 144
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Since the true cdf F(x) is not known, we have to define an empirical cdf Fn(x)
based on the number of points in the data.

A simple empirical cdf Fn(x) (ecdf) can be defined by:

    Fn(x(i)) = i/n

where x(i) are the ordered values of X (usually called order statistics or ranks).

Instead of this simple expression, the ecdf is usually defined by an estimate
of the median of the order statistics (for each value x(i), we are trying to estimate
a value for its probability that has a 50% chance of being the true percentile,
given that we have a small sample size).

Several empirical expressions have been proposed for this… 145


Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

A few examples (there are others)…

    Benard:              Fn(x(i)) = (i − 0.3) / (n + 0.4)
    Filliben:            Fn(x(i)) = (i − 0.3175) / (n + 0.365)
    Hosking and Wallis:  Fn(x(i)) = (i − 0.35) / n
    Blom:                Fn(x(i)) = (i − 0.375) / (n + 0.25)
    Hazen:               Fn(x(i)) = (i − 0.5) / n
    Weibull:             Fn(x(i)) = i / (n + 1)
    Gringorten:          Fn(x(i)) = (i − 0.44) / (n + 0.12)
    Cunnane:             Fn(x(i)) = (i − 0.4) / (n + 0.2)
146
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

But which one to choose?

If the goodness of the linear relation given by the probability plot


depends on the selected proposal for the ecdf, this means we are
probably choosing the wrong family of distributions

The Benard proposal is one of the most popular choices

    Fn(x(i)) = (i − 0.3) / (n + 0.4)

147
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

And which family of distributions to choose?

Based on the visual assessment of symmetry/asymmetry, start by the


most common families. If the data is symmetric, start by the normal
distribution. If the data is asymmetric, start by the lognormal,
exponential and other extreme type families of distributions

If the data comes from a phenomenon that has been previously


analysed, check the literature for suggested families of distributions

The next issue is how to obtain G-1[F(x(i))] since there is no analytical


expression for F(x(i)), just a pointwise representation

148
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

For the case of the normal distribution family, it can be seen that when

    F(x) = G((x − m)/s) = G(z)

z follows a standard normal distribution N(0,1) and it is possible to obtain
numerical values for its inverse

    Fn(x(i)) = (i − 0.3)/(n + 0.4)   ⇒   Φ⁻¹[Fn(x(i))] = z(i)

The probability plot is then a plot of x(i) versus z(i)

149
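A minimal Python sketch of this construction (added for illustration; it uses the Benard plotting position and, as example data, the concrete strength sample from the later slides):

import numpy as np
from scipy import stats

def normal_probability_plot_points(x):
    # Return the pairs (x_(i), z_(i)) of a normal probability plot
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n = x_sorted.size
    i = np.arange(1, n + 1)
    Fn = (i - 0.3) / (n + 0.4)        # empirical cdf (Benard plotting position)
    z = stats.norm.ppf(Fn)            # standard normal percentiles
    return x_sorted, z

strengths = [24.4, 27.6, 27.8, 27.9, 28.5, 30.1, 30.3, 31.7, 32.2, 32.8,
             33.3, 33.5, 34.1, 34.6, 35.8, 35.9, 36.8, 37.1, 39.2, 39.7]
x_i, z_i = normal_probability_plot_points(strengths)

# If the points fall close to a straight line, the normal family is plausible;
# since z = x/s - m/s, the fitted slope estimates 1/s and the intercept estimates -m/s.
slope, intercept = np.polyfit(x_i, z_i, 1)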
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

[Figure: two normal probability plots of z(i) versus x(i); the left one is nearly a straight line (close to normal), the right one is clearly curved (not normal at all)]
150
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Other information we get from a probability plot for the normal distribution:

[Figure: two normal probability plots of z(i) versus x(i). Left: data skewed to the right (there is always data below one potential straight line). Right: data skewed to the left (there is always data above one potential straight line)]
151
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Other information we get from a probability plot for the normal distribution:

[Figure: two normal probability plots of z(i) versus x(i). Left: data is symmetric and heavy-tailed (or fat-tailed). Right: data is symmetric and light-tailed (or thin-tailed)]
152
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

For the case of the lognormal distribution family, given the relation between
the normal and the lognormal distributions, we just have to set

y = ln ( x )
and do the plot for ln(x) that now follows a normal distribution

The probability plot is then a plot of ln(x(i)) versus z(i)

153
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

For the case of the Weibull distribution family there is an alternative and
even simpler process because the cdf has an analytical expression

    F(x) = 1 − exp( −((ε − x)/(ε − u))ᵏ ), with x ≤ ε

Assuming ε = 0:

    F(x) = 1 − exp( −(x/u)ᵏ )

From which we can get

    1 − F = exp( −(x/u)ᵏ )   ⇔   ln(−ln(1 − F)) = k·ln(x) − k·ln(u)

To get the probability plot, we just plot ln(x(i)) versus ln(−ln(1 − F(x(i))))

In this case we can even get the distribution parameters by fitting the
equation of a straight line 154
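A hedged Python sketch of that straight-line fit (my own illustration; the synthetic Weibull sample with shape 2 and scale 10 is only there to exercise the function):

import numpy as np

def weibull_plot_fit(x):
    # Fit k and u from the linearized cdf: ln(-ln(1 - F)) = k*ln(x) - k*ln(u)
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n = x_sorted.size
    i = np.arange(1, n + 1)
    Fn = (i - 0.3) / (n + 0.4)                   # Benard plotting position
    y = np.log(-np.log(1.0 - Fn))
    k, c = np.polyfit(np.log(x_sorted), y, 1)    # slope = k, intercept c = -k*ln(u)
    u = np.exp(-c / k)
    return k, u

rng = np.random.default_rng(1)
data = 10.0 * rng.weibull(2.0, size=200)         # synthetic sample, shape 2 and scale 10
print(weibull_plot_fit(data))                    # should be roughly (2, 10)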
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

[Figure: two Weibull probability plots of ln(−ln(1 − Fn)) versus ln(x(i)); the left one is nearly a straight line (close to Weibull), the right one is clearly curved (not Weibull at all)]
155
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Third, analyse the fitting of the distribution using a q-q plot or a p-p plot
A p-p plot compares the empirical cumulative distribution function of the
data with a specified theoretical cumulative distribution function:

• fit the parameters of the selected distribution and determine F(x(i))
• select the expression for the empirical cdf, e.g. Fn(x(i)) = (i − 0.3)/(n + 0.4)
• plot Fn(x(i)) versus F(x(i))
156
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Third, analyse the fitting of the distribution using a q-q plot or a p-p plot
A q-q plot compares the quantiles of the empirical data with the quantiles of
a theoretical distribution (a short sketch of both constructions follows below):

• select the expression for the empirical cdf, e.g. y = Fn(x(i)) = (i − 0.3)/(n + 0.4)
• fit the parameters of the selected distribution and determine F⁻¹(y)
• plot x(i) versus F⁻¹(y)
157
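A possible Python sketch of both constructions (added here as an illustration; the choice of the normal family and the Benard plotting position are assumptions of the example):

import numpy as np
from scipy import stats

def pp_qq_points(x, dist=stats.norm):
    # Build p-p and q-q plot coordinates against a fitted distribution
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n = x_sorted.size
    Fn = (np.arange(1, n + 1) - 0.3) / (n + 0.4)     # empirical cdf (Benard)
    params = dist.fit(x_sorted)                      # fitted distribution parameters
    pp = (Fn, dist.cdf(x_sorted, *params))           # p-p: Fn(x_(i)) versus F(x_(i))
    qq = (dist.ppf(Fn, *params), x_sorted)           # q-q: F^-1(Fn) versus x_(i)
    return pp, qq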
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

When to use one or the other?

A p-p plot tends to magnify deviations between the data and the
selected theoretical distribution in the middle range of the
distribution.
A q-q plot tends to magnify deviations between the data and the
selected theoretical distribution in the tail range of the distribution.

The more linear the plot looks, the better the fit between the data
and the theoretical distribution

158
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

When to use one or the other?

[Figure: left, a p-p plot for a normal distribution trying to fit Weibull data; right, a q-q plot for a normal distribution trying to fit the same Weibull data]
159
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

What to do when nothing seems to fit the data??

[Figure: left, q-q plot of a lognormal fit to the original data; right, q-q plot of a lognormal fit to the log of the data (theoretical quantiles versus quantiles of the empirical data)]
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

But how to determine the parameters of the selected distribution?


• Method of Moments
• Maximum Likelihood Method
• Bayesian analysis

161
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Method of Moments
After selecting the distribution type, the number of parameters that need
to be determined is known and it is assumed that the available data is
sufficient to estimate their values

The method of moments defines the distribution parameters by


assuming that the sample moments (i.e. from the data) and the
theoretical moments (i.e. from the selected distribution) are identical

    m_j = (1/n)·Σᵢ (x̂ᵢ − c)ʲ  (sample moments)          λ_j = ∫_{−∞}^{+∞} (x − c)ʲ·f_X(x) dx  (theoretical moments)
162
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Method of Moments
If we need to estimate n parameters, we then need n equations:

    m_j = λ_j , with j = 1, ..., n

    (1/n)·Σᵢ (x̂ᵢ − c)ʲ = ∫_{−∞}^{+∞} (x − c)ʲ·f_X(x) dx , with j = 1, ..., n

The method is usually applied considering raw moments, so c = 0

163
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Method of Moments Example: parameters of the distribution of


concrete compressive strength
Available data (MPa):
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
x 24.4 27.6 27.8 27.9 28.5 30.1 30.3 31.7 32.2 32.8 33.3 33.5 34.1 34.6 35.8 35.9 36.8 37.1 39.2 39.7

It is assumed that concrete compressive strength follows a normal


distribution. Since this distribution has 2 parameters, we need 2 equations.

    m₁ = (1/n)·Σᵢ x̂ᵢ                λ₁ = ∫_{−∞}^{+∞} x·f_X(x) dx

    m₂ = (1/n)·Σᵢ x̂ᵢ²               λ₂ = ∫_{−∞}^{+∞} x²·f_X(x) dx
164
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Method of Moments Example: parameters of the distribution of


concrete compressive strength

The sample moments are then

    m₁ = (1/20)·Σᵢ x̂ᵢ = 32.67          m₂ = (1/20)·Σᵢ x̂ᵢ² = 1083.36

The theoretical moments are established as functions of the parameters

    λ₁ = ∫_{−∞}^{+∞} x·(1/(σ√(2π)))·exp(−½((x−μ)/σ)²) dx = μ

    λ₂ = ∫_{−∞}^{+∞} x²·(1/(σ√(2π)))·exp(−½((x−μ)/σ)²) dx = μ² + σ²
165
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Method of Moments Example: parameters of the distribution of


concrete compressive strength

By formulating the following function

    g(μ,σ) = (λ₁(μ,σ) − m₁)² + (λ₂(μ,σ) − m₂)²

    g(μ,σ) = (μ − m₁)² + (μ² + σ² − m₂)²

the parameters can be obtained numerically by finding the solution of μ
and σ that minimizes the function g (e.g. using the least-squares method)

The solution is: μ = 32.67 and σ = 4.04 166
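A minimal Python sketch of this numerical minimization (an added illustration; any general-purpose optimizer would do):

import numpy as np
from scipy.optimize import minimize

data = np.array([24.4, 27.6, 27.8, 27.9, 28.5, 30.1, 30.3, 31.7, 32.2, 32.8,
                 33.3, 33.5, 34.1, 34.6, 35.8, 35.9, 36.8, 37.1, 39.2, 39.7])
m1 = data.mean()              # first sample raw moment
m2 = (data ** 2).mean()       # second sample raw moment

def g(params):
    mu, sigma = params
    return (mu - m1) ** 2 + (mu ** 2 + sigma ** 2 - m2) ** 2

result = minimize(g, x0=[30.0, 5.0], method="Nelder-Mead")
print(result.x)               # approximately [32.67, 4.04]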
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method


What is the likelihood?
The likelihood can be understood as a measure of the extent to which a
sample provides support for particular values of a parameter in a parametric
model, i.e. the chance (probability) of occurrence of the observed data
conditional on the model.

If f(x₁, x₂, …, xₙ, θ) is the joint probability distribution of n independent RVs
X₁, X₂, …, Xₙ that follow the same distribution (that has parameter θ) and
have sample values x₁, x₂, …, xₙ, the likelihood function of the sample is

    L(θ | x₁, x₂, ..., xₙ) = L(θ)
167
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method


Let’s assume that a given data follows a normal distribution
2
1  x−µ 
1 −  
fX ( x) = e 2 σ 

σ 2π
the likelihood of one of the elements in the data x1 is
2
1  x −µ 
1 −  1 
L ( µ , σ | x1 ) = e 2 σ 

σ 2π
the likelihood of two of the elements in the data x1 and x2 is
2 2
1  x −µ  1  x −µ 
1 −  1  1 −  2 
L ( µ , σ | x1 , x2 )
= e 2 σ 
× e 2 σ 

σ 2π σ 2π
168
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method


By considering the n elements of the data, their likelihood is:

    L(θ | x̂) = ∏ᵢ f_X(x̂ᵢ) = ∏ᵢ (1/(σ√(2π)))·exp(−½((x̂ᵢ−μ)/σ)²)

where θ = {μ, σ} is the vector of the distribution parameters and
x̂ = {x̂₁, x̂₂, ..., x̂ₙ} is the vector of the data.

169
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method


The maximum likelihood method defines the value of the distribution
parameters by maximizing the likelihood of the observed data

    L(θ | x̂) = ∏ᵢ f_X(x̂ᵢ) = ∏ᵢ (1/(σ√(2π)))·exp(−½((x̂ᵢ−μ)/σ)²)

Parameters θ are estimated as those maximizing the likelihood
function or, equivalently, those minimizing the negative likelihood function:

    min( −L(θ | x̂) )
170
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method


Due to mathematical convenience, there are advantages in considering
the log-likelihood function l instead

    l(θ | x̂) = log[ L(θ | x̂) ] = log[ ∏ᵢ f_X(x̂ᵢ) ]

    l(θ | x̂) = Σᵢ log[ f_X(x̂ᵢ) ]

This simplifies the problem, since now we have to maximize a sum
of terms rather than a long product of terms
171
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method


If parameters θ are estimated as those minimizing the negative log-likelihood
function

    min( −l(θ | x̂) )

it can be shown that the parameter estimates are RVs that follow a
joint normal distribution with

    Mean values:        μ_Θ = (θ₁*, θ₂*, ..., θₙ*)

    Covariance matrix:  C_ΘΘ = H⁻¹,  where  H_ij = −∂²l(θ | x̂)/(∂θᵢ∂θⱼ) evaluated at θ = θ*
172
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method Example: parameters of the distribution of
concrete compressive strength
The log-likelihood function can be written as

    l(θ | x̂) = Σᵢ ln[ (1/(θ₁√(2π)))·exp(−½((x̂ᵢ−θ₂)/θ₁)²) ]
             = n·ln(1/(θ₁√(2π))) − ½·Σᵢ ((x̂ᵢ−θ₂)/θ₁)²

The solution can be obtained by setting the partial derivatives to zero:

    ∂l/∂θ₁ = −n/θ₁ + (1/θ₁³)·Σᵢ (x̂ᵢ−θ₂)² = 0            ∂l/∂θ₂ = (1/θ₁²)·Σᵢ (x̂ᵢ−θ₂) = 0
173
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method Example: parameters of the distribution of
concrete compressive strength
which then leads to:

    θ₁ = √( Σᵢ (x̂ᵢ − θ₂)² / n )          θ₂ = (1/n)·Σᵢ x̂ᵢ

For the case of the normal distribution, the sample mean and the
sample standard deviation are the Maximum Likelihood estimators!
Finally, we obtain again:

    θ₁ = 4.04        θ₂ = 32.67 174
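The same estimates can be reproduced by minimizing the negative log-likelihood numerically; this Python sketch is an added illustration, not the method as written on the slides:

import numpy as np
from scipy import stats, optimize

data = np.array([24.4, 27.6, 27.8, 27.9, 28.5, 30.1, 30.3, 31.7, 32.2, 32.8,
                 33.3, 33.5, 34.1, 34.6, 35.8, 35.9, 36.8, 37.1, 39.2, 39.7])

def neg_log_likelihood(theta, x):
    sigma, mu = theta                          # theta_1 = sigma, theta_2 = mu
    if sigma <= 0:
        return np.inf
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

res = optimize.minimize(neg_log_likelihood, x0=[5.0, 30.0], args=(data,),
                        method="Nelder-Mead")
print(res.x)                                   # approximately [4.04, 32.67]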
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method Example: parameters of the distribution of
concrete compressive strength
We can also obtain the covariance matrix of these estimators:

    H = [ −n/θ₁² + (3/θ₁⁴)·Σᵢ(x̂ᵢ−θ₂)²      (2/θ₁³)·Σᵢ(x̂ᵢ−θ₂) ]
        [ (2/θ₁³)·Σᵢ(x̂ᵢ−θ₂)                n/θ₁²              ]

    C_ΘΘ = H⁻¹ = [ 0.836    0     ]      0.836 = variance of the standard deviation
                 [ 0        0.165 ]      0.165 = variance of the mean value
175
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method

How good are these estimates of the parameters?

We have 2 measures to assess the goodness of these parameter estimates:

- BIAS: how close is the estimate to the true value?


- VARIANCE: how much does it change for different datasets?

176
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method

How good are these estimates of the parameters?

The bias-variance tradeoff: in most cases, we can only decrease one of


them at the expense of the other

177
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method

How good are these estimates of the parameters?

For the case of the mean and standard deviation, it can be proven that the
mean estimate is not biased, while the estimate of the standard deviation
is biased. For data samples of larger size, the bias becomes very small.
However, for samples with “common” sizes, a correction to the estimator is
used to correct the bias:
    s = √( Σᵢ (x̂ᵢ − x̄)² / n )   (biased estimator)          s = √( Σᵢ (x̂ᵢ − x̄)² / (n − 1) )   (bias-corrected estimator)
178
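In numpy this is simply the choice of the "delta degrees of freedom" (an added note, using the concrete strength data as an example):

import numpy as np

x = np.array([24.4, 27.6, 27.8, 27.9, 28.5, 30.1, 30.3, 31.7, 32.2, 32.8,
              33.3, 33.5, 34.1, 34.6, 35.8, 35.9, 36.8, 37.1, 39.2, 39.7])
print(x.std(ddof=0))   # divides by n     -> biased / ML estimate (about 4.04)
print(x.std(ddof=1))   # divides by n - 1 -> bias-corrected sample estimate (about 4.15)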
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method

How good are these estimates of the parameters?

For skewness and kurtosis estimators, similar "small-sample" correction
factors can be defined:

    β₁ = [ n / ((n−1)(n−2)) ]·Σᵢ ((xᵢ − x̄)/s)³

    β₂ = [ n(n+1) / ((n−1)(n−2)(n−3)) ]·Σᵢ ((xᵢ − x̄)/s)⁴
179
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Maximum Likelihood Method

How good are these estimates of the parameters?

For skewness and kurtosis estimators, similar "small-sample" correction
factors can be defined:

    β₁ = [ n / ((n−1)(n−2)) ]·Σᵢ ((xᵢ − x̄)/s)³

    β₂ = [ n(n+1) / ((n−1)(n−2)(n−3)) ]·Σᵢ ((xᵢ − x̄)/s)⁴ − 3(n−1)² / ((n−2)(n−3))
180
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation
Bayesian Estimation assumes that the parameters θ are random variables that
have a known prior distribution f(θ). This distribution is typically very
broad or vague to reflect the fact that we know little about the true value.
Once we obtain data X, we use the Bayes theorem to find the posterior
distribution f*(θ). Ideally, we want this data to reduce our uncertainty
about the parameters.

181
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation
By recalling the following relation from the Bayes Theorem:

    P(Aᵢ | B) = P(Aᵢ)·P(B | Aᵢ) / Σᵢ [ P(B | Aᵢ)·P(Aᵢ) ]

we can obtain the following equation for a continuous distribution:

    f*(θ) = f(θ)·P(X | θ) / ∫_{−∞}^{+∞} f(θ)·P(X | θ) dθ

where f(θ) is the prior distribution, P(X | θ) is the conditional probability or
likelihood of observing X assuming that the parameters are θ, f*(θ) is the
posterior distribution (after getting the data in X), and the integral in the
denominator is the normalizing constant.
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation
Considering that:

    k = [ ∫_{−∞}^{+∞} f(θ)·P(X | θ) dθ ]⁻¹        and        L(θ | X) = P(X | θ)

we get:

    f*(θ) = k·f(θ)·L(θ | X)        (distribution of parameter θ)

But we need an estimate of parameter θ, usually the expected value:

    θ* = ∫_{−∞}^{+∞} θ·f*(θ) dθ
183
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation Example: defective reinforced concrete piles

Consider that the reinforced concrete piles of a building foundation could be


defective due to poor construction quality. We want to know the proportion
p of defective piles in a given project that may have hundreds of piles.
We assume that p is a continuous RV and we assume that there is no
adequate prior information about p. Therefore, the prior distribution of p
will be a uniform distribution (also called a diffuse prior):

    f(p) = 1, for 0 ≤ p ≤ 1
On the basis of the inspection of one pile, revealing that it is defective,
the likelihood is the probability of the event X = one pile selected for
inspection is defective, which is p.
184
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation
Therefore

    f*(p) = k·f(p)·L(p | X) = k·1.0·p, for 0 ≤ p ≤ 1

and the normalizing constant k is

    k = [ ∫₀¹ p dp ]⁻¹ = 2

The posterior distribution of p is then:

    f*(p) = 2p, for 0 ≤ p ≤ 1

The point estimate of p is then:

    p* = ∫_{−∞}^{+∞} θ·f*(θ) dθ = ∫₀¹ p·2p dp = 0.667 185
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation Example: distribution of wind speed

Since parameters s and u are RVs, their (best) estimates (i.e. their average
values) may change when new data is obtained. Assume the distribution of
parameter u is the following exponential distribution with a mean value of
6 m/s:

    f(u) = (1/6)·e^(−u/6)

Considering that 1 new value of wind speed data is obtained (x̂ = 18 m/s),
what is the updated value of parameter u?
186
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation Example: distribution of wind speed

Prior distribution of u:

    f(u) = (1/6)·e^(−u/6)

Likelihood function of the wind speed given the current value of u:

    L(u | x) = (2/u)·(x/u)^(2−1)·e^(−(x/u)²) = (2x/u²)·e^(−(x/u)²)

Likelihood function for the new wind speed data:

    L(u | x̂) = L(u | 18) = (2×18/u²)·e^(−(18/u)²) = (36/u²)·e^(−324/u²) 187
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation Example: distribution of wind speed

Considering that:

    k = [ ∫_{−∞}^{+∞} f(u)·L(u | x̂) du ]⁻¹

we get:

    f*(u) = k·f(u)·L(u | x̂) = k·(1/6)·e^(−u/6)·(36/u²)·e^(−324/u²) = k·(6/u²)·e^(−u/6 − 324/u²)

and the normalizing constant k is:

    k = [ ∫₀^∞ (6/u²)·e^(−u/6 − 324/u²) du ]⁻¹ = 151.987 188
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation Example: distribution of wind speed

We then get the posterior distribution of parameter u:

    f*(u) = (911.922/u²)·e^(−u/6 − 324/u²)

The new estimate of parameter u is then:

    u* = ∫_{−∞}^{+∞} u·f*(u) du = ∫₀^∞ (911.922/u)·e^(−u/6 − 324/u²) du = 15.67

189
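The same update can be checked numerically by quadrature; this Python sketch is an added illustration of the computation above (the small positive lower integration limit only avoids the division by zero at u = 0):

import numpy as np
from scipy import integrate

prior = lambda u: np.exp(-u / 6.0) / 6.0                              # exponential prior, mean 6 m/s
likelihood = lambda u: (36.0 / u**2) * np.exp(-(18.0 / u) ** 2)       # one observation, x = 18 m/s
unnorm = lambda u: prior(u) * likelihood(u)                           # unnormalized posterior

k = 1.0 / integrate.quad(unnorm, 1e-9, np.inf)[0]                     # normalizing constant, about 152
u_star = integrate.quad(lambda u: u * k * unnorm(u), 1e-9, np.inf)[0] # posterior mean, about 15.7
print(k, u_star)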
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation Example: distribution of wind speed

[Figure: prior distribution, likelihood and posterior distribution of u plotted for u between 0 and 60 m/s; the posterior is shifted towards the observed value relative to the prior]
190
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Bayesian Estimation
There are advantages when we "know" the prior distribution of the
parameter we want to estimate and the likelihood function of the data
that is used to estimate the parameter: the posterior distribution may
already be known from existing theoretical results and is often of the same
family

    f*(θ) = k·f(θ)·L(θ | X)

The cases where there is a theoretical connection between the prior
distribution, the likelihood function and the posterior distribution, with the
posterior being of the same family as the prior distribution, are known as
conjugate distributions
191
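As an added sketch: the defective-piles example is itself a conjugate case, since a uniform prior on p is a Beta(1,1) distribution and a Bernoulli (defective / not defective) likelihood keeps the posterior in the Beta family:

from scipy import stats

a, b = 1, 1                           # uniform prior on p  ->  Beta(1, 1)
defective, inspected = 1, 1           # one pile inspected and found defective
posterior = stats.beta(a + defective, b + inspected - defective)   # Beta(2, 1), i.e. f*(p) = 2p
print(posterior.mean())               # 0.667, the point estimate obtained earlier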
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

192
https://en.wikipedia.org/wiki/Conjugate_prior
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

• Method of Moments
The simplest approach to obtain the parameters, but the estimates are usually
not the best (it is rarely used in practice).
• Maximum Likelihood Method
An approach that is a bit more complicated (used by most statistical analysis
software packages). We also obtain information about the distribution of the
parameters.
• Bayesian analysis
The most complex approach of the three. It leads directly to the distribution
of the parameters and any prior assumption made about their distribution
may be corrected by the posterior distribution.

193
Building Probabilistic Models (distribution
fitting and parameter estimation)
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test

When using these techniques, you need to know what you’re doing!!!!

195
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


A goodness-of-fit test is a statistical hypothesis test that is designed to assess
formally if the sample of data comes from a certain statistical distribution

Hypothesis testing is a class of statistical techniques designed to


extrapolate information from samples of data to make inferences
about populations for the purpose of decision making.

Hypothesis testing can be used to test if


• a certain RV follows some specific distribution
• a population parameter (e.g. the mean) has some specific value
• two population parameters are the same
196
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

For an overview of other types of statistical tests, see for example:

197
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


The basic components of a statistical hypothesis test are:
• Null hypothesis H0 – A statement regarding what we want to test
formulated as an equality (e.g. the RV X follows a normal distribution)
• Alternative hypothesis H1 or HA – A statement contradictory to the null
hypothesis
• Test statistic – A quantity that reflects the hypothesis we want to test and
that is computed using the sample data. Since the test statistic value
changes from sample to sample, it is a RV and has a sampling distribution
(that may be known or not)
• Rejection region – Values of the test statistic for which we reject the null
hypothesis in favour of the alternative hypothesis 198
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


In hypothesis testing, no matter the outcome of the test, WE ARE
NEVER ABLE TO PROVE THAT WE CAN ACCEPT THE NULL HYPOTHESIS!!!
We can only prove that we have to REJECT or that we FAIL TO REJECT
the null hypothesis!!!
So what’s the difference between ACCEPTING and FAILING TO REJECT
the null hypothesis?
In the first case, there are facts and arguments that lead to an
acceptance of the null hypothesis (we have enough evidence to accept).
In the second case, there are only facts and arguments stating that
rejecting is not possible (we don’t have enough evidence to reject, so
we must accept the null hypothesis)
199
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


So when to REJECT or when to FAIL TO REJECT?

Managing the Level of Significance, the Power of the test and the Type of Errors

The Type of Errors in hypothesis testing are:

• Type I Error – A Type I Error occurs when we reject a true null hypothesis
• Type II Error – A Type II error occurs when we fail to reject a false null
hypothesis

So what are the possible results of the test?


200
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test

                                          True Nature of the Hypothesis
                                          The null hypothesis is true           The null hypothesis is false
Result of    The test rejects             Type I Error (rejecting a true        correct decision
the test     the null hypothesis          null hypothesis) - α
             The test fails to reject     correct decision                      Type II Error (failure to reject a
             the null hypothesis                                                false null hypothesis) - β
201
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


• α is the Level of Significance which is the probability of having a Type I
Error. α is set by the person that performs the test (usually 0.05)
• β is the probability of having a Type II Error and it is related to the Power
of the test. The Power of the test is the probability of the test to reject
the null hypothesis when it is false, i.e. 1 - β
In practice, we want both α and β to be as small as possible!
α and β are not independent!! When α is reduced, β increases, but the
actual relation between α and β is unknown!! Only α can be set by the
person performing the test. The value of β is a property of the test
(which we usually don’t know and is actually more important than α!!).
For a given test, the only way to reduce β is by increasing the sample
size of the data under analysis 202
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


And what’s the test statistic and how to use it?
The test statistic is a parameter that is a function of the data and addresses a
specific feature of the data in order to reflect the hypothesis we want to test.

The value of the test statistic based on the sample is compared to a critical
value. The critical value is a specific value of the test statistic defining the
boundary of the rejection region above or below which (depending on the
test) we reject the null hypothesis.
The critical value depends on the statistical distribution of the test statistic
and on the selected level of significance

So, how does it all work? 203


Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


Let’s assume that a statistic δ is able to reflect a unique feature of a sample
of data that enables us to determine if the sample comes from a normal
distribution. Let’s also consider that δ exhibits positive low values (e.g. closer
to zero) when the data being tested comes from a true normal distribution.

Let’s assume also that the distribution of the statistic δ is known. This
distribution defines how likely is each value of the statistic δ.

δ
204
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


Given this distribution, the occurrence of higher values of the statistic δ is less
likely if we are testing a sample of data that comes from a normal distribution.

Let's consider that δ* is the value of δ with a probability of being exceeded
of 5%.

[Figure: density of δ with the upper tail beyond δ* shaded; P(δ > δ*) = 0.05 (the rejection region: the most unlikely values of δ)]
205
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


By considering that δ* is the critical value of δ, δcrit, we can conduct a
hypothesis test with a level of significance of 5%. Therefore, if a certain
sample of data X# has a value of the test statistic δ# > δcrit, we reject the null
hypothesis with a level of significance of 5% (i.e. with a 5% probability of
having a Type I Error). If δ# ≤ δcrit, we can’t reject the null hypothesis.


[Figure: density of δ with the region δ > δcrit shaded; P(δ > δcrit) = 0.05 (the rejection region: the most unlikely values of δ)]
206
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


Since we are rejecting the null hypothesis when the data presents a high
value of δ, this type of test is called a one-sided upper test.
If the rejection condition was “reject the null hypothesis when the data
sample presents a low value of δ”, the test would be a one-sided lower test.

[Figure: density of δ with the lower tail below δcrit shaded; P(δ < δcrit) = 0.05]
207
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


If the rejection condition was “reject the null hypothesis when the data
sample presents either a low value or a high value of δ”, the test would be a
two-sided test.

[Figure: density of δ with both tails shaded; P(δ < δcrit,low) = 0.025 and P(δ > δcrit,up) = 0.025]

208
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


Sometimes, the outcome of a test is not a "reject/failure to reject" decision
based on the critical value of the test statistic (e.g. when using statistical
analysis software).
Instead the result is a numerical value known as the p-value. The p-value
does not provide a “reject/failure to reject” answer but “helps” you
determine the significance of the test result.

But what does a p-value represent?

Technically, a p-value is the probability of a value of the test statistic at


least as extreme as the one obtained from the sample under analysis,
assuming that the null hypothesis is true.
Let’s see what this means really… 209
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


Let’s consider that a certain sample of data X# has a value of the test statistic
δ#. Let’s also consider that, according to the distribution of the test statistic,
the probability of having a test statistic with a value of δ# is 0.005 (0.5%).
Therefore, the p-value is 0.005.

How significant is this result? (We have a non significant result


fδ when the probability of δ# is so low so that we reject H0 by saying
that δ# is too unlikely to occur in a situation where H0 is true)

So, is a probability of
P(δ > δ#) = 0.005
0.5% low enough?

δ 210
δcrit δ#
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


A few pointers on how to interpret p-values:
The smaller the p-value, the more statistical evidence there is to support the
alternative hypothesis (i.e. to reject the null hypothesis):
• If the p-value is less than 1%, there is overwhelming evidence that
supports the alternative hypothesis.
• If the p-value is between 1% and 5%, there is strong evidence that
supports the alternative hypothesis.
• If the p-value is between 5% and 10%, there is weak evidence that
supports the alternative hypothesis.
• If the p-value exceeds 10%, there is no evidence that supports the
alternative hypothesis.
211
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


What are the available goodness-of-fit tests?
There are 2 very popular goodness-of-fit tests that can be found in most
statistical analysis software packages:
• The χ² test
• The Kolmogorov-Smirnov test

These tests are popular not because they are good!

They are popular because they can be found in most
statistical analysis software packages and because they can be used
to assess goodness-of-fit for any statistical distribution!

212
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


What are the available goodness-of-fit tests?
There are other goodness-of-fit tests that can be used to assess goodness-of-
fit for any statistical distribution. These are 2 others that are easy to
implement:

• The Cramér-von Mises test


• The Anderson-Darling test

213
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


The χ² test can be used to determine if a given sample of data follows a
certain distribution. The parameters of the target distribution must be
estimated first. The test involves the following steps:
Consider a sample of data X of size n. After computing estimates of the m
parameters of the target distribution fX:
• Divide the data X into M cells (e.g. like for constructing the bins of a
histogram). The value of M is selected between M1 and M2 given by:

    M1 = 4·(2n²/qα²)^(1/5)        M2 = 0.5 × M1

where qα is the (1 − α) percentile of the standard normal distribution and α is
the level of significance of the test.
214
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


• Count how many values of X are within each cell i → Oi
• For the target distribution, determine the expected fraction of data pi
that would be in each cell i (i.e. the probability of a given data value
being within cell i):

    pi = ∫ f_X(x) dx over cell i (from its lower bound Mᵢ₋₁ to its upper bound Mᵢ)

• Compute the χ² test statistic:

    χ² = Σᵢ (Oi − n·pi)² / (n·pi)        (Oi and n·pi should be > 5; reduce M if needed)
215
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


• Since larger values of the χ² test statistic mean the target distribution fits
poorly to the data, and since the χ² test statistic follows a χ² distribution
with (M − m − 1) degrees of freedom, the null hypothesis is rejected if the
test statistic exceeds the critical value defined by the (1 − α) percentile of
the χ² distribution with (M − m − 1) degrees of freedom

    If χ² > χ²_{M−m−1,1−α}  ⇒  reject H0
    If χ² ≤ χ²_{M−m−1,1−α}  ⇒  do not reject H0

216
Review of probability theory and related basic concepts

Example of application of the χ2 test:


Severe thunderstorms have been recorded at a given station over a period of 66
years. During this period, the frequencies of severe thunderstorms observed are
as follows:
- 20 years with zero storms
- 23 years with one storm
- 15 years with two storms
- 6 years with three storms
- 2 years with four storms

The histogram of the annual number of thunderstorms recorded is:

[Figure: histogram of the probability of the annual number of occurrences (0 to 4 storms per year)]
217
Review of probability theory and related basic concepts

Example of application of the χ2 test:


We want to fit a Poisson distribution to the yearly occurrence of thunderstorms
and test the goodness-of-fit of that distribution to the data.

The Poisson distribution P(ν) is a discrete probability distribution that gives the probability of k
events occurring in a fixed interval of time and/or space if these events occur with a known
mean rate of occurrence ν and independently of the time since the last event.

    f_P(k) = (νt)ᵏ·e^(−νt)/k!        with t = 1 year:    f_P(k) = νᵏ·e^(−ν)/k!

    Mean: μ_K = νt                   with t = 1 year:    μ_K = ν
218
Review of probability theory and related basic concepts

Example of application of the χ2 test:


Case #1: fit the Poisson distribution to the data using the method of moments
Since this distribution has 1 parameter, we need 1 equation:

    m₁ = (1/n)·Σᵢ x̂ᵢ          λ₁ = Σₓ x·f_X(x)

    m₁ = (1/n)·Σᵢ x̂ᵢ = (20×0 + 23×1 + 15×2 + 6×3 + 2×4) / 66 = 1.197

    m₁ = λ₁ = Σ_{x=0}^{∞} x·(νˣ/x!)·e^(−ν) = μ = ν = 1.197
219
Review of probability theory and related basic concepts

Example of application of the χ2 test:


Case #2: fit the Poisson distribution to the data using the maximum likelihood
method

    L(θ | x̂) = ∏ᵢ f_X(x̂ᵢ)          l(θ | x̂) = log[ L(θ | x̂) ] = log[ ∏ᵢ f_X(x̂ᵢ) ]

    l(θ | x̂) = Σᵢ log[ f_X(x̂ᵢ) ]         min( −l(θ | x̂) )

For the Poisson distribution:

    f_X(x) = νˣ·e^(−ν)/x!          L(ν | x̂) = ∏ᵢ ν^(x̂ᵢ)·e^(−ν)/x̂ᵢ!   …
220
Review of probability theory and related basic concepts

Example of application of the χ2 test:


Case #2: fit the Poisson distribution to the data using the maximum likelihood
method

    … min( −l(ν | x̂) )  ⇒  ν = (1/n)·Σᵢ x̂ᵢ = 1.197

[Figure: histogram of the data together with the fitted Poisson probabilities for 0 to 4 occurrences per year]

221
Review of probability theory and related basic concepts

Example of application of the χ2 test:


Applying the χ² test to the fit:

    f_X(x) = (1.197)ˣ·e^(−1.197)/x!          χ² = Σᵢ (Oi − n·pi)² / (n·pi)

Nº of storms per year    Observed frequencies Oi    Theoretical frequencies n·pi    (Oi − n·pi)²    (Oi − n·pi)²/(n·pi)
0                        20
1                        23
2                        15
3                        6
4                        2

With M = 5, we have a cell with an observed frequency that is lower than 5, so we
need to reduce M: aggregate the cases with 3 and 4 storms per year. 222
Review of probability theory and related basic concepts

Example of application of the χ2 test:


Applying the χ² test to the fit:

    f_X(x) = (1.197)ˣ·e^(−1.197)/x!          χ² = Σᵢ (Oi − n·pi)² / (n·pi)

Nº of storms per year    Observed frequencies Oi    Theoretical frequencies n·pi    (Oi − n·pi)²    (Oi − n·pi)²/(n·pi)
0                        20                         19.94                           0.0036          0.0002
1                        23                         23.87                           0.7569          0.0317
2                        15                         14.29                           0.5041          0.0353
≥3                       8 = 6+2                    7.90                            0.0100          0.0013
Total                    66                         66                                              0.0685
223
Review of probability theory and related basic concepts

Example of application of the χ2 test:


Applying the χ² test to the fit (example of the calculation of one theoretical frequency):

    n × f_X(1) = 66 × (1.197)¹·e^(−1.197)/1! = 23.87

    f_X(x) = (1.197)ˣ·e^(−1.197)/x!          χ² = Σᵢ (Oi − n·pi)² / (n·pi)

Nº of storms per year    Observed frequencies Oi    Theoretical frequencies n·pi    (Oi − n·pi)²    (Oi − n·pi)²/(n·pi)
0                        20                         19.94                           0.0036          0.0002
1                        23                         23.87                           0.7569          0.0317
2                        15                         14.29                           0.5041          0.0353
≥3                       8 = 6+2                    7.90                            0.0100          0.0013
Total                    66                         66                                              0.0685
224
Review of probability theory and related basic concepts

Example of application of the χ2 test:


Applying the χ² test to the fit:

    χ² = Σᵢ (Oi − n·pi)² / (n·pi) = 0.0685

H0 is "the data follows a Poisson distribution"

    If χ² > χ²_{M−m−1,1−α}  ⇒  reject H0
    If χ² ≤ χ²_{M−m−1,1−α}  ⇒  do not reject H0

For a significance level of 5%, we get:

    χ²_{M−m−1,1−α} = χ²_{4−1−1,1−0.05} = χ²_{2,0.95}   ⇒   P(X ≤ x_p) = P(Χ²₂ ≤ χ²_{2,0.95}) = 0.95
225
Review of probability theory and related basic concepts

χ² distribution table:

    1 − F_X(x_p) = p = 1 − P(X ≤ x_p) = P(X > x_p)        table values = x_p

    χ²_{M−m−1,1−α} = χ²_{4−1−1,1−0.05} = χ²_{2,0.95}   ⇒   P(X ≤ x_p) = P(Χ²₂ ≤ χ²_{2,0.95}) = 0.95

    1 − P(Χ²₂ ≤ χ²_{2,0.95}) = P(Χ²₂ > χ²_{2,0.95}) = 0.05   ⇒   χ²_{2,0.95} = 5.991

    χ² ≤ χ²_{M−m−1,1−α}        0.0685 ≤ 5.991        do not reject H0 226
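The whole worked example can be reproduced with a few lines of Python (an added sketch, not part of the slides):

import numpy as np
from scipy import stats

observed = np.array([20, 23, 15, 8])             # 0, 1, 2 and >=3 storms per year (3 and 4 merged)
n = observed.sum()                               # 66 years
nu = 1.197                                       # fitted Poisson mean rate

p = stats.poisson.pmf([0, 1, 2], nu)
p = np.append(p, 1.0 - p.sum())                  # P(X >= 3) closes the last cell
expected = n * p

chi2_stat = np.sum((observed - expected) ** 2 / expected)    # about 0.0685
dof = len(observed) - 1 - 1                                   # M - m - 1 = 4 - 1 - 1 = 2
chi2_crit = stats.chi2.ppf(0.95, df=dof)                      # about 5.991
print(chi2_stat, chi2_crit, chi2_stat <= chi2_crit)           # do not reject H0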


Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


The Kolmogorov-Smirnov test can be used to determine if a given sample of
data follows a certain distribution. The target distribution must be
completely defined first and its parameters must not be assessed from the
sample of data being tested. The test involves the following steps:
Consider a sample of data X of size n. After defining the target distribution fX:
• Arrange the data in ascending order (i.e. define the order statistics x(i))
• Compute the distances D+ and D−:

    D− = max over 1 ≤ i ≤ n of [ F_X(x(i)) − (i − 1)/n ]        D+ = max over 1 ≤ i ≤ n of [ i/n − F_X(x(i)) ]

D− is the maximum vertical distance between the cdf of the target distribution F_X
and the empirical cdf F_nX when F_X > F_nX, and D+ is the maximum vertical distance
between them when F_X < F_nX.
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


• Compute the test statistic D:

    D = √n × max(D+, D−)

Large values of D mean the target distribution fits poorly to the data.
• It can be shown that, when n → ∞, D has the following distribution

    F_D(x) = 1 − 2·Σ_{i=1}^{∞} (−1)^(i−1)·e^(−2i²x²)

but for low sample sizes this asymptotic distribution is not adequate.


228
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


There are critical values of D (or for D/n0.5) widely available for various
significance levels, with asymptotic formulae for samples with size n > 30
and tabulated values for small samples.
• The null hypothesis is rejected if the test statistic exceeds the critical
value defined by the (1 - α) percentile value

229
Review of probability theory and related basic concepts

Building Probabilistic Models (distribution fitting and parameter estimation)

Fourth, analyse the fitting of the distribution using a goodness-of-fit test


The results of the Kolmogorov-Smirnov test are not valid if the distribution
parameters are determined from the sample of data being tested.
For a procedure to apply the test when the distribution parameters are
determined from the sample of data being tested see:
Capasso, M., Alessi, L., Barigozzi, M., Fagiolo, G. (2009) On approximating the distributions of goodness-of-fit
test statistics based on the empirical distribution function: The case of unknown parameters. Advances in
complex systems, 12(02), 157-167.

If the target distribution is the normal distribution, an alternative version


of this test called the Lilliefors test can be used.
This version of the test allows for the parameters of the distribution to be
determined using the sample of data being tested.
This test is the same as the Kolmogorov-Smirnov test. The difference is
that it uses different critical values of the statistic 230
Review of probability theory and related basic concepts

Example of application of the KS test:

Assume the data follows a normal distribution N(30,4.5) and test this
hypothesis using the KS test and considering a 5% significance level 231
Review of probability theory and related basic concepts

Example of application of the KS test:


No. of sample (i)    Compressive strength (MPa)    i/n     (i−1)/n    Φ(30,4.5)    |D−|      |D+|
1                    24.4                          0.05    0          0.1067       0.1067    0.0567
2                    27.6                          0.1     0.05       0.2969       0.2469    0.1969
3                    27.8                          0.15    0.1        0.3125       0.2125    0.1625
4                    27.9                          0.2     0.15       0.3204       0.1704    0.1204
5                    28.5                          0.25    0.2        0.3694       0.1694    0.1194
6                    30.1                          0.3     0.25       0.5089       0.2589    0.2089
7                    30.3                          0.35    0.3        0.5266       0.2266    0.1766
8                    31.7                          0.4     0.35       0.6472       0.2972    0.2472
9                    32.2                          0.45    0.4        0.6875       0.2875    0.2375
10                   32.8                          0.5     0.45       0.7331       0.2831    0.2331
11                   33.3                          0.55    0.5        0.7683       0.2683    0.2183
12                   33.5                          0.6     0.55       0.7816       0.2316    0.1816
13                   34.1                          0.65    0.6        0.8189       0.2189    0.1689
14                   34.6                          0.7     0.65       0.8467       0.1967    0.1467
15                   35.8                          0.75    0.7        0.9013       0.2013    0.1513
16                   35.9                          0.8     0.75       0.9051       0.1551    0.1051
17                   36.8                          0.85    0.8        0.9346       0.1346    0.0846
18                   37.1                          0.9     0.85       0.9427       0.0927    0.0427
19                   39.2                          0.95    0.9        0.9795       0.0795    0.0295
20                   39.7                          1       0.95       0.9844       0.0344    0.0156

D = max(|D−|, |D+|) = 0.2972
232
Review of probability theory and related basic concepts

Example of application of the KS test:

    D > Dcrit,1−α        0.2972 > 0.2941        reject H0

233
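The same result can be obtained directly with scipy's Kolmogorov-Smirnov test (an added sketch; the target distribution N(30, 4.5) is fully specified, as the test requires):

import numpy as np
from scipy import stats

strengths = np.array([24.4, 27.6, 27.8, 27.9, 28.5, 30.1, 30.3, 31.7, 32.2, 32.8,
                      33.3, 33.5, 34.1, 34.6, 35.8, 35.9, 36.8, 37.1, 39.2, 39.7])

result = stats.kstest(strengths, "norm", args=(30.0, 4.5))
print(result.statistic)   # about 0.2972, the D of the table above
print(result.pvalue)      # below 0.05, so H0 is rejected at the 5% significance level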
Review of probability theory and related basic concepts

Example of application of the KS test:


If we use the best estimates of the normal distribution parameters, Φ(32.7, 4.04):

No. of sample (i)    Compressive strength (MPa)    i/n     (i−1)/n    Φ(32.7,4.04)    |D−|      |D+|
1                    24.4                          0.05    0          0.0200          0.0200    0.0300
2                    27.6                          0.1     0.05       0.1034          0.0534    0.0034
3                    27.8                          0.15    0.1        0.1126          0.0126    0.0374
4                    27.9                          0.2     0.15       0.1174          0.0326    0.0826
5                    28.5                          0.25    0.2        0.1493          0.0507    0.1007
6                    30.1                          0.3     0.25       0.2599          0.0099    0.0401
7                    30.3                          0.35    0.3        0.2762          0.0238    0.0738
8                    31.7                          0.4     0.35       0.4023          0.0523    0.0023
9                    32.2                          0.45    0.4        0.4508          0.0508    0.0008
10                   32.8                          0.5     0.45       0.5099          0.0599    0.0099
11                   33.3                          0.55    0.5        0.5590          0.0590    0.0090
12                   33.5                          0.6     0.55       0.5785          0.0285    0.0215
13                   34.1                          0.65    0.6        0.6355          0.0355    0.0145
14                   34.6                          0.7     0.65       0.6809          0.0309    0.0191
15                   35.8                          0.75    0.7        0.7786          0.0786    0.0286
16                   35.9                          0.8     0.75       0.7858          0.0358    0.0142
17                   36.8                          0.85    0.8        0.8449          0.0449    0.0051
18                   37.1                          0.9     0.85       0.8619          0.0119    0.0381
19                   39.2                          0.95    0.9        0.9462          0.0462    0.0038
20                   39.7                          1       0.95       0.9584          0.0084    0.0416

D = 0.1007

But we need to use the Lilliefors critical value instead: 0.173

    D < Dcrit,1−α        0.1007 < 0.173        do not reject H0
234
