Professional Documents
Culture Documents
Xavier Romão
Assistant Professor
Civil Engineering Department – FEUP
March 2022
Expressing uncertainty
Quantitative methods
The usual approach in traditional science fields where sufficient hard
data is available for numerical treatment:
• descriptive statistics
• probabilistic models
Review of probability theory and related basic concepts
• Set Theory
• Sample Space and Probability
• Axioms of Probability
• Conditional Probability
• Total Probability Theorem
• Bayes Theorem
• Independence
• Discrete & Continuous Distributions of Random Variables
• Moments and other Descriptors of Random Variables
• Common Probability Distribution Models
Review of probability theory and related basic concepts
• Return Period
• Confidence Intervals
• Building Probabilistic Models (distribution fitting and parameter
estimation)
Set theory
Review of probability theory and related basic concepts
A ⊂ B (A is a subset of B)
Review of probability theory and related basic concepts
If A is a set in S, the set of the elements that are in S but are not in A
is the complement of A (also called “not A”), denoted by Ā.
$\bar S = \emptyset$    $\bar \emptyset = S$    $\bar{\bar A} = A$
Review of probability theory and related basic concepts
The union of sets A and B represents all the elements that belong to
A or B or both, and is represented by A ∪ B.
The intersection of sets A and B represents all the elements that
belong to both A and B, and is represented by A ∩ B.
[Venn diagrams illustrating A ∪ B and A ∩ B]
Review of probability theory and related basic concepts
A ∪ B = C    A ∩ B = ∅
[Venn diagram of two disjoint sets A and B whose union is C]
Review of probability theory and related basic concepts
Other relations: $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ and $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
De Morgan’s laws:
$\overline{A \cup B} = \bar A \cap \bar B$    $\overline{A \cap B} = \bar A \cup \bar B$
$\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \bar A_i$    $\overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \bar A_i$
Review of probability theory and related basic concepts
Sample Probability - the same as Probability but relative to the sample instead of
the population
Review of probability theory and related basic concepts
Axioms of Probability
The probability function associated with the occurrence of event A, P(A),
is a number assigned to this event that represents its likelihood and is
called the probability of A.
1. $P(A) \ge 0$ for any event A
2. $P(S) = 1$
3. For a group of mutually exclusive events A1, A2, …
$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$
Review of probability theory and related basic concepts
$P(\emptyset) = 0$    $P(\bar A) = 1 - P(A)$ (for any event A)
A survey of people’s viewing habits of car racing (A), golf (B) and football
(C) revealed that:
- 28% watched A, 29% watched C, 19% watched B
- 14% watched A and C
- 12% watched C and B
- 10% watched A and B
- and 8% watched all three sports
$P(\overline{A \cup B \cup C}) = 1 - P(A \cup B \cup C)$
[Venn diagram with the given percentages placed in the regions of A, B and C]
P ( A ∪ B ∪ C ) = P ( A) + P ( B ) + P ( C ) − P ( A ∩ B ) − P ( A ∩ C ) − P ( B ∩ C ) + P ( A ∩ B ∩ C )
P ( A ∪ B ∪ C ) = 0.28 + 0.19 + 0.29 − 0.10 − 0.14 − 0.12 + 0.08 = 0.47
$P(\overline{A \cup B \cup C}) = 1 - 0.47 = 0.53$
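As a quick check, a minimal Python sketch of the inclusion-exclusion computation above, using the survey values:

```python
# Inclusion-exclusion for three events, with the survey percentages
pA, pB, pC = 0.28, 0.19, 0.29
pAB, pAC, pBC = 0.10, 0.14, 0.12
pABC = 0.08

p_union = pA + pB + pC - pAB - pAC - pBC + pABC
print(p_union)      # 0.47
print(1 - p_union)  # 0.53 -> probability of watching none of the three sports
```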
Conditional Probability
Review of probability theory and related basic concepts
Conditional Probability
Given two arbitrary events A and B, the probability P(A|B) is defined as
the conditional probability of event A given that event B has occurred.
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, with $P(B) \neq 0$
Note that if event B is S:
$P(A \mid S) = \dfrac{P(A \cap S)}{P(S)} = \dfrac{P(A)}{1} \Leftrightarrow P(A \mid S) = P(A)$, which is obvious
It helps to define the conditional probability P(A|B) as the probability of A
with respect to a reduced sample space defined by the outcomes of
event B
Review of probability theory and related basic concepts
Conditional Probability
Consider a 1×1 square (sample space S) and events A and B. The areas
of squares A and B are P(A) = 0.25 and P(B) = 0.375.
[Diagram of the unit square with overlapping squares A and B]
We can see that P(A ∩ B) = 0.25/4 = 0.0625
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.0625}{0.375} = \dfrac{1}{6} \approx 0.17$
Review of probability theory and related basic concepts
Conditional Probability
Two cards are drawn in succession without replacement from an ordinary deck of
cards (52 cards). What is the probability that both cards are aces?
A is the event corresponding to the first card being an ace
B is the event corresponding to the second card being an ace
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} \Rightarrow P(A \cap B) = P(B) \times P(A \mid B)$
$P(A \cap B) = P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$ ← let’s focus on this
$P(A \cap B) = P(A) \times P(B \mid A) = \dfrac{4}{52} \times \dfrac{3}{51} = 0.0045 = 0.45\%$
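A small Python sketch of the two-aces computation, with a Monte Carlo check of the multiplication rule (the simulation size is an arbitrary choice):

```python
import random

# Exact value from the multiplication rule: P(A) * P(B|A) = 4/52 * 3/51
exact = (4 / 52) * (3 / 51)

# Monte Carlo check: draw two cards without replacement many times
deck = ["ace"] * 4 + ["other"] * 48
trials = 200_000
hits = sum(random.sample(deck, 2) == ["ace", "ace"] for _ in range(trials))
print(exact, hits / trials)   # both approximately 0.0045
```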
Review of probability theory and related basic concepts
$P(A \cap B) = P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$
General Multiplication Rule – for arbitrary events A, B and C:
$P(A \cap B \cap C) = P(C \mid A \cap B) \times P(A \cap B) = P(C \mid A \cap B) \times P(B \mid A) \times P(A)$
General Multiplication Rule – For n arbitrary events
n n i −1
which then turns into P Ai = ∏ P Ai A j
= i =1 j 1
i 1= 30
Total Probability Theorem
Review of probability theory and related basic concepts
$P(B) = P(B \mid A_1) \times P(A_1) + \ldots + P(B \mid A_n) \times P(A_n) = \sum_{i=1}^{n} P(B \mid A_i) \times P(A_i)$
where $P(B \mid A_i)$ is the probability of event B given that event $A_i$ occurred
[Diagram of a sample space partitioned into A1, A2, A3, A4, overlapped by event B]
With $P(A_1) = P(A_2) = P(A_3) = P(A_4) = 0.25$:
$P(B \cap A_1) = P(B \mid A_1) \times P(A_1) = 0.25 \times 0.25 = 0.0625$
$P(B) = \sum_{i=1}^{n} P(B \mid A_i) \times P(A_i) = 4 \times (0.25 \times 0.25) = 0.25$
Bayes Theorem
Review of probability theory and related basic concepts
Bayes Theorem
$P(A \cap B) = P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$
Review of probability theory and related basic concepts
Bayes Theorem
Considering a more complex case with n disjoint and collectively
exhaustive events A1, … An
$P(A_i \mid B) = \dfrac{P(A_i) \times P(B \mid A_i)}{P(B)}$
The probability P(B) can be replaced using the Total Probability Theorem:
$P(A_i \mid B) = \dfrac{P(A_i) \times P(B \mid A_i)}{\sum_{j=1}^{n} P(B \mid A_j) \times P(A_j)}$
Review of probability theory and related basic concepts
Bayes Theorem
What we know:
$P(A_1) = 5/365 = 0.0137$ (it rains 5 days during the year)
$P(A_2) = 360/365 = 0.9863$ (it does not rain 360 days during the year)
$P(B \mid A_1) = 0.9$ (when it rains, the weatherman predicts rain 90% of the time)
$P(B \mid A_2) = 0.1$ (when it does not rain, the weatherman predicts rain 10% of the time)
We want to know P(A1|B), the probability that it will rain on the day of
Marie’s wedding, given a forecast for rain by the weatherman.
Review of probability theory and related basic concepts
Bayes Theorem
$P(A_1 \mid B) = \dfrac{P(A_1) \times P(B \mid A_1)}{P(A_1) \times P(B \mid A_1) + P(A_2) \times P(B \mid A_2)}$
$P(A_1 \mid B) = \dfrac{0.0137 \times 0.9}{0.0137 \times 0.9 + 0.9863 \times 0.1} = 0.111$
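A minimal Python sketch of this Bayes computation, using the values from the example:

```python
# Bayes' theorem for the wedding-forecast example
p_rain, p_dry = 5 / 365, 360 / 365           # P(A1), P(A2)
p_fc_rain, p_fc_dry = 0.9, 0.1               # P(B|A1), P(B|A2)
p_fc = p_fc_rain * p_rain + p_fc_dry * p_dry # P(B), by the Total Probability Theorem
print(p_fc_rain * p_rain / p_fc)             # P(A1|B) is approximately 0.111
```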
Statistically independent events
Review of probability theory and related basic concepts
P ( A | B ) = P ( A ) and P ( B | A ) = P ( B )
Based on the General Multiplication Rule
$P(A \cap B) = P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$
we get
$P(A \cap B) = P(A) \times P(B)$
In a more general form we get
$P(A_1 \cap A_2 \cap \ldots \cap A_n) = \prod_{i=1}^{n} P(A_i)$
Review of probability theory and related basic concepts
$P(A \cap B) = P(A) \times P(B)$ (independent events)    $A \cap B = \emptyset$ (disjoint events)
The behaviour of a random variable is defined by its probability distribution (it defines
how probabilities are distributed across the different values of the random variable)
Probability distributions are defined by 2 functions:
• The probability distribution function FX(x) (usually called the cumulative distribution function)
• The probability density function fX(x), for continuous RVs, and the probability mass function pX(x), for discrete RVs
Review of probability theory and related basic concepts
The cdf defines the probability of the RV to take any value between its lowest
possible value (which could be - ∞) up to x.
The cdf has the following properties:
• $0 \le F_X(x) \le 1$
• $\lim_{x \to -\infty} F_X(x) = 0$
• $\lim_{x \to \infty} F_X(x) = 1$
• $x \le y \Rightarrow F_X(x) \le F_X(y)$
Review of probability theory and related basic concepts
For discrete RVs we have $F_X(x) = \sum_{x_j \le x} f_X(x_j)$ and $f_X(x_j) = F_X(x_j) - F_X(x_{j-1})$
where $f_X(x)$ is the probability mass function (pmf)… usually also called pdf
Review of probability theory and related basic concepts
[Examples of a discrete (stepwise) pmf and cdf, and of a continuous pdf and cdf]
Review of probability theory and related basic concepts
Consistency condition: $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ (continuous RVs) and $\sum_{\text{all } x_j} f_X(x_j) = 1$ (discrete RVs)
Review of probability theory and related basic concepts
$f_X(x_j) = F_X(x_j) - F_X(x_{j-1})$: for a discrete RV, the pdf defines the probability of
occurrence of each value of the RV!!
$P(a < X \le b) = F_X(b) - F_X(a)$
[Plots of a continuous cdf and pdf, marking FX(3) and FX(4)]
$P(3 < X \le 4) = \text{area} = \int_3^4 f_X(x)\,dx = \int_0^4 f_X(x)\,dx - \int_0^3 f_X(x)\,dx = F_X(4) - F_X(3)$
$P(X = 4) = \text{area?} = \int_{3.9999999999999999999}^{4} f_X(x)\,dx = F_X(4) - F_X(3.9999999999999999999) \approx 0$
Review of probability theory and related basic concepts
A histogram and a pmf may look similar, but they’re not the same:
- A histogram is a discrete version of the pdf of a continuous RV
(because usually we don’t have the full population, just a
representative sample). The pmf is the pdf of a discrete RV
- The vertical axis of the histogram represents the number of times a
value of the RV falls within a certain interval (bin). The vertical axis of
the pmf represents the probability of each value of the discrete RV
[Example histogram of sample data]
Review of probability theory and related basic concepts
x 0 1 2 3 4
P(X=x) 0.0039 0.0469 0.2109 0.4219 0.3164
nº of buildings 39 469 2109 4219 3164
Review of probability theory and related basic concepts
Graphically:
[Bar chart of probability versus level of damage]
Review of probability theory and related basic concepts
Since A and Ā are disjoint and collectively exhaustive events:
P(Ā) = 1 – P(A) = 1 – 0.3164 = 0.6836
[Bar chart of probability versus level of damage, with event A (collapse) highlighted]
68.36% is the probability of a building not collapsing
Review of probability theory and related basic concepts
Since A and B are disjoint events:
P(A ∪ B) = P(A) + P(B) = 0.3164 + 0.4219 = 0.7383
[Bar chart of probability versus level of damage, with events B and A highlighted]
73.83% is the probability of a building having a damage level of 3 or 4
Review of probability theory and related basic concepts
$F_X(x) = \begin{cases} 0 & \text{for } x < a \\ \dfrac{x-a}{b-a} & \text{for } a \le x < b \\ 1 & \text{for } x \ge b \end{cases}$ (where a = 0 and b = 1 in our case)
$f_X(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{for } x < a \text{ or } x > b \end{cases}$
Moments as Descriptors of Random Variables
Review of probability theory and related basic concepts
In practice, the exact form of the probabilistic model may not be known.
In those cases, a RV can be defined using its moments.
$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$ for a continuous RV, and $E[g(X)] = \sum_i g(x_i) f_X(x_i)$ for a discrete RV
Review of probability theory and related basic concepts
Taking $g(x) = (x-c)^k$ gives the kth moment of X about c: $E\left[(X-c)^k\right] = \int_{-\infty}^{\infty} (x-c)^k f_X(x)\,dx$
Review of probability theory and related basic concepts
which corresponds to the centroid of the area under the curve of f(x)
In statistics, the 2nd order moment, the variance, is called a central moment
since c = μX and measures the dispersion of the RV X around its mean.
$\sigma_X^2 = V(X) = E\left[(X-\mu_X)^2\right] = \int_{-\infty}^{\infty} (x-\mu_X)^2 \cdot f_X(x)\,dx$
This can be re-written as $V(X) = E\left[X^2\right] - \left(E[X]\right)^2$
Review of probability theory and related basic concepts
$\sigma_X = \sqrt{\sigma_X^2}$    $CoV_X = \sigma_X / \mu_X$
The 3rd order moment is also a central moment that, when divided by σ3, is
called the skewness coefficient and measures the asymmetry of the RV X with
respect to its mean.
$E\left[(X-\mu_X)^3\right] = \int_{-\infty}^{\infty} (x-\mu_X)^3 \cdot f_X(x)\,dx$
$\gamma_1 = \dfrac{E\left[(X-\mu_X)^3\right]}{\sigma_X^3}$
Review of probability theory and related basic concepts
The 4th order moment is also a central moment that, when divided by σ⁴, is called the
kurtosis coefficient:
$E\left[(X-\mu_X)^4\right] = \int_{-\infty}^{\infty} (x-\mu_X)^4 \cdot f_X(x)\,dx$
$\gamma_2 = \dfrac{E\left[(X-\mu_X)^4\right]}{\sigma_X^4}$
The kurtosis coefficient is usually compared to the value 3 (the kurtosis
coefficient of a RV that follows a Normal distribution).
Review of probability theory and related basic concepts
The difference γ₂ − 3 is called “excess kurtosis”: kurtosis above (or below) the normal value of 3
Review of probability theory and related basic concepts
$F_X(x_p) = p$, with $0 \le p \le 1$
For example, the median is the quantile for level p = 0.50. Quantiles are often
used in civil engineering to set the value of loads and material properties.
The pth level quantile xp of a RV is the value of the RV that has a probability
1 − p of being exceeded:
$P(X > x_p) = 1 - F_X(x_p) = 1 - p$
Review of probability theory and related basic concepts
Example: two discrete RVs with the same mean μX = 3.5 but different dispersion
[pmf plots: ① P(X = x) = 0.1, 0.4, 0.4, 0.1 for x = 2, 3, 4, 5; ② P(X = x) = 1/6 for x = 1, …, 6]
① $\sigma_X^2 = 0.65$, $\sigma_X = 0.81$, $CoV_X = \sigma_X/\mu_X = 0.81/3.5 = 0.23$
② $\sigma_X^2 = 2.9$, $\sigma_X = 1.7$, $CoV_X = \sigma_X/\mu_X = 1.7/3.5 = 0.49$
Review of probability theory and related basic concepts
$f_X(x) = a \cdot e^{-ax}$, with $x \ge 0$, $a \ge 0$
The mean of X is obtained by integration by parts and evaluating the limits:
$\mu_X = E[X] = \int_0^{\infty} x \cdot f_X(x)\,dx = \int_0^{\infty} x \cdot a \cdot e^{-ax}\,dx = \left[x \cdot a \cdot \frac{e^{-ax}}{-a}\right]_0^{\infty} - \int_0^{\infty} -e^{-ax}\,dx = \left[-\frac{e^{-ax}(1+ax)}{a}\right]_0^{\infty} = 0 - \left(-\frac{1}{a}\right) = \frac{1}{a}$
Review of probability theory and related basic concepts
The variance of X is obtained similarly, by repeated integration by parts and evaluating the limits:
$E[X^2] = \int_0^{\infty} x^2 \cdot a \cdot e^{-ax}\,dx = \left[-x^2 e^{-ax}\right]_0^{\infty} + \int_0^{\infty} 2x\,e^{-ax}\,dx = 0 + \frac{2}{a}\int_0^{\infty} x \cdot a \cdot e^{-ax}\,dx = \frac{2}{a} \cdot \frac{1}{a} = \frac{2}{a^2}$
$V(X) = E[X^2] - \mu_X^2 = \frac{2}{a^2} - \frac{1}{a^2} = \frac{1}{a^2}$
For $f_X(x) = a \cdot e^{-ax}$, the cdf is $F_X(x) = \int_0^x f_X(u)\,du = \int_0^x a \cdot e^{-au}\,du = 1 - e^{-ax}$
Review of probability theory and related basic concepts
The quantiles follow from inverting the cdf: $F_X(x_p) = p \Leftrightarrow 1 - e^{-a x_p} = p \Leftrightarrow x_p = \dfrac{-1}{a}\ln(1-p)$
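A short Python sketch of this quantile formula; the rate parameter a is a hypothetical value chosen only for illustration:

```python
import math

# Quantiles of the exponential distribution: x_p = -ln(1 - p) / a
a = 2.0   # hypothetical rate parameter
for p in (0.50, 0.90, 0.95):
    print(p, -math.log(1.0 - p) / a)
```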
Common Probability Distribution Models
(including the Return Period and the
Central Limit Theorem)
Review of probability theory and related basic concepts
There are some probability distribution models that are more general and
that appear in many situations. We’ll focus on models representing the
behaviour of RVs that are:
- the result of independent events
- the sum of different effects
- the product of different effects
- the extremes of different effects
… and we’ll also address a few other useful probability distribution models
Review of probability theory and related basic concepts
$f_B(k) = \dbinom{n}{k} p^k (1-p)^{n-k}$
$F_B(k) = \sum_{j=0}^{k} \dbinom{n}{j} p^j (1-p)^{n-j}$
with $\dbinom{n}{k} = \dfrac{n!}{k!(n-k)!}$ (the binomial coefficient)
Mean: $\mu_K = np$    Variance: $V(K) = np(1-p)$
Review of probability theory and related basic concepts
GGG → X = 3: P(X = 3) = p × p × p
GGB, GBG, BGG → X = 2: P(X = 2) = 3 × p × p × (1 − p)
BBG, BGB, GBB → X = 1: P(X = 1) = 3 × p × (1 − p) × (1 − p)
BBB → X = 0: P(X = 0) = (1 − p) × (1 − p) × (1 − p)
Review of probability theory and related basic concepts
$P(B) = \dfrac{3!}{1!(3-1)!} \times 0.9^1 \times (1-0.9)^{3-1} = 0.027$
Review of probability theory and related basic concepts
and the probability of success p = 0.10, we have event A = 2 bulldozers are inoperative
$P(A) = \dfrac{3!}{2!(3-2)!} \times 0.1^2 \times (1-0.1)^{3-2} = 0.027$
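A one-line Python check of this binomial computation (n = 3 bulldozers, p = 0.10, k = 2 inoperative):

```python
from math import comb

# Binomial pmf term for the bulldozer example
n, p, k = 3, 0.10, 2
print(comb(n, k) * p**k * (1 - p)**(n - k))   # 0.027
```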
Review of probability theory and related basic concepts
$f_G(x) = p(1-p)^{x-1}$
$F_G(x) = 1 - (1-p)^x$
Mean: $\mu_X = \dfrac{1}{p}$    Variance: $V(X) = \dfrac{1-p}{p^2}$
Review of probability theory and related basic concepts
Return Period
[Timeline sketch of tornado occurrences: x = T first tornado, x = T second tornado]
Review of probability theory and related basic concepts
$f_G(x) = p(1-p)^{x-1}$
$F_G(x) = 1 - (1-p)^x$
Mean: $\mu_X = \dfrac{1}{p}$ ← the Return Period
Variance: $V(X) = \dfrac{1-p}{p^2}$
Review of probability theory and related basic concepts
A wind tower was designed to be operational for wind speeds up to 100km/h. This
wind speed has a 5% annual probability of exceedance. What is the probability of
exceeding this wind speed during the lifetime of the tower which is 100 years?
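The slide poses this as an exercise; a minimal sketch of one standard way to answer it, assuming exceedances in different years are independent:

```python
# P(at least one exceedance in 100 years) = 1 - (1 - 0.05)**100
p_annual, lifetime = 0.05, 100
print(1 - (1 - p_annual) ** lifetime)   # approximately 0.994
```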
Review of probability theory and related basic concepts
$f_{NB}(x) = \dbinom{x-1}{r-1} p^r (1-p)^{x-r}$, with $x \ge r$
$F_{NB}(x) = \sum_{j=r}^{x} \dbinom{j-1}{r-1} p^r (1-p)^{j-r}$
There are alternative definitions for this distribution:
Y = number of failures before the rth success. This formulation is
equivalent to the one in terms of X = trial at which the rth success
occurs, since Y = X − r
Mean: $\mu_X = \dfrac{r}{p}$    Variance: $V(X) = \dfrac{r(1-p)}{p^2}$
Review of probability theory and related basic concepts
$f_P(k) = \dfrac{(\nu t)^k}{k!} e^{-\nu t}$
$F_P(k) = \sum_{j=0}^{k} \dfrac{(\nu t)^j}{j!} e^{-\nu t}$
Mean: $\mu_K = \nu t$    Variance: $V(K) = \nu t$
λ = νt is the mean number of events that occur in a specified time t
Review of probability theory and related basic concepts
$f_{Exp}(t) = \nu e^{-\nu t}$, with $t \ge 0$
$F_{Exp}(t) = 1 - e^{-\nu t}$
Mean: $\mu_T = \dfrac{1}{\nu}$    Variance: $V(T) = \dfrac{1}{\nu^2}$
Review of probability theory and related basic concepts
Consider the particular case where we need the probability Pt of any non-zero
number of events occurring during a reference period of time t (in years)
$P_t = p(1) + p(2) + \ldots = 1 - p(0) = 1 - f_P(0) = 1 - \dfrac{(\nu t)^0}{0!} e^{-\nu t} = 1 - e^{-\nu t} = 1 - e^{-t/RP}$
$RP = \dfrac{-t}{\ln(1-P_t)}$ → for low probability events → $RP \approx \dfrac{t}{P_t}$
In earthquake engineering, $P_t$ is called the seismic hazard $H_t$. Hence $RP \approx \dfrac{1}{H_1}$
Review of probability theory and related basic concepts
Consider that in the last 50 years 2 large earthquakes (MW > 6) occurred in a given
region and that such occurrences can be modelled by a Poisson process. What is
the probability of occurrence of such earthquakes within the next 2 years?
Consider the event “occurrence of a MW > 6 earthquake”
The mean rate of occurrence of this event = 2/50 = 0.04/year and the return
period is 1/0.04 = 25 years
$F_{Exp}(2) = P(T \le 2) = 1 - e^{-0.04 \times 2} = 0.077$
We can also determine this probability using the Poisson distribution, as sketched below.
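A minimal Python sketch showing that the exponential cdf and the Poisson count of “at least one event” give the same number:

```python
import math

# Mean rate nu = 0.04/year, reference period t = 2 years
nu, t = 0.04, 2.0
p0 = (nu * t) ** 0 / math.factorial(0) * math.exp(-nu * t)  # Poisson pmf at k = 0
print(1 - p0)                   # P(at least one event) is approximately 0.077
print(1 - math.exp(-nu * t))    # exponential cdf at t = 2, same value
```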
Review of probability theory and related basic concepts
For the case where we use the probability Pt of any non-zero number of events
occurring during a reference period of time t (in years):
$P_t = 1 - \dfrac{(\nu t)^0}{0!} e^{-\nu t} = 1 - e^{-0.04 \times 2} = 0.077$
Review of probability theory and related basic concepts
$f_\Gamma(t) = \dfrac{\nu (\nu t)^{k-1}}{\Gamma(k)} e^{-\nu t}$, with $t \ge 0$ (ν = 1/θ)
$F_\Gamma(t) = \int_0^{\nu t} \dfrac{y^{k-1}}{\Gamma(k)} e^{-y}\,dy$
with $\Gamma(k) = \int_0^{\infty} e^{-y} y^{k-1}\,dy$ (the Gamma function)
Mean: $\mu_T = \dfrac{k}{\nu}$    Variance: $V(T) = \dfrac{k}{\nu^2}$
Review of probability theory and related basic concepts
Common Probability Distribution Models: RVs that come from the sum of
different effects
• Central Limit Theorem: this theorem states that if we consider Sn to be the
sum (or average) of n independent RVs, each with an arbitrary probability
distribution, under certain conditions (the Lindeberg condition: the variance
of each RV divided by the sum of the variances of all the RVs tends to zero as
n tends to ∞), the distribution of Sn is well-approximated by a certain type of
continuous function known as a normal density function.
Review of probability theory and related basic concepts
Common Probability Distribution Models: RVs that come from the sum of
different effects
Normal (or Gaussian) distribution N(μ,σ) is the continuous probability
distribution that is most used in statistics and probability analysis across all areas
of research (especially due to the Central Limit Theorem and its implications)
$f_N(x) = \varphi(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
$F_N(x) = \Phi(x) = \displaystyle\int_{-\infty}^{x} \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}\,dy$
Common Probability Distribution Models: RVs that come from the product
of different effects
Lognormal distribution LN(λ,β) is the continuous probability distribution of a RV
X when its natural logarithm ln(X) = Y follows a normal distribution
$f_{LN}(x) = \dfrac{1}{\beta x \sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{\ln(x)-\lambda}{\beta}\right)^2}$
$F_{LN}(x) = \displaystyle\int_0^{x} \dfrac{1}{\beta z \sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{\ln(z)-\lambda}{\beta}\right)^2}\,dz$
$\lambda = \mu_{\ln(X)}$: mean value of the log of the data
$\beta = \sigma_{\ln(X)}$: standard deviation of the log of the data
Mean: $\mu_X = e^{\lambda + \beta^2/2}$    Variance: $V(X) = \mu_X^2\left(e^{\beta^2} - 1\right)$
Review of probability theory and related basic concepts
Mean: $\mu_X = \varepsilon + (u-\varepsilon)\,\Gamma\!\left(1+\dfrac{1}{k}\right)$
Variance: $V(X) = (u-\varepsilon)^2\left[\Gamma\!\left(1+\dfrac{2}{k}\right) - \Gamma^2\!\left(1+\dfrac{1}{k}\right)\right]$
There are alternative definitions for this distribution (e.g. in many cases the distribution is
presented for ε = 0)
The distributions for the minimum values can be derived by noting that min(Yi) = -max(-Yi)
Review of probability theory and related basic concepts
$F_{GExt}(x) = e^{-\left[1 + \zeta\left(\frac{x-u}{s}\right)\right]^{-1/\zeta}}$
with $1 + \zeta\left(\dfrac{x-u}{s}\right) > 0$, where ζ, u, s are the parameters
By setting ζ = 0, > 0 or < 0, the Gumbel, Fréchet and Weibull families are
obtained, respectively
Review of probability theory and related basic concepts
Uniform distribution: useful to model data with values that are equally probable
$f_U(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{for } x < a \text{ or } x > b \end{cases}$
$F_U(x) = \begin{cases} 0 & \text{for } x < a \\ \dfrac{x-a}{b-a} & \text{for } a \le x < b \\ 1 & \text{for } x \ge b \end{cases}$
Mean: $\mu_X = \dfrac{a+b}{2}$    Variance: $V(X) = \dfrac{(b-a)^2}{12}$
Review of probability theory and related basic concepts
$F_{\chi^2}(y) = \dfrac{\gamma\!\left(\frac{k}{2}, \frac{y}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)}$, where γ is the lower incomplete Gamma function
Mean: $\mu_Y = k$    Variance: $V(Y) = 2k$
Review of probability theory and related basic concepts
$f_{Beta}(x) = \dfrac{(x-a)^{\alpha-1}(b-x)^{\beta-1}}{B(\alpha,\beta)\,(b-a)^{\alpha+\beta-1}}$, with $B(\alpha,\beta) = \displaystyle\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx$
With $Y = \dfrac{X-a}{b-a}$: $F_{Beta}(y) = I_x(\alpha,\beta)$, the regularized incomplete beta function
Mean: $\mu_X = a + \dfrac{\alpha(b-a)}{\alpha+\beta}$    Variance: $V(X) = \dfrac{\alpha\beta(b-a)^2}{(\alpha+\beta)^2(\alpha+\beta+1)}$
Review of probability theory and related basic concepts
Confidence Intervals
Remember this…
• An estimator of a Population Parameter is a Sample Statistic used to estimate or
predict that Population Parameter.
• An estimate is a particular numerical value of a Sample Statistic obtained through
sampling.
Considering a single sample of data and an estimate obtained from that sample
to establish a population parameter, a confidence interval for that population
parameter corresponds to a range of values bounding the estimate with a
certain probability of containing the true value of the population parameter
This is the single sample interpretation for what is a confidence interval... There is also the
repeated sample interpretation.
Review of probability theory and related basic concepts
Confidence Intervals
$P\left(\hat\theta_L \le \theta \le \hat\theta_U\right) = 1 - \alpha$ (with $0 \le \alpha \le 1$)
This expression states that the interval defined by the bounds 𝜃𝜃̂𝐿𝐿 and 𝜃𝜃̂𝑈𝑈 has a
(1 - α) probability of containing θ.
Review of probability theory and related basic concepts
Confidence Intervals
Remember this…
• Central Limit Theorem: if we consider Sm to be the sum (or average) of m independent
RVs, … , the distribution of Sm is well-approximated by … a normal density function
Confidence Intervals
If the sample mean $\bar{X}$ follows a normal distribution $N(\mu, \sigma/\sqrt{n})$, we can define
the standard normal variable Z by
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
Variable Z follows a standard normal distribution N(0,1).
Considering that we want to construct a confidence interval with a (1 - α)
confidence level of (1 – 0.05) = 0.95 (i.e. an interval that has a 95% probability
of containing μ), what is the value of Z that has a:
[Standard normal distribution table]
Review of probability theory and related basic concepts
Confidence Intervals
$1 - \alpha = 0.95$, with $\alpha/2 = 0.025$ in each tail
2 2
Confidence Intervals
Using the values of Z = zα/2 = -1.96 and Z = z1-α /2 = 1.96, we can say that
$P(-1.96 \le Z \le 1.96) = P\left(-1.96 \le \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$
which states that there is a 95% probability that the value of Z is between −1.96 and 1.96.
We can rewrite this expression as:
$P(z_{\alpha/2} \le Z \le z_{1-\alpha/2}) = P\left(z_{\alpha/2} \le \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le z_{1-\alpha/2}\right) = P\left(\bar{X} - z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
Review of probability theory and related basic concepts
Confidence Intervals
Since zα/2 = - z1-α /2, we get the more general form of the confidence interval of
the mean
$\bar{X} - z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
where $\bar{X}$ is the sample mean, μ is the true value of the mean, σ is the true value of the
standard deviation, n is the sample size, $z_{1-\alpha/2}$ is the value of a standard normal variable
with a 1 − α/2 probability of occurrence, and $z_{1-\alpha/2}\,\sigma/\sqrt{n}$ is the margin of error
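A minimal Python sketch of this z-based interval; the sample mean, σ and n below are hypothetical values for illustration:

```python
import math
from statistics import NormalDist

# 95% confidence interval for the mean, with sigma known
xbar, sigma, n = 32.7, 4.0, 36   # hypothetical sample mean, known sigma, sample size
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1-alpha/2}, approximately 1.96
half_width = z * sigma / math.sqrt(n)     # margin of error
print(xbar - half_width, xbar + half_width)
```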
Review of probability theory and related basic concepts
Confidence Intervals
The one-sided versions of this interval are then
$\mu \le \bar{X} + z_{1-\alpha}\dfrac{\sigma}{\sqrt{n}}$ (upper-bounded one-sided interval)
$\bar{X} - z_{1-\alpha}\dfrac{\sigma}{\sqrt{n}} \le \mu$ (lower-bounded one-sided interval)
Review of probability theory and related basic concepts
Confidence Intervals
How large should n be? (According to the Central Limit Theorem, the larger
the sample size, the better the normal approximation to the sampling
distribution of $\bar{X}$)
For practical applications, n = 30 is usually seen as a minimum
Confidence Intervals
How to choose or define the confidence level?
There are no specific answers for this… 95% is perhaps the most common
value… other common values are 99% and 90%. Values lower than 75% are
not used. The higher the desired confidence level, the wider the
confidence interval will have to be.
Confidence level → $z_{1-\alpha/2}$:
80% → 1.28
90% → 1.645
95% → 1.96
98% → 2.33
99% → 2.58
99.8% → 3.09
99.9% → 3.29
Review of probability theory and related basic concepts
Confidence Intervals
The confidence interval above assumes that σ is known, but in most cases this
parameter is not known. We usually only have its sample estimate s.
In this case, the normal distribution can no longer be used and it can be
proven that variable T defined by
$T = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$
follows a t distribution with n-1 degrees of freedom, t(n-1)
Review of probability theory and related basic concepts
Confidence Intervals
The t distribution looks like a normal distribution, but has “thicker” tails. The
tail thickness is controlled by the degrees of freedom
standard normal distribution
t with df = 5
t with df = 1
• The smaller the degrees of freedom, the thicker the tails of the t
distribution
• If the degrees of freedom is large (if we have a large sample size),
then the t distribution approaches the standard normal distribution
Review of probability theory and related basic concepts
Confidence Intervals
Variable T follows a t distribution t(n-1).
Considering that we want to construct a confidence interval with a (1 - α)
confidence level of (1 – 0.05) = 0.95 (i.e. an interval that has a 95% probability
of containing μ), what is the value of T that has a:
The values of T = tn-1,1 – α/2 and T = tn-1,α/2 are found using the cdf of a t
distribution with n-1 degrees of freedom and looking for the values of T
corresponding to the required probabilities.
t-Student distribution
$1 - F_X(t_a) = 1 - P(X \le t_a) = P(X > t_a) = a$
table values = $t_a$
Review of probability theory and related basic concepts
Confidence Intervals
Using the values of T = tn-1,1 – α/2 and T = tn-1,α/2, we can say that
$P(t_{n-1,\alpha/2} \le T \le t_{n-1,1-\alpha/2}) = P\left(t_{n-1,\alpha/2} \le \dfrac{\bar{X}-\mu}{s/\sqrt{n}} \le t_{n-1,1-\alpha/2}\right) = 1 - \alpha$
We can rewrite this expression as:
$P\left(\bar{X} - t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{X} - t_{n-1,\alpha/2}\dfrac{s}{\sqrt{n}}\right) = 1 - \alpha$
or
$P\left(\bar{X} - t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}}\right) = 1 - \alpha$
to get the confidence interval
$\bar{X} - t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}}$
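A minimal Python sketch of the t-based interval, with hypothetical sample statistics:

```python
import math
from scipy import stats

# 95% confidence interval for the mean, with sigma unknown (t distribution)
xbar, s, n = 32.7, 4.0, 16    # hypothetical sample mean, sample std, sample size
alpha = 0.05
t = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t_{n-1, 1-alpha/2}
half_width = t * s / math.sqrt(n)
print(xbar - half_width, xbar + half_width)
```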
Review of probability theory and related basic concepts
Confidence Intervals
A correction for the case of finite populations
$\bar{X} - t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}}$
When the size N of the population is assumed to be a finite number:
$\bar{X} - t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}} \le \mu \le \bar{X} + t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$
Review of probability theory and related basic concepts
Confidence Intervals
Assessing the sample size needed to estimate the mean within a certain
margin of error and with a certain confidence level
Starting from
$\bar{X} - t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,1-\alpha/2}\dfrac{s}{\sqrt{n}}$
Dividing by $\bar{X}$ (and noting that $cov = s/\bar{X}$):
$1 - t_{n-1,1-\alpha/2}\dfrac{cov}{\sqrt{n}} \le \dfrac{\mu}{\bar{X}} \le 1 + t_{n-1,1-\alpha/2}\dfrac{cov}{\sqrt{n}}$
Writing $\mu = \bar{X} \times (1 \pm ME)$:
$1 - t_{n-1,1-\alpha/2}\dfrac{cov}{\sqrt{n}} \le \dfrac{\bar{X} \times (1 \pm ME)}{\bar{X}} \le 1 + t_{n-1,1-\alpha/2}\dfrac{cov}{\sqrt{n}}$
Review of probability theory and related basic concepts
Confidence Intervals
Assessing the sample size needed to estimate the mean within a certain
margin of error and with a certain confidence level
Separating into 2 parts:
$1 - t_{n-1,1-\alpha/2}\dfrac{cov}{\sqrt{n}} \le 1 - ME$ and $1 + ME \le 1 + t_{n-1,1-\alpha/2}\dfrac{cov}{\sqrt{n}}$
$t_{n-1,1-\alpha/2}\dfrac{cov}{\sqrt{n}} \ge ME$ and $ME \le t_{n-1,1-\alpha/2}\dfrac{cov}{\sqrt{n}}$
leads to only one expression:
$t_{n-1,1-\alpha/2}\dfrac{cov}{\sqrt{n}} = ME$
Review of probability theory and related basic concepts
Confidence Intervals
Assessing the sample size needed to estimate the mean within a certain
margin of error and with a certain confidence level
Replacing the value of the t distribution (which depends on the sample size) by its
normal approximation:
$t_{n-1,1-\alpha/2}\dfrac{cov}{\sqrt{n}} = ME \;\Rightarrow\; z_{1-\alpha/2}\dfrac{cov}{\sqrt{n}} \approx ME$
which leads to
$n = \left(z_{1-\alpha/2}\dfrac{cov}{ME}\right)^2$
Setting ME, defining $z_{1-\alpha/2}$ and “guessing” cov leads to the value of n
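A short Python sketch of this sample-size formula; cov, ME and α are hypothetical choices:

```python
from statistics import NormalDist

# n = (z_{1-alpha/2} * cov / ME)^2, with a "guessed" coefficient of variation
cov, ME, alpha = 0.25, 0.05, 0.05   # hypothetical values
z = NormalDist().inv_cdf(1 - alpha / 2)
n = (z * cov / ME) ** 2
print(n)    # approximately 96; round up to the next integer in practice
```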
Review of probability theory and related basic concepts
Confidence Intervals
It is possible to construct confidence intervals for other parameters
$P\left(\dfrac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}\right) = 1 - \alpha$
and the following confidence interval (which is asymmetric!!)
$\dfrac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$
This interval can also provide good estimates for the variance of other distributions
Review of probability theory and related basic concepts
Confidence Intervals
A correction for the case of finite populations¹
$\left(\dfrac{n-1}{N-1} + \dfrac{N-n}{N-1} \times \dfrac{1}{F_{1-\alpha/2,\,n-1,\,N-n}}\right) s^2 \le \sigma^2 \le \left(\dfrac{n-1}{N-1} + \dfrac{N-n}{N-1} \times \dfrac{1}{F_{\alpha/2,\,n-1,\,N-n}}\right) s^2$
This interval can also provide good estimates for the variance of other distributions
¹ O’Neill, B. (2014) Some useful moment results in sampling problems. The American Statistician, 68(4), 282-296
Review of probability theory and related basic concepts
Confidence Intervals
Assessing the sample size needed to estimate the variance within a certain
margin of error and with a certain confidence level
Starting from
$\dfrac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$
Dividing by $s^2$
$\dfrac{n-1}{\chi^2_{n-1,\alpha/2}} \le \dfrac{\sigma^2}{s^2} \le \dfrac{n-1}{\chi^2_{n-1,1-\alpha/2}}$
Setting a certain margin of error ME (e.g. 1.10 or 0.90)
$\dfrac{n-1}{\chi^2_{n-1,\alpha/2}} \le \dfrac{s^2 \times ME}{s^2} \le \dfrac{n-1}{\chi^2_{n-1,1-\alpha/2}}$
Review of probability theory and related basic concepts
Confidence Intervals
Assessing the sample size needed to estimate the variance within a certain
margin of error and with a certain confidence level
That simplifies to
$\dfrac{n-1}{\chi^2_{n-1,\alpha/2}} \le ME \le \dfrac{n-1}{\chi^2_{n-1,1-\alpha/2}}$
Review of probability theory and related basic concepts
Confidence Intervals
Assessing the sample size needed to estimate the variance within a certain
margin of error and with a certain confidence level
Considering a confidence level of 10%:
[Plot of the margin of error ME versus sample size]
Building Probabilistic Models (distribution
fitting and parameter estimation)
Review of probability theory and related basic concepts
[Diagram: data sample X = {X1, X2, …, Xn} (INPUT) → “MAGICAL PROCESS” → pdf f(x) (PERFECT OUTPUT)]
Review of probability theory and related basic concepts
Real (continuous) data will hardly ever follow an exact theoretical statistical
model
Usually, available samples of real data are not entirely representative of the
true population
[Diagram: data sample X = {X1, X2, …, Xn} (INCOMPLETE INPUT) → “MAGICAL PROCESS” → pdf f(x) (IMPERFECT/APPROXIMATED OUTPUT)]
Review of probability theory and related basic concepts
$F(x) = G\left(\dfrac{x-m}{s}\right) = G(z)$
where F is the cdf of the data
$z = G^{-1}[F(x)] = \dfrac{x-m}{s} = \dfrac{x}{s} - \dfrac{m}{s}$
We see there is a linear relation between x and $G^{-1}[F(x)]$ (note that $G^{-1}[F(x)]$
are the percentile values of x)
Review of probability theory and related basic concepts
Since the true cdf F(x) is not known, we have to define an empirical cdf Fn(x)
based on the number of points in the data:
$F_n(x_{(i)}) = \dfrac{i}{n}$
where $x_{(i)}$ are the ordered values of X (usually called order statistics or ranks)
Alternative plotting positions:
Benard: $F_n(x_{(i)}) = \dfrac{i-0.3}{n+0.4}$    Filliben: $F_n(x_{(i)}) = \dfrac{i-0.3175}{n+0.365}$
Hosking and Wallis: $F_n(x_{(i)}) = \dfrac{i-0.35}{n}$    Blom: $F_n(x_{(i)}) = \dfrac{i-0.375}{n+0.25}$
Hazen: $F_n(x_{(i)}) = \dfrac{i-0.5}{n}$    Weibull: $F_n(x_{(i)}) = \dfrac{i}{n+1}$
Gringorten: $F_n(x_{(i)}) = \dfrac{i-0.44}{n+0.12}$    Cunnane: $F_n(x_{(i)}) = \dfrac{i-0.4}{n+0.2}$
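A short Python sketch of the probability-plot procedure using Benard’s plotting position; the data below is hypothetical, generated just to exercise the method:

```python
import numpy as np
from scipy import stats

# Normal probability plot by hand, with Benard's position (i - 0.3)/(n + 0.4)
x = np.sort(np.random.normal(30.0, 4.5, size=50))   # hypothetical sample
n = len(x)
i = np.arange(1, n + 1)
Fn = (i - 0.3) / (n + 0.4)      # empirical cdf at the ordered data
z = stats.norm.ppf(Fn)          # z(i) = G^{-1}[Fn(x(i))]
# For normal data z is linear in x: z = x/s - m/s, so a line fit recovers m and s
slope, intercept = np.polyfit(x, z, 1)
print("s approx", 1 / slope, "  m approx", -intercept / slope)
```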
Review of probability theory and related basic concepts
$F_n(x_{(i)}) = \dfrac{i-0.3}{n+0.4}$
Review of probability theory and related basic concepts
For the case of the normal distribution family, it can be seen that when
$F(x) = G\left(\dfrac{x-m}{s}\right) = G(z)$
z follows a standard normal distribution N(0,1) and it is possible to obtain
numerical values for its inverse:
$F_n(x_{(i)}) = \dfrac{i-0.3}{n+0.4} \;\Rightarrow\; \Phi^{-1}\left[F_n(x_{(i)})\right] = z_{(i)}$
Review of probability theory and related basic concepts
[Two normal probability plots of z(i) versus x(i)]
Review of probability theory and related basic concepts
Other information we get from a probability plot for the normal distribution
[Normal probability plots of z(i) versus x(i) for skewed data]
Left: data is skewed to the right (there’s always data below one potential straight line)
Right: data is skewed to the left (there’s always data above one potential straight line)
Review of probability theory and related basic concepts
Other information we get from a probability plot for the normal distribution
[Two further normal probability plot examples of z(i) versus x(i)]
For the case of the lognormal distribution family, given the relation between
the normal and the lognormal distributions, we just have to set
y = ln ( x )
and do the plot for ln(x) that now follows a normal distribution
Review of probability theory and related basic concepts
For the case of the Weibull distribution family there is an alternative and
even simpler process because the cdf has an analytical expression
$F(x) = 1 - e^{-\left(\frac{x-\varepsilon}{u-\varepsilon}\right)^k}$, with $x \ge \varepsilon$; assuming ε = 0: $F(x) = 1 - e^{-\left(\frac{x}{u}\right)^k}$
From which we can get
$F = 1 - e^{-\left(\frac{x}{u}\right)^k} \Leftrightarrow \ln\left(-\ln(1-F)\right) = k\ln(x) - k\ln(u)$
[Weibull probability plots of ln(−ln(1−Fn)) versus ln(x(i))]
Review of probability theory and related basic concepts
Third, analyse the fitting of the distribution using a q-q plot or a p-p plot
A p-p plot compares the empirical cumulative distribution function of the
data with a specified theoretical cumulative distribution function
1. fit the parameters of the selected distribution and determine $F(x_{(i)})$
2. select the expression for the empirical cdf, e.g. $F_n(x_{(i)}) = \dfrac{i-0.3}{n+0.4}$
3. plot $F_n(x_{(i)})$ versus $F(x_{(i)})$
Review of probability theory and related basic concepts
Third, analyse the fitting of the distribution using a q-q plot or a p-p plot
A q-q plot compares the quantiles of the empirical data with the quantiles of
a theoretical distribution
1. select the expression for the empirical cdf, e.g. $y = F_n(x_{(i)}) = \dfrac{i-0.3}{n+0.4}$
2. plot $x_{(i)}$ versus $F^{-1}(y)$
Review of probability theory and related basic concepts
A p-p plot tends to magnify deviations between the data and the
selected theoretical distribution in the middle range of the
distribution.
A q-q plot tends to magnify deviations between the data and the
selected theoretical distribution in the tail range of the distribution.
The more linear the plot looks, the better the fit between the data
and the theoretical distribution
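A minimal Python sketch of the q-q plot check against a fitted normal model; the data is hypothetical:

```python
import numpy as np
from scipy import stats

# q-q plot ingredients for a fitted normal distribution
x = np.sort(np.random.normal(30.0, 4.5, size=50))    # hypothetical sample
n = len(x)
y = (np.arange(1, n + 1) - 0.3) / (n + 0.4)          # empirical cdf (Benard)
q = stats.norm.ppf(y, loc=x.mean(), scale=x.std(ddof=1))  # theoretical quantiles F^-1(y)
# Plotting x(i) versus q should be close to a straight line for a good fit;
# the correlation coefficient is a crude one-number summary of that linearity
print(np.corrcoef(x, q)[0, 1])
```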
Review of probability theory and related basic concepts
[Left: p-p plot for a normal distribution trying to fit Weibull data. Right: q-q plot for a
normal distribution trying to fit the same Weibull data]
Review of probability theory and related basic concepts
[Left: q-q plot of empirical quantiles versus theoretical quantiles (lognormal fit, original data).
Right: q-q plot (lognormal fit, log of data)]
Review of probability theory and related basic concepts
Method of Moments
After selecting the distribution type, the number of parameters that need
to be determined is known and it is assumed that the available data is
sufficient to estimate their values
$m_j = \dfrac{1}{n}\sum_{i=1}^{n} (\hat{x}_i - c)^j$ (sample moments)    $\lambda_j = \displaystyle\int_{-\infty}^{\infty} (x-c)^j f_X(x)\,dx$ (theoretical moments)
Review of probability theory and related basic concepts
Method of Moments
If we need to estimate n parameters, we then need n equations:
$m_j = \lambda_j$, with $j = 1,\ldots,n$
$\dfrac{1}{n}\sum_{i=1}^{n} (\hat{x}_i - c)^j = \displaystyle\int_{-\infty}^{\infty} (x-c)^j f_X(x)\,dx$, with $j = 1,\ldots,n$
Review of probability theory and related basic concepts
$g(\mu,\sigma) = \left(\lambda_1(\mu,\sigma) - m_1\right)^2 + \left(\lambda_2(\mu,\sigma) - m_2\right)^2$
$g(\mu,\sigma) = \left(\mu - m_1\right)^2 + \left(\mu^2 + \sigma^2 - m_2\right)^2$
$L(\theta \mid x_1, x_2, \ldots, x_n) = L(\theta)$
Review of probability theory and related basic concepts
For a normal distribution with pdf $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$,
the likelihood of one of the elements in the data, $x_1$, is
$L(\mu,\sigma \mid x_1) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x_1-\mu}{\sigma}\right)^2}$
the likelihood of two of the elements in the data, $x_1$ and $x_2$, is
$L(\mu,\sigma \mid x_1, x_2) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x_1-\mu}{\sigma}\right)^2} \times \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x_2-\mu}{\sigma}\right)^2}$
Review of probability theory and related basic concepts
$\min\left(-L(\theta \mid \hat{x})\right)$
Review of probability theory and related basic concepts
$l(\theta \mid \hat{x}) = \log L(\theta \mid \hat{x}) = \log \prod_{i=1}^{n} f_X(\hat{x}_i)$
$l(\theta \mid \hat{x}) = \sum_{i=1}^{n} \log f_X(\hat{x}_i)$
$l(\theta \mid \hat{x}) = \sum_{i=1}^{n} \ln\left[\dfrac{1}{\theta_1\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{\hat{x}_i-\theta_2}{\theta_1}\right)^2}\right] = n \times \ln\dfrac{1}{\theta_1\sqrt{2\pi}} - \dfrac{1}{2}\sum_{i=1}^{n}\left(\dfrac{\hat{x}_i-\theta_2}{\theta_1}\right)^2$
The minimum can be obtained by solving the following equations:
$\dfrac{\partial l}{\partial \theta_1} = -\dfrac{n}{\theta_1} + \dfrac{1}{\theta_1^3}\sum_{i=1}^{n}(\hat{x}_i-\theta_2)^2 = 0$    $\dfrac{\partial l}{\partial \theta_2} = \dfrac{1}{\theta_1^2}\sum_{i=1}^{n}(\hat{x}_i-\theta_2) = 0$
Review of probability theory and related basic concepts
$\theta_1 = \sqrt{\dfrac{\sum_{i=1}^{n}(\hat{x}_i-\theta_2)^2}{n}}$    $\theta_2 = \dfrac{1}{n}\sum_{i=1}^{n} \hat{x}_i$
For the case of the normal distribution, the sample mean and the
sample standard deviation are the Maximum Likelihood estimators!
Finally, we obtain again:
$\theta_1 = 4.04$    $\theta_2 = 32.67$
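A minimal Python sketch of the numerical minimisation of −l(θ|x̂) for the normal case, verifying the closed-form estimators; the sample is synthetic, not the slide’s data:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
x = rng.normal(32.67, 4.04, size=200)   # hypothetical sample for illustration

# Negative log-likelihood of N(theta2, theta1), minimised numerically
def nll(theta):
    sigma, mu = theta
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

res = optimize.minimize(nll, x0=[1.0, 0.0], method="L-BFGS-B",
                        bounds=[(1e-6, None), (None, None)])
print(res.x)                      # approximately [sigma_hat, mu_hat]
print(x.std(ddof=0), x.mean())    # matches the closed-form ML estimators
```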
Review of probability theory and related basic concepts
$H = \begin{bmatrix} -\dfrac{n}{\theta_1^2} + \dfrac{3}{\theta_1^4}\sum_{i=1}^{n}(\hat{x}_i-\theta_2)^2 & \dfrac{2}{\theta_1^3}\sum_{i=1}^{n}(\hat{x}_i-\theta_2) \\ \dfrac{2}{\theta_1^3}\sum_{i=1}^{n}(\hat{x}_i-\theta_2) & \dfrac{n}{\theta_1^2} \end{bmatrix}$
$C_{\Theta\Theta} = H^{-1} = \begin{bmatrix} 0.836 & 0 \\ 0 & 0.165 \end{bmatrix}$ (first diagonal term: variance of the standard deviation; second: variance of the mean value)
Review of probability theory and related basic concepts
For the case of the mean and standard deviation, it can be proven that the
mean estimate is not biased, while the estimate of the standard deviation
is biased. For data samples of larger size, the bias becomes very small.
However, for samples with “common” sizes, a correction to the estimator is
used to correct the bias:
$s = \sqrt{\dfrac{\sum_{i=1}^{n}(\hat{x}_i-\bar{x})^2}{n}} \;\rightarrow\; s = \sqrt{\dfrac{\sum_{i=1}^{n}(\hat{x}_i-\bar{x})^2}{n-1}}$
Review of probability theory and related basic concepts
$\beta_2 = \dfrac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\dfrac{x_i-\bar{x}}{s}\right)^4$
Review of probability theory and related basic concepts
$\beta_2 = \dfrac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\dfrac{x_i-\bar{x}}{s}\right)^4 - \dfrac{3(n-1)^2}{(n-2)(n-3)}$
Review of probability theory and related basic concepts
Bayesian Estimation
Bayesian Estimation assumes that parameters 𝜃𝜃 are random variables that
have a known prior distribution f (𝜃𝜃). This distribution is typically very
broad or vague to reflect the fact that we know little about its true value
Once we obtain data X, we use the Bayes theorem to find the posterior
distribution f*(𝜃𝜃). Ideally, we want this data to reduce our uncertainty
about the parameters.
Review of probability theory and related basic concepts
Bayesian Estimation
By recalling the following relation from the Bayes Theorem:
$P(A_i \mid B) = \dfrac{P(A_i) \times P(B \mid A_i)}{\sum_{j=1}^{n} P(B \mid A_j) \times P(A_j)}$
Bayesian Estimation
Considering that:
$k = \left[\displaystyle\int_{-\infty}^{\infty} f(\theta) \times P(X \mid \theta)\,d\theta\right]^{-1}$ and $L(\theta \mid X) = P(X \mid \theta)$
we get:
$f^*(\theta) = k \times f(\theta) \times L(\theta \mid X)$, the posterior distribution of parameter θ
The point estimate is usually the expected value: $\theta^* = \displaystyle\int_{-\infty}^{\infty} \theta \times f^*(\theta)\,d\theta$
Review of probability theory and related basic concepts
$f(p) = 1$, for $0 \le p \le 1$
On the basis of the inspection of one pile, revealing that it is defective,
the likelihood is the probability of the event X = one pile selected for
inspection is defective, which is p.
Review of probability theory and related basic concepts
Bayesian Estimation
Therefore
$f^*(p) = k \times f(p) \times L(p \mid X) = k \times 1.0 \times p$, for $0 \le p \le 1$
and the normalizing constant k is $k = \left[\displaystyle\int_0^1 p\,dp\right]^{-1} = 2$
The posterior distribution of p is then:
$f^*(p) = 2p$, for $0 \le p \le 1$
The point estimate of p is then:
$p^* = \displaystyle\int_{-\infty}^{\infty} \theta \times f^*(\theta)\,d\theta = \int_0^1 p \times 2p\,dp = 0.667$
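A minimal numerical version of this update, discretising p instead of integrating analytically:

```python
import numpy as np
from scipy import integrate

# Defective-pile update: uniform prior f(p) = 1, likelihood L(p|X) = p
p = np.linspace(0.0, 1.0, 1001)
posterior_unnorm = 1.0 * p                        # f(p) * L(p|X)
k = 1.0 / integrate.trapezoid(posterior_unnorm, p)
posterior = k * posterior_unnorm
print(k)                                          # approximately 2
print(integrate.trapezoid(p * posterior, p))      # posterior mean, approximately 0.667
```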
Review of probability theory and related basic concepts
Since parameters s and u are RVs, their (best) estimates (i.e. their average
values) may change when new data is obtained. Assume the distribution of
parameter u is the following exponential distribution with a mean value of
6 m/s:
$f(u) = \dfrac{1}{6}\,e^{-u/6}$
Considering that 1 new value of wind speed data is obtained ($\hat{x}$ = 18 m/s),
what is the updated value of parameter u?
Review of probability theory and related basic concepts
Considering that:
$k = \left[\displaystyle\int_{-\infty}^{\infty} f(u) \times L(u \mid \hat{x})\,du\right]^{-1}$
we get:
$f^*(u) = k \times f(u) \times L(u \mid \hat{x}) = k \times \dfrac{e^{-u/6}}{6} \times \dfrac{36}{u^2}\,e^{-324/u^2} = k \times \dfrac{6\,e^{-\frac{u}{6}-\frac{324}{u^2}}}{u^2}$
and the normalizing constant k is:
$k = \left[\displaystyle\int_0^{\infty} \dfrac{6\,e^{-\frac{u}{6}-\frac{324}{u^2}}}{u^2}\,du\right]^{-1} = 151.987$
Review of probability theory and related basic concepts
The updated (posterior mean) value of u is then:
$u^* = \displaystyle\int_{-\infty}^{\infty} u \times f^*(u)\,du = \int_0^{\infty} \dfrac{911.922}{u}\,e^{-\frac{u}{6}-\frac{324}{u^2}}\,du = 15.67$
Review of probability theory and related basic concepts
[Plot of the prior distribution, the likelihood and the posterior distribution of u]
Review of probability theory and related basic concepts
Bayesian Estimation
There are advantages when we “know” the prior distribution of the
parameter we want to estimate and the likelihood function of the data
that is used to estimate the parameter: the posterior distribution may
already be known from existing theoretical results and is often of the same
family
$f^*(\theta) = k \times f(\theta) \times L(\theta \mid X)$
The cases where there is a theoretical connection between the prior
distribution, the likelihood function and the posterior distribution which is of
the same family of the prior distribution are known as conjugate distributions
Review of probability theory and related basic concepts
https://en.wikipedia.org/wiki/Conjugate_prior
Review of probability theory and related basic concepts
• Method of Moments
The simplest approach to obtain parameters, but the estimates are usually
not the best (it is rarely used in practice).
• Maximum Likelihood Method
A slightly more complicated approach (used by most statistical analysis
software packages). We also obtain information about the distribution of the
parameters.
• Bayesian analysis
The most complex approach of the three. It leads directly to the distribution
of the parameters and any prior assumption made about their distribution
may be corrected by a posterior distribution.
Building Probabilistic Models (distribution
fitting and parameter estimation)
Review of probability theory and related basic concepts
When using these techniques, you need to know what you’re doing!!!!
Review of probability theory and related basic concepts
Managing the Level of Significance, the Power of the test and the Type of Errors
• Type I Error – A Type I Error occurs when we reject a true null hypothesis
• Type II Error – A Type II error occurs when we fail to reject a false null
hypothesis
The value of the test statistic based on the sample is compared to a critical
value. The critical value is a specific value of the test statistic defining the
boundary of the rejection region above or below which (depending on the
test) we reject the null hypothesis.
The critical value depends on the statistical distribution of the test statistic
and on the selected level of significance
Let’s assume also that the distribution of the statistic δ is known. This
distribution defines how likely each value of the statistic δ is.
[Plot of the distribution fδ of the statistic δ]
Review of probability theory and related basic concepts
[Plot of fδ with the observed value δ* marked]
Review of probability theory and related basic concepts
[Plot of fδ with the critical value δcrit marked: P(δ > δcrit) = 0.05 defines the rejection
region, the most unlikely values of δ]
Review of probability theory and related basic concepts
[One-sided rejection region above δcrit]
Review of probability theory and related basic concepts
[Two-sided rejection regions below δcrit,low and above δcrit,up]
Review of probability theory and related basic concepts
So, is a probability of P(δ > δ#) = 0.005, i.e. 0.5%, low enough?
[Plot of fδ with δcrit and the observed value δ# inside the rejection region]
Review of probability theory and related basic concepts
$\chi^2 = \sum_{i=1}^{M} \dfrac{(O_i - n \times p_i)^2}{n \times p_i}$
$O_i$ and $n \times p_i$ should be > 5; reduce M if needed
Review of probability theory and related basic concepts
[Histogram of the observed number of storm occurrences per year]
Review of probability theory and related basic concepts
Poisson distribution P(ν) is a discrete probability distribution that gives the probability of k
events occurring in a fixed interval of time and/or space if these events occur with a known
mean rate of occurrence ν and independently of the time since the last event.
$f_P(k) = \dfrac{(\nu t)^k}{k!} e^{-\nu t}$ → with t = 1 year → $f_P(k) = \dfrac{\nu^k}{k!} e^{-\nu}$
Mean: $\mu_K = \nu t$ → with t = 1 year → $\mu_K = \nu$
Review of probability theory and related basic concepts
$m_1 = \dfrac{1}{n}\sum_{i=1}^{n} \hat{x}_i = \dfrac{20 \times 0 + 23 \times 1 + 15 \times 2 + 6 \times 3 + 2 \times 4}{66} = 1.197$
$m_1 = \lambda_1 = \sum_{x=0}^{\infty} x\,\dfrac{\nu^x}{x!}\,e^{-\nu} = \mu = \nu = 1.197$
Review of probability theory and related basic concepts
$L(\theta \mid \hat{x}) = \prod_{i=1}^{n} f_X(\hat{x}_i)$    $l(\theta \mid \hat{x}) = \log L(\theta \mid \hat{x}) = \log \prod_{i=1}^{n} f_X(\hat{x}_i)$
$l(\theta \mid \hat{x}) = \sum_{i=1}^{n} \log f_X(\hat{x}_i)$    $\min\left(-l(\theta \mid \hat{x})\right)$
$f_X(x) = \dfrac{\nu^x}{x!}\,e^{-\nu}$ → $L(\nu \mid \hat{x}) = \prod_{i=1}^{n} \dfrac{\nu^{\hat{x}_i}}{\hat{x}_i!}\,e^{-\nu}$ …
Review of probability theory and related basic concepts
[Bar chart comparing the observed data frequencies with the fitted Poisson distribution]
Review of probability theory and related basic concepts
$f_X(x) = \dfrac{1.197^x}{x!}\,e^{-1.197}$    $\chi^2 = \sum_{i=1}^{M} \dfrac{(O_i - n \times p_i)^2}{n \times p_i}$

Nº of storms per year | Observed frequencies Oi
0 | 20
1 | 23
2 | 15
3 | 6
4 | 2

With M = 5, we have a number of observed frequencies that is lower than 5, so we
need to reduce M: aggregate the cases with 3 and 4 storms per year
Review of probability theory and related basic concepts
$f_X(x) = \dfrac{1.197^x}{x!}\,e^{-1.197}$    $\chi^2 = \sum_{i=1}^{M} \dfrac{(O_i - n \times p_i)^2}{n \times p_i}$

Nº of storms per year | Observed frequencies Oi | Theoretical frequencies n×pi | (Oi − n×pi)² | (Oi − n×pi)²/(n×pi)
0 | 20 | 19.94 | 0.0036 | 0.0002
1 | 23 | 23.87 | 0.7569 | 0.0317
2 | 15 | 14.29 | 0.5041 | 0.0353
≥3 | 8 = 6+2 | 7.90 | 0.0100 | 0.0013
Total | 66 | 66 | | 0.0685
Review of probability theory and related basic concepts
where the theoretical frequencies come from the fitted Poisson model, e.g.:
$n \times f_X(1) = 66 \times \dfrac{1.197^1}{1!}\,e^{-1.197} = 23.87$
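A minimal Python sketch of the test decision for this table; the 2 degrees of freedom assume one fitted parameter (ν), the usual chi-square goodness-of-fit convention:

```python
from scipy import stats

# Chi-square statistic for the storm data (classes 3 and 4 aggregated)
observed = [20, 23, 15, 8]
expected = [19.94, 23.87, 14.29, 7.90]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)                          # approximately 0.0685
# 5% critical value with M - 1 - 1 = 2 dof (one parameter was estimated)
print(stats.chi2.ppf(0.95, df=2))    # approximately 5.99 -> do not reject the Poisson model
```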
Review of probability theory and related basic concepts
$\chi^2 = \sum_{i=1}^{M} \dfrac{(O_i - n \times p_i)^2}{n \times p_i} = 0.0685$
χ² distribution: $p = 1 - F_X(x_p) = 1 - P(X \le x_p) = P(X > x_p)$, table values = $x_p$
$D^- = \max_{1 \le i \le n}\left[F_X(x_{(i)}) - \dfrac{i-1}{n}\right]$    $D^+ = \max_{1 \le i \le n}\left[\dfrac{i}{n} - F_X(x_{(i)})\right]$
$D^-$ is the maximum vertical distance between the cdf of the target distribution FX and the
empirical cdf FnX when FX > FnX; $D^+$ is the maximum vertical distance when FX < FnX
Review of probability theory and related basic concepts
Review of probability theory and related basic concepts
Assume the data follows a normal distribution N(30,4.5) and test this
hypothesis using the KS test and considering a 5% significance level
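A minimal Python sketch of this KS test using scipy; the sample here is a random stand-in for the data in the slide:

```python
import numpy as np
from scipy import stats

# KS test of N(30, 4.5) at a 5% significance level
x = np.random.normal(30.0, 4.5, size=40)       # stand-in sample
D, p_value = stats.kstest(x, "norm", args=(30.0, 4.5))
print(D, p_value)    # reject the N(30, 4.5) hypothesis if p_value < 0.05
```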