SI-5101
Engineering Analysis
Foundation of Fact-based Decision
Making:
Statistics and Probability
BIEMO W. SOEMARDI
b.soemardi@itb.ac.id
AUGUST 2022
Quiz #1
• What is quantitative analysis and what is it for?
• 10 minutes; send your answer via MS Teams
SI-5101 ANALISIS REKAYASA – Ir. Biemo W. Soemardi Ph.D 8/23/2022 2
Outline
Foundation of Fact-based Decision Making: Statistics and Probability
01 Introduction
02 Review: Statistic
03 Review: Probabilistic
01
Introduction
Review:
Statistics & Probability
*) Oxford Dictionaries:
- Fact (noun): A thing that is known or proved to be true
- Decision (noun): A conclusion or resolution reached after consideration
Measurable
Collectable and Storable
Can be analysed and manipulated
Can be visualized
https://gking.harvard.edu/publications/preface-big-data-not-about-data
QUALITATIVE
Nominal: categorical, not rank-ordered (e.g., gender, marital status)
Discrete
Continuous
Ratio: like interval, but with a true zero (e.g., height, weight)
https://microbenotes.com/nominal-ordinal-interval-and-ratio-data/
Value of Data
Data are a set of values of qualitative or quantitative
variables about one or more persons or objects, while
a datum (singular of data) is a single value of a single variable.
Data are collected and analyzed; data become
information suitable for making decisions only once they have been
analyzed in some fashion.
Data quality is the overall utility of a dataset as a function
of its ability to be easily processed and analyzed for other uses.
Review
Statistic
Statistics - Basic
o Statistics is the discipline that concerns the
collection, organization, analysis, interpretation
and presentation of data.
o Quality of Data depends on how it is collected
• Population
• Sample
• Sampling
https://upload.wikimedia.org/wikipedia/commons/2/21/Iris_Pairs_Plot.png
Sampling Method
• Non-Probabilistic
• Probabilistic
R²                  Strength of Relationship
R² < 0.20           Weak
0.20 ≤ R² < 0.40    Moderate
0.40 ≤ R² < 0.65    Strong
R² ≥ 0.65           Very strong
Regression Analysis
Pearson's Correlation Coefficients

City         Population   Income
Jakarta      8,821,000    50,000,000
Bandung      2,395,000    31,250,000
Yogyakarta     388,500    25,165,000
Semarang     1,556,000    31,250,000
Surabaya     2,765,000    39,600,000
Tangerang      925,000    30,210,000

$$r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$

Worked values (X = population, Y = income, both in millions):

        X        Y        X²       Y²         XY
        8,82     50,00    77,81    2.500,00   441,05
        2,40     31,25     5,74      976,56    74,84
        0,39     25,17     0,15      633,28     9,78
        1,56     31,25     2,42      976,56    48,63
        2,77     39,60     7,65    1.568,16   109,49
        0,93     30,21     0,86      912,64    27,94
Σ      16,85    207,48    94,62    7.567,21   711,73

With N = 6, the column sums give r ≈ 0,95
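The Pearson formula can be checked directly against the city table. The following is a minimal sketch; the function name `pearson_r` and the dictionary layout are illustrative choices, not part of the slides. Because r is scale-invariant, the raw values can be used instead of millions.

```python
from math import sqrt

# City data from the table: (population, income)
cities = {
    "Jakarta": (8_821_000, 50_000_000),
    "Bandung": (2_395_000, 31_250_000),
    "Yogyakarta": (388_500, 25_165_000),
    "Semarang": (1_556_000, 31_250_000),
    "Surabaya": (2_765_000, 39_600_000),
    "Tangerang": (925_000, 30_210_000),
}


def pearson_r(xs, ys):
    """Pearson's r using the sums-based formula from the slide."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x**2 / n) * (sum_y2 - sum_y**2 / n))
    return numerator / denominator


population = [p for p, _ in cities.values()]
income = [i for _, i in cities.values()]
r = pearson_r(population, income)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

Squaring r gives the R² value that the strength-of-relationship table refers to.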
Regression
Analysis
Spearman Rank Correlation Coefficients
Spearman's rank correlation coefficient, or
Spearman's ρ (rs), is a nonparametric measure of
rank correlation (statistical dependence between
the rankings of two variables).
Regression Analysis
Spearman Rank Correlation Coefficients

Income       Hours of Work   Rank (Income)   Rank (Hours)    d     d²
2.750.000    40               5               8              -3      9
4.525.000    45              10              12              -2      4
1.757.500    30               2               3              -1      1
2.115.000    43               3              11              -8     64
3.557.500    36               7               6               1      1
2.125.700    35               4               5              -1      1
4.212.550    38               8               7               1      1
5.121.500    42              11              10               1      1
3.520.150    28               6               1               5     25
5.215.250    29              12               2              10    100
4.254.550    41               9               9               0      0
1.151.500    32               1               4              -3      9
                                                     Σd² =         216

rs = 0,245
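The rank table can be reproduced with a short script. This sketch assumes the standard no-ties formula rs = 1 − 6Σd² / (n(n² − 1)), which the slides do not print but which is consistent with Σd² = 216 and n = 12; the helper names are illustrative.

```python
# Income and hours-of-work columns from the table, in row order
income = [2_750_000, 4_525_000, 1_757_500, 2_115_000, 3_557_500, 2_125_700,
          4_212_550, 5_121_500, 3_520_150, 5_215_250, 4_254_550, 1_151_500]
hours = [40, 45, 30, 43, 36, 35, 38, 42, 28, 29, 41, 32]


def ranks(values):
    """Rank 1 = smallest value. Assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman's rank correlation via the no-ties shortcut formula."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))


sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(income), ranks(hours)))
rs = spearman_rs(income, hours)
print(f"sum of d^2 = {sum_d2}, rs = {rs:.3f}")
```

If the data contained ties, average ranks and the general correlation formula would be needed instead of this shortcut.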
Multiple Regression
Analysis
Technique used for predicting the unknown value of a (dependent) variable from
the known values of two or more (independent) variables (called the predictors).
Multiple regression analysis helps us to predict the value of Y for given values of
X1, X2, …, Xk.
Y = b0 + b1 X1 + b2 X2 + … + bk Xk
Y = productivity
X1 = location
X2 = size of crew
X3 = timing
X4 = management
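As a rough illustration of how the coefficients b0, b1, …, bk can be estimated, the sketch below fits Y = b0 + b1 X1 + b2 X2 by ordinary least squares using the normal equations. Least squares is an assumption here (the slides do not name an estimation method), and the observations are hypothetical, generated from known coefficients so the result can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(xs, ys):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 carries the intercept b0
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations: (X1 = size of crew, X2 = a timing score)
predictors = [(2, 1), (3, 5), (4, 2), (5, 7), (6, 3), (7, 8)]
# Hypothetical responses generated from Y = 1 + 2*X1 + 3*X2
productivity = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

b0, b1, b2 = fit_multiple_regression(predictors, productivity)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Categorical predictors such as location or management would need to be encoded numerically (for example as indicator variables) before being used this way.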
Chi Squared Test

Observed
         Male    Female   Total
Pass     17      20       37
Fail      8       5       13
Total    25      25       50

Expected
         Male    Female   Total
Pass     18,5    18,5     37
Fail      6,5     6,5     13
Total    25      25       50
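The expected counts follow from the row and column totals (row total × column total / grand total), and the test statistic sums (observed − expected)² / expected over all cells. A minimal sketch using the observed pass/fail counts; the variable names are illustrative.

```python
# Observed counts from the table
observed = {
    "Pass": {"Male": 17, "Female": 20},
    "Fail": {"Male": 8, "Female": 5},
}

row_totals = {row: sum(cols.values()) for row, cols in observed.items()}
col_totals = {}
for cols in observed.values():
    for col, count in cols.items():
        col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total
expected = {
    row: {col: row_totals[row] * col_totals[col] / grand_total for col in col_totals}
    for row in observed
}

chi_squared = sum(
    (observed[row][col] - expected[row][col]) ** 2 / expected[row][col]
    for row in observed
    for col in col_totals
)
degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

print(expected)
print(f"chi-squared = {chi_squared:.4f}, df = {degrees_of_freedom}")
```

The statistic is then compared with a tabulated critical value for the chosen significance level; for 1 degree of freedom at the 5% level that value is commonly given as 3.841.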
https://howecoresearch.blogspot.com/2019/01/using-analysis-of-variance-anova-in.html
ANOVA
• The averages of the groups are
not significantly different
https://howecoresearch.blogspot.com/2019/01/using-analysis-of-variance-anova-in.html
ANOVA
Sum Squared Regression Error
$$SS_R = \sum_i (y_i - y_R)^2$$
where $y_i$ = data point and $y_R$ = regression value
Two-way ANOVA
Has two independent variables
[Figure: two sample distributions, mean-1 and mean-2, and the difference between the pair]
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
where s = standard deviation of the sample and n = sample size
• The data used must be continuous or ordinal, randomly selected from the population
• Population is infinite, data distribution is normal (bell shape), and variance is unknown but homogeneous.
https://serc.carleton.edu/introgeo/teachingwdata/Ttest.html
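A small sketch of the one-sample t statistic, t = (x̄ − μ0) / (s / √n). The sample values and the hypothesized mean μ0 are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements and hypothesized population mean
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
mu_0 = 5.0

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)  # sample standard deviation (divides by n - 1)

t = (x_bar - mu_0) / (s / sqrt(n))
print(f"x_bar = {x_bar:.3f}, s = {s:.4f}, t = {t:.4f}, df = {n - 1}")
```

The resulting t is compared with a tabulated Student t value for n − 1 degrees of freedom.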
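Measure of goodness
Student t-Test
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\mathrm{var}_1}{n_1} + \frac{\mathrm{var}_2}{n_2}}}$$
where $\bar{x}_1, \bar{x}_2$ = sample means, $\mathrm{var}_1, \mathrm{var}_2$ = sample variances, and $n_1, n_2$ = sample sizes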
Review
Probabilistic
Waiting for ..
Introduction
Possibility (Indonesian: kemungkinan): degree of preciseness of certainty

Statement: "There's a possibility that the …"
Meaning:
• There's some uncertainty or doubt about the train's time of arrival.
• There is doubt that it will arrive on time and a chance it will arrive late.
• Both events can occur. In fact, the train could arrive early.
John Tsitsiklis, and Patrick Jaillet. RES.6-012 Introduction to Probability. Spring 2018. Massachusetts
Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu.
$$p(x_i) = \frac{x_i}{\sum x_i}$$
Probability Rules:
• 0 ≤ p(event) ≤ 1
• Sum of all p(event) = 1
Events:
• Mutually exclusive → only one of the events can occur on any one trial
• Collectively exhaustive → the outcomes include all possible outcomes
$$\mu_X = E(X) = \sum_{i=1}^{n} x_i f(x_i)$$
$$\sigma_X^2 = Var(X) = E\left[(X - \mu_X)^2\right]$$
The square root of Var(X) is the standard deviation of X
Var(X) can alternatively be written in terms of a weighted sum of squared
deviations, because
$$E\left[(X - \mu_X)^2\right] = \sum_{i=1}^{n} (x_i - \mu_X)^2 f(x_i)$$
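The expectation and variance formulas can be evaluated directly for any discrete distribution. The sketch below uses a fair six-sided die as a hypothetical example (it is not from the slides) and exact fractions to avoid rounding.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical distribution: fair six-sided die, f(x_i) = 1/6 for each face
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum of x_i * f(x_i)
expected_value = sum(x * f for x, f in pmf.items())

# Var(X) = sum of (x_i - mu)^2 * f(x_i), the weighted sum of squared deviations
variance = sum((x - expected_value) ** 2 * f for x, f in pmf.items())

std_dev = sqrt(variance)
print(expected_value, variance, round(std_dev, 4))
```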
π ≈ 3.14159
e ≈ 2.71828
$$F = \frac{X_1 / df_1}{X_2 / df_2}$$
$$P(X = k) = \binom{n}{k} p^k q^{n-k}$$
μx = mean
σx = standard deviation
n = number of trials
k = number of successes
p = probability of success (outcome 1), in [0, 1]
q = probability of failure (outcome 0), in [0, 1]; q = 1 - p
Binomial distribution:
Cumulative Distribution Function (CDF)
$$P(X \le k) = \sum_{j=0}^{k} \binom{n}{j} p^j (1 - p)^{n-j}$$
Binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
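The binomial CDF is a running sum of the probability mass function, which makes it easy to sketch in code. The function names and the example values (n = 10 trials, p = 0.5) are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): sum of the pmf for j = 0, 1, ..., k."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))


# Hypothetical example: at most 5 successes in 10 trials with p = 0.5
n_trials, p_success = 10, 0.5
print(binomial_pmf(5, n_trials, p_success))
print(binomial_cdf(5, n_trials, p_success))
```

`math.comb(n, k)` evaluates the binomial coefficient n! / (k!(n − k)!) without computing the factorials separately.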
Mean, $\mu = \frac{1}{\lambda}$
Variance, $var = \frac{1}{\lambda^2}$
λ = rate parameter
$$F(x; \lambda) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
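These expressions have the form of the exponential distribution with rate λ. A minimal sketch of the CDF together with the mean and variance; the rate value is hypothetical.

```python
from math import exp


def exponential_cdf(x, lam):
    """F(x; lam) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    if x < 0:
        return 0.0
    return 1.0 - exp(-lam * x)


rate = 2.0  # hypothetical rate parameter, e.g. 2 events per unit time
mean = 1 / rate
variance = 1 / rate**2

print(mean, variance)
print(exponential_cdf(1.0, rate))   # probability the waiting time is at most 1
print(exponential_cdf(-1.0, rate))  # negative x carries no probability
```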
$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$
λ = rate parameter
e = Euler's number, 2.71828
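This is the Poisson probability mass function. The sketch below evaluates it for a hypothetical rate of λ = 3 events per interval and checks that the probabilities sum to approximately 1.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(x) = lam^x * exp(-lam) / x! for x = 0, 1, 2, ..."""
    return lam**x * exp(-lam) / factorial(x)


rate = 3.0  # hypothetical rate parameter
for count in range(5):
    print(count, round(poisson_pmf(count, rate), 4))

# The probabilities over all counts should sum to 1 (truncated at 60 terms here)
total = sum(poisson_pmf(x, rate) for x in range(60))
print(round(total, 6))
```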
Conditional Probability
Statistical dependence between p(A) and p(B): p(A|B) = p(AB)/p(B)
[Figure: DATA, p(x|c)]
Supplier      % Good   % Bad
Supplier A1   98       2
Supplier A2   95       5
• Let G denote the event that a part is good and B denote the event that a part is bad.
Then, we have the following conditional probabilities:
P(G | A1) = 0.98 and P(B | A1) = 0.02
P(G | A2) = 0.95 and P(B | A2) = 0.05
Tree diagram of suppliers and part quality:
P(A1) → P(G | A1) → (A1, G): P(A1 ∩ G) = P(A1) P(G | A1)
P(A1) → P(B | A1) → (A1, B): P(A1 ∩ B) = P(A1) P(B | A1)
P(A2) → P(G | A2) → (A2, G): P(A2 ∩ G) = P(A2) P(G | A2)
P(A2) → P(B | A2) → (A2, B): P(A2 ∩ B) = P(A2) P(B | A2)
P(A1, G) = P(A1 ∩ G) = P(A1) P(G | A1)    (1)
P(A1 ∩ B) = P(A1) P(B | A1)    (3)
P(B) = P(A1 ∩ B) + P(A2 ∩ B)    (4)
P(B) = P(A1) P(B | A1) + P(A2) P(B | A2)
$$P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \dots + P(A_n)\,P(B \mid A_n)}$$
$$P(A_2 \mid B) = \frac{P(A_2)\,P(B \mid A_2)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2)} = \frac{(.35)(.05)}{(.65)(.02) + (.35)(.05)} = \frac{.0175}{.0305} = .5738$$
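The supplier calculation can be written as a short script that applies the total probability rule and then Bayes' theorem. The priors P(A1) = .65 and P(A2) = .35 and the defect rates are taken from the worked example; the variable names are illustrative.

```python
# Prior probability that a part comes from each supplier
prior = {"A1": 0.65, "A2": 0.35}
# Conditional probability that a part is bad, given its supplier
p_bad_given = {"A1": 0.02, "A2": 0.05}

# Total probability of a bad part: P(B) = sum of P(Ai) * P(B | Ai)
p_bad = sum(prior[s] * p_bad_given[s] for s in prior)

# Bayes' theorem: P(Ai | B) = P(Ai) * P(B | Ai) / P(B)
posterior = {s: prior[s] * p_bad_given[s] / p_bad for s in prior}

print(f"P(B) = {p_bad:.4f}")
for supplier, prob in posterior.items():
    print(f"P({supplier} | B) = {prob:.4f}")
```

Although supplier A2 provides only 35% of the parts, it accounts for more than half of the bad ones because its defect rate is higher.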
P(A2) = .35
P(G | A2) = .95: P(A2 ∩ G) = P(A2) P(G | A2) = .3325
P(B | A2) = .05: P(A2 ∩ B) = P(A2) P(B | A2) = .0175