Ricco Rakotomalala
Tutoriels Tanagra - http://tutoriels-data-mining.blogspot.fr/ 1
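Calculation of the CDF (cumulative distribution function) and the PPF (percent point function, i.e. the quantile function) in inferential statistics
Via Excel statistical functions (new functions are available from Excel 2010)
Via R’s statistical functions provided by the “stats” package (directly accessible)
Via Python’s statistical functions provided by the “scipy.stats” module (from scipy import stats)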
NORMAL DISTRIBUTION
CDF of the standard normal distribution (μ = 0 and σ = 1).
The probability of a value less than x = 1.65 is 0.9505285.
Probability density function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
EXCEL
NORM.DIST(1.65, 0, 1, TRUE)
Arguments 2 and 3 set (μ = 0) and (σ = 1); both are required. Argument 4 is also required: TRUE returns the CDF, FALSE returns the value of the density function.
R
pnorm(1.65, mean = 0, sd = 1, lower.tail = TRUE)
Python
stats.norm.cdf(1.65, loc = 0, scale = 1)
Calculation of the p-value for the standard normal distribution in a right-tailed test. The probability of a value greater than z = 2.1 is 0.01786442.
EXCEL
1 - NORM.S.DIST(2.1, TRUE)
R
1 - pnorm(2.1)
Python
1 - stats.norm.cdf(2.1)
stats.norm.sf(2.1)
(sf is the survival function: sf = 1 - cdf)
Calculation of the p-value for the standard normal distribution in a two-tailed test. The probability of a value greater than z = 2.1 in absolute value is 0.03572884.
R
2 * pnorm(2.1, lower.tail = FALSE)
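The slide gives only the R call. A possible Python counterpart is sketched below. It uses the standard library's `statistics.NormalDist` so that it runs without extra packages; this choice is an assumption, since the other Python examples in this document rely on `scipy.stats`, where `2 * stats.norm.sf(2.1)` would play the same role. The helper name `two_tailed_p_value` is hypothetical.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic z."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # By symmetry of the normal density, P(|Z| > |z|) = 2 * P(Z < -|z|).
    return 2.0 * std_normal.cdf(-abs(z))


p_value = two_tailed_p_value(2.1)
print(f"{p_value:.8f}")
```

Evaluating the lower tail at -|z| rather than computing 1 - cdf(|z|) avoids a subtraction between two nearly equal numbers when z is large.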
PPF (q) of the standard normal distribution for the probability (1 – α) = 0.95, i.e. α = 0.05: q = 1.644854
EXCEL
NORM.INV(0.95, 0, 1)
NORM.S.INV(0.95)
R
qnorm(0.95,mean=0,sd = 1,lower.tail=TRUE)
qnorm(0.05,mean=0,sd=1,lower.tail=FALSE)
Python
stats.norm.ppf(0.95, loc =0, scale = 1)
Generating random numbers from the standard normal distribution N(μ=0,σ=1)
EXCEL
NORM.S.INV(RAND())
RAND() returns a uniformly distributed random real number greater than or equal to 0 and less than 1.
R
rnorm(n=1,mean=0,sd = 1)
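The Excel formula works by inverse transform sampling: a uniform draw is passed through the inverse CDF. The sketch below mirrors that idea in Python with the standard library only (`random` and `statistics.NormalDist`), which is an assumption made to keep it self-contained; the helper name `standard_normal_draw` and the seed are illustrative.

```python
import random
from statistics import NormalDist


def standard_normal_draw(rng: random.Random) -> float:
    """Draw one N(0, 1) value by inverse transform sampling."""
    u = rng.random()              # uniform on [0, 1), like RAND()
    while u == 0.0:               # the inverse CDF is undefined at 0, so redraw
        u = rng.random()
    return NormalDist().inv_cdf(u)    # same role as NORM.S.INV


rng = random.Random(42)           # fixed seed only to make the run repeatable
sample = [standard_normal_draw(rng) for _ in range(10_000)]
sample_mean = sum(sample) / len(sample)
sample_var = sum((v - sample_mean) ** 2 for v in sample) / (len(sample) - 1)
print(round(sample_mean, 2), round(sample_var, 2))
```

With a sample this large, the mean should land close to 0 and the variance close to 1.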
$$\Phi_1(x) = 1 - \frac{e^{-x^2/2}}{\sqrt{2\pi}} \left[ \frac{0.4361836}{1 + 0.33267x} + \frac{-0.1201676}{(1 + 0.33267x)^2} + \frac{0.9772980}{(1 + 0.33267x)^3} \right]$$
(https://fr.wikipedia.org/wiki/Loi_normale)
$$\Phi_2(x) = 0.5 + \frac{1}{2} \left\{ 1 - \frac{1}{30} \left[ 7e^{-x^2/2} + 16e^{-x^2(2-\sqrt{2})} + \left(7 + \frac{\pi x^2}{4}\right) e^{-x^2} \right] \right\}^{1/2}$$
(http://mathworld.wolfram.com/NormalDistributionFunction.html)
Φ1(1.65) = 0.9494966
Φ2(1.65) = 0.9505364
(Excel, R and Python: 0.9505285)
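The second approximation, Φ2 (the MathWorld formula), can be checked numerically with a few lines of Python. This is a minimal sketch: the function names `phi2` and `exact_cdf` are hypothetical, the reference value is computed from `math.erf`, and the handling of negative x by symmetry is an assumption that is not stated on the slide.

```python
import math


def phi2(x: float) -> float:
    """Approximate the standard normal CDF with the Phi2 formula."""
    x2 = x * x
    inner = (7.0 * math.exp(-x2 / 2.0)
             + 16.0 * math.exp(-x2 * (2.0 - math.sqrt(2.0)))
             + (7.0 + math.pi * x2 / 4.0) * math.exp(-x2))
    # max() guards against a tiny negative value caused by rounding near x = 0.
    half_width = 0.5 * math.sqrt(max(0.0, 1.0 - inner / 30.0))
    # The formula depends on x only through x^2, so the sign of x is
    # restored here (assumption added to cover negative inputs).
    return 0.5 + math.copysign(half_width, x)


def exact_cdf(x: float) -> float:
    """Reference standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


approx = phi2(1.65)
exact = exact_cdf(1.65)
print(f"{approx:.7f} {exact:.7f} {abs(approx - exact):.1e}")
```

The printed difference shows how far the closed-form approximation is from the value returned by Excel, R and Python at x = 1.65.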
STUDENT’S T-DISTRIBUTION
CDF of Student’s t-distribution with k (k > 0) degrees of freedom.
The probability of a value less than t = 1.5 with k = 10 is 0.9177463.
R
pt(1.5,df=10,lower.tail=TRUE)
1 - pt(1.5,df=10,lower.tail=FALSE)
Python
stats.t.cdf(1.5,df=10)
PPF (q) of the Student’s t-distribution with k = 10 degrees of freedom for the probability (1 – α) = 0.95: q = 1.812461
EXCEL
T.INV(0.95,10)
R
qt(0.95,df=10,lower.tail=TRUE)
qt(0.05,df=10,lower.tail=FALSE)
Python
stats.t.ppf(0.95,df=10)
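CDF and PPF for the two-tailed Student’s t-distribution.
EXCEL provides two specific functions.
T.DIST.2T(ABS(1.5),10)
Returns the two-tailed probability, i.e. the combined area below -|1.5| and above +|1.5|.
T.INV.2T(0.1,10)
Returns q = 1.812461, the threshold for a total two-tailed probability α = 0.1 (α/2 = 0.05 in each tail).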
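No Python counterpart is shown for the two-tailed functions. One way to obtain the same quantities with `scipy.stats` is sketched below, assuming `from scipy import stats` as in the other Python examples; the variable names are illustrative.

```python
from scipy import stats

t_value, k = 1.5, 10    # same inputs as the Excel calls

# Two-tailed probability P(|T| > 1.5): twice the upper-tail area,
# the counterpart of T.DIST.2T(ABS(1.5),10).
p_two_tailed = 2 * stats.t.sf(abs(t_value), df=k)

# Two-tailed quantile for alpha = 0.1: each tail holds alpha / 2,
# the counterpart of T.INV.2T(0.1,10).
alpha = 0.1
q = stats.t.ppf(1 - alpha / 2, df=k)

print(p_two_tailed, q)
```

The quantile agrees with the one-tailed call `stats.t.ppf(0.95, df=10)`, because a two-tailed α of 0.1 leaves 0.05 in the upper tail.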
CHI-SQUARED DISTRIBUTION
CDF of the chi-squared distribution with k (k > 0) degrees of freedom.
The probability of a value less than t = 12.0 with k = 5 is 0.9652122.
Probability density function:
$$f_k(t) = \frac{1}{2^{k/2}\,\Gamma\left(\frac{k}{2}\right)} \, t^{\frac{k}{2}-1} e^{-\frac{t}{2}}$$
EXCEL
CHISQ.DIST(12.0,5,TRUE)
1 - CHISQ.DIST.RT(12.0,5)
We can also use the probability of a value greater than t = 12.0 (CHISQ.DIST.RT).
R
pchisq(12.0,df=5)
1 - pchisq(12.0,df=5,lower.tail=FALSE)
Python
stats.chi2.cdf(12.0,df=5)
PPF (q) of the chi-squared distribution with k = 7 degrees of freedom for the probability (1 – α) = 0.95: q = 14.06714
EXCEL
CHISQ.INV(0.95,7)
CHISQ.INV.RT(0.05,7)
R
qchisq(0.95,df=7)
qchisq(0.05,df=7,lower.tail=FALSE)
Python
stats.chi2.ppf(0.95, df=7)
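Excel and R both offer an upper-tail variant (the `.RT` functions and `lower.tail=FALSE`). In `scipy.stats` the corresponding methods are `sf` (survival function) and `isf` (inverse survival function). The sketch below assumes `from scipy import stats`, as in the other Python examples; the variable names are illustrative.

```python
from scipy import stats

# Upper-tail probability P(X > 12.0) with k = 5,
# the counterpart of CHISQ.DIST.RT(12.0,5).
p_upper = stats.chi2.sf(12.0, df=5)

# Upper-tail quantile for alpha = 0.05 with k = 7,
# the counterpart of CHISQ.INV.RT(0.05,7).
q_upper = stats.chi2.isf(0.05, df=7)

print(p_upper, q_upper)
```

Calling `sf` directly is generally preferable to `1 - cdf` far in the tail, where the subtraction loses precision.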
FISHER-SNEDECOR DISTRIBUTION
CDF of the F-distribution with d1 (d1 > 0) and d2 (d2 > 0) degrees of freedom.
The probability of a value less than x = 3.5 with (d1 = 4, d2 = 26) is 0.97948051.
R
pf(3.5,df1=4,df2=26)
1 - pf(3.5,df1=4,df2=26,lower.tail=FALSE)
Python
stats.f.cdf(3.5,dfn=4,dfd=26)
PPF (q) of the F-distribution with (d1 = 4, d2 = 26) degrees of freedom for the probability (1 – α) = 0.95: q = 2.742594
EXCEL
F.INV(0.95,4,26)
F.INV.RT(0.05,4,26)
R
qf(0.95,df1=4,df2=26)
qf(0.05,df1=4,df2=26,lower.tail=FALSE)
Python
stats.f.ppf(0.95,dfn=4,dfd=26)
References
https://docs.scipy.org/doc/scipy/reference/stats.html
http://www.cyclismo.org/tutorial/R/probability.html