
PROBABILITY AND DATA ANALYSIS - DATA SCIENCE AND ENGINEERING

RANDOM VARIABLES
DISCRETE RANDOM VARIABLE
PROBABILITY FUNCTION
0 ≤ P[X = x] ≤ 1        ∑_x P[X = x] = 1
P[X > x] = 1 − P[X ≤ x]        P[X ≤ x] = ∑_{y ≤ x} P[X = y]

DISTRIBUTION FUNCTION (c.d.f.)


0 ≤ F(x) ≤ 1        If x₁ ≤ x₂, then F(x₁) ≤ F(x₂)
F(y) = 0 ∀ y < min S and F(y) = 1 ∀ y > max S (S denotes the support of X);
hence F(−∞) = 0 and F(∞) = 1
∀ a, b ∈ ℝ:  P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a)

EXPECTATION of a D.R.V.:  E[X] = ∑ᵢ xᵢ pᵢ

E[a + bX] = a + b·E[X]        E[g(X)] = ∑_x g(x) P(X = x)

VARIANCE of a D.R.V.:  V[X] = E[X²] − E[X]²        V[a + bX] = b² V[X]
STANDARD DEVIATION of a D.R.V.:  S[X] = √V[X]
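
As a quick numeric illustration (not part of the original sheet), these three quantities can be computed directly from a probability table; a minimal NumPy sketch with a made-up pmf:

```python
import numpy as np

# Hypothetical d.r.v.: support and probabilities (must sum to 1)
x = np.array([0, 1, 2, 3])
p = np.array([0.1, 0.2, 0.3, 0.4])
assert np.isclose(p.sum(), 1.0)

E = np.sum(x * p)             # E[X] = sum of x_i * p_i
V = np.sum(x**2 * p) - E**2   # V[X] = E[X^2] - E[X]^2
S = np.sqrt(V)                # S[X] = sqrt(V[X])
print(E, V, S)                # 2.0 1.0 1.0
```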

CONTINUOUS RANDOM VARIABLE


DISTRIBUTION FUNCTION
0 ≤ F(x) ≤ 1        If x₁ ≤ x₂, then F(x₁) ≤ F(x₂)
F(−∞) = 0        F(∞) = 1
∀ a, b ∈ ℝ:  P(a ≤ X ≤ b) = F(b) − F(a)        F(x) is continuous

The probability mass function has no meaning for a continuous r.v. because
P(X = x) = 0. In its place we use the density function:
DENSITY FUNCTION
f(x) ≥ 0 ∀ x ∈ ℝ        P(a ≤ X ≤ b) = ∫_a^b f(x) dx   ∀ a, b ∈ ℝ

F(x) = P(X ≤ x) = ∫_{−∞}^x f(u) du        ∫_{−∞}^∞ f(x) dx = 1

EXPECTATION of a C.R.V.:  E[X] = ∫_S x f(x) dx

E[a + bX] = a + b·E[X]        E[g(X)] = ∫_S g(x) f(x) dx

VARIANCE of a C.R.V.:  V[X] = E[X²] − E[X]²        V[a + bX] = b² V[X]
STANDARD DEVIATION of a C.R.V.:  S[X] = √V[X]
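
The continuous analogues replace sums with integrals; a sketch using numerical integration, assuming SciPy is available and taking the made-up density f(x) = 2x on [0, 1]:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: 2 * x                          # hypothetical density on [0, 1]

EX, _  = quad(lambda x: x * f(x), 0, 1)      # E[X]   = 2/3
EX2, _ = quad(lambda x: x**2 * f(x), 0, 1)   # E[X^2] = 1/2
V = EX2 - EX**2                              # V[X]   = 1/18
print(EX, V, np.sqrt(V))
```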

CHEBYSHEV’S INEQUALITY
The inequality bounds the probability that a random variable deviates from its
expectation by at least k, using only the expectation (E[X]) and the variance (V[X]):

P(|X − E[X]| ≥ k) ≤ V[X]/k²        or equivalently        P(|X − E[X]| < k) ≥ 1 − V[X]/k²
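
A simulation check of the bound (illustrative only; the exponential distribution and k = 3 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # E[X] = 2, V[X] = 4
k = 3.0

empirical = np.mean(np.abs(x - 2.0) >= k)      # estimate of P(|X - E[X]| >= k)
bound = 4.0 / k**2                             # V[X] / k^2
print(empirical, "<=", bound)                  # Chebyshev holds (often loosely)
```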

BIVARIATE RANDOM VARIABLE


JOINT DISTRIBUTION FUNCTION F(x, y)
F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du

JOINT DENSITY FUNCTION f(x, y)


f(x, y) ≥ 0 ∀ x, y ∈ ℝ        ∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y) dx dy = 1

P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx

MARGINAL DENSITY FUNCTION


f_X(x) = ∫_{−∞}^∞ f(x, y) dy        f_Y(y) = ∫_{−∞}^∞ f(x, y) dx

E[X] = ∫_{−∞}^∞ x f_X(x) dx        E[Y] = ∫_{−∞}^∞ y f_Y(y) dy

CONDITIONAL DENSITY DISTRIBUTION


f_{Y|X}(y | X = x₀) = f(x₀, y) / f_X(x₀)

COVARIANCE

Cov[X, Y] = E[XY] − E[X]·E[Y]

Independent r.v.:  Cov[X, Y] = 0  (the converse does not hold in general)

VARIANCE

Independent r.v.:  V[aX + bY] = a² V[X] + b² V[Y]

Dependent r.v.:  V[aX + bY] = a² V[X] + b² V[Y] + 2ab·Cov[X, Y]

CORRELATION

Corr[X, Y] = Cov[X, Y] / √(V[X]·V[Y])
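
Sample covariance and correlation can be checked with NumPy (synthetic data; the linear relation y = 2x + noise is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000)
y = 2 * x + rng.normal(size=1_000)   # y depends linearly on x

cov = np.cov(x, y)[0, 1]             # sample covariance
corr = np.corrcoef(x, y)[0, 1]       # Cov / sqrt(V[X] V[Y])
print(cov, corr)                     # corr near 2/sqrt(5) ≈ 0.894
```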

DISTRIBUTION MODELS
DISCRETE R.V.: Bernoulli, binomial, geometric, Poisson
CONTINUOUS R.V.: Uniform, exponential, normal

BERNOULLI MODEL (D)


This probability model describes an experiment with two possible outcomes,
“success” or “failure”:

X ~ Ber(p)        X = 1 if success, X = 0 if failure

Let p ∈ [0, 1] denote the success probability: P(X = 1) = p, P(X = 0) = 1 − p.

 PROB. MASS FUNCTION:  p(x) = p if x = 1;  1 − p if x = 0

 DISTRIBUTION FUNCTION:  F(x) = 0 if x < 0;  1 − p if 0 ≤ x < 1;  1 if x ≥ 1

EXPECTATION:  E[X] = p
VARIANCE:  V[X] = p(1 − p)
ST. DEVIATION:  S[X] = √(p(1 − p))

BINOMIAL MODEL (D)


This model describes the total number of successes in n identical Bernoulli
experiments repeated independently.

X ~ Bin(n, p)        x ∈ {0, 1, 2, …, n}

The random variable represents the number of successes and follows a binomial
distribution (p ∈ [0, 1]).

 PROB. MASS FUNCTION:  P(X = k) = C(n, k) p^k (1 − p)^(n−k)   for k = 0, 1, …, n

        where C(n, k) = n! / (k!(n − k)!)

EXPECTATION:  E[X] = np

VARIANCE:  V[X] = np(1 − p)

ST. DEVIATION:  S[X] = √(np(1 − p))
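
These formulas match what scipy.stats provides; a sketch, assuming SciPy and arbitrary n, p (the Bernoulli model is the n = 1 special case):

```python
from scipy.stats import binom

n, p = 10, 0.3
X = binom(n, p)
print(X.pmf(3))           # P(X = 3) = C(10, 3) 0.3^3 0.7^7
print(X.cdf(3))           # P(X <= 3)
print(X.mean(), X.var())  # np = 3.0, np(1 - p) = 2.1
```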



GEOMETRIC MODEL (D)


The random variable denotes the number of trials until the first success.

X ~ G(p)        x ∈ {1, 2, 3, …}

 PROB. MASS FUNCTION:  P(X = k) = (1 − p)^(k−1) p   for k = 1, 2, 3, …

EXPECTATION:  E[X] = 1/p
VARIANCE:  V[X] = (1 − p)/p²
ST. DEVIATION:  S[X] = √(1 − p)/p
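
scipy.stats.geom uses this same convention (support starting at k = 1); a quick check with a made-up p:

```python
from scipy.stats import geom

p = 0.2
X = geom(p)               # number of trials until the first success
print(X.pmf(4))           # (1 - p)^3 * p = 0.1024
print(X.mean(), X.var())  # 1/p = 5.0, (1 - p)/p^2 = 20.0
```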

POISSON DISTRIBUTION (D)


Expresses the probability of a given number of events occurring in a fixed interval
of time or space (area, volume…), given their average rate (λ).

X ~ Poi(λ)        x ∈ {0, 1, 2, …}

If X ~ Poi(λ₁) and Y ~ Poi(λ₂) are independent, then X + Y ~ Poi(λ₁ + λ₂).

 PROB. MASS FUNCTION:  P(X = k) = e^(−λ) λ^k / k!   for k = 0, 1, 2, …

EXPECTATION:  E[X] = λ
VARIANCE:  V[X] = λ
ST. DEVIATION:  S[X] = √λ
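
A sketch with scipy.stats, including a simulation of the additivity property (the λ values are arbitrary):

```python
import numpy as np
from scipy.stats import poisson

lam = 3.5
X = poisson(lam)
print(X.pmf(2))                  # e^(-3.5) 3.5^2 / 2!
print(X.mean(), X.var())         # both equal lambda

# Additivity: X + Y ~ Poi(lam1 + lam2) for independent X, Y
rng = np.random.default_rng(2)
s = rng.poisson(1.5, 100_000) + rng.poisson(2.0, 100_000)
print(s.mean())                  # close to 1.5 + 2.0 = 3.5
```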

UNIFORM DISTRIBUTION (C)


For the uniform distribution every set of the same length has the same probability.
A continuous r.v. X follows a uniform distribution over the interval (a, b) if:

X ~ 𝒰(a, b)        f(x) = 1/(b − a) if a < x ≤ b;  0 otherwise

EXPECTATION:  E[X] = (a + b)/2
VARIANCE:  V[X] = (b − a)²/12
ST. DEVIATION:  S[X] = (b − a)/√12
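
In scipy.stats the interval is given as loc and scale = b − a; an illustrative check with arbitrary endpoints:

```python
from scipy.stats import uniform

a, b = 2.0, 6.0
X = uniform(loc=a, scale=b - a)   # supported on [a, b]
print(X.pdf(3.0))                 # 1/(b - a) = 0.25
print(X.cdf(5.0))                 # (5 - a)/(b - a) = 0.75
print(X.mean(), X.var())          # (a + b)/2 = 4.0, (b - a)^2/12 ≈ 1.33
```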

EXPONENTIAL DISTRIBUTION (C)


The distance between successive events in a Poisson process follows an
exponential distribution with parameter λ.

X ~ Exp(λ)        x ∈ [0, ∞)

 DENSITY FUNCTION:  f(x) = λ e^(−λx) if x ≥ 0;  0 if x < 0

 CUMULATIVE DISTRIB. FUNCTION:  F(x) = 1 − e^(−λx) if x ≥ 0;  0 if x < 0

EXPECTATION:  E[X] = λ⁻¹

VARIANCE:  V[X] = λ⁻²
ST. DEVIATION:  S[X] = λ⁻¹

LACK OF MEMORY PROPERTY: Given x₁, x₂ > 0:

P(X > x₁ + x₂ | X > x₁) = P(X > x₂)
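
A numeric check of the c.d.f., the moments, and the memoryless property (scipy.stats parameterizes the exponential by scale = 1/λ; the λ, x₁, x₂ below are arbitrary):

```python
from scipy.stats import expon

lam = 0.5
X = expon(scale=1 / lam)          # scale = 1/lambda
print(X.cdf(2.0))                 # 1 - e^(-lambda * 2)
print(X.mean(), X.var())          # 1/lambda = 2.0, 1/lambda^2 = 4.0

# Lack of memory: P(X > x1 + x2 | X > x1) equals P(X > x2)
x1, x2 = 1.0, 3.0
print(X.sf(x1 + x2) / X.sf(x1))   # sf(x) = P(X > x)
print(X.sf(x2))                   # same value
```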

NORMAL OR GAUSSIAN DISTRIBUTION (C)


Models the measurement errors of a certain continuous quantity. The r.v. follows
a normal or Gaussian distribution with parameters μ and σ.

X ~ N(μ, σ)        μ ∈ ℝ and σ ∈ ℝ⁺

 DENSITY FUNCTION:  f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²))

EXPECTATION:  E[X] = μ        VARIANCE:  V[X] = σ²        ST. DEVIATION:  S[X] = σ

CHEBYSHEV’S INEQ.:  P(|X − μ| < k) = P(μ − k < X < μ + k) ≥ 1 − σ²/k²

Therefore, if k = cσ  ⟹  P(μ − cσ < X < μ + cσ) ≥ 1 − 1/c²

LINEAR TRANS.: If X ~ 𝒩(μ, σ) and Y = a + bX, then:  Y ~ 𝒩(a + bμ, |b|σ)

EXPECTATION:  E[Y] = a + b·E[X]

VARIANCE:  V[Y] = b² V[X]        ST. DEVIATION:  S[Y] = |b|σ

STANDARDIZATION: If X ~ 𝒩(μ, σ) it is possible to consider the standardized r.v.:

Z = (X − μ)/σ = −μ/σ + (1/σ)X ~ 𝒩(0, 1)
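
Standardization in practice: a probability for X ~ N(μ, σ) can be computed directly or through Z ~ N(0, 1); a sketch with made-up μ and σ:

```python
from scipy.stats import norm

mu, sigma = 10.0, 2.0
X = norm(loc=mu, scale=sigma)
Z = norm(0, 1)

# P(8 < X < 12), directly and via Z = (X - mu)/sigma
direct = X.cdf(12) - X.cdf(8)
via_z = Z.cdf((12 - mu) / sigma) - Z.cdf((8 - mu) / sigma)
print(direct, via_z)              # both ≈ 0.6827 (the one-sigma band)
```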

CENTRAL LIMIT THEOREM (CLT)


Let X₁, X₂, …, Xₙ be a set of independent random variables with E[Xᵢ] = μᵢ and
V[Xᵢ] = σᵢ². Then, for n large enough (n → ∞):

X₁ + X₂ + ⋯ + Xₙ ~ 𝒩(∑_{i=1}^n μᵢ, √(∑_{i=1}^n σᵢ²))        The approximation is usually considered good for n > 30

As a particular case, let X₁, X₂, …, Xₙ be a set of independent and identically
distributed random variables:

X̄ = (1/n) ∑_{i=1}^n Xᵢ        For n large, the distribution of X̄ is Gaussian
independently of the distribution of X:  (X̄ − μ)/(σ/√n) ~ 𝒩(0, 1)

APPROXIMATIONS WITH THE CLT
o BINOMIAL: Let X ~ Bin(n, p) with n large enough, then:

X ~ 𝒩(np, √(np(1 − p)))  ⟺  (X − np)/√(np(1 − p)) ~ 𝒩(0, 1)

o POISSON: Let X ~ Poi(λ) with λ > 5; then it can be approximated by:

X ~ 𝒩(λ, √λ)  ⟺  (X − λ)/√λ ~ 𝒩(0, 1)
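
A numeric comparison of the binomial probability with its normal approximation (n and p are arbitrary; the 0.5 added below is a continuity correction, a refinement not covered on this sheet):

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 200, 0.3
exact = binom(n, p).cdf(70)       # P(X <= 70), exact
approx = norm(n * p, np.sqrt(n * p * (1 - p))).cdf(70.5)
print(exact, approx)              # close for large n
```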

LINEAR REGRESSION
REGRESSION MODEL: A model that describes the effect of a variable X on a
variable Y; in other words, we want to describe or forecast the behavior of Y
as a function of X.
X ≡ Independent or explanatory or exogenous variable
Y ≡ Dependent or response or endogenous variable

TYPES OF RELATIONSHIPS
 Deterministic: Given a value of X, the value of Y can be perfectly identified:
   y = f(x)
 Nondeterministic: Given X, the value of Y cannot be perfectly known:
   y = f(x) + u
 Linear: When the function f(x) is linear:
   f(x) = β₀ + β₁x
   If β₁ > 0 ⇒ Positive linear rel.        If β₁ < 0 ⇒ Negative linear rel.
 Nonlinear: When f(x) is nonlinear. Examples: f(x) = log x, f(x) = x², …
 Lack of relationship: When f(x) = 0.

MEASURES OF LINEAR DEPENDENCE


cov(x, y) = (∑_{i=1}^n xᵢyᵢ − n x̄ ȳ) / (n − 1)

Cov > 0 → Positive linear relationship
Cov < 0 → Negative linear relationship
Cov ≈ 0 → No linear relationship

r(x,y) = cor(x, y) = cov(x, y) / (S_X S_Y)        S_X² = ∑_{i=1}^n (xᵢ − x̄)² / (n − 1)        S_Y² = ∑_{i=1}^n (yᵢ − ȳ)² / (n − 1)

−1 ≤ cor(x, y) ≤ 1        cor(x, y) = cor(y, x)

LINEAR REGRESSION MODEL

The simple linear regression model assumes that:  Yᵢ = β₀ + β₁xᵢ + uᵢ
Where β₀ (intercept) and β₁ (slope) are the population coefficients and uᵢ is an
error. The parameters that we need to estimate are β₀, β₁, σ² in order to obtain the
regression line:  ŷ = β̂₀ + β̂₁x.
The residual is:  eᵢ = yᵢ − ŷᵢ

MODEL ASSUMPTIONS
 Linearity: The relationship between X and Y is linear: f(x) = β₀ + β₁x
 Homogeneity: The errors have mean zero: E[uᵢ] = 0
 Homoscedasticity: The variance of the errors is constant: Var(uᵢ) = σ²
 Independence: The errors are independent: E[uᵢuⱼ] = 0 (not time series)
 Normality: The errors follow a normal distribution: uᵢ ~ 𝒩(0, σ²)

LEAST SQUARES ESTIMATORS (LSE)


Proposed by Gauss, this method minimizes the sum of squares of the residuals:
Min ∑_{i=1}^n uᵢ² = ∑_{i=1}^n (yᵢ − ŷᵢ)² = ∑_{i=1}^n (yᵢ − (β̂₀ + β̂₁xᵢ))²

The resulting estimators are:

β̂₁ = cov(x, y)/S_X² = ∑_{i=1}^n (xᵢ − x̄)(yᵢ − ȳ) / ∑_{i=1}^n (xᵢ − x̄)²        β̂₀ = ȳ − β̂₁x̄
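
A sketch of these estimators on synthetic data (the true coefficients β₀ = 1, β₁ = 2 are made up), including the R² defined in the next subsection:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 50)   # y = beta0 + beta1*x + u

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
r2 = 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)
print(b0, b1, r2)                          # estimates near 1, 2; R^2 near 1
```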

COEFFICIENT OF DETERMINATION, R-SQUARED


It is used to assess the goodness-of-fit of the model. It is defined as:

R² = r²(x,y) = cor(x, y)²  ⟹  0 ≤ R² ≤ 1
The closer R² is to 1, the better the fit.

MULTIPLE LINEAR REGRESSION MODEL


It is used to predict the value of a response Y from the values of k explanatory
variables. The least-squares fit:

1. We have n observations for i = 1, …, n:  yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ⋯ + βₖxᵢₖ + uᵢ

2. We wish to fit the data in the form:  ŷᵢ = β̂₀ + β̂₁xᵢ₁ + β̂₂xᵢ₂ + ⋯ + β̂ₖxᵢₖ

MODEL IN MATRIX FORM


We can write the model as a matrix relationship:  y = Xβ + u,  where
y ≡ response vector;   X ≡ matrix of explanatory variables;
β ≡ vector of parameters;   u ≡ error vector

LEAST-SQUARES ESTIMATION
The least-squares vector parameter estimate β̂ solves the normal equations:
(XᵀX) β̂ = Xᵀy  ⟹  β̂ = (XᵀX)⁻¹ Xᵀy
The vector ŷ is given by:  ŷ = Xβ̂

VARIANCE ESTIMATION
An estimator for the error variance is the residual (quasi-)variance:
S_R² = ∑_{i=1}^n eᵢ² / (n − k − 1)

ANOVA DECOMPOSITION

SST = SSM + SSE

SST = ∑_{i=1}^n (yᵢ − ȳ)²        SSE = ∑_{i=1}^n (yᵢ − ŷᵢ)²        SSM = ∑_{i=1}^n (ŷᵢ − ȳ)²

Coeff. of determination:  R² = SSM/SST = 1 − SSE/SST
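
Putting the matrix form together: a sketch that solves the normal equations, then computes the residual (quasi-)variance and the ANOVA-based R² (data and coefficients are synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 2
Z = rng.normal(size=(n, k))                    # two explanatory variables
y = 3.0 + 1.5 * Z[:, 0] - 0.5 * Z[:, 1] + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), Z])           # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X^T X) beta = X^T y
e = y - X @ beta_hat                           # residuals

S2_R = e @ e / (n - k - 1)                     # residual (quasi-)variance
SST = np.sum((y - y.mean())**2)
R2 = 1 - (e @ e) / SST                         # = 1 - SSE/SST
print(beta_hat, S2_R, R2)
```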
