RANDOM VARIABLES
DISCRETE RANDOM VARIABLE
PROBABILITY FUNCTION
0 ≤ P[X = x] ≤ 1        P[X > x] = 1 − P[X ≤ x]
∑ₓ P[X = x] = 1        P[X ≤ x] = ∑_{u ≤ x} P[X = u]
The probability mass function has no meaning for a continuous r.v., because
P(X = x) = 0 for every x. In its place we use the density function:
DENSITY FUNCTION
f(x) ≥ 0 ∀x ∈ ℝ        P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx ∀a, b ∈ ℝ
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du        ∫_{−∞}^{∞} f(x) dx = 1
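As a quick sanity check of the two density conditions, the total integral of any valid density should equal 1. A minimal numerical sketch (the exponential density with rate λ = 2 is an illustrative choice, not from the notes):

```python
from math import exp

# Illustrative density: exponential with rate lam = 2, so f(x) = lam * e^(-lam*x) >= 0.
lam = 2.0
f = lambda x: lam * exp(-lam * x)

# Trapezoidal rule over [0, 20]; the tail mass beyond 20 is negligible for this density.
a, b, steps = 0.0, 20.0, 100_000
h = (b - a) / steps
area = ((f(a) + f(b)) / 2 + sum(f(a + i * h) for i in range(1, steps))) * h
print(area)  # close to 1, as required of a density
```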
CHEBYSHEV’S INEQUALITY
The inequality provides a bound for the probability that a random variable deviates from its mean when only the expectation (E[X]) and the variance (V[X]) are available:
P(|X − E[X]| ≥ k) ≤ V(X)/k²        or        P(|X − E[X]| < k) ≥ 1 − V(X)/k²
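A small Monte Carlo sketch of the inequality (Uniform(0, 1) and k = 0.4 are illustrative choices): the empirical tail probability must stay below the Chebyshev bound, though the bound is typically loose.

```python
import random

# For X ~ Uniform(0, 1): E[X] = 0.5 and V[X] = 1/12.
random.seed(0)
n = 100_000
xs = [random.random() for _ in range(n)]

mean, var, k = 0.5, 1 / 12, 0.4
empirical = sum(abs(x - mean) >= k for x in xs) / n  # estimate of P(|X - E[X]| >= k)
bound = var / k**2                                   # Chebyshev upper bound
print(empirical, bound)  # ~0.2 versus ~0.52: the bound holds but is not tight
```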
COVARIANCE: Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
VARIANCE: V[X] = E[(X − E[X])²] = E[X²] − (E[X])²
DISTRIBUTION MODELS
DISCRETE R.V.: Bernoulli, binomial, geometric, Poisson
CONTINUOUS R.V.: Uniform, exponential, normal
BERNOULLI
PROB. MASS FUNCTION: p(x) = p if x = 1;  1 − p if x = 0
DISTRIBUTION FUNCTION: F(x) = 0 if x < 0;  1 − p if 0 ≤ x < 1;  1 if x ≥ 1
EXPECTATION: E[X] = p
VARIANCE: V[X] = p(1 − p)
BINOMIAL
PROB. MASS FUNCTION: P(X = k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ,  k = 0, 1, …, n
    where C(n, k) = n! / (k! (n − k)!)
EXPECTATION: E[X] = np
VARIANCE: V[X] = np(1 − p)
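The binomial probabilities can be computed directly with the standard library's `math.comb`; a short sketch (n = 10, p = 0.5 are illustrative values):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
total = sum(binom_pmf(k, n, p) for k in range(n + 1))    # probabilities sum to 1
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1)) # equals n*p
print(total, mean)
```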
GEOMETRIC
PROB. MASS FUNCTION: P(X = k) = (1 − p)ᵏ⁻¹ p,  k = 1, 2, …
EXPECTATION: E[X] = 1/p
VARIANCE: V[X] = (1 − p)/p²
ST. DEVIATION: S[X] = √(1 − p)/p
POISSON
PROB. MASS FUNCTION: P(X = k) = e⁻ᵡ λᵏ / k!  ∀ k ∈ ℕ  (with λ > 0)
EXPECTATION: E[X] = λ
VARIANCE: V[X] = λ
ST. DEVIATION: S[X] = √λ
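A quick numerical check that the Poisson mean and variance both equal λ (λ = 4 is an illustrative rate; the sum is truncated where the tail mass is negligible):

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    # P(X = k) = e^(-lam) * lam^k / k!
    return exp(-lam) * lam**k / factorial(k)

lam = 4.0
ks = range(60)  # truncation: for lam = 4 the mass beyond k = 60 is negligible
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(mean, var)  # both approximately lam = 4
```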
UNIFORM
EXPECTATION: E[X] = (a + b)/2
VARIANCE: V[X] = (b − a)²/12
ST. DEVIATION: S[X] = (b − a)/√12
PROBABILITY AND DATA ANALYSIS DATA SCIENCE AND ENGINEERING
EXPONENTIAL
MEMORYLESSNESS: P(X > x₁ + x₂ | X > x₁) = P(X > x₂)
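The memorylessness property can be illustrated by simulation: conditioning on having already exceeded x₁ does not change the probability of exceeding a further x₂ (the rate λ = 1.5 and thresholds below are illustrative values):

```python
import random

# Monte Carlo check of memorylessness for an exponential r.v.
random.seed(1)
n = 200_000
lam = 1.5
xs = [random.expovariate(lam) for _ in range(n)]

x1, x2 = 0.5, 0.8
p_cond = sum(x > x1 + x2 for x in xs) / sum(x > x1 for x in xs)  # P(X > x1+x2 | X > x1)
p_marg = sum(x > x2 for x in xs) / n                             # P(X > x2)
print(p_cond, p_marg)  # both near e^(-lam*x2) ≈ 0.301
```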
NORMAL
DENSITY FUNCTION: f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))
CHEBYSHEV'S INEQ.: P(|X − μ| < k) = P(μ − k < X < μ + k) ≥ 1 − σ²/k²
Therefore, if k = cσ ⟹ P(μ − cσ < X < μ + cσ) ≥ 1 − 1/c²
LINEAR TRANS.: If X ~ 𝒩(μ, σ) and Y = a + bX, then: Y ~ 𝒩(a + bμ, |b|σ)
STANDARDIZATION: Z = (X − μ)/σ = −μ/σ + (1/σ)X ~ 𝒩(0, 1)
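Standardization lets any normal probability be read off the standard normal table; a minimal sketch with the standard library's `statistics.NormalDist` (μ = 10, σ = 2, x = 13 are illustrative values):

```python
from statistics import NormalDist

# P(X <= x) for X ~ N(mu, sigma) equals P(Z <= z) for Z ~ N(0, 1), z = (x - mu)/sigma.
mu, sigma = 10.0, 2.0
x = 13.0
z = (x - mu) / sigma

p_x = NormalDist(mu, sigma).cdf(x)  # probability computed on the original scale
p_z = NormalDist().cdf(z)           # same probability on the standardized scale
print(z, p_x, p_z)
```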
X₁ + X₂ + ⋯ + Xₙ ~ 𝒩(∑ᵢ₌₁ⁿ μᵢ, √(∑ᵢ₌₁ⁿ σᵢ²))        The approximation is good for n > 30
As a particular case, let X₁, X₂, …, Xₙ be a set of independent, identically distributed random variables:
X̄ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ        For n large, the distribution of X̄ is Gaussian independently of the distribution of X:  (X̄ − μ)/(σ/√n) ~ 𝒩(0, 1)
APPROXIMATIONS WITH THE CLT
o BINOMIAL: Let X ~ Bin(n, p) with n large enough; then:
X ~ 𝒩(np, √(np(1 − p)))  ⟺  (X − np)/√(np(1 − p)) ~ 𝒩(0, 1)
o POISSON: Let X ~ Pois(λ) with λ > 5; then it can be approximated by:
X ~ 𝒩(λ, √λ)  ⟺  (X − λ)/√λ ~ 𝒩(0, 1)
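The binomial approximation can be checked against the exact probabilities; a sketch with a continuity correction (n = 100, p = 0.4, and the cutoff 45 are illustrative values):

```python
from math import comb, sqrt
from statistics import NormalDist

# Exact P(X <= 45) for X ~ Bin(100, 0.4) versus the CLT normal approximation.
n, p = 100, 0.4
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(46))
approx = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(45.5)  # continuity correction
print(exact, approx)  # the two values agree to roughly two decimals
```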
LINEAR REGRESSION
REGRESSION MODEL: A model that allows us to describe the effect of a
variable X on a variable Y; in other words, we want to describe or forecast the behavior of Y
as a function of X.
X ≡ independent, explanatory, or exogenous variable
Y ≡ dependent, response, or endogenous variable
TYPES OF RELATIONSHIPS
Deterministic: Given a value of X, the value of Y can be perfectly identified:
y = f(x)
Nondeterministic: Given X, the value of Y cannot be perfectly known:
y = f(x) + u
Linear: When the function f(x) is linear:
f(x) = β₀ + β₁x
If β₁ > 0 ⇒ positive linear rel.;  if β₁ < 0 ⇒ negative linear rel.
Nonlinear: When f(x) is nonlinear. Examples: f(x) = log x, f(x) = x², …
Lack of relationship: When f(x) = 0.
The simple linear regression model assumes that: Yᵢ = β₀ + β₁xᵢ + uᵢ
Where β₀ (intercept) and β₁ (slope) are the population coefficients and uᵢ is a random
error. The parameters we need to estimate are β₀, β₁, and σ² in order to obtain the
regression line: ŷ = β̂₀ + β̂₁x.
The residual is: eᵢ = yᵢ − ŷᵢ
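The intercept and slope estimates follow from the usual least-squares formulas; a minimal sketch on a small made-up dataset (the numbers are illustrative, not from the notes):

```python
# Least-squares fit of the line y_hat = b0 + b1*x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
# b1 = S_xy / S_xx, b0 = y_bar - b1 * x_bar
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # e_i = y_i - y_hat_i
print(b0, b1, sum(residuals))  # residuals of a fit with intercept sum to ~0
```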
MODEL ASSUMPTIONS
Linearity: The relationship between X and Y is linear: f(x) = β₀ + β₁x
Homogeneity: The errors have mean zero: E[uᵢ] = 0
Homoscedasticity: The variance of the errors is constant: Var(uᵢ) = σ²
Independence: The errors are independent: E[uᵢuⱼ] = 0 (not time series)
Normality: The errors follow a normal distribution: uᵢ ~ 𝒩(0, σ²)
1. We have n observations, for i = 1, …, n: yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ⋯ + βₖxᵢₖ + uᵢ
2. We wish to fit the data in the form: ŷᵢ = β̂₀ + β̂₁xᵢ₁ + β̂₂xᵢ₂ + ⋯ + β̂ₖxᵢₖ
LEAST-SQUARES ESTIMATION
The least-squares estimate of the parameter vector, β̂, solves the normal equations:
(XᵀX)β̂ = Xᵀy ⟹ β̂ = (XᵀX)⁻¹Xᵀy
The vector of fitted values is given by: ŷ = Xβ̂
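The normal equations can be solved directly without a linear-algebra library for a model with an intercept and one predictor, since (XᵀX) is then 2×2; a sketch on made-up illustrative data:

```python
# Solve (X^T X) beta = X^T y for beta = (b0, b1).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
X = [[1.0, x] for x in xs]  # design matrix with a column of ones

# X^T X (2x2) and X^T y (length 2)
xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(2)]

# Solve the 2x2 system by Cramer's rule
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
b0 = (xty[0] * xtx[1][1] - xtx[0][1] * xty[1]) / det
b1 = (xtx[0][0] * xty[1] - xty[0] * xtx[1][0]) / det
print(b0, b1)  # matches the textbook simple-regression formulas for these data
```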
VARIANCE ESTIMATION
An estimator for the error variance is the residual (quasi-)variance: S²ᵣ = (∑ᵢ₌₁ⁿ eᵢ²)/(n − k − 1)
ANOVA DECOMPOSITION
Coeff. of determination: R² = SS(explained)/SS(total) = 1 − SS(residual)/SS(total)
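R² is computed from the residual and total sums of squares; a short sketch on made-up data, with the line's coefficients taken as b0 = 0.14 and b1 = 1.96 (the least-squares fit for these illustrative numbers):

```python
# R^2 = 1 - SS_res / SS_tot for a fitted regression line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = 0.14, 1.96

y_bar = sum(ys) / len(ys)
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
ss_tot = sum((y - y_bar) ** 2 for y in ys)                      # total sum of squares
r2 = 1 - ss_res / ss_tot
print(r2)  # close to 1: the line explains almost all the variability in y
```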