1 PROBABILITY PART
1.1 Probability formulas
Addition Rule and Multiplication Rule
• $P(A+B) = P(A) + P(B) - P(A.B)$; $P(A+B+C) = P(A) + P(B) + P(C) - P(A.B) - P(B.C) - P(A.C) + P(A.B.C)$
• $P(A_1 + A_2 + ... + A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i.A_j) + \sum_{i<j<k} P(A_i.A_j.A_k) - ... + (-1)^{n-1}.P(A_1.A_2.A_3...A_n)$
• $P(AB) = P(A/B).P(B) = P(B/A).P(A)$
• $P(A_1.A_2...A_n) = P(A_1).P(A_2/A_1).P(A_3/A_1.A_2)...P(A_n/A_1.A_2...A_{n-1})$
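The addition and multiplication rules can be checked by direct counting on a small sample space. The sketch below assumes one roll of a fair six-sided die; the events A, B, C and the helper `prob` are made up for illustration only.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
omega = set(range(1, 7))
A = {2, 4, 6}       # even result
B = {4, 5, 6}       # result greater than 3
C = {1, 2, 3, 4}    # result at most 4

def prob(event):
    """Classical probability on an equally likely sample space."""
    return Fraction(len(event & omega), len(omega))

# Addition rule for three events (inclusion-exclusion).
by_formula = (prob(A) + prob(B) + prob(C)
              - prob(A & B) - prob(B & C) - prob(A & C)
              + prob(A & B & C))
by_counting = prob(A | B | C)

# Multiplication rule: P(A.B) = P(A/B).P(B).
p_a_given_b = Fraction(len(A & B), len(B))
product = p_a_given_b * prob(B)
```

Exact fractions are used so that both sides of each rule can be compared with `==` rather than with a floating-point tolerance.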
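The Exponential Distribution, $X \sim E(\lambda)$: $F(x) = \begin{cases} 1 - e^{-\lambda.x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$
$E(X) = \frac{1}{\lambda}$, $V(X) = \frac{1}{\lambda^2}$, $mod(X) = 0$, $med(X) = \frac{\ln(2)}{\lambda}$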
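A minimal sketch of the exponential formulas above, using only the standard library. The rate `lam = 0.5` is an arbitrary illustrative value, and `exp_cdf` is a helper name introduced here.

```python
import math

def exp_cdf(x, lam):
    """F(x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 0.5                     # hypothetical rate parameter
mean = 1 / lam                # E(X) = 1 / lambda
variance = 1 / lam ** 2       # V(X) = 1 / lambda^2
median = math.log(2) / lam    # med(X) = ln(2) / lambda

# By definition of the median, F(med(X)) should equal 0.5.
cdf_at_median = exp_cdf(median, lam)
```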
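The Normal Distribution, $X \sim N(\mu, \sigma^2)$: $E(X) = Mod(X) = Med(X) = \mu$, $V(X) = \sigma^2$
$P(k_1 \le X \le k_2) = \Phi\left(\frac{k_2 - \mu}{\sigma}\right) - \Phi\left(\frac{k_1 - \mu}{\sigma}\right)$; $P(|X - \mu| < \varepsilon) = 2.\Phi\left(\frac{\varepsilon}{\sigma}\right) - 1$, $\varepsilon > 0$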
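The two normal probability formulas can be evaluated with the standard library, taking $\Phi$ to be the standard normal CDF. This is a sketch only; the function names and the numeric inputs are illustrative assumptions.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF, playing the role of Phi

def normal_interval_prob(k1, k2, mu, sigma):
    """P(k1 <= X <= k2) for X ~ N(mu, sigma^2)."""
    return phi((k2 - mu) / sigma) - phi((k1 - mu) / sigma)

def normal_within(eps, sigma):
    """P(|X - mu| < eps) = 2 * Phi(eps / sigma) - 1, for eps > 0."""
    return 2 * phi(eps / sigma) - 1

# Hypothetical values: the symmetric interval mu +/- eps should give the
# same probability through either formula.
mu, sigma, eps = 10.0, 2.0, 3.0
p_interval = normal_interval_prob(mu - eps, mu + eps, mu, sigma)
p_within = normal_within(eps, sigma)
```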
If $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent, then: $Y = a.X_1 + b.X_2 \sim N(\mu = a.\mu_1 + b.\mu_2, \sigma^2 = a^2.\sigma_1^2 + b^2.\sigma_2^2)$
If $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, 2, ..., n$ are independent, then: $Y = X_1 + ... + X_n \sim N(\mu = \mu_1 + ... + \mu_n, \sigma^2 = \sigma_1^2 + ... + \sigma_n^2)$
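The Central Limit Theorem: if $X_1, ..., X_n$ are independent with $E(X_i) = \mu$ and $V(X_i) = \sigma^2$, then for large $n$, approximately:
If $X = X_1 + X_2 + ... + X_n$ then $X \sim N(n.\mu, n.\sigma^2)$; if $\overline{X} = \frac{X_1 + X_2 + ... + X_n}{n}$ then $\overline{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.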
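Note: if the $X_i$ are discrete random variables and $X = X_1 + X_2 + ... + X_n$, we apply a $\pm 0.5$ adjustment (continuity correction) in the probability formula, meaning:
$P(k_1 \le X \le k_2) \approx \Phi\left(\frac{k_2 + 0.5 - n\mu}{\sqrt{n\sigma^2}}\right) - \Phi\left(\frac{k_1 - 0.5 - n\mu}{\sqrt{n\sigma^2}}\right)$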
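To illustrate the ±0.5 adjustment, the sketch below compares the approximation with an exact calculation for a sum of Bernoulli variables (a binomial count). The choice of Bernoulli terms, the numbers n = 100 and p = 0.5, and the function names are assumptions made for this example.

```python
import math
from statistics import NormalDist

def clt_sum_prob(k1, k2, n, mu, var):
    """Approximate P(k1 <= X <= k2) for X = X_1 + ... + X_n with discrete X_i,
    using the normal approximation with the +/- 0.5 adjustment."""
    phi = NormalDist().cdf
    sd = math.sqrt(n * var)
    return phi((k2 + 0.5 - n * mu) / sd) - phi((k1 - 0.5 - n * mu) / sd)

def binomial_exact(k1, k2, n, p):
    """Exact P(k1 <= X <= k2) when X is a sum of n Bernoulli(p) variables."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(k1, k2 + 1))

# Hypothetical case: each X_i is Bernoulli(p), so mu = p and var = p * (1 - p).
n, p = 100, 0.5
approx = clt_sum_prob(45, 55, n, p, p * (1 - p))
exact = binomial_exact(45, 55, n, p)
```

For a moderately large n such as this one, the two values should agree closely; the gap generally grows when n is small or p is far from 0.5.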
2 STATISTICS PART
2.1 Confidence Intervals
Summary table of problems to find confidence intervals: (Note: The problem of finding confidence intervals for
M and N is related to the proportion estimator; its results are rounded to integers according to the round-to-nearest rule.)
Sample size problem: (Note: Round the result to an integer according to the rule of rounding up)
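Hypothesis Tests on the Difference in Means (Type 4) Any Distribution - Large Sample Size
Null hypothesis: $H_0: \mu_1 = \mu_2$
Test statistic: $z_{qs} = \frac{\overline{x}_1 - \overline{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
• $H_1: \mu_1 \ne \mu_2$: $RR = (-\infty; -z_{\alpha/2}) \cup (z_{\alpha/2}; +\infty)$; reject $H_0$ if $|z_{qs}| > z_{\alpha/2}$
• $H_1: \mu_1 < \mu_2$: $RR = (-\infty; -z_{\alpha})$; reject $H_0$ if $z_{qs} < -z_{\alpha}$
• $H_1: \mu_1 > \mu_2$: $RR = (z_{\alpha}; +\infty)$; reject $H_0$ if $z_{qs} > z_{\alpha}$
Note for Type 4: if $\sigma_1, \sigma_2$ are unknown, we replace them with $s_1, s_2$ in the test statistic.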
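The Type 4 decision rule can be sketched as follows, taking $z_\alpha$ to be the upper-$\alpha$ quantile of the standard normal distribution. The function name, the `alternative` labels and the sample figures are illustrative assumptions, not part of the formula sheet.

```python
import math
from statistics import NormalDist

def two_sample_z_test(x1_bar, x2_bar, sd1, sd2, n1, n2,
                      alpha=0.05, alternative="two-sided"):
    """Large-sample test of H0: mu1 = mu2; returns (z_qs, reject_H0).
    sd1, sd2 are sigma1, sigma2 (or s1, s2 when the sigmas are unknown)."""
    z_qs = (x1_bar - x2_bar) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = NormalDist().inv_cdf
    if alternative == "two-sided":      # H1: mu1 != mu2
        reject = abs(z_qs) > z(1 - alpha / 2)
    elif alternative == "less":         # H1: mu1 < mu2
        reject = z_qs < -z(1 - alpha)
    elif alternative == "greater":      # H1: mu1 > mu2
        reject = z_qs > z(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z_qs, reject

# Hypothetical summary data for two large samples.
z_two_sided, reject_two_sided = two_sample_z_test(52, 50, 4, 5, 64, 100)
z_less, reject_less = two_sample_z_test(52, 50, 4, 5, 64, 100,
                                        alternative="less")
```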
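Hypothesis Tests on the Difference in Means (Type 5) Two samples in pairs + Known $\sigma_D$
Null hypothesis: $H_0: \mu_D = 0$
Test statistic: $z_{qs} = \frac{\overline{d}.\sqrt{n}}{\sigma_D}$
• $H_1: \mu_D \ne 0$: $RR = (-\infty; -z_{\alpha/2}) \cup (z_{\alpha/2}; +\infty)$; reject $H_0$ if $|z_{qs}| > z_{\alpha/2}$
• $H_1: \mu_D < 0$: $RR = (-\infty; -z_{\alpha})$; reject $H_0$ if $z_{qs} < -z_{\alpha}$
• $H_1: \mu_D > 0$: $RR = (z_{\alpha}; +\infty)$; reject $H_0$ if $z_{qs} > z_{\alpha}$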
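Hypothesis Tests on the Difference in Means (Type 6) Two samples in pairs + Unknown $\sigma_D$
Null hypothesis: $H_0: \mu_D = 0$
Test statistic: $t_{qs} = \frac{\overline{d}.\sqrt{n}}{s_D}$
• $H_1: \mu_D \ne 0$: $RR = (-\infty; -t_{\alpha/2;n-1}) \cup (t_{\alpha/2;n-1}; +\infty)$; reject $H_0$ if $|t_{qs}| > t_{\alpha/2;n-1}$
• $H_1: \mu_D < 0$: $RR = (-\infty; -t_{\alpha;n-1})$; reject $H_0$ if $t_{qs} < -t_{\alpha;n-1}$
• $H_1: \mu_D > 0$: $RR = (t_{\alpha;n-1}; +\infty)$; reject $H_0$ if $t_{qs} > t_{\alpha;n-1}$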
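A minimal sketch of the paired test statistic for Type 6. The data are invented, and the critical value is read from a t table and passed in by hand because the Python standard library has no Student t quantile function.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """Return (t_qs, degrees of freedom) for paired samples, with d_i = x_i - y_i."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation of the differences (divisor n - 1)
    return d_bar * math.sqrt(n) / s_d, n - 1

# Hypothetical paired measurements.
x = [12, 15, 11, 14, 13]
y = [10, 14, 10, 11, 12]
t_qs, df = paired_t_statistic(x, y)

# Two-sided test at alpha = 0.05: t_{0.025;4} = 2.776, taken from a t table.
t_crit = 2.776
reject_h0 = abs(t_qs) > t_crit
```

The same function gives the statistic for Types 5 and 7 if `s_d` is replaced by the known $\sigma_D$ and the critical value is taken from the standard normal distribution.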
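Hypothesis Tests on the Difference in Means (Type 7) Two samples in pairs + Large Sample Size
Null hypothesis: $H_0: \mu_D = 0$
Test statistic: $z_{qs} = \frac{\overline{d}.\sqrt{n}}{\sigma_D}$
• $H_1: \mu_D \ne 0$: $RR = (-\infty; -z_{\alpha/2}) \cup (z_{\alpha/2}; +\infty)$; reject $H_0$ if $|z_{qs}| > z_{\alpha/2}$
• $H_1: \mu_D < 0$: $RR = (-\infty; -z_{\alpha})$; reject $H_0$ if $z_{qs} < -z_{\alpha}$
• $H_1: \mu_D > 0$: $RR = (z_{\alpha}; +\infty)$; reject $H_0$ if $z_{qs} > z_{\alpha}$
Note for Type 7: if $\sigma_D$ is unknown, we replace it with $s_D$ in the test statistic.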
+ For Hypothesis Tests on the Difference in Means with unknown variances, where $X_1$ and $X_2$ follow normal distributions (Type 2 or Type 3), we calculate the ratio $\frac{s_1}{s_2}$. If $\frac{s_1}{s_2} \in [0.5; 2]$, we assume $\sigma_1^2 = \sigma_2^2$; otherwise, we assume $\sigma_1^2 \ne \sigma_2^2$.
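1. Sum of squares
$S_{xx} = \sum_{i=1}^{n} (x_i - \overline{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = n.(\hat{s}_x)^2$
$S_{yy} = \sum_{i=1}^{n} (y_i - \overline{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = n.(\hat{s}_y)^2$
$S_{xy} = \sum_{i=1}^{n} (x_i - \overline{x}).(y_i - \overline{y}) = \sum_{i=1}^{n} x_i.y_i - \frac{\left(\sum_{i=1}^{n} x_i\right).\left(\sum_{i=1}^{n} y_i\right)}{n} = \sum xy - n.\overline{x}.\overline{y}$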
2. The simple linear regression considers a single predictor x and a dependent variable Y: $\hat{y} = a + b.x$
The slope b: $b = \frac{S_{xy}}{S_{xx}}$
The intercept a: $a = \overline{y} - b.\overline{x}$
The simple linear regression considers a single predictor y and a dependent variable X: $\hat{x} = c + d.y$
The slope d: $d = \frac{S_{xy}}{S_{yy}}$
The intercept c: $c = \overline{x} - d.\overline{y}$
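The sums of squares and the two regression lines can be computed directly from the shortcut formulas. This is a sketch on a small made-up data set; the function names are introduced here for illustration.

```python
def sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy) using the shortcut (computational) formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(v * v for v in x) - sum_x ** 2 / n
    s_yy = sum(v * v for v in y) - sum_y ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    return s_xx, s_yy, s_xy

def fit_y_on_x(x, y):
    """Coefficients (a, b) of the line y_hat = a + b * x."""
    n = len(x)
    s_xx, _, s_xy = sums_of_squares(x, y)
    b = s_xy / s_xx
    a = sum(y) / n - b * sum(x) / n
    return a, b

def fit_x_on_y(x, y):
    """Coefficients (c, d) of the line x_hat = c + d * y."""
    n = len(x)
    _, s_yy, s_xy = sums_of_squares(x, y)
    d = s_xy / s_yy
    c = sum(x) / n - d * sum(y) / n
    return c, d

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_xx, s_yy, s_xy = sums_of_squares(x, y)
a, b = fit_y_on_x(x, y)
c, d = fit_x_on_y(x, y)
```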
3. Correlation coefficient $r_{XY}$: $r_{XY} = \frac{S_{xy}}{\sqrt{S_{xx}.S_{yy}}}$
4. Sum of squares
SST: $SST = S_{yy}$
SSR: $SSR = b.S_{xy}$
SSE: $SSE = SST - SSR = S_{yy} - b.S_{xy}$
5. Multiple R square $R^2$: $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = r_{XY}^2$
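The identities linking the correlation coefficient, the sums of squares and $R^2$ can be verified numerically. The sketch below takes $S_{xx}$, $S_{yy}$, $S_{xy}$ as inputs; the values 10, 6, 6 are illustrative only.

```python
import math

def regression_summary(s_xx, s_yy, s_xy):
    """Return r, SST, SSR, SSE and R^2 from the three sums of squares."""
    b = s_xy / s_xx                     # slope of y on x
    r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient
    sst = s_yy
    ssr = b * s_xy
    sse = sst - ssr
    return {"r": r, "SST": sst, "SSR": ssr, "SSE": sse, "R2": ssr / sst}

# Hypothetical sums of squares.
summary = regression_summary(10.0, 6.0, 6.0)
```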
6. An estimator $s$ of $\sigma$: $s = \sqrt{\frac{SSE}{n-2}}$
7. An unbiased estimator $s^2$ of $\sigma^2$: $s^2 = \frac{SSE}{n-2}$
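8. Confidence interval
The slope: $\left( b - t_{\alpha/2;n-2}.\frac{s}{\sqrt{S_{xx}}} ; b + t_{\alpha/2;n-2}.\frac{s}{\sqrt{S_{xx}}} \right)$
The intercept: $\left( a - t_{\alpha/2;n-2}.\frac{s.\sqrt{\overline{x^2}}}{\sqrt{S_{xx}}} ; a + t_{\alpha/2;n-2}.\frac{s.\sqrt{\overline{x^2}}}{\sqrt{S_{xx}}} \right)$, where $\overline{x^2} = \frac{1}{n}\sum_{i=1}^{n} x_i^2$.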
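A sketch of the two confidence intervals. The critical value $t_{\alpha/2;n-2}$ is supplied by the caller from a t table, since the standard library does not provide it; the data set and function name are assumptions for illustration.

```python
import math

def regression_intervals(x, y, t_crit):
    """Confidence intervals for the slope b and the intercept a of y_hat = a + b * x.
    t_crit is t_{alpha/2; n-2}, looked up in a t table by the caller."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((v - x_bar) ** 2 for v in x)
    s_yy = sum((v - y_bar) ** 2 for v in y)
    s_xy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = s_yy - b * s_xy
    s = math.sqrt(sse / (n - 2))
    se_b = s / math.sqrt(s_xx)
    mean_x2 = sum(v * v for v in x) / n          # mean of the squared x values
    se_a = s * math.sqrt(mean_x2) / math.sqrt(s_xx)
    slope_ci = (b - t_crit * se_b, b + t_crit * se_b)
    intercept_ci = (a - t_crit * se_a, a + t_crit * se_a)
    return slope_ci, intercept_ci

# Hypothetical data; t_{0.025;3} = 3.182 from a t table (95% confidence, n = 5).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope_ci, intercept_ci = regression_intervals(x, y, 3.182)
```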
Hypothesis testing:
Type: Formulas
Confidence interval for Population Proportion: $SE = \frac{\sqrt{\hat{p}.(1 - \hat{p})}}{\sqrt{n}}$
Confidence interval for Population Mean (Type 1): $SE = \frac{\sigma}{\sqrt{n}}$
Confidence interval for Population Mean (Type 2): $SE = \frac{s}{\sqrt{n}}$
Confidence interval for Population Mean (Type 3): $SE = \frac{\sigma}{\sqrt{n}}$
Note for Type 3: if $\sigma$ is unknown, we replace it with $s$.
Hypothesis Tests on Proportion: $SE = \frac{\sqrt{p_0.(1 - p_0)}}{\sqrt{n}}$
Hypothesis Tests on Mean (Type 1): $SE = \frac{\sigma}{\sqrt{n}}$
Hypothesis Tests on Mean (Type 2): $SE = \frac{s}{\sqrt{n}}$
Hypothesis Tests on Mean (Type 3): $SE = \frac{\sigma}{\sqrt{n}}$
Note for Type 3: if $\sigma$ is unknown, we replace it with $s$.
Confidence interval for the Difference in Proportion: $SE = \sqrt{\frac{\hat{p}_1.(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2.(1 - \hat{p}_2)}{n_2}}$
Confidence interval for Difference in Means (Type 1): $SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Confidence interval for Difference in Means (Type 2): $SE = \sqrt{\frac{S_p^2}{n_1} + \frac{S_p^2}{n_2}}$
Confidence interval for Difference in Means (Type 3): $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Confidence interval for Difference in Means (Type 4): $SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Confidence interval for Difference in Means (Type 5): $SE = \frac{\sigma_D}{\sqrt{n}}$
Confidence interval for Difference in Means (Type 6): $SE = \frac{s_D}{\sqrt{n}}$
Confidence interval for Difference in Means (Type 7): $SE = \frac{\sigma_D}{\sqrt{n}}$
Note for Type 4: if $\sigma_1, \sigma_2$ are unknown, we replace them with $s_1, s_2$.
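The standard errors for two-sample problems can be collected into a few small helpers. This is a sketch: the function names are invented here, and the pooled variance $S_p^2 = \frac{(n_1 - 1).s_1^2 + (n_2 - 1).s_2^2}{n_1 + n_2 - 2}$ is the usual textbook definition, which this sheet does not state explicitly.

```python
import math

def se_diff_proportions(p1_hat, n1, p2_hat, n2):
    """SE for the difference of two sample proportions."""
    return math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

def se_diff_means(sd1, n1, sd2, n2, pooled=False):
    """SE for the difference of two means.
    pooled=False: sqrt(sd1^2/n1 + sd2^2/n2), as in Types 1, 3 and 4.
    pooled=True:  Type 2, using the pooled variance Sp^2 (assumed definition)."""
    if pooled:
        sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
        return math.sqrt(sp2 / n1 + sp2 / n2)
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def se_paired(sd_d, n):
    """SE for paired samples (Types 5 to 7): sd of the differences over sqrt(n)."""
    return sd_d / math.sqrt(n)

# Hypothetical inputs.
se_prop = se_diff_proportions(0.5, 100, 0.5, 100)
se_unpooled = se_diff_means(4, 64, 5, 100)
se_pooled = se_diff_means(3, 9, 3, 9, pooled=True)
se_pairs = se_paired(2, 16)
```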
Confidence interval for $\mu_i$ in ANOVA: $SE_i = \sqrt{\frac{MSE}{J}}$
Confidence interval for $\mu_i - \mu_j$ in ANOVA: $SE = \sqrt{\frac{2.MSE}{J}}$
Confidence interval for slope: $SE = \frac{s}{\sqrt{S_{xx}}}$
Confidence interval for intercept: $SE = \frac{s.\sqrt{\overline{x^2}}}{\sqrt{S_{xx}}}$
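A short sketch of the two ANOVA standard errors for a balanced one-way layout. It assumes that $J$ is the common number of observations per group and that $MSE = \frac{SSE}{I.(J - 1)}$ with $I$ groups, which is the usual definition but is not spelled out in this sheet; the data are made up.

```python
import math

def anova_standard_errors(groups):
    """Return (SE for a single mean, SE for a difference of two means)
    for a balanced one-way layout; every group must have the same size J."""
    num_groups = len(groups)
    j = len(groups[0])
    if any(len(g) != j for g in groups):
        raise ValueError("all groups must have the same number of observations")
    means = [sum(g) / j for g in groups]
    sse = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    mse = sse / (num_groups * (j - 1))   # assumed definition of MSE
    return math.sqrt(mse / j), math.sqrt(2 * mse / j)

# Hypothetical data: 3 groups with J = 3 observations each.
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
se_single, se_difference = anova_standard_errors(groups)
```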