Cookbook
Copyright © Matthias Vallentin, 2015
vallentin@icir.org
¹ We use the notation γ(s, x) and Γ(x) to refer to the Gamma functions (see §22.1), and B(x, y) and I_x to refer to the Beta functions (see §22.2).
[Figure: PMF (top row) and CDF (bottom row) of the discrete distributions — Uniform (discrete), Binomial (n = 40, p = 0.3; n = 30, p = 0.6; n = 25, p = 0.9), Geometric (p = 0.2, 0.5, 0.8), and Poisson (λ = 1, 4, 10).]
1.2 Continuous Distributions
For each distribution the table lists the notation, CDF F_X(x), PDF f_X(x), mean E[X], variance V[X], and MGF M_X(s).

Uniform Unif(a, b)
  F_X(x): 0 if x < a; (x − a)/(b − a) if a < x < b; 1 if x > b
  f_X(x) = I(a < x < b)/(b − a)
  E[X] = (a + b)/2,  V[X] = (b − a)²/12,  M_X(s) = (e^{sb} − e^{sa})/(s(b − a))

Normal N(µ, σ²)
  F_X(x) = Φ(x) = ∫_{−∞}^x φ(t) dt
  f_X(x) = φ(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²))
  E[X] = µ,  V[X] = σ²,  M_X(s) = exp(µs + σ²s²/2)

Log-Normal ln N(µ, σ²)
  F_X(x) = ½[1 + erf((ln x − µ)/(√2 σ))]
  f_X(x) = (1/(x√(2πσ²))) exp(−(ln x − µ)²/(2σ²))
  E[X] = e^{µ + σ²/2},  V[X] = (e^{σ²} − 1) e^{2µ + σ²}

Multivariate Normal MVN(µ, Σ)
  f_X(x) = (2π)^{−k/2} |Σ|^{−1/2} exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))
  E[X] = µ,  V[X] = Σ,  M_X(s) = exp(µᵀs + ½ sᵀΣs)

Student's t Student(ν)
  F_X(x) = I_x(ν/2, ν/2)
  f_X(x) = (Γ((ν + 1)/2)/(√(νπ) Γ(ν/2))) (1 + x²/ν)^{−(ν+1)/2}
  E[X] = 0,  V[X] = ν/(ν − 2) for ν > 2, ∞ for 1 < ν ≤ 2

Chi-square χ²_k
  F_X(x) = γ(k/2, x/2)/Γ(k/2)
  f_X(x) = (1/(2^{k/2} Γ(k/2))) x^{k/2 − 1} e^{−x/2}
  E[X] = k,  V[X] = 2k,  M_X(s) = (1 − 2s)^{−k/2},  s < 1/2

F F(d₁, d₂)
  F_X(x) = I_{d₁x/(d₁x + d₂)}(d₁/2, d₂/2)
  f_X(x) = √((d₁x)^{d₁} d₂^{d₂} / (d₁x + d₂)^{d₁+d₂}) / (x B(d₁/2, d₂/2))
  E[X] = d₂/(d₂ − 2),  V[X] = 2d₂²(d₁ + d₂ − 2)/(d₁(d₂ − 2)²(d₂ − 4))

Exponential Exp(β)
  F_X(x) = 1 − e^{−x/β},  f_X(x) = (1/β) e^{−x/β}
  E[X] = β,  V[X] = β²,  M_X(s) = 1/(1 − βs),  s < 1/β

Gamma Gamma(α, β)
  F_X(x) = γ(α, x/β)/Γ(α),  f_X(x) = (1/(Γ(α) β^α)) x^{α−1} e^{−x/β}
  E[X] = αβ,  V[X] = αβ²,  M_X(s) = (1/(1 − βs))^α,  s < 1/β

Inverse Gamma InvGamma(α, β)
  F_X(x) = Γ(α, β/x)/Γ(α),  f_X(x) = (β^α/Γ(α)) x^{−α−1} e^{−β/x}
  E[X] = β/(α − 1) for α > 1,  V[X] = β²/((α − 1)²(α − 2)) for α > 2
  M_X(s) = (2(−βs)^{α/2}/Γ(α)) K_α(√(−4βs))

Dirichlet Dir(α)
  f_X(x) = (Γ(Σ_{i=1}^k α_i)/Π_{i=1}^k Γ(α_i)) Π_{i=1}^k x_i^{α_i − 1}
  E[X_i] = α_i/Σ_{i=1}^k α_i,  V[X_i] = E[X_i](1 − E[X_i])/(Σ_{i=1}^k α_i + 1)

Beta Beta(α, β)
  F_X(x) = I_x(α, β),  f_X(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1}
  E[X] = α/(α + β),  V[X] = αβ/((α + β)²(α + β + 1))
  M_X(s) = 1 + Σ_{k=1}^∞ (Π_{r=0}^{k−1} (α + r)/(α + β + r)) s^k/k!

Weibull Weibull(λ, k)
  F_X(x) = 1 − e^{−(x/λ)^k},  f_X(x) = (k/λ)(x/λ)^{k−1} e^{−(x/λ)^k}
  E[X] = λΓ(1 + 1/k),  V[X] = λ²Γ(1 + 2/k) − µ² (µ = E[X])
  M_X(s) = Σ_{n=0}^∞ (s^n λ^n/n!) Γ(1 + n/k)

Pareto Pareto(x_m, α)
  F_X(x) = 1 − (x_m/x)^α for x ≥ x_m,  f_X(x) = α x_m^α/x^{α+1} for x ≥ x_m
  E[X] = αx_m/(α − 1) for α > 1,  V[X] = x_m²α/((α − 1)²(α − 2)) for α > 2
  M_X(s) = α(−x_m s)^α Γ(−α, −x_m s),  s < 0
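The mean and variance columns above can be spot-checked numerically; a minimal sketch using scipy.stats (the chosen parameter values are arbitrary examples, and scipy's scale-based parameterization matches the table's β for the Exponential and Gamma entries):

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as G

# Exponential Exp(beta): E = beta, V = beta^2
beta = 2.0
exp_mean = stats.expon(scale=beta).mean()
exp_var = stats.expon(scale=beta).var()

# Gamma(alpha, beta): E = alpha*beta, V = alpha*beta^2
alpha = 3.0
gam_mean = stats.gamma(a=alpha, scale=beta).mean()
gam_var = stats.gamma(a=alpha, scale=beta).var()

# Beta(a, b): E = a/(a+b), V = ab/((a+b)^2 (a+b+1))
a, b = 2.0, 5.0
beta_mean = stats.beta(a, b).mean()
beta_var = stats.beta(a, b).var()

# Weibull(lambda, k): E = lambda * Gamma(1 + 1/k)
lam, k = 1.5, 2.0
weib_mean = stats.weibull_min(c=k, scale=lam).mean()

print(exp_mean, gam_mean, beta_mean, weib_mean)
```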
[Figure: PDFs of the continuous distributions (Uniform, Normal, Log-Normal, Student's t, χ², F, Exponential, Gamma, Inverse Gamma, Beta, Weibull, Pareto) for the parameter settings listed in the legends.]
[Figure: CDFs of the continuous distributions (Uniform, Normal, Log-Normal, Student's t, χ², F, Exponential, Gamma, Inverse Gamma, Beta, Weibull, Pareto) for the parameter settings listed in the legends.]
2 Probability Theory

Definitions
• Sample space Ω
• Probability space (Ω, A, P), with P satisfying
  1. P[A] ≥ 0 for all A ∈ A
  2. P[Ω] = 1
  3. P[⊔_{i=1}^∞ A_i] = Σ_{i=1}^∞ P[A_i] for disjoint A_i

Law of Total Probability
  P[B] = Σ_{i=1}^n P[B | A_i] P[A_i]  where Ω = ⊔_{i=1}^n A_i

Properties
• P[∅] = 0
• B = Ω ∩ B = (A ∪ ¬A) ∩ B = (A ∩ B) ∪ (¬A ∩ B)
• P[¬A] = 1 − P[A]
• P[B] = P[A ∩ B] + P[¬A ∩ B]
• DeMorgan: ¬(⋃_n A_n) = ⋂_n ¬A_n and ¬(⋂_n A_n) = ⋃_n ¬A_n
• P[⋃_n A_n] = 1 − P[⋂_n ¬A_n]
• P[A ∪ B] = P[A] + P[B] − P[A ∩ B], hence P[A ∪ B] ≤ P[A] + P[B]
• P[A ∪ B] = P[A ∩ ¬B] + P[¬A ∩ B] + P[A ∩ B]
• P[A ∩ ¬B] = P[A] − P[A ∩ B]

Continuity of Probabilities
• A₁ ⊂ A₂ ⊂ … ⟹ lim_{n→∞} P[A_n] = P[A] where A = ⋃_{i=1}^∞ A_i
• A₁ ⊃ A₂ ⊃ … ⟹ lim_{n→∞} P[A_n] = P[A] where A = ⋂_{i=1}^∞ A_i

Independence
  A ⊥⊥ B ⟺ P[A ∩ B] = P[A] P[B]

Conditional Probability
  P[A | B] = P[A ∩ B]/P[B]  (P[B] > 0)

3 Random Variables

Random Variable (RV)
  X : Ω → R

Probability Mass Function (PMF)
  f_X(x) = P[X = x] = P[{ω ∈ Ω : X(ω) = x}]

Probability Density Function (PDF)
  P[a ≤ X ≤ b] = ∫_a^b f(x) dx

Cumulative Distribution Function (CDF)
  F_X : R → [0, 1],  F_X(x) = P[X ≤ x]
  1. Nondecreasing: x₁ < x₂ ⟹ F(x₁) ≤ F(x₂)
  2. Normalized: lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1
  3. Right-continuous: lim_{y↓x} F(y) = F(x)

Conditional density
  P[a ≤ Y ≤ b | X = x] = ∫_a^b f_{Y|X}(y | x) dy  (a ≤ b),  f_{Y|X}(y | x) = f(x, y)/f_X(x)

Independence
  1. P[X ≤ x, Y ≤ y] = P[X ≤ x] P[Y ≤ y]
  2. f_{X,Y}(x, y) = f_X(x) f_Y(y)
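Inclusion–exclusion and the law of total probability can be checked exactly on a small finite sample space; a sketch on a toy example (two fair dice — an assumed example, not from the text) using exact rational arithmetic:

```python
from itertools import product
from fractions import Fraction

# Sample space: ordered outcomes of two fair dice, uniform measure.
omega = list(product(range(1, 7), repeat=2))
P = lambda event: Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == 6            # first die shows 6
B = lambda w: w[0] + w[1] >= 10    # sum is at least 10

# Inclusion-exclusion: P[A u B] = P[A] + P[B] - P[A n B]
lhs = P(lambda w: A(w) or B(w))
rhs = P(A) + P(B) - P(lambda w: A(w) and B(w))
print(lhs == rhs)

# Total probability: partition Omega by the first die's value
total = sum(P(lambda w, i=i: B(w) and w[0] == i) for i in range(1, 7))
print(total == P(B))
```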
3.1 Transformations

Transformation function
  Z = ϕ(X)

Discrete case
  f_Z(z) = P[ϕ(X) = z] = P[{x : ϕ(x) = z}] = P[X ∈ ϕ⁻¹(z)] = Σ_{x ∈ ϕ⁻¹(z)} f(x)

Expectation
• E[XY] = ∫∫ xy f_{X,Y}(x, y) dx dy
• E[ϕ(X)] ≠ ϕ(E[X]) in general (cf. Jensen's inequality)
• P[X ≥ Y] = 1 ⟹ E[X] ≥ E[Y]
• P[X = Y] = 1 ⟹ E[X] = E[Y]
• For integer-valued X ≥ 0: E[X] = Σ_{x=1}^∞ P[X ≥ x]

Inequalities
• Cauchy–Schwarz: E[XY]² ≤ E[X²] E[Y²]
• Markov: P[ϕ(X) ≥ t] ≤ E[ϕ(X)]/t  (ϕ ≥ 0, t > 0)
• Chebyshev: P[|X − E[X]| ≥ t] ≤ V[X]/t²
• Chernoff: P[X ≥ (1 + δ)µ] ≤ (e^δ/(1 + δ)^{1+δ})^µ,  δ > −1
• Hoeffding: X₁, …, X_n independent with P[X_i ∈ [a_i, b_i]] = 1 for 1 ≤ i ≤ n:
    P[X̄ − E[X̄] ≥ t] ≤ e^{−2nt²}  (t > 0)
    P[|X̄ − E[X̄]| ≥ t] ≤ 2 exp(−2n²t²/Σ_{i=1}^n (b_i − a_i)²)  (t > 0)
• Jensen: E[ϕ(X)] ≥ ϕ(E[X]) for convex ϕ

Exponential
• X_i ~ Exp(β) ∧ X_i ⊥⊥ X_j ⟹ Σ_{i=1}^n X_i ~ Gamma(n, β)
• Memoryless property: P[X > x + y | X > y] = P[X > x]

Normal
• X ~ N(µ, σ²) ⟹ (X − µ)/σ ~ N(0, 1)
• X ~ N(µ, σ²) ∧ Z = aX + b ⟹ Z ~ N(aµ + b, a²σ²)
• X ~ N(µ₁, σ₁²) ∧ Y ~ N(µ₂, σ₂²) ⟹ X + Y ~ N(µ₁ + µ₂, σ₁² + σ₂²)
• X_i ~ N(µ_i, σ_i²) ⟹ Σ_i X_i ~ N(Σ_i µ_i, Σ_i σ_i²)
• P[a < X ≤ b] = Φ((b − µ)/σ) − Φ((a − µ)/σ)
• Φ(−x) = 1 − Φ(x),  φ′(x) = −xφ(x),  φ″(x) = (x² − 1)φ(x)
• Upper quantile of N(0, 1): z_α = Φ⁻¹(1 − α)

Gamma
• X ~ Gamma(α, β) ⟺ X/β ~ Gamma(α, 1)
• Gamma(α, β) ~ Σ_{i=1}^α Exp(β)
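The Exponential-to-Gamma sum relationship and the Chebyshev bound can both be illustrated with a small Monte Carlo sketch (pure numpy; the parameter values are arbitrary examples and the tolerances are loose because the check is stochastic):

```python
import numpy as np

# Sum of n iid Exp(beta) draws should behave like Gamma(n, beta):
# mean n*beta, variance n*beta^2.
rng = np.random.default_rng(0)
n, beta, reps = 5, 2.0, 200_000
sums = rng.exponential(scale=beta, size=(reps, n)).sum(axis=1)

emp_mean, emp_var = sums.mean(), sums.var()
print(emp_mean, emp_var)   # close to n*beta = 10 and n*beta^2 = 20

# Chebyshev: P[|S - E[S]| >= t] <= V[S]/t^2
t = 10.0
emp_tail = (np.abs(sums - emp_mean) >= t).mean()
cheby = emp_var / t**2
print(emp_tail, cheby)     # empirical tail is below the bound
```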
Gamma (continued)
• X_i ~ Gamma(α_i, β) ∧ X_i ⊥⊥ X_j ⟹ Σ_i X_i ~ Gamma(Σ_i α_i, β)
• Γ(α)/λ^α = ∫₀^∞ x^{α−1} e^{−λx} dx

Beta
• (1/B(α, β)) x^{α−1}(1 − x)^{β−1} = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1}
• E[X^k] = B(α + k, β)/B(α, β) = ((α + k − 1)/(α + β + k − 1)) E[X^{k−1}]
• Beta(1, 1) ~ Unif(0, 1)

8 Probability and Moment Generating Functions
• G_X(t) = E[t^X],  |t| < 1
• M_X(t) = G_X(e^t) = E[e^{Xt}] = E[Σ_{i=0}^∞ (Xt)^i/i!] = Σ_{i=0}^∞ (E[X^i]/i!) t^i
• P[X = 0] = G_X(0)
• P[X = 1] = G′_X(0)
• P[X = i] = G_X^{(i)}(0)/i!
• E[X] = G′_X(1⁻)
• E[X^k] = M_X^{(k)}(0)
• E[X!/(X − k)!] = G_X^{(k)}(1⁻)
• V[X] = G″_X(1⁻) + G′_X(1⁻) − (G′_X(1⁻))²
• G_X(t) = G_Y(t) ⟹ X =ᴰ Y

9 Multivariate Distributions

9.1 Standard Bivariate Normal

Let X, Z ~ N(0, 1) with X ⊥⊥ Z, and let Y = ρX + √(1 − ρ²) Z.

Joint density
  f(x, y) = (1/(2π√(1 − ρ²))) exp(−(x² + y² − 2ρxy)/(2(1 − ρ²)))

Conditionals
  (Y | X = x) ~ N(ρx, 1 − ρ²)  and  (X | Y = y) ~ N(ρy, 1 − ρ²)

Independence
  X ⊥⊥ Y ⟺ ρ = 0

9.2 Bivariate Normal

Let X ~ N(µ_x, σ_x²) and Y ~ N(µ_y, σ_y²) with correlation ρ.

  f(x, y) = (1/(2πσ_xσ_y√(1 − ρ²))) exp(−z/(2(1 − ρ²)))
  z = ((x − µ_x)/σ_x)² + ((y − µ_y)/σ_y)² − 2ρ((x − µ_x)/σ_x)((y − µ_y)/σ_y)

Conditional mean and variance
  E[X | Y] = E[X] + ρ(σ_X/σ_Y)(Y − E[Y])
  V[X | Y] = σ_X √(1 − ρ²)

9.3 Multivariate Normal

Covariance matrix Σ (precision matrix Σ⁻¹):

  Σ = ( V[X₁]        ⋯  Cov[X₁, X_k]
        ⋮            ⋱  ⋮
        Cov[X_k, X₁] ⋯  V[X_k] )

If X ~ N(µ, Σ),
  f_X(x) = (2π)^{−n/2} |Σ|^{−1/2} exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))

Properties
• Z ~ N(0, 1) ∧ X = µ + Σ^{1/2} Z ⟹ X ~ N(µ, Σ)
• X ~ N(µ, Σ) ⟹ Σ^{−1/2}(X − µ) ~ N(0, 1)
• X ~ N(µ, Σ) ⟹ AX ~ N(Aµ, AΣAᵀ)
• X ~ N(µ, Σ), a a vector of length k ⟹ aᵀX ~ N(aᵀµ, aᵀΣa)
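The multivariate normal properties above can be sketched numerically: draw X = µ + LZ with LLᵀ = Σ (a Cholesky factor standing in for Σ^{1/2}) and check that linear functionals behave as stated. The parameter values are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)          # L @ L.T == Sigma

Z = rng.standard_normal((100_000, 2))
X = mu + Z @ L.T                       # rows ~ N(mu, Sigma)

emp_cov = np.cov(X, rowvar=False)
print(np.round(emp_cov, 2))            # close to Sigma

A = np.array([[1.0, 1.0]])             # a^T X ~ N(a^T mu, a^T Sigma a)
y = X @ A.T
print(y.mean(), y.var())               # close to -1.0 and 4.2
```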
10 Convergence

Let {X₁, X₂, …} be a sequence of rv's and let X be another rv. Let F_n denote the cdf of X_n and let F denote the cdf of X.

Types of convergence

1. In distribution (weakly, in law): X_n →ᴰ X
     lim_{n→∞} F_n(t) = F(t)  at all t where F is continuous
2. In probability: X_n →ᴾ X
     (∀ε > 0) lim_{n→∞} P[|X_n − X| > ε] = 0

3. Almost surely (strongly): X_n →ᵃˢ X
     P[lim_{n→∞} X_n = X] = P[{ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)}] = 1

4. In quadratic mean (L²): X_n →ᵠᵐ X
     lim_{n→∞} E[(X_n − X)²] = 0

10.2 Central Limit Theorem (CLT)

Let {X₁, …, X_n} be a sequence of iid rv's with E[X₁] = µ and V[X₁] = σ².

  Z_n := (X̄_n − µ)/√(V[X̄_n]) = √n(X̄_n − µ)/σ →ᴰ Z  where Z ~ N(0, 1)
  lim_{n→∞} P[Z_n ≤ z] = Φ(z),  z ∈ R

CLT notation with the delta method:
  (τ̂_n − τ)/ŝe(τ̂) →ᴰ N(0, 1)
where τ̂ = ϕ(θ̂) is the mle of τ and ŝe = |ϕ′(θ̂)| ŝe(θ̂_n).

Score function
  s(X; θ) = (∂/∂θ) log f(X; θ)

Fisher information
  I(θ) = V_θ[s(X; θ)],  I_n(θ) = nI(θ)
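The CLT statement above is easy to visualize numerically: standardized means of iid Exp(1) draws (µ = σ = 1) should be approximately N(0, 1) for moderate n. A minimal sketch with arbitrary sample sizes:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 100_000
x = rng.exponential(1.0, size=(reps, n))

# Z_n = sqrt(n) * (mean - mu) / sigma with mu = sigma = 1
z = np.sqrt(n) * (x.mean(axis=1) - 1.0)

# Compare P[Z_n <= 1] against Phi(1) computed via erf
phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))   # about 0.8413
frac = (z <= 1.0).mean()
print(frac, phi_1)
```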
12.3 Multiparameter Models

Let θ = (θ₁, …, θ_k) and let θ̂ = (θ̂₁, …, θ̂_k) be the mle.

  H_jj = ∂²ℓ_n/∂θ_j²,  H_jk = ∂²ℓ_n/∂θ_j∂θ_k

Fisher information matrix

  I_n(θ) = −( E_θ[H₁₁] ⋯ E_θ[H₁k]
              ⋮        ⋱  ⋮
              E_θ[Hk1] ⋯ E_θ[Hkk] )

Under appropriate regularity conditions
  (θ̂ − θ) ≈ N(0, J_n)
with J_n(θ) = I_n⁻¹(θ). Further, if θ̂_j is the jth component of θ̂, then
  (θ̂_j − θ_j)/ŝe_j →ᴰ N(0, 1)
where ŝe_j² = J_n(j, j) and Cov[θ̂_j, θ̂_k] = J_n(j, k).

12.3.1 Multiparameter delta method

Let τ = ϕ(θ₁, …, θ_k) and let the gradient of ϕ be
  ∇ϕ = (∂ϕ/∂θ₁, …, ∂ϕ/∂θ_k)ᵀ
Suppose ∇ϕ|_{θ=θ̂} ≠ 0 and τ̂ = ϕ(θ̂). Then
  (τ̂ − τ)/ŝe(τ̂) →ᴰ N(0, 1)
where
  ŝe(τ̂) = √((∇̂ϕ)ᵀ Ĵ_n (∇̂ϕ)),  Ĵ_n = J_n(θ̂) and ∇̂ϕ = ∇ϕ|_{θ=θ̂}.

12.4 Parametric Bootstrap

Sample from f(x; θ̂_n) instead of from F̂_n, where θ̂_n could be the mle or method of moments estimator.

13 Hypothesis Testing

  H₀ : θ ∈ Θ₀ versus H₁ : θ ∈ Θ₁

Definitions
• Null hypothesis H₀
• Alternative hypothesis H₁
• Simple hypothesis: θ = θ₀
• Composite hypothesis: θ > θ₀ or θ < θ₀
• Two-sided test: H₀ : θ = θ₀ versus H₁ : θ ≠ θ₀
• One-sided test: H₀ : θ ≤ θ₀ versus H₁ : θ > θ₀
• Critical value c
• Test statistic T
• Rejection region R = {x : T(x) > c}
• Power function β(θ) = P[X ∈ R]
• Power of a test: 1 − P[Type II error] = 1 − β = inf_{θ∈Θ₁} β(θ)
• Test size: α = P[Type I error] = sup_{θ∈Θ₀} β(θ)

             Retain H₀           Reject H₀
  H₀ true    √                   Type I error (α)
  H₁ true    Type II error (β)   √ (power)

p-value
• p-value = sup_{θ∈Θ₀} P_θ[T(X) ≥ T(x)] = inf{α : T(x) ∈ R_α}
• p-value = sup_{θ∈Θ₀} P_θ[T(X*) ≥ T(X)] = inf{α : T(X) ∈ R_α}, where P_θ[T(X*) ≥ T(X)] = 1 − F_θ(T(X)) since T(X*) ~ F_θ

  p-value     evidence
  < 0.01      very strong evidence against H₀
  0.01–0.05   strong evidence against H₀
  0.05–0.1    weak evidence against H₀
  > 0.1       little or no evidence against H₀

Wald test
• Two-sided test
• Reject H₀ when |W| > z_{α/2} where W = (θ̂ − θ₀)/ŝe
• P[|W| > z_{α/2}] → α
• p-value = P_{θ₀}[|W| > |w|] ≈ P[|Z| > |w|] = 2Φ(−|w|)

Likelihood ratio test (LRT)
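The two-sided Wald test above can be sketched in a few lines for a Bernoulli proportion; the data (58 successes in n = 100 trials, H₀: θ₀ = 0.5) are a hypothetical example, not from the text:

```python
import math

n, successes, theta0 = 100, 58, 0.5
theta_hat = successes / n
se_hat = math.sqrt(theta_hat * (1 - theta_hat) / n)   # plug-in standard error

# W = (theta_hat - theta0) / se_hat, p-value = 2 * Phi(-|W|)
W = (theta_hat - theta0) / se_hat
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
p_value = 2 * Phi(-abs(W))
print(round(W, 3), round(p_value, 3))
```

At the 5% level this p-value (just above 0.1) would not reject H₀, matching the "little or no evidence" row of the table above.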
• T(X) = sup_{θ∈Θ} L_n(θ)/sup_{θ∈Θ₀} L_n(θ) = L_n(θ̂_n)/L_n(θ̂_{n,0})
• λ(X) = 2 log T(X) →ᴰ χ²_{r−q},  where χ²_k = Σ_{i=1}^k Z_i² and Z₁, …, Z_k ~iid N(0, 1)

Exponential family, vector parameter
  f_X(x | θ) = h(x) exp{Σ_{i=1}^s η_i(θ)T_i(x) − A(θ)}

Under the assumption of Normality, the least squares parameter estimators are also the MLEs, but the least squares variance estimator is not the MLE:
  σ̂² = (1/n) Σ_{i=1}^n ε̂_i²

Estimate regression function
  r̂(x) = Σ_{j=1}^k β̂_j x_j

Training error
  R̂_tr(S) = Σ_{i=1}^n (Ŷ_i(S) − Y_i)²
  bias(R̂_tr(S)) = E[R̂_tr(S)] − R(S) = −2 Σ_{i=1}^n Cov[Ŷ_i, Y_i]

Adjusted R²
  R²(S) = 1 − ((n − 1)/(n − k)) (rss/tss)

Mallow's Cp statistic
  R̂(S) = R̂_tr(S) + 2kσ̂² = lack of fit + complexity penalty

19 Non-parametric Function Estimation

Frequentist risk
  R(f, f̂_n) = E[L(f, f̂_n)] = ∫ b²(x) dx + ∫ v(x) dx
  b(x) = E[f̂_n(x)] − f(x),  v(x) = V[f̂_n(x)]
19.1.1 Histograms

Definitions
• Number of bins m
• Binwidth h = 1/m
• Bin B_j has ν_j observations
• Define p̂_j = ν_j/n and p_j = ∫_{B_j} f(u) du

Histogram estimator
  f̂_n(x) = Σ_{j=1}^m (p̂_j/h) I(x ∈ B_j)
  E[f̂_n(x)] = p_j/h
  V[f̂_n(x)] = p_j(1 − p_j)/(nh²)
  R(f̂_n, f) ≈ (h²/12) ∫(f′(u))² du + 1/(nh)
  h* = n^{−1/3} (6/∫(f′(u))² du)^{1/3}
  R*(f̂_n, f) ≈ C/n^{2/3},  C = (3/4)^{2/3} (∫(f′(u))² du)^{1/3}

Cross-validation estimate of E[J(h)]
  Ĵ_CV(h) = ∫f̂_n²(x) dx − (2/n) Σ_{i=1}^n f̂_{(−i)}(X_i) = 2/((n − 1)h) − ((n + 1)/((n − 1)h)) Σ_{j=1}^m p̂_j²

KDE
  f̂_n(x) = (1/n) Σ_{i=1}^n (1/h) K((x − X_i)/h)
  R(f, f̂_n) ≈ (1/4)(hσ_K)⁴ ∫(f″(x))² dx + (1/(nh)) ∫K²(x) dx
  h* = c₁^{−2/5} c₂^{−1/5} c₃^{−1/5} n^{−1/5},  c₁ = σ_K², c₂ = ∫K²(x) dx, c₃ = ∫(f″(x))² dx
  R*(f, f̂_n) = c₄/n^{4/5},  c₄ = (5/4)(σ_K²)^{2/5} (∫K²(x) dx)^{4/5} (∫(f″)² dx)^{1/5} = C(K) (∫(f″)² dx)^{1/5}

Epanechnikov kernel
  K(x) = (3/(4√5))(1 − x²/5) for |x| < √5, 0 otherwise

Cross-validation estimate of E[J(h)]
  Ĵ_CV(h) = ∫f̂_n²(x) dx − (2/n) Σ_{i=1}^n f̂_{(−i)}(X_i) ≈ (1/(hn²)) Σ_{i=1}^n Σ_{j=1}^n K*((X_i − X_j)/h) + (2/(nh)) K(0)
  K*(x) = K^{(2)}(x) − 2K(x),  K^{(2)}(x) = ∫K(x − y)K(y) dy

19.2 Non-parametric Regression

Estimate f(x) where f(x) = E[Y | X = x]. Consider pairs of points (x₁, Y₁), …, (x_n, Y_n), with Φ the basis matrix whose rows are (φ₀(x_i), …, φ_J(x_i)).

Least squares estimator
  β̂ = (ΦᵀΦ)⁻¹ΦᵀY ≈ (1/n)ΦᵀY  (for equally spaced observations only)

Cross-validation estimate of risk
  R̂_CV(J) = Σ_{i=1}^n (Y_i − Σ_{j} φ_j(x_i)β̂_{j,(−i)})²

20.1 Markov Chains

Chapman–Kolmogorov
  P_{m+n} = P_m P_n,  P_n = P × ⋯ × P = Pⁿ

Marginal probability
  µ_n = (µ_n(1), …, µ_n(N)) where µ_n(i) = P[X_n = i]
  µ₀: initial distribution
  µ_n = µ₀Pⁿ
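The KDE formula above is short enough to implement directly; a minimal sketch with a Gaussian kernel and a Silverman-style bandwidth (both are assumptions here, since the text leaves K and h open):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(0.0, 1.0, size=1000)

def kde(x, xs, h):
    # f_hat(x) = (1/n) sum_i (1/h) K((x - X_i)/h), K = standard normal pdf
    u = (x[:, None] - xs[None, :]) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)).mean(axis=1) / h

h = 1.06 * data.std() * len(data) ** (-0.2)   # normal-reference rule
grid = np.linspace(-4.0, 4.0, 161)
f_hat = kde(grid, data, h)

step = grid[1] - grid[0]
mass = f_hat.sum() * step
print(mass)                 # close to 1: the estimate integrates to one
```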
20.2 Poisson Processes

Poisson process
• {X_t : t ∈ [0, ∞)} = number of events up to and including time t
• X₀ = 0
• Independent increments: ∀t₀ < ⋯ < t_n : X_{t₁} − X_{t₀} ⊥⊥ ⋯ ⊥⊥ X_{t_n} − X_{t_{n−1}}
• Intensity function λ(t):
  – P[X_{t+h} − X_t = 1] = λ(t)h + o(h)
  – P[X_{t+h} − X_t = 2] = o(h)
• X_{s+t} − X_s ~ Po(m(s + t) − m(s)) where m(t) = ∫₀^t λ(s) ds

Homogeneous Poisson process
  λ(t) ≡ λ ⟹ X_t ~ Po(λt),  λ > 0

Waiting times
  W_t := time at which X_t occurs
  W_t ~ Gamma(t, 1/λ)

Interarrival times
  S_t = W_{t+1} − W_t,  S_t ~ Exp(1/λ)

[Diagram: interarrival time S_t between waiting times W_{t−1} and W_t on the time axis.]

21 Time Series

Mean function
  µ_{xt} = E[x_t] = ∫_{−∞}^∞ x f_t(x) dx

Autocovariance function
  γ_x(s, t) = E[(x_s − µ_s)(x_t − µ_t)] = E[x_s x_t] − µ_s µ_t
  γ_x(t, t) = E[(x_t − µ_t)²] = V[x_t]

Autocorrelation function (ACF)
  ρ(s, t) = Cov[x_s, x_t]/√(V[x_s] V[x_t]) = γ(s, t)/√(γ(s, s)γ(t, t))

Cross-covariance function (CCV)
  γ_{xy}(s, t) = E[(x_s − µ_{xs})(y_t − µ_{yt})]

Cross-correlation function (CCF)
  ρ_{xy}(s, t) = γ_{xy}(s, t)/√(γ_x(s, s)γ_y(t, t))

Backshift operator
  B^k(x_t) = x_{t−k}

Difference operator
  ∇^d = (1 − B)^d

White noise
• w_t ~ wn(0, σ_w²)
• Gaussian: w_t ~iid N(0, σ_w²)
• E[w_t] = 0,  t ∈ T
• V[w_t] = σ_w²,  t ∈ T
• γ_w(s, t) = 0 for s ≠ t,  s, t ∈ T

Random walk
• Drift δ
• x_t = δt + Σ_{j=1}^t w_j
• E[x_t] = δt

Symmetric moving average
  m_t = Σ_{j=−k}^k a_j x_{t−j}  where a_j = a_{−j} ≥ 0 and Σ_{j=−k}^k a_j = 1
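A homogeneous Poisson process can be simulated directly from its Exp(1/λ) interarrival times; counts over [0, T] should then have mean and variance close to λT. A Monte Carlo sketch with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
lam, T, reps = 3.0, 10.0, 20_000

# Draw generously many interarrival gaps, cumulate into event times,
# and count the events that land in [0, T].
gaps = rng.exponential(1 / lam, size=(reps, int(lam * T * 3)))
times = gaps.cumsum(axis=1)
counts = (times <= T).sum(axis=1)

print(counts.mean(), counts.var())   # both close to lam * T = 30
```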
21.1 Stationary Time Series

Strictly stationary
  P[x_{t₁} ≤ c₁, …, x_{t_k} ≤ c_k] = P[x_{t₁+h} ≤ c₁, …, x_{t_k+h} ≤ c_k]  ∀k ∈ N; t_k, c_k, h ∈ Z

Weakly stationary
• E[x_t²] < ∞  ∀t ∈ Z
• E[x_t] = m  ∀t ∈ Z
• γ_x(s, t) = γ_x(s + r, t + r)  ∀r, s, t ∈ Z
• γ(h) = E[(x_{t+h} − µ)(x_t − µ)]  ∀h ∈ Z
• γ(0) = E[(x_t − µ)²]
• γ(0) ≥ 0
• γ(0) ≥ |γ(h)|
• γ(h) = γ(−h)

Autocorrelation function (ACF)
  ρ_x(h) = Cov[x_{t+h}, x_t]/√(V[x_{t+h}] V[x_t]) = γ(t + h, t)/√(γ(t + h, t + h)γ(t, t)) = γ(h)/γ(0)

21.2 Estimation of Correlation

Sample mean
  x̄ = (1/n) Σ_{t=1}^n x_t

Sample variance
  V[x̄] = (1/n) Σ_{h=−n}^n (1 − |h|/n) γ_x(h)

Sample autocovariance function
  γ̂(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄)

Sample autocorrelation function
  ρ̂(h) = γ̂(h)/γ̂(0)

Sample cross-covariance function
  γ̂_{xy}(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(y_t − ȳ)

Sample cross-correlation function
  ρ̂_{xy}(h) = γ̂_{xy}(h)/√(γ̂_x(0) γ̂_y(0))
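The sample autocovariance and ACF estimators above translate directly into code; a sketch applied to white noise, whose ACF should vanish at all nonzero lags:

```python
import numpy as np

def sample_acf(x, max_lag):
    # gamma_hat(h) = (1/n) sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar)
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    gamma = np.array([((x[h:] - xbar) * (x[:n - h] - xbar)).sum() / n
                      for h in range(max_lag + 1)])
    return gamma / gamma[0]           # rho_hat(h) = gamma_hat(h)/gamma_hat(0)

rng = np.random.default_rng(5)
w = rng.standard_normal(5000)         # white noise
rho = sample_acf(w, 3)
print(np.round(rho, 3))               # rho(0) = 1, the rest near 0
```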
Jointly stationary time series: x_t and y_t are jointly stationary if each is stationary and the cross-covariance γ_{xy}(h) = E[(x_{t+h} − µ_x)(y_t − µ_y)] depends only on the lag h.

Properties
• ARMA(p, q) is causal ⟺ roots of φ(z) lie outside the unit circle
    ψ(z) = Σ_{j=0}^∞ ψ_j z^j = θ(z)/φ(z),  |z| ≤ 1
• ARMA(p, q) is invertible ⟺ roots of θ(z) lie outside the unit circle
    π(z) = Σ_{j=0}^∞ π_j z^j = φ(z)/θ(z),  |z| ≤ 1

Behavior of the ACF and PACF for causal and invertible ARMA models:

          AR(p)                  MA(q)                  ARMA(p, q)
  ACF     tails off              cuts off after lag q   tails off
  PACF    cuts off after lag p   tails off              tails off

21.5 Spectral Analysis

Periodic process
  x_t = A cos(2πωt + φ) = U₁ cos(2πωt) + U₂ sin(2πωt)
• Frequency index ω (cycles per unit time), period 1/ω
• Amplitude A
• Phase φ
• U₁ = A cos φ and U₂ = A sin φ, often normally distributed rv's

Spectral distribution function
  F(ω) = 0 for ω < −ω₀;  σ²/2 for −ω₀ ≤ ω < ω₀;  σ² for ω ≥ ω₀
• F(−∞) = F(−1/2) = 0
• F(∞) = F(1/2) = γ(0)

Spectral density
  f(ω) = Σ_{h=−∞}^∞ γ(h) e^{−2πiωh},  −1/2 ≤ ω ≤ 1/2
• Needs Σ_{h=−∞}^∞ |γ(h)| < ∞ ⟹ γ(h) = ∫_{−1/2}^{1/2} e^{2πiωh} f(ω) dω,  h = 0, ±1, …
• f(ω) ≥ 0
• f(ω) = f(−ω)
• f(ω) = f(1 − ω)
• γ(0) = V[x_t] = ∫_{−1/2}^{1/2} f(ω) dω
• White noise: f_w(ω) = σ_w²
• ARMA(p, q), φ(B)x_t = θ(B)w_t:
    f_x(ω) = σ_w² |θ(e^{−2πiω})|²/|φ(e^{−2πiω})|²
  where φ(z) = 1 − Σ_{k=1}^p φ_k z^k and θ(z) = 1 + Σ_{k=1}^q θ_k z^k
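A small numerical sketch, assuming the n^{−1/2}-scaled DFT convention used in this section: the periodogram I(j/n) = |d(j/n)|² of a pure on-grid sinusoid spikes at its frequency index.

```python
import numpy as np

n = 256
t = np.arange(n)
x = np.cos(2 * np.pi * 8 / n * t)     # frequency index j = 8

d = np.fft.fft(x) / np.sqrt(n)        # d(omega_j) with omega_j = j/n
I = np.abs(d) ** 2                    # periodogram
peak = int(np.argmax(I[: n // 2]))
print(peak, I[peak])                  # peak at j = 8, height n/4 = 64
```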
Discrete Fourier Transform (DFT)
  d(ω_j) = n^{−1/2} Σ_{t=1}^n x_t e^{−2πiω_j t}

Fourier/fundamental frequencies
  ω_j = j/n

Inverse DFT
  x_t = n^{−1/2} Σ_{j=0}^{n−1} d(ω_j) e^{2πiω_j t}

Periodogram
  I(j/n) = |d(j/n)|²

Scaled periodogram
  P(j/n) = (4/n) I(j/n) = ((2/n) Σ_{t=1}^n x_t cos(2πtj/n))² + ((2/n) Σ_{t=1}^n x_t sin(2πtj/n))²

22 Math

22.1 Gamma Function
• Ordinary: Γ(s) = ∫₀^∞ t^{s−1} e^{−t} dt
• Upper incomplete: Γ(s, x) = ∫_x^∞ t^{s−1} e^{−t} dt
• Lower incomplete: γ(s, x) = ∫₀^x t^{s−1} e^{−t} dt
• Γ(α + 1) = αΓ(α),  α > 0
• Γ(n) = (n − 1)!,  n ∈ N
• Γ(1/2) = √π

22.2 Beta Function
• I₀(a, b) = 0,  I₁(a, b) = 1
• I_x(a, b) = 1 − I_{1−x}(b, a)

22.3 Series

Finite
• Σ_{k=1}^n k = n(n + 1)/2
• Σ_{k=0}^n C(n, k) = 2ⁿ
• Σ_{k=1}^n (2k − 1) = n²
• Σ_{k=0}^n C(r + k, k) = C(r + n + 1, n)
• Σ_{k=1}^n k² = n(n + 1)(2n + 1)/6
• Σ_{k=0}^n C(k, m) = C(n + 1, m + 1)
• Σ_{k=1}^n k³ = (n(n + 1)/2)²
• Vandermonde's identity: Σ_{k=0}^r C(m, k) C(n, r − k) = C(m + n, r)
• Σ_{k=0}^n c^k = (c^{n+1} − 1)/(c − 1),  c ≠ 1
• Binomial theorem: Σ_{k=0}^n C(n, k) a^{n−k} b^k = (a + b)ⁿ

Infinite
• Σ_{k=0}^∞ p^k = 1/(1 − p),  Σ_{k=1}^∞ p^k = p/(1 − p),  |p| < 1
• Σ_{k=0}^∞ kp^{k−1} = (d/dp) Σ_{k=0}^∞ p^k = (d/dp)(1/(1 − p)) = 1/(1 − p)²,  |p| < 1
• Σ_{k=0}^∞ C(r + k − 1, k) x^k = (1 − x)^{−r},  r ∈ N⁺
• Σ_{k=0}^∞ C(α, k) p^k = (1 + p)^α,  |p| < 1, α ∈ C

Partitions
  P_{n+k,k} = Σ_{i=1}^k P_{n,i},  with P_{n,k} = 0 for k > n,  P_{n,0} = 0 for n ≥ 1,  P_{0,0} = 1
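Two of the finite sums above can be checked exactly with Python's integer binomial coefficients (the values n, m, r are arbitrary examples):

```python
from math import comb

n, m, r = 7, 5, 6

# Vandermonde: sum_k C(m, k) C(n, r-k) = C(m+n, r)
vandermonde = sum(comb(m, k) * comb(n, r - k) for k in range(r + 1))
print(vandermonde, comb(m + n, r))   # equal

# Hockey stick: sum_{k=0}^{n} C(r+k, k) = C(r+n+1, n)
hockey = sum(comb(r + k, k) for k in range(n + 1))
print(hockey, comb(r + n + 1, n))    # equal
```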
References

[1] L. M. Leemis and J. T. McQueston. Univariate Distribution Relationships. The American Statistician, 62(1):45–53, 2008.
[2] A. Steger. Diskrete Strukturen – Band 1: Kombinatorik, Graphentheorie, Algebra. Springer, 2001.
[3] A. Steger. Diskrete Strukturen – Band 2: Wahrscheinlichkeitstheorie und Statistik. Springer, 2002.

[Figure: Univariate distribution relationships, courtesy of Leemis and McQueston [1].]