Probability and Statistics Cookbook
Version 0.2.1
14th July, 2016
http://statistics.zone/
Copyright © Matthias Vallentin, 2016
Contents
1 Distribution Overview
  1.1 Discrete Distributions
  1.2 Continuous Distributions
2 Probability Theory
3 Random Variables
  3.1 Transformations
4 Expectation
5 Variance
6 Inequalities
7 Distribution Relationships
8 Probability and Moment Generating Functions
9 Multivariate Distributions
  9.1 Standard Bivariate Normal
  9.2 Bivariate Normal
  9.3 Multivariate Normal
10 Convergence
  10.1 Law of Large Numbers (LLN)
  10.2 Central Limit Theorem (CLT)
11 Statistical Inference
  11.1 Point Estimation
  11.2 Normal-Based Confidence Interval
  11.3 Empirical Distribution
  11.4 Statistical Functionals
12 Parametric Inference
  12.1 Method of Moments
  12.2 Maximum Likelihood
    12.2.1 Delta Method
  12.3 Multiparameter Models
    12.3.1 Multiparameter Delta Method
  12.4 Parametric Bootstrap
13 Hypothesis Testing
14 Exponential Family
15 Bayesian Inference
  15.1 Credible Intervals
  15.2 Function of Parameters
  15.3 Priors
    15.3.1 Conjugate Priors
  15.4 Bayesian Testing
16 Sampling Methods
  16.1 Inverse Transform Sampling
  16.2 The Bootstrap
    16.2.1 Bootstrap Confidence Intervals
  16.3 Rejection Sampling
  16.4 Importance Sampling
17 Decision Theory
  17.1 Risk
  17.2 Admissibility
  17.3 Bayes Rule
  17.4 Minimax Rules
18 Linear Regression
  18.1 Simple Linear Regression
  18.2 Prediction
  18.3 Multiple Regression
  18.4 Model Selection
19 Non-parametric Function Estimation
  19.1 Density Estimation
    19.1.1 Histograms
    19.1.2 Kernel Density Estimator (KDE)
  19.2 Non-parametric Regression
  19.3 Smoothing Using Orthogonal Functions
20 Stochastic Processes
  20.1 Markov Chains
  20.2 Poisson Processes
21 Time Series
  21.1 Stationary Time Series
  21.2 Estimation of Correlation
  21.3 Non-Stationary Time Series
    21.3.1 Detrending
  21.4 ARIMA Models
    21.4.1 Causality and Invertibility
  21.5 Spectral Analysis
22 Math
  22.1 Gamma Function
  22.2 Beta Function
  22.3 Series
  22.4 Combinatorics
1 Distribution Overview

1.1 Discrete Distributions
For each distribution we list the notation¹, CDF $F_X(x)$, PMF $f_X(x)$, mean $\mathbb{E}[X]$, variance $\mathbb{V}[X]$, and MGF $M_X(s)$.

Uniform $\mathrm{Unif}\{a,\dots,b\}$
- $F_X(x) = 0$ for $x < a$; $\frac{\lfloor x\rfloor - a + 1}{b - a + 1}$ for $a \le x \le b$; $1$ for $x > b$
- $f_X(x) = \frac{I(a \le x \le b)}{b - a + 1}$
- $\mathbb{E}[X] = \frac{a+b}{2}$, $\mathbb{V}[X] = \frac{(b-a+1)^2 - 1}{12}$
- $M_X(s) = \frac{e^{as} - e^{(b+1)s}}{s(b-a)}$

Bernoulli $\mathrm{Bern}(p)$
- $F_X(x) = 0$ for $x < 0$; $(1-p)^{1-x}$ for $0 \le x < 1$; $1$ for $x \ge 1$
- $f_X(x) = p^x (1-p)^{1-x}$
- $\mathbb{E}[X] = p$, $\mathbb{V}[X] = p(1-p)$
- $M_X(s) = 1 - p + pe^s$

Binomial $\mathrm{Bin}(n, p)$
- $F_X(x) = I_{1-p}(n - x, x + 1)$
- $f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}$
- $\mathbb{E}[X] = np$, $\mathbb{V}[X] = np(1-p)$
- $M_X(s) = (1 - p + pe^s)^n$

Multinomial $\mathrm{Mult}(n, p)$
- $f_X(x) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$ with $\sum_{i=1}^k x_i = n$
- $\mathbb{E}[X_i] = np_i$, $\mathbb{V}[X_i] = np_i(1 - p_i)$, $\mathrm{Cov}[X_i, X_j] = -np_i p_j$
- $M_X(s) = \left( \sum_{i=1}^k p_i e^{s_i} \right)^n$

Hypergeometric $\mathrm{Hyp}(N, m, n)$
- $f_X(x) = \binom{m}{x}\binom{N-m}{n-x} \big/ \binom{N}{n}$
- $\mathbb{E}[X] = \frac{nm}{N}$, $\mathbb{V}[X] = \frac{nm(N-n)(N-m)}{N^2(N-1)}$

Negative Binomial $\mathrm{NBin}(r, p)$
- $F_X(x) = I_p(r, x + 1)$
- $f_X(x) = \binom{x+r-1}{r-1} p^r (1-p)^x$
- $\mathbb{E}[X] = \frac{r(1-p)}{p}$, $\mathbb{V}[X] = \frac{r(1-p)}{p^2}$
- $M_X(s) = \left( \frac{pe^s}{1 - (1-p)e^s} \right)^r$

Geometric $\mathrm{Geo}(p)$
- $F_X(x) = 1 - (1-p)^x$ for $x \in \mathbb{N}^+$
- $f_X(x) = p(1-p)^{x-1}$
- $\mathbb{E}[X] = \frac{1}{p}$, $\mathbb{V}[X] = \frac{1-p}{p^2}$
- $M_X(s) = \frac{pe^s}{1 - (1-p)e^s}$

Poisson $\mathrm{Po}(\lambda)$
- $F_X(x) = e^{-\lambda} \sum_{i=0}^{\lfloor x\rfloor} \frac{\lambda^i}{i!}$
- $f_X(x) = \frac{\lambda^x e^{-\lambda}}{x!}$
- $\mathbb{E}[X] = \lambda$, $\mathbb{V}[X] = \lambda$
- $M_X(s) = e^{\lambda(e^s - 1)}$

¹ We use the notation $\gamma(s, x)$ and $\Gamma(x)$ to refer to the Gamma functions (see 22.1), and use $B(x, y)$ and $I_x$ to refer to the Beta functions (see 22.2).
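As a quick numeric check of the formulas above, here is a minimal sketch (assuming NumPy/SciPy are available; they are not part of the cookbook) comparing the closed-form Binomial moments and PMF against scipy.stats:

```python
# Check Bin(n, p) mean, variance, and PMF against the table entries.
from math import comb
from scipy import stats

n, p = 30, 0.6
X = stats.binom(n, p)

print(X.mean(), n * p)                      # E[X] = np
print(X.var(), n * p * (1 - p))             # V[X] = np(1-p)
x = 18
print(X.pmf(x), comb(n, x) * p**x * (1 - p)**(n - x))  # f_X(x)
```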
[Figure: PMF (top row) and CDF (bottom row) of the discrete distributions: Uniform (discrete), Binomial (n=40, p=0.3; n=30, p=0.6; n=25, p=0.9), Geometric (p=0.2; 0.5; 0.8), and Poisson (λ=1; 4; 10).]
1.2 Continuous Distributions
Uniform $\mathrm{Unif}(a, b)$
- $F_X(x) = 0$ for $x < a$; $\frac{x - a}{b - a}$ for $a < x < b$; $1$ for $x > b$
- $f_X(x) = \frac{I(a < x < b)}{b - a}$
- $\mathbb{E}[X] = \frac{a+b}{2}$, $\mathbb{V}[X] = \frac{(b-a)^2}{12}$
- $M_X(s) = \frac{e^{sb} - e^{sa}}{s(b-a)}$

Normal $\mathcal{N}(\mu, \sigma^2)$
- $F_X(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)$ where $\Phi(x) = \int_{-\infty}^x \phi(t)\,dt$
- $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}$
- $\mathbb{E}[X] = \mu$, $\mathbb{V}[X] = \sigma^2$
- $M_X(s) = \exp\left\{ \mu s + \frac{\sigma^2 s^2}{2} \right\}$

Log-Normal $\ln\mathcal{N}(\mu, \sigma^2)$
- $F_X(x) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left[ \frac{\ln x - \mu}{\sqrt{2\sigma^2}} \right]$
- $f_X(x) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(\ln x - \mu)^2}{2\sigma^2} \right\}$
- $\mathbb{E}[X] = e^{\mu + \sigma^2/2}$, $\mathbb{V}[X] = (e^{\sigma^2} - 1)\,e^{2\mu + \sigma^2}$

Multivariate Normal $\mathrm{MVN}(\mu, \Sigma)$
- $f_X(x) = (2\pi)^{-k/2}\,|\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right\}$
- $\mathbb{E}[X] = \mu$, $\mathbb{V}[X] = \Sigma$
- $M_X(s) = \exp\left\{ \mu^T s + \frac{1}{2} s^T \Sigma s \right\}$

Student's t $\mathrm{Student}(\nu)$
- $f_X(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left( 1 + \frac{x^2}{\nu} \right)^{-(\nu+1)/2}$
- $\mathbb{E}[X] = 0$ for $\nu > 1$
- $\mathbb{V}[X] = \frac{\nu}{\nu - 2}$ for $\nu > 2$; $\infty$ for $1 < \nu \le 2$

Chi-square $\chi^2_k$
- $F_X(x) = \frac{\gamma(k/2,\, x/2)}{\Gamma(k/2)}$
- $f_X(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}$
- $\mathbb{E}[X] = k$, $\mathbb{V}[X] = 2k$
- $M_X(s) = (1 - 2s)^{-k/2}$ for $s < 1/2$

F $\mathrm{F}(d_1, d_2)$
- $F_X(x) = I_{d_1 x / (d_1 x + d_2)}\!\left( \frac{d_1}{2}, \frac{d_2}{2} \right)$
- $f_X(x) = \frac{1}{x\,B(d_1/2,\, d_2/2)} \sqrt{\frac{(d_1 x)^{d_1}\, d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}$
- $\mathbb{E}[X] = \frac{d_2}{d_2 - 2}$ for $d_2 > 2$, $\mathbb{V}[X] = \frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$ for $d_2 > 4$

Exponential $\mathrm{Exp}(\beta)$
- $F_X(x) = 1 - e^{-x/\beta}$
- $f_X(x) = \frac{1}{\beta} e^{-x/\beta}$
- $\mathbb{E}[X] = \beta$, $\mathbb{V}[X] = \beta^2$
- $M_X(s) = \frac{1}{1 - \beta s}$ for $s < 1/\beta$

Gamma $\mathrm{Gamma}(\alpha, \beta)$
- $F_X(x) = \frac{\gamma(\alpha,\, x/\beta)}{\Gamma(\alpha)}$
- $f_X(x) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^\alpha}$
- $\mathbb{E}[X] = \alpha\beta$, $\mathbb{V}[X] = \alpha\beta^2$
- $M_X(s) = \left( \frac{1}{1 - \beta s} \right)^\alpha$ for $s < 1/\beta$

Inverse Gamma $\mathrm{InvGamma}(\alpha, \beta)$
- $F_X(x) = \frac{\Gamma(\alpha,\, \beta/x)}{\Gamma(\alpha)}$
- $f_X(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{-\alpha - 1} e^{-\beta/x}$
- $\mathbb{E}[X] = \frac{\beta}{\alpha - 1}$ for $\alpha > 1$, $\mathbb{V}[X] = \frac{\beta^2}{(\alpha - 1)^2 (\alpha - 2)}$ for $\alpha > 2$
- $M_X(s) = \frac{2(-\beta s)^{\alpha/2}}{\Gamma(\alpha)}\, K_\alpha\!\left(\sqrt{-4\beta s}\right)$

Dirichlet $\mathrm{Dir}(\alpha)$
- $f_X(x) = \frac{\Gamma\left(\sum_{i=1}^k \alpha_i\right)}{\prod_{i=1}^k \Gamma(\alpha_i)} \prod_{i=1}^k x_i^{\alpha_i - 1}$
- $\mathbb{E}[X_i] = \frac{\alpha_i}{\sum_{i=1}^k \alpha_i}$, $\mathbb{V}[X_i] = \frac{\mathbb{E}[X_i]\,(1 - \mathbb{E}[X_i])}{\sum_{i=1}^k \alpha_i + 1}$

Beta $\mathrm{Beta}(\alpha, \beta)$
- $F_X(x) = I_x(\alpha, \beta)$
- $f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}$
- $\mathbb{E}[X] = \frac{\alpha}{\alpha + \beta}$, $\mathbb{V}[X] = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$
- $M_X(s) = 1 + \sum_{k=1}^\infty \left( \prod_{r=0}^{k-1} \frac{\alpha + r}{\alpha + \beta + r} \right) \frac{s^k}{k!}$

Weibull $\mathrm{Weibull}(\lambda, k)$
- $F_X(x) = 1 - e^{-(x/\lambda)^k}$ for $x \ge 0$
- $f_X(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}$ for $x \ge 0$
- $\mathbb{E}[X] = \lambda\,\Gamma\!\left(1 + \frac{1}{k}\right)$, $\mathbb{V}[X] = \lambda^2\,\Gamma\!\left(1 + \frac{2}{k}\right) - \mathbb{E}[X]^2$
- $M_X(s) = \sum_{n=0}^\infty \frac{s^n \lambda^n}{n!}\,\Gamma\!\left(1 + \frac{n}{k}\right)$

Pareto $\mathrm{Pareto}(x_m, \alpha)$
- $F_X(x) = 1 - \left(\frac{x_m}{x}\right)^\alpha$ for $x \ge x_m$
- $f_X(x) = \alpha\,\frac{x_m^\alpha}{x^{\alpha + 1}}$ for $x \ge x_m$
- $\mathbb{E}[X] = \frac{\alpha x_m}{\alpha - 1}$ for $\alpha > 1$, $\mathbb{V}[X] = \frac{x_m^2\,\alpha}{(\alpha - 1)^2 (\alpha - 2)}$ for $\alpha > 2$
- $M_X(s) = \alpha(-x_m s)^\alpha\,\Gamma(-\alpha, -x_m s)$ for $s < 0$
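One practical pitfall with these tables is parameterization. A minimal sketch (assuming SciPy; scipy.stats uses a scale parameter that matches the $\beta$ in the Gamma row above):

```python
# Gamma(alpha, beta) in the scale parameterization: a=alpha, scale=beta.
from scipy import stats

alpha, beta = 3.0, 2.0
X = stats.gamma(a=alpha, scale=beta)

print(X.mean(), alpha * beta)        # E[X] = alpha * beta
print(X.var(), alpha * beta**2)      # V[X] = alpha * beta^2
```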
[Figure: PDF (top rows) and CDF (bottom rows) of the continuous distributions: Uniform (continuous), Normal (μ=0, σ²=0.2; μ=0, σ²=1; μ=0, σ²=5; μ=−2, σ²=0.5), Log-Normal (μ=0, σ²=3; μ=2, σ²=2; μ=0, σ²=1; μ=0.5, σ²=1; μ=0.25, σ²=1; μ=0.125, σ²=1), Student's t (ν=1; 2; 5; ∞), χ² (k=1,…,5), F (d₁=1,d₂=1; d₁=2,d₂=1; d₁=5,d₂=2; d₁=100,d₂=1; d₁=100,d₂=100), Exponential (β=0.5; 1; 2.5), Gamma (α=1,β=0.5; α=2,β=0.5; α=3,β=0.5; α=5,β=1; α=9,β=2), Inverse Gamma (α=1,β=1; α=2,β=1; α=3,β=1; α=3,β=0.5), Beta, Weibull (λ=1,k=0.5; λ=1,k=1; λ=1,k=1.5; λ=1,k=5), and Pareto (x_m=1, α=1; 2; 4).]
2 Probability Theory

Definitions
- Sample space $\Omega$
- Outcome (point or element) $\omega \in \Omega$
- Event $A \subseteq \Omega$
- $\sigma$-algebra $\mathcal{A}$:
  1. $\emptyset \in \mathcal{A}$
  2. $A_1, A_2, \dots \in \mathcal{A} \implies \bigcup_{i=1}^\infty A_i \in \mathcal{A}$
  3. $A \in \mathcal{A} \implies \neg A \in \mathcal{A}$
- Probability distribution $\mathbb{P}$:
  1. $\mathbb{P}[A] \ge 0$ for every $A$
  2. $\mathbb{P}[\Omega] = 1$
  3. $\mathbb{P}\left[ \bigsqcup_{i=1}^\infty A_i \right] = \sum_{i=1}^\infty \mathbb{P}[A_i]$ for disjoint $A_i$
- Probability space $(\Omega, \mathcal{A}, \mathbb{P})$

Properties
- $\mathbb{P}[\emptyset] = 0$
- $B = (A \cap B) \sqcup (\neg A \cap B) \implies \mathbb{P}[B] = \mathbb{P}[A \cap B] + \mathbb{P}[\neg A \cap B]$
- $\mathbb{P}[\neg A] = 1 - \mathbb{P}[A]$
- $\mathbb{P}[A \cup B] = \mathbb{P}[A] + \mathbb{P}[B] - \mathbb{P}[A \cap B] \le \mathbb{P}[A] + \mathbb{P}[B]$
- $\mathbb{P}[A \cup B] = \mathbb{P}[A \cap \neg B] + \mathbb{P}[\neg A \cap B] + \mathbb{P}[A \cap B]$
- $\mathbb{P}[A \cap \neg B] = \mathbb{P}[A] - \mathbb{P}[A \cap B]$
- DeMorgan: $\neg\left( \bigcup_n A_n \right) = \bigcap_n \neg A_n$ and $\neg\left( \bigcap_n A_n \right) = \bigcup_n \neg A_n$
- $\mathbb{P}\left[ \bigcup_n A_n \right] = 1 - \mathbb{P}\left[ \bigcap_n \neg A_n \right]$

Continuity of Probabilities
- $A_1 \subset A_2 \subset \cdots \implies \lim_{n\to\infty} \mathbb{P}[A_n] = \mathbb{P}[A]$ where $A = \bigcup_{i=1}^\infty A_i$
- $A_1 \supset A_2 \supset \cdots \implies \lim_{n\to\infty} \mathbb{P}[A_n] = \mathbb{P}[A]$ where $A = \bigcap_{i=1}^\infty A_i$

Independence: $A \perp B \iff \mathbb{P}[A \cap B] = \mathbb{P}[A]\,\mathbb{P}[B]$

Conditional Probability: $\mathbb{P}[A \mid B] = \frac{\mathbb{P}[A \cap B]}{\mathbb{P}[B]}$ for $\mathbb{P}[B] > 0$

Law of Total Probability: $\mathbb{P}[B] = \sum_{i=1}^n \mathbb{P}[B \mid A_i]\,\mathbb{P}[A_i]$ where $\Omega = \bigsqcup_{i=1}^n A_i$

Bayes' Theorem: $\mathbb{P}[A_i \mid B] = \frac{\mathbb{P}[B \mid A_i]\,\mathbb{P}[A_i]}{\sum_{j=1}^n \mathbb{P}[B \mid A_j]\,\mathbb{P}[A_j]}$ where $\Omega = \bigsqcup_{i=1}^n A_i$

Inclusion-Exclusion Principle: $\left| \bigcup_{i=1}^n A_i \right| = \sum_{r=1}^n (-1)^{r-1} \sum_{i_1 < \cdots < i_r} \left| \bigcap_{j=1}^r A_{i_j} \right|$

3 Random Variables

Random variable: $X: \Omega \to \mathbb{R}$
Probability mass function (PMF): $f_X(x) = \mathbb{P}[X = x]$
Probability density function (PDF): $\mathbb{P}[a \le X \le b] = \int_a^b f(x)\,dx$
Cumulative distribution function (CDF): $F_X: \mathbb{R} \to [0, 1]$, $F_X(x) = \mathbb{P}[X \le x]$
In general, $\mathbb{P}[X \in A] = \int_A dF_X(x)$.
Independence of $X$ and $Y$ (equivalent conditions):
1. $\mathbb{P}[X \le x, Y \le y] = \mathbb{P}[X \le x]\,\mathbb{P}[Y \le y]$
2. $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$
Conditional density: $f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)}$, so that $\mathbb{P}[a \le Y \le b \mid X = x] = \int_a^b f_{Y|X}(y \mid x)\,dy$
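To make Bayes' theorem from Section 2 concrete, a toy numeric sketch with a two-event partition (the prior and likelihood values below are invented for illustration):

```python
# Bayes' theorem over a partition {A1, A2}: P[Ai|B] = P[B|Ai]P[Ai] / P[B].
prior = [0.3, 0.7]             # P[A_i]
lik   = [0.9, 0.2]             # P[B | A_i]

evidence = sum(l * p for l, p in zip(lik, prior))           # P[B]
posterior = [l * p / evidence for l, p in zip(lik, prior)]  # P[A_i | B]
print(posterior)               # [~0.659, ~0.341], sums to 1
```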
3.1 Transformations

Transformation function: $Z = \varphi(X)$

Discrete:
$f_Z(z) = \mathbb{P}[\varphi(X) = z] = \mathbb{P}[\{x : \varphi(x) = z\}] = \mathbb{P}\left[ X \in \varphi^{-1}(z) \right] = \sum_{x \in \varphi^{-1}(z)} f_X(x)$

Continuous:
$F_Z(z) = \mathbb{P}[\varphi(X) \le z] = \int_{A_z} f(x)\,dx$ with $A_z = \{x : \varphi(x) \le z\}$

Convolution:
- $Z := X + Y$: $f_Z(z) = \int_{-\infty}^\infty f_{X,Y}(x, z - x)\,dx \;\overset{X,Y \ge 0}{=}\; \int_0^z f_{X,Y}(x, z - x)\,dx$
- $Z := |X - Y|$: $f_Z(z) = 2 \int_0^\infty f_{X,Y}(x, z + x)\,dx$
- $Z := \frac{X}{Y}$: $f_Z(z) = \int_{-\infty}^\infty |x|\, f_{X,Y}(x, xz)\,dx \;\overset{X \perp Y}{=}\; \int_{-\infty}^\infty |x|\, f_X(x) f_Y(xz)\,dx$
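A minimal simulation sketch (assuming NumPy/SciPy) of the continuous transformation formula: for $\varphi(x) = x^2$ and $X \sim \mathcal{N}(0,1)$ we have $Z = \varphi(X) \sim \chi^2_1$, so the Monte Carlo estimate of $F_Z$ can be checked against the exact CDF:

```python
# F_Z(z) = P[phi(X) <= z] estimated by Monte Carlo vs. the chi^2_1 CDF.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
z = 1.5
mc = np.mean(x**2 <= z)                 # empirical P[X^2 <= z]
exact = stats.chi2(df=1).cdf(z)
print(mc, exact)                        # agree to ~2-3 decimals
```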
4 Expectation

Definition:
$\mathbb{E}[X] = \mu_X = \int x\,dF_X(x) = \begin{cases} \sum_x x f_X(x) & X \text{ discrete} \\ \int x f_X(x)\,dx & X \text{ continuous} \end{cases}$

Properties:
- $\mathbb{P}[X = c] = 1 \implies \mathbb{E}[X] = c$
- $\mathbb{E}[cX] = c\,\mathbb{E}[X]$
- $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$
- $\mathbb{E}[\varphi(X)] \ne \varphi(\mathbb{E}[X])$ in general (cf. Jensen inequality)
- $\mathbb{P}[X \ge Y] = 1 \implies \mathbb{E}[X] \ge \mathbb{E}[Y]$
- $\mathbb{P}[X = Y] = 1 \implies \mathbb{E}[X] = \mathbb{E}[Y]$
- For nonnegative integer-valued $X$: $\mathbb{E}[X] = \sum_{x=1}^\infty \mathbb{P}[X \ge x]$
- $\mathbb{E}[I_A(x)] = \int_A dF_X(x) = \mathbb{P}[X \in A]$

Sample mean: $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$

Conditional expectation:
- $\mathbb{E}[Y \mid X = x] = \int y f(y \mid x)\,dy$
- $\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X \mid Y]]$
- $\mathbb{E}[\varphi(X, Y) \mid X = x] = \int \varphi(x, y)\, f_{Y|X}(y \mid x)\,dy$
- $\mathbb{E}[\varphi(Y, Z) \mid X = x] = \iint \varphi(y, z)\, f_{(Y,Z)|X}(y, z \mid x)\,dy\,dz$
- $\mathbb{E}[Y + Z \mid X] = \mathbb{E}[Y \mid X] + \mathbb{E}[Z \mid X]$
- $\mathbb{E}[\varphi(X)\,Y \mid X] = \varphi(X)\,\mathbb{E}[Y \mid X]$
- $\mathbb{E}[Y \mid X] = c \implies \mathrm{Cov}[X, Y] = 0$

5 Variance

Definition:
$\mathbb{V}[X] = \sigma_X^2 = \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$

Properties:
- $\mathbb{V}\left[ \sum_{i=1}^n X_i \right] = \sum_{i=1}^n \mathbb{V}[X_i] + 2\sum_{i \ne j} \mathrm{Cov}[X_i, X_j] = \sum_{i=1}^n \mathbb{V}[X_i]$ if $X_i \perp X_j$

Standard deviation: $\mathrm{sd}[X] = \sqrt{\mathbb{V}[X]} = \sigma_X$

Covariance:
- $\mathrm{Cov}[X, Y] = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$
- $\mathrm{Cov}\left[ \sum_{i=1}^n X_i, \sum_{j=1}^m Y_j \right] = \sum_{i=1}^n \sum_{j=1}^m \mathrm{Cov}[X_i, Y_j]$

Correlation: $\rho[X, Y] = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathbb{V}[X]\,\mathbb{V}[Y]}}$

Independence: $X \perp Y \implies \rho[X, Y] = 0 \iff \mathrm{Cov}[X, Y] = 0 \iff \mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$

Sample variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$

Conditional variance:
- $\mathbb{V}[Y \mid X] = \mathbb{E}\left[ (Y - \mathbb{E}[Y \mid X])^2 \mid X \right] = \mathbb{E}[Y^2 \mid X] - \mathbb{E}[Y \mid X]^2$
- Law of total variance: $\mathbb{V}[Y] = \mathbb{E}[\mathbb{V}[Y \mid X]] + \mathbb{V}[\mathbb{E}[Y \mid X]]$
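A minimal simulation sketch (assuming NumPy) of the law of total variance, using $Y \mid X = x \sim \mathcal{N}(x, 1)$ with $X \sim \mathcal{N}(0, 4)$ (illustrative choice):

```python
# V[Y] = E[V[Y|X]] + V[E[Y|X]] = 1 + 4 = 5 for this hierarchy.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 2, 1_000_000)   # sd 2, so V[X] = 4
y = rng.normal(x, 1)              # V[Y|X] = 1, E[Y|X] = X
print(y.var())                    # ~ 5
```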
6 Inequalities
- Cauchy-Schwarz: $\mathbb{E}[XY]^2 \le \mathbb{E}[X^2]\,\mathbb{E}[Y^2]$
- Markov: $\mathbb{P}[\varphi(X) \ge t] \le \frac{\mathbb{E}[\varphi(X)]}{t}$
- Chebyshev: $\mathbb{P}[|X - \mathbb{E}[X]| \ge t] \le \frac{\mathbb{V}[X]}{t^2}$
- Chernoff: $\mathbb{P}[X \ge (1+\delta)\mu] \le \left( \frac{e^\delta}{(1+\delta)^{1+\delta}} \right)^\mu$ for $\delta > -1$
- Hoeffding: $X_1, \dots, X_n$ independent with $\mathbb{P}[X_i \in [a_i, b_i]] = 1$, $1 \le i \le n$:
  $\mathbb{P}\left[ \bar{X} - \mathbb{E}[\bar{X}] \ge t \right] \le e^{-2nt^2}$ for $t > 0$
  $\mathbb{P}\left[ |\bar{X} - \mathbb{E}[\bar{X}]| \ge t \right] \le 2 \exp\left\{ -\frac{2n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right\}$ for $t > 0$
- Jensen: $\mathbb{E}[\varphi(X)] \ge \varphi(\mathbb{E}[X])$ for convex $\varphi$

7 Distribution Relationships

Binomial
- $X_i \sim \mathrm{Bern}(p) \implies \sum_{i=1}^n X_i \sim \mathrm{Bin}(n, p)$

Negative Binomial
- $X \sim \mathrm{NBin}(1, p) = \mathrm{Geo}(p)$

Poisson
- $X_i \sim \mathrm{Po}(\lambda_i)$, $X_i \perp X_j \implies \sum_{i=1}^n X_i \sim \mathrm{Po}\left( \sum_{i=1}^n \lambda_i \right)$
- $X_i \sim \mathrm{Po}(\lambda_i)$, $X_i \perp X_j \implies X_j \,\Big|\, \sum_{i=1}^n X_i \sim \mathrm{Bin}\left( \sum_{i=1}^n X_i,\ \frac{\lambda_j}{\sum_{i=1}^n \lambda_i} \right)$

Exponential
- $X_i \sim \mathrm{Exp}(\beta)$ iid, $X_i \perp X_j \implies \sum_{i=1}^n X_i \sim \mathrm{Gamma}(n, \beta)$

Normal
- $X \sim \mathcal{N}(\mu, \sigma^2) \implies \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1)$
- $X \sim \mathcal{N}(\mu, \sigma^2)$, $Z = aX + b \implies Z \sim \mathcal{N}(a\mu + b,\ a^2\sigma^2)$
- $X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$, $X_i \perp X_j \implies \sum_i X_i \sim \mathcal{N}\left( \sum_i \mu_i,\ \sum_i \sigma_i^2 \right)$
- $\mathbb{P}[a < X \le b] = \Phi\left( \frac{b - \mu}{\sigma} \right) - \Phi\left( \frac{a - \mu}{\sigma} \right)$
- $\Phi(-x) = 1 - \Phi(x)$, $\phi'(x) = -x\phi(x)$, $\phi''(x) = (x^2 - 1)\phi(x)$
- Upper quantile of $\mathcal{N}(0, 1)$: $z_\alpha = \Phi^{-1}(1 - \alpha)$

Gamma
- $X \sim \mathrm{Gamma}(\alpha, \beta) \implies X/\beta \sim \mathrm{Gamma}(\alpha, 1)$
- $\mathrm{Gamma}(\alpha, \beta) \sim \sum_{i=1}^\alpha \mathrm{Exp}(\beta)$ for integer $\alpha$
- $X_i \sim \mathrm{Gamma}(\alpha_i, \beta)$, $X_i \perp X_j \implies \sum_i X_i \sim \mathrm{Gamma}\left( \sum_i \alpha_i,\ \beta \right)$
- $\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\,dx$

Beta
- $f_X(x) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1}(1-x)^{\beta-1} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$
- $\mathbb{E}[X^k] = \frac{B(\alpha + k, \beta)}{B(\alpha, \beta)} = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\,\mathbb{E}[X^{k-1}]$
- $\mathrm{Beta}(1, 1) \sim \mathrm{Unif}(0, 1)$

8 Probability and Moment Generating Functions
- PGF: $G_X(t) = \mathbb{E}[t^X]$ for $|t| < 1$
- MGF: $M_X(t) = G_X(e^t) = \mathbb{E}\left[ e^{Xt} \right] = \mathbb{E}\left[ \sum_{i=0}^\infty \frac{(Xt)^i}{i!} \right] = \sum_{i=0}^\infty \frac{\mathbb{E}[X^i]}{i!}\, t^i$
- $\mathbb{P}[X = 0] = G_X(0)$
- $\mathbb{P}[X = 1] = G_X'(0)$
- $\mathbb{P}[X = i] = \frac{G_X^{(i)}(0)}{i!}$
- $\mathbb{E}[X] = G_X'(1^-)$
- $\mathbb{E}[X^k] = M_X^{(k)}(0)$
- $\mathbb{E}\left[ \frac{X!}{(X-k)!} \right] = G_X^{(k)}(1^-)$
- $G_X(t) = G_Y(t) \implies X \stackrel{d}{=} Y$
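A minimal symbolic sketch (assuming SymPy, not part of the cookbook) of the moment identity $\mathbb{E}[X^k] = M_X^{(k)}(0)$, applied to the Poisson MGF from Section 1.1:

```python
# Differentiate the Po(lambda) MGF at s = 0 to recover mean and variance.
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
M = sp.exp(lam * (sp.exp(s) - 1))            # Poisson MGF
EX  = sp.diff(M, s, 1).subs(s, 0)            # E[X]   = lam
EX2 = sp.diff(M, s, 2).subs(s, 0)            # E[X^2] = lam + lam^2
print(sp.simplify(EX), sp.simplify(EX2 - EX**2))   # lam, lam
```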
9 Multivariate Distributions

9.1 Standard Bivariate Normal
Let $X, Z \sim \mathcal{N}(0, 1)$ with $X \perp Z$, and let $Y = \rho X + \sqrt{1 - \rho^2}\, Z$.
Joint density: $f(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{ -\frac{x^2 + y^2 - 2\rho xy}{2(1 - \rho^2)} \right\}$
Conditionals: $(Y \mid X = x) \sim \mathcal{N}(\rho x,\ 1 - \rho^2)$ and $(X \mid Y = y) \sim \mathcal{N}(\rho y,\ 1 - \rho^2)$

9.2 Bivariate Normal
Let $X \sim \mathcal{N}(\mu_x, \sigma_x^2)$ and $Y \sim \mathcal{N}(\mu_y, \sigma_y^2)$ with correlation $\rho$.
$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{ -\frac{z}{2(1-\rho^2)} \right\}$
$z = \left( \frac{x - \mu_x}{\sigma_x} \right)^2 + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 - 2\rho \left( \frac{x - \mu_x}{\sigma_x} \right)\left( \frac{y - \mu_y}{\sigma_y} \right)$
Conditional mean and variance:
$\mathbb{E}[X \mid Y] = \mathbb{E}[X] + \rho\,\frac{\sigma_X}{\sigma_Y}\,(Y - \mathbb{E}[Y])$
$\mathbb{V}[X \mid Y] = \sigma_X^2 (1 - \rho^2)$

9.3 Multivariate Normal
Covariance matrix $\Sigma$ (precision matrix $\Sigma^{-1}$):
$\Sigma = \begin{pmatrix} \mathbb{V}[X_1] & \cdots & \mathrm{Cov}[X_1, X_k] \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}[X_k, X_1] & \cdots & \mathbb{V}[X_k] \end{pmatrix}$
If $X \sim \mathcal{N}(\mu, \Sigma)$:
$f_X(x) = (2\pi)^{-n/2}\,|\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right\}$
Properties:
- $Z \sim \mathcal{N}(0, 1)$, $X = \mu + \Sigma^{1/2} Z \implies X \sim \mathcal{N}(\mu, \Sigma)$
- $X \sim \mathcal{N}(\mu, \Sigma) \implies \Sigma^{-1/2}(X - \mu) \sim \mathcal{N}(0, 1)$
- $X \sim \mathcal{N}(\mu, \Sigma) \implies AX \sim \mathcal{N}(A\mu,\ A\Sigma A^T)$
- $X \sim \mathcal{N}(\mu, \Sigma)$, $a$ a vector of length $k \implies a^T X \sim \mathcal{N}(a^T\mu,\ a^T\Sigma a)$
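The first property above is exactly how MVN samplers work in practice. A minimal sketch (assuming NumPy) using the Cholesky factor as one concrete choice of $\Sigma^{1/2}$:

```python
# Sample X ~ N(mu, Sigma) via X = mu + L Z with L L^T = Sigma.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)
z = rng.standard_normal((100_000, 2))
x = mu + z @ L.T
print(np.cov(x, rowvar=False))   # ~ Sigma
```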
10 Convergence

Let $\{X_1, X_2, \dots\}$ be a sequence of rvs and let $X$ be another rv. Let $F_n$ denote the CDF of $X_n$ and let $F$ denote the CDF of $X$.

Types of convergence:
1. In distribution (weakly, in law): $X_n \stackrel{D}{\to} X$ if $\lim_{n\to\infty} F_n(t) = F(t)$ at all $t$ where $F$ is continuous
2. In probability: $X_n \stackrel{P}{\to} X$ if $(\forall\varepsilon > 0)\ \lim_{n\to\infty} \mathbb{P}[|X_n - X| > \varepsilon] = 0$
3. Almost surely (strongly): $X_n \stackrel{as}{\to} X$ if $\mathbb{P}\left[ \lim_{n\to\infty} X_n = X \right] = 1$
4. In quadratic mean ($L_2$): $X_n \stackrel{qm}{\to} X$ if $\lim_{n\to\infty} \mathbb{E}\left[ (X_n - X)^2 \right] = 0$

Relationships:
- $X_n \stackrel{qm}{\to} X \implies X_n \stackrel{P}{\to} X \implies X_n \stackrel{D}{\to} X$
- $X_n \stackrel{as}{\to} X \implies X_n \stackrel{P}{\to} X$
- $X_n \stackrel{D}{\to} X$ and $(\exists c \in \mathbb{R})\ \mathbb{P}[X = c] = 1 \implies X_n \stackrel{P}{\to} X$
- $X_n \stackrel{P}{\to} X$, $Y_n \stackrel{P}{\to} Y \implies X_n + Y_n \stackrel{P}{\to} X + Y$
- $X_n \stackrel{qm}{\to} X$, $Y_n \stackrel{qm}{\to} Y \implies X_n + Y_n \stackrel{qm}{\to} X + Y$
- $X_n \stackrel{P}{\to} X$, $Y_n \stackrel{P}{\to} Y \implies X_n Y_n \stackrel{P}{\to} XY$
- $X_n \stackrel{P}{\to} X \implies \varphi(X_n) \stackrel{P}{\to} \varphi(X)$
- $X_n \stackrel{D}{\to} X \implies \varphi(X_n) \stackrel{D}{\to} \varphi(X)$
- $X_n \stackrel{qm}{\to} b \iff \lim_{n\to\infty} \mathbb{E}[X_n] = b$ and $\lim_{n\to\infty} \mathbb{V}[X_n] = 0$

Slutzky's Theorem:
- $X_n \stackrel{D}{\to} X$ and $Y_n \stackrel{P}{\to} c \implies X_n + Y_n \stackrel{D}{\to} X + c$
- $X_n \stackrel{D}{\to} X$ and $Y_n \stackrel{P}{\to} c \implies X_n Y_n \stackrel{D}{\to} cX$
- In general: $X_n \stackrel{D}{\to} X$ and $Y_n \stackrel{D}{\to} Y$ does not imply $X_n + Y_n \stackrel{D}{\to} X + Y$

10.1 Law of Large Numbers (LLN)
Let $X_1, \dots, X_n$ be iid with $\mathbb{E}[X] = \mu$.
- Weak (WLLN): $\bar{X}_n \stackrel{P}{\to} \mu$ as $n \to \infty$
- Strong (SLLN): $\bar{X}_n \stackrel{as}{\to} \mu$ as $n \to \infty$

10.2 Central Limit Theorem (CLT)
Let $X_1, \dots, X_n$ be iid with $\mathbb{E}[X] = \mu$ and $\mathbb{V}[X] = \sigma^2 < \infty$. Then
$Z_n := \frac{\bar{X}_n - \mu}{\sqrt{\mathbb{V}[\bar{X}_n]}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \stackrel{D}{\to} Z$ where $Z \sim \mathcal{N}(0, 1)$

CLT notations:
- $Z_n \approx \mathcal{N}(0, 1)$
- $\bar{X}_n \approx \mathcal{N}\left( \mu, \frac{\sigma^2}{n} \right)$
- $\bar{X}_n - \mu \approx \mathcal{N}\left( 0, \frac{\sigma^2}{n} \right)$
- $\sqrt{n}(\bar{X}_n - \mu) \approx \mathcal{N}(0, \sigma^2)$
- $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \approx \mathcal{N}(0, 1)$

Continuity correction:
$\mathbb{P}[\bar{X}_n \le x] \approx \Phi\left( \frac{x + \frac{1}{2} - \mu}{\sigma/\sqrt{n}} \right)$, $\mathbb{P}[\bar{X}_n \ge x] \approx 1 - \Phi\left( \frac{x - \frac{1}{2} - \mu}{\sigma/\sqrt{n}} \right)$

Delta method:
$Y_n \approx \mathcal{N}\left( \mu, \frac{\sigma^2}{n} \right) \implies \varphi(Y_n) \approx \mathcal{N}\left( \varphi(\mu),\ (\varphi'(\mu))^2 \frac{\sigma^2}{n} \right)$
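A minimal simulation sketch (assuming NumPy/SciPy) of the CLT: standardized means of skewed Exp(1) samples are close to $\mathcal{N}(0,1)$ already at moderate $n$:

```python
# Standardized means of Exp(1) samples vs. N(0,1), measured by a KS statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 50, 20_000
x = rng.exponential(1.0, size=(reps, n))        # mu = sigma = 1
z = (x.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))
print(stats.kstest(z, 'norm').statistic)        # small -> close to N(0,1)
```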
11 Statistical Inference

Let $X_1, \dots, X_n \stackrel{iid}{\sim} F$ if not otherwise noted.

11.1 Point Estimation
- Point estimator $\hat\theta_n$ of $\theta$ is a function of $X_1, \dots, X_n$: $\hat\theta_n = g(X_1, \dots, X_n)$
- $\mathrm{bias}(\hat\theta_n) = \mathbb{E}[\hat\theta_n] - \theta$
- Consistency: $\hat\theta_n \stackrel{P}{\to} \theta$
- Standard error: $\mathrm{se}(\hat\theta_n) = \sqrt{\mathbb{V}[\hat\theta_n]}$
- Mean squared error: $\mathrm{mse} = \mathbb{E}\left[ (\hat\theta_n - \theta)^2 \right] = \mathrm{bias}(\hat\theta_n)^2 + \mathbb{V}[\hat\theta_n]$
- $\lim_{n\to\infty} \mathrm{bias}(\hat\theta_n) = 0$ and $\lim_{n\to\infty} \mathrm{se}(\hat\theta_n) = 0$ imply that $\hat\theta_n$ is consistent
- Asymptotic normality: $\frac{\hat\theta_n - \theta}{\mathrm{se}} \stackrel{D}{\to} \mathcal{N}(0, 1)$

11.2 Normal-Based Confidence Interval
Suppose $\hat\theta_n \approx \mathcal{N}(\theta, \widehat{se}^2)$. Let $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$, i.e., $\mathbb{P}[Z > z_{\alpha/2}] = \alpha/2$ and $\mathbb{P}[-z_{\alpha/2} < Z < z_{\alpha/2}] = 1 - \alpha$ where $Z \sim \mathcal{N}(0, 1)$. Then
$C_n = \hat\theta_n \pm z_{\alpha/2}\,\widehat{se}$

11.3 Empirical Distribution
Empirical distribution function (ECDF): $\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n I(X_i \le x)$
Properties (at any fixed $x$):
- $\mathbb{E}[\hat{F}_n(x)] = F(x)$
- $\mathbb{V}[\hat{F}_n(x)] = \frac{F(x)(1 - F(x))}{n}$
Dvoretzky-Kiefer-Wolfowitz (DKW) inequality: $\mathbb{P}\left[ \sup_x |F(x) - \hat{F}_n(x)| > \varepsilon \right] \le 2e^{-2n\varepsilon^2}$
Nonparametric $1 - \alpha$ confidence band for $F$:
$L(x) = \max\{\hat{F}_n - \epsilon_n, 0\}$, $U(x) = \min\{\hat{F}_n + \epsilon_n, 1\}$, $\epsilon_n = \sqrt{\frac{1}{2n}\log\left( \frac{2}{\alpha} \right)}$

11.4 Statistical Functionals
- Statistical functional: $T(F)$
- Plug-in estimator of $\theta = T(F)$: $\hat\theta_n = T(\hat{F}_n)$
- Linear functional: $T(F) = \int \varphi(x)\,dF_X(x)$
- Plug-in estimator for linear functional: $T(\hat{F}_n) = \int \varphi(x)\,d\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n \varphi(X_i)$
- Often: $T(\hat{F}_n) \approx \mathcal{N}\left( T(F), \widehat{se}^2 \right)$, giving the interval $T(\hat{F}_n) \pm z_{\alpha/2}\,\widehat{se}$
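The normal-based interval of 11.2 in a minimal sketch (assuming NumPy/SciPy), here for the mean with its plug-in standard error:

```python
# 95% normal-based confidence interval for the mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(10, 2, size=200)
alpha = 0.05
theta_hat = x.mean()
se_hat = x.std(ddof=1) / np.sqrt(len(x))
z = stats.norm.ppf(1 - alpha / 2)
print(theta_hat - z * se_hat, theta_hat + z * se_hat)
```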
12 Parametric Inference

Let $\mathfrak{F} = \{f(x; \theta) : \theta \in \Theta\}$ be a parametric model with parameter space $\Theta \subseteq \mathbb{R}^k$ and parameter $\theta = (\theta_1, \dots, \theta_k)$.

12.1 Method of Moments
$j$th moment: $\alpha_j(\theta) = \mathbb{E}[X^j] = \int x^j\,dF_X(x)$
$j$th sample moment: $\hat\alpha_j = \frac{1}{n} \sum_{i=1}^n X_i^j$
Method of moments estimator (MoM): solve $\alpha_j(\hat\theta_n) = \hat\alpha_j$ for $j = 1, \dots, k$.
Properties of the MoM estimator:
- $\hat\theta_n$ exists with probability tending to 1
- Consistency: $\hat\theta_n \stackrel{P}{\to} \theta$
- Asymptotic normality: $\sqrt{n}(\hat\theta - \theta) \stackrel{D}{\to} \mathcal{N}(0, \Sigma)$ where $\Sigma = g\,\mathbb{E}[YY^T]\,g^T$, $Y = (X, X^2, \dots, X^k)^T$, $g = (g_1, \dots, g_k)$, and $g_j = \frac{\partial}{\partial\theta} \alpha_j^{-1}(\theta)$

12.2 Maximum Likelihood
Likelihood: $\mathcal{L}_n: \Theta \to [0, \infty)$, $\mathcal{L}_n(\theta) = \prod_{i=1}^n f(X_i; \theta)$
Log-likelihood: $\ell_n(\theta) = \log \mathcal{L}_n(\theta) = \sum_{i=1}^n \log f(X_i; \theta)$
Maximum likelihood estimator (MLE): $\hat\theta_n$ such that $\mathcal{L}_n(\hat\theta_n) = \sup_\theta \mathcal{L}_n(\theta)$
Score function: $s(X; \theta) = \frac{\partial}{\partial\theta} \log f(X; \theta)$
Fisher information: $I(\theta) = \mathbb{V}_\theta[s(X; \theta)]$, $I_n(\theta) = nI(\theta)$
Fisher information (exponential family): $I(\theta) = -\mathbb{E}_\theta\left[ \frac{\partial}{\partial\theta} s(X; \theta) \right]$
Properties of the MLE:
- Consistency: $\hat\theta_n \stackrel{P}{\to} \theta$
- Equivariance: $\hat\theta_n$ is the MLE of $\theta$ $\implies \varphi(\hat\theta_n)$ is the MLE of $\varphi(\theta)$
- Asymptotic normality: with $se \approx \sqrt{1/I_n(\theta)}$, $\frac{\hat\theta_n - \theta}{se} \stackrel{D}{\to} \mathcal{N}(0, 1)$; with $\widehat{se} \approx \sqrt{1/I_n(\hat\theta_n)}$, $\frac{\hat\theta_n - \theta}{\widehat{se}} \stackrel{D}{\to} \mathcal{N}(0, 1)$
- Asymptotic optimality: the MLE has the smallest asymptotic variance among well-behaved estimators
Asymptotic relative efficiency: $\mathrm{are}(\tilde\theta_n, \hat\theta_n) = \frac{\mathbb{V}[\hat\theta_n]}{\mathbb{V}[\tilde\theta_n]} \le 1$

12.2.1 Delta Method
If $\tau = \varphi(\theta)$ where $\varphi$ is differentiable and $\varphi'(\theta) \ne 0$:
$\frac{\hat\tau_n - \tau}{\widehat{se}(\hat\tau_n)} \stackrel{D}{\to} \mathcal{N}(0, 1)$
where $\hat\tau_n = \varphi(\hat\theta_n)$ is the MLE of $\tau$ and $\widehat{se}(\hat\tau_n) = |\varphi'(\hat\theta_n)|\,\widehat{se}(\hat\theta_n)$.
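A minimal numeric sketch (assuming NumPy/SciPy) of the MLE of 12.2, maximizing the log-likelihood of an $\mathrm{Exp}(\beta)$ sample directly; here the closed form is $\hat\beta = \bar{x}$, which the optimizer should reproduce:

```python
# MLE for the Exp(beta) scale parameter via the negative log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
x = rng.exponential(2.0, size=500)

nll = lambda b: len(x) * np.log(b) + x.sum() / b   # -log L_n(beta)
res = minimize_scalar(nll, bounds=(1e-6, 100), method='bounded')
print(res.x, x.mean())    # both ~ 2
```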
12.3 Multiparameter Models
Let $\theta = (\theta_1, \dots, \theta_k)$ and let $\hat\theta = (\hat\theta_1, \dots, \hat\theta_k)$ be the MLE.
$H_{jj} = \frac{\partial^2 \ell_n}{\partial\theta_j^2} \qquad H_{jk} = \frac{\partial^2 \ell_n}{\partial\theta_j\,\partial\theta_k}$
Fisher information matrix:
$I_n(\theta) = -\begin{pmatrix} \mathbb{E}[H_{11}] & \cdots & \mathbb{E}[H_{1k}] \\ \vdots & \ddots & \vdots \\ \mathbb{E}[H_{k1}] & \cdots & \mathbb{E}[H_{kk}] \end{pmatrix}$
Under appropriate regularity conditions, $(\hat\theta - \theta) \approx \mathcal{N}(0, J_n)$ with $J_n = I_n^{-1}(\theta)$. Further, if $\hat\theta_j$ is the $j$th component of $\hat\theta$, then $\frac{\hat\theta_j - \theta_j}{\widehat{se}_j} \stackrel{D}{\to} \mathcal{N}(0, 1)$ where $\widehat{se}_j^2 = J_n(j, j)$.

12.3.1 Multiparameter Delta Method
Let $\tau = \varphi(\theta_1, \dots, \theta_k)$ with gradient
$\nabla\varphi = \begin{pmatrix} \frac{\partial\varphi}{\partial\theta_1} \\ \vdots \\ \frac{\partial\varphi}{\partial\theta_k} \end{pmatrix}$
Suppose $\nabla\varphi\big|_{\theta = \hat\theta} \ne 0$ and $\hat\tau = \varphi(\hat\theta)$. Then
$\frac{\hat\tau - \tau}{\widehat{se}(\hat\tau)} \stackrel{D}{\to} \mathcal{N}(0, 1)$
where $\widehat{se}(\hat\tau) = \sqrt{\left( \hat\nabla\varphi \right)^T \hat{J}_n \left( \hat\nabla\varphi \right)}$, $\hat{J}_n = J_n(\hat\theta)$, and $\hat\nabla\varphi = \nabla\varphi\big|_{\theta = \hat\theta}$.

12.4 Parametric Bootstrap
Sample from $f(x; \hat\theta_n)$ instead of from $\hat{F}_n$, where $\hat\theta_n$ could be the MLE or the method of moments estimator.

13 Hypothesis Testing

$H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$

Definitions:
- Null hypothesis $H_0$, alternative hypothesis $H_1$
- Simple hypothesis: $\theta = \theta_0$
- Composite hypothesis: $\theta > \theta_0$ or $\theta < \theta_0$
- Two-sided test: $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$
- One-sided test: $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$
- Critical value $c$
- Test statistic $T$
- Rejection region $R = \{x : T(x) > c\}$
- Power function $\beta(\theta) = \mathbb{P}[X \in R]$
- Power of a test: $1 - \mathbb{P}[\text{Type II error}] = 1 - \beta = \inf_{\theta \in \Theta_1} \beta(\theta)$
- Test size: $\alpha = \mathbb{P}[\text{Type I error}] = \sup_{\theta \in \Theta_0} \beta(\theta)$

Outcomes:
- Retain $H_0$ when $H_0$ true: correct; when $H_1$ true: Type II error ($\beta$)
- Reject $H_0$ when $H_0$ true: Type I error ($\alpha$); when $H_1$ true: correct (power)

p-value:
- p-value $= \sup_{\theta \in \Theta_0} \mathbb{P}_\theta[T(X) \ge T(x)] = \inf\{\alpha : T(x) \in R_\alpha\}$
- p-value $= \sup_{\theta \in \Theta_0} \underbrace{\mathbb{P}_\theta[T(X^\star) \ge T(X)]}_{1 - F_\theta(T(X))}$ since $T(X^\star) \sim F_\theta$

Evidence scale:
- p-value $< 0.01$: very strong evidence against $H_0$
- $0.01$ to $0.05$: strong evidence against $H_0$
- $0.05$ to $0.1$: weak evidence against $H_0$
- $> 0.1$: little or no evidence against $H_0$

Wald test:
- Two-sided test; reject $H_0$ when $|W| > z_{\alpha/2}$ where $W = \frac{\hat\theta - \theta_0}{\widehat{se}}$
- $\mathbb{P}\left[ |W| > z_{\alpha/2} \right] \to \alpha$
- p-value $= \mathbb{P}_{\theta_0}[|W| > |w|] \approx \mathbb{P}[|Z| > |w|] = 2\Phi(-|w|)$

Likelihood ratio test (LRT):
- $T(X) = \frac{\sup_{\theta \in \Theta} \mathcal{L}_n(\theta)}{\sup_{\theta \in \Theta_0} \mathcal{L}_n(\theta)}$
- $\lambda(X) = 2\log T(X) \stackrel{D}{\to} \chi^2_{r-q}$ where $r = \dim(\Theta)$ and $q = \dim(\Theta_0)$
- p-value $= \mathbb{P}_{\theta_0}[\lambda(X) > \lambda(x)] \approx \mathbb{P}\left[ \chi^2_{r-q} > \lambda(x) \right]$

Multinomial LRT:
- MLE: $\hat{p}_n = \left( \frac{X_1}{n}, \dots, \frac{X_k}{n} \right)$
- $T(X) = \frac{\mathcal{L}_n(\hat{p}_n)}{\mathcal{L}_n(p_0)} = \prod_{j=1}^k \left( \frac{\hat{p}_j}{p_{0j}} \right)^{X_j}$
- $\lambda(X) = 2\sum_{j=1}^k X_j \log\left( \frac{\hat{p}_j}{p_{0j}} \right) \stackrel{D}{\to} \chi^2_{k-1}$
- p-value $= \mathbb{P}\left[ \chi^2_{k-1} > \lambda(x) \right]$

Pearson chi-square test:
- $T = \sum_{j=1}^k \frac{(X_j - \mathbb{E}[X_j])^2}{\mathbb{E}[X_j]}$ where $\mathbb{E}[X_j] = np_{0j}$ under $H_0$
- $T \stackrel{D}{\to} \chi^2_{k-1}$; p-value $= \mathbb{P}\left[ \chi^2_{k-1} > T(x) \right]$
- Converges to $\chi^2_{k-1}$ faster than the LRT, hence preferable for small $n$

Independence testing:
- $I$ rows, $J$ columns, $X$ a multinomial sample of size $n = IJ$
- MLEs unconstrained: $\hat{p}_{ij} = \frac{X_{ij}}{n}$
- MLEs under independence: $\hat{p}_{ij} = \hat{p}_{i\cdot}\,\hat{p}_{\cdot j} = \frac{X_{i\cdot}}{n}\,\frac{X_{\cdot j}}{n}$
- LRT: $\lambda = 2\sum_{i=1}^I \sum_{j=1}^J X_{ij} \log\left( \frac{n X_{ij}}{X_{i\cdot} X_{\cdot j}} \right)$
- Pearson: $T = \sum_{i=1}^I \sum_{j=1}^J \frac{(X_{ij} - \mathbb{E}[X_{ij}])^2}{\mathbb{E}[X_{ij}]}$
- Both converge in distribution to $\chi^2_\nu$ with $\nu = (I-1)(J-1)$
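The Wald test above in a minimal sketch (assuming NumPy/SciPy), for $H_0: \mu = 0$ on simulated data with a small true effect:

```python
# Two-sided Wald test: W = (mu_hat - mu0)/se_hat, p-value = 2 Phi(-|w|).
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(0.3, 1.0, size=100)
mu0 = 0.0
w = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
p = 2 * stats.norm.cdf(-abs(w))
print(w, p)
```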
14 Exponential Family

Scalar parameter:
$f_X(x \mid \theta) = h(x)\exp\{\eta(\theta)T(x) - A(\theta)\} = h(x)g(\theta)\exp\{\eta(\theta)T(x)\}$
Vector parameter:
$f_X(x \mid \theta) = h(x)\exp\left\{ \sum_{i=1}^s \eta_i(\theta)T_i(x) - A(\theta) \right\} = h(x)g(\theta)\exp\left\{ \eta(\theta)^T T(x) \right\}$
Natural form:
$f_X(x \mid \eta) = h(x)\exp\{\eta^T T(x) - A(\eta)\} = h(x)g(\eta)\exp\{\eta^T T(x)\}$

15 Bayesian Inference

Bayes' Theorem:
$f(\theta \mid x) = \frac{f(x \mid \theta) f(\theta)}{f(x)} = \frac{f(x \mid \theta) f(\theta)}{\int f(x \mid \theta) f(\theta)\,d\theta} \propto \mathcal{L}_n(\theta) f(\theta)$

Definitions:
- $X^n = (X_1, \dots, X_n)$, $x^n = (x_1, \dots, x_n)$
- Prior density $f(\theta)$
- Likelihood $f(x^n \mid \theta)$: joint density of the data. In particular, if $X^n$ is iid, then $f(x^n \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta) = \mathcal{L}_n(\theta)$
- Posterior density $f(\theta \mid x^n)$
- Normalizing constant $c_n = f(x^n) = \int f(x \mid \theta) f(\theta)\,d\theta$
- Kernel: the part of a density that depends on $\theta$
- Posterior mean $\bar\theta_n = \int \theta\, f(\theta \mid x^n)\,d\theta = \frac{\int \theta\,\mathcal{L}_n(\theta) f(\theta)\,d\theta}{\int \mathcal{L}_n(\theta) f(\theta)\,d\theta}$

15.1 Credible Intervals
Posterior interval: $\mathbb{P}[\theta \in (a, b) \mid x^n] = \int_a^b f(\theta \mid x^n)\,d\theta = 1 - \alpha$
Equal-tail credible interval: choose $a$ and $b$ so that
$\int_{-\infty}^a f(\theta \mid x^n)\,d\theta = \int_b^\infty f(\theta \mid x^n)\,d\theta = \alpha/2$
15.2 Function of Parameters

Let $\tau = \varphi(\theta)$ and $A = \{\theta : \varphi(\theta) \le \tau\}$.
Posterior CDF for $\tau$: $H(\tau \mid x^n) = \mathbb{P}[\varphi(\theta) \le \tau \mid x^n] = \int_A f(\theta \mid x^n)\,d\theta$
Posterior density: $h(\tau \mid x^n) = H'(\tau \mid x^n)$

15.3 Priors

Choice:
- Subjective Bayesianism: the prior should incorporate as much detail as possible of the researcher's a priori knowledge, via prior elicitation.
- Objective Bayesianism: the prior should incorporate as little detail as possible (non-informative prior).
- Robust Bayesianism: consider various priors and determine the sensitivity of our inferences to changes in the prior.

Types:
- Flat: $f(\theta) \propto$ constant
- Proper: $\int f(\theta)\,d\theta = 1$
- Improper: $\int f(\theta)\,d\theta = \infty$
- Jeffreys' prior (transformation-invariant): $f(\theta) \propto \sqrt{I(\theta)}$, and $f(\theta) \propto \sqrt{\det(I(\theta))}$ in the multiparameter case
- Conjugate: $f(\theta)$ and $f(\theta \mid x^n)$ belong to the same parametric family

15.3.1 Conjugate Priors

Continuous likelihood (each entry: likelihood; conjugate prior; posterior hyperparameters):
- $\mathrm{Unif}(0, \theta)$; $\mathrm{Pareto}(x_m, k)$; $\mathrm{Pareto}\left( \max\{x_{(n)}, x_m\},\ k + n \right)$
- $\mathrm{Exp}(\lambda)$; $\mathrm{Gamma}(\alpha, \beta)$; $\mathrm{Gamma}\left( \alpha + n,\ \beta + \sum_{i=1}^n x_i \right)$
- $\mathcal{N}(\mu, \sigma_c^2)$ with known variance $\sigma_c^2$; $\mathcal{N}(\mu_0, \sigma_0^2)$; mean $\left( \frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^n x_i}{\sigma_c^2} \right) \Big/ \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma_c^2} \right)$ and variance $\left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma_c^2} \right)^{-1}$
- $\mathcal{N}(\mu_c, \sigma^2)$ with known mean $\mu_c$; Scaled Inverse Chi-square$(\nu, \sigma_0^2)$; $\left( \nu + n,\ \frac{\nu\sigma_0^2 + \sum_{i=1}^n (x_i - \mu_c)^2}{\nu + n} \right)$
- $\mathcal{N}(\mu, \sigma^2)$; Normal-scaled Inverse Gamma$(m, \nu, \alpha, \beta)$; $\left( \frac{\nu m + n\bar{x}}{\nu + n},\ \nu + n,\ \alpha + \frac{n}{2},\ \beta + \frac{1}{2}\sum_{i=1}^n (x_i - \bar{x})^2 + \frac{n\nu}{\nu + n}\cdot\frac{(\bar{x} - m)^2}{2} \right)$
- $\mathrm{MVN}(\mu, \Sigma_c)$ with known covariance $\Sigma_c$; $\mathrm{MVN}(\mu_0, \Sigma_0)$; mean $\left( \Sigma_0^{-1} + n\Sigma_c^{-1} \right)^{-1}\left( \Sigma_0^{-1}\mu_0 + n\Sigma_c^{-1}\bar{x} \right)$ and covariance $\left( \Sigma_0^{-1} + n\Sigma_c^{-1} \right)^{-1}$
- $\mathrm{MVN}(\mu_c, \Sigma)$ with known mean; InverseWishart$(\kappa, \Psi)$; $\left( n + \kappa,\ \Psi + \sum_{i=1}^n (x_i - \mu_c)(x_i - \mu_c)^T \right)$
- $\mathrm{Pareto}(x_{mc}, k)$ with known minimum $x_{mc}$; $\mathrm{Gamma}(\alpha, \beta)$; $\left( \alpha + n,\ \beta + \sum_{i=1}^n \log\frac{x_i}{x_{mc}} \right)$
- $\mathrm{Pareto}(x_m, k_c)$ with known shape $k_c$; $\mathrm{Pareto}(x_0, k_0)$; $\mathrm{Pareto}(x_0,\ k_0 - k_c n)$ where $k_0 > k_c n$
- $\mathrm{Gamma}(\alpha_c, \beta)$ with known shape $\alpha_c$; $\mathrm{Gamma}(\alpha_0, \beta_0)$; $\left( \alpha_0 + n\alpha_c,\ \beta_0 + \sum_{i=1}^n x_i \right)$

Discrete likelihood:
- $\mathrm{Bern}(p)$; $\mathrm{Beta}(\alpha, \beta)$; $\mathrm{Beta}\left( \alpha + \sum_{i=1}^n x_i,\ \beta + n - \sum_{i=1}^n x_i \right)$
- $\mathrm{Bin}(p)$; $\mathrm{Beta}(\alpha, \beta)$; $\mathrm{Beta}\left( \alpha + \sum_{i=1}^n x_i,\ \beta + \sum_{i=1}^n N_i - \sum_{i=1}^n x_i \right)$
- $\mathrm{NBin}(p)$; $\mathrm{Beta}(\alpha, \beta)$; $\mathrm{Beta}\left( \alpha + rn,\ \beta + \sum_{i=1}^n x_i \right)$
- $\mathrm{Po}(\lambda)$; $\mathrm{Gamma}(\alpha, \beta)$; $\mathrm{Gamma}\left( \alpha + \sum_{i=1}^n x_i,\ \beta + n \right)$
- $\mathrm{Multinomial}(p)$; $\mathrm{Dir}(\alpha)$; $\mathrm{Dir}\left( \alpha + \sum_{i=1}^n x^{(i)} \right)$
- $\mathrm{Geo}(p)$; $\mathrm{Beta}(\alpha, \beta)$; $\mathrm{Beta}\left( \alpha + n,\ \beta + \sum_{i=1}^n x_i \right)$
15.4 Bayesian Testing

If $H_0: \theta \in \Theta_0$:
- Prior probability $\mathbb{P}[H_0] = \int_{\Theta_0} f(\theta)\,d\theta$
- Posterior probability $\mathbb{P}[H_0 \mid x^n] = \int_{\Theta_0} f(\theta \mid x^n)\,d\theta$

Marginal likelihood: $f(x^n \mid H_i) = \int f(x^n \mid \theta, H_i)\, f(\theta \mid H_i)\,d\theta$

Posterior odds (of $H_i$ relative to $H_j$):
$\frac{\mathbb{P}[H_i \mid x^n]}{\mathbb{P}[H_j \mid x^n]} = \underbrace{\frac{f(x^n \mid H_i)}{f(x^n \mid H_j)}}_{\text{Bayes factor } BF_{ij}} \times \underbrace{\frac{\mathbb{P}[H_i]}{\mathbb{P}[H_j]}}_{\text{prior odds}}$

Bayes factor interpretation:
- $\log_{10} BF_{10} \in (0, 0.5]$, $BF_{10} \in (1, 3.2]$: weak evidence
- $\log_{10} BF_{10} \in (0.5, 1]$, $BF_{10} \in (3.2, 10]$: moderate evidence
- $\log_{10} BF_{10} \in (1, 2]$, $BF_{10} \in (10, 100]$: strong evidence
- $\log_{10} BF_{10} > 2$, $BF_{10} > 100$: decisive evidence

With prior probability $p = \mathbb{P}[H_1]$, the posterior probability of $H_1$ is
$p^* = \frac{\frac{p}{1-p} BF_{10}}{1 + \frac{p}{1-p} BF_{10}}$

16 Sampling Methods

16.1 Inverse Transform Sampling
Setup: $U \sim \mathrm{Unif}(0, 1)$, $X \sim F$, and $F^{-1}(u) = \inf\{x \mid F(x) \ge u\}$
Algorithm (see the sketch below):
1. Generate $u \sim \mathrm{Unif}(0, 1)$
2. Compute $x = F^{-1}(u)$; then $x$ is a draw from $F$
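A minimal sketch (assuming NumPy) of inverse transform sampling for $\mathrm{Exp}(\beta)$, where the quantile function has the closed form $F^{-1}(u) = -\beta\log(1-u)$:

```python
# Inverse transform sampling: x = F^{-1}(u) with u ~ Unif(0,1).
import numpy as np

rng = np.random.default_rng(7)
beta = 2.0
u = rng.uniform(0, 1, 100_000)
x = -beta * np.log1p(-u)         # F^{-1}(u) = -beta log(1 - u)
print(x.mean(), x.var())         # ~ beta, ~ beta^2
```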
16.2 The Bootstrap
Idea: resample from the empirical distribution $\hat{F}_n$ (sample the data with replacement) to estimate the variability of a statistic $T_n$:
$v_{boot} = \mathbb{V}_{\hat{F}_n}[T_n] \approx \frac{1}{B} \sum_{b=1}^B \left( T^*_{n,b} - \frac{1}{B} \sum_{r=1}^B T^*_{n,r} \right)^2$
Algorithm:
1. Draw $X^*_1, \dots, X^*_n \sim \hat{F}_n$ and compute $T^*_{n,b} = g(X^*_1, \dots, X^*_n)$
2. Repeat $B$ times to obtain $T^*_{n,1}, \dots, T^*_{n,B}$
3. Estimate $\mathbb{V}_{\hat{F}_n}[T_n]$ with $v_{boot}$

16.2.1 Bootstrap Confidence Intervals
Normal-based interval: $T_n \pm z_{\alpha/2}\,\widehat{se}_{boot}$
Pivotal interval:
1. Location parameter $\theta = T(F)$
2. Pivot $R_n = \hat\theta_n - \theta$
3. Let $H(r) = \mathbb{P}[R_n \le r]$ be the CDF of $R_n$
4. Let $R^*_{n,b} = \hat\theta^*_{n,b} - \hat\theta_n$. Approximate $H$ using the bootstrap: $\hat{H}(r) = \frac{1}{B} \sum_{b=1}^B I(R^*_{n,b} \le r)$
5. Let $\theta^*_\beta$ denote the $\beta$ sample quantile of $(\hat\theta^*_{n,1}, \dots, \hat\theta^*_{n,B})$
6. Then $r^*_\beta$, the $\beta$ sample quantile of $(R^*_{n,1}, \dots, R^*_{n,B})$, satisfies $r^*_\beta = \theta^*_\beta - \hat\theta_n$
7. Approximate $1 - \alpha$ confidence interval $C_n = (\hat{a}, \hat{b})$ where
   $\hat{a} = \hat\theta_n - \hat{H}^{-1}\left( 1 - \frac{\alpha}{2} \right) = 2\hat\theta_n - \theta^*_{1-\alpha/2}$
   $\hat{b} = \hat\theta_n - \hat{H}^{-1}\left( \frac{\alpha}{2} \right) = 2\hat\theta_n - \theta^*_{\alpha/2}$
Percentile interval: $C_n = \left( \theta^*_{\alpha/2},\ \theta^*_{1-\alpha/2} \right)$
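A minimal sketch (assuming NumPy) of the bootstrap standard error and the percentile interval, using the median as the statistic:

```python
# Bootstrap se of the median and the percentile interval.
import numpy as np

rng = np.random.default_rng(9)
x = rng.exponential(1.0, size=200)
B = 5000
boot = np.array([np.median(rng.choice(x, size=len(x), replace=True))
                 for _ in range(B)])
se_boot = boot.std(ddof=1)
lo, hi = np.quantile(boot, [0.025, 0.975])   # (theta*_{a/2}, theta*_{1-a/2})
print(se_boot, (lo, hi))
```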
16.3 Rejection Sampling
Setup:
- We can easily sample from $g(\theta)$
- We want to sample from $h(\theta)$, which we know only up to a proportional constant: $h(\theta) \propto k(\theta)$
- Envelope condition: there is $M > 0$ with $k(\theta) \le M g(\theta)$ for all $\theta$
Algorithm:
1. Draw $\theta_{cand} \sim g(\theta)$
2. Generate $u \sim \mathrm{Unif}(0, 1)$
3. Accept $\theta_{cand}$ if $u \le \frac{k(\theta_{cand})}{M g(\theta_{cand})}$
4. Repeat until $B$ values of $\theta_{cand}$ have been accepted
Example (posterior sampling):
- We can easily sample from the prior $g(\theta) = f(\theta)$
- Target is the posterior, $h(\theta) \propto k(\theta) = f(x^n \mid \theta) f(\theta)$
- Envelope condition: $f(x^n \mid \theta) \le f(x^n \mid \hat\theta_n) = \mathcal{L}_n(\hat\theta_n) \equiv M$
- Algorithm: draw $\theta_{cand} \sim f(\theta)$ and accept if $u \le \mathcal{L}_n(\theta_{cand}) / \mathcal{L}_n(\hat\theta_n)$

16.4 Importance Sampling
Setup: estimate $\mathbb{E}[q(\theta) \mid x^n] = \int q(\theta) f(\theta \mid x^n)\,d\theta$ without sampling from the posterior.
Algorithm:
1. Sample from the prior: $\theta_1, \dots, \theta_B \stackrel{iid}{\sim} f(\theta)$
2. Compute normalized importance weights $w_i = \frac{\mathcal{L}_n(\theta_i)}{\sum_{j=1}^B \mathcal{L}_n(\theta_j)}$
3. Estimate $\hat{\mathbb{E}}[q(\theta) \mid x^n] = \sum_{i=1}^B q(\theta_i)\, w_i$
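A minimal sketch (assuming NumPy/SciPy) of rejection sampling with a $\mathrm{Beta}(2,5)$ target and a uniform envelope (an illustrative choice, not from the cookbook):

```python
# Rejection sampling: accept cand if u <= k(cand) / (M g(cand)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
target = stats.beta(2, 5)
M = target.pdf(0.2)                  # max of the density; mode at (a-1)/(a+b-2)=0.2
cand = rng.uniform(0, 1, 50_000)     # g = Unif(0,1), so g(cand) = 1
u = rng.uniform(0, 1, 50_000)
accepted = cand[u <= target.pdf(cand) / M]
print(len(accepted) / 50_000, accepted.mean())   # ~1/M = 0.41, ~2/7
```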
17 Decision Theory

Definitions:
- Decision rule: synonymous with an estimator $\hat\theta$
- Action $a$: a possible value of the decision rule; in estimation, the action is just an estimate $\hat\theta(x)$
- Loss function $L(\theta, a)$: the cost of taking action $a$ when the true state is $\theta$ (e.g., squared error $(\theta - a)^2$, absolute error $|\theta - a|$, zero-one $I(\theta \ne a)$)

17.1 Risk
Posterior risk:
$r(\hat\theta \mid x) = \int L(\theta, \hat\theta(x))\, f(\theta \mid x)\,d\theta = \mathbb{E}_{\theta|X}\left[ L(\theta, \hat\theta(x)) \right]$
(Frequentist) risk:
$R(\theta, \hat\theta) = \int L(\theta, \hat\theta(x))\, f(x \mid \theta)\,dx = \mathbb{E}_{X|\theta}\left[ L(\theta, \hat\theta(X)) \right]$
Bayes risk:
$r(f, \hat\theta) = \iint L(\theta, \hat\theta(x))\, f(x, \theta)\,dx\,d\theta = \mathbb{E}_{\theta,X}\left[ L(\theta, \hat\theta(X)) \right]$
- $r(f, \hat\theta) = \mathbb{E}_\theta\left[ \mathbb{E}_{X|\theta}\left[ L(\theta, \hat\theta(X)) \right] \right] = \mathbb{E}_\theta\left[ R(\theta, \hat\theta) \right]$
- $r(f, \hat\theta) = \mathbb{E}_X\left[ \mathbb{E}_{\theta|X}\left[ L(\theta, \hat\theta(X)) \right] \right] = \mathbb{E}_X\left[ r(\hat\theta \mid X) \right]$
17.2 Admissibility
- $\hat\theta'$ dominates $\hat\theta$ if $\forall\theta: R(\theta, \hat\theta') \le R(\theta, \hat\theta)$ and $\exists\theta: R(\theta, \hat\theta') < R(\theta, \hat\theta)$
- $\hat\theta$ is inadmissible if there is at least one other estimator $\hat\theta'$ that dominates it. Otherwise it is called admissible.

17.3 Bayes Rule
A Bayes rule minimizes the Bayes risk: $r(f, \hat\theta) = \inf_{\tilde\theta} r(f, \tilde\theta)$.
Since $r(f, \hat\theta) = \int r(\hat\theta \mid x) f(x)\,dx$, an estimator with $\hat\theta(x) = \inf_a r(a \mid x)$ for every $x$ is a Bayes rule.
Theorems:
- Squared error loss: the posterior mean is the Bayes rule
- Absolute error loss: the posterior median
- Zero-one loss: the posterior mode

17.4 Minimax Rules
Maximum risk: $\bar{R}(\hat\theta) = \sup_\theta R(\theta, \hat\theta)$, $\bar{R}(a) = \sup_\theta R(\theta, a)$
Minimax rule: $\sup_\theta R(\theta, \hat\theta) = \inf_{\tilde\theta} \bar{R}(\tilde\theta) = \inf_{\tilde\theta} \sup_\theta R(\theta, \tilde\theta)$
- $\hat\theta$ = Bayes rule with constant risk, $\exists c: R(\theta, \hat\theta) = c \implies \hat\theta$ is minimax
- Least favorable prior: if $\hat\theta^f$ is the Bayes rule for prior $f$ and $R(\theta, \hat\theta^f) \le r(f, \hat\theta^f)$ for all $\theta$, then $\hat\theta^f$ is minimax and $f$ is least favorable

18 Linear Regression

Definitions:
- Response variable $Y$
- Covariate $X$ (aka predictor variable or feature)

18.1 Simple Linear Regression
Model: $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ with $\mathbb{E}[\epsilon_i \mid X_i] = 0$ and $\mathbb{V}[\epsilon_i \mid X_i] = \sigma^2$
Fitted line: $\hat{r}(x) = \hat\beta_0 + \hat\beta_1 x$
Predicted (fitted) values: $\hat{Y}_i = \hat{r}(X_i)$
Residuals: $\hat\epsilon_i = Y_i - \hat{Y}_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_i)$
Residual sum of squares: $\mathrm{rss}(\hat\beta_0, \hat\beta_1) = \sum_{i=1}^n \hat\epsilon_i^2$
Least squares estimates (the minimizers of $\mathrm{rss}$):
$\hat\beta_0 = \bar{Y}_n - \hat\beta_1 \bar{X}_n$
$\hat\beta_1 = \frac{\sum_{i=1}^n (X_i - \bar{X}_n)(Y_i - \bar{Y}_n)}{\sum_{i=1}^n (X_i - \bar{X}_n)^2} = \frac{\sum_{i=1}^n X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^n X_i^2 - n\bar{X}^2}$
$\mathbb{E}\left[ \hat\beta \mid X^n \right] = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$, $\mathbb{V}\left[ \hat\beta \mid X^n \right] = \frac{\sigma^2}{n s_X^2} \begin{pmatrix} \frac{1}{n}\sum_{i=1}^n X_i^2 & -\bar{X}_n \\ -\bar{X}_n & 1 \end{pmatrix}$
$\widehat{se}(\hat\beta_0) = \frac{\hat\sigma}{s_X \sqrt{n}} \sqrt{\frac{\sum_{i=1}^n X_i^2}{n}}$, $\widehat{se}(\hat\beta_1) = \frac{\hat\sigma}{s_X \sqrt{n}}$
where $s_X^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ and $\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^n \hat\epsilon_i^2$ (unbiased estimate).
Further properties:
- Consistency: $\hat\beta_0 \stackrel{P}{\to} \beta_0$ and $\hat\beta_1 \stackrel{P}{\to} \beta_1$
- Asymptotic normality: $\frac{\hat\beta_0 - \beta_0}{\widehat{se}(\hat\beta_0)} \stackrel{D}{\to} \mathcal{N}(0, 1)$ and $\frac{\hat\beta_1 - \beta_1}{\widehat{se}(\hat\beta_1)} \stackrel{D}{\to} \mathcal{N}(0, 1)$
- Approximate $1-\alpha$ confidence intervals: $\hat\beta_0 \pm z_{\alpha/2}\,\widehat{se}(\hat\beta_0)$ and $\hat\beta_1 \pm z_{\alpha/2}\,\widehat{se}(\hat\beta_1)$
$R^2$:
$R^2 = \frac{\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^n \hat\epsilon_i^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2} = 1 - \frac{\mathrm{rss}}{\mathrm{tss}}$
Likelihood:
$\mathcal{L} = \prod_{i=1}^n f(X_i, Y_i) = \prod_{i=1}^n f_X(X_i) \prod_{i=1}^n f_{Y|X}(Y_i \mid X_i) = \mathcal{L}_1 \mathcal{L}_2$
$\mathcal{L}_1 = \prod_{i=1}^n f_X(X_i)$
$\mathcal{L}_2 = \prod_{i=1}^n f_{Y|X}(Y_i \mid X_i) \propto \sigma^{-n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_i \left( Y_i - (\beta_0 + \beta_1 X_i) \right)^2 \right\}$
Under the assumption of Normality, the least squares estimator is also the MLE, but the least squares variance estimator is not the MLE:
$\hat\sigma^2_{mle} = \frac{1}{n} \sum_{i=1}^n \hat\epsilon_i^2$

18.2 Prediction
Observe $X = x_*$ and predict $\hat{Y}_* = \hat\beta_0 + \hat\beta_1 x_*$.
Prediction interval: $\hat{Y}_* \pm z_{\alpha/2}\,\hat\xi_n$ where $\hat\xi_n^2 = \hat\sigma^2 \left( \frac{\sum_{i=1}^n (X_i - x_*)^2}{n \sum_i (X_i - \bar{X})^2} + 1 \right)$

18.3 Multiple Regression
$Y = X\beta + \epsilon$ where
$X = \begin{pmatrix} X_{11} & \cdots & X_{1k} \\ \vdots & \ddots & \vdots \\ X_{n1} & \cdots & X_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$
Likelihood: $\mathcal{L}(\beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2}\,\mathrm{rss} \right\}$ with $\mathrm{rss} = (Y - X\beta)^T (Y - X\beta)$
If the $k \times k$ matrix $X^T X$ is invertible:
$\hat\beta = (X^T X)^{-1} X^T Y$
$\mathbb{V}\left[ \hat\beta \mid X^n \right] = \sigma^2 (X^T X)^{-1}$
$\hat\beta \approx \mathcal{N}\left( \beta,\ \sigma^2 (X^T X)^{-1} \right)$
Estimated regression function: $\hat{r}(x) = \sum_{j=1}^k \hat\beta_j x_j$
Unbiased estimate for $\sigma^2$: $\hat\sigma^2 = \frac{1}{n-k} \sum_{i=1}^n \hat\epsilon_i^2$ with $\hat\epsilon = X\hat\beta - Y$; the MLE is $\hat\sigma^2_{mle} = \frac{n-k}{n}\,\hat\sigma^2$
$1 - \alpha$ confidence interval: $\hat\beta_j \pm z_{\alpha/2}\,\widehat{se}(\hat\beta_j)$
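A minimal sketch (assuming NumPy) of the closed-form least squares estimates of 18.1 on simulated data with known coefficients:

```python
# Closed-form simple linear regression: b1, b0, and unbiased sigma^2.
import numpy as np

rng = np.random.default_rng(11)
n = 200
x = rng.uniform(0, 10, n)
y = 1.5 + 2.0 * x + rng.normal(0, 1, n)

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
sigma2 = resid @ resid / (n - 2)          # unbiased estimate of sigma^2
print(b0, b1, sigma2)                      # ~1.5, ~2.0, ~1.0
```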
18.4 Model Selection
Consider predicting a new observation $Y_*$ and let $S \subseteq \{1, \dots, k\}$ denote a subset of the covariates in the model.

Procedure:
1. Assign a score to each candidate model
2. Search through all models to find the one with the best score

Hypothesis testing: $H_0: \beta_j = 0$ vs. $H_1: \beta_j \ne 0$ for all $j \in J$

Mean squared prediction error: $\mathrm{mspe} = \mathbb{E}\left[ (\hat{Y}(S) - Y_*)^2 \right]$
Prediction risk: $R(S) = \sum_{i=1}^n \mathrm{mspe}_i = \sum_{i=1}^n \mathbb{E}\left[ (\hat{Y}_i(S) - Y_i)^2 \right]$
Training error: $\hat{R}_{tr}(S) = \sum_{i=1}^n (\hat{Y}_i(S) - Y_i)^2$
$R^2(S) = 1 - \frac{\mathrm{rss}(S)}{\mathrm{tss}} = 1 - \frac{\hat{R}_{tr}(S)}{\mathrm{tss}} = \frac{\sum_{i=1}^n (\hat{Y}_i(S) - \bar{Y})^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2}$
Adjusted $R^2$: $\bar{R}^2(S) = 1 - \frac{n-1}{n-k}\,\frac{\mathrm{rss}}{\mathrm{tss}}$
Mallows' $C_p$ statistic: $\hat{R}(S) = \hat{R}_{tr}(S) + 2k\hat\sigma^2$ = lack of fit + complexity penalty
Akaike Information Criterion (AIC): $\mathrm{AIC}(S) = \ell_n(\hat\beta_S, \hat\sigma^2_S) - k$
Bayesian Information Criterion (BIC): $\mathrm{BIC}(S) = \ell_n(\hat\beta_S, \hat\sigma^2_S) - \frac{k}{2}\log n$
Leave-one-out cross-validation:
$\hat{R}_{CV}(S) = \sum_{i=1}^n (Y_i - \hat{Y}_{(i)})^2 = \sum_{i=1}^n \left( \frac{Y_i - \hat{Y}_i(S)}{1 - U_{ii}(S)} \right)^2$
where $U_{ii}(S)$ is the $i$th diagonal element of the hat matrix $U(S) = X_S (X_S^T X_S)^{-1} X_S^T$.

19 Non-parametric Function Estimation

19.1 Density Estimation
Estimate the density $f$, where $\mathbb{P}[X \in A] = \int_A f(x)\,dx$.
Integrated square error (ISE): $L(f, \hat{f}_n) = \int \left( f(x) - \hat{f}_n(x) \right)^2 dx = J(h) + \int f^2(x)\,dx$
Frequentist risk: $R(f, \hat{f}_n) = \mathbb{E}\left[ L(f, \hat{f}_n) \right] = \int b^2(x)\,dx + \int v(x)\,dx$
where $b(x) = \mathbb{E}[\hat{f}_n(x)] - f(x)$ (bias) and $v(x) = \mathbb{V}[\hat{f}_n(x)]$ (variance).
19.1.1 Histograms
Definitions:
- Number of bins $m$, binwidth $h = \frac{1}{m}$ (data scaled to $[0, 1]$)
- Bin $B_j$ with $\nu_j$ observations
- $\hat{p}_j = \nu_j / n$ and $p_j = \int_{B_j} f(u)\,du$
Histogram estimator: $\hat{f}_n(x) = \sum_{j=1}^m \frac{\hat{p}_j}{h}\, I(x \in B_j)$
- $\mathbb{E}[\hat{f}_n(x)] = \frac{p_j}{h}$, $\mathbb{V}[\hat{f}_n(x)] = \frac{p_j (1 - p_j)}{n h^2}$
- $R(\hat{f}_n, f) \approx \frac{h^2}{12} \int (f'(u))^2\,du + \frac{1}{nh}$
- Optimal binwidth: $h^* = \frac{1}{n^{1/3}} \left( \frac{6}{\int (f'(u))^2\,du} \right)^{1/3}$, giving $R^*(\hat{f}_n, f) \approx \frac{C}{n^{2/3}} \left( \int (f'(u))^2\,du \right)^{1/3}$ with $C = \left( \frac{3}{4} \right)^{2/3}$
Cross-validation estimate of $\mathbb{E}[J(h)]$:
$\hat{J}_{CV}(h) = \int \hat{f}_n^2(x)\,dx - \frac{2}{n} \sum_{i=1}^n \hat{f}_{(i)}(X_i) = \frac{2}{(n-1)h} - \frac{n+1}{(n-1)h} \sum_{j=1}^m \hat{p}_j^2$

19.1.2 Kernel Density Estimator (KDE)
Kernel $K$: any smooth function with $K(x) \ge 0$, $\int K(x)\,dx = 1$, $\int x K(x)\,dx = 0$, and $\sigma_K^2 \equiv \int x^2 K(x)\,dx > 0$.
KDE: $\hat{f}_n(x) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h} K\!\left( \frac{x - X_i}{h} \right)$
- $R(f, \hat{f}_n) \approx \frac{1}{4}(h\sigma_K)^4 \int (f''(x))^2\,dx + \frac{1}{nh} \int K^2(x)\,dx$
- Optimal bandwidth: $h^* = \frac{c_1^{-2/5}\, c_2^{1/5}\, c_3^{-1/5}}{n^{1/5}}$ with $c_1 = \sigma_K^2$, $c_2 = \int K^2(x)\,dx$, $c_3 = \int (f''(x))^2\,dx$
- $R^*(f, \hat{f}_n) = \frac{c_4}{n^{4/5}} \left( \int K^2(x)\,dx \right)^{4/5} \left( \int (f'')^2\,dx \right)^{1/5}$ with $c_4 = \frac{5}{4}(\sigma_K^2)^{2/5} \equiv C(K)$
Epanechnikov kernel (optimal):
$K(x) = \begin{cases} \frac{3}{4\sqrt{5}}\left( 1 - \frac{x^2}{5} \right) & |x| < \sqrt{5} \\ 0 & \text{otherwise} \end{cases}$
Cross-validation estimate of $\mathbb{E}[J(h)]$:
$\hat{J}_{CV}(h) = \int \hat{f}_n^2(x)\,dx - \frac{2}{n} \sum_{i=1}^n \hat{f}_{(i)}(X_i) \approx \frac{1}{h n^2} \sum_{i=1}^n \sum_{j=1}^n K^*\!\left( \frac{X_i - X_j}{h} \right) + \frac{2}{nh} K(0)$
where $K^*(x) = K^{(2)}(x) - 2K(x)$ and $K^{(2)}(x) = \int K(x - y) K(y)\,dy$.
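A minimal sketch (assuming NumPy) of the KDE above with a Gaussian kernel; since the risk-optimal $h^*$ depends on the unknown $f''$, the sketch uses Silverman's plug-in rule as a stand-in:

```python
# Gaussian KDE: f_hat(x) = (1/n) sum (1/h) K((x - X_i)/h).
import numpy as np

rng = np.random.default_rng(12)
x = rng.normal(0, 1, 500)
h = 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)   # Silverman's rule

def f_hat(t, x=x, h=h):
    u = (t - x[:, None]) / h
    return np.exp(-u**2 / 2).sum(axis=0) / (len(x) * h * np.sqrt(2 * np.pi))

print(f_hat(np.array([0.0])))   # ~ 0.399, the N(0,1) density at 0
```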
19.2 Non-parametric Regression
Estimate $r(x) = \mathbb{E}[Y \mid X = x]$ in the model $Y_i = r(x_i) + \epsilon_i$ with $\mathbb{E}[\epsilon_i] = 0$ and $\mathbb{V}[\epsilon_i] = \sigma^2$.
k-nearest-neighbor estimator: $\hat{r}(x) = \frac{1}{k} \sum_{i : x_i \in N_k(x)} Y_i$ where $N_k(x)$ contains the $k$ values of $x_1, \dots, x_n$ closest to $x$.
Nadaraya-Watson kernel estimator:
$\hat{r}(x) = \sum_{i=1}^n w_i(x)\, Y_i$ with weights $w_i(x) = \frac{K\left( \frac{x - x_i}{h} \right)}{\sum_{j=1}^n K\left( \frac{x - x_j}{h} \right)}$
Risk:
$R(\hat{r}_n, r) \approx \frac{h^4}{4} \left( \int x^2 K(x)\,dx \right)^2 \int \left( r''(x) + 2r'(x)\frac{f'(x)}{f(x)} \right)^2 dx + \frac{\sigma^2 \int K^2(x)\,dx}{nh} \int \frac{dx}{f(x)}$
Optimal bandwidth: $h^* \approx \frac{c_1}{n^{1/5}}$, giving $R^*(\hat{r}_n, r) \approx \frac{c_2}{n^{4/5}}$
Cross-validation estimate of $\mathbb{E}[J(h)]$:
$\hat{J}_{CV}(h) = \sum_{i=1}^n \left( Y_i - \hat{r}_{(i)}(x_i) \right)^2 = \sum_{i=1}^n \left( \frac{Y_i - \hat{r}(x_i)}{1 - \frac{K(0)}{\sum_{j=1}^n K\left( \frac{x_i - x_j}{h} \right)}} \right)^2$

19.3 Smoothing Using Orthogonal Functions
Approximation: $r(x) = \sum_{j=1}^\infty \beta_j \phi_j(x) \approx \sum_{j=1}^J \beta_j \phi_j(x)$
Multivariate regression form: $Y = \Phi\beta + \eta$ where $\eta_i = \epsilon_i$ and
$\Phi = \begin{pmatrix} \phi_0(x_1) & \cdots & \phi_J(x_1) \\ \vdots & \ddots & \vdots \\ \phi_0(x_n) & \cdots & \phi_J(x_n) \end{pmatrix}$
Least squares estimator:
$\hat\beta = (\Phi^T \Phi)^{-1} \Phi^T Y \approx \frac{1}{n} \Phi^T Y$ (for equally spaced observations only)
Cross-validation: $\hat{R}_{CV}(J) = \sum_{i=1}^n \left( Y_i - \sum_{j=1}^J \phi_j(x_i)\,\hat\beta_{j,(i)} \right)^2$

20 Stochastic Processes

A stochastic process is a collection of random variables $\{X_t : t \in T\}$ with index set
$T = \begin{cases} \{0, \pm 1, \dots\} = \mathbb{Z} & \text{discrete} \\ [0, \infty) & \text{continuous} \end{cases}$
Notations: $X_t$, $X(t)$. State space $\mathcal{X}$.

20.1 Markov Chains
Markov chain: $\mathbb{P}[X_n = x \mid X_0, \dots, X_{n-1}] = \mathbb{P}[X_n = x \mid X_{n-1}]$ for all $n \in T$, $x \in \mathcal{X}$
Transition probabilities:
$p_{ij} \equiv \mathbb{P}[X_{n+1} = j \mid X_n = i]$
$p_{ij}(n) \equiv \mathbb{P}[X_{m+n} = j \mid X_m = i]$ ($n$-step)
Transition matrix $P$ ($n$-step: $P_n$) with entries $p_{ij}$; each row sums to one: $\sum_j p_{ij} = 1$
Chapman-Kolmogorov: $p_{ij}(m + n) = \sum_k p_{ik}(m)\, p_{kj}(n)$, i.e., $P_{m+n} = P_m P_n$ and $P_n = P \cdots P = P^n$
Marginal probability: $\mu_n = (\mu_n(1), \dots, \mu_n(N))$ where $\mu_n(i) = \mathbb{P}[X_n = i]$, $\mu_0$ is the initial distribution, and $\mu_n = \mu_0 P^n$
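A minimal sketch (assuming NumPy) of the last two identities, computing $P_n = P^n$ and $\mu_n = \mu_0 P^n$ for a toy two-state chain:

```python
# n-step transitions and marginals for a two-state Markov chain.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
mu0 = np.array([1.0, 0.0])

P10 = np.linalg.matrix_power(P, 10)
print(P10)           # rows approach the stationary distribution (0.8, 0.2)
print(mu0 @ P10)     # marginal mu_10
```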
20.2 Poisson Processes
Poisson process: $\{X_t : t \in [0, \infty)\}$ = number of events up to and including time $t$, with
- $X_0 = 0$
- Independent increments: for $t_0 < t_1 < \cdots < t_n$, the increments $X_{t_1} - X_{t_0}, \dots, X_{t_n} - X_{t_{n-1}}$ are independent
- Intensity function $\lambda(t) > 0$: $\mathbb{P}[X_{t+h} - X_t = 1] = \lambda(t)h + o(h)$ and $\mathbb{P}[X_{t+h} - X_t = 2] = o(h)$
- $X_{s+t} - X_s \sim \mathrm{Po}\left( m(s + t) - m(s) \right)$ where $m(t) = \int_0^t \lambda(s)\,ds$
Homogeneous Poisson process ($\lambda(t) \equiv \lambda$): $X_t \sim \mathrm{Po}(\lambda t)$
Waiting times: $W_t$ := time at which $X_t$ occurs; $W_t \sim \mathrm{Gamma}\left( t, \frac{1}{\lambda} \right)$
Interarrival times: $S_t = W_{t+1} - W_t$; $S_t \sim \mathrm{Exp}\left( \frac{1}{\lambda} \right)$

21 Time Series

Mean function: $\mu_{xt} = \mathbb{E}[x_t] = \int_{-\infty}^\infty x f_t(x)\,dx$
Autocovariance function:
$\gamma_x(s, t) = \mathbb{E}[(x_s - \mu_s)(x_t - \mu_t)] = \mathbb{E}[x_s x_t] - \mu_s \mu_t$
$\gamma_x(t, t) = \mathbb{E}\left[ (x_t - \mu_t)^2 \right] = \mathbb{V}[x_t]$
Autocorrelation function (ACF): $\rho(s, t) = \frac{\mathrm{Cov}[x_s, x_t]}{\sqrt{\mathbb{V}[x_s]\,\mathbb{V}[x_t]}} = \frac{\gamma(s, t)}{\sqrt{\gamma(s, s)\gamma(t, t)}}$
Cross-covariance function (CCV): $\gamma_{xy}(s, t) = \mathbb{E}[(x_s - \mu_{xs})(y_t - \mu_{yt})]$
Cross-correlation function (CCF): $\rho_{xy}(s, t) = \frac{\gamma_{xy}(s, t)}{\sqrt{\gamma_x(s, s)\,\gamma_y(t, t)}}$
Backshift operator: $B^k(x_t) = x_{t-k}$
Difference operator: $\nabla^d = (1 - B)^d$
White noise:
- $w_t \sim \mathrm{wn}(0, \sigma_w^2)$
- Gaussian: $w_t \stackrel{iid}{\sim} \mathcal{N}(0, \sigma_w^2)$
- $\mathbb{E}[w_t] = 0$ and $\mathbb{V}[w_t] = \sigma_w^2$ for all $t \in T$; $\gamma_w(s, t) = 0$ for $s \ne t$
Random walk:
- Drift $\delta$: $x_t = \delta t + \sum_{j=1}^t w_j$
- $\mathbb{E}[x_t] = \delta t$
Symmetric moving average: $m_t = \sum_{j=-k}^k a_j x_{t-j}$ where $a_j = a_{-j} \ge 0$ and $\sum_{j=-k}^k a_j = 1$
21.1 Stationary Time Series
Strictly stationary: $\mathbb{P}[x_{t_1} \le c_1, \dots, x_{t_k} \le c_k] = \mathbb{P}[x_{t_1 + h} \le c_1, \dots, x_{t_k + h} \le c_k]$ for all $k \in \mathbb{N}$, $t_k$, $c_k$, $h \in \mathbb{Z}$
Weakly stationary:
- $\mathbb{E}[x_t^2] < \infty$ for all $t \in \mathbb{Z}$
- $\mathbb{E}[x_t] = m$ for all $t \in \mathbb{Z}$
- $\gamma_x(s, t) = \gamma_x(s + r, t + r)$ for all $r, s, t \in \mathbb{Z}$
Autocovariance of a stationary series: $\gamma(h) = \mathbb{E}[(x_{t+h} - \mu)(x_t - \mu)]$ with $h \in \mathbb{Z}$, and ACF $\rho(h) = \frac{\gamma(h)}{\gamma(0)}$
Linear process: $x_t = \mu + \sum_{j=-\infty}^\infty \psi_j w_{t-j}$ where $\sum_{j=-\infty}^\infty |\psi_j| < \infty$; then $\gamma(h) = \sigma_w^2 \sum_{j=-\infty}^\infty \psi_{j+h}\psi_j$

21.2 Estimation of Correlation
Sample mean: $\bar{x} = \frac{1}{n} \sum_{t=1}^n x_t$, with $\mathbb{V}[\bar{x}] = \frac{1}{n} \sum_{h=-n}^n \left( 1 - \frac{|h|}{n} \right) \gamma_x(h)$
Sample autocovariance function: $\hat\gamma(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(x_t - \bar{x})$
Sample autocorrelation function: $\hat\rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)}$
Sample cross-covariance and cross-correlation:
$\hat\gamma_{xy}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(y_t - \bar{y})$, $\hat\rho_{xy}(h) = \frac{\hat\gamma_{xy}(h)}{\sqrt{\hat\gamma_x(0)\,\hat\gamma_y(0)}}$
Properties:
- $\sigma_{\hat\rho_x(h)} = \frac{1}{\sqrt{n}}$ if $x_t$ is white noise
- $\sigma_{\hat\rho_{xy}(h)} = \frac{1}{\sqrt{n}}$ if $x_t$ or $y_t$ is white noise
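A minimal sketch (assuming NumPy) implementing the sample autocovariance and ACF exactly as defined above; for white noise the nonzero lags should stay within roughly $\pm 1/\sqrt{n}$:

```python
# Sample autocovariance (note the 1/n factor) and sample ACF.
import numpy as np

def acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    g = np.array([((x[h:] - xbar) * (x[:n - h] - xbar)).sum() / n
                  for h in range(max_lag + 1)])
    return g / g[0]

rng = np.random.default_rng(13)
print(acf(rng.standard_normal(500), 5))   # lags > 0 near 0, ~ +-1/sqrt(500)
```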
21.3 Non-Stationary Time Series
Classical decomposition model: $x_t = \mu_t + s_t + w_t$ where
- $\mu_t$ = trend
- $s_t$ = seasonal component
- $w_t$ = random noise term

21.3.1 Detrending
Least squares: fit a parametric trend, e.g., $\mu_t = \beta_0 + \beta_1 t$, and analyze the residuals.
Moving average: smooth with $\frac{1}{2k+1} \sum_{i=-k}^k x_{t-i}$. If $\frac{1}{2k+1} \sum_{i=-k}^k w_{t-j} \approx 0$, a linear trend function $\mu_t = \beta_0 + \beta_1 t$ passes without distortion.
Differencing: $\mu_t = \beta_0 + \beta_1 t \implies \nabla x_t = \beta_1$

21.4 ARIMA Models
Autoregressive polynomial: $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ with $z \in \mathbb{C}$ and $\phi_p \ne 0$
Autoregressive operator: $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$
AR(p): $x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t$, i.e., $\phi(B)x_t = w_t$
AR(1):
- $x_t = \phi^k(x_{t-k}) + \sum_{j=0}^{k-1} \phi^j(w_{t-j}) \;\overset{k\to\infty,\,|\phi|<1}{=}\; \sum_{j=0}^\infty \phi^j(w_{t-j})$ (linear process)
- $\mathbb{E}[x_t] = \sum_{j=0}^\infty \phi^j\,\mathbb{E}[w_{t-j}] = 0$
- $\gamma(h) = \frac{\sigma_w^2\,\phi^h}{1 - \phi^2}$, $\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \phi^h$, and $\rho(h) = \phi\,\rho(h-1)$ for $h = 1, 2, \dots$
Moving average polynomial: $\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$ with $z \in \mathbb{C}$ and $\theta_q \ne 0$
Moving average operator: $\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$
MA(q): $x_t = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q} = \theta(B)w_t$
- $\mathbb{E}[x_t] = \sum_{j=0}^q \theta_j\,\mathbb{E}[w_{t-j}] = 0$
- $\gamma(h) = \begin{cases} \sigma_w^2 \sum_{j=0}^{q-h} \theta_j \theta_{j+h} & 0 \le h \le q \\ 0 & h > q \end{cases}$
MA(1): $x_t = w_t + \theta w_{t-1}$
- $\gamma(h) = \begin{cases} (1 + \theta^2)\sigma_w^2 & h = 0 \\ \theta\sigma_w^2 & h = 1 \\ 0 & h > 1 \end{cases} \qquad \rho(h) = \begin{cases} \frac{\theta}{1 + \theta^2} & h = 1 \\ 0 & h > 1 \end{cases}$
ARMA(p, q): $x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}$, i.e., $\phi(B)x_t = \theta(B)w_t$
ARIMA(p, d, q): $\nabla^d x_t = (1 - B)^d x_t$ is ARMA(p, q), i.e., $\phi(B)(1 - B)^d x_t = \theta(B)w_t$
Exponentially Weighted Moving Average (EWMA): $x_t = x_{t-1} + w_t - \lambda w_{t-1}$
- $x_t = \sum_{j=1}^\infty (1 - \lambda)\lambda^{j-1} x_{t-j} + w_t$ when $|\lambda| < 1$
- $\tilde{x}_{n+1} = (1 - \lambda)x_n + \lambda\tilde{x}_n$
Seasonal ARIMA: additionally includes seasonal AR and MA terms in powers of $B^s$ for seasonal period $s$.

21.4.1 Causality and Invertibility
- ARMA(p, q) is causal (future-independent) $\iff$ the roots of $\phi(z)$ lie outside the unit circle; then $x_t = \sum_{j=0}^\infty \psi_j w_{t-j}$ where $\psi(z) = \sum_{j=0}^\infty \psi_j z^j = \frac{\theta(z)}{\phi(z)}$ for $|z| \le 1$
- ARMA(p, q) is invertible $\iff$ the roots of $\theta(z)$ lie outside the unit circle; then $w_t = \sum_{j=0}^\infty \pi_j x_{t-j}$ where $\pi(z) = \sum_{j=0}^\infty \pi_j z^j = \frac{\phi(z)}{\theta(z)}$ for $|z| \le 1$

Behavior of the ACF and PACF for causal and invertible ARMA models:
- AR(p): ACF tails off; PACF cuts off after lag p
- MA(q): ACF cuts off after lag q; PACF tails off
- ARMA(p, q): ACF tails off; PACF tails off

21.5 Spectral Analysis
Periodic process: $x_t = A\cos(2\pi\omega t + \varphi) = U_1\cos(2\pi\omega t) + U_2\sin(2\pi\omega t)$
Periodic mixture: $x_t = \sum_{k=1}^q \left( U_{k1}\cos(2\pi\omega_k t) + U_{k2}\sin(2\pi\omega_k t) \right)$ with zero-mean $U_{k1}, U_{k2}$ of variance $\sigma_k^2$; then $\gamma(h) = \sum_{k=1}^q \sigma_k^2 \cos(2\pi\omega_k h)$
Spectral representation of a periodic process:
$\gamma(h) = \sigma^2\cos(2\pi\omega_0 h) = \frac{\sigma^2}{2} e^{-2\pi i\omega_0 h} + \frac{\sigma^2}{2} e^{2\pi i\omega_0 h} = \int_{-1/2}^{1/2} e^{2\pi i\omega h}\,dF(\omega)$
Spectral distribution function:
$F(\omega) = \begin{cases} 0 & \omega < -\omega_0 \\ \sigma^2/2 & -\omega_0 \le \omega < \omega_0 \\ \sigma^2 & \omega \ge \omega_0 \end{cases}$ with $F(-\infty) = F(-1/2) = 0$ and $F(\infty) = F(1/2) = \gamma(0)$
Spectral density:
$f(\omega) = \sum_{h=-\infty}^\infty \gamma(h)\, e^{-2\pi i\omega h}$, $-\frac{1}{2} \le \omega \le \frac{1}{2}$
- Needs $\sum_{h=-\infty}^\infty |\gamma(h)| < \infty$; then $\gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i\omega h} f(\omega)\,d\omega$ for $h = 0, \pm 1, \dots$
- $f(\omega) \ge 0$, $f(\omega) = f(-\omega)$, $f(\omega) = f(1 - \omega)$
- $\gamma(0) = \mathbb{V}[x_t] = \int_{-1/2}^{1/2} f(\omega)\,d\omega$
- White noise: $f_w(\omega) = \sigma_w^2$
- ARMA(p, q) with $\phi(B)x_t = \theta(B)w_t$: $f_x(\omega) = \sigma_w^2\,\frac{|\theta(e^{-2\pi i\omega})|^2}{|\phi(e^{-2\pi i\omega})|^2}$ where $\phi(z) = 1 - \sum_{k=1}^p \phi_k z^k$ and $\theta(z) = 1 + \sum_{k=1}^q \theta_k z^k$
Discrete Fourier Transform (DFT): $d(\omega_j) = n^{-1/2} \sum_{t=1}^n x_t e^{-2\pi i\omega_j t}$
Fourier/fundamental frequencies: $\omega_j = j/n$
Inverse DFT: $x_t = n^{-1/2} \sum_{j=0}^{n-1} d(\omega_j)\, e^{2\pi i\omega_j t}$
Periodogram: $I(j/n) = |d(j/n)|^2$
Scaled periodogram:
$P(j/n) = \frac{4}{n} I(j/n) = \left( \frac{2}{n} \sum_{t=1}^n x_t\cos(2\pi t j/n) \right)^2 + \left( \frac{2}{n} \sum_{t=1}^n x_t\sin(2\pi t j/n) \right)^2$
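A minimal sketch (assuming NumPy) computing the periodogram via the FFT, matching the DFT scaling above; the peak recovers the dominant frequency of a noisy cosine:

```python
# Periodogram I(j/n) = |d(j/n)|^2 with d the n^{-1/2}-scaled DFT.
import numpy as np

rng = np.random.default_rng(8)
n = 256
t = np.arange(n)
x = 2 * np.cos(2 * np.pi * 0.1 * t) + rng.standard_normal(n)

d = np.fft.fft(x) / np.sqrt(n)
I = np.abs(d) ** 2                   # periodogram
P = 4 / n * I                        # scaled periodogram
print(np.fft.fftfreq(n)[np.argmax(I[1:n // 2]) + 1])   # ~ 0.1
```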
22 Math

22.1 Gamma Function
- Ordinary: $\Gamma(s) = \int_0^\infty t^{s-1} e^{-t}\,dt$
- Upper incomplete: $\Gamma(s, x) = \int_x^\infty t^{s-1} e^{-t}\,dt$
- Lower incomplete: $\gamma(s, x) = \int_0^x t^{s-1} e^{-t}\,dt$
- $\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$ for $\alpha > 0$
- $\Gamma(n) = (n - 1)!$ for $n \in \mathbb{N}$
- $\Gamma(1/2) = \sqrt{\pi}$ and $\Gamma(-1/2) = -2\,\Gamma(1/2) = -2\sqrt{\pi}$

22.2 Beta Function
- Ordinary: $B(x, y) = B(y, x) = \int_0^1 t^{x-1}(1 - t)^{y-1}\,dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)}$
- Incomplete: $B(x; a, b) = \int_0^x t^{a-1}(1 - t)^{b-1}\,dt$
- Regularized incomplete:
$I_x(a, b) = \frac{B(x; a, b)}{B(a, b)} \;\overset{a, b \in \mathbb{N}}{=}\; \sum_{j=a}^{a+b-1} \frac{(a + b - 1)!}{j!\,(a + b - 1 - j)!}\, x^j (1 - x)^{a+b-1-j}$
- $I_0(a, b) = 0$, $I_1(a, b) = 1$, and $I_x(a, b) = 1 - I_{1-x}(b, a)$

22.3 Series
Finite:
- $\sum_{k=1}^n k = \frac{n(n+1)}{2}$
- $\sum_{k=1}^n (2k - 1) = n^2$
- $\sum_{k=1}^n k^2 = \frac{n(n+1)(2n+1)}{6}$
- $\sum_{k=1}^n k^3 = \left( \frac{n(n+1)}{2} \right)^2$
- $\sum_{k=0}^n c^k = \frac{c^{n+1} - 1}{c - 1}$ for $c \ne 1$
Binomial:
- $\sum_{k=0}^n \binom{n}{k} = 2^n$
- $\sum_{k=0}^n \binom{r+k}{k} = \binom{r+n+1}{n}$
- $\sum_{k=0}^n \binom{k}{m} = \binom{n+1}{m+1}$
- Vandermonde's identity: $\sum_{k=0}^r \binom{m}{k}\binom{n}{r-k} = \binom{m+n}{r}$
- Binomial theorem: $\sum_{k=0}^n \binom{n}{k} a^{n-k} b^k = (a + b)^n$
Infinite:
- $\sum_{k=0}^\infty p^k = \frac{1}{1 - p}$ and $\sum_{k=1}^\infty p^k = \frac{p}{1 - p}$ for $|p| < 1$
- $\sum_{k=0}^\infty k p^{k-1} = \frac{d}{dp}\left( \sum_{k=0}^\infty p^k \right) = \frac{d}{dp}\left( \frac{1}{1 - p} \right) = \frac{1}{(1 - p)^2}$ for $|p| < 1$
- $\sum_{k=0}^\infty \binom{r+k-1}{k} x^k = (1 - x)^{-r}$ for $r \in \mathbb{N}^+$
- $\sum_{k=0}^\infty \binom{\alpha}{k} p^k = (1 + p)^\alpha$ for $|p| < 1$, $\alpha \in \mathbb{C}$
22.4 Combinatorics

Sampling $k$ out of $n$:
- ordered, without replacement: $n^{\underline{k}} = \prod_{i=0}^{k-1}(n - i) = \frac{n!}{(n-k)!}$
- ordered, with replacement: $n^k$
- unordered, without replacement: $\binom{n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n!}{k!(n-k)!}$
- unordered, with replacement: $\binom{n-1+r}{r} = \binom{n-1+r}{n-1}$

Partitions: $P_{n+k,k} = \sum_{i=1}^k P_{n,i}$, with $P_{n,k} = 0$ for $k > n$, $P_{n,0} = 0$ for $n \ge 1$, and $P_{0,0} = 1$

Balls and urns ($|B| = n$ balls, $|U| = m$ urns, counting functions $f: B \to U$; D = distinguishable, ¬D = indistinguishable):

$f$ arbitrary:
- B: D, U: D: $m^n$
- B: ¬D, U: D: $\binom{m+n-1}{n}$
- B: D, U: ¬D: $\sum_{k=1}^m \left\{ {n \atop k} \right\}$
- B: ¬D, U: ¬D: $\sum_{k=1}^m P_{n,k}$

$f$ injective:
- B: D, U: D: $m^{\underline{n}}$ if $m \ge n$, else $0$
- B: ¬D, U: D: $\binom{m}{n}$ if $m \ge n$, else $0$
- B: D, U: ¬D and B: ¬D, U: ¬D: $1$ if $m \ge n$, else $0$

$f$ surjective:
- B: D, U: D: $m!\left\{ {n \atop m} \right\}$
- B: ¬D, U: D: $\binom{n-1}{m-1}$
- B: D, U: ¬D: $\left\{ {n \atop m} \right\}$
- B: ¬D, U: ¬D: $P_{n,m}$

$f$ bijective: $n!$ if $m = n$ for B: D, U: D; $1$ if $m = n$ for the other three cases; $0$ otherwise.

Here $\left\{ {n \atop k} \right\}$ denotes the Stirling numbers of the second kind.