ACTL2002/ACTL5101 Probability and Statistics
© Katja Ignatieva
School of Risk and Actuarial Studies
Australian School of Business
University of New South Wales
k.ignatieva@unsw.edu.au
Week 5 Video Lecture Notes
ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes
Special sampling distributions & sample mean and variance
- Special Sampling Distributions: chi-squared distribution
  - Chi-squared distribution: one degree of freedom
  - Chi-squared distribution: n degrees of freedom
- Special Sampling Distributions: Student-t distribution
  - Jacobian technique and William Gosset (t-distribution)
- Special Sampling Distributions: Snedecor's F distribution
  - Jacobian technique and Snedecor's F distribution
- Distribution of sample mean/variance
  - Background
  - Fundamental sampling distributions
Chi-squared distribution: one degree of freedom
Sampling from a normal distribution: independent and identically distributed (i.i.d.) random variables.

Suppose Z ∼ N(0, 1); then

Y = Z^2 ∼ χ^2(1)

has a chi-squared distribution with one degree of freedom.

Distribution characteristics:

f_Y(y) = \frac{1}{\sqrt{2\pi y}} \cdot \exp(-y/2);

F_Y(y) = F_Z(\sqrt{y}) - F_Z(-\sqrt{y}) = 2 \cdot F_Z(\sqrt{y}) - 1;

E[Y] = E[Z^2] = 1;

Var(Y) = E[Y^2] - (E[Y])^2 = E[Z^4] - (E[Z^2])^2 = 3 - 1 = 2.

Proof: see next slides.
Prove that Z^2 has a chi-squared distribution with one degree of freedom (using the CDF technique, seen last week), with Z a standard normal r.v.

Proof: Consider:

F_Y(y) = \Pr(Z^2 \le y) = \Pr(-\sqrt{y} \le Z \le \sqrt{y})
       = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}\,dz
       = 2\int_{0}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}\,dz
       \overset{*}{=} 2\int_{0}^{y} \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{2} w^{-1/2} e^{-\frac{1}{2}w}\,dw,

* using the change of variable z = \sqrt{w}, so that dz = \frac{1}{2} w^{-1/2}\,dw.

Proof (cont.):

F_Y(y) = \int_0^y \frac{1}{\sqrt{2\pi}} w^{-1/2} e^{-\frac{1}{2}w}\,dw.

Differentiating to get the p.d.f. gives:

f_Y(y) = \frac{\partial F_Y(y)}{\partial y} \overset{**}{=} \frac{1}{\sqrt{2\pi}} y^{-1/2} e^{-\frac{1}{2}y} = \frac{1}{2^{1/2}\Gamma(1/2)} \cdot y^{(1-2)/2} \cdot e^{-y/2},

which is the density of a χ^2(1) distributed random variable (see F&T pages 164-169 for tabulated values of the c.d.f.).

Note: Y_i ∼ χ^2(1) \overset{dist}{=} Gamma(1/2, 1/2) ⇒ M_Y(t) = \left(\frac{1/2}{1/2 - t}\right)^{1/2} = (1 - 2t)^{-1/2}.

** using differentiation of the integral: \frac{\partial}{\partial b}\int_a^b f(x)\,dx = f(b).
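As a quick sanity check of these results (an illustrative Monte Carlo sketch, not part of the original notes), one can simulate Z^2 and compare the sample moments with E[Y] = 1 and Var(Y) = 2, and the empirical CDF with the closed form F_Y(y) = 2·F_Z(√y) − 1:

```python
import math
import random

random.seed(0)
n = 200_000
ys = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]  # draws of Y = Z^2

# Sample moments should be close to E[Y] = 1 and Var(Y) = 2.
mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n

def chi2_1_cdf(y):
    # F_Y(y) = 2 * Phi(sqrt(y)) - 1, with Phi the standard normal CDF via erf.
    return 2.0 * 0.5 * (1.0 + math.erf(math.sqrt(y / 2.0))) - 1.0

# Empirical CDF at y = 1 versus the closed form (about 0.6827).
emp_cdf_1 = sum(1 for y in ys if y <= 1.0) / n
```

With 200,000 draws the sample mean, variance, and empirical CDF all agree with the theory to two decimal places.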
Chi-squared distribution: n degrees of freedom

Let Z_i, i = 1, ..., n be i.i.d. N(0, 1). Then X = \sum_{i=1}^n Z_i^2 has a chi-squared distribution with n degrees of freedom: X ∼ χ^2(n). Parameter constraints: n = 1, 2, ....

Distribution properties:

f_X(x) = \frac{1}{2^{n/2}\Gamma(n/2)} \cdot x^{(n-2)/2} \cdot e^{-x/2}, if x > 0, and zero otherwise;

E[X] = E\left[\sum_{i=1}^n Y_i\right] \overset{*}{=} n \cdot E[Y_i] = n;

Var(X) = Var\left(\sum_{i=1}^n Y_i\right) \overset{*}{=} n \cdot Var(Y_i) = 2n;

M_X(t) = M_{\sum_{i=1}^n Y_i}(t) \overset{*}{=} M_{Y_i}^n(t) = (1 - 2t)^{-n/2}, t < 1/2.

Prove: * use Y_i = Z_i^2 ∼ χ^2(1), i = 1, ..., n, i.i.d.
Alternative proof: Recall the p.d.f. of Y:

f_Y(y) = \frac{1}{2^{1/2}\Gamma(1/2)} \cdot y^{-1/2} \cdot e^{-y/2}.

Recall X ∼ Gamma(n, λ), with p.d.f.:

f_X(x) = \frac{\lambda^n \cdot x^{n-1} \cdot e^{-\lambda x}}{\Gamma(n)}, if x ≥ 0, and zero otherwise (see lecture week 2).

The sum of independent Gamma random variables Gamma(α_i, λ) is also a Gamma random variable, namely Gamma(\sum_{i=1}^n α_i, λ). Hence, for independent Y_1, Y_2, ..., Y_n ∼ χ^2(1):

Y_1 + Y_2 + ... + Y_n \overset{dist}{=} Gamma\left(\frac{n}{2}, \frac{1}{2}\right) = χ^2(n).

See F&T pages 164-169 for tabulated values of the c.d.f.
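The identity χ^2(n) = Gamma(n/2, 1/2) can be checked numerically by comparing the two density formulas pointwise (a sketch for illustration, not from the notes):

```python
import math

def chi2_pdf(x, n):
    # f_X(x) = x^{(n-2)/2} e^{-x/2} / (2^{n/2} Gamma(n/2)), for x > 0
    return x ** ((n - 2) / 2) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def gamma_pdf(x, alpha, lam):
    # f(x) = lam^alpha x^{alpha-1} e^{-lam x} / Gamma(alpha), for x >= 0
    return lam ** alpha * x ** (alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)

# chi-squared(n) and Gamma(alpha = n/2, lambda = 1/2) agree at every point.
max_gap = max(
    abs(chi2_pdf(x, n) - gamma_pdf(x, n / 2, 0.5))
    for n in (1, 2, 5, 10)
    for x in (0.5, 1.0, 2.0, 4.0)
)
```

The two formulas are algebraically identical, so the gap is only floating-point rounding.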
Figure: chi-squared p.d.f. and c.d.f. for n = 1, 2, 3, 5, 10, 25.
Jacobian technique and William Gosset

As an illustration of the Jacobian transformation technique, consider deriving the t-distribution (see exercises 4.111, 4.112 and 7.30 in W+(7ed)).

The t-distributions were discovered by William Gosset in 1908. Gosset was a statistician employed by the Guinness brewing company.

Suppose Z ∼ N(0, 1) and V ∼ χ^2(r) = \sum_{k=1}^r Z_k^2, where Z_i, i = 1, ..., r, are i.i.d., and Z, V are independent. Then, the random variable:

T = \frac{Z}{\sqrt{V/r}}

has a t-distribution with r degrees of freedom.
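This definition is easy to simulate directly (a hedged sketch with illustrative parameters, not from the notes): draw Z and V independently, form T = Z/√(V/r), and compare the sample moments with the known mean 0 and variance r/(r−2) for r > 2:

```python
import math
import random

random.seed(1)
r, n_sim = 10, 100_000

def t_draw(r):
    z = random.gauss(0.0, 1.0)
    v = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(r))  # V ~ chi-squared(r)
    return z / math.sqrt(v / r)

ts = [t_draw(r) for _ in range(n_sim)]
mean_t = sum(ts) / n_sim
var_t = sum(t * t for t in ts) / n_sim - mean_t ** 2
# For r > 2, a t(r) variable has mean 0 and variance r / (r - 2) = 1.25 here.
```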
Jacobian transformation technique procedure

Recall the procedure to find the joint density of u_1 = g_1(x_1, x_2) and u_2 = g_2(x_1, x_2):

1. Find u_1 = g_1(x_1, x_2) and u_2 = g_2(x_1, x_2).
2. Determine h(u_1, u_2) = g^{-1}(u_1, u_2).
3. Find the absolute value of the Jacobian of the transformation.
4. Multiply that with the joint density of X_1, X_2 evaluated at h_1(u_1, u_2), h_2(u_1, u_2).
Proof: Note the p.d.f.'s:

f_V(v) = \frac{v^{r/2-1} \cdot e^{-v/2}}{2^{r/2}\Gamma(r/2)}, if 0 ≤ v < ∞;

f_Z(z) = \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac{1}{2}z^2}, if −∞ < z < ∞.

1. Define the variables: s = g_1(z, v) = v and t = g_2(z, v) = \frac{z}{\sqrt{v/r}}.

2. So that this forms a one-to-one transformation with inverse:

v = h_1(s, t) = s and z = h_2(s, t) = t \cdot \sqrt{s/r}.
. t) 2 2 ∂s ∂t 1 0 p = s /r = det p 1 −1/2 /√r s /r 2 ·t ·s Note that the support is: 0 <v < ∞ 0 <s< ∞ 811/827 and and −∞ < z < ∞. −∞ < t < ∞. t) = det ∂h (s. t) ∂h (s. t) ∂h1 (s. The Jacobian is: ∂h1 (s. t) ∂s ∂t J (s.ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes Special Sampling Distributions: studentt distribution Jacobian technique and William Gosset (tdistribution) 3.
Since Z and V are independent, their joint density can be written as:

f_{Z,V}(z, v) = f_Z(z) \cdot f_V(v) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2} \cdot \frac{1}{\Gamma(r/2) 2^{r/2}} v^{r/2-1} e^{-v/2}.

4. Using the Jacobian transformation formula above, the joint density of (S, T) is given by:

f_{S,T}(s, t) = \sqrt{s/r} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} t^2 s/r} \cdot \frac{s^{r/2-1} e^{-s/2}}{\Gamma(r/2) 2^{r/2}}
             = \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2) 2^{r/2}} \cdot s^{(r+1)/2-1} \cdot \exp\left(-\frac{s}{2}\left(1 + \frac{t^2}{r}\right)\right).

5. Therefore, the marginal density of T is given by:

f_T(t) = \int_0^\infty f_{S,T}(s, t)\,ds

(continues on next slide).
Making the transformation:

w = \frac{s}{2}\left(1 + \frac{t^2}{r}\right) \quad\Leftrightarrow\quad s = \frac{2w}{1 + t^2/r},

so that:

dw = \frac{1}{2}\left(1 + \frac{t^2}{r}\right) ds \quad\Leftrightarrow\quad ds = \frac{2}{1 + t^2/r}\,dw.

So that we have:

f_T(t) = \int_0^\infty \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2) 2^{r/2}} \cdot s^{(r+1)/2-1} \cdot \exp\left(-\frac{s}{2}\left(1 + \frac{t^2}{r}\right)\right) ds
       = \int_0^\infty \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2) 2^{r/2}} \left(\frac{2w}{1 + t^2/r}\right)^{(r+1)/2-1} e^{-w} \cdot \frac{2}{1 + t^2/r}\,dw.
Simplifying:

f_T(t) = \int_0^\infty \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2) 2^{r/2}} \left(\frac{2}{1 + t^2/r}\right)^{(r+1)/2-1} \frac{2}{1 + t^2/r} \cdot w^{(r+1)/2-1} e^{-w}\,dw
       = \frac{1}{\sqrt{\pi r}\,\Gamma(r/2) 2^{(r+1)/2}} \left(\frac{2}{1 + t^2/r}\right)^{(r+1)/2} \int_0^\infty w^{(r+1)/2-1} e^{-w}\,dw
       \overset{*}{=} \frac{\Gamma((r+1)/2)}{\sqrt{\pi r}\,\Gamma(r/2)} \cdot \left(\frac{1}{1 + t^2/r}\right)^{(r+1)/2}, \quad\text{for } −∞ < t < ∞.

* using the Gamma function: \int_0^\infty x^{\alpha-1} \exp(-x)\,dx = \Gamma(\alpha).

This is the standard form of the t-distribution (see F&T page 163 for tabulated values of the c.d.f.).
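As a numerical check of the derived density (an illustrative sketch, assuming r = 5), the closed form should integrate to 1 over the real line:

```python
import math

def t_pdf(t, r):
    # Standard t(r) density in the form derived above.
    c = math.gamma((r + 1) / 2) / (math.sqrt(math.pi * r) * math.gamma(r / 2))
    return c * (1.0 + t * t / r) ** (-(r + 1) / 2)

# Trapezoidal integration over a wide range; the tails beyond +/-60 are negligible.
r, h = 5, 0.002
grid = [-60.0 + i * h for i in range(int(120.0 / h) + 1)]
vals = [t_pdf(t, r) for t in grid]
area = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
```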
Figure: Student-t p.d.f. and c.d.f. for r = 1, 2, 3, 5, 10, 25.
Snedecor's F distribution

Suppose U ∼ χ^2(n_1) and V ∼ χ^2(n_2) are two independent chi-squared distributed random variables. Then, the random variable:

F = \frac{U/n_1}{V/n_2}

has an F distribution with n_1 and n_2 degrees of freedom. See F&T pages 170-174 for tabulated values of the c.d.f.

Prove: use the Jacobian technique.

1. Define variables: f = \frac{u/n_1}{v/n_2}, g = v.

2. Inverse transformation: v = g and u = f \cdot g \cdot \frac{n_1}{n_2}.
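The definition can be verified by simulation before working through the Jacobian proof (a sketch with illustrative degrees of freedom, not from the notes): the F(n_1, n_2) mean is n_2/(n_2 − 2) for n_2 > 2.

```python
import random

random.seed(2)
n1, n2, n_sim = 4, 12, 100_000

def f_draw(n1, n2):
    u = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n1))  # U ~ chi-squared(n1)
    v = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n2))  # V ~ chi-squared(n2)
    return (u / n1) / (v / n2)

fs = [f_draw(n1, n2) for _ in range(n_sim)]
mean_f = sum(fs) / n_sim
# Theoretical mean: n2 / (n2 - 2) = 12 / 10 = 1.2.
```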
Snedecor's F distribution

3. Jacobian of the transformation:

J(f, g) = \det\begin{pmatrix} \partial v/\partial f & \partial v/\partial g \\ \partial u/\partial f & \partial u/\partial g \end{pmatrix} = \det\begin{pmatrix} 0 & 1 \\ g \cdot \frac{n_1}{n_2} & f \cdot \frac{n_1}{n_2} \end{pmatrix} = -g \cdot \frac{n_1}{n_2}.

Absolute value of the Jacobian: |J(f, g)| = g \cdot \frac{n_1}{n_2}.

4. Multiply the absolute value of the Jacobian by the joint density (joint density, using independence: f_{U,V}(u, v) = f_U(u) \cdot f_V(v)):

f_{U,V}(u, v) = f_U(u) \cdot f_V(v) = \frac{u^{(n_1-2)/2}}{2^{n_1/2}\Gamma(n_1/2)} \cdot \exp\left(-\frac{u}{2}\right) \cdot \frac{v^{(n_2-2)/2}}{2^{n_2/2}\Gamma(n_2/2)} \cdot \exp\left(-\frac{v}{2}\right).

Continues on the next slide.
Snedecor's F distribution

(Cont.) Joint density of F and G (using u = f \cdot g \cdot \frac{n_1}{n_2} and v = g):

f_{F,G}(f, g) = \frac{\left(\frac{f n_1 g}{n_2}\right)^{(n_1-2)/2}}{2^{n_1/2}\Gamma(n_1/2)} \cdot \exp\left(-\frac{f n_1 g}{2 n_2}\right) \cdot \frac{g^{(n_2-2)/2}}{2^{n_2/2}\Gamma(n_2/2)} \cdot \exp\left(-\frac{g}{2}\right) \cdot \frac{n_1 g}{n_2}.

5. The marginal of F is obtained by integrating over all possible values of G:

f_F(f) = \int_0^\infty f_{F,G}(f, g)\,dg = \text{func}(f) \cdot \int_0^\infty g^{(n_1+n_2-2)/2} \cdot \exp\left(-g\left(\frac{1}{2} + \frac{f n_1}{2 n_2}\right)\right) dg,

where \text{func}(f) = \frac{n_1 \cdot (f n_1)^{(n_1-2)/2}}{n_2^{n_1/2} \cdot 2^{(n_1+n_2)/2} \cdot \Gamma(n_1/2)\Gamma(n_2/2)}.
Continues:

f_F(f) \overset{*}{=} \text{func}(f) \cdot \left(\frac{2 n_2}{n_2 + f n_1}\right)^{(n_1+n_2-2)/2+1} \int_0^\infty x^{(n_1+n_2-2)/2} \cdot \exp(-x)\,dx
       \overset{**}{=} \text{func}(f) \cdot \left(\frac{2 n_2}{n_2 + f n_1}\right)^{(n_1+n_2)/2} \cdot \Gamma((n_1+n_2)/2)
       \overset{***}{=} \frac{\Gamma((n_1+n_2)/2)}{\Gamma(n_1/2) \cdot \Gamma(n_2/2)} \cdot n_1^{n_1/2} \cdot n_2^{n_2/2} \cdot \frac{f^{n_1/2-1}}{(n_2 + f n_1)^{(n_1+n_2)/2}}.

* using the transformation x = g\left(\frac{1}{2} + \frac{f n_1}{2 n_2}\right), thus g = \frac{2 n_2}{n_2 + f n_1} \cdot x and dx = \frac{n_2 + f n_1}{2 n_2}\,dg, thus dg = \frac{2 n_2}{n_2 + f n_1}\,dx.

** using the Gamma function: \Gamma(\alpha) = \int_0^\infty x^{\alpha-1} \cdot \exp(-x)\,dx.

*** using \text{func}(f) = \frac{n_1 \cdot (f n_1)^{(n_1-2)/2}}{2^{(n_1+n_2)/2} \cdot n_2^{n_1/2} \cdot \Gamma(n_2/2) \cdot \Gamma(n_1/2)}.
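To check the final closed form (an illustrative sketch, assuming n_1 = 4 and n_2 = 8), the derived density should integrate to 1 over (0, ∞):

```python
import math

def f_pdf(f, n1, n2):
    # Density of F(n1, n2) in the form derived above.
    c = math.gamma((n1 + n2) / 2) / (math.gamma(n1 / 2) * math.gamma(n2 / 2))
    return (c * n1 ** (n1 / 2) * n2 ** (n2 / 2) * f ** (n1 / 2 - 1)
            / (n2 + n1 * f) ** ((n1 + n2) / 2))

# Riemann sum on a fine grid; the tail beyond f = 300 is negligible here.
n1, n2, h = 4, 8, 0.001
area = h * sum(f_pdf(i * h, n1, n2) for i in range(1, int(300.0 / h) + 1))
```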
Figure: Snedecor's F p.d.f. and c.d.f. for various (n_1, n_2) combinations.
Properties of the sample mean and sample variance

Suppose you sample randomly from a large population, with replacement or, alternatively, such that the draws all have the same distribution and are independent. That is, suppose X_1, X_2, ..., X_n are n independent random variables with identical distribution; the observed outcomes (x_1, ..., x_n) are realisations of these random variables.

Define the sample mean by:

\bar{X} = \frac{1}{n} \sum_{k=1}^n X_k,

and recall the sample variance:

S^2 = \frac{1}{n-1} \sum_{k=1}^n \left(X_k - \bar{X}\right)^2.
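The two definitions translate directly to code (a sketch on hypothetical data, cross-checked against the standard library):

```python
import statistics

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
n = len(xs)

xbar = sum(xs) / n                               # sample mean
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sample variance (n - 1 divisor)
```

Note the n − 1 divisor in S^2; `statistics.variance` uses the same convention.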
Fundamental sampling distributions

Sampling distributions for i.i.d. normal samples, i.e., X_i ∼ N(µ, σ^2). In the next slides we will prove the following important properties:

1. \bar{X} ∼ N(µ, σ^2/n): sample mean using known population variance.
2. T = \frac{\bar{X} - µ}{S/\sqrt{n}} ∼ t_{n-1}: sample mean using sample variance.
3. \frac{(n-1) \cdot S^2}{σ^2} ∼ χ^2(n-1): sample variance using population variance.

\bar{X} and S^2 are independent (proof given in Exercise 13.93 of W+(7ed)).
Distribution of sample mean (known σ^2)

Prove that the distribution of the sample mean given known variance is N(µ, σ^2/n). We have X_1, ..., X_n i.i.d. normally distributed variables, and we defined the sample mean by \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i.

Use the MGF technique to find the distribution of \bar{X}:

M_{\bar{X}}(t) = M_{\sum_{i=1}^n X_i/n}(t) = \prod_{i=1}^n M_{X_i}(t/n) = \left(\exp\left(µ \cdot \frac{t}{n} + \frac{1}{2} σ^2 \cdot \frac{t^2}{n^2}\right)\right)^n = \exp\left(µ t + \frac{1}{2} \cdot \frac{σ^2}{n} \cdot t^2\right),

which is the m.g.f. of a normal distribution with mean µ and variance σ^2/n.
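A simulation makes the result concrete (an illustrative sketch with hypothetical parameters µ = 3, σ = 2, n = 25): repeated sample means should have mean µ and variance σ^2/n = 0.16.

```python
import random

random.seed(3)
mu, sigma, n, reps = 3.0, 2.0, 25, 40_000

# Draw many independent samples of size n and record each sample mean.
xbars = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]
m = sum(xbars) / reps
v = sum((x - m) ** 2 for x in xbars) / reps
# X-bar should be N(mu, sigma^2 / n), i.e. mean 3 and variance 4 / 25 = 0.16.
```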
Distribution of sample mean (unknown σ^2)

The distribution of the sample mean given unknown (population) variance is given by:

\frac{\bar{X} - µ}{S/\sqrt{n}} ∼ t_{n-1}.

Proof:

\frac{\bar{X} - µ}{S/\sqrt{n}} = \frac{\left(\bar{X} - µ\right)/\left(σ/\sqrt{n}\right)}{\sqrt{S^2/σ^2}} \overset{*}{∼} \frac{Z}{\sqrt{\frac{χ^2_{n-1}}{n-1}}} ∼ t_{n-1},

where Z ∼ N(0, 1) is a standard normal r.v.

* Using (n − 1) \cdot S^2/σ^2 ∼ χ^2_{n-1} (prove: see next slides).
Distribution of sample variance

Prove that the distribution of the sample variance is given by:

\frac{(n-1) \cdot S^2}{σ^2} ∼ χ^2_{n-1}.

First note that:

\frac{(n-1) \cdot S^2}{σ^2} = \frac{\sum_{i=1}^n \left(X_i - \bar{X}\right)^2}{σ^2},

and second note that:

\frac{\sum_{i=1}^n (X_i - µ)^2}{σ^2} = \sum_{i=1}^n \left(\frac{X_i - µ}{σ}\right)^2 = \sum_{i=1}^n Z_i^2 ∼ χ^2_n,

where Z_i ∼ N(0, 1), i = 1, ..., n, are i.i.d. standard normal r.v.
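Before the formal proof, the claim can be checked by simulation (a sketch with hypothetical parameters, not from the notes): the statistic (n−1)S^2/σ^2 should have the χ^2(n−1) mean n−1 and variance 2(n−1).

```python
import random

random.seed(4)
mu, sigma, n, reps = 0.0, 2.0, 5, 50_000

stats = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    stats.append((n - 1) * s2 / sigma ** 2)

m = sum(stats) / reps
v = sum((s - m) ** 2 for s in stats) / reps
# chi-squared(n - 1 = 4): mean 4 and variance 8.
```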
We have:

\underbrace{\frac{\sum_{i=1}^n (X_i - µ)^2}{σ^2}}_{\sum_{i=1}^n Z_i^2 ∼ χ^2_n} = \frac{\sum_{i=1}^n \left((X_i - \bar{X}) + (\bar{X} - µ)\right)^2}{σ^2} \overset{*}{=} \frac{\sum_{i=1}^n \left(X_i - \bar{X}\right)^2}{σ^2} + \sum_{i=1}^n \frac{\left(\bar{X} - µ\right)^2}{σ^2} = \frac{\sum_{i=1}^n \left(X_i - \bar{X}\right)^2}{σ^2} + \underbrace{\left(\sqrt{n} \cdot \frac{\bar{X} - µ}{σ}\right)^2}_{Z^2 ∼ χ^2_1}.

Hence, the first term on the right is χ^2_{n-1} (using the gamma sum property/MGF technique).

* Using 2 \cdot (\bar{X} - µ) \cdot \underbrace{\sum_{i=1}^n \left(X_i - \bar{X}\right)}_{=0} = 0.
Fundamental sampling distributions

We have now proven the following important properties:

- \bar{X} ∼ N\left(µ, \frac{1}{n}σ^2\right);
- T = \frac{\bar{X} - µ}{S/\sqrt{n}} ∼ t_{n-1};
- \frac{(n-1) \cdot S^2}{σ^2} ∼ χ^2(n-1).

We will use this for:
- confidence intervals for population mean and variance;
- testing population mean and variance;
- parameter uncertainty of a linear regression model.

Notice, when applying the CLT, we no longer need the X_i to be normally distributed.
ACTL2002/ACTL5101 Probability and Statistics: Week 5
Last four weeks

- Introduction to probability.
- Moments: (non-)central moments, mean, variance (standard deviation), skewness & kurtosis.
- Special univariate distributions (discrete & continuous).
- Joint distributions. Dependence of multivariate distributions.
- Functions of random variables.
This week

Parameter estimation:
- Method of Moments;
- Maximum Likelihood method;
- Bayesian estimator.

Convergence (almost surely, in probability, & in distribution).

Application (important theorems):
- Law of large numbers;
- Central limit theorem.
Limit theorems & parameter estimators

Parameter estimation
- Definition of an estimator
Estimator I: the method of moments
- The method of moments
- Example & exercise
Estimator II: maximum likelihood estimator
- Maximum likelihood estimation
- Example & exercise
- Sampling distribution and the bootstrap
Estimator III: Bayesian estimator
- Introduction
- Bayesian estimation
- Example & exercise
Convergence of series
- Chebyshev's Inequality
- Convergence concepts
- Application of strong convergency: Law of Large Numbers
- Application of weak convergency: Central Limit Theorem
- Application of convergence in distribution: Normal Approximation to the Binomial
- Application of convergence in distribution: Normal Approximation to the Poisson
Summary
- Summary
Definition of an Estimator

Problem of statistical estimation: a population has some characteristics that can be described by a r.v. X with density f_X(·; θ). The density has an unknown parameter (or set of parameters) θ.

We observe values of the random sample X_1, X_2, ..., X_n from the population f_X(·; θ). Denote the observed sample values by x_1, x_2, ..., x_n.

We then estimate the parameter (or some function of the parameter) based on this random sample.
Any statistic, i.e., a function T(X_1, X_2, ..., X_n) of observable random variables whose values are used to estimate τ(θ), where τ(·) is some function of the parameter θ, is called an estimator of τ(θ). For example: T(X_1, X_2, ..., X_n) = \bar{X}_n = \frac{1}{n}\sum_{j=1}^n X_j.

A value \hat{θ} of the statistic, evaluated at the observed sample values x_1, x_2, ..., x_n, is called a (point) estimate; for example, \hat{θ} = 0.23.

Note θ can be a vector; then the estimator is a set of equations.
The Method of Moments

Example of an estimator: the Method of Moments (MME). Let X_1, X_2, ..., X_n be a random sample from the population with density f_X(·; θ), which we will assume has k parameters, say θ = [θ_1, θ_2, ..., θ_k]^⊤.

The method of moments estimator of τ(θ) procedure is:
1. Equate the k population moments to the parameters of the distribution.
2. Equate (the first) k sample moments to the corresponding k population moments.
3. Solve the resulting system of simultaneous equations.

The method of moments point estimates (\hat{θ}) are the estimate values of the estimator corresponding to the data set.
Denote the sample moments by:

m_1 = \frac{1}{n}\sum_{j=1}^n x_j, \quad m_2 = \frac{1}{n}\sum_{j=1}^n x_j^2, \quad \ldots, \quad m_k = \frac{1}{n}\sum_{j=1}^n x_j^k,

and the population moments by:

µ_1(θ_1, ..., θ_k) = E[X], \quad µ_2(θ_1, ..., θ_k) = E\left[X^2\right], \quad \ldots, \quad µ_k(θ_1, ..., θ_k) = E\left[X^k\right].

The system of equations to solve for (θ_1, ..., θ_k) is given by:

m_j = µ_j(θ_1, θ_2, ..., θ_k), \quad\text{for } j = 1, 2, ..., k.

Solving this provides us the point estimate \hat{θ}.
Example: MME & Binomial distribution

Suppose X_1, X_2, ..., X_n is a random sample from the Bin(n, p) distribution, with known parameter n. Question: Use the method of moments to find a point estimator of θ = p.

Solution:
1. Equate the population moment to the sample moment: E[X] = \frac{1}{n}\sum_{j=1}^n x_j = \bar{x}.
2. Equate the population moment to the parameter (use week 2): E[X] = n \cdot p.
3. Then the method of moments estimator is (i.e., solving it): \bar{x} = n \cdot p ⇒ \hat{p} = \bar{x}/n.
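In code (a sketch on hypothetical data; the number of trials is written m here to avoid clashing with the sample size):

```python
# Hypothetical data: successes out of m = 20 trials in each of 10 experiments.
m = 20
xs = [7, 5, 6, 8, 4, 6, 7, 5, 6, 6]

xbar = sum(xs) / len(xs)  # first sample moment
p_hat = xbar / m          # method of moments estimate of p
```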
Exercise: MME & Normal distribution

Suppose X_1, X_2, ..., X_n is a random sample from the N(µ, σ^2) distribution. Question: Use the method of moments to find point estimators of µ and σ^2.

Solution:
1. Equate the population moments to the sample moments:

\underbrace{E[X]}_{\text{population moment}} = \underbrace{\frac{1}{n}\sum_{j=1}^n x_j = \bar{x}}_{\text{sample moment}}, \qquad \underbrace{E\left[X^2\right]}_{\text{population moment}} = \underbrace{\frac{1}{n}\sum_{j=1}^n x_j^2}_{\text{sample moment}}.

2. Equate the population moments to the parameters (use week 2): E[X] = µ and E[X^2] = Var(X) + E[X]^2 = σ^2 + µ^2.

3. The method of moments estimators are:

\hat{µ} = E[X] = \bar{x},

\hat{σ}^2 = E\left[X^2\right] - (E[X])^2 = \frac{1}{n}\sum_{j=1}^n x_j^2 - \bar{x}^2 \overset{*}{=} \frac{1}{n}\sum_{j=1}^n (x_j - \bar{x})^2 = \frac{n-1}{n} s^2,

* using s^2 = \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{n-1}, the sample variance (more on this next week).

Note: E\left[\hat{σ}^2\right] ≠ σ^2 (biased estimator).
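The normal MME and its relation to the unbiased sample variance can be checked directly (a sketch on hypothetical data):

```python
xs = [1.2, 0.7, 3.1, 2.4, 1.8, 2.0, 0.9, 2.6]  # hypothetical sample
n = len(xs)

m1 = sum(xs) / n                     # first sample moment
m2 = sum(x * x for x in xs) / n      # second sample moment

mu_hat = m1                          # MME of mu
sigma2_hat = m2 - m1 ** 2            # MME of sigma^2 (biased)

# Unbiased sample variance; the MME equals (n - 1)/n times s^2.
s2 = sum((x - m1) ** 2 for x in xs) / (n - 1)
```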
Maximum Likelihood function

Another example (the most commonly used) of an estimator is the maximum likelihood estimator. First, we need to define the likelihood function.

If x_1, x_2, ..., x_n are drawn from a population with a parameter θ (where θ could be a vector of parameters), then the likelihood function is given by:

L(θ; x_1, x_2, ..., x_n) = f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n),

where f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) is the joint probability density of the random variables X_1, X_2, ..., X_n.
Maximum Likelihood Estimation

Let L(θ) = L(θ; x_1, x_2, ..., x_n) be the likelihood function for X_1, X_2, ..., X_n. When X_1, X_2, ..., X_n is a random sample from f_X(x; θ), then the likelihood function is (using the i.i.d. property):

L(θ; x_1, x_2, ..., x_n) = \prod_{j=1}^n f_X(x_j; θ),

which is just the product of the densities evaluated at each of the observations in the random sample.

The set of parameters \hat{θ} = \hat{θ}(x_1, x_2, ..., x_n) (note: a function of the observed values) that maximizes L(θ) is the maximum likelihood estimate of θ. The random variable \hat{θ}(X_1, X_2, ..., X_n) is called the maximum likelihood estimator.
If the likelihood function contains k parameters, so that:

L(θ_1, θ_2, ..., θ_k; x) = f_X(x_1; θ) \cdot f_X(x_2; θ) \cdot \ldots \cdot f_X(x_n; θ),

then (under certain regularity conditions) the point where the likelihood is a maximum is a solution of the k equations:

\frac{\partial L(θ_1, θ_2, ..., θ_k; x)}{\partial θ_1} = 0, \quad \frac{\partial L(θ; x)}{\partial θ_2} = 0, \quad \ldots, \quad \frac{\partial L(θ; x)}{\partial θ_k} = 0.

Normally, the solutions to this system of equations give the global maximum, but to ensure this, you should usually check the second derivative (or Hessian) conditions and the boundary conditions for a global maximum.
Consider the case of estimating two variables, say θ_1 and θ_2. Define the gradient vector:

D(L) = \begin{bmatrix} \frac{\partial L}{\partial θ_1} \\ \frac{\partial L}{\partial θ_2} \end{bmatrix},

and define the Hessian matrix:

H(L) = \begin{bmatrix} \frac{\partial^2 L}{\partial θ_1^2} & \frac{\partial^2 L}{\partial θ_1 \partial θ_2} \\ \frac{\partial^2 L}{\partial θ_1 \partial θ_2} & \frac{\partial^2 L}{\partial θ_2^2} \end{bmatrix}.
From calculus we know that the maximizing choice of θ_1 and θ_2 should satisfy not only D(L) = 0, but also that H should be negative definite, which means:

\begin{bmatrix} h_1 & h_2 \end{bmatrix} \begin{bmatrix} \frac{\partial^2 L}{\partial θ_1^2} & \frac{\partial^2 L}{\partial θ_1 \partial θ_2} \\ \frac{\partial^2 L}{\partial θ_1 \partial θ_2} & \frac{\partial^2 L}{\partial θ_2^2} \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} < 0,

for all [h_1, h_2] ≠ 0.
Log-Likelihood function

Generally, maximizing the log-likelihood function is easier. Not surprisingly, since the log is a monotonically increasing function, maximizing the log-likelihood function gives the same parameter estimates as maximizing the likelihood function.

We define the log-likelihood function as:

ℓ(θ_1, θ_2, ..., θ_k; x) = \log(L(θ_1, θ_2, ..., θ_k; x)) = \log\left(\prod_{j=1}^n f_X(x_j; θ)\right) \overset{*}{=} \sum_{j=1}^n \log(f_X(x_j; θ)),

* using \log(a \cdot b) = \log(a) + \log(b).
MLE procedure

The general procedure to find the ML estimator is:
1. Determine the likelihood function L(θ_1, θ_2, ..., θ_k; x).
2. Determine the log-likelihood function ℓ(θ_1, θ_2, ..., θ_k; x) = \log(L(θ_1, θ_2, ..., θ_k; x)).
3. Equate the derivatives of ℓ(θ_1, θ_2, ..., θ_k; x) w.r.t. θ_1, θ_2, ..., θ_k to zero (⇒ global/local minimum/maximum).
4. Check whether the second derivative is negative (maximum) and check the boundary conditions.
Contents: Limit theorems & parameter estimators
- Parameter estimation: definition of an estimator
- Estimator I: the method of moments (the method of moments; example & exercise)
- Estimator II: maximum likelihood estimator (maximum likelihood estimation; example & exercise; sampling distribution and the bootstrap)
- Estimator III: Bayesian estimator (introduction; Bayesian estimation; example & exercise)
- Convergence of series: Chebyshev's inequality; convergence concepts; application of strong convergence: Law of Large Numbers; application of weak convergence: Central Limit Theorem; application of convergence in distribution: normal approximation to the Binomial; normal approximation to the Poisson
- Summary
Example: MLE and Poisson

1. Suppose X1, X2, ..., Xn are i.i.d. Poisson(λ). The likelihood function is given by:

L(λ, x) = ∏_{j=1}^n fX(xj|λ) = (e^{−λ} λ^{x1}/x1!) · (e^{−λ} λ^{x2}/x2!) · ... · (e^{−λ} λ^{xn}/xn!)
        = e^{−λ·n} · λ^{x1}/x1! · λ^{x2}/x2! · ... · λ^{xn}/xn!.

2. Taking the log of both sides, we get:

ℓ(λ, x) = −λ·n + log(λ) · Σ_{k=1}^n xk − Σ_{k=1}^n log(xk!).

Or, equivalently, working directly with the log-likelihood function:

ℓ(λ, x) = Σ_{j=1}^n log(fX(xj|λ)) = Σ_{j=1}^n ( −λ + xj · log(λ) − log(xj!) ).
3. Now we need to maximize this log-likelihood function with respect to the parameter λ. Taking the first order condition (FOC) with respect to λ we have:

∂ℓ(λ)/∂λ = 0  ⇒  −n + (1/λ) Σ_{k=1}^n xk = 0.

This gives the maximum likelihood estimate (MLE):

λ̂ = (1/n) Σ_{k=1}^n xk = x̄,

which equals the sample mean.

4. Check the second derivative condition to ensure a global maximum.
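The closed-form result above can be verified numerically. A minimal sketch (the sample below is made up for illustration): the Poisson log-likelihood is evaluated at the sample mean and at nearby values, confirming the mean is the maximizer.

```python
import math

def poisson_loglik(lam, xs):
    """Log-likelihood of i.i.d. Poisson(lam) data; lgamma(x+1) = log(x!)."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)

xs = [2, 0, 3, 1, 4, 2, 1]           # illustrative sample (made up)
lam_mle = sum(xs) / len(xs)          # closed-form MLE: the sample mean

# The log-likelihood is indeed largest at the sample mean:
for lam in (lam_mle - 0.1, lam_mle + 0.1):
    assert poisson_loglik(lam, xs) < poisson_loglik(lam_mle, xs)
```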
Exercise: MLE and Normal

Suppose X1, X2, ..., Xn are i.i.d. Normal(µ, σ²), where both parameters are unknown.

1. The p.d.f. is given by:

fX(x) = 1/(√(2π)·σ) · exp( −½ · ((x − µ)/σ)² ).

Thus the likelihood function is given by:

L(µ, σ, x) = ∏_{k=1}^n 1/(√(2π)·σ) · exp( −½ · ((xk − µ)/σ)² ).

Question: Find the MLE of µ and σ².
2. Solution: the log-likelihood function is:

ℓ(µ, σ, x) = Σ_{k=1}^n log( 1/(√(2π)·σ) · exp( −½ · ((xk − µ)/σ)² ) )
           =* −n·log(σ) − (n/2)·log(2π) − (1/(2σ²)) Σ_{k=1}^n (xk − µ)²,

* using log(1/a) = log(a^{−1}) = −log(a), with a = σ, and log(1/√b) = log(b^{−0.5}) = −0.5·log(b), with b = 2π.

Take the derivatives w.r.t. µ and σ and set them equal to zero.
3. Then we obtain:

∂ℓ(µ, σ, x)/∂µ = (1/σ²) Σ_{k=1}^n (xk − µ) = 0
⇒ Σ_{k=1}^n xk − n·µ = 0
⇒ µ̂ = x̄;

∂ℓ(µ, σ, x)/∂σ = −n/σ + Σ_{k=1}^n (xk − µ)² / σ³ = 0
⇒ n = Σ_{k=1}^n (xk − µ)² / σ²
⇒ σ̂² = (1/n) Σ_{k=1}^n (xk − x̄)².

See §9.7 and §9.8 of W+(7ed) for further details.
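The two closed forms can be checked against library routines. A minimal sketch (data made up for illustration); note the MLE of σ² divides by n, which is the population variance of the sample, not the n−1 version:

```python
import statistics

xs = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]   # illustrative data (made up)
n = len(xs)
mu_hat = sum(xs) / n                                   # MLE of mu: the sample mean
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n    # MLE of sigma^2 (divide by n)

assert abs(mu_hat - statistics.mean(xs)) < 1e-12
assert abs(sigma2_hat - statistics.pvariance(xs)) < 1e-12   # pvariance also divides by n
```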
Example: MME & MLE and Gamma

You may not always obtain closed-form solutions for the parameter estimates with the maximum likelihood method. An example of such a problem is estimating the parameters of the Gamma distribution using MLE. As we will see on the next slides, MLE yields a closed-form solution for one parameter, but not for the second parameter. To find the MLE one should numerically compute the estimates (!) by solving a non-linear equation. This can be done by employing an iterative numerical approximation (e.g. Newton-Raphson), see Excel.

Application: surrender of mortgages.
In such cases an initial value may be needed, so that another means of estimating may be used first, such as the method of moments. Then use that estimate as the starting value.

Question: Consider X1, X2, ..., Xn i.i.d. Gamma(α, λ), with:

fX(x) = λ^α/Γ(α) · x^{α−1} · e^{−λ·x},
MX(t) = E[e^{tX}] = (λ/(λ−t))^α,
E[X^r] = Γ(α+r)/(λ^r · Γ(α)),  E[X] = α/λ,  Var(X) = α/λ².

Find the MME of the Gamma distribution.

Solution:
1. Equate sample moments to population moments:

µ̂1' = dMX(t)/dt |_{t=0} = E[X] = x̄   and   µ̂2' = d²MX(t)/dt² |_{t=0} = E[X²] = (1/n) Σ_{i=1}^n xi².

2. Equate population moments to the parameters:

µ1' = α/λ   and   µ2' = α·(α+1)/λ² = (α/λ)·((α+1)/λ) = µ1' · (µ1' + 1/λ).

Therefore, the method of moments estimates are given by:

µ2' = µ1' · (µ1' + 1/λ)  ⇒  λ = µ1'/(µ2' − µ1'²)  ⇒  α = λ·µ1' = µ1'²/(µ2' − µ1'²).

3. Using (step 1) µ̂1' = x̄ and µ̂2' = (1/n) Σ_{i=1}^n xi², note that µ̂2' − µ̂1'² = (1/n) Σ_{i=1}^n xi² − x̄² = σ̂². So the estimators are:

λ̂ = x̄/σ̂²   and   α̂ = x̄²/σ̂².
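The MME formulas above translate directly into code. A minimal sketch (the three-point sample is contrived so the answer is exact):

```python
def gamma_mme(xs):
    """Method-of-moments estimates for Gamma(alpha, lambda) with mean alpha/lambda."""
    n = len(xs)
    m1 = sum(xs) / n                    # first sample moment (mean)
    m2 = sum(x * x for x in xs) / n     # second sample moment
    s2 = m2 - m1 ** 2                   # biased sample variance
    lam_hat = m1 / s2
    alpha_hat = m1 ** 2 / s2
    return alpha_hat, lam_hat

# xs = [1, 2, 3]: mean 2, biased variance 2/3, so lambda = 3 and alpha = 6
alpha_hat, lam_hat = gamma_mme([1, 2, 3])
```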
Example: MME & MLE and Gamma

Question: Find the ML estimates.

Solution:
1. X1, X2, ..., Xn are i.i.d. Gamma(α, λ), so the likelihood function is:

L(λ, α, x) = ∏_{i=1}^n 1/Γ(α) · λ^α · xi^{α−1} · e^{−λ·xi}.

2. The log-likelihood function is then:

ℓ(λ, α, x) = −n·log(Γ(α)) + n·α·log(λ) + (α−1) · Σ_{i=1}^n log(xi) − λ · Σ_{i=1}^n xi.
3. Maximizing this:

∂ℓ(λ, α, x)/∂α = −n · (∂Γ(α)/∂α)/Γ(α) + n·log(λ) + Σ_{i=1}^n log(xi) = 0;
∂ℓ(λ, α, x)/∂λ = n·α/λ − Σ_{i=1}^n xi = 0.

The second equation is easy to solve:

λ̂ = n·α̂ / Σ_{i=1}^n xi,

but we need numerical (iterative) techniques for solving the first equation.
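A minimal sketch of the numerical step, under two stated assumptions: the digamma function (∂ log Γ(α)/∂α) is approximated by a central difference of `math.lgamma`, and bisection on a log scale is used instead of Newton-Raphson for robustness. Substituting λ = nα/Σxi into the first FOC reduces it to log(α) − ψ(α) = log(x̄) − mean(log xi):

```python
import math
import random

def digamma(a, h=1e-5):
    # numerical digamma: central difference of log-Gamma
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def gamma_mle_alpha(xs, lo=0.1, hi=1e3):
    """Solve log(alpha) - digamma(alpha) = log(mean x) - mean(log x) by bisection."""
    c = math.log(sum(xs) / len(xs)) - sum(math.log(x) for x in xs) / len(xs)
    f = lambda a: math.log(a) - digamma(a) - c   # decreasing in alpha
    for _ in range(200):
        mid = math.sqrt(lo * hi)                 # bisect on a log scale
        if f(mid) > 0:
            lo = mid                             # root lies to the right
        else:
            hi = mid
    return math.sqrt(lo * hi)

random.seed(1)
xs = [random.gammavariate(3.0, 0.5) for _ in range(20000)]  # alpha = 3, scale = 1/lambda = 0.5
alpha_hat = gamma_mle_alpha(xs)
lam_hat = len(xs) * alpha_hat / sum(xs)   # closed-form second FOC
```

The MME estimates from the previous slide would normally supply the bracketing interval or starting value.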
Example: MLE and Uniform

Suppose X1, X2, ..., Xn are i.i.d. U[0, θ], i.e.:

fX(x) = 1/θ, for 0 ≤ x ≤ θ, and zero otherwise.

Here the range of x depends on the parameter θ. The likelihood function can be expressed as:

L(θ, x) = (1/θ)^n · ∏_{k=1}^n I_{0≤xk≤θ},

where I_{0≤xk≤θ} is an indicator function taking the value 1 if xk ∈ [0, θ] and zero otherwise.

Question: How to find the maximum of this likelihood function?
Solution: Non-linearity in the indicator function ⇒ we cannot use calculus (setting the FOC equal to zero) to maximize this function. Instead, maximize it by looking at its properties:
- ∏_{k=1}^n I_{0≤xk≤θ} can only take the values 0 and 1. Note: it will take the value 0 if θ < x(n) and 1 otherwise!
- (1/θ)^n is a decreasing function of θ.

Hence, the function is maximized at the lowest value of θ for which ∏_{k=1}^n I_{0≤xk≤θ} = 1, i.e.:

θ̂ = max{x1, x2, ..., xn} = x(n).

[Figure: L(θ, x) is zero for θ below the largest observation x(n), jumps to (1/θ)^n at θ = x(n), and decreases thereafter.]
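The jump-then-decay shape of this likelihood can be made concrete. A minimal sketch (sample values made up):

```python
def uniform_likelihood(theta, xs):
    """Likelihood of U[0, theta]: (1/theta)^n if all points lie in [0, theta], else 0."""
    if theta <= 0 or any(not (0 <= x <= theta) for x in xs):
        return 0.0
    return theta ** (-len(xs))

xs = [0.4, 1.7, 0.9, 2.3, 1.1]
theta_mle = max(xs)                     # = x_(n), the largest observation

assert uniform_likelihood(theta_mle, xs) > 0
assert uniform_likelihood(theta_mle - 0.01, xs) == 0.0    # below x_(n): likelihood is 0
assert uniform_likelihood(theta_mle + 0.5, xs) < uniform_likelihood(theta_mle, xs)
```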
Sampling distribution and the bootstrap

We might not only be interested in the point estimate, but in the whole distribution of the MLE estimate (parameter uncertainty!). However, we have no closed-form solution for the Gamma MLE estimates. How to obtain their sampling distribution? Use bootstrapping:

Step 1: Generate k samples from Gamma(λ̂, α̂).
Step 2: Estimate λ̂, α̂ for each of these k samples using MLE.
Step 3: The empirical joint cumulative distribution function of these k parameter estimates is an approximation to the sampling distribution of the MLE estimates.

Quantification of risk: produce histograms of the estimates.
Sampling distribution and the bootstrap, see Excel

[Figure: empirical CDFs Fλ(λ) and Fα(α) after the 1st, 2nd, 3rd, 4th and 5th bootstrap samples, approximating the sampling distributions of λ̂ and α̂; k = 250.]
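The three bootstrap steps can be sketched in a few lines. Assumptions in this sketch: the per-sample re-estimation uses the method-of-moments formulas instead of the numerical MLE (for brevity), and the "observed" data set is simulated:

```python
import random

def gamma_mme(xs):
    # method-of-moments estimates, used here in place of the per-sample MLE
    n = len(xs)
    m1 = sum(xs) / n
    s2 = sum(x * x for x in xs) / n - m1 ** 2
    return m1 ** 2 / s2, m1 / s2        # (alpha_hat, lambda_hat)

random.seed(0)
data = [random.gammavariate(2.0, 1.0) for _ in range(500)]   # "observed" data
alpha0, lam0 = gamma_mme(data)          # fitted model

k, n = 250, len(data)
boot = []
for _ in range(k):                      # Steps 1 + 2: simulate from the fit, re-estimate
    sample = [random.gammavariate(alpha0, 1 / lam0) for _ in range(n)]
    boot.append(gamma_mme(sample))

# Step 3: the k pairs in `boot` approximate the joint sampling distribution
alpha_mean = sum(a for a, _ in boot) / k
```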
Introduction

We have seen:
- Method of moments estimator. Idea: the first k moments of the fitted special distribution and of the sample are the same.
- Maximum likelihood estimator. Idea: the probability of the sample, given a class of distributions, is highest for this set of parameters.

Pure Bayesian interpretation: suppose you have, a priori, a prior belief about a distribution. Then you observe data ⇒ more information about the distribution.

Warning: Bayesian estimation is hard to understand, partly due to the non-standard notation in Bayesian estimates.
Example frequentist interpretation: let Xi ∼ Ber(θ) indicate whether individual i lodges a claim at the insurer; then Σ_{i=1}^T Xi = Y ∼ Bin(T, θ) is the number of car accidents. The probability of an insured having a car accident depends on adverse selection:
- A new insurer does not know the amount of adverse selection in its pool.
- Now, let θ ∈ Θ, with Θ ∼ Beta(a, b) the distribution of the risk among individuals (i.e., representing adverse selection).
- Use this for estimating the parameter ⇒ what is our prior for θ? This is called empirical Bayes.

Similar idea: Bayesian updating, in the case of time-varying parameters:
- Prior: last year's estimated claim distribution.
- Data: this year's claims.
- Posterior: revised estimated claim distribution.
Notation for Bayesian estimation

Under this approach, we assume that Θ is a random quantity with density π(θ), called the prior density. (This is the usual notation, rather than fΘ(θ).)

A sample X = x (= [x1, x2, ..., xT]^T) is taken from its population, and the prior density is updated using the information drawn from this sample and applying Bayes' rule.

This updated prior is called the posterior density, which is the conditional density of Θ given the sample X = x: π(θ|x) (= f_{Θ|X}(θ|x)).

So we are using a conditional r.v., Θ|X, associated with the multivariate distribution of Θ and X (look back at the lecture notes for week 3).

Use, for example, the posterior mean E[Θ|X = x] as the Bayesian estimator.
Bayesian estimation, theory

First, let us define a loss function L(θ̂; θ), where θ̂ is an estimator of τ(θ), with:

L(θ̂; θ) ≥ 0, for every θ̂;
L(θ̂; θ) = 0, when θ̂ = θ.

Interpretation of the loss function: for reasonable functions we have: a lower value of the loss function ⇒ a better estimator.

Examples of loss functions:
- Mean squared error: L(θ̂, θ) = (θ̂ − θ)² (mostly used);
- Absolute error: L(θ̂, θ) = |θ̂ − θ|.
Bayesian estimation, theory

Next, we define a risk function, the expected loss:

R_θ̂(θ) = E_θ̂[ L(θ̂; θ) ] = ∫ L(θ̂(x); θ) · f_{X|Θ}(x|θ) dx.

Note: the estimator is a random variable (e.g. T = θ̂ = X̄, τ(θ) = θ = µ) depending on the observations.

Interpretation of the risk function: the loss function is a random variable ⇒ taking the expectation returns a number, given θ.

Note: R_θ̂(θ) is a function of θ (we only know the prior density). Define the Bayes risk under prior π as:

B_π(θ̂) = E_θ[ R_θ̂(θ) ] = ∫_Θ R_θ̂(θ) · π(θ) dθ.

Goal: minimize the Bayes risk.
Now, for a given loss function, we can introduce the Bayesian estimator, θ̂^B, for which the following holds:

E_θ[ R_{θ̂^B}(θ) ] ≤ E_θ[ R_θ̂(θ) ], for any θ̂.

Interpretation: θ̂^B is the "best estimator" with respect to the loss function L(θ̂, θ). Rewriting:

θ̂^B = argmin_θ̂ { E_θ[ E_θ̂[ L(θ̂|θ) ] ] }
     =* argmin_θ̂ { E_θ̂[ E_θ[ L(θ̂|θ) ] ] }
     =** argmin_θ̂ { E_θ̂[ L(θ̂) ] },

* using a reversal of the order of the integrals; ** using the law of iterated expectations (week 3).
Bayesian estimation, estimators

Rewriting the Bayes risk we have:

B_π(θ̂) = ∫_Θ R_θ̂(θ) · π(θ) dθ = ∫_Θ ∫ L(θ̂(x), θ) · f_{x|θ} dx · π(θ) dθ
        =* ∫ [ ∫_Θ L(θ̂(x), θ) · π(θ|x) dθ ] · fX(x) dx
        = ∫ r(θ̂|x) · fX(x) dx,   where r(θ̂|x) ≡ ∫_Θ L(θ̂(x), θ) · π(θ|x) dθ,

* using f_{X|Θ}(x|θ) · π(θ) = π(θ|x) · fX(x), the Law of Total Probability, and ** changing the order of integration.

Implying: minimizing B_π(θ̂) is equivalent to minimizing r(θ̂|x) for all x.
For the squared error loss function (used in *) we have:

min_θ̂ { B_π(θ̂) } ⇔ minimizing r(θ̂|x) for all x ⇒ ∂r(θ̂|x)/∂θ̂ = 0
⇒* 2 ∫_Θ (θ − θ̂(x)) · π(θ|x) dθ = 0
⇒ θ̂^B(x) = ∫_Θ θ · π(θ|x) dθ
⇒ θ̂^B(x) = E_{θ|x}[θ], i.e., θ̂^B = E[Θ|X = x]!

Interpretation: the Bayesian estimator under the squared error loss function is the expectation of the posterior density.

One can show that for the absolute error loss function: θ̂^B(x) = median(π(θ|x)).
Bayesian estimation, derivation

The posterior density (i.e., f_{Θ|X}(θ|x)) is derived as:

π(θ|x) =* f_{X|Θ}(x1, x2, ..., xT|θ) · π(θ) / fX(x1, x2, ..., xT)
       =** f_{X|Θ}(x1, x2, ..., xT|θ) · π(θ) / ∫ f_{X|Θ}(x1, x2, ..., xT|θ) · π(θ) dθ,   (1)

with:
* Bayes' formula: Pr(Ai|B) = Pr(B|Ai)·Pr(Ai) / Σ_{j=1}^n Pr(B|Aj)·Pr(Aj), with A1, ..., An a complete partition of Ω;
** LTP: Pr(A) = Σ_{i=1}^n Pr(A|Bi)·Pr(Bi) (where B1, ..., Bn is a complete partition of Ω; week 1). Note: π(θ) is defined on a complete partition of the sample space.

Hence, the denominator is the marginal density of X = [x1, x2, ..., xT]^T (= constant given the observations!).
Notation: ∝ means "proportional to", i.e., f(x) ∝ g(x) ⇒ f(x) = c · g(x). We have that the posterior is given by:

π(θ|x) ∝ f_{X|Θ}(x1, x2, ..., xT|θ) · π(θ).   (2)

Either use equation (1) (difficult/tedious integral!) or (2). Equation (2) can be used to find the posterior density by:
I. Finding c such that c · ∫ f_{X|Θ}(x1, x2, ..., xT|θ) · π(θ) dθ = 1; or
II. Finding a (special) distribution that is proportional to f_{X|Θ}(x1, x2, ..., xT|θ) · π(θ) (the fastest way, if possible!).

Estimation procedure:
1. Find the posterior density using (1) (difficult/tedious integral!) or (2).
2. Compute the Bayesian estimator (using the posterior) under a given loss function (under the mean squared loss function: take the expectation of the posterior distribution).
Example Bayesian estimation: Bernoulli-Beta

Let X1, X2, ..., XT be i.i.d. Bernoulli(Θ), i.e., (Xi|Θ = θ) ∼ Bernoulli(θ). Assume the prior density of Θ is Beta(a, b), so that:

π(θ) = Γ(a+b)/(Γ(a)·Γ(b)) · θ^{a−1} · (1−θ)^{b−1}.

We know that the conditional density (density conditional on the true value of θ) of our data is given by:

f_{X|Θ}(x|θ) = θ^{x1}(1−θ)^{1−x1} · θ^{x2}(1−θ)^{1−x2} · ... · θ^{xT}(1−θ)^{1−xT}
             = θ^{Σ_{j=1}^T xj} · (1−θ)^{T − Σ_{j=1}^T xj} =* θ^s · (1−θ)^{T−s}.

This is just the likelihood function. (* Simplifying notation: let s = Σ_{j=1}^T xj.)
1. The posterior density, i.e., the density of Θ given X = x, using (2) is proportional to:

π(θ|x) ∝ f_{X|Θ}(x1, x2, ..., xT|θ) · π(θ) = Γ(a+b)/(Γ(a)·Γ(b)) · θ^{(a+s)−1} · (1−θ)^{(b+T−s)−1}.   (3)

I. Tedious method: to find the posterior density using (1) we first need to find the marginal density of X (next slide). The posterior density is also solvable by finding c such that:

c · ∫ Γ(a+b)/(Γ(a)·Γ(b)) · θ^{(a+s)−1} · (1−θ)^{(b+T−s)−1} dθ = 1.

II. Easy method: we observe that (3) is proportional to the p.d.f. of Ξ ∼ Beta(a+s, b+T−s).
The marginal density of X (* using the LTP) is given by:

fX(x) =* ∫_0^1 f_{X|Θ}(x|θ) · π(θ) dθ
      = ∫_0^1 Γ(a+b)/(Γ(a)·Γ(b)) · θ^{(a+s)−1} · (1−θ)^{(b+T−s)−1} dθ
      =** Γ(a+b)/(Γ(a)·Γ(b)) · Γ(a+s)·Γ(b+T−s)/Γ(a+b+T),

**: ∫_0^1 x^{α−1} · (1−x)^{β−1} dx = B(α, β) = Γ(α)·Γ(β)/Γ(α+β).

Posterior density using (1):

π(θ|x) = f_{X|Θ}(x|θ) · π(θ) / fX(x)
       = θ^s (1−θ)^{T−s} · [Γ(a+b)/(Γ(a)·Γ(b))] θ^{a−1}(1−θ)^{b−1} / ( [Γ(a+b)/(Γ(a)·Γ(b))] · Γ(a+s)·Γ(b+T−s)/Γ(a+b+T) )
       = Γ(a+b+T)/(Γ(a+s)·Γ(b+T−s)) · θ^{(a+s)−1} · (1−θ)^{(b+T−s)−1}.
Example Bayesian estimation: Bernoulli-Beta

2. The mean of the r.v. with the above posterior density is then:

θ̂^B = E[Θ|X = x] = E[Ξ ∼ Beta(a+s, b+T−s)] = (a+s)/(a+b+T),

which gives the Bayesian estimator of Θ. We note that we can write the Bayesian estimator as a weighted average of the prior mean (which is a/(a+b)) and the sample mean (which is s/T) as follows:

θ̂^B = E[Θ|X = x] = [T/(a+b+T)] · (s/T) + [(a+b)/(a+b+T)] · (a/(a+b)),
                    = weight sample · sample mean + weight prior · prior mean.
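The conjugate update above is a one-liner in code. A minimal sketch (prior parameters and data made up for illustration), also checking the weighted-average decomposition:

```python
def beta_bernoulli_posterior(a, b, xs):
    """Conjugate update: Beta(a, b) prior + Bernoulli data -> Beta(a+s, b+T-s) posterior."""
    T, s = len(xs), sum(xs)
    return a + s, b + T - s

a, b = 2, 3                      # prior Beta(2, 3); prior mean 2/5
xs = [1, 0, 1, 1, 0, 1, 1, 0]    # T = 8 observations, s = 5 successes
a_post, b_post = beta_bernoulli_posterior(a, b, xs)
theta_bayes = a_post / (a_post + b_post)     # posterior mean = Bayes estimator (squared loss)

# Same number as the weighted average of sample mean and prior mean:
T, s = len(xs), sum(xs)
weighted = T / (a + b + T) * (s / T) + (a + b) / (a + b + T) * (a / (a + b))
assert abs(theta_bayes - weighted) < 1e-12
```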
Exercise Normal-Normal

Let X1, X2, ..., XT be i.i.d. Normal(Θ, σ2²), i.e., (Xi|Θ = θ) ∼ Normal(θ, σ2²). Assume the prior density of Θ is Normal(m, σ1²), so that:

π(θ) = 1/(√(2π)·σ1) · exp( −(θ−m)²/(2·σ1²) ).

Question: Find the Bayesian estimator for θ.

Solution: we know that the conditional density of our data is given by the likelihood function:

f_{X|Θ}(x|θ) = ∏_{j=1}^T 1/(√(2π)·σ2) · exp( −(xj−θ)²/(2·σ2²) )
             = 1/(√(2π)·σ2)^T · exp( −Σ_{j=1}^T (xj−θ)²/(2·σ2²) ).
1. Posterior density:

π(θ|x) ∝ f_{X|Θ}(x|θ) · π(θ)
∝ exp( −Σ_{j=1}^T (xj−θ)²/(2·σ2²) ) · exp( −(θ−m)²/(2·σ1²) )
= exp( −Σ_{j=1}^T (xj² + θ² − 2·θ·xj)/(2·σ2²) − (θ² + m² − 2·θ·m)/(2·σ1²) )
= exp( −[ σ1² · Σ_{j=1}^T (xj² + θ² − 2·θ·xj) + σ2² · (θ² + m² − 2·θ·m) ] / (2·σ2²·σ1²) )
∝* exp( −[ θ²·(σ2² + T·σ1²) − 2·θ·(m·σ2² + T·x̄·σ1²) ] / (2·σ2²·σ1²) )
∝** exp( −( θ − (m·σ2² + T·x̄·σ1²)/(σ2² + T·σ1²) )² / ( 2·σ2²·σ1²/(σ2² + T·σ1²) ) ).
* and **: exp( −(m² + Σ_{j=1}^T xj²)/(2·σ2²·σ1²) ) and exp( (m·σ2² + T·x̄·σ1²)²/((σ2² + T·σ1²)·2·σ2²·σ1²) ) are constants given x.

Thus Θ|X is normally distributed with mean (m·σ2² + T·x̄·σ1²)/(σ2² + T·σ1²) and variance σ2²·σ1²/(σ2² + T·σ1²).

1. Note that we can rewrite these to:

mean: [(1/σ1²) / (1/σ1² + T/σ2²)] · m + [(T/σ2²) / (1/σ1² + T/σ2²)] · x̄,   variance: (1/σ1² + T/σ2²)^{−1}.

2. The Bayesian estimator under both the mean squared loss function and the absolute error loss function is:

θ̂^B = [(1/σ1²) / (1/σ1² + T/σ2²)] · m + [(T/σ2²) / (1/σ1² + T/σ2²)] · x̄.
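The precision-weighted formula can be sketched directly. A minimal example (prior and data values made up; σ2² assumed known, as in the exercise):

```python
def normal_normal_posterior(m, s1_sq, xs, s2_sq):
    """Posterior mean/variance for a Normal(m, s1_sq) prior and Normal(theta, s2_sq) data."""
    T = len(xs)
    xbar = sum(xs) / T
    prec = 1 / s1_sq + T / s2_sq                   # posterior precision
    mean = (m / s1_sq + T * xbar / s2_sq) / prec   # precision-weighted average
    return mean, 1 / prec

m, s1_sq = 0.0, 4.0          # fairly diffuse prior
xs = [1.8, 2.4, 2.1, 1.9]    # data; sigma2^2 = 1 assumed known
mean, var = normal_normal_posterior(m, s1_sq, xs, s2_sq=1.0)

# The posterior mean sits between the prior mean 0 and the sample mean 2.05,
# closer to the data because the data precision (4/1) exceeds the prior precision (1/4):
xbar = sum(xs) / len(xs)
assert 0.0 < mean < xbar
assert var < s1_sq               # observing data always shrinks the variance
```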
Chebyshev's Inequality

Chebyshev's inequality states that for any random variable X with mean µ and variance σ², the following probability inequality holds for all ε > 0:

Pr(|X − µ| > ε) ≤ σ²/ε².

Note that this applies to all distributions, hence also non-symmetric ones! This implies that:

Pr(X − µ > ε) ≤ σ²/ε²   and   Pr(X − µ < −ε) ≤ σ²/ε².

Interesting example: set ε = k·σ; then:

Pr(|X − µ| > k·σ) ≤ 1/k².

This provides us with an upper bound on the probability that X deviates more than k standard deviations from its mean.
Application: Chebyshev's Inequality

The distribution of fire insurance claims does not have a special distribution. We do know that the mean claim size in the portfolio is $50 million with a standard deviation of $150 million.

Question: What is an upper bound for the probability that the claim size is larger than $500 million?

Solution: We have Pr(X − µ > k·σ) ≤ 1/k². Here:

Pr(X − 50 > k·150), with k·150 = 450 ⇒ k = 3, so Pr(X − µ > k·σ) ≤ 1/k² = 1/9.

Thus, Pr(X > 500) ≤ 1/9.
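The bound can be checked by simulation on a strongly non-symmetric distribution. A minimal sketch, assuming an Exponential(1) example (mean 1, standard deviation 1); the empirical tail frequency should always respect the bound, usually by a wide margin:

```python
import random

random.seed(42)
mu, sigma = 1.0, 1.0                    # Exponential(1): mean 1, sd 1, very skewed
n = 100000
xs = [random.expovariate(1.0) for _ in range(n)]

for k in (1.5, 2, 3):
    freq = sum(abs(x - mu) > k * sigma for x in xs) / n
    bound = 1 / k**2                    # Chebyshev upper bound
    assert freq <= bound
```

Note how loose the bound is here: for k = 3 the true tail probability is about e^{-4} ≈ 0.018, far below 1/9.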
Convergence concepts

Suppose X1, X2, ... form a sequence of r.v.'s. Xn is said to converge almost surely (a.s.) to the random variable X as n → ∞ if and only if:

Pr(ω : Xn(ω) → X(ω), as n → ∞) = 1,

and we write Xn →^{a.s.} X, as n → ∞. Sometimes called strong convergence. It means that beyond some point in the sequence (ω), the difference will always be less than some positive ε, but that point is random.

Example: Xi is the sample variance using the first i observations.

OPTIONAL: also expressed as: Pr(|Xn(ω) − X(ω)| > ε i.o.) = 0, where i.o. stands for infinitely often: Pr(An i.o.) = Pr(lim sup_n An).

Applications: Law of large numbers, Monte Carlo integration.
Xn converges in probability to the random variable X as n → ∞ if and only if, for every ε > 0:

Pr(|Xn − X| > ε) → 0, as n → ∞,

and we write Xn →^p X, as n → ∞.

Difference between convergence in probability and convergence almost surely: Pr(|Xn − X| > ε) goes to zero instead of equalling zero as n goes to infinity (hence →^p is weaker than →^{a.s.}).
Xn converges in distribution to the random variable X as n → ∞ if and only if, for every x:

F_{Xn}(x) → F_X(x), as n → ∞,

and we write Xn →^d X, as n → ∞. Sometimes called weak convergence. Convergence of MGFs implies weak convergence.

Applications (see later in the lecture):
- Central Limit Theorem;
- Xn ∼ Bin(n, p) and X ∼ N(n·p, n·p·(1−p));
- Xn ∼ Poi(λn), with λn → ∞, and X ∼ N(λn, λn).
The Law of Large Numbers

Suppose X1, X2, ..., Xn are independent random variables with common mean E[Xk] = µ and common variance Var(Xk) = σ², for k = 1, 2, ..., n. Define the sequence of sample means as:

X̄n = (1/n) Σ_{k=1}^n Xk.

Then, according to the law of large numbers, for any ε > 0 we have:

lim_{n→∞} Pr(|X̄n − µ| > ε) ≤ lim_{n→∞} σ²/(n·ε²) = 0.

Proof, special case Xk ∼ N(µ, σ²): X̄n − µ ∼ N(0, σ²/n), thus when n → ∞ we have lim σ²/n = 0.

General case: when the second moment exists, use Chebyshev's inequality with σ²/n → 0 as n → ∞.
The law of large numbers (LLN) is sometimes written as:

Pr(|X̄n − µ| > ε) → 0, as n → ∞.

The result above is sometimes called the (weak) law of large numbers, and sometimes we write X̄n →^p µ as n → ∞, because this is the same concept as convergence in probability to a constant.

However, there is also what we call the (strong) law of large numbers, which simply states that the sample mean converges almost surely to µ:

X̄n →^{a.s.} µ, as n → ∞.

Important result in Probability and Statistics! Intuitively, the law of large numbers states that the sample mean X̄n converges to the true value µ. How accurate the estimate is will depend on:
I) how large the sample size is;
II) the variance σ².
Application of LLN: Monte Carlo Integration

Suppose we wish to calculate

I(g) = ∫_0^1 g(x) dx,

where elementary techniques of integration will not work. Using the Monte Carlo method, we generate U[0, 1] variables, say X1, X2, ..., Xn, and compute:

Î_n(g) = (1/n) Σ_{k=1}^n g(Xk),

where Î_n(g) denotes the approximation of I(g). We have:

Î_n(g) →^{a.s.} I(g), as n → ∞.

Proof: next slide.
Proof: using the law of large numbers, we have Î_n(g) = (1/n) Σ_{k=1}^n g(Xk) →^{a.s.} E[g(X)], which is:

E[g(X)] = ∫_0^1 g(x) · 1 dx = ∫_0^1 g(x) dx = I(g).

This method is called Monte Carlo integration. Try this in Excel using the integral of the standard normal density. How good is your approximation for 100 (1,000; 10,000; 100,000; 1,000,000) random numbers?
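The same experiment suggested for Excel can be sketched in a few lines of Python. The target is ∫_0^1 φ(x) dx = Φ(1) − Φ(0), which has a known value via the error function, so the Monte Carlo error can be observed directly as n grows:

```python
import math
import random

def mc_integrate(g, n, seed=0):
    """Monte Carlo approximation of the integral of g over [0, 1]."""
    rng = random.Random(seed)
    return sum(g(rng.random()) for _ in range(n)) / n

phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # standard normal density

exact = 0.5 * math.erf(1 / math.sqrt(2))   # Phi(1) - Phi(0) ~ 0.3413
for n in (100, 10000, 1000000):
    approx = mc_integrate(phi, n)
    print(n, approx, abs(approx - exact))  # error shrinks roughly like 1/sqrt(n)
```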
Application of LLN: Pooling of Risks in Insurance

Individuals may be faced with large and unpredictable losses. Insurance may help reduce the financial consequences of such losses by pooling individual risks.

If X1, X2, ..., Xn are the amounts of losses faced by n different individuals, independent but homogeneous enough to have a common distribution, and if these individuals pool together and each agrees to pay:

X̄n = (1/n) · Σ_{k=1}^n Xk,

then, as the size of the group increases, this amount will become closer to µ, the average loss each individual expects. This is based on the LLN. In effect, the LLN tells us that the amount each person will end up paying becomes more predictable as the size of the group increases.
Overview: Limit theorems & parameter estimators

Parameter estimation: definition of an estimator
Estimator I: the method of moments — the method; example & exercise
Estimator II: maximum likelihood estimator — maximum likelihood estimation; example & exercise; sampling distribution and the bootstrap
Estimator III: Bayesian estimator — introduction; Bayesian estimation; example & exercise
Convergence of series — Chebyshev's inequality; convergence concepts; application of strong convergence: Law of Large Numbers; application of weak convergence: Central Limit Theorem; application of convergence in distribution: Normal approximation to the Binomial and to the Poisson
Summary
Central Limit Theorem

Suppose X₁, X₂, ..., X_n are independent, identically distributed random variables with finite mean µ and finite variance σ². As before, denote the sample mean by X̄_n. Then the central limit theorem states:

(X̄_n − µ)/(σ/√n) →^d N(0, 1), as n → ∞.

This holds for all r.v. with finite mean and variance, not only normal r.v.!

Proof & rewriting of the CLT: see next slides.
Rewriting the Central Limit Theorem

We can write this result as:

lim_{n→∞} Pr( (X̄_n − µ)/(σ/√n) ≤ x ) = Φ(x), for all x,

where Φ(·) denotes the cdf of a standard normal r.v. Intuitively, for large n, the random variable

Z_n = (X̄_n − µ)/(σ/√n)

is approximately standard normally distributed. The Central Limit Theorem is usually expressed in terms of the standardized sums S_n = Σ_{k=1}^n X_k. Then the CLT applies to the random variable:

Z_n = (S_n − n·µ)/(√n·σ) →^d N(0, 1), as n → ∞.
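A sketch of the CLT in action (Python, not part of the slides; Exponential(1) data with µ = σ = 1 is chosen as a deliberately skewed example): the empirical distribution of Z_n is already close to standard normal for n = 50.

```python
import math
import random

random.seed(3)

def standardized_mean(n):
    """Z_n = (X_bar_n - mu) / (sigma / sqrt(n)) for Exponential(1) data,
    where mu = sigma = 1."""
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    return (xbar - 1.0) / (1.0 / math.sqrt(n))

trials = 5000
zs = [standardized_mean(50) for _ in range(trials)]

# empirical Pr(Z_n <= 1) should be close to Phi(1) ~ 0.8413
frac_below_1 = sum(z <= 1.0 for z in zs) / trials
print(frac_below_1)
```

Even though each X_i is strongly right-skewed, the standardized mean of only 50 observations already matches the normal probability Φ(1) to within simulation error.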
Proof of the Central Limit Theorem

Let X₁, X₂, ... be a sequence of independent r.v.'s with mean µ and variance σ², and denote S_n = Σ_{i=1}^n X_i. We prove that

Z_n = (S_n − n·µ)/(σ·√n)

converges to the standard normal distribution.

General procedure to prove X_n →^d X:
1. Find the m.g.f. of X_n: M_{X_n}(t).
2. Take the limit n → ∞ of the m.g.f. of X_n: lim_{n→∞} M_{X_n}(t), and rewrite it.
3. Find the m.g.f. of X: M_X(t). This should equal the limit found in step 2.

Proof: Consider the case with µ = 0, assuming the MGF of X exists. Step 1: if Z is a standard normal r.v., then M_Z(t) = exp(t²/2).

Note: expansions for log and exp are useful (see F&T page 2)!
2. Recall S_n = Σ_{i=1}^n X_i, with i.i.d. random variables X_i, E[X_i] = µ and Var(X_i) = σ² < ∞; thus M_{Σ_{i=1}^n X_i}(t) = (M_{X_i}(t))^n. Hence, the m.g.f. of Z_n = S_n/(σ·√n) is obtained by:

M_{Z_n}(t) =* M_{S_n}(t/(σ·√n)) =** (M_{X_i}(t/(σ·√n)))^n

* using M_{a·X}(t) = M_X(a·t);
** using that S_n is the sum of n i.i.d. random variables X_i.

Note that we only assumed that M_{X_i}(t) = f(t, σ²) — this works for any distribution of X_i with mean µ and finite variance!
Recall from week 1:
1) An m.g.f. uniquely defines a distribution.
2) The m.g.f. is a function of all moments.

Note: lim_{n→∞} b·n^{−c} = 0, for b ∈ R and c > 0.

Consider the Taylor series around zero for any M(t):

M(t) = Σ_{i=0}^∞ (t^i/i!) · M^{(i)}(t)|_{t=0}   (the i-th derivative at zero is the i-th moment)
     = M(0) + t · M^{(1)}(t)|_{t=0} + (1/2) · t² · M^{(2)}(t)|_{t=0} + O(t³),

where O(t³) covers all terms c_k · t^k, with c_k ∈ R for k ≥ 3.

We have M(0) = E[e^{0·X}] = 1 and, because we assumed that E[X_i] = 0:

M^{(1)}_{X_i}(t)|_{t=0} = E[X_i] = 0, and
M^{(2)}_{X_i}(t)|_{t=0} = E[X_i²] = Var(X_i) + (E[X_i])² = σ².

Proof continues on next slide.
3. Now we can align the results from the previous two slides:

lim_{n→∞} M_{Z_n}(t) = lim_{n→∞} (M_{X_i}(t/(σ·√n)))^n
  = lim_{n→∞} ( Σ_{i=0}^∞ ((t/(σ·√n))^i / i!) · M^{(i)}_{X_i}(t)|_{t=0} )^n
  = lim_{n→∞} ( 1 + 0 + (1/2) · (t/(σ·√n))² · σ² + O((1/√n)³) )^n

⇒ lim_{n→∞} log(M_{Z_n}(t)) = lim_{n→∞} n · log( 1 + (1/2) · (t/(σ·√n))² · σ² + O((1/n)^{3/2}) )
  =* lim_{n→∞} n · ( t²/(2n) + O((1/n)^{3/2}) ) = t²/2,

since n · O((1/n)^{3/2}) = O((1/n)^{1/2}) → 0, if n → ∞.

* using log(1 + a) = Σ_{i=1}^∞ (−1)^{i+1} · a^i / i = a + O(a²), with a = t²/(2n) + O((1/n)^{3/2}).

Hence lim_{n→∞} M_{Z_n}(t) = exp(t²/2) = M_Z(t), the m.g.f. of a standard normal random variable, which completes the proof.
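The limit in step 3 can be checked numerically. As an illustration (not part of the proof), take X_i ∼ U(−1, 1), whose m.g.f. is sinh(t)/t and whose variance is 1/3; the n-th power of the rescaled m.g.f. approaches exp(t²/2):

```python
import math

def mgf_uniform(t):
    # m.g.f. of U(-1, 1): sinh(t)/t, with value 1 at t = 0
    return math.sinh(t) / t if t != 0 else 1.0

sigma = math.sqrt(1 / 3)  # Var of U(-1, 1) is 1/3
t = 1.2

# (M_X(t / (sigma * sqrt(n))))^n for increasing n
for n in (10, 1000, 100000):
    print(n, mgf_uniform(t / (sigma * math.sqrt(n))) ** n)

limit = math.exp(t * t / 2)  # exp(t^2/2) = M_Z(t), the standard normal m.g.f.
approx = mgf_uniform(t / (sigma * math.sqrt(100000))) ** 100000
```

The printed values settle on exp(0.72) ≈ 2.054, as the proof predicts, even though the uniform distribution is far from normal.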
Application of the CLT: An insurer offers builder's risk insurance. It has 400 contracts yearly and has offered the product for 9 years. The sample mean of a claim is $10 million and the sample standard deviation is $25 million.

Question: What is the probability that in a year the total claim size is larger than $5 billion?

Solution: Using the CLT (why is σ ≈ the sample s.d.?):

(X̄_n − µ)/(σ/√n) →^d N(0, 1), as n → ∞
⇒ X̄_n ∼ N(µ, σ²/n)  ⇒ n·X̄_n ∼ N(n·µ, n·σ²).

Thus:

0.9772 = Pr(400·X̄_400 ≤ 400 · $10 million + 2 · 20 · $25 million)
⇒ Pr(400·X̄_400 > $5 billion) = 1 − 0.9772 = 0.0228.
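The arithmetic of this example can be verified with a few lines of Python (norm_cdf is built from the error function, so no external libraries are assumed):

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n = 400
mu = 10e6       # sample mean claim, $10 million
sigma = 25e6    # sample s.d., $25 million, used in place of sigma

total_mean = n * mu               # 400 * $10m = $4 billion
total_sd = math.sqrt(n) * sigma   # 20 * $25m  = $500 million
z = (5e9 - total_mean) / total_sd # ($5bn - $4bn) / $500m = 2
p = 1 - norm_cdf(z)
print(z, p)   # z = 2.0, p about 0.0228
```

The $5 billion threshold sits exactly two standard deviations above the expected total, giving the tail probability 1 − Φ(2) ≈ 0.0228.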
Normal Approximation to the Binomial

From week 2 we know: a Binomial random variable is the sum of Bernoulli random variables. Let X_k ∼ Bernoulli(p). Then:

S = X₁ + X₂ + ... + X_n

has a Binomial(n, p) distribution. Applying the Central Limit Theorem, S must be approximately normal with mean E[S] = n·p and variance Var(S) = n·p·q, so that approximately, for large n, we have:

(S − n·p)/√(n·p·q) ∼ N(0, 1).

Question: What is the probability that X = 60 if X ∼ Bin(1000, 0.06)? Not in the Binomial tables!
In practice, for large n and for p around 0.5 (in particular np > 5 and np(1 − p) > 5, or n > 30), we can approximate the binomial probabilities with the Normal distribution. Continuity correction for the binomial: note that a Binomial random variable X takes integer values k = 0, 1, 2, ..., but the Normal distribution is continuous, so for the value Pr(X = k) we use the Normal approximation:

Pr( (k − 1/2 − µ)/σ < Z < (k + 1/2 − µ)/σ ),

and similarly for the probability Pr(X ≤ k). Use µ = n·p and σ² = n·p·(1 − p).
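The Bin(1000, 0.06) question from the previous slide can now be answered with the continuity correction. The sketch below compares the corrected Normal approximation of Pr(X = 60) with the exact binomial probability (computed via log-gamma to avoid overflow in the binomial coefficient):

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p, k = 1000, 0.06, 60
mu = n * p                          # 60
sigma = math.sqrt(n * p * (1 - p))  # sqrt(56.4)

# continuity-corrected normal approximation to Pr(X = 60)
approx = norm_cdf((k + 0.5 - mu) / sigma) - norm_cdf((k - 0.5 - mu) / sigma)

# exact binomial pmf, via log-gamma to avoid overflow
log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
           + k * math.log(p) + (n - k) * math.log(1 - p))
exact = math.exp(log_pmf)
print(approx, exact)
```

Both values come out near 0.053, so the Normal approximation with continuity correction recovers a probability that would be impractical to read off binomial tables.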
[Figure: Normal approximation to the Binomial — probability mass functions of Binomial(n, 0.1) for n = 5, 10, 30 and 200, each overlaid with the p.d.f. of the approximating Normal: N(0.5, 0.45), N(1, 0.9), N(3, 2.7) and N(20, 18), respectively.]
Normal Approximation to the Poisson

Approximation of the Poisson by the Normal for large values of λ. Let X_n be a sequence of Poisson random variables with increasing parameters λ₁, λ₂, ..., such that λ_n → ∞. We have:

E[X_n] = λ_n,  Var(X_n) = λ_n.

Standardize the random variable (i.e., subtract the mean and divide by the standard deviation):

Z_n = (X_n − E[X_n])/√Var(X_n) = (X_n − λ_n)/√λ_n →^d Z ∼ N(0, 1).

Proof: see next slides.
1. We have the m.g.f. of Z: M_Z(t) = exp(t²/2). We know (week 2): M_{X_n}(t) = exp(λ_n·(e^t − 1)).

2. Next, we need to find the m.g.f. of Z_n. Using the calculation rules for m.g.f., we have:

M_{Z_n}(t) = M_{(X_n − λ_n)/√λ_n}(t)
  =* exp(−√λ_n · t) · M_{X_n}(t/√λ_n)
  = exp(−√λ_n · t) · exp(λ_n·(e^{t/√λ_n} − 1))
  = exp(−√λ_n · t + λ_n·(e^{t/√λ_n} − 1))

* using M_{a·X+b}(t) = exp(b·t) · M_X(a·t).
3. Find the limit of M_{Z_n}(t) and prove that it equals M_Z(t):

lim_{n→∞} M_{Z_n}(t) = lim_{n→∞} exp(−√λ_n · t + λ_n·(e^{t/√λ_n} − 1))
⇒ lim_{n→∞} log(M_{Z_n}(t)) = lim_{n→∞} ( −t·√λ_n + λ_n·(e^{t/√λ_n} − 1) )
  =* lim_{n→∞} ( −t·√λ_n + λ_n·( 1 + t/√λ_n + (1/2!)·t²/λ_n + (1/3!)·t³/λ_n^{3/2} + ... − 1 ) )
  = lim_{n→∞} ( t²/2! + O(1/√λ_n) ) = t²/2
⇒ lim_{n→∞} M_{Z_n}(t) = exp(t²/2) = M_Z(t).

* using the exponential expansion e^a = Σ_{i=0}^∞ a^i/i!, with a = t/√λ_n.
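The limit in step 3 can also be checked numerically (an illustration, not part of the proof): evaluating the closed-form M_{Z_n}(t) from step 2 for increasing λ shows convergence to exp(t²/2).

```python
import math

def mgf_zn(t, lam):
    """M_{Z_n}(t) = exp(-sqrt(lam)*t + lam*(e^{t/sqrt(lam)} - 1)),
    the m.g.f. of the standardized Poisson(lam) variable."""
    s = math.sqrt(lam)
    return math.exp(-s * t + lam * (math.exp(t / s) - 1))

t = 0.8
target = math.exp(t * t / 2)   # M_Z(t) for a standard normal Z
for lam in (1, 100, 10000, 1000000):
    print(lam, mgf_zn(t, lam))
approx = mgf_zn(t, 1000000)
```

For λ = 1 the m.g.f. is visibly off the normal value exp(0.32) ≈ 1.377, but by λ = 10⁶ the two agree to several decimal places.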
[Figure: Normal approximation to the Poisson — probability mass functions of Poisson(λ) for λ = 0.1, 1, 10 and 100, each overlaid with the p.d.f. of the approximating Normal: N(0.1, 0.1), N(1, 1), N(10, 10) and N(100, 100), respectively.]
Summary: Parameter estimators

Method of moments:
1. Equate (the first) k sample moments to the corresponding k population moments.
2. Equate the k population moments to the parameters of the distribution.
3. Solve the resulting system of simultaneous equations.

Maximum likelihood:
1. Determine the likelihood function L(θ₁, θ₂, ..., θ_k; x).
2. Determine the log-likelihood function ℓ(θ₁, θ₂, ..., θ_k; x) = log(L(θ₁, θ₂, ..., θ_k; x)).
3. Equate the derivatives of ℓ(θ₁, θ₂, ..., θ_k; x) w.r.t. θ₁, θ₂, ..., θ_k to zero (⇒ global/local minimum/maximum).
4. Check whether the second derivative is negative (maximum) and check the boundary conditions.

Bayesian:
1. Find the posterior density using method (1) (a difficult/tedious integral!) or method (2).
2. Compute the Bayesian estimator under a given loss function.
Summary: LLN & CLT

Law of large numbers: Let X₁, ..., X_n be independent random variables with equal mean E[X_k] = µ and variance Var(X_k) = σ² for k = 1, ..., n. Then, for all ε > 0, we have:

Pr(|X̄_n − µ| > ε) → 0, as n → ∞.

Central limit theorem: Let X₁, ..., X_n be independent and identically distributed random variables with mean E[X_k] = µ and variance Var(X_k) = σ² for k = 1, ..., n. Then:

(X̄_n − µ)/(σ/√n) →^d N(0, 1), as n → ∞.