
Advanced Econometrics I

Ingo Steinke, Anne Leucht, Enno Mammen

University of Mannheim

Fall 2014



Organisation

Important dates
Start: 2014-10-07
End: 2014-12-05
Lectures:
Tuesday 10:15 - 11:45 in L 7, 3-5 - 001
Thursday 10:15 - 11:45 in L 7, 3-5 - 001
Exercises:
Thursday 13:45 - 15:15 in L 9, 1-2 003
Thursday 15:30 - 17:00 in L 9, 1-2 003
Teaching assistant: Maria Marchenko
Slides will be provided via Ilias, usually on Friday for the next week.



Exercise sheets
will be provided via Ilias and usually published on Tuesday (or
Wednesday).
Hand in written solutions in the lecture on Tuesday (you may work in
pairs).
Discussion of the solutions on Thursday.
There is 1 point per exercise (graded in steps 0.25, 0.5, 0.75, 1).
You need 75% of the points of the exercise sheets to get (at most)
20% of the exam points.
There will be starred exercises which can be used to make up for
missing points of one exercise sheet.



Exam
written exam, 180 min
Date: 2014-12-17

Contact
Office: L7, 3 - 5, room 142
Phone: 1940
E-Mail: isteinke@rumms.uni-mannheim.de
Office hour: on appointment



Contents

Overview:
1 Probability theory
2 Asymptotic theory
3 Conditional expectations
4 Linear regression



Literature
Ash, R. B. and Doléans-Dade, C. (1999). Probability & Measure
Theory. Academic Press.
Billingsley, P. (1994). Probability and Measure. Wiley.
Hayashi, F. (2009). Econometrics. Princeton University Press.
Jacod, J. and Protter, P. (2000). Probability Essentials. Springer.
Van der Vaart, A. W. and Wellner, J. A. (2000). Weak Convergence
and Empirical Processes. With Applications to Statistics. New York:
Springer.
Wooldridge, J. M. (2004). Introductory Econometrics: A Modern
Approach. Thomson/Southwestern.



Introduction
Motivation

Application in Statistics ...


Study relationships between variables, e.g.
consumption and income
−→ How does raising income affect consumption behaviour?
evaluation of effectiveness of job market training (treatment effects)
...
Econometrics (Wooldridge (2004)):
development of statistical methods for estimating economic
relationships
testing economic theories
evaluation of government and business policies



Classical model in econometrics: linear regression

Figure: http://en.wikipedia.org/wiki/File:Linear_regression.svg

Y = β0 + β1 X + u,

e.g. Y consumption, X wage, u error term. Typically the data are not
generated by experiments; the error term “collects all other effects on
consumption besides wage”.
→ variables are somehow “random”
→ How do we formalize randomness?
Aims of this course:
(1) Provide basic probabilistic framework and statistical tools for
econometric theory.
(2) Application of these tools to the classical multiple linear regression
model.
→ Application of these results to economic problems in Advanced
Econometrics II/III and follow-up elective courses.


Chapter 1: Elementary probability theory


Overview

1 Probability measures
2 Probability measures on R
3 Random variables
4 Expectation
5 Variance and covariance


1.1 Probability measures

Aim: Formal description of “probability measures”


Setup:
The set Ω ≠ ∅ of the possible outcomes of a random experiment is
called sample space, e.g. Ω = N = {1, 2, . . . }.
A ⊆ Ω is called an event, e.g. A = {2, 4, 6, 8, . . . }
outcome: ω ∈ Ω
−→ Want to assign a “probability” P(A) to event A
Consider first the case that Ω is a countable set, i.e.

Ω = {ω1 , ω2 , ω3 , · · · }

(e.g. Ω = N, Ω = Z).


∅ = { } denotes the empty set,


P(Ω) = {A : A ⊆ Ω} the power set.

Definition 1.1
A probability measure P on a countable set Ω is a set function that
maps subsets of Ω to [0, 1], i.e. P : P(Ω) → [0, 1], and has the following
properties:
(i) P(Ω) = 1.
(ii) It holds

P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i)

for any A_i ⊆ Ω, i ∈ N, that are pairwise disjoint, i.e. A_i ∩ A_j = ∅ for
i ≠ j.


Recap: Index notation


Let I ≠ ∅ be some set and A_i ⊆ Ω for all i ∈ I. Then

x ∈ ⋃_{i∈I} A_i ⟺ ∃ j ∈ I : x ∈ A_j.

If I = {1, . . . , n} and J = N, then

⋃_{i∈I} A_i = A_1 ∪ · · · ∪ A_n = ⋃_{i=1}^n A_i,
⋃_{j∈J} A_j = A_1 ∪ · · · ∪ A_n ∪ · · · = ⋃_{j=1}^∞ A_j.

Especially, for I = A and A_x = {x},

A = ⋃_{x∈A} {x}.


Recap: Series

A set A ⊆ Ω is countable iff there is a set N ⊆ N and a bijection
(one-to-one map) m : N → A. Then A can be written A = {a_1, a_2, . . . , a_n}
or A = {a_1, a_2, . . . , a_n, . . . }.
A series is an infinite sum, defined by

s = Σ_{k=1}^∞ a_k = lim_{n→∞} Σ_{k=1}^n a_k

if the limit exists. The series s is absolutely convergent if

Σ_{k=1}^∞ |a_k| < ∞.


The series s is unconditionally well-defined if for any
{k_1, k_2, k_3, . . . } = N we have s = Σ_{i=1}^∞ a_{k_i}, i.e. a rearrangement of its
members does not change the (infinite) sum, which might be ∞ or −∞.
Note:
If a series is absolutely convergent, then it is unconditionally
convergent.
A series is unconditionally well-defined iff the (infinite) sum of all its
positive members or the sum of all its negative members is finite.
Let I be countable and a_i ∈ R for any i ∈ I. If I = {i_1, i_2, . . . }, then

Σ_{i∈I} a_i := Σ_{j=1}^∞ a_{i_j},

if the right-hand series is unconditionally convergent.

Lemma 1.2
Let Ω = {ω_i}_{i∈I} (with countable I) be a countable sample space and P
a probability measure on Ω. Then for every A ⊆ Ω it holds that

P(A) = Σ_{ω∈A} P({ω}).

Proof: Exercise.
Let ω ∈ Ω. An event {ω} that only contains one element is also called an
elementary event.
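
To make Lemma 1.2 concrete, here is a minimal Python sketch for a fair die (an illustration added here, not the exercise solution):

```python
# Fair die: Omega = {1,...,6}; every elementary event has probability 1/6.
p = {omega: 1 / 6 for omega in range(1, 7)}

# Lemma 1.2: P(A) equals the sum of the elementary-event probabilities in A.
A = {2, 4, 6}
print(sum(p[omega] for omega in A))  # 0.5
```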


Arbitrary sample spaces

It is often impossible to define P “appropriately” for all subsets A ⊆ Ω


such that Definition 1.1 holds true; see e.g. Billingsley (1994).
Definition 1.3 A family A of subsets of Ω with
(i) ∅ ∈ A,
(ii) if A ∈ A, then A^C = Ω\A ∈ A,
(iii) if A_1, A_2, · · · ∈ A, then ⋃_{i=1}^∞ A_i ∈ A,
is called a σ-field or σ-algebra.
For a σ-field A on Ω it holds that A ⊆ P(Ω).


A σ-field A is called the smallest σ-field containing B ⊆ P(Ω) if for any
σ-field C on Ω it holds: if B ⊆ C, then A ⊆ C.
Notation: A = σ(B).

Example 1.4 Let Ω ≠ ∅ be a set.

{Ω, ∅} is the smallest σ-field on Ω and is called the trivial σ-field.
The power set P(Ω) is the largest σ-field on Ω.
If ∅ ≠ B ⊂ Ω, the family {Ω, ∅, B, B^C} is the smallest σ-field on Ω
that contains B.
Suppose that A is a σ-field on a set Ω. Then the tuple (Ω, A) is called a
measurable space.


Definition 1.5
A set function P : A → [0, ∞) is a measure on (Ω, A) if for A_1, A_2, . . .
pairwise disjoint ∈ A it holds that

P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i)    (σ-additivity)

If, in addition, P(Ω) = 1, then it is called a probability measure.

The triple (Ω, A, P) is then called a probability space.
Example 1.6 Let (Ω, A) be a measurable space and ω_0 ∈ Ω.
Then ν(A) = |A|, A ∈ A, is the so-called counting measure.
The Dirac measure δ_{ω_0} is defined by

δ_{ω_0}(A) := 1_A(ω_0), A ∈ A.


Theorem 1.7 (Properties of probability measures)

Suppose that (Ω, A, P) is a probability space. Let A, B, A_1, A_2, · · · ∈ A.
Then:
(i) P(∅) = 0.
(ii) Finite additivity: A_1, . . . , A_n pairwise disjoint imply

P(⋃_{i=1}^n A_i) = Σ_{i=1}^n P(A_i).

(iii) P(A^C) = 1 − P(A).
(iv) P(A) ≤ 1 for all A ∈ A.
(v) Subtractivity: A ⊆ B implies P(B\A) = P(B) − P(A).


(vi) Monotonicity: A ⊆ B implies P(A) ≤ P(B).
(vii) P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
(viii) Continuity from below: A_n ⊆ A_{n+1} for all n ∈ N implies
P(A_n) −→ P(⋃_{k=1}^∞ A_k) as n → ∞.
(ix) Continuity from above: A_{n+1} ⊆ A_n for all n ∈ N implies
P(A_n) −→ P(⋂_{k=1}^∞ A_k) as n → ∞.
(x) Sub-σ-additivity: P(⋃_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ P(A_n).

Proof: Exercise.
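
To indicate the flavour of these exercises, here is the standard one-line argument for (iii), written out in LaTeX (a sketch, not the official solution):

```latex
% A and A^C are disjoint with A \cup A^C = \Omega, so P(\Omega) = 1 and
% finite additivity (ii) give
1 = P(\Omega) = P(A \cup A^{C}) = P(A) + P(A^{C})
  \;\Longrightarrow\; P(A^{C}) = 1 - P(A).
```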


1.2 Probability measures on R


Definition 1.8 The smallest σ-field B that contains all open intervals
(a, b) (−∞ ≤ a ≤ b ≤ ∞) is called the Borel σ-field.
A set A ∈ B is called a Borel set.

Theorem 1.9 Put

A1 = {(a, b] : −∞ ≤ a < b < +∞},


A2 = {[a, b) : −∞ < a < b ≤ +∞},
A3 = {[a, b] : −∞ < a ≤ b < +∞},
A4 = {(−∞, b] : −∞ < b < +∞}.

Then it follows for j = 1, . . . , 4: B = σ(Aj ).

Proof: Exercise.

Definition 1.10
A class A* of subsets of Ω is a field if
(i) ∅ ∈ A*,
(ii) if A ∈ A*, then A^C ∈ A*,
(iii) if A_1, A_2 ∈ A*, then A_1 ∪ A_2 ∈ A*.

Suppose that A* is a field and define A as the smallest σ-field with
A* ⊆ A (notation: A = σ(A*)). Then a set function P* : A* → [0, ∞) s.t.
for A_1, A_2, . . . pairwise disjoint ∈ A* with ⋃_{i=1}^∞ A_i ∈ A* it holds that

P*(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P*(A_i)

is called a pre-measure.
If, in addition, P*(Ω) = 1, it is called a probability pre-measure.


Theorem 1.11 (Carathéodory) Let A∗ be a field, A = σ(A∗ ) and P ∗


a probability pre-measure on A∗ . Then there exists a unique
probability measure P on A with

P(A) = P ∗ (A) for A ∈ A∗ .

For a proof see Ash and Doléans-Dade (1999), Theorem 1.3.10.


Definition 1.12 For a probability measure P on (R, B) the function
F : R → [0, 1] given by

F (b) = P((−∞, b]) ∀b ∈ R

is called a (cumulative) distribution function (CDF).


Proposition 1.13 (Properties of the CDF) Suppose that F is the


distribution function of a probability measure P on (R, B). Then
(i) P((a, b]) = F (b) − F (a) for a < b,
(ii) F is non-decreasing (i.e. F (a) ≤ F (b) for a ≤ b),
(iii) F is continuous from the right (i.e. F (bn ) → F (b) for bn → b,
bn ≥ b (or for bn ↓ b)),
(iv) limx→−∞ F (x) = 0 and limx→+∞ F (x) = 1.
(v) F (b−) := limn→∞ F (bn ) = P((−∞, b)) for any bn ↑ b.
(vi) P({b}) = F (b) − F (b−) for all b ∈ R.

Define P by F on A1 : P((a, b]) = F (b) − F (a), a < b.


Can this function be uniquely extended to a set function on B?


Theorem 1.14 Consider a function F : R → R satisfying (ii) to (iv) of


Proposition 1.13. Then F is a distribution function (i.e. then there
exists a unique probability measure P on (R, B) with
F (b) = P((−∞, b]) for all b ∈ R).

Some ideas of the proof: First, define a set function P* : A_1 → [0, 1] as

P*((a, b]) = F(b) − F(a).

Extend this function as follows: P* : A* → [0, 1], where A* consists of the
empty set and all finite unions of sets of A_1 and their complements, and
for disjoint intervals

P*(⋃_{i=1}^n (a_i, b_i]) = Σ_{i=1}^n P*((a_i, b_i]) with notation (c, ∞] = (c, ∞).


Discrete probability measures

The function f : R → R is called a probability mass function (pmf) if
1 f(x) ≥ 0 for all x ∈ R and
2 it holds Σ_{x∈S_f} f(x) = 1 with S_f = {x ∈ R : f(x) > 0}.
Note that S_f must be countable if 2. holds. S_f is called the support of f.
Define

P(A) = Σ_{x∈S_f∩A} f(x).    (1)

Lemma 1.15 P, defined by (1), is a probability measure.

Then the CDF is given by F(x) = Σ_{a∈S_f∩(−∞,x]} f(a).


Definition 1.16 A probability measure P on the measurable space (R, B)
is discrete if there is an at most countable set A ⊂ R such that P(A) = 1.
By

S_P = {a ∈ R : P({a}) > 0}

we denote the support of a discrete probability measure P.

Lemma 1.17 P is discrete iff f : R → R, f(x) = P({x}), is a pmf.

Remark 1.18
1 SP ⊆ A is countable and P(SP ) = 1.
2 If P is a discrete probability measure with support SP then F has
jumps at a ∈ SP with jump heights P({a}).


Example 1.19
1 Binomial distribution

P({i}) = (n choose i) π^i (1 − π)^{n−i} for i = 0, 1, . . . , n,

P({i}) = 0 elsewhere. Parameters: 0 ≤ π ≤ 1, n ≥ 1.

2 Geometric distribution

P({i}) = (1 − π)^{i−1} π for i = 1, 2, 3, . . .

P({i}) = 0 elsewhere. Parameter: 0 ≤ π ≤ 1.

3 Poisson distribution

P({i}) = (λ^i / i!) e^{−λ}, i = 0, 1, 2, . . .

P({i}) = 0 elsewhere. Parameter: λ > 0.
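
A quick numerical sanity check that these three pmfs sum to 1 (a Python sketch; the parameter values and truncation points of the infinite sums are arbitrary):

```python
from math import comb, exp, factorial

n, pi_, lam = 10, 0.3, 2.5  # arbitrary parameter choices

# Binomial: finite sum over i = 0,...,n.
binom = sum(comb(n, i) * pi_**i * (1 - pi_)**(n - i) for i in range(n + 1))
# Geometric and Poisson: infinite sums, truncated where the terms are tiny.
geom = sum((1 - pi_)**(i - 1) * pi_ for i in range(1, 200))
poiss = sum(lam**i / factorial(i) * exp(-lam) for i in range(200))

print(binom, geom, poiss)  # each approximately 1.0
```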


Absolutely continuous probability measures

A (Riemann) integrable function f : R → R is called a probability density
function (pdf) if
1 f(x) ≥ 0 for all x and
2 ∫_{−∞}^∞ f(x) dx = 1.

In the following we assume that f is piecewise continuous, i.e. there is an
at most countable index set I and pairwise disjoint open intervals A_i ⊆ R
with ⋃_{i∈I} A_i = R such that
f(x) is continuous on A_i for all i ∈ I.

Lemma 1.20 Let f : R → R be a piecewise continuous pdf. Then
there exists a unique probability measure on (R, B) such that

P((a, b]) = ∫_a^b f(x) dx for all a < b.


The corresponding distribution P is called absolutely continuous.
Then the CDF is given by F(x) = ∫_{−∞}^x f(t) dt.
Note that F is continuous and

F′(x) = f(x),

if f is continuous at x.
A density is not unique, but almost unique.

Lemma 1.21 Let f, g be piecewise continuous pdfs such that

∫_a^b f(x) dx = ∫_a^b g(x) dx for all a < b.

Then {x : f(x) ≠ g(x)} is countable.


Example 1.22
1 Normal distribution

f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).

Parameters: µ ∈ R, σ > 0
2 Uniform distribution

f(x) = (1/(b − a)) 1_{[a,b]}(x)

Parameters: −∞ < a < b < ∞
3 Exponential distribution

f(x) = λ e^{−λx} 1_{[0,∞)}(x)

Parameter: λ > 0
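
A numerical sanity check for the exponential case (a Python sketch; the truncation at x = 50 is arbitrary but makes the neglected tail negligible):

```python
import numpy as np

lam = 1.5                              # arbitrary rate parameter
x = np.linspace(0.0, 50.0, 200_001)    # tail beyond 50 is negligible
f = lam * np.exp(-lam * x)             # pdf of Exp(lam)

dx = x[1] - x[0]
integral = np.sum((f[:-1] + f[1:]) * dx / 2)  # trapezoidal rule
print(integral)                        # approximately 1.0
```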


Extension to R^k

The Borel σ-field B^k is the σ-field generated by the open intervals
(a_1, b_1) × · · · × (a_k, b_k). As in the real-valued case, probability measures on
(R^k, B^k) are uniquely defined via the multivariate distribution function:

F(b_1, . . . , b_k) = P({(x_1, . . . , x_k) : x_1 ≤ b_1, . . . , x_k ≤ b_k}).

F is called absolutely continuous if

F(b_1, . . . , b_k) = ∫_{−∞}^{b_1} · · · ∫_{−∞}^{b_k} f(x_1, . . . , x_k) dx_k · · · dx_1

for all b_1, . . . , b_k ∈ R. Here, if f is continuous at (x_1, . . . , x_k), then

∂^k F / (∂x_1 · · · ∂x_k) (x_1, . . . , x_k) = f(x_1, . . . , x_k).


The function f : R^k → R is called a (multivariate) probability mass
function if
1 f(x) ≥ 0 for all x ∈ R^k and
2 it holds Σ_{x∈S_f} f(x) = 1 with S_f = {x ∈ R^k : f(x) > 0}.

Define the discrete probability measure on (R^k, B^k) by

P(A) = Σ_{x∈S_f∩A} f(x),

cf. (1).


A (Riemann) integrable function f : R^k → R is called a (multivariate)
probability density function if
1 f(x) ≥ 0 for all x and
2 it holds

∫_{R^k} f(x) dx := ∫_{−∞}^∞ · · · ∫_{−∞}^∞ f(x_1, . . . , x_k) dx_k · · · dx_1 = 1.

Then by

P((a_1, b_1] × · · · × (a_k, b_k]) = ∫_{a_1}^{b_1} · · · ∫_{a_k}^{b_k} f(x_1, . . . , x_k) dx_k · · · dx_1

a probability measure can be introduced on (R^k, B^k) which is called
absolutely continuous.


Notation: For x = (x_1, . . . , x_k)′, y = (y_1, . . . , y_k)′ ∈ R^k we write

x = y iff x_i = y_i for all i = 1, . . . , k,
x ≤ y iff x_i ≤ y_i for all i = 1, . . . , k,
x < y iff x_i < y_i for all i = 1, . . . , k,
(x, y] = {z ∈ R^k : x < z and z ≤ y} ⊂ R^k,
(x, y) = {z ∈ R^k : x < z and z < y} ⊂ R^k etc.
Let

A_0^{(k)} = {(a, b) : a, b ∈ R^k and a < b}.

Then

B^k := σ(A_0^{(k)})

is the Borel σ-field on R^k and its members are called Borel sets.


1.3 Random variables

Definition 1.23 Let (Ω_1, A_1) and (Ω_2, A_2) be measurable spaces.
A function g : Ω_1 → Ω_2 is called measurable (or A_1–A_2-measurable) if

g^{−1}(B) = {ω ∈ Ω_1 : g(ω) ∈ B} ∈ A_1 ∀ B ∈ A_2.    (2)

Notation: g : (Ω_1, A_1) → (Ω_2, A_2).

Remark 1.24
1 1_A : Ω → R is A–B-measurable if A ∈ A.
2 If g : R^m → R^k is continuous, then g is B^m–B^k-measurable.
3 If f : (Ω_1, A_1) → (Ω_2, A_2) and g : (Ω_2, A_2) → (Ω_3, A_3), then
h : Ω_1 → Ω_3, defined by h(ω_1) = g(f(ω_1)), is A_1–A_3-measurable.


Definition 1.25 An R^k-valued random variable (r.v.) is a function
X : Ω → R^k, where (Ω, A) is a measurable space and X fulfills:

X^{−1}(B) = {ω ∈ Ω : X(ω) ∈ B} ∈ A ∀ B ∈ B^k,    (3)

i.e. X is A–B^k-measurable.

Notation: for B ∈ B^k,

P(X ∈ B) := P(X^{−1}(B)) = P({ω ∈ Ω : X(ω) ∈ B}),
P(X = x) := P(X^{−1}({x})) = P({ω ∈ Ω : X(ω) = x}).

(3) guarantees that X^{−1}(B) ∈ A, i.e. P(X ∈ B) is well-defined.


Definition 1.26 Suppose that X is an R^k-valued random variable on a
probability space (Ω, A, P). Then

P^X(B) := P(X ∈ B) = P(X^{−1}(B)), B ∈ B^k,    (4)

is called the distribution of X.

Lemma 1.27 P^X is a probability measure on (R^k, B^k).

Notation: Let X_1, . . . , X_l be r.v. on a probability space (Ω, A, P).

P(X_1 ∈ A_1, . . . , X_l ∈ A_l) := P(X_1^{−1}(A_1) ∩ · · · ∩ X_l^{−1}(A_l)).


Definition 1.28
(i) Random variables X_1, . . . , X_l on a probability space (Ω, A, P) are
independent if

P(X_1 ∈ A_1, . . . , X_l ∈ A_l) = P(X_1 ∈ A_1) · · · · · P(X_l ∈ A_l)

for all Borel sets A_1, . . . , A_l.

(ii) Suppose that (X_t)_{t∈T} with some nonempty index set T is a family of
R^k-valued random variables on (Ω, A, P). These random variables are
independent if for any finite, nonempty I_0 ⊆ T and any
A_t ∈ B^k, t ∈ I_0,

P(⋂_{t∈I_0} X_t^{−1}(A_t)) = Π_{t∈I_0} P(X_t^{−1}(A_t)).


Let X : Ω → R^k and Y : Ω → R^m be r.v.

Lemma 1.29 If X, Y are independent and g : R^k → R^l and
h : R^m → R^n are B^k–B^l- and B^m–B^n-measurable, respectively, then
g(X) and h(Y) are independent.

The cumulative distribution function (CDF) of X, F_X : R^k → [0, 1], is
defined by

F_X(x) = P(X ≤ x) for all x ∈ R^k.

Note that for k = 2, a = (a_1, a_2)′, b = (b_1, b_2)′ it holds that

P(X ∈ (a, b]) = F_X(b_1, b_2) − F_X(a_1, b_2) − F_X(b_1, a_2) + F_X(a_1, a_2).    (5)


Discrete random vectors


Let Z be an r.v. with values in R^k.
Z is discrete if P^Z is discrete, i.e. P(Z ∈ S_Z) = 1, where

S_Z := {z ∈ R^k : P(Z = z) > 0}

is countable. S_Z is called the support of Z. If Z = (X, Y)′, then

P(X = x) = Σ_{y∈S_Y} P(X = x, Y = y),
P(Y = y) = Σ_{x∈S_X} P(X = x, Y = y).

X , Y are independent iff

P(X = x, Y = y ) = P(X = x)P(Y = y ) ∀x ∈ SX , y ∈ SY .


Let X be real-valued; cf. Example 1.19.

X is called binomially distributed with parameters π ∈ [0, 1] and
n ∈ N, in signs X ∼ B(n, π), if P^X is a binomial distribution, i.e.

P(X = x) = (n choose x) π^x (1 − π)^{n−x} for x = 0, 1, . . . , n.

X is called geometrically distributed with parameter π ∈ [0, 1], i.s.
X ∼ Geo(π), if P^X is a geometric distribution, i.e.

P(X = x) = (1 − π)^{x−1} π for x = 1, 2, 3, . . .

X is called Poisson distributed with parameter λ > 0, i.s.
X ∼ Po(λ), if P^X is a Poisson distribution, i.e.

P(X = x) = (λ^x / x!) e^{−λ}, x = 0, 1, 2, . . .


Continuous random variables

Let X be an r.v. with values in R.
X is continuous if P^X is absolutely continuous, i.e. there exists a pdf f_X s.t.

P(a ≤ X ≤ b) = P^X([a, b]) = ∫_a^b f_X(x) dx

for all a < b. f_X is then called the probability density function (pdf) of X.

The CDF of X is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^x f_X(t) dt.

Let S_X^> = {x ∈ R : f_X(x) > 0}. Then the closure S_X of S_X^> is called
the support of X. It holds that

P(X ∈ S_X) = 1.


Let X be continuous; cf. Example 1.22.

X is normally distributed with parameters µ and σ² > 0, i.s.
X ∼ N(µ, σ²), if its density can be written as

f_X(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).

X is uniformly distributed with parameters a and b, a < b, i.s.
X ∼ U(a, b), if its density can be written as

f_X(x) = (1/(b − a)) 1_{[a,b]}(x).

X is exponentially distributed with parameter λ > 0, i.s.
X ∼ Exp(λ), if its density can be written as

f_X(x) = λ e^{−λx} 1_{[0,∞)}(x).


Continuous random vectors

Let X = (X_1, . . . , X_k)′ be an r.v. with values in R^k.
X is continuous if P^X is absolutely continuous, i.e. there is a multivariate
pdf f_X s.th.

F(b_1, . . . , b_k) = ∫_{−∞}^{b_1} · · · ∫_{−∞}^{b_k} f_X(x_1, . . . , x_k) dx_k · · · dx_1

for all b_1, . . . , b_k ∈ R. f_X is then called the probability density function
(pdf) of X. Let S_X^> = {x ∈ R^k : f_X(x) > 0}; the closure S_X of S_X^> is
called the support of X. Especially, for k = 2 and a_1 < b_1, a_2 < b_2,

P(a_1 ≤ X_1 ≤ b_1, a_2 ≤ X_2 ≤ b_2) = ∫_{a_1}^{b_1} ∫_{a_2}^{b_2} f_{X_1,X_2}(x_1, x_2) dx_2 dx_1.


Lemma 1.30 Let X = (X_1, X_2)′ be continuous with density f_{X_1,X_2}.
Then X_1 is continuous with density

f_{X_1}(x_1) = ∫_{−∞}^∞ f_{X_1,X_2}(x_1, x_2) dx_2

and X_2 is continuous with density

f_{X_2}(x_2) = ∫_{−∞}^∞ f_{X_1,X_2}(x_1, x_2) dx_1.

f_{X_1} and f_{X_2} are called marginal densities and the distributions of X_1 and
X_2, resp., are called marginal distributions.


1.4 Expectation

In the following we consider real-valued r.v., i.e. with values in R.


Definition 1.31 Suppose that X is a discrete random variable with
support S_X ⊆ R on a probability space (Ω, A, P).

E*X = E*[X] = Σ_{x∈S_X} x · P(X = x)    (6)

is well-defined if the sum (6) is unconditionally well-defined. Then E*[X]
is called the expectation (or mean) of (the discrete r.v.) X.
E*[X] is finite iff

Σ_{x∈S_X} |x| · P(X = x) < ∞.    (7)


Especially, if S_X = {a_1, . . . , a_N}, p_i = P(X = a_i) for i = 1, . . . , N and
Σ_{i=1}^N p_i = 1, then

E*X = Σ_{i=1}^N a_i p_i = Σ_{i=1}^N a_i P(X = a_i) = Σ_{i=1}^N a_i P^X({a_i}).

Example 1.32 If P(X = a) = 1, then S_X = {a} and

E*X = a · P(X = a) = a.

Example 1.33 Let (Ω, A, P) be a probability space and A ∈ A. Then 1_A
is an r.v. with support in {0, 1} and

E*1_A = 0 · P(1_A = 0) + 1 · P(1_A = 1) = P(A).


Proposition 1.34 Let Z be a discrete r.v. with values in R^k and
support S_Z, and let g : R^k → R.
(a) The support of Z* = g(Z) is S_{Z*} = g(S_Z) = {g(z) : z ∈ S_Z}.
(b) If E*[g(Z)] is well-defined, then

E*[g(Z)] = Σ_{z∈S_Z} g(z) P(Z = z).

Especially, if X, Y are random vectors and Z = (X, Y)′, then

E*[g(Z)] = Σ_{x∈S_X} Σ_{y∈S_Y} g(x, y) P(X = x, Y = y).


Special case:

E*|X| = Σ_{x∈S_X} |x| P(X = x).

Remark 1.35
(a) E*|X| is always well-defined.
(b) E*X is finite iff E*|X| < ∞. See (7).

Recap: Some laws for real numbers. Let a_n, b_n, c_n be real numbers.

Triangle inequality: |Σ_{i∈I} a_i| ≤ Σ_{i∈I} |a_i|.
If a_n ≤ b_n, lim_{n→∞} a_n = a, and lim_{n→∞} b_n = b, then a ≤ b.
If a_n ≤ c_n ≤ b_n, lim_{n→∞} a_n = a, and lim_{n→∞} b_n = a, then
lim_{n→∞} c_n = a.


Laws for the expectation of discrete r.v.

Lemma 1.36 Let X, Y be discrete r.v. and E*[X], E*[Y] well-defined.
Then: if X ≤ Y, then E*[X] ≤ E*[Y].

Lemma 1.37
Let X , Y be discrete r.v., E ∗ [X ], E ∗ [Y ] finite, and a, b, c ∈ R.
(a) |E ∗ [X ]| ≤ E ∗ [|X |].
(b) E ∗ [a + bX + cY ] is finite and
E ∗ [a + bX + cY ] = a + bE ∗ [X ] + cE ∗ [Y ].
(c) If X , Y are independent, then E ∗ [X · Y ] is finite and
E ∗ [X · Y ] = E ∗ [X ] · E ∗ [Y ].


General definition of expectation

Definition 1.38 For a real-valued random variable X on (Ω, A) define

X_n*(ω) = k/n if k/n ≤ X(ω) < (k + 1)/n, for k ∈ Z.

If
(i) E*[X_n*] is well-defined for every n ∈ N and
(ii) lim_{n→∞} E*[X_n*] is well-defined,
then

E[X] := EX := lim_{n→∞} E*[X_n*]    (8)

is called the expectation (or mean) of X.

Note that by definition, for all n ∈ N,

X_n* ≤ X ≤ X_n* + 1/n and |X − X_n*| ≤ 1/n.    (9)

Denote

X^+ = max(0, X), X^− = max(0, −X).

Then X^+ ≥ 0, X^− ≥ 0,

X = X^+ − X^−, and |X| = X^+ + X^−.

For a discrete r.v. X it holds, by definition of E*[X], that

E*[X] is well-defined iff E*[X^+] < ∞ or E*[X^−] < ∞.
E*[X] is finite iff E*[X^+] < ∞ and E*[X^−] < ∞.

Lemma 1.39 Let X, Y be discrete r.v., E*Y well-defined, and
|X − Y| ≤ c for some constant c. Then E*X is well-defined.


For a discrete r.v. X we have two definitions of an expectation, E ∗ [X ] and


E [X ], but they coincide.

Proposition 1.40 Let X be a discrete r.v.


(i) E ∗ [X ] is well-defined iff E [X ] is well-defined.
(ii) If E ∗ [X ] is well-defined, then E ∗ [X ] = E [X ].

A technical lemma:

Lemma 1.41 Let X be any r.v. and Y_n, Z_n discrete r.v. Assume that

|X − Y_n| ≤ Z_n, lim_{n→∞} E[Z_n] = 0, and lim_{n→∞} E[Y_n] = a ∈ R.

Then E[X] is finite and E[X] = a.


Laws for the expectation


We are ready to generalize Lemma 1.36 and Lemma 1.37.

Proposition 1.42 (Monotonicity) Let X, Y be r.v. and E[X], E[Y]
well-defined. Then: if X ≤ Y, then E[X] ≤ E[Y].

Theorem 1.43 Let X , Y be r.v. and E [X ], E [Y ] finite. Then:


(a) |EX | ≤ E |X |.
(b) (Linearity) E [a + bX + cY ] is finite and
E [a + bX + cY ] = a + b · E [X ] + c · E [Y ].
(c) (Product law) If X , Y are independent, then E [X · Y ] is finite
and it holds E [X · Y ] = E [X ] · E [Y ].


Continuous random variables


A continuous version of Proposition 1.34.

Theorem 1.44 Let X be real-valued and continuous with density f_X.
(a) If E[X] is finite, then

E[X] = ∫_{−∞}^∞ x · f_X(x) dx.    (10)

(b) (Expectation rule) Let g : R → R (measurable). If E[g(X)] is
finite, then

E[g(X)] = ∫_{−∞}^∞ g(x) · f_X(x) dx.    (11)

Cf. Jacod and Protter (2000), Corollary 9.1.
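
As an illustration of the expectation rule, a small Monte Carlo sketch in Python (assuming NumPy) for X ∼ Exp(λ) and g(x) = x², where (11) evaluates to E[g(X)] = 2/λ²:

```python
import numpy as np

rng = np.random.default_rng(0)

# For X ~ Exp(lam) and g(x) = x**2, evaluating (11) analytically gives
# E[g(X)] = 2 / lam**2; a Monte Carlo average should come close.
lam = 2.0
x = rng.exponential(scale=1 / lam, size=1_000_000)
print(np.mean(x**2), 2 / lam**2)  # both approximately 0.5
```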



1.5 Variance and covariance

Suppose that X, Y are real-valued random variables and denote
µ_X = E[X], µ_Y = E[Y] if the expectations are defined.
Definition 1.45
(i) The s-th moment of X is defined as E[X^s] (if well-defined),
(ii) the s-th absolute moment as E[|X|^s],
(iii) and the s-th central moment as E[(X − µ_X)^s] (if well-defined).
(iv) The 2nd central moment is also called the variance:

Var[X] = E[(X − µ_X)²].

For X, Y define by

Cov[X, Y] = E[(X − µ_X)(Y − µ_Y)]

the covariance between X and Y.



Existence of higher moments

Lemma 1.46 Let X be any real-valued r.v.


(a) If E [|X |s ] < ∞ and 0 < p < s, then E [|X |p ] < ∞.
(b) Var [X ] < ∞ iff E [X 2 ] < ∞.
(c) If E [X 2 ] < ∞ and E [Y 2 ] < ∞, then Cov [X , Y ] is finite.

Higher order moments guarantee the existence of lower order moments.

Lemma 1.47 If X , Y are independent and µX , µY are finite, then


Cov [X , Y ] is finite and Cov [X , Y ] = 0.

If Cov [X , Y ] = 0, then X and Y are called uncorrelated.



Laws for variances and covariances

A selection of laws for variances and covariances; some of them can be
easily generalized.

Proposition 1.48 Let X, Y be real-valued r.v. with finite variances
and µ_X = E[X].
(a) Var [X ] = E [X 2 ] − µ2X .
(b) Var [a + bX ] = b 2 Var [X ].
(c) Var [X + Y ] = Var [X ] + Var [Y ] + 2Cov [X , Y ].
(d) Cov [X , Y ] = Cov [Y , X ].
(e) Cov [X , X ] = Var [X ].
(f) Cov [a + bX , c + dY ] = b · d · Cov [X , Y ].


Examples for expectations and variances

Expectations and variances can be computed for specific discrete and
absolutely continuous distributions using Proposition 1.34 and Theorem
1.44.

Examples 1.49
1 If X ∼ B(n, π), then E [X ] = nπ and Var [X ] = nπ(1 − π).
2 If X ∼ Po(λ), then E [X ] = λ and Var [X ] = λ.
3 If X ∼ U(a, b), then E [X ] = (a + b)/2, Var [X ] = (b − a)2 /12.
4 If X ∼ Exp(λ), then E [X ] = 1/λ, Var [X ] = 1/λ2 .
5 If X ∼ N(µ, σ 2 ), then E [X ] = µ, Var [X ] = σ 2 .
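
For instance, the uniform case in 3 follows directly from Theorem 1.44; as a sketch:

```latex
E[X] = \int_a^b \frac{x}{b-a}\,dx
     = \frac{b^2 - a^2}{2(b-a)}
     = \frac{a+b}{2}.
```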


Property (c) of Proposition 1.48 can be generalized.

Proposition 1.50 If X1 , X2 , . . . , Xn are independent with finite


variances, then

Var [X1 + · · · + Xn ] = Var [X1 ] + · · · + Var [Xn ].

More generally,

Var[Σ_{i=1}^n c_i X_i] = Σ_{i=1}^n c_i² Var[X_i] + 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^n c_i c_j Cov[X_i, X_j]    (12)

for any r.v. X_1, . . . , X_n, i.e. the X_i's need not be independent.


Expectations and covariances of random vectors

For an R^k-valued random variable X = (X_1, . . . , X_k)′ such that EX_j exists
for all j, we define the expectation (vector) as

µ_X = E[X] = (E[X_1], . . . , E[X_k])′.

Lemma 1.51 Let X be an r.v. with values in R^k and E[X] finite. Let
a ∈ R^m and B ∈ R^{m×k} a matrix. Then

E[a + BX] = a + B · E[X].


If E|X_j|² < ∞ for all j, the covariance matrix of X is defined by

Var[X] := Cov[X, X] := Σ_X = E[(X − µ_X)(X − µ_X)′],

i.e. the k × k matrix whose (i, j)-th entry is E[(X_i − µ_i)(X_j − µ_j)],
with diagonal entries E[(X_i − µ_i)²], i = 1, . . . , k.

More generally, for two random vectors X, Y, we define the covariance
matrix of X and Y by

Cov[X, Y] := E[(X − µ_X)(Y − µ_Y)′].


Chapter 2: Asymptotic theory

1 Convergence of expectations
2 Modes of convergence
3 Convergence in distribution
4 Limit Theorems
5 Application in Statistics
6 Stochastic Boundedness


2.1 Convergence of expectations


Let X , X1 , . . . , Xn , . . . be real-valued r.v. on R
(Xn )n converges pointwise to X , i.s. limn→∞ Xn = X or Xn → X , iff

lim Xn (ω) = X (ω) for all ω ∈ Ω. (13)


n→∞

In general, (13) is not sufficient for limn→∞ E [Xn ] = E [X ].

Theorem 2.1 (Monotone convergence theorem)

Assume (13) and

0 ≤ X_n ≤ X_{n+1} for all n ≥ 1.    (14)

Then

E[X_n] −→ E[X] as n → ∞.


For a proof see Jacod and Protter (2000).

Recall that for a sequence of real numbers (a_n)_n

lim inf_{n→∞} a_n = lim_{n→∞} inf_{k≥n} a_k.

Lemma 2.2 (Fatou’s Lemma) Assume that X_n ≥ 0 and define

X(ω) = lim inf_{n→∞} X_n(ω).

Then

E[X] ≤ lim inf_{n→∞} E[X_n].

Idea of the proof: Put Y_n = inf_{k≥n} X_k and apply the monotone
convergence theorem.


Theorem 2.3 (Dominated Convergence Theorem) Assume (13) and

|X_n| ≤ Y for n ≥ 1    (15)

for a random variable Y with E[Y] < ∞. Then

E[X_n] −→ E[X] as n → ∞.    (16)

For a proof see Jacod and Protter (2000).

Especially, (16) holds if X_n → X and the X_n are uniformly bounded, i.e.
|X_n| ≤ C for all n and for some C ∈ R.


Theorem 2.1, Lemma 2.2, and Theorem 2.3 are still valid if (13), (14),
and (15) are replaced by a.s. (almost surely) statements. I.e. put

A = {ω : X_n(ω) → X(ω) as n → ∞},
B_n = {ω : X_n(ω) ≤ X_{n+1}(ω)},
C_n = {ω : |X_n(ω)| ≤ |Y(ω)|}.

If

P(A) = P(B_n) = P(C_n) = 1 for all n ∈ N,

then (13), (14), and (15) are said to hold almost surely, Theorem 2.1,
Lemma 2.2, and Theorem 2.3 stay valid, and (16) holds true.


2.2 Modes of convergence


Let ‖·‖ denote a norm on R^k, e.g. ‖x‖ = ‖x‖_1 = Σ_{j=1}^k |x_j| or
‖x‖ = ‖x‖_2 = (Σ_{j=1}^k |x_j|²)^{1/2}.

Definition 2.4 Suppose that (X_n)_n and X are random variables on a
probability space (Ω, A, P) and with values in (R^k, B^k).
(i) (Convergence in probability)
The sequence (X_n)_n converges in probability to X if

P({ω : ‖X_n(ω) − X(ω)‖ > ε}) = P(‖X_n − X‖ > ε) −→ 0 as n → ∞, ∀ ε > 0.

Notation: X_n →^P X, p-lim_{n→∞} X_n = X.


(ii) (Almost sure convergence)
The sequence (X_n)_n converges almost surely to X if

P({ω : lim_{n→∞} X_n(ω) = X(ω)}) = P(X_n → X) = 1.

Notation: X_n → X P-a.s., X_n → X a.s., X_n →^{a.s.} X.

(iii) (Convergence in the p-th mean (L_p-convergence))
Let p ≥ 1. The sequence (X_n)_n converges in p-th mean to X if

E‖X_n − X‖^p −→ 0 as n → ∞.

Notation: X_n →^{L_p} X, L_p-lim_{n→∞} X_n = X.


Definition 2.5 Suppose that (X_n)_n and X are random variables with
values in (R^k, B^k). (Convergence in distribution)
The sequence (X_n)_n converges to X in distribution if

E[f(X_n)] −→ E[f(X)] as n → ∞, for all f ∈ C_b(R^k),

i.e. for all functions f : R^k → R that are continuous and bounded.

Notation: X_n →^L X, X_n →^D X, X_n →^d X
(d, D, and L for “in distribution”, “in law”).


Relation between different modes of convergence

In the following suppose that (X_n)_n and X are random variables on a
probability space (Ω, A, P).
Then the following scheme holds true:

X_n →^{L_p} X ⟹ X_n →^P X ⟹ X_n →^d X
X_n →^{a.s.} X ⟹ X_n →^P X

L_p-convergence (p ≥ r ≥ 1):

X_n →^{L_p} X ⟹ X_n →^{L_r} X ⟹ X_n →^{L_1} X

L_p-convergence and convergence in probability

Theorem 2.6 (Markov’s inequality) For a random variable X and a
monotone increasing function g : [0, ∞) → [0, ∞) with g(x) > 0 for
all x > 0 it holds for every ε > 0 that

P(‖X‖ ≥ ε) ≤ E[g(‖X‖)] / g(ε).

For a real-valued random variable X with finite second moment and ε > 0
it holds that

P(|X − EX| ≥ ε) ≤ Var[X] / ε²    (Chebychev’s inequality).
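
A simulation sketch of Chebychev's inequality (Python/NumPy; illustration only, with an arbitrary choice of ε):

```python
import numpy as np

rng = np.random.default_rng(1)

# For X ~ N(0, 1): P(|X - EX| >= eps) <= Var[X] / eps**2 = 1 / eps**2.
eps = 2.0
x = rng.standard_normal(1_000_000)
print((np.abs(x) >= eps).mean())  # approximately 0.0455
print(1 / eps**2)                 # Chebychev bound: 0.25
```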


Suppose that (Xn )n and X are random variables on some probability space
(Ω, A, P).

Corollary 2.7 (a) Let p ≥ 1. Then

X_n →^{L_p} X ⟹ X_n →^P X.

(b) For p ≥ r ≥ 1 it holds that

X_n →^{L_p} X ⟹ X_n →^{L_r} X.

Lemma 2.8 Let X_n be real-valued and a ∈ R.
If E[X_n] → a and Var[X_n] → 0 as n → ∞, then X_n →^{L_2} a.


Weak laws of large numbers (WLLN)

Lemma 2.9 (Weak law of large numbers 1)
Suppose that X_1, X_2, . . . are real-valued and uncorrelated r.v.
(i.e. Cov[X_i, X_j] = 0, i ≠ j) with E[X_1] = E[X_2] = · · · = µ ∈ R and
Var[X_i] ≤ c for all i and some c ∈ R. Then

X̄_n = (1/n) Σ_{i=1}^n X_i →^P µ.

Theorem 2.10 (Weak law of large numbers 2)
For a sequence of independently and identically distributed (i.i.d.)
r.v. X_1, X_2, . . . with finite mean µ = E[X_i] it holds that X̄_n →^P µ.
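
A small simulation sketch of the WLLN (Python/NumPy; the sample sizes and the tolerance 0.05 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample means of i.i.d. Exp(1) draws (mu = 1) concentrate around mu:
# the empirical P(|Xbar_n - mu| > 0.05) shrinks as n grows.
for n in (10, 1_000, 100_000):
    x = rng.exponential(size=(100, n))  # 100 replications of size n
    xbar = x.mean(axis=1)
    print(n, np.mean(np.abs(xbar - 1.0) > 0.05))
```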


Convergence in probability and almost surely

Theorem 2.11 It holds that

X_n →^{a.s.} X ⟹ X_n →^P X.

In general, convergence in probability does not imply a.s. convergence.

Example 2.12 Suppose that (Ω, A, P) = ([0, 1], B, U(0, 1)) and define

X_{2^k+j}(ω) = 1_{[j2^{−k},(j+1)2^{−k}]}(ω), k ∈ N_0, j = 0, . . . , 2^k − 1.

Put X = 0. Then X_n →^P X, but not X_n →^{a.s.} X.


2.3 Convergence in distribution

Theorem 2.13 Suppose that (X_n)_n, X are R^k-valued random
variables and F_{X_n}, F_X their CDFs.
The following statements are equivalent:
(i) X_n →^d X.
(ii) F_{X_n}(x) −→ F_X(x) as n → ∞ at all continuity points of F_X.
(iii) E[f(X_n)] −→ E[f(X)] as n → ∞ for all bounded Lipschitz
functions f : R^k → R.

Cf. Van der Vaart (1998): Asymptotic Statistics. Cambridge University
Press, Lemma 2.2.


Let f : R^k → R be a function.
C(f) = {x ∈ R^k : f is continuous at x} is called the set of
continuity points of f.
f is bounded if there is a c ∈ R s.th. |f(x)| ≤ c for all x ∈ R^k.
f is a Lipschitz function if there is an L ∈ R such that

for all x, y: |f(x) − f(y)| ≤ L‖x − y‖.

Remark: A continuous function f : R^k → R is uniformly continuous on a
compact subset C ⊂ R^k, i.e. for every ε > 0 there exists a δ = δ(ε) > 0
s.th.

for all x, y ∈ C: ‖x − y‖ ≤ δ ⟹ |f(x) − f(y)| ≤ ε.


Convergence in probability and convergence in distribution

Theorem 2.14 For R^k-valued r.v. (X_n)_n and X it holds that

X_n →^P X ⟹ X_n →^d X.

In general, convergence in probability implies convergence in distribution
but not vice versa.

Theorem 2.15 For R^k-valued r.v. (X_n)_n on a probability space
(Ω, A, P) and deterministic a ∈ R^k it holds that

X_n →^P a ⟺ X_n →^d a.


Characteristic functions
Let X be a random vector in R^k.
The function ϕ_X : R^k → C, defined by

ϕ_X(t) = E[e^{it′X}] = E[cos(t′X)] + iE[sin(t′X)],

is called the characteristic function of X.
Here i is the imaginary unit, i.e. i² = −1.

Proposition 2.16
Let Y be a random vector in R^k, a ∈ R^m and B ∈ R^{m×k}.
1 ϕ_X is uniformly continuous.
2 ϕ_{a+BX}(t) = e^{ia′t} · ϕ_X(B′t) for t ∈ R^m.
3 If X and Y are independent, then ϕ_{X+Y}(t) = ϕ_X(t)ϕ_Y(t).
4 If Z ∼ N(0, 1), then ϕ_Z(t) = e^{−t²/2}.
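
Property 4 can be checked by Monte Carlo (Python/NumPy sketch; the grid of t values is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)

# Monte Carlo estimate of phi_Z(t) = E[exp(itZ)] for Z ~ N(0, 1),
# compared with the closed form exp(-t**2 / 2).
z = rng.standard_normal(1_000_000)
for t in (0.5, 1.0, 2.0):
    print(t, np.exp(1j * t * z).mean().real, np.exp(-t**2 / 2))
```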


Let X, Y, X_n be random vectors in R^k.

Lemma 2.17 X =^d Y iff ϕ_X = ϕ_Y.

Cf. v.d. Vaart (1998), Lemma 2.15.

Theorem 2.18 (Lévy’s continuity theorem)

X_n →^d X iff for all t ∈ R^k: ϕ_{X_n}(t) → ϕ_X(t).

Cf. v.d. Vaart (1998), Lemma 2.13.

Theorem 2.19 (Cramér-Wold) X_n →^d X iff ∀ t ∈ R^k: t′X_n →^d t′X.


Theorem 2.20 (Continuous mapping theorem) Let g : R^k → R^m be
continuous on C with P(X ∈ C) = 1. Then
(i) X_n →^P X ⟹ g(X_n) →^P g(X),
(ii) X_n →^d X ⟹ g(X_n) →^d g(X).

Cf. v.d. Vaart (1998), Theorem 2.3.

Lemma 2.21 X_n →^P X iff X_{n,j} →^P X_j for j = 1, . . . , k.

The vector (X_n) converges in probability if and only if all components
converge in probability.


Note that for P(A_n = a_n) = 1 it holds: a_n → a iff A_n →^P a.
Application: Let X_n, Y_n be r.v. and a_n, b_n, c_n be real numbers.
Let X_n →^P X, Y_n →^P Y, a_n → a, b_n → b, c_n → c. Then

a_n + b_n X_n + c_n Y_n →^P a + bX + cY,
X_n Y_n →^P XY etc.

Lemma 2.22 (Slutsky’s Lemma) Let X_n, Z_n, Z be r.v. with values in R^k
and let X_n →^P c ∈ R^k and Z_n →^d Z. Then

X_n + Z_n →^d c + Z.


Theorem 2.23 (Slutsky’s Lemma)

(i) Let X_n, Z_n be r.v. with values in R^k and R^m, resp., with X_n →^P c
and Z_n →^d Z, where c is a constant. Then

(X_n, Z_n) →^d (c, Z).

(ii) Let c ∈ R^m and B ∈ R^{m×k}. Let X_n →^P c with values in R^m,
B_n →^P B (m × k matrices), and Z_n →^d Z with values in R^k. Then

X_n + B_n Z_n →^d c + BZ.

Cf. v.d. Vaart, Theorem 2.7 and Lemma 2.8.

B_n →^P B means B_{n,i,j} →^P B_{i,j} for all 1 ≤ i ≤ m, 1 ≤ j ≤ k.


2.4 Limit Theorems

Theorem 2.24 (Strong law of large numbers (SLLN)) For a sequence
of i.i.d. random variables X_1, X_2, . . . on some probability space
(Ω, A, P) with finite mean µ = E[X_j] it holds that X̄_n −→ µ almost
surely.

Theorem 2.25 (Central limit theorem for i.i.d. sequences) Let
X_1, X_2, X_3, . . . be i.i.d. real-valued random variables with EX_i = µ,
Var(X_i) = σ² ∈ (0, ∞). Then

(X_1 + · · · + X_n − nµ) / (σ√n) →^d Z ∼ N(0, 1).
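
A simulation sketch of Theorem 2.25 (Python/NumPy; illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)

# Standardized sums of i.i.d. U(0,1) draws (mu = 1/2, sigma^2 = 1/12)
# are approximately N(0, 1) for large n.
n, reps = 500, 20_000
x = rng.random((reps, n))
z = (x.sum(axis=1) - n * 0.5) / np.sqrt(n / 12)
print(z.mean(), z.var())   # approximately 0 and 1
print((z <= 1.96).mean())  # approximately 0.975 = Phi(1.96)
```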


Taylor formula: Let f : R → R be (m+1)-times differentiable in (a, b)
and x, x + h ∈ (a, b). Then

f(x + h) = Σ_{i=0}^m (f^{(i)}(x)/i!) h^i + (f^{(m+1)}(ξ)/(m + 1)!) h^{m+1},

where ξ lies between x and x + h.

Special case, m = 1:

f(x + h) = f(x) + f′(x)h + (1/2) f″(ξ)h²
         = f(x) + f′(x)h + (1/2) f″(x)h² + (1/2)[f″(ξ) − f″(x)]h².


Lemma 2.26 Let X be real-valued. If E[|X|^m] < ∞, then ϕ_X is m-times
differentiable and, for 1 ≤ r ≤ m,

ϕ_X^{(r)}(t) = E[(iX)^r e^{itX}], E(X^r) = ϕ_X^{(r)}(0)/i^r, and

ϕ_X(t) = Σ_{r=0}^m ((it)^r/r!) E[X^r] + ((it)^m/m!) R_m(t),

where |R_m(t)| ≤ 3E[|X|^m] and R_m(t) tends to 0 as t → 0.

Especially, if E(X²) < ∞, then

ϕ_X(t) = 1 + itE[X] − (t²/2) E[X²] + t² R*(t)

with R*(t) → 0 as t → 0.


Theorem 2.27 (Lyapounov CLT) Let X_1, X_2, . . . be independent
real-valued random variables with µ_t = EX_t, σ_t² = Var(X_t) and
m_{3,t} = E|X_t − µ_t|³ < ∞. Assume

(Σ_{t=1}^n m_{3,t})^{1/3} / (Σ_{t=1}^n σ_t²)^{1/2} −→ 0 as n → ∞.

Then

(X_1 + · · · + X_n − µ_1 − · · · − µ_n) / (σ_1² + · · · + σ_n²)^{1/2} →^d Z ∼ N(0, 1).


Excursus to the multivariate normal distribution: A vector X with density

φ(x) = (1/√((2π)^k det Σ)) exp(−0.5 (x − µ)′ Σ^{−1} (x − µ)), x ∈ R^k,

has a multivariate normal distribution with mean µ ∈ R^k and covariance
matrix Σ ∈ R^{k×k}, which is assumed to be positive definite. One can
show that a′X ∼ N(a′µ, a′Σa) for any a ∈ R^k\{0_k}.

Theorem 2.28 (Multivariate CLT) Suppose that X_1, X_2, . . . are i.i.d.
R^k-valued random variables with mean vector µ and finite, positive
definite covariance matrix Σ. Then

(1/√n)(X_1 + · · · + X_n − nµ) →^d Z̃ ∼ N(0_k, Σ).


2.5 Application in Statistics

Example 2.29 We assume that

X_1, . . . , X_n ∼ N(µ, σ²) i.i.d.

are defined on the same sample space. µ and σ² are unknown and could
be “determined” by observations. Θ = R × (0, ∞).
Note that the distribution P^{Z_n} of Z_n = (X_1, . . . , X_n)′ changes with the
choice of the parameter θ = (µ, σ²).

Let Z_n be an r.v. with values in R^{mn} defined on some sample space. Let
Θ ≠ ∅ be a set and P_θ^{Z_n}, for any θ ∈ Θ, probability measures on R^{mn}. Then
(R^{mn}, B^{mn}, {P_θ^{Z_n} : θ ∈ Θ}) is called a statistical experiment, Θ its
parameter space. θ ∈ Θ is called a parameter.
n is usually some sample size.


Bias, Consistency
Let there be given a statistical experiment with parameter space Θ. An r.v.
g(Z_n), g : R^{mn} → S, Θ ⊂ S, can be called an estimator of θ.
Let T̂, T̂_n be estimators with values in R^k.
T̂ is unbiased for τ = τ(θ) ∈ Θ if

E_θ[T̂] = τ ∀ θ ∈ Θ.

The sequence of estimators (T̂_n)_n is asymptotically unbiased if

lim_{n→∞} E_θ[T̂_n] = τ ∀ θ ∈ Θ.

The sequence of estimators (T̂_n)_n is (weakly) consistent if

T̂_n →^P τ ∀ θ ∈ Θ.


Example 2.30 We assume that X_1, . . . , X_n are real-valued and i.i.d. with
E[X_i²] < ∞. Then

M̂_n = X̄_n and S_n² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄_n)²

are unbiased and consistent estimators for µ = E[X_i] and σ² = Var[X_i],
respectively.
Notation: If F is the CDF of Y, we write E[g(Y)] = ∫ g(y) F(dy). Let
DF(R) denote the set of all CDFs on R.
Note that the parameter space of Example 2.30 can be written as

Θ = {F ∈ DF(R) : ∫ x² F(dx) < ∞};

θ = F ∈ Θ, µ = µ(θ) = ∫ x F(dx) and σ² = σ²(θ) = ∫ (x − µ)² F(dx).


Example 2.31 (Plug-in principle) Let X_1, . . . , X_n be R^k-valued r.v. and
θ̂_n = T(X_1, . . . , X_n) an estimator of some parameter θ, T : R^{kn} → Θ.
Let g be continuous. Then

θ̂_n →^P θ ∀ θ ∈ Θ ⟹ g(θ̂_n) →^P g(θ) ∀ θ ∈ Θ.

Example 2.32 Let X_1, . . . , X_n ∼ Exp(λ) i.i.d.
By the WLLN, Theorem 2.10,

X̄_n →^P E[X_i] = µ = 1/λ for all λ > 0.

Consequently, with g(x) = 1/x, by Theorem 2.20,

Λ̂_n = 1/X̄_n = g(X̄_n) →^P g(µ) = 1/µ = λ for all λ > 0,

i.e. Λ̂_n = 1/X̄_n is a consistent estimator for λ.
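
Example 2.32 can be checked by simulation (Python/NumPy sketch; λ = 3 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)

# Lambda_hat_n = 1 / Xbar_n for an Exp(lambda) sample approaches lambda.
lam = 3.0
for n in (50, 5_000, 500_000):
    x = rng.exponential(scale=1 / lam, size=n)
    print(n, 1 / x.mean())  # approaches 3.0
```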


Some Applications
Let X, X_1, . . . , X_n be i.i.d. with values in R^k with E[X] = µ, Var[X] = Σ.
Then
(i) (1/n) Σ_{i=1}^n X_i X_i′ →^P E[XX′].    (LLN)
(ii) Σ̂_n = (1/n) Σ_{i=1}^n (X_i − X̄_n)(X_i − X̄_n)′ = (1/n) Σ_{i=1}^n X_i X_i′ − X̄_n X̄_n′
→^P E[XX′] − E[X] E[X]′ = E[(X − E[X])(X − E[X])′] = Σ.
(iii) √n Σ̂_n^{−1/2} (X̄_n − µ) →^d N(0, I_k).


2.6 Stochastic Boundedness

Definition 2.33 (Stochastic boundedness, tightness)

The sequence (X_n)_n is stochastically bounded if for every ε > 0 there
exist a C = C(ε) > 0 and n_0 = n_0(ε) ∈ N s.th.

P(‖X_n‖ ≤ C) ≥ 1 − ε for all n ≥ n_0.

Notation: X_n = O_P(1).
Note that for an r.v. X, in general, there is no C ∈ R s.th.

P(‖X‖ ≤ C) = 1.

Consider e.g. X ∼ N(0, 1):

P(|X| ≤ C) = Φ(C) − Φ(−C) < 1 for all C > 0.


Notation: Z_n = o_P(1) iff Z_n →^P 0.

Theorem 2.34
(i) X_n →^d X ⟹ X_n = O_p(1).
(ii) X_n = X + o_p(1) ⟹ X_n = O_p(1)
(X_n, X scalar or vector or matrix).
(iii) For X_n = o_p(1), Y_n = o_p(1), U_n = O_p(1), W_n = O_p(1) it holds
(a) X_n + Y_n = o_p(1),
(b) U_n + W_n = O_p(1),
(c) U_n · W_n = O_p(1),
(d) X_n · U_n = o_p(1).
(iv) g : R^k → R^l continuous at x_0. Then

X_n = x_0 + o_p(1) ⟹ g(X_n) = g(x_0) + o_p(1).


Lemma 2.35 Let c_n Z_n = O_P(1) for c_n → ∞. Then Z_n = o_P(1).

Theorem 2.36 (Delta method)

Let U ⊂ R^k be a neighborhood of c ∈ R^k, φ : U → R^m differentiable
at c, and X_n an R^k-valued random variable with

√n (X_n − c) →^d N(0, Σ).

Then:

√n (φ(X_n) − φ(c)) →^d N(0, φ′(c) Σ φ′(c)′).

Cf. v.d.Vaart Theorem 3.1.
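
A simulation sketch of the delta method for φ(x) = 1/x applied to exponential sample means, where φ′(µ)² σ² = λ⁴ · (1/λ²) = λ² (Python/NumPy; illustration only):

```python
import numpy as np

rng = np.random.default_rng(5)

# For Exp(lambda) data, mu = 1/lambda and sigma^2 = 1/lambda**2;
# phi(x) = 1/x has phi'(mu) = -lambda**2, so the limit variance is
# phi'(mu)**2 * sigma**2 = lambda**2.
lam, n, reps = 2.0, 1_000, 10_000
x = rng.exponential(scale=1 / lam, size=(reps, n))
t = np.sqrt(n) * (1 / x.mean(axis=1) - lam)
print(t.var(), lam**2)  # both approximately 4.0
```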



Summary

Important concepts and statements:

Modes of convergence and their relationship:
X_n →^P X, X_n →^{L_p} X, X_n →^{a.s.} X, X_n →^d X
continuous mapping theorem, dominated convergence theorem
Slutsky’s lemma, Delta method
characteristic functions: definition, identification of distributions,
expansion, Lévy’s continuity theorem
Cramér-Wold
LLN, CLT, Lyapounov CLT
Landau symbols: o_P(1), O_P(1)
algebra with Landau symbols


Chapter 3: Conditional expectations, probabilities and variances

1 Conditional expectation and conditional probabilities:


Definition and special cases
2 Important properties of conditional expectations
3 Conditional variances


3.1 Conditional expectations and conditional probabilities

Regression problem: How much of the random fluctuations of Y can be


explained by X ?
Find a (measurable) function g : Rk −→ R that minimizes

E [{Y − g (X )}2 ]. (∗)

Definition 3.1
Each (measurable) function g that minimizes (∗) is called conditional
expectation of Y given X .
Notation:

E [Y |X ] = g (X ), E [Y |X = x] = g (x).

Remark: For c ∈ R we have E [c|X ] = c.


Theorem 3.2 For an Rk -valued random variable X and a real-valued


random variable Y assume that EY 2 < ∞. Then the following are
equivalent (TFAE):
(i) g (X ) = E [Y |X ] a.s.
(ii) E [{Y − g (X )}h(X )] = 0
for all measurable functions h with E [h2 (X )] < ∞.
(iii) E [Y · h(X )] = E [g (X ) · h(X )]
for all measurable, bounded functions h.
(iv) E [{Y − g (X )}h(X )] = 0
for all measurable functions h : Rk → {0, 1}.

These characterizations can be used to prove properties of conditional


expectations or to compute specific ones.


(iv) can be rewritten as

E [Y 1(X ∈ B)] = E [g (X )1(X ∈ B)] (**)

for all (Borel-) sets B ⊂ Rk . (**) is often used as definition of a


conditional expectation; it does not require E [Y 2 ] < ∞ but only
E |Y | < ∞ or Y ≥ 0.

Theorem 3.3 (Uniqueness of conditional expectation)


For two minimizers g1 , g2 of (∗) it holds

g1 (X ) = g2 (X ) a.s.

Consequently, E[Y|X] is almost surely unique.


Recall the relation between expectation and probability E 1A = P(A).


Now, we define conditional distributions via conditional expectations.
Suppose that X and Y are random variables with values in (Rk , B k ) and
(Rl , B l ), respectively, on a probability space (Ω, A, P).
Definition 3.4 For any A ∈ A,

P(A | X ) = E (1A | X )

is called conditional probability of A given X . Since 1A is bounded,


P(A | X ) always exists. All conditional probabilities

P Y |X (B) := P(Y ∈ B | X ) = E (1B (Y ) | X ), ∀B ∈ B l ,

together are called the conditional distribution of Y given X .


Moreover, P Y |X =x (B) = E (1B (Y ) | X = x) is called conditional
distribution of Y given X = x.


Special case: X, Y discrete

Let X, Y be discrete with supports S_X, S_Y and pmf
f_{X,Y}(x, y) = P(X = x, Y = y). Define

f_{Y|X}(y|x) := f_{X,Y}(x, y)/f_X(x) = P(X = x, Y = y)/P(X = x) = P(Y = y|X = x)

for x ∈ S_X. Then

E(Y | X = x) = g(x) = Σ_{y∈S_Y} y · f_{Y|X}(y|x)

and

P^{Y|X=x}(B) = Σ_{y∈S_Y∩B} f_{Y|X}(y|x).

f_{Y|X} is called the conditional probability mass function of Y given X.



Example 3.5 Suppose that X and Z are the thrown numbers of two
independent dice throws and define Y = X + Z. Then, for x = 1, . . . , 6,

E[Y | X = x] = x + 3.5

and hence E[Y | X] = X + 3.5.
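
Example 3.5 can be verified by direct enumeration (Python sketch):

```python
# Two fair dice, Y = X + Z: E[Y | X = x] averages Y over the
# 6 equally likely values of Z.
for x in range(1, 7):
    expected_y = sum(x + z for z in range(1, 7)) / 6
    print(x, expected_y)  # prints x and x + 3.5
```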


Example 3.6
Suppose that Y_1, . . . , Y_n are i.i.d. random variables with Y_i ∼ B(1, θ),
where θ ∈ (0, 1) is an unknown parameter. Let Y = (Y_1, . . . , Y_n)′. We
consider the statistic X = T(Y) = Σ_{i=1}^n Y_i, i.e. X ∼ B(n, θ). Then

P(Y = y | X = k) = 1/(n choose k) if y_1 + · · · + y_n = k, and 0 else,

y = (y_1, . . . , y_n)′, k = 0, . . . , n, which is independent of θ.


Special case: Continuous distributions

Let (X, Y)′ be a continuous random vector with joint pdf f_{X,Y}. Define

f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x), if f_X(x) > 0; any density, elsewhere.

Then

E[Y | X = x] = ∫_{−∞}^{∞} y · f_{Y|X}(y|x) dy

and

P^{Y|X=x}([a, b]) = ∫_a^b f_{Y|X}(y|x) dy.

f_{Y|X} is called the conditional probability density function of Y given X.
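As an illustration beyond the slides: if (X, Y)′ is bivariate standard normal with correlation ρ, the formula above yields f_{Y|X}(·|x) as the N(ρx, 1 − ρ²) density, hence E[Y | X = x] = ρx. A Monte Carlo sketch (seed and bin width arbitrary) recovers this by averaging Y over narrow bins of X:

```python
import numpy as np

# (X, Y) bivariate standard normal with correlation rho:
# f_{Y|X}(y|x) is the N(rho*x, 1 - rho^2) density, so E[Y | X = x] = rho*x.
rng = np.random.default_rng(1)
rho, n = 0.6, 2_000_000
X = rng.normal(size=n)
Y = rho * X + np.sqrt(1 - rho**2) * rng.normal(size=n)

for x0 in (-1.0, 0.0, 1.5):
    in_bin = np.abs(X - x0) < 0.05           # condition on X close to x0
    print(x0, Y[in_bin].mean(), rho * x0)    # empirical vs. theoretical
```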


Theorem 3.7 Let Y be a square-integrable real-valued r.v. and X an R^k-valued random variable on a probability space (Ω, A, P). Then
(i) X, Y are independent iff for all B ∈ B: P^{Y|X}(B) = P(B) a.s.
(ii) If X, Y are independent, then E[Y|X] = E[Y] a.s.

Note: If (X, Y)′ is continuous with pdf f_{X,Y}, then independence can be expressed by

f_{X,Y}(x, y) = f_X(x) f_Y(y) a.e.

In that case

f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x) = f_X(x) f_Y(y) / f_X(x) = f_Y(y) a.e.


3.2 Important properties of conditional expectations

Theorem 3.8 (Iterated expectations)
Let Y be a real-valued r.v., X an R^k-valued r.v., and Z an R^m-valued r.v. on a probability space (Ω, A, P). Then
(i) E[E[Y|X]] = E[Y],
(ii) E[E[Y|X, Z]|Z] = E[Y|Z] a.s.,
(iii) E[E[Y|X]|X, Z] = E[Y|X] a.s.,
(iv) E[E[Y|X]|f(X)] = E[Y|f(X)] a.s.,
(v) E[Y f(X)|X] = f(X) E[Y|X] a.s., where f is an R-valued function such that E[f²(X)] + E[(Y f(X))²] < ∞,
(vi) E[Y|X, f(X)] = E[Y|X] a.s.
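A short simulation sketch (model and numbers invented for illustration) makes (i) and (v) concrete for Y = XZ + ε with X, Z, ε independent, where E[Y|X] = X · E[Z]:

```python
import numpy as np

# Y = X*Z + eps with X, Z, eps independent, so E[Y | X] = X * E[Z].
rng = np.random.default_rng(2)
n = 1_000_000
X = rng.uniform(1.0, 2.0, n)
Z = rng.exponential(1.0, n)          # E[Z] = 1
Y = X * Z + rng.normal(0.0, 1.0, n)

cond_mean = X * 1.0                  # closed form: E[Y | X] = X * E[Z]
print(Y.mean(), cond_mean.mean())    # (i): both approx E[X] = 1.5
# (v) after taking expectations: E[Y f(X)] = E[f(X) E[Y|X]] for f(x) = x^2.
print((Y * X**2).mean(), (X**2 * cond_mean).mean())   # both approx 3.75
```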


Remarks:
(i)-(iv): when expectations are iterated, the coarser (less informative) conditioning set prevails.
(v): Conditionally on X, f(X) can be treated like a constant and pulled out of the conditional expectation.
(vi): Redundant information can be dropped.

Example 3.9 (Application of (vi)). Consider a model equation for the wage in terms of education and experience:

E[wage | educ, exper, educ², educ · exper]
= β0 + β1 educ + β2 exper + β3 educ · exper + β4 educ²
= E[wage | educ, exper] a.s.

Thus, it is redundant to also condition on educ² and educ · exper.


Theorem 3.10 (Properties of conditional expectation) Suppose that Y1, Y2 are square-integrable real-valued random variables, X is an R^k-valued random variable on a probability space (Ω, A, P) and a1, a2 are scalars. Then
(i) E[a1 Y1 + a2 Y2 | X] = a1 E[Y1|X] + a2 E[Y2|X] a.s.
(ii) If Y1 ≤ Y2, then E[Y1|X] ≤ E[Y2|X] a.s.
(iii) (E[Y1 Y2 | X])² ≤ E[Y1²|X] E[Y2²|X] a.s. (Cauchy-Schwarz inequality), provided E[Yi⁴] < ∞.
(iv) For any ε > 0 and E[Y⁴] < ∞,

P(|Y| ≥ ε | X) ≤ E[Y²|X] / ε² a.s.

Note that the moment conditions for Y and the Yi could be relaxed.



A function ρ : (a, b) → R is convex iff for all x1, x2 ∈ (a, b) and all α ∈ (0, 1):

ρ(αx1 + (1 − α)x2) ≤ αρ(x1) + (1 − α)ρ(x2).

If ρ is twice differentiable, then ρ is convex iff ρ″(x) ≥ 0 on (a, b).

Theorem 3.11 (Properties of conditional expectation)
Let E[Y²] < ∞ and X be R^k-valued. Then
(i) If ρ : R → R is convex and E[ρ(Y)²] < ∞, then

ρ(E[Y|X]) ≤ E[ρ(Y)|X] a.s.

(Jensen's inequality).
(ii) 0 ≤ Yn ↑ Y =⇒ E[Yn|X] ↑ E[Y|X] a.s. (monotone convergence).


3.3 Conditional Variances

Definition 3.12 For a real-valued random variable Y (with E[Y⁴] < ∞) and an R^k-valued random variable X on a probability space (Ω, A, P) a conditional variance of Y given X is defined as

Var[Y|X] = E[(Y − E[Y|X])² | X].

Lemma 3.13 Under the conditions of the Definition,
(i) Var[a(X)Y + b(X) | X] = a²(X) Var[Y|X] a.s., where a and b are measurable functions such that a(X)Y + b(X) satisfies the assumptions of the Definition,
(ii) Var[c|X] = 0 a.s. for any constant c, and Var[a(X)|X] = 0 a.s.,
(iii) If X, Y are independent, then Var[Y|X] = Var[Y] a.s.


Theorem 3.14 Under the conditions of the Definition,
(i) Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) (variance decomposition).
(ii) E[Var(Y|X)] ≥ E[Var(Y|X, Z)] for an additional R^l-valued random variable Z on the same space.

If Y is a vector with values in R^k, then

Var[Y|X] = E[(Y − E[Y|X])(Y − E[Y|X])′ | X]

is the conditional covariance matrix of Y given X. It holds for A ∈ R^{m×k} that

Var[AY|X] = A Var[Y|X] A′. (17)
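The decomposition in (i) is easy to verify numerically. The following sketch (distributions chosen arbitrarily for illustration) uses Y | X ∼ N(2X, X²) with X ∼ Uniform(1, 3), so E[Y|X] = 2X and Var(Y|X) = X²:

```python
import numpy as np

# Check Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) for Y | X ~ N(2X, X^2),
# X ~ Uniform(1, 3): E[Y|X] = 2X, Var(Y|X) = X^2.
rng = np.random.default_rng(3)
n = 2_000_000
X = rng.uniform(1.0, 3.0, n)
Y = 2 * X + X * rng.normal(size=n)

lhs = Y.var()
rhs = (X**2).mean() + (2 * X).var()   # E[Var(Y|X)] + Var(E[Y|X])
print(lhs, rhs)                       # both approx 13/3 + 4/3 = 17/3
```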


Summary

Important concepts and statements:


definition and equivalent characterization of a conditional expectation
of Y given X, E [Y |X ],
conditional distribution of Y given X,
formulas for the computation of E[Y|X = x] for jointly discrete and jointly continuous X and Y,
laws for E [Y |X ],
definition and laws for Var [Y |X ], the conditional variance of Y given
X.


Chapter 4: Linear regression

1 The classic model


2 Parameter estimation: finite sample properties
3 Parameter estimation: asymptotic properties
4 Hypothesis tests in the classical linear regression model


4.1 The classic model


Definition 4.1 A (multiple) linear regression model based on n observations (Yi, Xi′), Xi′ = (Xi,1, . . . , Xi,K), i = 1, . . . , n, with (unknown) regression coefficients β1, . . . , βK is given by

Yi = β1 Xi,1 + · · · + βK Xi,K + εi, i = 1, . . . , n.

Matrix notation:

Y = Xβ + ε; (*)

X is called the design matrix; the εi are unobserved. Here Y = (Y1, . . . , Yn)′, β = (β1, . . . , βK)′, ε = (ε1, . . . , εn)′, and X is the n × K matrix with rows Xi′ = (Xi,1, . . . , Xi,K).


Classical linear regression model: Model assumptions I

Let Y and X satisfy the model (*) for some β ∈ R^K.

Model assumptions I:
1 n > K.
2 P(rank(X) = K) = 1 (no multicollinearity).
3 E[ε|X] = 0 (strict exogeneity).
4 Var[ε|X] = σ²In (homoscedasticity).
Consequently,

E[Y|X] = Xβ, Var[Y|X] = σ²In.

Remember: conditional expectations are only a.s. unique.


In most cases an intercept, i.e. a constant, is included in the model, i.e. Xi,1 = 1 for i = 1, . . . , n, and we get

Yi = β1 + β2 Xi,2 + · · · + βK Xi,K + εi, i = 1, . . . , n.

Let A be an n × K matrix with n > K and z ∈ R^K. Then

rank(A) = K ⇐⇒ (Az = 0 =⇒ z = 0) ⇐⇒ det(A′A) ≠ 0,
rank(A) < K ⇐⇒ ∃z ≠ 0 : Az = 0.

If rank(A) < K, then one column of A can be written as a linear combination of the other columns. If rank(X) < K in the linear model, then the parameter β is not uniquely specified by the model equation (*).


4.2 Parameter estimation: finite sample properties

Note that

E[Yi|X] = Xi′β a.s.

Hence for a given choice of β the prediction for Yi given Xi would be Xi′β, and

ei = Yi − β′Xi

is the prediction error or residual. Therefore,

Q(β) = Σ_{i=1}^n ei² = Σ_{i=1}^n (Yi − β′Xi)² = (Y − Xβ)′(Y − Xβ)

becomes small if the prediction errors are small.


Notation: Let f : R^d → R^m, i.e. f(x) = (f1(x), . . . , fm(x))′ and x = (x1, . . . , xd)′. Then the m × d matrix of partial derivatives

∂f/∂x (x0) = ( ∂fi/∂xj (x)|_{x=x0} )_{1≤i≤m, 1≤j≤d}

is the derivative of f w.r.t. x.

If f has a local minimum or maximum at x0 and is differentiable at x0, then ∂f/∂x (x0) = 0.

Let A ∈ R^{m×d} and B ∈ R^{d×d} be matrices. It can be shown:

∂/∂x (Ax) = A,   ∂/∂x (x′Bx) = x′(B + B′).
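These rules deliver the OLS first-order condition directly. The derivation below is a standard computation written out for reference (notation as above; it is not verbatim from the slides):

```latex
% Minimizing Q(beta) = (Y - X beta)'(Y - X beta) over beta:
\begin{align*}
Q(\beta) &= Y'Y - 2\beta'X'Y + \beta'X'X\beta,\\
\frac{\partial Q}{\partial \beta}
  &= -2Y'X + \beta'\bigl(X'X + (X'X)'\bigr)
   = -2Y'X + 2\beta'X'X \overset{!}{=} 0\\
\Longrightarrow\quad X'X\beta &= X'Y \qquad \text{(normal equations)},\\
\widehat\beta_{\mathrm{OLS}} &= (X'X)^{-1}X'Y
  \qquad \text{if } \det(X'X)\neq 0.
\end{align*}
```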


OLS estimator

Definition 4.2 In the classical linear regression model the OLS (ordinary least squares) estimator is defined by

β̂_OLS = argmin_{β ∈ R^K} (Y − Xβ)′(Y − Xβ).

Then the OLS-fitted y-values are

Ŷ = X β̂_OLS,  Ŷi = Xi′ β̂_OLS,

and the OLS residuals are

ê = Y − Ŷ,  êi = Yi − Ŷi.

Notation: 1n = (1, . . . , 1)′ ∈ R^n.


Theorem 4.3 If det(X′X) ≠ 0, then

β̂_OLS = (X′X)⁻¹ X′Y.

To emphasize the dependence of β̂_OLS on the sample size n we may write β̂_{n,OLS}.

Lemma 4.4 Under the conditions of Theorem 4.3,

X′ê = 0 (normal equations).

If the model contains a constant, e.g. Xi,1 = 1 for i = 1, . . . , n, then 1n′ ê = 0.
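A compact numerical sketch (simulated data, arbitrary coefficients) computes β̂_OLS as in Theorem 4.3 and confirms the normal equations of Lemma 4.4:

```python
import numpy as np

# OLS on simulated data; beta and the design are invented for illustration.
rng = np.random.default_rng(4)
n, beta = 500, np.array([1.0, 2.0, -0.5])
X = np.column_stack([np.ones(n),                  # constant: X_{i,1} = 1
                     rng.normal(size=(n, 2))])
Y = X @ beta + rng.normal(0.0, 1.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)      # (X'X)^{-1} X'Y
e_hat = Y - X @ beta_hat
print(beta_hat)      # close to (1.0, 2.0, -0.5)
print(X.T @ e_hat)   # approx 0: normal equations, including 1_n' e_hat = 0
```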


A symmetric m × m matrix A is called positive semi-definite, in signs A ≥ 0, if x′Ax ≥ 0 for all x ∈ R^m. For two symmetric matrices A, B of the same dimension,

A ≥ B :⇔ A − B ≥ 0.

An estimator β̂ is called
linear iff β̂ = AY for some K × n matrix A = A(X). (A may depend on X, but not on Y.)
conditionally unbiased iff E[β̂|X] = β.
BLUE (best linear unbiased estimator) iff β̂ is linear and unbiased and

Var[β̃ | X] ≥ Var[β̂ | X]

for any other linear and unbiased estimator β̃.


Theorem 4.5 (Gauss-Markov theorem) In the classical linear regression model the OLS estimator is BLUE if Var[β̂_OLS|X] is finite.

If the linear model contains a constant, then

Σ_{i=1}^n (Yi − Ȳn)² = Σ_{i=1}^n (Ŷi − Ȳn)² + Σ_{i=1}^n êi²

(total variability of Y = variability of regression + variability of residuals).

The coefficient of determination is then defined by

R² = Σ_{i=1}^n (Ŷi − Ȳn)² / Σ_{i=1}^n (Yi − Ȳn)² = 1 − Σ_{i=1}^n êi² / Σ_{i=1}^n (Yi − Ȳn)² ∈ [0, 1].

R² = 1 iff êi = 0 for all i.



Estimation of σ²
Definition 4.6 If n > K, the OLS estimate of the variance σ² > 0 is given by

σ̂²_OLS = σ̂²_{n,OLS} = ê′ê / (n − K),

and √(σ̂²_OLS) is called the standard error of the regression (SER).
Note: For a square matrix C = (c_{i,j})_{1≤i,j≤m}, tr(C) = Σ_{i=1}^m c_{i,i} is the trace of C. It holds for a matrix Z = (Z_{i,j})_{1≤i,j≤m} of r.v. and matrices A ∈ R^{m×k} and B ∈ R^{k×m}:

E[tr(Z)] = tr(E[Z]),  tr(AB) = tr(BA).

Theorem 4.7 In the classical linear regression model σ̂²_OLS is a (conditionally) unbiased estimator for σ².
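The sketch below (simulated data, invented coefficients) computes R² both ways and the unbiased variance estimate of Definition 4.6:

```python
import numpy as np

# R^2 and the OLS variance estimate on simulated data (true sigma = 1.5).
rng = np.random.default_rng(5)
n, K = 400, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
Y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(0.0, 1.5, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
e_hat = Y - Y_hat

tss = np.sum((Y - Y.mean())**2)
r2_a = np.sum((Y_hat - Y.mean())**2) / tss   # first formula
r2_b = 1 - np.sum(e_hat**2) / tss            # second formula, same value
sigma2_hat = e_hat @ e_hat / (n - K)         # approx sigma^2 = 2.25
print(r2_a, r2_b, sigma2_hat)
```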


4.3 Parameter estimation: asymptotic properties


Classical linear model: Model assumptions II

For the model

Yi = Xi′β + εi, i = 1, . . . , n.

Model assumptions II:
1 n > K.
2 P(rank(X) = K) = 1 (no multicollinearity).
3 E[εi|Xi] = 0 a.s. (strict exogeneity).
4 E[εi²|Xi] = σ² a.s. (homoscedasticity).
5 (X1′, ε1), . . . , (Xn′, εn) are i.i.d.


Theorem 4.8 Let E[Y²] < ∞ and X = (X1, . . . , Xk)′. Then g(X) = E[Y|X] iff

E[Y 1(X1 ∈ B1) · · · 1(Xk ∈ Bk)] = E[g(X) 1(X1 ∈ B1) · · · 1(Xk ∈ Bk)]

for all Borel sets B1, . . . , Bk. This is a weaker version of Theorem 3.2(iv).

Proposition 4.9 The model assumptions II imply

E[ε|X] = 0 and Var[ε|X] = σ²In.


Theorem 4.10 (Consistency and asymptotic normality of the OLS estimator) In the classical linear regression model with model assumptions II we assume that E[X1 X1′] is finite and invertible. Then

β̂_{n,OLS} → β in probability

and

√n (β̂_{n,OLS} − β) → Z ∼ N(0_K, Σ) in distribution, with Σ = σ² (E[X1 X1′])⁻¹.

Consequently,

√n (β̂_{n,OLS,k} − βk) → Zk ∼ N(0, Σ_{k,k}) in distribution.
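A Monte Carlo sketch (sample size, number of replications and model all invented for illustration) shows the theorem at work: across replications the standardized error √n(β̂_{n,OLS,k} − βk) behaves like a N(0, Σ_{k,k}) draw:

```python
import numpy as np

# Monte Carlo for sqrt(n)*(beta_hat_k - beta_k); here E[X1 X1'] = I_2 and
# sigma = 1, so Sigma_{2,2} = 1: mean approx 0, std approx 1.
rng = np.random.default_rng(6)
n, reps, beta = 200, 5000, np.array([1.0, 0.5])
draws = np.empty(reps)
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ beta + rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    draws[r] = np.sqrt(n) * (beta_hat[1] - beta[1])

print(draws.mean(), draws.std())
```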


Confidence intervals
Consider a statistical experiment with parameter space Θ and a parameter of interest τ = τ(θ).
Definition 4.11 Let Ln, Un be r.v. (not depending on θ) and α ∈ (0, 1). [Ln, Un] is called an asymptotic (1 − α)-confidence interval for τ iff

lim inf_{n→∞} Pθ(Ln ≤ τ ≤ Un) ≥ 1 − α for all θ ∈ Θ.

Lemma 4.12 Let √n(T̂n − τ) → Z ∼ N(0, σ²) in distribution and Ŝn → σ in probability. Then

[T̂n − z_{1−α/2} Ŝn/√n, T̂n + z_{1−α/2} Ŝn/√n]

is an asymptotic (1 − α)-confidence interval for τ.


Here zβ denotes the β-quantile of the standard normal distribution, i.e. Φ(zβ) = β for β ∈ (0, 1).

Theorem 4.13 Under the conditions of Theorem 4.10,

σ̂²_{n,OLS} → σ² in probability as n → ∞.

A consequence of the continuous mapping theorem, Theorem 2.20:

Lemma 4.14 Let Zn = (Z_{n,i,j})_{1≤i,j≤m} be matrices with random entries and Zn → A in probability for a matrix A with det(A) ≠ 0. Define Zn⁻ = Zn⁻¹ if det(Zn) ≠ 0 and Zn⁻ = Im else. Then Zn⁻ → A⁻¹ in probability.


By the law of large numbers

(1/n) X′X = (1/n) Σ_{i=1}^n Xi Xi′ → E[X1 X1′] in probability.

Consequently,

Σ̂n = σ̂²_{n,OLS} · ((1/n) X′X)⁻ → σ² · (E[X1 X1′])⁻¹ = Σ in probability.

By Lemma 4.12,

[β̂_{n,OLS,k} − z_{1−α/2} √(Σ̂_{n,k,k}/n), β̂_{n,OLS,k} + z_{1−α/2} √(Σ̂_{n,k,k}/n)]

is an asymptotic (1 − α)-confidence interval for βk.
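Putting the pieces together, here is a sketch of the interval for the slope in a simulated model (parameter values and seed arbitrary; scipy's norm.ppf supplies z_{1−α/2}):

```python
import numpy as np
from scipy.stats import norm

# Asymptotic 95% confidence interval for beta_k with
# Sigma_hat = sigma2_hat * (X'X / n)^{-1}, as on this slide.
rng = np.random.default_rng(7)
n, K, beta = 500, 2, np.array([1.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e_hat = Y - X @ beta_hat
sigma2_hat = e_hat @ e_hat / (n - K)
Sigma_hat = sigma2_hat * np.linalg.inv(X.T @ X / n)

k, z = 1, norm.ppf(0.975)
half = z * np.sqrt(Sigma_hat[k, k] / n)
print(beta_hat[k] - half, beta_hat[k] + half)   # covers 0.5 in ~95% of runs
```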


4.4 Hypothesis tests in the classical linear regression model

Example 4.15 A company delivers packages of pasta to the canteen of the University of Mannheim and claims that the weight of a randomly chosen package is N(5, 0.5)-distributed. Based on a sample of size n we intend to decide whether
(a) the expected weight is at least 5,
(b) the assumption of normality is justified.
Question: How can we decide these problems properly?

Remark:
Tests to decide (a) are called parameter tests.
Tests to decide (b) are called goodness-of-fit tests.


Statistical tests

Let En = (R^{mn}, B^{mn}, {P^{Zn}_θ : θ ∈ Θ}) be a statistical experiment for some r.v. Zn with parameter space Θ, cf. p. 91.

Based on our data Zn we aim to decide a testing problem of the following form:

H0 : θ ∈ Θ0 ⊆ Θ vs. H1 : θ ∈ Θ1 = Θ\Θ0. (*)

Here, H0 is called the null hypothesis and H1 is referred to as the alternative (hypothesis).

E.g. if Θ0 = {θ0}, (*) can be rewritten as

H0 : θ = θ0 vs. H1 : θ ≠ θ0.


Definition 4.16 Let there be given a statistical experiment En and a testing problem (*). A (measurable) function ϕ : R^{mn} → {0, 1} is called a (non-randomized) (statistical) test if

ϕ(z) = 0 if Zn = z implies acceptance of H0,
ϕ(z) = 1 if Zn = z implies rejection of H0.

Often tests are given in the form

ϕ(Zn) = 0 if Tn ≤ c, and ϕ(Zn) = 1 if Tn > c.

Then Tn = g(Zn) is called the test statistic and c the critical value of the test ϕ.


Definition 4.17 Let ϕ be a test to decide the problem (*).
(i) A type I error (error of first kind) occurs when H0 is true but rejected.
(ii) A type II error (error of second kind) occurs when H1 is true but H0 is accepted (i.e. H1 is rejected).

Decision scheme:
                    Decision for H0    Decision for H1
H0 is true          correct            type I error
H1 is true          type II error      correct


Definition 4.18 In the set-up of Definition 4.17,
(i) a test ϕ is called an α-test if

Eθ[ϕ(Zn)] = Pθ(ϕ(Zn) = 1) ≤ α for all θ ∈ Θ0.

(ii) a sequence of tests (ϕn)n is called consistent if

Pθ(ϕn(Zn) = 1) → 1 as n → ∞ for all θ ∈ Θ1.

(iii) a sequence of tests (ϕn)n is called an asymptotic α-test if

lim sup_{n→∞} Pθ(ϕn(Zn) = 1) ≤ α for all θ ∈ Θ0.


Test in the classic linear regression model

In the linear regression model (Definition 4.1),

Yi = β1 Xi,1 + · · · + βK Xi,K + εi , i = 1, . . . , n,

we might want to test:

H0 : βj = 0 or H0 : β2 = · · · = βK = 0.

More generally, we want to test the following null hypothesis

H0 : Rβ = θ0 vs. H1 : Rβ 6= θ0

for some prescribed (r × K )-matrix R of rank r and a prescribed


r -dimensional vector θ0 .


Example 4.19
This general hypothesis covers several interesting special cases.
1 H0 : βk = 0 with R = (0, . . . , 0, 1, 0, . . . , 0) (the 1 in the k-th position) and θ0 = 0.
2 H0 : β1 = β2 with R = (1, −1, 0, . . . , 0) and θ0 = 0.
3 H0 : β1 + β2 + β3 = 1 with R = (1, 1, 1, 0, . . . , 0) and θ0 = 1.
4 H0 : β1 = β2 = β3 = 0 with the 3 × K matrix

R = ( 1 0 0 0 . . . 0
      0 1 0 0 . . . 0
      0 0 1 0 . . . 0 )

and θ0 = (0, 0, 0)′.


Ideas to proceed:
Put τ = Rβ − θ0, i.e. H0 is true iff τ = 0_r (equivalently, ‖τ‖ = 0).
Estimate τ by τ̂n = R β̂_{n,OLS} − θ0.
Find some distance function d : R^r → [0, ∞) s.th. d(0_r) = 0 and d(x) is "large" if ‖x‖ is "large".
Decision rule: reject H0 if Tn = d(τ̂n) > c (is "large").
Determine c s.th. the decision rule becomes an α-test.

For the last step we need to specify the distribution of d(τ̂n), at least approximately.


χ²-distribution
Definition 4.20 X∗ is χ²-distributed with k degrees of freedom, k ∈ N, k ≥ 1, if X∗ is continuous with density

f_{χ²_k}(x) = 1/(2^{k/2} Γ(k/2)) · x^{k/2−1} exp(−x/2) · 1_{[0,∞)}(x),

where Γ denotes the so-called Gamma function, defined by

Γ(a) = ∫_0^∞ x^{a−1} e^{−x} dx, a > 0.

Let F_{χ²_k} denote the CDF of X∗ and χ²_{k,α} its α-quantile, α ∈ (0, 1), i.e. F_{χ²_k}(χ²_{k,α}) = α.

Note (see next page): If X1, . . . , Xk ∼ N(0, 1) are i.i.d., then

Σ_{i=1}^k Xi² ∼ χ²(k).


Multivariate normal distribution

Let Z ∼ N(µ, Σ) be a multivariate normally distributed k-dimensional random vector, cf. p. 90, with det(Σ) ≠ 0, i.e. Z is continuous with density

φ(z) = ((2π)^k det Σ)^{−1/2} exp(−0.5 (z − µ)′ Σ⁻¹ (z − µ)), z ∈ R^k.

Proposition 4.21
Let Z ∼ N(µ, Σ), a ∈ R^m and B ∈ R^{m×k}. Then:
1 a + BZ ∼ N(a + Bµ, BΣB′).
2 Z = (Z1, . . . , Zk)′ ∼ N(0_k, I_k) iff Z1, . . . , Zk ∼ N(0, 1) are i.i.d.
3 If µ = 0, then Z′Σ⁻¹Z ∼ χ²(k).

See Jacod, Protter (2000), Chapter 16.


Convergence to infinity
Let (Zn)n be r.v. with values in R^k. (Zn) converges in probability to ∞ if for all C > 0

lim inf_{n→∞} P(‖Zn‖ > C) = 1.

Note: If (zn) is a sequence of non-random vectors with ‖zn‖ → ∞, then zn → ∞ in probability holds as well. For the notation A⁻ see Lemma 4.14.

Lemma 4.22 Let Xn, Zn be r.v. with values in R^k and Σ̂n, Σ k × k matrices, Σ̂n with random entries.
(a) If Zn → ∞ in probability and Xn = O_P(1), then Xn + Zn → ∞ in probability.
(b) If Zn → ∞ in probability and Σ̂n → Σ > 0 in probability, then Zn′ Σ̂n⁻ Zn → ∞ in probability.


Wald test for H0 : Rβ = θ0
Put

Tn = (R β̂_{n,OLS} − θ0)′ (R (X′X)⁻ R′)⁻¹ (R β̂_{n,OLS} − θ0) / σ̂²_{n,OLS}.

Definition 4.23 A Wald test of level α ∈ (0, 1) for H0 : Rβ = θ0 based on n observations Zi = (Yi, Xi,1, . . . , Xi,K)′, i = 1, 2, . . . , n, is given by

ϕn((Z1, . . . , Zn)) = 1, if Tn > χ²_{r,1−α}; 0, else, (**)

where χ²_{r,1−α} denotes the (1 − α)-quantile of the χ²_r distribution.

Theorem 4.24 Suppose the conditions of Theorem 4.10 hold. Then (ϕn)n from (**) is an asymptotic α-test and consistent.
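A sketch of the test on simulated data (design, coefficients and seed invented; scipy's chi2.ppf supplies the critical value) for H0 : β2 = β3 = 0, i.e. r = 2 restrictions, in a model where H0 actually holds:

```python
import numpy as np
from scipy.stats import chi2

# Wald test of Definition 4.23 for H0: beta_2 = beta_3 = 0 (true here),
# so rejections should occur in roughly alpha of repeated samples.
rng = np.random.default_rng(8)
n, alpha = 500, 0.05
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 0.0, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e_hat = Y - X @ beta_hat
sigma2_hat = e_hat @ e_hat / (n - X.shape[1])

R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])                 # r = 2 restrictions
theta0 = np.zeros(2)
tau_hat = R @ beta_hat - theta0
M = R @ np.linalg.inv(X.T @ X) @ R.T            # R (X'X)^{-1} R'
Tn = tau_hat @ np.linalg.solve(M, tau_hat) / sigma2_hat
crit = chi2.ppf(1 - alpha, df=2)                # chi^2_{r, 1-alpha}
print(Tn, crit, Tn > crit)                      # reject iff Tn > crit
```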


Summary

Important concepts and statements:
In the classical linear model we assume a linear relationship between Yi and Xi, and strict exogeneity and homoscedasticity of the error terms εi.
β is estimated by the method of ordinary least squares. β̂_OLS is linear, conditionally unbiased and optimal in the sense of the Gauss-Markov theorem.
Under the model assumptions II, β̂_OLS is consistent and asymptotically normally distributed.
Important concepts: test, test statistic, α-test, type I and II errors, consistency of tests.
The Wald test for H0 : Rβ = θ0 is an asymptotic α-test and consistent.
