
Statistics and Econometrics II

Jean-Paul Renne

HEC Lausanne
Organisational issues

- Different types of sessions:
  - Lectures (#19)
  - Exercise sessions (#7)
  - Mid-term exam + correction (#2)

- Teaching Assistants: Giacomo Mangiante and Elena Sudan.

- Moodle page.

- Evaluation:
  - Two exams: mid-term and final exam.
  - Mid-term exam: 40% of the total grade if it is higher than the grade obtained at the final exam (0% otherwise).
  - Pen-and-paper exams.
  - A4 sheet (home-made) with pertinent information is allowed; calculator not allowed.

Outline of course

1. Key Concepts (revisions)
2. Central Limit Theorem
3. Tests
4. Linear Regressions
5. Panel Regressions
6. Maximum Likelihood Estimation
7. Binary-Choice Models
8. Introduction to Time Series

Key Concepts (revisions)

Definition 1.1 (Cumulative distribution function (c.d.f.))

The random variable (r.v.) X admits the cumulative distribution function F if, for all a:

    F(a) = P(X ≤ a), ∀a.

Definition 1.2 (Probability density function (p.d.f.))

A continuous random variable X admits the probability density function f if:

    P(a < X ≤ b) = ∫_a^b f(x) dx, ∀a ≤ b,

where f(x) ≥ 0 for all x.

- In particular (if this limit exists), we have:

    f(x) = lim_{ε→0} P(x < X ≤ x + ε)/ε = lim_{ε→0} [F(x + ε) − F(x)]/ε,    (1)

  and

    F(a) = ∫_{−∞}^a f(x) dx.

Definition 1.3 (Joint cumulative distribution function (c.d.f.))

The random variables X and Y admit the joint cumulative distribution function F_XY if, for all a and b:

    F_XY(a, b) = P(X ≤ a, Y ≤ b).

Definition 1.4 (Joint probability density function (p.d.f.))

The continuous random variables X and Y admit the joint p.d.f. f_XY, where f_XY(x, y) ≥ 0 for all x and y, if:

    P(a < X ≤ b, c < Y ≤ d) = ∫_a^b ∫_c^d f_XY(x, y) dy dx, ∀a ≤ b, c ≤ d.

- In particular (if this limit exists), we have:

    f_XY(x, y) = lim_{ε→0} P(x < X ≤ x + ε, y < Y ≤ y + ε)/ε²
               = lim_{ε→0} [F_XY(x + ε, y + ε) − F_XY(x, y + ε) − F_XY(x + ε, y) + F_XY(x, y)]/ε².

Figure 1: Joint normal distribution: c.d.f. illustration

[3-D surface plot of the joint density of (X, Y), for X and Y ranging over [−3, 3].]

- The volume between the horizontal plane (z = 0) and the surface is equal to F_XY(0.5, 1).

Figure 2: Joint normal distribution

[3-D surface plot of the joint density of (X, Y), with a black column of base ε × ε.]

- Assume that the basis of the black column is defined by those points whose x-coordinates are between x and x + ε and y-coordinates are between y and y + ε.
  ⇒ Then the volume of the black column is equal to P(x < X ≤ x + ε, y < Y ≤ y + ε), which is approximately equal to f_XY(x, y)ε² if ε is small.

Definition 1.5 (Conditional probability distribution function)

If X and Y are continuous r.v., then the distribution of X conditional on Y = y is denoted by f_{X|Y}(x, y) and satisfies:

    f_{X|Y}(x, y) = lim_{ε→0} P(x < X ≤ x + ε | Y = y)/ε.

Proposition 1.1 (Conditional probability distribution function)

We have:

    f_{X|Y}(x, y) = f_XY(x, y)/f_Y(y).

Proof We have:

    f_{X|Y}(x, y) = lim_{ε→0} (1/ε) P(x < X ≤ x + ε | Y = y)
                  = lim_{ε→0} (1/ε) P(x < X ≤ x + ε | y < Y ≤ y + ε)
                  = lim_{ε→0} (1/ε) P(x < X ≤ x + ε, y < Y ≤ y + ε) / P(y < Y ≤ y + ε)
                  = lim_{ε→0} (1/ε) ε² f_XY(x, y) / (ε f_Y(y)). ∎

Definition 1.6 (Independent random variables)

Consider two r.v., X and Y, with c.d.f.s and p.d.f.s F_X, F_Y, f_X and f_Y. These random variables are independent if and only if (iff) the joint c.d.f. of X and Y (see Def. 1.3) is given by:

    F_XY(x, y) = F_X(x) × F_Y(y),

or, equivalently, iff the joint p.d.f. of (X, Y) (see Def. 1.4) is given by:

    f_XY(x, y) = f_X(x) × f_Y(y).

Remark 1.1
- If X and Y are independent, f_{X|Y}(x, y) = f_X(x), implying, in particular, E(g(X)|Y) = E(g(X)) where g is any function.
- If X and Y are independent, then E(g(X)h(Y)) = E(g(X))E(h(Y)) and Cov(g(X), h(Y)) = 0 where g and h are any functions.
- If X and Y are uncorrelated, they are not necessarily independent [Example: X = Y² − 1 and Y ~ N(0, 1)].

Proposition 1.2 (Law of iterated expectations)

If X and Y are two random variables and if E(|X|) < ∞, we have:

    E(X) = E(E(X|Y)).

Proof (in the case where the p.d.f. of (X, Y) exists) Let us denote by f_X, f_Y and f_XY the probability distribution functions (p.d.f.) of X, Y and (X, Y), respectively. We have:

    f_X(x) = ∫ f_XY(x, y) dy.

Besides, we have (Bayes equality, Prop. 1.1):

    f_XY(x, y) = f_{X|Y}(x, y) f_Y(y).

Therefore:

    E(X) = ∫ x f_X(x) dx = ∫ x (∫ f_XY(x, y) dy) dx = ∫∫ x f_{X|Y}(x, y) f_Y(y) dy dx
         = ∫ (∫ x f_{X|Y}(x, y) dx) f_Y(y) dy = E(E[X|Y]),

where the inner integral ∫ x f_{X|Y}(x, y) dx is E[X|Y = y]. ∎

Example 1.1 (Mixture of Gaussian distributions (1/2))

X is drawn from a mixture of Gaussian distributions if:

    X = B × Y1 + (1 − B) × Y2,

where B, Y1 and Y2 are three independent variables: B ~ Bernoulli(p), Y1 ~ N(µ1, σ1²) and Y2 ~ N(µ2, σ2²).

[Three density plots for parameter values: (p=0.5, µ1=−1, σ1=1, µ2=2, σ2=1); (p=0.1, µ1=−2, σ1=0.5, µ2=2, σ2=1); (p=0.3, µ1=−2, σ1=2, µ2=1, σ2=1).]

The law of iterated expectations gives (using 1{B=1} = B and 1{B=0} = 1 − B):

    E(X) = E(E(X|B)) = E(1{B=1} µ1 + 1{B=0} µ2) = pµ1 + (1 − p)µ2.

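A quick way to make this result concrete is to simulate the mixture and compare the sample mean with pµ1 + (1 − p)µ2. A minimal R sketch, using the first panel's parameter values:

    set.seed(123)
    n  <- 1e6
    p <- 0.5; mu1 <- -1; sigma1 <- 1; mu2 <- 2; sigma2 <- 1  # first panel's values
    B  <- rbinom(n, size = 1, prob = p)        # B ~ Bernoulli(p)
    Y1 <- rnorm(n, mean = mu1, sd = sigma1)    # Y1 ~ N(mu1, sigma1^2)
    Y2 <- rnorm(n, mean = mu2, sd = sigma2)    # Y2 ~ N(mu2, sigma2^2)
    X  <- B * Y1 + (1 - B) * Y2                # mixture draws
    mean(X)                  # sample mean: close to ...
    p * mu1 + (1 - p) * mu2  # ... the theoretical value p*mu1 + (1-p)*mu2 = 0.5
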
Example 1.2 (Buffon (1733)'s needles and π)

Suppose we have a floor made of parallel strips of wood, each the same width [w = 1], and we drop a needle onto the floor. What is the probability that the needle lands on a line? Assumption: needles' length = 1/2.

Let's define the random variable X by

    X = 1 if the needle crosses a line, 0 otherwise.

[Diagram: needle at angle θ, at distance d from the nearest line; strip width w = 1.]

Conditionally on θ:

    E(X|θ) = cos(θ)/2.

θ is uniformly distributed on [−π/2, π/2], therefore:

    E(X) = E(E(X|θ)) = E(cos(θ)/2) = ∫_{−π/2}^{π/2} (1/2) cos(θ) (dθ/π) = 1/π.

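Since E(X) = 1/π, the crossing frequency yields a Monte Carlo estimator of π. A minimal R sketch of the experiment (strip width 1, needle length 1/2, as above; θ is measured relative to the direction perpendicular to the lines, consistent with E(X|θ) = cos(θ)/2):

    set.seed(1)
    n     <- 1e6
    theta <- runif(n, -pi/2, pi/2)  # needle angle w.r.t. the perpendicular to the lines
    d     <- runif(n, 0, 1/2)       # distance from needle centre to the nearest line
    X     <- as.numeric(d <= cos(theta) / 4)  # needle of length 1/2 crosses a line
    mean(X)      # estimate of E(X) = 1/pi
    1 / mean(X)  # Monte Carlo estimate of pi
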
Example 1.3 (Davis and Lo's (2001) model)

Davis and Lo (2001) introduce a contagion model and propose applications in terms of credit-risk management.

They consider n entities. The state of entity i is denoted by d_i: d_i = 0 if entity i stays non-infected and 1 if it gets infected. There are two ways of getting infected:
- direct, or autonomous, infection;
- indirect infection, or contamination.

Direct infection and contamination are modelled through two types of random variables: the X_i's (n of them) and the Y_ji's (n² of them). These n(n + 1) variables are independently Bernoulli distributed. The Bernoulli parameter is p for the X_i's and q for the Y_ji's. We have:

    d_i = X_i + (1 − X_i)(1 − ∏_{j≠i} (1 − X_j Y_ji)),

where the first term captures autonomous infection and the second one contamination.

In order to determine the distribution of the number of infected entities, it is convenient to first reason conditionally on the number of autonomous infections r, defined by r = ∑_i X_i.

Proposition 1.3 (Law of total variance)

If X and Y are two random variables and if the variance of X is finite, we have:

    Var(X) = E(Var(X|Y)) + Var(E(X|Y)).

Proof

    Var(X) = E(X²) − E(X)²
           = E(E(X²|Y)) − E(X)²
           = E(E(X²|Y) − E(X|Y)²) + E(E(X|Y)²) − E(X)²
           = E(E(X²|Y) − E(X|Y)²) + E(E(X|Y)²) − E(E(X|Y))²,

where the first term is E(Var(X|Y)) and the last two terms form Var(E(X|Y)). ∎

Example 1.4 (Mixture of Gaussian distributions (2/2))

Consider the case of a mixture of Gaussian distributions (Example 1.1). We have:

    Var(X) = E(Var(X|B)) + Var(E(X|B))
           = pσ1² + (1 − p)σ2² + p(1 − p)(µ1 − µ2)².

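As for the mean in Example 1.1, this variance formula can be checked by simulation. A minimal R sketch, reusing the third panel's parameter values:

    set.seed(42)
    n <- 1e6
    p <- 0.3; mu1 <- -2; sigma1 <- 2; mu2 <- 1; sigma2 <- 1  # third panel's values
    B <- rbinom(n, 1, p)
    X <- B * rnorm(n, mu1, sigma1) + (1 - B) * rnorm(n, mu2, sigma2)
    var(X)  # sample variance: close to ...
    p * sigma1^2 + (1 - p) * sigma2^2 + p * (1 - p) * (mu1 - mu2)^2  # ... 3.79
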
About consistent estimators

- The objective of econometrics is to estimate parameters out of data observations (samples).
- Except in degenerate cases, the estimates are different from the "true" value.
- A good estimator is expected to "converge" to the true value when the sample size increases: consistency.
- Denote by θ̂_n the estimate of θ based on a sample of length n.
- θ̂ is consistent if, for any ε > 0 (even if very small), the probability that θ̂_n is in [θ − ε, θ + ε] goes to 1 when n goes to ∞:

    lim_{n→+∞} P(θ̂_n ∈ [θ − ε, θ + ε]) = 1.

- That is, θ̂ is a consistent estimator if θ̂_n converges in probability (Def. 9.9) to θ.

Example 1.5

Examples of parameters to be estimated: causal effect of a variable on another, elasticities, parameters defining some distribution of interest, preference parameters (risk aversion)...

Example 1.6 (Example of non-convergent estimator)

- Assume that X_i ~ i.i.d. Cauchy with a location parameter of 1 and a scale parameter of 1 (Def. 9.7).
- The sample mean

    X̄_n = (1/n) ∑_{i=1}^n X_i

  does not converge in probability. (This is because a Cauchy distribution has no mean, hence the law of large numbers cannot apply.)

[Two panels: trajectories of X̄_n against the sample size n, up to n = 5000; the running means keep jumping and never settle.]

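The two panels can be reproduced in a few lines of R; a sketch of one running-mean trajectory:

    set.seed(1)
    n    <- 5000
    x    <- rcauchy(n, location = 1, scale = 1)  # i.i.d. Cauchy draws
    xbar <- cumsum(x) / (1:n)                    # running sample mean
    plot(xbar, type = "l", xlab = "sample size (n)", ylab = "running mean")
    abline(h = 1, lty = 2)  # the location parameter (not a mean: the mean does not exist)
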
Central Limit Theorem

Definition 2.1 (Sample and Population)

In statistics, a sample refers to a set of observations drawn from a population.

Remarks
- Most of the time, it is necessary to use samples for research, because it is impractical to study the whole population.
- Suppose one wants to know the average height of 20-year-old Swiss students: it is not possible to get the heights of all of the 20-year-old students in Switzerland, but it is possible to get them for a sample of them.

Figure 3: Data (red bars) and estimated density of men heights

[Histogram of heights, roughly 160 to 200 cm, with an estimated density curve; y-axis: frequency.]

Source of the heights data: Davis (1990). The distribution is estimated by employing the kernel approach (see extra material on p.d.f. estimation by kernel approach).

- The central limit theorem (CLT) and the law of large numbers (LLN) are (the) two fundamental theorems of probability.
- LLN: The sample mean converges to the population mean (when it exists).
- CLT: The distribution of the sum (or average) of a large number of independent, identically distributed (i.i.d.) variables with finite variance is approximately normal, regardless of the underlying distribution.
- These theorems are at the basis of most statistical procedures.

Example 2.1
- Consider N classes of 20-year-old students. For each of the N classes, we compute the average height of the students. These N values can be seen as draws from an (approximate) Gaussian distribution.

Theorem 2.1 (Law of large numbers)

The sample mean is a consistent estimator of the population mean.

Remarks
- Consistent estimator: convergence in probability (Slide 16, Def. 9.9).
- The assumption of finite variance is not necessary.

Proof. See extra material.

- How many students should we measure to "correctly" estimate the average height µ?
- It depends on the required precision.
- The LLN states that the sample average x̄_n = (1/n) ∑_{i=1}^n x_i converges to µ.
- The CLT tells us about the accuracy of the estimator x̄_n.

Consider Buffon's experiment (Example 1.2)
- What is the minimal number n of needles that one has to throw to get an estimate π̂ (= 1/X̄_n) of π such that there is a 95% chance that π ∈ [π̂ − 0.01, π̂ + 0.01]?
- The central limit theorem will help find an approximate solution.

- How far can we go without the CLT?
- Assume x_i ~ i.i.d. L(µ, σ²), where L(µ, σ²) denotes a distribution of mean µ and variance σ².
- We have:

    E(x̄_n) = µ and Var(µ − x̄_n) = σ²/n → 0 as n → ∞.

- The Chebychev inequality (Thm. 9.1) implies that, for any ε > 0 (even very small):

    P(|x̄_n − µ| > ε) ≤ Var(µ − x̄_n)/ε² = σ²/(nε²) → 0 as n → ∞.

- How does the CLT complement this information?

Theorem 2.2 (Lindeberg-Levy Central limit theorem, CLT)

If x_n is an i.i.d. sequence of random variables with mean µ and variance σ² (∈ ]0, +∞[), then:

    √n (x̄_n − µ) →d N(0, σ²), where x̄_n = (1/n) ∑_{i=1}^n x_i.

["→d" denotes convergence in distribution (see Def. 9.12).]

Proof. See online extra material.

History of CLT
- First version of this theorem: postulated by the mathematician Abraham de Moivre. In an article published in 1733, he used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin.
- This finding was nearly forgotten until Pierre-Simon Laplace rescued it from obscurity in his work Théorie analytique des probabilités, published in 1812.
- Laplace expanded De Moivre's finding by approximating the binomial distribution with the normal distribution.

Figure 4: Distribution of sample means, heights example (see Fig. 3)

[Two densities: the (centered) height distribution and the distribution of the √n-scaled simulated sample means; x-axis: (centered) height; y-axis: frequency.]

Notes: To obtain the black distribution, we simulate a large number (4000) of samples of n = 100 points using the red distribution. For each sample, we compute the sample mean. The black curve is the distribution of the (simulated) sample means, after having multiplied them by √n = 10. (A simulation sketch follows.)

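The construction described in the notes can be sketched in R. Since the Davis (1990) data are not reproduced here, the sketch below uses a N(178, 7²) stand-in for the height distribution (an assumption, for illustration only):

    set.seed(1)
    n.samples <- 4000; n <- 100
    mu <- 178; sigma <- 7  # assumed stand-in for the men-heights distribution
    sample.means <- replicate(n.samples, mean(rnorm(n, mu, sigma)))
    scaled <- sqrt(n) * (sample.means - mu)  # sqrt(n)-scaled, centred sample means
    hist(scaled, breaks = 50, freq = FALSE, xlab = "(centered) height",
         main = "Distribution of scaled sample means")
    curve(dnorm(x, mean = 0, sd = sigma), add = TRUE, col = "red")  # CLT limit N(0, sigma^2)
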
Theorem 2.3 (Multivariate Central limit theorem, CLT)

If x_n is an i.i.d. sequence of m-dimensional random variables with mean µ and variance Σ, where Σ is positive definite and finite, then:

    √n (x̄_n − µ) →d N(0, Σ), where x̄_n = (1/n) ∑_{i=1}^n x_i.

- Whatever the distribution L of the x_i's (it just has to feature a mean µ and a variance σ²), the error between x̄_n and µ can be seen as a random variable that is normally distributed with a mean of zero and a standard deviation of σ/√n.
- We say that the convergence rate is "in √n".
- The good news is: we know a lot about the normal distribution (contrary to L in the general case).
- This allows us to design statistical tests.

Computer-free simulation of N(0, 1)
- Assume you have a lot of time and a coin. How would you approximate the 100 percentiles of the N(0, 1) distribution?

Example 2.2 (Comparison of sample means)

- The question is: Is the average of hourly wage rates the same for men (µ_m) and women (µ_w)?
- Wage sample for men: {m_1, ..., m_{n_m}}.
- Wage sample for women: {w_1, ..., w_{n_w}}.
- CLT:

    √n_m (m̄ − µ_m) ~ N(0, σ_m²) and √n_w (w̄ − µ_w) ~ N(0, σ_w²),

  or

    m̄ ~ N(µ_m, σ_m²/n_m) and w̄ ~ N(µ_w, σ_w²/n_w).

- Because m̄ and w̄ are based on randomly selected samples, they are independent (Def. 1.6) and:

    m̄ − w̄ ~ N(µ_m − µ_w, σ²), with σ² = σ_m²/n_m + σ_w²/n_w.

- In particular, under the assumption that µ_m − µ_w = 0, the probability that m̄ − w̄ is in [−2σ, 2σ] is approximately 0.95.

Example 2.2 (Comparison of sample means (continued))

Figure 5: Distributions of hourly wage rates

[Two panels of estimated densities for men and women: (Case A) education > 12 years; (Case B) education > 18 years; x-axis: hourly wage rate; y-axis: frequency.]

Source of the data: Fox (2008), Canadian Survey of Labour and Income Dynamics, Ontario province.

Example 2.2 (Comparison of sample means (continued))

- Case A (Education > 12 years): We have m̄ − w̄ = 3.44 and √σ² = 0.33.
  - If µ_m − µ_w = 0, P(m̄ − w̄ ∉ [−0.65, 0.65]) ≈ 5%.
  - If µ_m − µ_w = 0, P(m̄ − w̄ ∉ [−0.85, 0.85]) ≈ 1%.
  - If µ_m − µ_w = 0, P(m̄ − w̄ > 3.44) ≈ 0.0000001%; this last probability is the p-value.

- Case B (Education > 18 years): We have m̄ − w̄ = 2.34 and √σ² = 1.22.
  - If µ_m − µ_w = 0, P(m̄ − w̄ ∉ [−2.39, 2.39]) ≈ 5%.
  - If µ_m − µ_w = 0, P(m̄ − w̄ ∉ [−3.14, 3.14]) ≈ 1%.
  - If µ_m − µ_w = 0, P(m̄ − w̄ > 2.34) ≈ 2.7%; this last probability is the p-value.

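The Case A/Case B computations follow the same pattern in R. The sketch below uses hypothetical wage vectors m and w in place of the survey data, which are not reproduced here:

    set.seed(1)
    m <- rnorm(3000, mean = 20, sd = 10)  # hypothetical men's hourly wages
    w <- rnorm(2500, mean = 17, sd = 9)   # hypothetical women's hourly wages
    diff.means <- mean(m) - mean(w)
    sigma <- sqrt(var(m) / length(m) + var(w) / length(w))  # sd of (mbar - wbar)
    c(diff = diff.means, band.5pct = 1.96 * sigma)  # 5% two-sided band under H0
    1 - pnorm(diff.means / sigma)  # one-sided p-value under H0: mu_m = mu_w
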
Tests

Context and objective
- Context: We consider a vector of parameters θ.
- Objective: We want to test a specific hypothesis about θ.

Definition 3.1 (Hypothesis)
- Hypothesis = conjecture about a given property of a population.
- We consider:
  - a null hypothesis (H0: {θ ∈ Θ}) and
  - an alternative hypothesis (H1: {θ ∉ Θ}, i.e. H1: {θ ∈ Θ^c}).

- Researchers may want to find support for the null hypothesis H0 as well as for the alternative hypothesis H1.
- General idea:
  - The null is assumed to be true;
  - Under this assumption, one looks for the distribution of a given statistic;
  - The statistic is computed using observed data;
  - If the obtained statistic is located in some "implausible" parts of the distribution, H0 is rejected; otherwise it is accepted.

- When testing a hypothesis, one can make two types of errors:
  - Type I: Reject H0 while it is true = False Positive (FP).
  - Type II: Accept H0 while it is false = False Negative (FN).

Example 3.1 (A practical illustration of type-I and type-II errors)
- Early warning systems (EWS): approaches designed to warn policy makers of potential future economic and financial crises. See e.g. ECB (2014).
- Researchers look for signals/approaches forecasting crises. H1: A crisis is coming.
- Four cases:
  - True Positive: The approach indicates that a crisis is coming, which turns out to be true.
  - False Positive: The approach indicates that a crisis is coming, which turns out to be false.
  - True Negative: The approach indicates that no crisis is coming, which turns out to be true.
  - False Negative: The approach indicates that no crisis is coming, which turns out to be false.

Definition 3.2 (Size and Power of a test)

For a given test,
- the probability of type-I errors, denoted by α, is called the size (or the significance level) of the test,
- the power of a test is equal to 1 − β, where β is the probability of type-II errors.

- For a given size, we prefer tests with high power. Power = probability that the test will lead to the rejection of the null hypothesis if the latter is false (almost ruling out false negatives).
- Decreasing the size reduces the probability of type-I errors but, often, increases the probability of type-II errors.

- A test involves:
  - a vector of (unknown) parameters (θ),
  - a test statistic (say S) and
  - a critical region (say Γ).
- H0 expresses some constraints on θ, formulated as h(θ) = 0.
- The computation of S is based on a set of observed data Y (i.e. S ≡ S(Y)).
- Decision:
  - H0 is accepted if S ∉ Γ;
  - H0 is rejected if S ∈ Γ.
- Notations:

    α = P(S ∈ Γ | H0)  (proba. of false positive),    (2)
    β = P(S ∉ Γ | H1)  (proba. of false negative).    (3)

Definition 3.3 (Different types of tests)
- Two-tailed test: The region of rejection is on both sides of the sampling distribution (Fig. 66). Typically used if the sampling distribution is symmetrical (e.g. N(0, 1)). Then, the critical region is

    ]−∞, Φ^{-1}(α/2)] ∪ [Φ^{-1}(1 − α/2), +∞[.

- One-tailed test: The region of rejection is on one side of the sampling distribution (Fig. 69). Typically used if the sampling distribution has a support on R+ (e.g. χ²).

Example 3.2 (Factory example (1/2))
- Consider a factory that produces metal cylinders whose diameter has to be equal to 1cm. The tolerance is a = 0.01cm, and more than 90% of the pieces have to satisfy the tolerance for the whole production (say 1,000,000 pieces) to be bought by the client.
- The production technology is such that a proportion θ of the pieces does not satisfy the tolerance.
- θ could be computed by measuring all the pieces but this would be costly.
- Instead, it is decided that n << 1,000,000 pieces will be measured.
- The null hypothesis H0 is θ < 10%.

Example 3.2 (Factory example (2/2))
- Let us denote by d_i the binary indicator defined as:

    d_i = 0 if the diameter of the i-th cylinder is in [1 − a, 1 + a]; 1 otherwise.

- We set x_n = ∑_{i=1}^n d_i, i.e. x_n is the number of measured pieces that do not satisfy the tolerance (out of n).
- A decision rule is: accept H0 if x_n/n ≤ b, reject otherwise.
- In that case:
  - the test statistic is S_n = x_n/n and
  - the critical region is Γ = ]b, 1].
- The probability to reject H0 is:

    P_θ(S_n ∈ Γ) = ∑_{i=bn+1}^{n} C_n^i θ^i (1 − θ)^{n−i}.

Figure 6: Factory example

[Probability to reject H0 as a function of the "true" θ, for N = 100 and thresholds b = 0.05, 0.10, 0.20. To the left of θ = 0.1 ("we would like to accept H0"), the curve height gives the size of the test; to the right ("we would like to reject H0"), it gives the power of the test.]

Figure 6 (continued): Factory example

[Same figure, probability to reject H0, for N = 400.]

Figure 6 (continued): Factory example

[Same figure, probability to reject H0, for N = 1000. A computation sketch follows.]

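The rejection probability has a closed form under the binomial model, so the curves of Figure 6 can be computed exactly. A minimal R sketch for one sample size and one threshold:

    n <- 100; b <- 0.10
    theta <- seq(0, 0.4, by = 0.005)
    # P(reject H0) = P(x_n > n*b), with x_n ~ Binomial(n, theta):
    p.reject <- 1 - pbinom(floor(n * b), size = n, prob = theta)
    plot(theta, p.reject, type = "l",
         xlab = "\"true\" theta", ylab = "Probability to reject H0")
    abline(v = 0.1, lty = 2)  # boundary between H0 (theta < 0.1) and its complement
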
Definition 3.4 (Asymptotic level)

An asymptotic test with critical region Γ_n has an asymptotic level equal to α if:

    sup_{θ∈Θ} lim_{n→∞} P_θ(S_n ∈ Γ_n) = α.

Definition 3.5 (Asymptotically consistent test)

An asymptotic test with critical region Γ_n is consistent if:

    ∀θ ∈ Θ^c, P_θ(S_n ∈ Γ_n) → 1.

Back to the Factory example (Exmpl. 3.2): Asymptotic level

- Because S_n = d̄_n, and since E(d_i) = θ and Var(d_i) = θ(1 − θ), the CLT (Thm 9.6) leads to:

    S_n ~ N(θ, θ(1 − θ)/n),  or  √n (S_n − θ)/√(θ(1 − θ)) ~ N(0, 1).

- Hence, P_θ(S_n ∈ Γ_n) = P_θ(S_n > b) ≈ 1 − Φ(√n (b − θ)/√(θ(1 − θ))).

- θ ↦ 1 − Φ(√n (b − θ)/√(θ(1 − θ))) increases w.r.t. θ, therefore:

    sup_{θ∈Θ=[0,0.1]} P_θ(S_n > b_n) = P_{θ=0.1}(S_n ∈ Γ_n) ≈ 1 − Φ(√n (b_n − 0.1)/0.3).

⇒ If we set b_n = 0.1 + 0.3 Φ^{-1}(1 − α)/√n, then we have sup_{θ∈Θ=[0,0.1]} P_θ(S_n > b_n) ≈ α for large values of n.

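A numerical sketch of this critical value in R, with α = 5% and n = 100:

    alpha <- 0.05
    n     <- 100
    b.n   <- 0.1 + 0.3 * qnorm(1 - alpha) / sqrt(n)  # approximate critical value
    b.n                                              # about 0.149
    # implied (approximate) size at the boundary theta = 0.1:
    1 - pnorm(sqrt(n) * (b.n - 0.1) / 0.3)           # = alpha = 0.05
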
Back to the Factory example (Exmpl. 3.2): Asymptotic consistency
- We proceed under the assumption that θ > 0.1 and we consider b_n = b = 0.1.
- We still have:

    P_θ(S_n ∈ Γ_n) = P_θ(S_n > b) ≈ 1 − Φ(√n (b − θ)/√(θ(1 − θ))).

- Because √n (b − θ)/√(θ(1 − θ)) → −∞ as n → ∞, the Φ(·) term goes to 0 and we have

    P_θ(S_n > b) ≈ 1 − Φ(√n (b − θ)/√(θ(1 − θ))) → 1 as n → ∞.

- Therefore, with b_n = b = 0.1, the test is consistent.

Normality tests

- Let f be the p.d.f. of Y. The k-th standardized moment of Y is defined as:

    ψ_k = µ_k / (√Var(Y))^k,

  where E(Y) = µ and

    µ_k = E[(Y − µ)^k] = ∫_{−∞}^{∞} (y − µ)^k f(y) dy

  is the k-th central moment of Y.
- In particular, µ_2 = Var(Y) = σ², say. Therefore:

    ψ_k = µ_k / (µ_2^{1/2})^k.

- The skewness corresponds to ψ_3 and the kurtosis to ψ_4 (Def. 9.1).

Proposition 3.1 (Normal distribution)

For a Gaussian variable, the skewness (ψ_3) is 0 and the kurtosis (ψ_4) is 3.

Proof For a centered Gaussian distribution, (−y)³f(−y) = −y³f(y). This implies that

    ∫_{−∞}^{∞} y³f(y)dy = ∫_{−∞}^{0} y³f(y)dy + ∫_{0}^{∞} y³f(y)dy = −∫_{0}^{∞} y³f(y)dy + ∫_{0}^{∞} y³f(y)dy = 0,

which leads to the skewness result. Moreover, for a standard Gaussian distribution, df(y)/dy = −y f(y) and therefore d(y³f(y))/dy = 3y²f(y) − y⁴f(y). Partial integration leads to the kurtosis result. ∎

- k-th central sample moment of Y:

    m_k = (1/n) ∑_{i=1}^n (y_i − ȳ)^k.

- k-th standardized sample moment of Y:

    g_k = m_k / m_2^{k/2}.

Proposition 3.2 (Consistency of central sample moments)

If the k-th central moment of Y exists, then the sample central moment m_k is a consistent estimate of the central moment µ_k.

Proposition 3.3 (Asymptotic distribution of g_3 when Y ~ N)

If Y ~ N(µ, σ²), then √n g_3 →d N(0, 6).

Proof See e.g. Lehmann (1999).

Proposition 3.4 (Asymptotic distribution of g_4 when Y ~ N)

If Y ~ N(µ, σ²), then √n (g_4 − 3) →d N(0, 24).

Proposition 3.5 (Joint asymptotic distribution of g_3 and g_4)

Asymptotically, the vector (√n g_3, √n (g_4 − 3)) is bivariate Gaussian. Further, its elements are uncorrelated and therefore independent.

The Jarque-Bera statistic is defined by:

    JB = n (g_3²/6 + (g_4 − 3)²/24) = (n/6) (g_3² + (g_4 − 3)²/4).

Proposition 3.6 (Jarque-Bera asympt. distri.)

If Y is Gaussian, JB →d χ²(2).

Proof This directly derives from Proposition 3.5. ∎

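The statistic is straightforward to compute from a sample. A minimal R sketch, using base R only (jarque.bera is a hypothetical helper name, not a library function):

    jarque.bera <- function(y) {
      n  <- length(y)
      z  <- y - mean(y)
      g3 <- mean(z^3) / mean(z^2)^(3/2)  # 3rd standardized sample moment (skewness)
      g4 <- mean(z^4) / mean(z^2)^2      # 4th standardized sample moment (kurtosis)
      JB <- n * (g3^2 / 6 + (g4 - 3)^2 / 24)
      c(JB = JB, p.value = 1 - pchisq(JB, df = 2))  # asymptotic chi-squared(2) p-value
    }
    set.seed(1)
    jarque.bera(rnorm(500))  # Gaussian sample: small JB, large p-value
    jarque.bera(runif(500))  # uniform sample: normality clearly rejected
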
Figure 7: The distribution of the JB statistic in small samples (Y ~ N)

[Six histograms of the JB statistic for sample sizes n = 5, 10, 20, 50, 100, 500, each compared with the χ²(2) density (red line).]

10,000 samples of size n are simulated (from a normal distribution). For each sample, the JB statistic is computed. The red line shows the χ²(2) density. This figure illustrates the convergence of the JB statistic to the χ²(2) distribution under H0 (the null hypothesis of normality).

Figure 8: The distribution of the JB statistic in small samples (Y ~ U)

[Six histograms of the JB statistic for sample sizes n = 5, 10, 20, 50, 100, 500, each compared with the χ²(2) density (red line).]

10,000 samples of size n are simulated (from a uniform distribution). For each sample, the JB statistic is computed. The red line shows the χ²(2) density. This figure illustrates the absence of convergence of the JB statistic to the χ²(2) distribution under H1.

Linear Regressions and Ordinary Least Squares (OLS)

Example 4.1 (Infant mortality rate and GDP/capita)

Figure 9: Infant mortality rate (deaths per 1000 births) and GDP/capita

[Three scatter plots: infant mortality rate vs GDP per capita; infant mortality rate vs log(GDP per capita); log(infant mortality rate) vs log(GDP per capita).]

Source of the data: United Nations (1998), Social indicators.

Example 4.2 (Income, Seniority and Education)

Figure 10: Income, Seniority and Education

[Scatter plots of Income against Education and against Seniority, and a 3-D scatter plot of Income against both.]

Source of the data: An Introduction to Statistical Learning, with applications in R (Springer, 2013), with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.

Example 4.3 (Advertising and Sales)

Figure 11: Advertising and Sales

[Three scatter plots of Sales against advertising budgets: TV, Radio and Newspaper.]

Source of the data: An Introduction to Statistical Learning, with applications in R (Springer, 2013), with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.

    y_i = x_{i,1} β_1 + ... + x_{i,K} β_K + ε_i,    (4)

where y_i is the regressand, x_{i,k} the k-th regressor, and ε_i the disturbance.

- Using the notation x_i = [x_{i,1}, ..., x_{i,K}]′, we get:

    y_i = x_i′ β + ε_i,

  where y_i and ε_i are scalars and β is (K × 1).
- β is the vector of unknown parameters.
- Regressors are also called independent variables, explicative variables, or covariates.
- We have a sample of n observations for y_i and for x_i.

Remark 4.1
- It is important to distinguish between population quantities (β and ε_i) and their sample estimates (denoted by b and e_i).
- In general we will have:

    y_i = x_i′ β + ε_i = x_i′ b + e_i, but β ≠ b and ε_i ≠ e_i.

Context and Objective
- Context: Eq. (4).
- Objective: Estimate β and test hypotheses based on this vector.

Notations

We will use the notations y, ε and X, where:

    y = [y_1, ..., y_n]′ (n × 1),  ε = [ε_1, ..., ε_n]′ (n × 1),

and X is the (n × K) matrix whose i-th row is x_i′.

Assumptions of the classical linear regression model

Assumption A.1 (Full rank)

There is no exact linear relationship among the independent variables (the x_i's).

Example of exact linear relationship among the x_i's: x_{i,1} = 3x_{i,2} − x_{i,5}.

Assumption A.2 (The conditional mean-zero assumption)

    E(ε|X) = 0.

Implications

Under Assumption A.2:
i) E(ε_i) = 0;
ii) the x_{ij}'s and the ε_i's are uncorrelated, i.e. ∀i, j, Corr(x_{ij}, ε_i) = 0.

Proof i) By the law of iterated expectations (Prop. 1.2): E(ε) = E(E(ε|X)) = E(0) = 0.
ii) E(x_{ij} ε_i) = E(E(x_{ij} ε_i | X)) = E(x_{ij} E(ε_i | X)) = 0, since E(ε_i | X) = 0. ∎

Assumption A.3 (Homoskedasticity)

    ∀i, Var(ε_i|X) = σ².

Figure 12: Heteroskedasticity (left) and homoskedasticity (right)

[Two scatter plots of Y against X.]

X_i ~ N(0, 1) and ε*_i ~ N(0, 1).
Left-hand side (heteroskedasticity): Y_i = 2 + 2X_i + (2·1{X_i<0} + 0.2·1{X_i≥0}) ε*_i.
Right-hand side (homoskedasticity): Y_i = 2 + 2X_i + ε*_i.

Figure 13: Salaries versus years since end of PhD

[Scatter plot of Salary against years since PhD.]

Data from Fox, J. and Weisberg, S. (2011).

Assumption A.4 (Non-correlated residuals)

    ∀i ≠ j, Cov(ε_i, ε_j|X) = 0.

Remark 4.2 (Implication of A.3 and A.4)

If A.3 and A.4 hold, then:

    Var(ε|X) = σ² Id,

where Id is the n × n identity matrix.

Assumption A.5 (Normal distribution)

    ∀i, ε_i ~ N(0, σ²).

About normality

As we will see later (Slide 108), even if this assumption is not satisfied, the normal distribution will show up in asymptotic results (by virtue of the Central Limit Theorem 9.6).

The least squares coefficient vector

- For a given vector of coefficients b, the sum of squared residuals is:

    f(b) = ∑_{i=1}^n (y_i − ∑_{j=1}^K x_{i,j} b_j)² = ∑_{i=1}^n (y_i − x_i′b)².

- Minimizing the sum of squared residuals amounts to minimizing:

    f(b) = (y − Xb)′(y − Xb).

- We have (see Def. 9.14 for the computation of ∂f(b)/∂b):

    ∂f(b)/∂b = −2X′y + 2X′Xb.

- Necessary first-order condition (FOC):

    X′Xb = X′y.    (5)

- Under Assumption A.1, X′X is invertible. Hence:

    b = (X′X)^{-1} X′y.

- b minimises the sum of squared residuals. (f is a non-negative quadratic function; it admits a minimum.) A computation sketch follows.

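The formula can be applied directly; a minimal R sketch on simulated data, checked against lm():

    set.seed(1)
    n <- 200
    X <- cbind(1, rnorm(n), rnorm(n))         # constant + two simulated regressors
    beta <- c(1, 2, -0.5)
    y <- drop(X %*% beta + rnorm(n))
    b <- drop(solve(t(X) %*% X, t(X) %*% y))  # b = (X'X)^{-1} X'y
    rbind(manual = b, lm = coef(lm(y ~ X[, 2] + X[, 3])))  # identical estimates
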
Example 4.4 (Income, Seniority and Education (Example 4.2))

Figure 14: Income, Seniority and Education

[3-D scatter plot of Income against Education and Seniority, with the fitted regression plane.]

                 Estimate   Std. Error   t value   Pr(>|t|)
    (Intercept)   -50.086        5.999    -8.349      0.000
    Education       5.896        0.357    16.513      0.000
    Seniority       0.173        0.024     7.079      0.000

Table 1: Income, Education and Seniority

- The estimated residuals are:

    e = y − X(X′X)^{-1}X′y = My,

  where M := I − X(X′X)^{-1}X′ is called the residual-maker matrix.
- We have MX = 0: if one regresses one of the explanatory variables on X, the residuals are null.
- We have My = Mε (because y = Xβ + ε and MX = 0).
- The fitted values are:

    ŷ = X(X′X)^{-1}X′y = Py,

  where P = X(X′X)^{-1}X′ is a projection matrix. (ŷ is the projection of the vector y onto the vectorial space spanned by the columns of X.)

Remark 4.3
- It can be shown that each column x̃_k of X is orthogonal to e.
  ⇒ If intercepts are included in the regression (x_{i,1} ≡ 1), the average of the residuals is null.

Properties of M and P
- M is symmetric (M = M′) and idempotent (M = M² = M^k for k > 0).
- P is symmetric and idempotent.
- PX = X.
- PM = MP = 0.
- y = Py + My (two orthogonal parts).

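These properties are easy to verify numerically; a minimal R sketch on a simulated design matrix:

    set.seed(1)
    X <- cbind(1, matrix(rnorm(200), nrow = 100))  # 100 x 3 design matrix
    P <- X %*% solve(t(X) %*% X) %*% t(X)          # projection matrix
    M <- diag(100) - P                             # residual-maker matrix
    max(abs(M %*% X))      # ~ 0: MX = 0
    max(abs(P %*% P - P))  # ~ 0: P is idempotent
    max(abs(P %*% M))      # ~ 0: PM = MP = 0
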
Proposition 4.1 (Properties of the OLS estimator)

(i) Under Assumptions A.1 and A.2, the OLS estimator is linear and unbiased.
(ii) Under Assumptions A.1 to A.4, the conditional covariance matrix of b is: Var(b|X) = σ²(X′X)^{-1}.

Proof Under A.1, X′X can be inverted. We have:

    b = (X′X)^{-1}X′y = β + (X′X)^{-1}X′ε.

(i) Let us consider the expectation of the last term, i.e. E((X′X)^{-1}X′ε). Using the law of iterated expectations (Prop. 1.2), we obtain:

    E((X′X)^{-1}X′ε) = E(E[(X′X)^{-1}X′ε | X]) = E((X′X)^{-1}X′ E[ε|X]).

By A.2, we have E[ε|X] = 0. Hence E((X′X)^{-1}X′ε) = 0 and result (i) follows.
(ii) Var(b|X) = (X′X)^{-1}X′ E(εε′|X) X(X′X)^{-1}. By Remark 4.2, if A.3 and A.4 hold, then we have E(εε′|X) = Var(ε|X) = σ² Id. The result follows. ∎

Example 4.5 (Bivariate case (K = 2))
- If K = 2 and if the regression includes a constant (x_{i,1} = 1), X is an n × 2 matrix whose i-th row is [1, x_{i,2}]. Let us use the notation w_i = x_{i,2}:

    X′X = [ n        ∑_i w_i  ]
          [ ∑_i w_i  ∑_i w_i² ],

    (X′X)^{-1} = 1/(n ∑_i w_i² − (∑_i w_i)²) [ ∑_i w_i²  −∑_i w_i ]
                                             [ −∑_i w_i   n       ],

    (X′X)^{-1}X′y = 1/(n ∑_i w_i² − (∑_i w_i)²) [ ∑_i w_i² ∑_i y_i − ∑_i w_i ∑_i w_i y_i ]
                                                [ −∑_i w_i ∑_i y_i + n ∑_i w_i y_i      ]

                  = 1/((1/n) ∑_i (w_i − w̄)²) [ ȳ (1/n) ∑_i w_i² − w̄ (1/n) ∑_i w_i y_i ]
                                              [ (1/n) ∑_i (w_i − w̄)(y_i − ȳ)           ].

- It can be seen that the second element of b = (X′X)^{-1}X′y is:

    b_2 = Cov(W, Y)/Var(W),

  where Cov(W, Y) and Var(W) are sample estimates.
- Since there is a constant in the regression, we have b_1 = ȳ − b_2 w̄.

Theorem 4.1 (Gauss-Markov Theorem)

Under Assumptions A.1 to A.4, for any vector w, the minimum-variance linear unbiased estimator of w′β is w′b, where b is the least squares estimator. (BLUE: Best Linear Unbiased Estimator.)

Proof: Consider b* = Cy, another linear unbiased estimator of β. Since it is unbiased, we must have E(Cy|X) = E(CXβ + Cε|X) = β. We have E(Cε|X) = C E(ε|X) = 0 (by A.2). Therefore b* is unbiased if E(CX)β = β. This has to be the case for any β, which implies that we must have CX = I. Let us compute Var(b*|X). For this, we introduce D = C − (X′X)^{-1}X′, which is such that Dy = b* − b. The fact that CX = I implies that DX = 0. We have Var(b*|X) = Var(Cy|X) = Var(Cε|X) = σ²CC′ (by Assumptions A.3 and A.4, see Remark 4.2). Using C = D + (X′X)^{-1}X′ and exploiting the fact that DX = 0 leads to:

    Var(b*|X) = σ² [D + (X′X)^{-1}X′][D + (X′X)^{-1}X′]′ = Var(b|X) + σ²DD′.

Therefore, we have Var(w′b*|X) = w′Var(b|X)w + σ²w′DD′w ≥ w′Var(b|X)w = Var(w′b|X). ∎

Remark 4.4

Prop. 4.1, Theorems 4.1 and 4.2 (next slide) hold for any sample size.

Notations 4.1

Consider the linear least squares regression of y on X. We introduce the notations:
- b_{y/X}: OLS estimates of β,
- M_X: residual-maker matrix of any regression on X (see Slide 68),
- P_X: projection matrix of any regression on X (see Slide 68).

- Consider the case where we have two sets of explanatory variables: X = [X_1, X_2].
- With obvious notations: b_{y/X} = [b_1′, b_2′]′.

Theorem 4.2 (Frisch-Waugh Theorem)

We have:

    b_2 = b_{M_{X1} y / M_{X1} X_2}.

Proof The minimization of the least squares leads to (these are first-order conditions, see Eq. 5):

    [ X_1′X_1  X_1′X_2 ] [ b_1 ]   [ X_1′y ]
    [ X_2′X_1  X_2′X_2 ] [ b_2 ] = [ X_2′y ].

Use the first-row block of equations to solve for b_1 first; it comes as a function of b_2. Then use the second set of equations to solve for b_2, which leads to:

    b_2 = [X_2′X_2 − X_2′X_1(X_1′X_1)^{-1}X_1′X_2]^{-1} X_2′(Id − X_1(X_1′X_1)^{-1}X_1′)y = [X_2′M_{X1}X_2]^{-1} X_2′M_{X1}y.

Using the fact that M_{X1} is idempotent and symmetric leads to the result. ∎

Remark 4.5

This suggests a second way of estimating b_2 (see the sketch below):
1. Regress y on X_1; regress X_2 on X_1.
2. Regress the former residuals on the latter.

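The two-step procedure of Remark 4.5 can be verified numerically; a minimal R sketch on simulated data:

    set.seed(1)
    n  <- 200
    x1 <- rnorm(n)
    x2 <- 0.5 * x1 + rnorm(n)          # regressor correlated with x1
    y  <- 1 + 2 * x1 - x2 + rnorm(n)
    b2.direct <- coef(lm(y ~ x1 + x2))["x2"]
    e.y  <- resid(lm(y ~ x1))          # residuals of y regressed on [1, x1]
    e.x2 <- resid(lm(x2 ~ x1))         # residuals of x2 regressed on [1, x1]
    b2.fw <- coef(lm(e.y ~ e.x2 - 1))  # residuals on residuals (no constant)
    c(b2.direct, b2.fw)                # the two coefficients coincide
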
Figure 15: Google searches for "parapluie" (red) versus precipitations (black)

[Monthly time series, 2005–2011.]

                 Estimate   Std. Error   t value   Pr(>|t|)
    (Intercept)     5.539        2.055     2.695      0.009
    precip          0.160        0.031     5.150      0.000

Table 2: Regression of Google searches on precipitations

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        21.762       1.954   11.138     0.000
monthly.dummy1     -9.011       2.763   -3.261     0.002
monthly.dummy2    -13.337       2.763   -4.827     0.000
monthly.dummy3     -8.188       2.763   -2.963     0.004
monthly.dummy4     -5.590       2.763   -2.023     0.048
monthly.dummy5     -3.457       2.763   -1.251     0.216
monthly.dummy6     -5.425       2.763   -1.964     0.054
monthly.dummy7     -9.893       2.763   -3.581     0.001
monthly.dummy8     -9.378       2.763   -3.394     0.001
monthly.dummy9     -6.154       2.763   -2.227     0.030
monthly.dummy10    -5.906       2.763   -2.137     0.037
monthly.dummy11     2.793       2.763    1.011     0.316

Table 3: Deseasonalization regression of Google searches
76 / 295
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)       0.000       0.465    0.000     1.000
deseas.precip     0.108       0.025    4.271     0.000

Table 4: Regression of deseas. Google searches on deseas. precipitations

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        14.233       2.601    5.472     0.000
precip              0.108       0.027    3.921     0.000
monthly.dummy1     -7.282       2.521   -2.889     0.005
monthly.dummy2    -11.507       2.525   -4.557     0.000
monthly.dummy3     -7.429       2.489   -2.984     0.004
monthly.dummy4     -4.346       2.502   -1.737     0.088
monthly.dummy5     -4.118       2.487   -1.656     0.103
monthly.dummy6     -4.247       2.500   -1.699     0.095
monthly.dummy7     -8.455       2.509   -3.370     0.001
monthly.dummy8     -8.553       2.491   -3.434     0.001
monthly.dummy9     -5.374       2.490   -2.159     0.035
monthly.dummy10    -5.580       2.483   -2.247     0.028
monthly.dummy11     2.036       2.489    0.818     0.417

Table 5: Regression of Google searches on [precipitations + Dummies]
77 / 295
Partial regression coefficients

Special case: x₂ is scalar

▶ When b₂ is scalar (and then X₂ is of dimension n × 1), Theorem 4.2 leads to:

b₂ = (X₂′M_{X₁}y) / (X₂′M_{X₁}X₂)   (partial regression coefficient).
78 / 295
Goodness of fit

▶ Total variation in y = sum of squared deviations:

TSS = Σᵢ₌₁ⁿ (yᵢ − ȳ)².

▶ We have:
y = Xb + e = ŷ + e.

▶ In the following, we assume that the regression includes a constant (i.e. for all i, x_{i,1} = 1).
▶ Let us denote by M⁰ the matrix that transforms observations into deviations from sample means. Using that M⁰e = e and that X′e = 0, we have:

y′M⁰y = (Xb + e)′M⁰(Xb + e) = b′X′M⁰Xb + e′e,

where y′M⁰y is the total sum of squares, b′X′M⁰Xb the "explained" sum of squares, and e′e the sum of squared residuals:

TSS = Expl.SS + SSR.
79 / 295
Coefficient of determination R²

Coefficient of determination = Expl.SS/TSS = 1 − SSR/TSS = 1 − e′e/(y′M⁰y).   (6)

Remark 4.6

▶ It can be shown [Greene, 2012, Section 3.5] that:

Coefficient of determination = [Σᵢ₌₁ⁿ (yᵢ − ȳ)(ŷᵢ − ȳ)]² / [Σᵢ₌₁ⁿ (yᵢ − ȳ)² × Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²].

⇒ R² is the sample squared correlation between y and the (regression-implied) predictions of y.
80 / 295
Figure 16: Bivariate regressions with low/high R²
[Two scatter plots of Y against X with fitted lines: left panel "Low R²", right panel "High R²".]
81 / 295
Definition 4.1 (Partial correlation coefficient)

The partial correlation between y and z, controlling for some variables X, is the sample correlation between y* and z*, where the latter two variables are the residuals in regressions of y on X and of z on X, respectively.

This correlation is denoted by r_{yz}^X. By definition, we have:

r_{yz}^X = z*′y* / √((z*′z*)(y*′y*)).   (7)
82 / 295
Proposition 4.2 (Change in SSR (see Slide 79) when a variable is added)

We have:
u′u = e′e − c²(z*′z*)   (≤ e′e)   (8)

where (i) u and e are the residuals in the regressions of y on [X, z] and of y on X, respectively, (ii) c is the regression coefficient on z in the former regression, and where z* are the residuals in the regression of z on X.

Proof: The OLS estimates [d′, c]′ in the regression of y on [X, z] satisfy (first-order cond., Eq. 5):

[ X′X  X′z ] [ d ]   [ X′y ]
[ z′X  z′z ] [ c ] = [ z′y ].

Hence, in particular, d = b − (X′X)⁻¹X′zc, where b is the OLS estimate of y on X. Substituting in u = y − Xd − zc, we get u = e − z*c. We therefore have:

u′u = (e − z*c)′(e − z*c) = e′e + c²(z*′z*) − 2c z*′e.   (9)

Now z*′e = z*′(y − Xb) = z*′y because z* are the residuals in an OLS regression on X. Since c = (z*′z*)⁻¹z*′y* (by an application of Theorem 4.2), we have (z*′z*)c = z*′y* = z*′y and, therefore, z*′e = (z*′z*)c. Inserting this in Eq. (9) leads to the result. ∎
83 / 295
Proposition 4.3 (Change in R² when a variable is added)

Denoting by R²_W the coefficient of determination in the regression of y on some variables W, we have:

R²_{X,z} = R²_X + (1 − R²_X)(r_{yz}^X)²,

where r_{yz}^X is the coefficient of partial correlation (Def. 4.1).

Proof: Let's use the same notations as in Prop. 4.2. Theorem 4.2 implies that c = (z*′z*)⁻¹z*′y*. Using this in Eq. (8) gives u′u = e′e − (z*′y*)²/(z*′z*). Using the definition of the partial correlation (Eq. 7), we get u′u = e′e(1 − (r_{yz}^X)²). The result is obtained by dividing both sides of the previous equation by y′M⁰y. ∎

▶ The previous proposition shows that we necessarily increase the R² if we add variables, even if they are irrelevant.
▶ The adjusted R², denoted by R̄², is a fit measure that penalizes large numbers of regressors:

R̄² = 1 − [e′e/(n − K)] / [y′M⁰y/(n − 1)] = 1 − [(n − 1)/(n − K)](1 − R²).
84 / 295
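The last two bullet points are easy to check numerically. A minimal R sketch on simulated data (variable names are illustrative): adding a pure-noise regressor mechanically increases R², while the adjusted R̄² typically falls.

set.seed(2)
n <- 100
x <- rnorm(n)
y <- 1 + x + rnorm(n)
noise <- rnorm(n)                     # irrelevant regressor

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ x + noise)

summary(fit1)$r.squared     < summary(fit2)$r.squared      # TRUE: R² cannot decrease
summary(fit1)$adj.r.squared > summary(fit2)$adj.r.squared  # typically TRUE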
Inference and Prediction

▶ Under the normality assumption (Assumption A.5), we know the distribution of b (conditional on X).
▶ Indeed, (b|X) ≡ (X′X)⁻¹X′y is multivariate Gaussian:

b|X ∼ N(β, σ²(X′X)⁻¹).   (10)

▶ Problem: In practice, we do not know σ² (population parameter).

Proposition 4.4 (Conditional expectation of s²)

Under A.1 to A.4, an unbiased estimate of σ² is given by:

s² = e′e/(n − K).   (11)

(It is sometimes denoted by σ²_OLS.)

Proof: E(e′e|X) = E(ε′Mε|X) = E(Tr(ε′Mε)|X) = Tr(M E(εε′|X)) = σ²Tr(M). (Note that we have E(εε′|X) = σ²Id by Assumptions A.3 and A.4, see Remark 4.2.) Finally:
Tr(M) = n − Tr(X(X′X)⁻¹X′) = n − Tr((X′X)⁻¹X′X) = n − Tr(Id_{K×K}) = n − K (see also Prop. 9.2). ∎
85 / 295
▶ Two results will prove important to perform hypothesis testing:
(i) We know the distribution of s² (Prop. 4.5).
(ii) s² and b are independent random variables (Prop. 4.6).

Proposition 4.5 (Distribution of s²)

Under A.1 to A.5, we have: s²/σ² |X ∼ χ²(n − K)/(n − K).

Proof: ▶ We have e′e = ε′Mε. M is an idempotent symmetric matrix. Therefore it can be decomposed as PDP′, where D is a diagonal matrix and P is an orthogonal matrix. As a result, e′e = (P′ε)′D(P′ε), i.e. e′e is a weighted sum of independent squared Gaussian variables (the entries of P′ε are independent because they are Gaussian – under A.5 – and uncorrelated). The variance of each of these i.i.d. Gaussian variables is σ².
▶ Because M is an idempotent symmetric matrix, its eigenvalues are either 0 or 1 (Prop. 9.1) and its rank equals its trace (Prop. 9.2). Further, its trace is equal to n − K (see proof of Eq. (11)). Therefore D has n − K entries equal to 1 and K equal to 0.
▶ Hence, e′e = (P′ε)′D(P′ε) is a sum of n − K squared independent Gaussian variables of variance σ².
▶ Therefore e′e/σ² = (n − K)s²/σ² is a sum of n − K squared i.i.d. standard normal variables. ∎
86 / 295
Proposition 4.6 (Independence of s² and b)

Under A.1 to A.5, b and s² are independent.

Proof: We have b = β + [X′X]⁻¹X′ε and s² = ε′Mε/(n − K). Hence b is an affine combination of ε and s² is a quadratic combination of the same Gaussian shocks. One can write s² as s² = (Mε)′Mε/(n − K) and b as β + Tε, where T = (X′X)⁻¹X′. Since TM = 0, Tε and Mε are independent (because two uncorrelated Gaussian variables are independent); therefore b and s², which are functions of respective independent variables, are independent. ∎
87 / 295
▶ Under A.1 to A.5, let us consider b_k, the kth entry of b:

b_k|X ∼ N(β_k, σ²v_k),

where v_k is the kth component of the diagonal of (X′X)⁻¹.
▶ Besides, we have (Prop. 4.5):

(n − K)s²/σ² |X ∼ χ²(n − K).

▶ As a result (using Props. 4.5 and 4.6), we have:

t_k = [(b_k − β_k)/√(σ²v_k)] / √[(n − K)s²/(σ²(n − K))] = (b_k − β_k)/√(s²v_k) ∼ t(n − K),   (12)

where t(n − K) denotes a t distribution with n − K degrees of freedom (Def. 9.5).
▶ Remark: (b_k − β_k)/√(σ²v_k) |X ∼ N(0, 1) and (n − K)s²/σ² |X ∼ χ²(n − K). These two distributions do not depend on X ⇒ the marginal distribution of t_k is also t.
88 / 295
Remark 4.7

▶ s²v_k is not exactly the conditional variance of b_k: the variance of b_k conditional on X is σ²v_k.
▶ However, s²v_k is an unbiased estimate of σ²v_k (by Prop. 4.4).

▶ The previous result (Eq. 12) can be extended to any linear combination of the elements of b (Eq. 12 is for its kth component only).
▶ Let us consider α′b, the OLS estimate of α′β.
▶ From Eq. (10), we have:

α′b|X ∼ N(α′β, σ²α′(X′X)⁻¹α).

▶ Therefore:
(α′b − α′β)/√(σ²α′(X′X)⁻¹α) |X ∼ N(0, 1).

▶ Using the same approach as the one used to derive Eq. (12), one can show that Props. 4.5 and 4.6 imply that:

(α′b − α′β)/√(s²α′(X′X)⁻¹α) ∼ t(n − K).   (13)
89 / 295
Figure 17: Normal and Student-t distributions
[Density plot comparing N(0,1) with t(3), t(7) and t(20) over the range −3 to 3.]
Note: See Def. 9.5 of the t distribution. The chart shows that the higher the degree of freedom ν, the closer the distribution of t(ν) gets to the normal distribution. See also Table 23.
90 / 295
Confidence interval of β_k

▶ Assume we want to compute a (symmetrical) confidence interval [I_{d,1−α}, I_{u,1−α}] that is such that P(β_k ∈ [I_{d,1−α}, I_{u,1−α}]) = 1 − α.
▶ In particular, we want to have: P(β_k < I_{d,1−α}) = α/2.
▶ For this purpose, we make use of t_k = (b_k − β_k)/√(s²v_k) ∼ t(n − K) (Eq. 12).
▶ We have: P(β_k < I_{d,1−α}) = α/2 ⟺

P[(b_k − β_k)/√(s²v_k) > (b_k − I_{d,1−α})/√(s²v_k)] = α/2 ⟺ P[t_k > (b_k − I_{d,1−α})/√(s²v_k)] = α/2 ⟺

1 − P[t_k ≤ (b_k − I_{d,1−α})/√(s²v_k)] = α/2 ⟺ (b_k − I_{d,1−α})/√(s²v_k) = Φ⁻¹_{t(n−K)}(1 − α/2),

where Φ_{t(n−K)} is the c.d.f. of the t(n − K) distribution (Table 23).
▶ Doing the same for I_{u,1−α}, we obtain:

[I_{d,1−α}, I_{u,1−α}] = [b_k − Φ⁻¹_{t(n−K)}(1 − α/2)√(s²v_k), b_k + Φ⁻¹_{t(n−K)}(1 − α/2)√(s²v_k)].
91 / 295
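This interval is what R's confint() computes for an lm fit. A minimal check on simulated data (illustrative names):

set.seed(3)
n <- 50; K <- 2
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)

bk   <- coef(fit)["x"]
s2vk <- vcov(fit)["x", "x"]            # s² * v_k
q    <- qt(1 - 0.05/2, df = n - K)     # quantile of the t(n - K) distribution

c(bk - q * sqrt(s2vk), bk + q * sqrt(s2vk))
confint(fit, "x", level = 0.95)        # same interval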
Example 4.6 (Income, Seniority and Education (Example 4.2))

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -50.086       5.999   -8.349     0.000
Education      5.896       0.357   16.513     0.000
Seniority      0.173       0.024    7.079     0.000

Table 6: Income, Education and Seniority

The data are the same as in Example 4.2.

Example 4.7 (Advertising and sales (Example 4.3))

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    2.939       0.312    9.422     0.000
TV             0.046       0.001   32.809     0.000
radio          0.189       0.009   21.893     0.000
newspaper     -0.001       0.006   -0.177     0.860

Table 7: Advertising and sales

The data are the same as in Example 4.3.
92 / 295
▶ In Examples 4.6 and 4.7, the last two columns give the test statistics and p-values associated with the test (Section 3) whose null hypothesis is:

H₀: β_k = 0.

▶ The t-statistic, that is b_k/√(s²v_k), is the test statistic of this test.
▶ Under H₀, the t-statistic is distributed as t(n − K) (see Eq. 12). Hence, the critical region (see Slide 38) for the test of size α is:

(−∞, −Φ⁻¹_{t(n−K)}(1 − α/2)] ∪ [Φ⁻¹_{t(n−K)}(1 − α/2), +∞).

▶ The p-value is defined as the probability that |Z| > |t|, where t is the (computed) t-statistic and where Z ∼ t(n − K). That is, the p-value is given by 2(1 − Φ_{t(n−K)}(|t_k|)).

Charts on tests (critical region, level, p-value)
93 / 295
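A minimal R check on simulated data (illustrative names) that the p-value reported by summary(lm) equals 2(1 − Φ_{t(n−K)}(|t_k|)):

set.seed(4)
n <- 40; K <- 2
x <- rnorm(n); y <- 0.3 * x + rnorm(n)
fit <- lm(y ~ x)
tk  <- summary(fit)$coefficients["x", "t value"]

2 * (1 - pt(abs(tk), df = n - K))              # manual p-value
summary(fit)$coefficients["x", "Pr(>|t|)"]     # same value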
Set of linear restrictions

▶ We consider the following model:

y = Xβ + ε,  ε ∼ i.i.d. N(0, σ²),

and we want to test for the joint validity of a set of restrictions involving the components of β in a linear way.
▶ Set of linear restrictions:

r_{1,1}β₁ + ··· + r_{1,K}β_K = q₁
   ⋮                           ⋮         (14)
r_{J,1}β₁ + ··· + r_{J,K}β_K = q_J,

that can be written in matrix form:

Rβ = q.   (15)

Example 4.8 (Advertising and sales (Examples 4.3 and 4.7))

Is the effect of advertising on TV (one unit) equal to that of advertising on radio (one unit)?
⇒ R = [0, 1, −1, 0], q = 0.
94 / 295
▶ Discrepancy vector: m = Rb − q.
▶ Under the null hypothesis:

E(m|X) = Rβ − q = 0 and Var(m|X) = R Var(b|X) R′.

▶ Under A.1 to A.4, Var(m|X) = σ²R(X′X)⁻¹R′ (see Prop. 4.1).
▶ Test:
H₀: Rβ − q = 0 against H₁: Rβ − q ≠ 0.   (16)

▶ We could perform a Wald test. The test statistic is denoted by W. Under A.1 to A.5 (we need the normality assumption) and under H₀, we have:

W = m′Var(m|X)⁻¹m ∼ χ²(J)   (17)

(see Prop. 9.9).
▶ However, σ² is unknown. Hence we cannot compute W. We can however approximate it by replacing σ² with s². The distribution of this new statistic is not χ²(J) any more; it is an F distribution, and the test is called the F test (see next slide).
95 / 295
Proposition 4.7 (Definition and distribution of the F stat.)

Under Assumptions A.1 to A.5 and if Eq. (16) holds, we have:

F = (W/J)(σ²/s²) = m′(R(X′X)⁻¹R′)⁻¹m / (s²J) ∼ F(J, n − K),   (18)

where F(J, n − K) is the distribution of the F-statistic (see Def. 9.4).

Proof: According to Eq. (17), W/J ∼ χ²(J)/J. Consider the denominator s²/σ². We have s² = e′e/(n − K) (Eq. (11)) and e′e = ε′Mε. By Prop. 9.2, e′e/σ² ∼ χ²(n − K). Therefore, F is the ratio of a r.v. distributed as χ²(J)/J and another distributed as χ²(n − K)/(n − K). It remains to verify that these r.v. are independent.
Under H₀, we have m = R(b − β) = R(X′X)⁻¹X′ε. Therefore m′(R(X′X)⁻¹R′)⁻¹m is of the form ε′Tε with T = D′CD, where D = R(X′X)⁻¹X′ and C = (R(X′X)⁻¹R′)⁻¹.
Under A.1 to A.4, the covariance between Tε and Mε is σ²TM = 0. Therefore, under A.5, these variables are Gaussian variables with 0 covariance. Hence they are independent. ∎

Remark 4.8

For large n − K, the F(J, n − K) distribution converges to F(J, ∞) = χ²(J)/J.
96 / 295
Proposition 4.8 (Other expressions of the F-statistic)

The F-statistic defined by Eq. (18) is also equal to:

F = [(R² − R*²)/J] / [(1 − R²)/(n − K)] = [(SSR_restr − SSR_unrestr)/J] / [SSR_unrestr/(n − K)],   (19)

where R*² is the coef. of determination (Eq. 6) of the “restricted regression” (SSR: sum of squared residuals, see Slide 79).

Proof: Let's denote by e* = y − Xb* the vector of residuals associated with the "restricted regression" (i.e. Rb* = q). We have e* = e − X(b* − b). Using e′X = 0, we get e*′e* = e′e + (b* − b)′X′X(b* − b) ≥ e′e.
By Prop. 9.3, we know that b* − b = −(X′X)⁻¹R′{R(X′X)⁻¹R′}⁻¹(Rb − q). Therefore:

e*′e* − e′e = (Rb − q)′[R(X′X)⁻¹R′]⁻¹(Rb − q).

This implies that the F statistic defined in Prop. 4.7 is also equal to:

[(e*′e* − e′e)/J] / [e′e/(n − K)]. ∎

Definition 4.2 (F-test of size α)

The null hypothesis H₀ (Eq. 16) of the F-test is rejected if F –defined by Eq. (18) or (19)– is higher than F_{1−α}(J, n − K). [one-sided test: see Slide 287]
97 / 295
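A minimal R sketch of Eq. (19) on simulated data (illustrative names): the F statistic computed from the restricted and unrestricted sums of squared residuals matches the one reported by anova() for nested models.

set.seed(5)
n <- 100
tv <- rnorm(n); radio <- rnorm(n)
y  <- 1 + 0.5 * tv + 0.5 * radio + rnorm(n)

unrestr <- lm(y ~ tv + radio)       # K = 3 coefficients
restr   <- lm(y ~ I(tv + radio))    # imposes beta_tv = beta_radio (J = 1)

SSRu <- sum(resid(unrestr)^2)
SSRr <- sum(resid(restr)^2)
J <- 1; K <- 3
((SSRr - SSRu) / J) / (SSRu / (n - K))   # F statistic, Eq. (19)

anova(restr, unrestr)                    # reports the same F and its p-value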
Prediction

▶ We consider the model:

y_i = x_i′β + ε_i,  ε_i ∼ N(0, σ²).

▶ Prediction exercise: we want to predict y*, which would be associated with a vector of explanatory variables x* (but this “entity *” is not in the estimation sample).
▶ The Gauss-Markov Theorem (4.1) implies that, under A.1 to A.4, ŷ* = x*′b is the minimum-variance linear unbiased estimator of E(y*|x*) = x*′β, where x* is a specific (deterministic) realization of x.
▶ Forecast error: e* = y* − ŷ* = y* − x*′b = x*′(β − b) + ε*.
▶ Prediction variance:

Var(e*|X, x = x*) = x*′{σ²(X′X)⁻¹}x* + σ².   (20)

▶ The prediction variance is estimated by replacing σ² with s².
98 / 295
▶ Let's focus on the specific case where the regression includes a constant term and a single regressor, i.e. each row of X is of the form [1, w_i]. (This is Example 4.5.)
▶ In this case, we have x* = [1, w*]′ and, using the expression of (X′X)⁻¹ computed in Example 4.5 (Eq. 20), we have:

Var(y* − x*′b |X, x = x*) = σ²[1 + 1/n + (w* − w̄)²/Σᵢ(w_i − w̄)²],   (21)

where y* − x*′b is the forecast error.
▶ The farther the considered w* is from the center of observations (i.e. w̄), the lower the confidence in the predicted y*.
▶ Conditional on (X, x = x*), the forecast error e* = x*′(β − b) + ε* is Gaussian (under A.1 to A.5, using in particular Eq. 10). Its expectation is 0 and its variance is given by Eq. (21). Hence:

y*|X, x = x* ∼ N(x*′b, Var(e*|X, x = x*)),

where the variance is given by Eq. (21).
▶ Hence, under A.1 to A.5, a 95% confidence interval for y* is (see the R sketch below):

[x*′b − 1.96√Var(e*|X, x = x*), x*′b + 1.96√Var(e*|X, x = x*)].
Figure 18: Prediction – Three cases: low, medium and large dispersion of the x_i's
[Three scatter plots of y against x (low, medium and large dispersion of the x_i's); each panel shows the theoretical and the estimated regression lines.]
100 / 295
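In R, predict() with interval = "prediction" implements the interval of Eq. (21) (using the exact t(n − K) quantile in place of 1.96). A minimal sketch on simulated data (illustrative names):

set.seed(6)
n <- 100
w <- rnorm(n)
y <- 1 + 2 * w + rnorm(n)
fit <- lm(y ~ w)

new <- data.frame(w = 2.5)   # a point far from the center of the w_i's
predict(fit, newdata = new, interval = "prediction", level = 0.95)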
Figure 19: Seniority and income (data of Example 4.2)
[Scatter plot of Income (0–150) against Seniority (0–300).]
101 / 295
Potential common pitfalls

Remark 4.9 (Multicollinearity)

▶ Model: y_i = β₁x_{i,1} + β₂x_{i,2} + ε_i, where all variables are zero-mean and Var(ε_i) = σ².
▶ We have:

X′X = [ Σᵢ x²_{i,1}        Σᵢ x_{i,1}x_{i,2} ]
      [ Σᵢ x_{i,1}x_{i,2}  Σᵢ x²_{i,2}       ],   therefore

(X′X)⁻¹ = 1/[Σᵢ x²_{i,1} Σᵢ x²_{i,2} − (Σᵢ x_{i,1}x_{i,2})²] × [ Σᵢ x²_{i,2}         −Σᵢ x_{i,1}x_{i,2} ]
                                                               [ −Σᵢ x_{i,1}x_{i,2}  Σᵢ x²_{i,1}        ].

▶ The inverse of the upper-left element of (X′X)⁻¹ is:

Σᵢ x²_{i,1} − (Σᵢ x_{i,1}x_{i,2})²/Σᵢ x²_{i,2} = Σᵢ x²_{i,1}(1 − correl²₁,₂),   (22)

where correl₁,₂ is the sample correlation between x₁ and x₂.
▶ Hence, the closer correl₁,₂ is to one, the higher the variance of b₁ (recall that the variance of b₁ is the upper-left component of σ²(X′X)⁻¹). The R sketch below illustrates this variance inflation.
102 / 295
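A minimal R sketch of Remark 4.9 on simulated data (illustrative names): the standard error of b₁ blows up as correl(x₁, x₂) approaches 1.

set.seed(7)
n <- 200
se_b1 <- sapply(c(0, 0.9, 0.99), function(rho) {
  x1 <- rnorm(n)
  x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)   # correl(x1, x2) close to rho
  y  <- x1 + x2 + rnorm(n)
  summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]
})
names(se_b1) <- c("rho = 0", "rho = 0.9", "rho = 0.99")
se_b1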
Potential common pitfalls

Remark 4.10 (Omitted variables)

▶ “True model”:
y = X₁β₁ + X₂β₂ + ε,
where X₁ is n × K₁, β₁ is K₁ × 1, X₂ is n × K₂ and β₂ is K₂ × 1.
▶ Then, if one computes b₁ by regressing y on X₁ only:

b₁ = (X₁′X₁)⁻¹X₁′y = β₁ + (X₁′X₁)⁻¹X₁′X₂β₂ + (X₁′X₁)⁻¹X₁′ε.

▶ Omitted-variable formula:

E(b₁|X) = β₁ + (X₁′X₁)⁻¹(X₁′X₂) β₂,

where (X₁′X₁)⁻¹(X₁′X₂) is K₁ × K₂ (each column of (X₁′X₁)⁻¹(X₁′X₂) contains the OLS coefficients obtained when regressing the corresponding column of X₂ on X₁).
103 / 295
Potential common pitfalls

Example 4.9 (Omitted variables & Effect of education on wages)

▶ Consider the “true model”:

wage_i = β₀ + β₁edu_i + β₂ability_i + ε_i,  ε_i ∼ i.i.d. N(0, σ²_ε).   (23)

▶ Further, we assume that the edu variable is correlated with ability. Specifically:

edu_i = α₀ + α₁ability_i + η_i,  η_i ∼ i.i.d. N(0, σ²_η).

▶ Assume we mistakenly run the regression omitting the ability variable:

wage_i = γ₀ + γ₁edu_i + ξ_i.   (24)

▶ It can be seen that ξ_i = ε_i − (β₂/α₁)η_i ∼ i.i.d. N(0, σ²_ε + (β₂/α₁)²σ²_η) and that the population regression coefficient is γ₁ = β₁ + β₂/α₁ ≠ β₁.
104 / 295
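A minimal R simulation in the spirit of Example 4.9 (all parameter values are illustrative): omitting ability biases the OLS coefficient on edu.

set.seed(8)
n <- 1e5
beta1 <- 0.08; beta2 <- 0.5; alpha1 <- 2
ability <- rnorm(n)
edu  <- 10 + alpha1 * ability + rnorm(n)
wage <- 1 + beta1 * edu + beta2 * ability + rnorm(n)

coef(lm(wage ~ edu))["edu"]            # biased upward by the omitted 'ability'
coef(lm(wage ~ edu + ability))["edu"]  # close to beta1 = 0.08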
Potential common pitfalls

Example 4.10 (Omitted variables & Effect of class size on performances)

Figure 20: California school district data
[Scatter plot of Test score (620–700) against Student-teacher ratio (14–26).]

Source of the data: Stock and Watson (2004).
105 / 295
Potential common pitfalls

Example 4.10 (Effect of class size on performances (cont’d))

                       Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)             698.933       9.467   73.825     0.000
StockWatsonEduc$str      -2.280       0.480   -4.751     0.000

Table 8: Regression of Test score on Student-teacher ratio.

                       Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)             686.032       7.411   92.566     0.000
StockWatsonEduc$str      -1.101       0.380   -2.896     0.004
StockWatsonEduc$elpct    -0.650       0.039  -16.516     0.000

Table 9: Regression of Test score on Student-teacher ratio.

                        Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)               700.15        4.69   149.42      0.00
StockWatsonEduc$str        -1.00        0.24    -4.18      0.00
StockWatsonEduc$elpct      -0.12        0.03    -3.76      0.00
StockWatsonEduc$mealpct    -0.55        0.02   -25.34      0.00

Table 10: Regression of Test score on Student-teacher ratio.
106 / 295
Potential common pitfalls

Remark 4.11 (Irrelevant variable)

▶ “True model”:
y = X₁β₁ + ε
▶ “Estimated model”:
y = X₁β₁ + X₂β₂ + ε
▶ The estimate is unbiased. Problem however: the variance of the estimate of β₁ resulting from this regression is higher than that stemming from the correct regression (unless the correlation between X₁ and X₂ is null, see Eq. 22).
⇒ The estimator is inefficient, i.e. there exists an alternative consistent estimator whose variance is lower.
▶ The inefficiency problem can have serious consequences when testing hypotheses of the type H₀: β₁ = 0, due to the loss of power: we might infer that there are no relevant variables when there truly are (Type-II error; False Negative, see Slide 36).
107 / 295
Least Squares: Large Sample Properties

▶ Even if we relax the normality assumption (Assumption A.5), we can approximate the finite-sample behavior of the estimators by using large-sample or asymptotic properties.
▶ To begin with, we proceed under Assumptions A.1 to A.4. (We will see later how to deal with –partial– relaxations of Assumptions A.3 and A.4.)

Preview of OLS large-sample properties under A.1 to A.4
▶ Under A.1 to A.4, even if the residuals are not normally distributed, the least squares estimators can be asymptotically normal, and inference can be performed as in small samples when A.1 to A.5 hold.
▶ This derives from Prop. 4.9 (below).
▶ The F-test (Prop. 4.8) and the t-test described in Slide 93 can then be performed.
108 / 295
Proposition 4.9 (Asymp. distri. of b with independent observations)

Under Assumptions A.1 to A.4, and assuming further that:

Q = plim_{n→∞} X′X/n,   (25)

and that the (x_i, ε_i)'s are independent (across entities i), we have:

√n(b − β) →d N(0, σ²Q⁻¹).   (26)

Proof: Since b = β + (X′X/n)⁻¹(X′ε/n), we have: √n(b − β) = (X′X/n)⁻¹ (1/√n)X′ε. Since f: A → A⁻¹ is a continuous function (for A ≠ 0), plim_{n→∞}(X′X/n)⁻¹ = Q⁻¹ (see Prop. 9.4(ii), continuous mapping theorem). Let us denote by V_i the vector x_iε_i. Because the (x_i, ε_i)'s are independent, the V_i's are independent as well. Their covariance matrix is σ²E(x_ix_i′) = σ²Q. Applying the multivariate central limit theorem (Theorem 2.3) to the V_i's gives √n(1/n Σᵢ₌₁ⁿ x_iε_i) = (1/√n)X′ε →d N(0, σ²Q). An application of Prop. 9.4(i) (Slutsky’s theorem) then leads to the result. ∎

▶ In practice, σ² is estimated with e′e/(n − K) (Eq. 11), and Q⁻¹ with n(X′X)⁻¹.
▶ Eq. (25) and Eq. (26) respectively correspond to convergences in probability (Def. 9.9) and in distribution (Def. 9.12).
109 / 295
Instrumental Variables

▶ Here, we want to relax Assumption A.2 (conditional mean zero assumption, implying in particular that x_i and ε_i are uncorrelated).
▶ Model:
y_i = x_i′β + ε_i,  where E(ε_i) = 0 and x_i ⊥̸ ε_i.   (27)

Definition 4.3 (Valid set of instruments)

The L-dimensional random variable z_i is a valid set of instruments if:
(a) z_i is correlated to x_i;
(b) we have E(ε|Z) = 0; and
(c) the orthogonal projections of the x_i's on the z_i's are not multicollinear.

Example 4.11 (A situation where E(ε_i) = 0 and x_i ⊥̸ ε_i)

▶ Let us make the assumption x_i ⊥̸ ε_i in Model (27) more precise:

E(ε_i) = 0 and E(ε_ix_i) = γ;

by the law of large numbers (Theorem 2.1), plim_{n→∞} X′ε/n = γ.
▶ If Q_{xx} := plim X′X/n, the OLS estimator is not consistent because

b = β + (X′X)⁻¹X′ε →p β + Q⁻¹_{xx}γ ≠ β.
110 / 295
▶ If z_i is a valid set of instruments, we have:

plim (Z′y/n) = plim (Z′(Xβ + ε)/n) = plim (Z′X/n) β.

Indeed, by the law of large numbers (Theorem 2.1), Z′ε/n →p E(z_iε_i) = 0.
▶ If L = K, the matrix Z′X/n is of dimension K × K and we have:

[plim (Z′X/n)]⁻¹ plim (Z′y/n) = β.

▶ By continuity of the inverse funct.: [plim (Z′X/n)]⁻¹ = plim (Z′X/n)⁻¹.
▶ The Slutsky Theorem (Prop. 9.4) further implies that

plim (Z′X/n)⁻¹ plim (Z′y/n) = plim ((Z′X/n)⁻¹(Z′y/n)).

▶ Hence b_iv is consistent if it is defined by:

b_iv = (Z′X)⁻¹Z′y.
111 / 295
Proposition 4.10 (Asymp. distri. of b_iv)

If z_i is an L-dimensional random variable that constitutes a valid set of instruments (see Def. 4.3) and if L = K, then the asymptotic distribution of b_iv is:

b_iv →d N(β, (σ²/n)[Q_{xz}Q⁻¹_{zz}Q_{zx}]⁻¹),

where plim Z′Z/n =: Q_{zz}, plim Z′X/n =: Q_{zx}, plim X′Z/n =: Q_{xz}.

Proof: The proof is very similar to that of Prop. 4.9, the starting point being that b_iv = β + (Z′X)⁻¹Z′ε. ∎

▶ When L = K, we have:

[Q_{xz}Q⁻¹_{zz}Q_{zx}]⁻¹ = Q⁻¹_{zx}Q_{zz}Q⁻¹_{xz}.

▶ In practice, to estimate Var(b_iv) = (σ²/n) Q⁻¹_{zx}Q_{zz}Q⁻¹_{xz}, we replace σ² by:

s²_iv = (1/n) Σᵢ₌₁ⁿ (y_i − x_i′b_iv)².
112 / 295
▶ And when L > K?
▶ Idea: First regress X on the space spanned by Z, and then regress y on the fitted values X̂ := Z(Z′Z)⁻¹Z′X. That is, b_iv = (X̂′X̂)⁻¹X̂′y:

b_iv = [X′Z(Z′Z)⁻¹Z′X]⁻¹ X′Z(Z′Z)⁻¹Z′y.   (28)

▶ In this case, Prop. 4.10 still holds, with b_iv given by Eq. (28).
▶ b_iv is also the result of the regression of y on X*, where the columns of X* are the (orthogonal) projections of those of X on Z, i.e. X* = P_Z X (using Notations 4.1). Hence the other name of this estimator: Two-Stage Least Squares (TSLS). (See the R sketch below.)

Definition 4.4 (Weak instruments)

▶ If the instruments do not properly satisfy Condition (a) in Def. 4.3 (i.e. if x_i and z_i are only loosely related), the instruments are said to be weak.
▶ This problem is for instance discussed in Stock and Yogo (2003). See also Stock and Watson pp. 489-490.
113 / 295
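A minimal R sketch of TSLS (Eq. 28) on simulated data (illustrative names): x is endogenous, and z₁, z₂ are valid instruments (L = 2 > K for the endogenous regressor).

set.seed(9)
n <- 1000
z1 <- rnorm(n); z2 <- rnorm(n)
u  <- rnorm(n)                       # common shock creating endogeneity
x  <- z1 + z2 + u + rnorm(n)
y  <- 1 + 0.5 * x + u + rnorm(n)     # true beta = 0.5; E(x * eps) != 0

coef(lm(y ~ x))["x"]                 # OLS: biased upward

# Two stages: regress x on [1, z1, z2], then y on the fitted values.
xhat <- fitted(lm(x ~ z1 + z2))
coef(lm(y ~ xhat))["xhat"]           # TSLS: close to 0.5
# (The AER package's ivreg(y ~ x | z1 + z2) gives the same point estimate,
#  together with correct standard errors.)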
Definition 4.5 (Hausman test)

▶ The Hausman test can be used to test whether IV estimation is necessary.
▶ IV techniques are required if plim_{n→∞} X′ε/n ≠ 0.
▶ Hausman (1978) proposes a test of the efficiency of estimators.
▶ Under the null hypothesis, two estimators, b₀ and b₁, are consistent, but b₀ is (asymptotically) efficient relative to b₁. Under the alternative hypothesis, b₁ (IV in the present case) remains consistent but not b₀ (OLS in the present case).
▶ The test statistic is:

H = (b₁ − b₀)′ MPI(Var(b₁) − Var(b₀)) (b₁ − b₀),

where MPI is the Moore-Penrose pseudo-inverse (see Def. 9.3).
▶ Under the null hypothesis, H ∼ χ²(q), where q is the rank of Var(b₁) − Var(b₀).
114 / 295
Example 4.12 (Estimation of price elasticity)

▶ See e.g. WHO and estimation of tobacco price elasticity of demand.
▶ We want to estimate the effect on demand of an exogenous increase in prices of cigarettes (say).
▶ Model:

q^d_t = α₀ + α₁ p_t + α₂ w_t + ε^d_t,

where q^d_t is log(demand), p_t is log(price) and w_t is income;

q^s_t = γ₀ + γ₁ p_t + γ₂ y_t + ε^s_t,

where q^s_t is log(supply) and y_t gathers cost factors; y_t, w_t, ε^s_t ∼ N(0, σ²_s) and ε^d_t ∼ N(0, σ²_d) are independent.
▶ Equilibrium: q^d_t = q^s_t. This implies that prices are endogenous:

p_t = (α₀ + α₂ w_t + ε^d_t − γ₀ − γ₂ y_t − ε^s_t)/(γ₁ − α₁).

▶ In particular, we have E(p_t ε^d_t) = σ²_d/(γ₁ − α₁) ≠ 0.
⇒ Regressing q^d_t on p_t by OLS gives biased estimates (see Example 4.11).
115 / 295
Example 4.12 (Estimation of price elasticity, continued)

Figure 21: Demand and Supply
[Chart in the (log(price), log(quantities)) plane, displaying demand curves (Demand 1–4) and supply curves (Supply 1–4).]
116 / 295
Example 4.12 (Estimation of price elasticity, continued)

▶ Estimation of the price elasticity of cigarette demand. Instrument: real tax on cigarettes arising from the state’s general sales tax. Presumption: in states with a larger general sales tax, cigarette prices are higher, but the general tax is not determined by other forces affecting ε^d_t.

Table 11: IV Regression Results

Dependent variable: log(packs)
log(rprice)          −1.277***  (0.263)
log(rincome)          0.280     (0.239)
Constant              9.895***  (1.059)
Observations              48
R²                     0.429
Adjusted R²            0.404
Residual Std. Error    0.188 (df = 45)
Note: * p<0.1; ** p<0.05; *** p<0.01
117 / 295
Example 4.13 (Education and Wages)

▶ Potential problem with OLS regression of wages on education: omitted variables (see Example 4.9).
▶ Card (1993)’s idea: distance to college used as an instrument.

Table 12: Effect of education on wage

Dependent variable: wage
                            OLS (1)     OLS (2)     instrumental variable (3)
urbanyes                    0.070       0.046       0.046
                            (0.045)     (0.045)     (0.060)
genderfemale               −0.085**    −0.071*     −0.071
                            (0.037)     (0.037)     (0.050)
ethnicityafam              −0.556***   −0.227***   −0.227**
                            (0.052)     (0.073)     (0.099)
ethnicityhispanic          −0.544***   −0.351***   −0.351***
                            (0.049)     (0.057)     (0.077)
unemp                       0.133***    0.139***    0.139***
                            (0.007)     (0.007)     (0.009)
education                   0.005                   0.647***
                            (0.010)                 (0.136)
ed.pred                                 0.647***
                                        (0.101)
Observations                4,739       4,739       4,739
Adjusted R²                 0.109       0.116      −0.614
F Statistic (df = 6; 4732)  97.274***   104.971***
Note: * p<0.1; ** p<0.05; *** p<0.01
118 / 295
General Regression Model

▶ We want to relax the assumption according to which the disturbances are uncorrelated with each other (Assumption A.4), or the homoskedasticity Assumption A.3.
▶ We replace the latter two assumptions by the general formulation:

E(εε′|X) = Ω.   (29)

▶ Note that Eq. (29) is more general than Assumptions A.3 and A.4 because the diagonal entries of Ω may be different (not the case under A.3), and the non-diagonal entries of Ω can be ≠ 0 (contrary to A.4).

Definition 4.6 (General Regression Model)

Assumptions A.1 and A.2, together with Eq. (29), form the General Regression Model (GRM) framework.

Remark 4.12 (A specific case)

A regression model where Assumptions A.1 to A.4 hold is a specific case of the General Regression Model (GRM) framework.
119 / 295
▶ The GRM context notably allows to model heteroskedasticity and autocorrelation.
▶ Heteroskedasticity:

Ω = [ σ₁²   0   …    0  ]
    [  0   σ₂²       0  ]
    [  ⋮         ⋱   ⋮  ]
    [  0    …    0  σₙ² ]   (30)

▶ Autocorrelation:

Ω = σ² [  1     ρ₂,₁  …   ρₙ,₁  ]
       [ ρ₂,₁    1         ⋮    ]
       [  ⋮           ⋱  ρₙ,ₙ₋₁ ]
       [ ρₙ,₁   ρₙ,₂  …    1    ]   (31)
120 / 295
Example 4.14 (Autocorrelation in the time-series context)

▶ Autocorrelation is, in particular, a recurrent problem when time-series data are used (see Section 8).
▶ In a time-series context, subscript i refers to a date.
▶ Assume for instance that:

y_i = x_i′β + ε_i   (32)

with
ε_i = ρε_{i−1} + v_i,  v_i ∼ N(0, σ²_v).   (33)

▶ In this case, we are in the GRM context, with:

Ω = σ²_v/(1 − ρ²) × [    1       ρ     …  ρ^{n−1} ]
                    [    ρ       1         ⋮      ]
                    [    ⋮            ⋱    ρ      ]
                    [ ρ^{n−1}  ρ^{n−2} …   1      ]   (34)
121 / 295
Definition 4.7 (Generalized Least Squares)

▶ Assume Ω is known (the GLS estimator is then “feasible”).
▶ Because Ω is symmetric positive, it admits a spectral decomposition of the form Ω = CΛC′, where C is an orthogonal matrix (i.e. CC′ = Id) and Λ is a diagonal matrix (the diagonal entries are the eigenvalues of Ω).
▶ Ω = (PP′)⁻¹ with P = CΛ^{−1/2}.
▶ Consider the transformed model:

P′y = P′Xβ + P′ε   or   y* = X*β + ε*.

▶ The variance of ε* is I. In the transformed model, OLS is BLUE (Gauss-Markov Theorem 4.1).

b_GLS = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y   (35)

▶ b_GLS = Generalized least squares estimator of β.
▶ We have:
Var(b_GLS|X) = (X′Ω⁻¹X)⁻¹.
122 / 295
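A minimal R sketch of Eq. (35) on simulated AR(1)-error data (illustrative names; Ω is built as in Eq. 34 with ρ and σ_v assumed known):

set.seed(10)
n <- 100; rho <- 0.8; sigma_v <- 1
x <- cbind(1, rnorm(n))
eps <- as.numeric(arima.sim(list(ar = rho), n = n, sd = sigma_v))
y <- x %*% c(1, 2) + eps

Omega <- (sigma_v^2 / (1 - rho^2)) * rho^abs(outer(1:n, 1:n, "-"))  # Eq. (34)
Oinv  <- solve(Omega)

b_gls <- solve(t(x) %*% Oinv %*% x, t(x) %*% Oinv %*% y)   # Eq. (35)
b_ols <- solve(t(x) %*% x, t(x) %*% y)
cbind(b_ols, b_gls)   # both consistent; GLS is the efficient one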
▶ When Ω is unknown, the GLS estimator is said to be infeasible. Some structure is required.
▶ Assume Ω admits a parametric form Ω(θ).
▶ The estimation becomes "feasible" (FGLS) if one replaces Ω(θ) by Ω(θ̂).
▶ If θ̂ is a consistent estimator of θ, then the FGLS is asymptotically efficient (see Example 4.15).
▶ By contrast, when Ω has no obvious structure, the OLS (or IV) is the only estimator available. It remains unbiased, consistent, and asymptotically normally distributed, but not efficient. Standard inference procedures are not appropriate any longer.

Example 4.15 (Autocorrelation in the time-series context (continued))

▶ Consider the case presented in Example 4.14.
▶ Because the OLS estimate b of β is consistent, the estimates e_i of the ε_i also are.
▶ Consistent estimators of ρ and σ_v are then obtained by regressing the e_i's on the e_{i−1}'s.
▶ Using these estimates in Eq. (34) provides a consistent estimate of Ω.
123 / 295
Proposition 4.11 (Finite sample properties of b in the GRM (Def. 4.6))

Conditionally on X, we have:

Var(b|X) = (1/n)(X′X/n)⁻¹(X′ΩX/n)(X′X/n)⁻¹.   (36)

Under A.5, since b is linear in ε,

b|X ∼ N(β, (X′X)⁻¹(X′ΩX)(X′X)⁻¹).   (37)

Remark 4.13

The variance of the estimator is not σ²(X′X)⁻¹ any more, so using s²(X′X)⁻¹ for inference may be misleading.

Proposition 4.12 (OLS consistency in the GRM (Def. 4.6))

If plim (X′X/n) and plim (X′ΩX/n) are finite positive definite matrices, then plim (b) = β.

Proof: Var(b) = E[Var(b|X)] + Var[E(b|X)]. Since E(b|X) = β, Var[E(b|X)] = 0. Eq. (36) implies that Var(b|X) → 0. Hence b converges in mean square and therefore in probability (Prop. 9.5). ∎
124 / 295
Proposition 4.13 (Asymptotic distri. of b in the GRM (Def. 4.6))

If Q_{xx} = plim (X′X/n) and Q_{xΩx} = plim (X′ΩX/n) are finite positive definite matrices, then:

√n(b − β) →d N(0, Q⁻¹_{xx} Q_{xΩx} Q⁻¹_{xx}).

Proposition 4.14 (Asymptotic distri. of b_iv in the GRM)

If regressors and IV variables are “well-behaved”, then:

b_iv ∼a N(β, V_iv),

where
V_iv = (1/n) Q* plim (Z′ΩZ/n) Q*′,

with
Q* = [Q_{xz}Q⁻¹_{zz}Q_{zx}]⁻¹ Q_{xz}Q⁻¹_{zz}.
125 / 295
▶ For practical purposes, one needs to have estimates of Ω in Props. 4.11, 4.13 or 4.14.
▶ Idea: instead of estimating Ω (dimension n × n) directly, one can estimate (1/n)X′ΩX (dimension K × K), since it is this expression (X′ΩX) that eventually appears in the formulas (for instance in Eq. 36).
▶ We have:

(1/n)X′ΩX = (1/n) Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ σ_{i,j} x_i x_j′.   (38)

▶ Robust estimations of asymptotic covariance matrices look for estimates of the previous matrix.
▶ Their computation is based on the fact that if b is consistent, then the e_i's are consistent (pointwise) estimators of the ε_i's.
126 / 295
First case: heteroskedasticity (Eq. 30)

▶ This is the case of Eq. (30).
▶ We then need to estimate (1/n) Σᵢ₌₁ⁿ σᵢ² x_i x_i′.
▶ White (1980): Under general conditions:

plim [(1/n) Σᵢ₌₁ⁿ σᵢ² x_i x_i′] = plim [(1/n) Σᵢ₌₁ⁿ eᵢ² x_i x_i′].   (39)

▶ The estimator of (1/n)X′ΩX therefore is:

(1/n) X′E²X,

where E is an n × n diagonal matrix whose diagonal elements are the estimated residuals e_i.
127 / 295
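A minimal R sketch of the White estimator (Eq. 39) on simulated heteroskedastic data (illustrative names):

set.seed(11)
n <- 200
x <- cbind(1, rnorm(n))
eps <- rnorm(n, sd = abs(x[, 2]))       # Var(eps_i) depends on x_i
y <- x %*% c(0, 1) + eps

b <- solve(t(x) %*% x, t(x) %*% y)
e <- as.numeric(y - x %*% b)

XX_inv  <- solve(t(x) %*% x)
meat    <- t(x) %*% diag(e^2) %*% x     # = X'E²X
V_white <- XX_inv %*% meat %*% XX_inv   # sandwich: (X'X)⁻¹ X'E²X (X'X)⁻¹

sqrt(diag(V_white))                     # robust standard errors
# (sandwich::vcovHC(lm(y ~ x[, 2]), type = "HC0") gives the same matrix.)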
Figure 22: Salaries versus years since PhD
[Scatter plot of Salary (roughly 50,000–200,000) against years since PhD (0–50).]
Data from Fox J. and Weisberg, S. (2011).
128 / 295
▶ Let us illustrate the influence of heteroskedasticity using simulations.
▶ We consider the following model:

y_i = x_i + ε_i,  ε_i ∼ N(0, x_i²),

where the x_i's are i.i.d. t(4).
▶ Here is a simulated sample (n = 200) of this model:

Figure 23: Simul.: y_i = x_i + ε_i, ε_i ∼ i.i.d. N(0, x_i²) and x_i ∼ i.i.d. t(4)
[Scatter plot of y against x for the simulated sample.]
129 / 295
▶ We simulate 1000 samples of the same model with n = 200.
▶ For each sample, we compute the OLS estimate of β (= 1).
▶ Using these 1000 estimates of b, we construct an approximated (kernel-based) distribution of this OLS estimator (in red on the figure).
▶ For each of the 1000 OLS estimations, we employ the standard OLS variance formula (s²(X′X)⁻¹) to estimate the variance of b. The blue curve is a normal distribution centred on 1 and whose variance is the average of the 1000 previous variance estimates.

Figure 24: Heteroskedasticity and the inappropriateness of the standard OLS variance formula
[Density plot of b over 0.0–2.0: the kernel-based density of the simulated estimates (red) against the normal density implied by the standard OLS variance formula (blue).]
130 / 295
▶ The variance of the simulated b's is 0.040 (that is the “true” one); the average of the estimated variances based on the standard OLS formula is 0.005 (“bad” estimate); the average of the estimated variances based on the White robust covariance matrix is 0.030 (“better” estimate).
▶ The standard OLS formula for the variance of b overestimates the precision of this estimator.
▶ For almost 50% of the simulations, 1 is not included in the 95% confidence interval of β when the computation of the interval is based on the standard OLS formula for the variance of b.
▶ When the White robust covariance matrix is used, 1 is not in the 95% confidence interval of β for less than 10% of the simulations.
131 / 295
Second case: HAC (heteroskedasticity and autocorrelation consistent)

▶ This includes the cases of Eqs. (30) and (31).
▶ Newey and West (1987): If the correlation between terms i and j gets sufficiently small when |i − j| increases:

plim [(1/n) Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ σ_{i,j} x_i x_j′] =   (40)
plim [(1/n) Σₜ₌₁ⁿ e_t² x_t x_t′ + (1/n) Σ_{ℓ=1}^L Σ_{t=ℓ+1}^n w_ℓ e_t e_{t−ℓ} (x_t x_{t−ℓ}′ + x_{t−ℓ} x_t′)],

where w_ℓ = 1 − ℓ/(L + 1).
132 / 295
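A minimal R sketch of Eq. (40) (illustrative names), here with L lags:

nw_meat <- function(x, e, L) {
  n <- nrow(x)
  S <- crossprod(x * e) / n                       # (1/n) Σ e_t² x_t x_t'
  for (l in 1:L) {
    w <- 1 - l / (L + 1)                          # Bartlett weight w_l
    G <- t(x[(l + 1):n, , drop = FALSE] * e[(l + 1):n]) %*%
           (x[1:(n - l), , drop = FALSE] * e[1:(n - l)]) / n
    S <- S + w * (G + t(G))                       # symmetrized lag-l term
  }
  S
}
# Usage: with X the regressor matrix and e the OLS residuals, Var(b) is
# estimated by n * (X'X)⁻¹ %*% nw_meat(X, e, L) %*% (X'X)⁻¹ (cf. Eq. 36).
# (The sandwich package's NeweyWest() provides a ready-made version.)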
▶ Let us illustrate the influence of autocorrelation using simulations.
▶ We consider the following model:

y_i = x_i + ε_i,   (41)

where the x_i's and the ε_i's are such that:

x_i = 0.8x_{i−1} + u_i and ε_i = 0.8ε_{i−1} + v_i,   (42)

where the u_i's and the v_i's are i.i.d. N(0, 1).
▶ Here is a simulated sample (n = 200) of this model:

Figure 25: Simulation of Eqs. (41) and (42) on 200 periods
[Left panel: scatter plot of y_t against x_t. Right panel: time series of x_t (red) and ε_t (blue).]
133 / 295
▶ We simulate 1000 samples of the same model with n = 200.
▶ For each sample, we compute the OLS estimate of β (= 1).
▶ Using these 1000 estimates of b, we construct an approximated (kernel-based) distribution of this OLS estimator (in red on the figure).
▶ For each of the 1000 OLS estimations, we employ the standard OLS variance formula (s²(X′X)⁻¹) to estimate the variance of b. The blue curve is a normal distribution centred on 1 and whose variance is the average of the 1000 previous variance estimates.

Figure 26: Auto-correlation and the inappropriateness of the standard OLS variance formula
[Density plot of b over 0.4–1.6: the kernel-based density of the simulated estimates (red) against the normal density implied by the standard OLS variance formula (blue).]
134 / 295
▶ The variance of the simulated b's is 0.020 (that is the “true” one); the average of the estimated variances based on the standard OLS formula is 0.005 (“bad” estimate); the average of the estimated variances based on the White robust covariance matrix is 0.015 (“better” estimate).
▶ The standard OLS formula for the variance of b overestimates the precision of this estimator.
▶ For about 35% of the simulations, 1 is not included in the 95% confidence interval of β when the computation of the interval is based on the standard OLS formula for the variance of b.
▶ When the Newey-West robust covariance matrix is used, 1 is not in the 95% confidence interval of β for about 13% of the simulations.
135 / 295
▶ For the sake of comparison, let us consider a model with no auto-correlation (x_i ∼ i.i.d. N(0, 2.8) and ε_i ∼ i.i.d. N(0, 2.8)).

[Top panels: scatter plot of y_t against x_t, and time series of x_t (red) and ε_t (blue). Bottom panel: density function of b over 0.8–1.3.]
136 / 295
How to detect autocorrelation in residuals?

▶ Consider the usual regression (say Eq. 32).
▶ The Durbin-Watson test is a typical autocorrelation test. Its test statistic is:

DW = Σᵢ₌₂ⁿ (e_i − e_{i−1})² / Σᵢ₌₁ⁿ e_i² = 2(1 − r) − (e₁² + e_n²)/Σᵢ₌₁ⁿ e_i²,

where the last term →p 0 and where r is the slope in the regression of the e_i's on the e_{i−1}'s, i.e.:

r = Σᵢ₌₂ⁿ e_i e_{i−1} / Σᵢ₌₁ⁿ⁻¹ e_i².

(r is a consistent estimator of Cor(ε_i, ε_{i−1}), i.e. ρ in Eq. 33.)
▶ Critical values depend only on T and K: see e.g. tables.
▶ The one-sided test for H₀: ρ = 0 against H₁: ρ > 0 is carried out by comparing DW to values d_L(T, K) and d_U(T, K):
  ▶ If DW < d_L, the null hypothesis is rejected;
  ▶ if DW > d_U, the hypothesis is not rejected;
  ▶ if d_L ≤ DW ≤ d_U, no conclusion is drawn.
137 / 295
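A sketch of this test in R, reusing the fitted lm object from the snippet above: the statistic is one line by hand, and lmtest::dwtest runs the test directly (note that dwtest reports a p-value rather than using the d_L/d_U tables):

# Minimal sketch: Durbin-Watson statistic, by hand and via lmtest::dwtest
library(lmtest)

e  <- residuals(fit)                 # residuals of the regression above
DW <- sum(diff(e)^2) / sum(e^2)      # sum_{i>=2}(e_i - e_{i-1})^2 / sum e_i^2
DW                                   # close to 2(1 - r); well below 2 if rho > 0

dwtest(fit, alternative = "greater") # H0: rho = 0 against H1: rho > 0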
Table 13: Properties of the OLS estimator

                                            Under Assumptions A.1 +
                                            A.2          A.3          A.4          A.5
                                            (condit.     (homo-       (uncorrel.   (normality of
                                            mean-zero)   skedast.)    residuals)   disturbances)
b normal in small sample (Slide 85)         X            X            X            X
b is BLUE (Thm 4.1)                         X            X            X
b unbiased in small sample (Prop. 4.1)      X
b consistent (Prop. 4.12)*                  X
b ∼ normal in large sample (Prop. 4.13)*    X

*: see however Prop. 4.12 and Prop. 4.13 for additional hypotheses. Specifically, X′X/n and X′ΩX/n must converge in probability to finite positive definite matrices (Ω is defined in Eq. 29).
Introduction to Panel Regressions
Context and Objective
- Context:
  Lots of entities (n) but a small number of periods (T): "longitudinal datasets".
  Regression of the form:

  y_{i,t} = x_{i,t}' \underbrace{\beta}_{K \times 1} + \underbrace{z_i'\alpha}_{\text{individual effects}} + \varepsilon_{i,t}

- Objective:
  Estimate the previous equation.
Figure 27: Simulated Panel Data
[Figure: scatter of y against x; lines relate the two dots of each entity.]
The model is y_{i,t} = α_i + βx_{i,t} + ε_{i,t}, t ∈ {1, 2}, with β = 1. Blue dots are for t = 1, red dots are for t = 2. The lines relate the dots associated with the same entity i.
Figure 28: Simulated Panel Data
[Figure: scatter of y against x; lines relate the two dots of each entity.]
The model is y_{i,t} = α_i + βx_{i,t} + ε_{i,t}, t ∈ {1, 2}, with β = 30. Blue dots are for t = 1, red dots are for t = 2. The lines relate the dots associated with the same entity i.
Figure 29: Consumption of cigarettes versus price
[Figure: scatter of log(Nb. of packs) against log(price).]
Cigarettes data from Stock and Watson (2003). Data for U.S. states, 2 years: 1985 and 1995.
Figure 30: Consumption of cigarettes versus price
[Figure: same scatter of log(Nb. of packs) against log(price), coloured by state.]
Cigarettes data from Stock and Watson (2003). Data for U.S. states, 2 years: 1985 and 1995. Each colour corresponds to a given state.
Figure 31: Airlines: cost versus fuel price
[Figure: scatter of log(cost) against log(fuel price), coloured by airline.]
Airlines data from Greene (2003). Each colour corresponds to a given airline.
Notations

y_i = \begin{bmatrix} y_{i,1} \\ \vdots \\ y_{i,T} \end{bmatrix}_{T\times 1}, \quad
\varepsilon_i = \begin{bmatrix} \varepsilon_{i,1} \\ \vdots \\ \varepsilon_{i,T} \end{bmatrix}_{T\times 1}, \quad
x_i = \begin{bmatrix} x_{i,1}' \\ \vdots \\ x_{i,T}' \end{bmatrix}_{T\times K}, \quad
X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}_{(nT)\times K}.

\tilde{y}_i = \begin{bmatrix} y_{i,1} - \bar{y}_i \\ \vdots \\ y_{i,T} - \bar{y}_i \end{bmatrix}, \quad
\tilde{\varepsilon}_i = \begin{bmatrix} \varepsilon_{i,1} - \bar{\varepsilon}_i \\ \vdots \\ \varepsilon_{i,T} - \bar{\varepsilon}_i \end{bmatrix}, \quad
\tilde{x}_i = \begin{bmatrix} x_{i,1}' - \bar{x}_i' \\ \vdots \\ x_{i,T}' - \bar{x}_i' \end{bmatrix}, \quad
\tilde{X} = \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{bmatrix}, \quad
\tilde{Y} = \begin{bmatrix} \tilde{y}_1 \\ \vdots \\ \tilde{y}_n \end{bmatrix},

where

\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{i,t}, \quad
\bar{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{i,t} \quad \text{and} \quad
\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{i,t}.
Three standard cases
- Pooled regression: z_i ≡ 1.
- Fixed Effects: z_i is unobserved, but correlates with x_i ⇒ b is biased and inconsistent in the OLS regression of y on X (omitted variable, see Remark 4.10).
- Random Effects: z_i is unobserved, but uncorrelated with x_i. The model writes:

  y_{i,t} = x_{i,t}'\beta + \alpha + \underbrace{u_i + \varepsilon_{i,t}}_{\text{compound disturbance}},

  where α = E(z_i'α) and u_i = z_i'α − E(z_i'α) ⊥ x_i.
  Least squares are consistent (but inefficient, see GLS Box 4.7).
Estimation of Fixed Effects Models

Assumption A.6 (Fixed Effects regression assumptions)
(i) There is no perfect multicollinearity among the regressors.
(ii) E(ε_{i,t}|X) = 0, for all i, t.
(iii) E(ε_{i,t} ε_{j,s}|X) = σ² if i = j and s = t, and 0 otherwise.

Remark 5.1
These assumptions are the same as those introduced in the standard linear regression:
(i) ↔ A.1, (ii) ↔ A.2, (iii) ↔ A.3 + A.4.
- In matrix form, for a given i, the model writes:

  y_i = X_i \beta + \mathbf{1}\alpha_i + \varepsilon_i,

  where 1 is a T-dimensional vector of ones.
- This is the Least Squares Dummy Variable (LSDV) model:

  y = [X \; D] \begin{bmatrix} \beta \\ \alpha \end{bmatrix} + \varepsilon, \qquad (43)

  with:

  D = \begin{bmatrix} \mathbf{1} & 0 & \dots & 0 \\ 0 & \mathbf{1} & \dots & 0 \\ & & \ddots & \\ 0 & 0 & \dots & \mathbf{1} \end{bmatrix}_{(nT) \times n}.

- The linear regression (43), with the dummy variables, satisfies the Gauss-Markov conditions (Theorem 4.1). Hence, in this context, the OLS estimator is the best linear unbiased estimator.
- Denoting by Z the matrix [X D], and by b and a the respective OLS estimates of β and α, we have:

  \begin{bmatrix} b \\ a \end{bmatrix} = [Z'Z]^{-1} Z'y. \qquad (44)

- The asymptotic distribution of [b′, a′]′ derives from the standard OLS context: Prop. 4.9 can be used after having replaced X by Z = [X D].
- We have:

  \begin{bmatrix} b \\ a \end{bmatrix} \overset{d}{\to} \mathcal{N}\left( \begin{bmatrix} \beta \\ \alpha \end{bmatrix}, \frac{\sigma^2}{nT} Q^{-1} \right), \qquad (45)

  where

  Q = \text{plim}_{nT \to \infty} \frac{1}{nT} Z'Z.

- In practice, an estimator of the covariance matrix of [b′, a′]′ is:

  s^2 (Z'Z)^{-1} \quad \text{with} \quad s^2 = \frac{e'e}{nT - K - n},

  where e is the (nT) × 1 vector of OLS residuals.
- There is an alternative way of expressing the LSDV estimators.
- It is based on the matrix M_D = I − D(D′D)⁻¹D′, which acts as an operator that removes entity-specific means, i.e. (with the notation of Slide 146):

  \tilde{Y} = M_D Y, \quad \tilde{X} = M_D X \quad \text{and} \quad \tilde{\varepsilon} = M_D \varepsilon.

- With these notations, using Theorem 4.2, we get:

  b = [X' M_D X]^{-1} X' M_D y. \qquad (46)

- This amounts to regressing the ỹ_{i,t}'s (= y_{i,t} − ȳ_i) on the x̃_{i,t}'s (= x_{i,t} − x̄_i).
- The estimates of α are given by:

  a = (D'D)^{-1} D'(y - Xb), \qquad (47)

  which is obtained by developing the second row of

  \begin{bmatrix} X'X & X'D \\ D'X & D'D \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} X'Y \\ D'Y \end{bmatrix},

  i.e. the first-order conditions resulting from the least squares problem (see Eq. 5).
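A minimal R sketch of this equivalence, on simulated data (the design below, entity index id included, is an illustrative assumption): the LSDV regression of Eq. (43) and the within regression of Eq. (46) deliver the same slope.

# Minimal sketch: LSDV vs. within (demeaned) estimation on simulated data
set.seed(2)
n <- 50; T <- 5
id    <- rep(1:n, each = T)            # entity index
alpha <- rnorm(n)[id]                  # entity-specific intercepts
x     <- alpha + rnorm(n * T)          # regressor correlated with the effects
y     <- alpha + 1 * x + rnorm(n * T)  # true beta = 1

# LSDV: OLS with one dummy per entity (Eq. 43)
lsdv <- lm(y ~ x + factor(id) - 1)
coef(lsdv)["x"]

# Within estimator: regress demeaned y on demeaned x (Eq. 46)
y_til <- y - ave(y, id)                # y_{i,t} - ybar_i
x_til <- x - ave(x, id)                # x_{i,t} - xbar_i
coef(lm(y_til ~ x_til - 1))            # same slope as the LSDV regression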
                       Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)               9.517       0.229   41.514     0.000
log(airlines$output)      0.883       0.013   66.599     0.000
log(airlines$pf)          0.454       0.020   22.359     0.000
airlines$lf              -1.628       0.345   -4.713     0.000

Table 14: Dep. variable: log(cost). OLS regression.

Regression of the log of airline costs on log(output), log(fuel price) and capacity utilization of the fleet (data from Greene (2003), see Fig. 31).
                       Estimate  Std. Error  t value  Pr(>|t|)
log(airlines$output)      0.919       0.030   30.756     0.000
log(airlines$pf)          0.417       0.015   27.468     0.000
airlines$lf              -1.070       0.202   -5.307     0.000
G1                        9.706       0.193   50.258     0.000
G2                        9.665       0.199   48.571     0.000
G3                        9.497       0.225   42.217     0.000
G4                        9.890       0.242   40.910     0.000
G5                        9.730       0.261   37.288     0.000
G6                        9.793       0.264   37.142     0.000

Table 15: Dep. variable: log(cost). LSDV regression.
                             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                    10.067       0.516   19.525     0.000
log(CigarettesSW$rincome)       0.318       0.136    2.337     0.022
log(CigarettesSW$rprice)       -1.334       0.135   -9.856     0.000

Table 16: Dep. variable: Nb of packs. OLS regression.

                             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                    10.067       0.516   19.525     0.000
log(CigarettesSW$rincome)       0.121       0.213    0.567     0.571
log(CigarettesSW$rprice)       -1.210       0.140   -8.669     0.000

Table 17: Dep. variable: Nb of packs. LSDV HAC fixed-effect regression.

Regression of the number of cigarette packs (log) on real income (log) and real price of cigarettes (log) (data from Stock and Watson (2003), see Fig. 29).
Extension: Fixed time and group effects
- Time effects are introduced:

  y_{i,t} = x_{i,t}'\beta + \alpha_i + \gamma_t + \varepsilon_{i,t}.

- The LSDV regression (Eq. 43) can be extended:

  y = [X \; D \; C] \begin{bmatrix} \beta \\ \alpha \\ \gamma \end{bmatrix} + \varepsilon, \qquad (48)

  with:

  C = \begin{bmatrix} \delta_1 & \delta_2 & \dots & \delta_{T-1} \\ \vdots & \vdots & & \vdots \\ \delta_1 & \delta_2 & \dots & \delta_{T-1} \end{bmatrix},

  where the T-dimensional vector δ_t is [0, …, 0, 1, 0, …, 0]′, with the 1 in the t-th entry.
Example 5.1 (Geo-located data)
- Geo-located data from Airbnb, Zürich (Airbnb, date 22 June 2017).

               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      76.807       3.370   22.793     0.000
accommodates     15.765       1.021   15.447     0.000
studioTRUE       -5.540       3.303   -1.677     0.094

Table 18: Regression of prices on the number of people the apartment can accommodate and a dummy indicating whether the apartment is a studio.
Example 5.1 (Geo-located data)

                       Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)             102.810       5.273   19.496     0.000
accommodates             15.450       0.989   15.626     0.000
studioTRUE               -7.880       3.217   -2.450     0.014
neighborhoodKreis 10    -43.499       6.081   -7.154     0.000
neighborhoodKreis 11    -33.803       5.832   -5.796     0.000
neighborhoodKreis 12    -42.158       9.270   -4.548     0.000
neighborhoodKreis 2     -26.317       6.199   -4.245     0.000
neighborhoodKreis 3     -30.135       5.069   -5.945     0.000
neighborhoodKreis 4     -18.820       5.180   -3.633     0.000
neighborhoodKreis 5     -15.078       5.883   -2.563     0.010
neighborhoodKreis 6     -27.448       5.801   -4.731     0.000
neighborhoodKreis 7     -19.763       5.963   -3.314     0.001
neighborhoodKreis 8     -14.434       5.589   -2.582     0.010
neighborhoodKreis 9     -46.737       6.285   -7.436     0.000

Table 19: Regression of prices on the number of people the apartment can accommodate, a dummy indicating whether the apartment is a studio, and dummies for the 12 city areas.
[Figure. Source: Airbnb, date 22 June 2017. Regression of price for Entire home/apt on number of bedrooms and number of people that can be accommodated.]
[Figure. Source: Airbnb, date 22 June 2017. Regression of price for Entire home/apt on number of bedrooms, number of people that can be accommodated, and dummies for the 12 city areas.]
[Figure. Source: Airbnb, date 22 June 2017. Map of the city's districts.]
Estimation of random effects models
- Here, the individual effects are assumed not to be correlated with the other variables (the x_i's).
- Random-effect models write:

  y_{i,t} = x_{i,t}'\beta + (\alpha + \underbrace{u_i}_{\text{random heterogeneity}}) + \varepsilon_{i,t}

  with

  E(ε_{i,t}|X) = E(u_i|X) = 0,
  E(ε_{i,t} ε_{j,s}|X) = σ_ε² if i = j and s = t, and 0 otherwise,
  E(u_i u_j|X) = σ_u² if i = j, and 0 otherwise,
  E(ε_{i,t} u_j|X) = 0 for all i, j and t.
- Notation: η_{i,t} = u_i + ε_{i,t} and η_i = [η_{i,1}, η_{i,2}, …, η_{i,T}]′.
- We have E(η_i|X) = 0 and Var(η_i|X) = Σ, where

  \Sigma = \begin{bmatrix}
  \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \dots & \sigma_u^2 \\
  \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \dots & \sigma_u^2 \\
  \vdots & & \ddots & \vdots \\
  \sigma_u^2 & \sigma_u^2 & \dots & \sigma_\varepsilon^2 + \sigma_u^2
  \end{bmatrix} = \sigma_\varepsilon^2 I + \sigma_u^2 \mathbf{1}\mathbf{1}'.

- Denoting by Ω the covariance matrix of η = [η_1′, …, η_n′]′, we have:

  \Omega = I \otimes \Sigma.

- If we knew Ω, we would apply (feasible) GLS (Eq. 35):

  \hat{\beta} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}y

  (recall that this amounts to regressing Ω^{-1/2}y on Ω^{-1/2}X).
- It can be checked that Ω^{-1/2} = I ⊗ Σ^{-1/2}, where

  \Sigma^{-1/2} = \frac{1}{\sigma_\varepsilon}\left( I - \frac{\theta}{T} \mathbf{1}\mathbf{1}' \right)

  with

  \theta = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T\sigma_u^2}}.

- Hence, if we knew Ω, we would transform the data as follows:

  \Sigma^{-1/2} y_i = \frac{1}{\sigma_\varepsilon} \begin{bmatrix} y_{i,1} - \theta\bar{y}_i \\ y_{i,2} - \theta\bar{y}_i \\ \vdots \\ y_{i,T} - \theta\bar{y}_i \end{bmatrix}.
- What about when Ω is unknown?
- Idea: taking deviations from group means removes the heterogeneity:

  y_{i,t} - \bar{y}_i = [x_{i,t} - \bar{x}_i]'\beta + (\varepsilon_{i,t} - \bar{\varepsilon}_i).

- The previous equation can be consistently estimated by OLS (the residuals are correlated across t within an entity, but the OLS estimates remain consistent, see Prop. 4.12).
- We have E\left[\sum_{t=1}^{T} (\varepsilon_{i,t} - \bar{\varepsilon}_i)^2\right] = (T-1)\sigma_\varepsilon^2.
- The ε_{i,t}'s are not observed, but b is a consistent estimator of β; with an adjustment for the degrees of freedom:

  \hat{\sigma}_e^2 = \frac{1}{nT - n - K} \sum_{i=1}^{n}\sum_{t=1}^{T} (e_{i,t} - \bar{e}_i)^2.

- And for σ_u²? We can exploit the fact that OLS is consistent in the pooled regression:

  \text{plim} \; s_{pooled}^2 = \text{plim} \frac{e'e}{nT - K - 1} = \sigma_u^2 + \sigma_\varepsilon^2,

  and therefore use s_{pooled}^2 - \hat{\sigma}_e^2 as an approximation to σ_u² (see the sketch below).
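In practice these panel estimators are implemented in R's plm package. A minimal sketch, reusing the simulated objects (id, x, y, n, T) from the LSDV snippet above, where the data-frame column names are illustrative assumptions:

# Minimal sketch: pooled, fixed-effects and random-effects estimation with plm
library(plm)

df <- data.frame(id = id, t = rep(1:T, n), x = x, y = y)

pooled <- plm(y ~ x, data = df, index = c("id", "t"), model = "pooling")
within <- plm(y ~ x, data = df, index = c("id", "t"), model = "within")
random <- plm(y ~ x, data = df, index = c("id", "t"), model = "random")

sapply(list(pooled = pooled, within = within, random = random),
       function(m) coef(m)["x"])   # compare the slope estimates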
Maximum Likelihood Estimation
Context and Objective
- Context:
  - You observe a sample y = {y_1, …, y_n}.
  - You know that these data have been generated by a model parameterized by θ_0 ∈ R^K.
- Objective:
  You want to estimate θ_0.
- Intuition behind Maximum Likelihood Estimation:
  The estimator is the value of θ under which the probability of having observed y is the highest possible.
- Assume that the time periods between the arrivals of two customers in a shop, denoted by y_i, are i.i.d. and follow an exponential distribution, i.e. y_i ∼ E(λ).
- You have observed these arrivals for some time, thereby constituting a sample {y_1, …, y_n}. You want to estimate λ (i.e., in that case, the vector of parameters is simply θ = λ).
- The density of Y is f(y; λ) = (1/λ) exp(−y/λ). Fig. 32 represents this density function for different values of λ.
- Your 200 observations are reported at the bottom of Fig. 33 (red).
- You build the histogram and report it on the same chart (Fig. 34).
Figure 32: Densities of exponential distribution for different λ
[Figure: densities for λ = 1, 3, 5, 7; x-axis: time period between two arrivals.]

Figure 33: Densities of exponential distribution for different λ
[Figure: same densities, with the 200 observations reported at the bottom of the chart (red).]

Figure 34: Densities of exponential distribution for different λ
[Figure: same densities, together with the histogram of the 200 observations.]
- What is your estimate of λ?
- Now, assume that you have only four observations: y_1 = 1.1, y_2 = 2.2, y_3 = 0.7 and y_4 = 5.0.
- What was the probability of observing, for a small ε,
  1.1 − ε ≤ Y_1 < 1.1 + ε,
  2.2 − ε ≤ Y_2 < 2.2 + ε,
  0.7 − ε ≤ Y_3 < 0.7 + ε and
  5.0 − ε ≤ Y_4 < 5.0 + ε?
- Because the y_i's are i.i.d., this probability is \prod_{i=1}^{4} (2\varepsilon f(y_i; \lambda)).
- The next plot shows this probability (divided by 16ε⁴) as a function of λ.
Figure 35: Proba. that y_i − ε ≤ Y_i < y_i + ε, i ∈ {1, 2, 3, 4}
[Figure: the probability (divided by 16ε⁴) as a function of λ.]

Figure 36: Log of proba. that y_i − ε ≤ Y_i < y_i + ε, i ∈ {1, 2, 3, 4}
[Figure: the log of the same probability as a function of λ.]

Figure 37: Back to the example with 200 observations
[Figure: the log-likelihood of the 200 observations as a function of λ.]
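The curves above can be reproduced with a few lines of R. The sketch below evaluates the exponential log-likelihood of the four observations over a grid of λ values and confirms that it peaks at the sample mean, which is the analytical MLE of λ under this parameterization:

# Minimal sketch: exponential log-likelihood over a grid of lambda
y      <- c(1.1, 2.2, 0.7, 5.0)
lambda <- seq(0.1, 10, by = 0.01)

# log L(lambda; y) = sum_i log f(y_i; lambda), with f(y; lambda) = exp(-y/lambda)/lambda
logL <- sapply(lambda, function(l) sum(dexp(y, rate = 1 / l, log = TRUE)))

plot(lambda, logL, type = "l", xlab = "lambda", ylab = "log(Probability)")
lambda[which.max(logL)]   # approx. 2.25 = mean(y), the MLE of lambda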
Notations
- f(y; θ) denotes the probability density function (p.d.f.) of a random variable Y, which depends on a set of parameters θ.
- Density of n independent and identically distributed (i.i.d.) observations of Y:

  f(y_1, \dots, y_n; \theta) = \prod_{i=1}^{n} f(y_i; \theta).

- y denotes the vector of observations; y = {y_1, …, y_n}.

Definition 6.1 (Likelihood function)
L : θ → L(θ; y) = f(y_1, …, y_n; θ) is the likelihood function.

- We often work with log L, the log-likelihood function.

Example 6.1
If y_i ∼ N(µ, σ²), then

  \log L(\theta; \mathbf{y}) = -\frac{1}{2} \sum_{i=1}^{n} \left( \log \sigma^2 + \log 2\pi + \frac{(y_i - \mu)^2}{\sigma^2} \right).
Definition 6.2 (Score)
The score S(y; θ) is given by ∂ log f(y; θ)/∂θ.

Example 6.2
If y_i ∼ N(µ, σ²), then

  \frac{\partial \log f(y; \theta)}{\partial \theta} =
  \begin{bmatrix} \dfrac{\partial \log f(y; \theta)}{\partial \mu} \\[1ex] \dfrac{\partial \log f(y; \theta)}{\partial \sigma^2} \end{bmatrix} =
  \begin{bmatrix} \dfrac{y - \mu}{\sigma^2} \\[1ex] \dfrac{1}{2\sigma^2}\left( \dfrac{(y - \mu)^2}{\sigma^2} - 1 \right) \end{bmatrix}.
Proposition 6.1 (Score expectation)
The expectation of the score is zero.

Proof. We have:

  E\left( \frac{\partial \log f(Y; \theta)}{\partial \theta} \right)
  = \int \frac{\partial \log f(y; \theta)}{\partial \theta} f(y; \theta)\, dy
  = \int \frac{\partial f(y; \theta)/\partial \theta}{f(y; \theta)} f(y; \theta)\, dy
  = \frac{\partial}{\partial \theta} \int f(y; \theta)\, dy = \partial 1 / \partial \theta = 0. \blacksquare

Definition 6.3 (Information matrix)
The information matrix is (minus) the expectation of the second derivatives of the log-likelihood function:

  \mathcal{I}_Y(\theta) = -E\left( \frac{\partial^2 \log f(Y; \theta)}{\partial \theta \partial \theta'} \right).
Proposition 6.2 (Information matrix)
We have \mathcal{I}_Y(\theta) = E\left[ \left( \frac{\partial \log f(Y;\theta)}{\partial \theta} \right) \left( \frac{\partial \log f(Y;\theta)}{\partial \theta} \right)' \right] = Var[S(Y; \theta)].

Proof. \frac{\partial^2 \log f(Y;\theta)}{\partial \theta \partial \theta'} = \frac{1}{f(Y;\theta)} \frac{\partial^2 f(Y;\theta)}{\partial \theta \partial \theta'} - \frac{\partial \log f(Y;\theta)}{\partial \theta} \frac{\partial \log f(Y;\theta)}{\partial \theta'}. The expectation of the first right-hand-side term is \partial^2 1/(\partial \theta \partial \theta') = 0. \blacksquare

Example
If y_i ∼ N(µ, σ²), let θ = [µ, σ²]′; then

  \frac{\partial \log f(y; \theta)}{\partial \theta} = \left[ \frac{y - \mu}{\sigma^2} \quad \frac{1}{2\sigma^2}\left( \frac{(y - \mu)^2}{\sigma^2} - 1 \right) \right]'

and

  \mathcal{I}_Y(\theta) = \begin{bmatrix} 1/\sigma^2 & 0 \\ 0 & 1/(2\sigma^4) \end{bmatrix}.
Proposition 6.3 (Additive property of the Info. mat.)
The information matrix resulting from two independent experiments is the sum of the information matrices:

  \mathcal{I}_{X,Y}(\theta) = \mathcal{I}_X(\theta) + \mathcal{I}_Y(\theta).

Theorem 6.1 (Fréchet-Darmois-Cramér-Rao bound)
Consider an unbiased estimator of θ denoted by \hat{\theta}(Y). The variance of the random variable \omega'\hat{\theta} (which is a linear combination of the components of \hat{\theta}) is larger than:

  (\omega'\omega)^2 / (\omega' \mathcal{I}_Y(\theta)\, \omega).

Proof. The Cauchy-Schwarz inequality (Theorem 9.7) implies that
Var(\omega'\hat{\theta}(Y))\, Var(\omega' S(Y; \theta)) \ge |\omega' Cov[\hat{\theta}(Y), S(Y; \theta)]\, \omega|^2. Now,
Cov[\hat{\theta}(Y), S(Y; \theta)] = \int_y \hat{\theta}(y) \frac{\partial \log f(y;\theta)}{\partial \theta'} f(y; \theta)\, dy = \frac{\partial}{\partial \theta'} \int_y \hat{\theta}(y) f(y; \theta)\, dy = I because \hat{\theta} is unbiased.
Therefore Var(\omega'\hat{\theta}(Y)) \ge Var(\omega' S(Y; \theta))^{-1} (\omega'\omega)^2. Prop. 6.2 leads to the result. \blacksquare
Definition 6.4 (Identification)
The vector of parameters θ is identifiable if, for any other vector θ*:

  θ* ≠ θ ⇒ L(θ*; y) ≠ L(θ; y).

Definition 6.5 (Maximum Likelihood Estimator (MLE))
The maximum likelihood estimator (MLE) is the vector θ that maximizes the likelihood function. Formally:

  \theta_{MLE} = \arg\max_\theta L(\theta; \mathbf{y}) = \arg\max_\theta \log L(\theta; \mathbf{y}). \qquad (49)

Definition 6.6 (Likelihood equation)
Necessary condition for maximizing the likelihood function:

  \frac{\partial \log L(\theta; \mathbf{y})}{\partial \theta} = 0. \qquad (50)
Assumption A.7 (Regularity Assumptions)
(i) θ ∈ Θ, where Θ is compact.
(ii) θ_0 is identified.
(iii) The log-likelihood function is continuous in θ.
(iv) E_{θ_0}(log f(Y; θ)) exists.
(v) The log-likelihood function is such that (1/n) log L(θ; y) converges almost surely to E_{θ_0}(log f(Y; θ)), uniformly in θ ∈ Θ.
(vi) The log-likelihood function is twice continuously differentiable in an open neighborhood of θ_0.
(vii) The matrix \mathcal{I}(\theta_0) = -E_0\left( \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} \right) (Fisher Information matrix) exists and is nonsingular.
Proposition 6.4 (Properties of MLE)
Under regularity conditions (Assumptions A.7), the MLE is:
(a) Consistent: plim θ_MLE = θ_0 (θ_0 is the true vector of parameters).
(b) Asymptotically normal: θ_MLE ∼ N(θ_0, \mathcal{I}(θ_0)^{-1}), where

  \mathcal{I}(\theta_0) = -E_0\left( \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} \right) = n \mathcal{I}_Y(\theta_0). \quad \text{(Fisher Info. matrix)}

(c) Asymptotically efficient: θ_MLE is asymptotically efficient and achieves the Fréchet-Darmois-Cramér-Rao lower bound for consistent estimators.
(d) Invariant: the MLE of g(θ_0) is g(θ_MLE) if g is a continuous and continuously differentiable function.

Proof: see online additional material.

Remark 6.1
(b) also writes:

  \sqrt{n}(\theta_{MLE} - \theta_0) \overset{d}{\to} \mathcal{N}(0, \mathcal{I}_Y(\theta_0)^{-1}). \qquad (51)
- The asymptotic covariance matrix of the MLE is:

  [\mathcal{I}(\theta_0)]^{-1} = \left[ -E_0\left( \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} \right) \right]^{-1}.

- A direct (analytical) evaluation of this expectation is often out of reach.
- It can however be estimated by either:

  \hat{\mathcal{I}}_1^{-1} = \left( -\frac{\partial^2 \log L(\theta_{MLE}; \mathbf{y})}{\partial \theta \partial \theta'} \right)^{-1}, \qquad (52)

  \hat{\mathcal{I}}_2^{-1} = \left( \sum_{i=1}^{n} \frac{\partial \log L(\theta_{MLE}; y_i)}{\partial \theta} \frac{\partial \log L(\theta_{MLE}; y_i)}{\partial \theta'} \right)^{-1}. \qquad (53)
To sum up: MLE in practice
- A parametric model (depending on the vector of parameters θ, whose "true" value is θ_0) is specified.
- i.i.d. sources of randomness are identified.
- The density associated with one observation y_i is computed analytically (as a function of θ): f(y; θ).
- The log-likelihood is log L(θ; y) = Σ_i log f(y_i; θ).
- The MLE estimator results from the optimization problem (this is Eq. 49):

  \theta_{MLE} = \arg\max_\theta \log L(\theta; \mathbf{y}). \qquad (54)

- We have θ_MLE ∼ N(θ_0, \mathcal{I}(θ_0)^{-1}), where \mathcal{I}(θ_0)^{-1} is estimated by means of Eq. (52) or Eq. (53). Most of the time, this computation is numerical (see the sketch below).
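A minimal numerical-MLE sketch in R, under assumed simulated Gaussian data: optim maximizes the log-likelihood of Eq. (54) and returns the Hessian, whose inverse gives the variance estimate of Eq. (52).

# Minimal sketch: numerical MLE of (mu, sigma2) for a Gaussian sample,
# with standard errors from the inverse Hessian (Eq. 52)
set.seed(3)
y <- rnorm(200, mean = 1, sd = 2)

neg_logL <- function(theta) {   # theta = (mu, log(sigma2)); log keeps sigma2 > 0
  mu <- theta[1]; sigma2 <- exp(theta[2])
  -sum(dnorm(y, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

fit <- optim(c(0, 0), neg_logL, hessian = TRUE)
fit$par                          # MLE of (mu, log(sigma2))
V <- solve(fit$hessian)          # inverse Hessian of -log L at the MLE
sqrt(diag(V))                    # asymptotic standard errors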
Example 6.3 (MLE estimation of a mixture of Gaussian distributions)
- Consider the returns of the SMI index used in Fig. 38.
- Let's assume that these returns are independently drawn from a mixture of Gaussian distributions. The p.d.f. f(x; θ), with θ = [µ_1, σ_1, µ_2, σ_2, p]′, is given by:

  p \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left( -\frac{(x - \mu_1)^2}{2\sigma_1^2} \right) + (1 - p) \frac{1}{\sqrt{2\pi\sigma_2^2}} \exp\left( -\frac{(x - \mu_2)^2}{2\sigma_2^2} \right).

- The maximum likelihood estimate is θ_MLE = [0.30, 1.40, −1.45, 3.61, 0.87]′.
- The first two entries of the diagonal of \hat{\mathcal{I}}_1^{-1} are 0.00528 and 0.00526. They are the estimates of Var(µ_{1,MLE}) and of Var(σ_{1,MLE}), respectively. 95% confidence intervals for µ_1 and σ_1 are, respectively:

  0.30 \pm 1.96 \underbrace{\sqrt{0.00528}}_{=0.0726} \quad \text{and} \quad 1.40 \pm 1.96 \underbrace{\sqrt{0.00526}}_{=0.0725}.
Example 6.3 (MLE estim. of a mixture of Gaussian distri. (cont'd))

Figure 38: Density of 5-day returns on SMI index
[Figure: left panel, the 5-day return on the SMI index (in percent); right panel, the return's density (solid line).]
Gaussian kernel, h = 0.5 (in percent). The data span the period from 2 June 2006 to 23 February 2016 at the daily frequency. Left-hand plot: the blue lines indicate µ ± 2σ, where µ is the sample mean of the returns and σ is their sample standard deviation. Right-hand plot: the red dotted line is the density N(µ, σ²).
Example 6.3 (MLE estim. of a mixture of Gaussian distri. (cont'd))

Figure 39: Estimated density (vs kernel-based estimate)
[Figure: kernel estimate (non-parametric), mixture of Gaussians (MLE, parametric) and fitted normal distribution; x-axis: returns, in percent.]
Binary Choice Models
Context and Objectives
- Context:
  In many instances, the variables to be explained (the y_i's) have only two possible values (0 and 1, say).
  Assume we suspect some variable x_i (K × 1) to be able to account for the probability that y_i = 1.
  The model reads:

  y_i | X ∼ B(g(x_i; θ)),

  where g(x_i; θ) is the parameter of the Bernoulli distribution. In other words, conditionally on X:

  y_i = 1 with probability g(x_i; θ), and y_i = 0 with probability 1 − g(x_i; θ),

  where θ is a vector of parameters to be estimated.
- Objective:
  To estimate the vector of population parameters θ.
Example 7.1
Binary-choice models can be used to account for...
- any binary decision (e.g. in referendums, being owner or renter, living in the city or in the countryside, being in or out of the labour force, ...),
- contamination (disease or default),
- success/failure (exams).
- A possibility would be to run a linear regression:

  y_i = θ'x_i + ε_i.

- Such a regression could be consistent with Assumptions A.1, A.2 and A.4, but less easily with A.3, and certainly not with A.5 (because y_i ∈ {0, 1}).
- At best, using a linear regression to study the relationship between x_i and y_i would be inefficient.

Figure 40: Linear regression in the Probit case
[Figure: observations and fitted linear regression.]
The model is P(y_i = 1|x_i) = Φ(0.5 + 2x_i), where Φ is the c.d.f. of the normal distribution.
- There are two prominent models to tackle this situation. In both models, we have:

  P(y_i = 1|x_i; θ) = g(θ'x_i).

- Probit model:

  g(z) = \Phi(z), \qquad (55)

  where Φ is the c.d.f. of the normal distribution.
- Logit model:

  g(z) = \frac{1}{1 + \exp(-z)}. \qquad (56)

Figure 41: Probit and Logit functions
[Figure: the two functions g(x) plotted against x.]
Figure 42: Linear regression in the Probit case
[Figure: observations, fitted linear regression and the true curve P(y_i = 1|x_i).]
The model is P(y_i = 1|x_i) = Φ(0.5 + 2x_i), where Φ is the c.d.f. of the normal distribution.
- These models have an interpretation in terms of latent variables.
- Consider the Probit case. We have:

  P(y_i = 1|x_i; θ) = Φ(θ'x_i) = P(−ε_i < θ'x_i),

  where ε_i ∼ N(0, 1). That is:

  P(y_i = 1|x_i; θ) = P(0 < y_i^*),

  where y_i^* = θ'x_i + ε_i, with ε_i ∼ N(0, 1).
- y_i^* is a (latent) variable that determines y_i: y_i = I_{\{y_i^* > 0\}}.

Figure 43: Distribution of y_i^* conditional on x_i
[Figure: density of y_i^* conditional on x_i, split into the regions corresponding to P(y_i = 0|x_i; θ) and P(y_i = 1|x_i; θ); the marker θ'x_i indicates the centre of the distribution.]
Estimation
- These two models can be estimated by Maximum Likelihood approaches (see Section 6).
- The x_i are considered to be deterministic.
- The random variables are assumed to be independent across entities i.
- How to write the likelihood here?
- It can be seen that (key equation):

  f(y_i|x_i; \theta) = y_i\, g(\theta'x_i) + (1 - y_i)(1 - g(\theta'x_i))
                     = g(\theta'x_i)^{y_i} (1 - g(\theta'x_i))^{1 - y_i}. \qquad (57)

- Therefore, if the observations (x_i, y_i) are independent (across entities i), then:

  \log L(\theta; \mathbf{y}) = \sum_{i=1}^{n} y_i \log[g(\theta'x_i)] + (1 - y_i)\log[1 - g(\theta'x_i)].
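As an illustration, the sketch below (on assumed simulated data) codes this log-likelihood for the Probit case and checks that maximizing it numerically reproduces R's built-in glm probit fit:

# Minimal sketch: probit log-likelihood maximized by hand vs. glm()
set.seed(4)
n <- 500
x <- rnorm(n)
y <- as.numeric(0.5 + 2 * x + rnorm(n) > 0)   # latent-variable representation

neg_logL <- function(theta) {                 # theta = (intercept, slope)
  p <- pnorm(theta[1] + theta[2] * x)         # g(theta'x) = Phi(theta'x)
  -sum(y * log(p) + (1 - y) * log(1 - p))
}
optim(c(0, 0), neg_logL)$par                  # close to (0.5, 2)

coef(glm(y ~ x, family = binomial(link = "probit")))  # same estimates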
- The likelihood equation (Def. 6.6), i.e. the first-order condition of the optimization program, is:

  \frac{\partial \log L(\theta; \mathbf{y})}{\partial \theta} = 0,

  that is:

  \sum_{i=1}^{n} y_i \frac{g'(\theta'x_i)}{g(\theta'x_i)} x_i - (1 - y_i) \frac{g'(\theta'x_i)}{1 - g(\theta'x_i)} x_i = 0.

  ⇒ This is a nonlinear equation that generally has to be solved numerically (no closed-form formula available).
- Under regularity conditions (Assumptions A.7), we have (Prop. 6.4):

  \theta_{MLE} \sim \mathcal{N}(\theta_0, \mathcal{I}(\theta_0)^{-1}),

  where

  \mathcal{I}(\theta_0) = -E_0\left( \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} \right) = n \mathcal{I}_Y(\theta_0).

- For finite samples, we can approximate \mathcal{I}(\theta_0)^{-1} by (Eq. 52):

  \mathcal{I}(\theta_0)^{-1} \approx \left( -\frac{\partial^2 \log L(\theta_{MLE}; \mathbf{y})}{\partial \theta \partial \theta'} \right)^{-1}.
- In the Probit case (Eq. 55), it can be shown that we have:

  \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} = -\sum_{i=1}^{n} g'(\theta'x_i)[x_i x_i'] \times
  \left[ y_i \frac{g'(\theta'x_i) + \theta'x_i \, g(\theta'x_i)}{g(\theta'x_i)^2} + (1 - y_i) \frac{g'(\theta'x_i) - \theta'x_i (1 - g(\theta'x_i))}{(1 - g(\theta'x_i))^2} \right].

- In the Logit case (Eq. 56), it can be shown that we have:

  \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} = -\sum_{i=1}^{n} g'(\theta'x_i)\, x_i x_i',

  where g'(x) = \frac{\exp(-x)}{(1 + \exp(-x))^2}.

  (Note: because g'(x) > 0, it can be checked that -\partial^2 \log L(\theta; \mathbf{y})/\partial \theta \partial \theta' is positive definite.)
Marginal effects
- How to measure marginal effects, i.e. the effect on the probability that y_i = 1 of a marginal increase in x_{i,k}?
- It is given by:

  \frac{\partial P(y_i = 1|x_i; \theta)}{\partial x_{i,k}} = \underbrace{g'(\theta'x_i)}_{>0} \theta_k.

- This effect is of the same sign as θ_k.
- It can be estimated by g'(θ'_{MLE} x_i) θ_{MLE,k}.
- Note that this marginal effect depends on x_i.
  ⇒ One-unit increases in x_{i,k} (entity i) and in x_{j,k} (entity j) do not necessarily have the same effects on P(y_i = 1|x_i; θ) and on P(y_j = 1|x_j; θ), respectively.
- Measure of "average" marginal effect? Two solutions, for each explanatory variable k (see the sketch below):
  (i) Denoting by x̄ the sample average of the x_i's, compute g'(θ'_{MLE} x̄) θ_{MLE,k}.
  (ii) Compute the average (across i) of g'(θ'_{MLE} x_i) θ_{MLE,k}.
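A sketch of both measures, continuing the simulated probit example above (for a probit, g' is the standard normal density, dnorm in R):

# Minimal sketch: marginal effects of x in the probit fit above
fit   <- glm(y ~ x, family = binomial(link = "probit"))
theta <- coef(fit)

# (i) marginal effect evaluated at the sample average of x
dnorm(theta[1] + theta[2] * mean(x)) * theta[2]

# (ii) average (across i) of the individual marginal effects
mean(dnorm(theta[1] + theta[2] * x)) * theta[2]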
Goodness of fit
- There is no commonly accepted version of R² (see Eq. 6) for binary-choice models.
- Existing measures are called pseudo-R² measures.
- Let us denote by log L_0(y) the (maximum) log-likelihood that would be obtained for a model containing only a constant term (i.e. with x_i = 1 for all i).
- McFadden's pseudo-R² is given by:

  R^2_{MF} = 1 - \frac{\log L(\theta; \mathbf{y})}{\log L_0(\mathbf{y})}.
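With a fitted glm object this is a one-liner; a sketch, continuing the probit example (the intercept-only model serves as the benchmark):

# Minimal sketch: McFadden's pseudo-R^2 for the probit fit above
fit0 <- glm(y ~ 1, family = binomial(link = "probit"))   # constant-only model
1 - as.numeric(logLik(fit)) / as.numeric(logLik(fit0))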
Example 7.2 (Labor market participation)

              Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)      3.749       1.407    2.665     0.008
income          -0.667       0.132   -5.054     0.000
age              2.075       0.405    5.119     0.000
education        0.019       0.018    1.071     0.284
youngkids       -0.714       0.100   -7.117     0.000
oldkids         -0.147       0.051   -2.888     0.004
foreignyes       0.714       0.121    5.888     0.000
I(age^2)        -0.294       0.050   -5.893     0.000

Table 20: Women labor force participation in Switzerland

Source: Gerfin (1996).

Marginal effects:
income: −0.22, age: 0.69, education: 0.006, youngkids: −0.24, oldkids: −0.05, foreignyes: 0.24, age²: −0.10.
Example 7.2 (Labor market participation (cont'd))

Figure 44: Labor market participation vs. age and education
[Figure: two panels showing participation (yes/no) against age and against education.]
Example 7.3 (Credit risk)

Figure 45: Defaults, interest rates and ratings
[Figure: left panel, share of defaulted entities by rating (or grade); right panel, interest rate by rating (or grade).]
Source: Lending Club (large online peer-to-peer loan platform). About 38,000 loans, 2007 to 2011.
Example 7.3 (Credit risk (cont'd))

                      Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)             -0.185       0.117   -1.585     0.113
amt.2.income             0.144       0.013   11.508     0.000
term 60 months           0.513       0.018   28.094     0.000
home_ownershipOWN        0.058       0.032    1.771     0.077
home_ownershipRENT       0.104       0.017    6.064     0.000
delinq_2yrs              0.062       0.016    3.981     0.000
issue_Y2008             -0.095       0.101   -0.945     0.345
issue_Y2009             -0.268       0.096   -2.783     0.005
issue_Y2010             -0.410       0.095   -4.322     0.000
issue_Y2011             -0.346       0.094   -3.673     0.000

Table 21: Predicting defaults with probit regression

amt2income = (monthly payment owed by the borrower)/(self-reported annual income); term is 36 or 60 months; delinq2yrs = number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years; home_ownership is either MORTGAGE, OWN or RENT.

Marginal effects:
amt2income: 0.031; term 60 months: 0.11; home_ownershipRENT: 0.023; delinq2yrs: 0.014.
Example 7.4 (Credit risk)

Figure 46: Comparing probit-based predictions and ratings
[Figure: probit-based probabilities of default plotted against the rating (or grade), A to G.]
Source: Lending Club (large online peer-to-peer loan platform). About 38,000 loans, 2007 to 2011.
Likelihood ratio tests
- Because the models are estimated by MLE, one can easily conduct likelihood ratio (LR) tests in the context of binary-choice models.
- In LR tests, we compare a restricted model (R) and an unrestricted model (U).
- Restricted model = constrained version of the unrestricted one.
- Let us denote by J the number of constraints imposed on R (w.r.t. U).
- We denote by log L_U (respectively log L_R) the maximum likelihood resulting from the maximization of the unrestricted log-likelihood (resp. the restricted log-likelihood). We have (mechanically):

  \log L_U > \log L_R. \qquad (58)

- The null hypothesis (see Section 3) of the LR test is H0: the "true" model is the restricted one.
- Even under H0, Eq. (58) holds. However, if H0 is true, one expects log L_U not to be too far from log L_R.
- Formally, it can be proven that, asymptotically (see the sketch below):

  \underbrace{2(\log L_U - \log L_R)}_{\text{LR test statistic}} \overset{d}{\to} \chi^2(J).
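A sketch of such a test in R, continuing the probit example (restricting the slope on x to zero, so J = 1):

# Minimal sketch: LR test of H0 "the slope on x is zero" (J = 1 constraint)
fit_U <- glm(y ~ x, family = binomial(link = "probit"))  # unrestricted
fit_R <- glm(y ~ 1, family = binomial(link = "probit"))  # restricted

LR <- 2 * (as.numeric(logLik(fit_U)) - as.numeric(logLik(fit_R)))
1 - pchisq(LR, df = 1)           # p-value under the chi2(1) asymptotics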
Introduction to Time Series

- Time series tools are useful to:
  - forecast (calculation of confidence intervals);
  - test economic theories, exploiting the time dimension;
  - calibrate models used for policy analysis or risk management;
  - price assets.
Definition 8.1 (Time series)
- A time series: {y_t}_{t=-∞}^{+∞} = {…, y_{-2}, y_{-1}, y_0, y_1, …, y_t, …}, with y_t ∈ R^k.
- Typical sample: {y_1, …, y_T}.

Example 8.1 (Examples of time series)
- Constant: y_t = c.
- Trend: y_t = t.
- Gaussian white noise (Def. 8.2): y_t = ε_t, ε_t ∼ i.i.d. N(0, σ²).
- Time series specifications are often defined by a difference equation (Def. 8.6 below).
Figure 47: Federal fund rate
[Figure: time series of the federal fund rate.]
Source of the data: FRED database.
Figure 48: Federal fund rate and unemployment rate
[Figure: time series of the federal fund rate and the unemployment rate.]
Source of the data: FRED database. Unemployment rate in red.
Figure 49: U.S. Gross Domestic Product
[Figure: top panel, Real Gross Domestic Product (bns of chained 2009 USD); bottom panel, annual GDP (log) growth.]
Source of the data: FRED database.
Figure 50: Intraday data
[Figure: IBM stock price (in USD), 11 February 2016, NASDAQ.]
Source of the data: Google Finance.
Figure 51: Seasonality illustration (1/2)
[Figure: number of deaths in France (monthly).]
Source of the data: FRED database.
Figure 52: Seasonality illustration (2/2)
[Figure: number of Google searches for "depression".]
Source of the data: Google Trends.
- Standard time series models are built using "basic" shocks that we will often denote by ε_t. Typically: E(ε_t) = 0.
- A basic form of shocks: i.i.d. shocks (independence: Def. 1.6).
- The following definition describes a more general form of basic shocks (not necessarily independent):

Definition 8.2 (White noise)
The process {ε_t}_{t ∈ ]-∞,+∞[} is a white noise if, for all t, (a) E(ε_t) = 0, (b) Var(ε_t) = E(ε_t²) = σ² < ∞, and (c) for all s ≠ t, E(ε_t ε_s) = 0.

Remark 8.1 (A white noise process is not necessarily i.i.d.)
- Consider ε_t ∼ N(0, 1 + 0.5ε²_{t-1}). We have:

  E(ε_t) = E[E(ε_t|ε_{t-1})] = 0 \quad \text{and} \quad Var(ε_t|ε_{t-1}) = E(ε_t²|ε_{t-1}) = 1 + 0.5ε²_{t-1}.

- By the law of total variance (Prop. 1.3):

  Var(ε_t) = Var[E(ε_t|ε_{t-1})] + E[Var(ε_t|ε_{t-1})] = 0 + E(1 + 0.5ε²_{t-1}).

- Therefore Var(ε_t) = 2 < ∞ (the unconditional variance v solves v = 1 + 0.5v). It can further be shown that E(ε_t ε_s) = 0 if s ≠ t. Hence ε_t is a white noise. But the ε_t's are not independent, for instance because E(ε_t²|ε_{t-1}) ≠ E(ε_t²) = Var(ε_t).
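A quick simulation sketch of this process in R: the levels ε_t look serially uncorrelated, while the squares ε_t² are clearly autocorrelated, which is exactly what rules out independence.

# Minimal sketch: a white noise that is not i.i.d.
# eps_t | eps_{t-1} ~ N(0, 1 + 0.5 * eps_{t-1}^2)
set.seed(5)
T   <- 10000
eps <- numeric(T)
for (t in 2:T) eps[t] <- rnorm(1, sd = sqrt(1 + 0.5 * eps[t - 1]^2))

var(eps)                          # close to 2, as derived above
acf(eps,   plot = FALSE)$acf[2]   # lag-1 autocorrelation of eps:   approx. 0
acf(eps^2, plot = FALSE)$acf[2]   # lag-1 autocorrelation of eps^2: clearly > 0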
- In the general case, the expectation and the variance of y_t may depend on time t.
- However, in the following, we will focus on those processes y_t whose first two moments do not depend on time.
- This class of processes already allows us to accommodate the features of a wide range of time-varying variables.
- These processes are said to be covariance-stationary.

Remark 8.2 (Unconditional and conditional moments)
- When not stated otherwise, "expectation" means "unconditional expectation", or "marginal expectation" (i.e. E(y_t)).
- As we will see later, in general:

  \underbrace{E(y_t)}_{\text{unconditional expectation}} \ne \underbrace{E(y_t|y_{t-1}, y_{t-2}, \dots)}_{\text{conditional expectation}}.
Definition 8.3 (Covariance-stationarity)
The process y_t is covariance-stationary if, for all t and j,

  E(y_t) = µ \quad \text{and} \quad E[(y_t - µ)(y_{t-j} - µ)] = γ_j < ∞.

Proposition 8.1
If y_t is covariance-stationary, then γ_j = γ_{-j}.

Proof. Since y_t is covariance-stationary, the covariance between y_t and y_{t-j} (i.e. γ_j) is the same as that between y_{t+j} and y_{t+j-j} (i.e. γ_{-j}). \blacksquare
Figure 53: U.S. GDP growth
[Figure: annual GDP (log) growth.]
Blue solid line: sample mean; blue dashed lines: sample mean ± 2 standard deviations.
Figure 54: First-order auto-covariance
[Figure: scatter of Growth[t] against Growth[t-1].]
The slope of the blue line is γ̂_1 / V̂ar(growth_t) (see bivariate OLS case 4.5).
Figure 55: Second-order auto-covariance
[Figure: scatter of Growth[t] against Growth[t-2].]
The slope of the blue line is γ̂_2 / V̂ar(growth_t) (see bivariate OLS case 4.5).
Figure 56: Fifth-order auto-covariance
[Figure: scatter of Growth[t] against Growth[t-5].]
The slope of the blue line is γ̂_5 / V̂ar(growth_t) (see bivariate OLS case 4.5).
Figure 57: An example of nonstationary process: $y_t = 0.05 \times t + \varepsilon_t$

[Figure: simulated trajectory trending upward, from about 0 to about 12, over 200 periods.]

222 / 295
- It is very common to work with auto-correlated time series.
- Auto-correlation affects, among many other things, the estimation of the moments of a variable.
- Typically, assume you have a sample of observations of a time series $y_t$ that, you know, is covariance-stationary.
- If you want to estimate $\mu = E(y_t)$, you will naturally compute $\bar{y}_T = \frac{1}{T}\sum_{t=1}^{T} y_t$.
- If you want to compute a confidence interval for $\mu$, you are tempted to use the CLT (Theorem 9.6).
- But this is inappropriate, because your $y_t$'s are not i.i.d.
- Intuitively, the more auto-correlated your $y_t$'s are, the less precise your estimate of $\mu$ (see next slide).
- There exists an "adapted" version of the CLT for time series processes. This theorem (8.1 below) involves the sequence of autocovariances $\{\gamma_j\}_{j \in ]-\infty,\infty[}$. Both theorems (9.6 and 8.1) coincide when $\gamma_j = 0$ for all $j \neq 0$, i.e. when $y_t$ is a white noise (Def. 8.2).

223 / 295
Figure 58: Processes with same first two unconditional moments

[Figure: three simulated paths of length 100, labelled Case A, Case B and Case C.]

Note: The three samples have been simulated using the following data generating process:
$$y_t = \mu + \rho(y_{t-1} - \mu) + \sqrt{1 - \rho^2}\,\varepsilon_t, \quad \text{where } \varepsilon_t \sim \mathcal{N}(0, 1).$$
Case A: $\rho = 0$; Case B: $\rho = 0.9$; Case C: $\rho = 0.99$.
In the three cases, $E(y_t) = \mu = 2$ and $Var(y_t) = 1$.

Web-interface illustration (use the "AR(1)" panel)

224 / 295
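As an illustration, here is a minimal Python sketch of this data generating process (the function name, seed, and use of numpy are mine, not part of the slides; drawing $y_0$ from the stationary distribution keeps the two moments exact at every date):

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed

def simulate_ar1(mu, rho, T=100):
    """y_t = mu + rho*(y_{t-1} - mu) + sqrt(1 - rho^2)*eps_t, eps_t ~ N(0,1).
    By construction E(y_t) = mu and Var(y_t) = 1 for any |rho| < 1."""
    y = np.empty(T)
    y[0] = mu + rng.standard_normal()  # y_0 drawn from the stationary N(mu, 1)
    for t in range(1, T):
        y[t] = mu + rho * (y[t - 1] - mu) \
               + np.sqrt(1 - rho**2) * rng.standard_normal()
    return y

for rho in (0.0, 0.9, 0.99):  # Cases A, B, C
    y = simulate_ar1(mu=2.0, rho=rho)
    print(f"rho={rho}: sample mean={y.mean():.2f}, sample var={y.var():.2f}")
```

Despite identical population moments, the sample mean tends to drift away from 2 as $\rho$ grows; this is exactly the loss of precision discussed above.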
Theorem 8.1 (Limit theorem for covar. statio. processes)

If process $y_t$ is covariance-stationary and if the sequence of autocovariances is absolutely summable ($\sum_{j=-\infty}^{+\infty} |\gamma_j| < \infty$), then:
$$\bar{y}_T \overset{p}{\to} \mu = E(y_t) \tag{59}$$
$$\lim_{T \to +\infty} T\, E[(\bar{y}_T - \mu)^2] = \sum_{j=-\infty}^{+\infty} \gamma_j \tag{60}$$
$$\sqrt{T}(\bar{y}_T - \mu) \overset{d}{\to} \mathcal{N}\Big(0, \sum_{j=-\infty}^{+\infty} \gamma_j\Big). \tag{61}$$

Proof. Extra online material. ∎

225 / 295
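A quick Monte Carlo check of Eq. (60), as a sketch (the parametrisation is mine: an AR(1) $y_t = \phi y_{t-1} + \varepsilon_t$ with $\phi = 0.6$ and $\sigma = 1$, for which $\sum_j \gamma_j = \sigma^2/(1-\phi)^2 = 6.25$; `scipy.signal.lfilter` starts the recursion from a zero state, so the match is only approximate):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
phi, T, N = 0.6, 1_000, 5_000
eps = rng.standard_normal((N, T))
# y_t = phi*y_{t-1} + eps_t, simulated N times (zero initial state)
y = lfilter([1.0], [1.0, -phi], eps, axis=1)
ybar = y.mean(axis=1)
# T * Var(ybar_T) should be close to sum_j gamma_j = 1/(1-phi)^2 = 6.25
print(T * ybar.var())
```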
Definition 8.4 (Long-run variance)

Under the assumptions of Theorem 8.1, the limit appearing in Eq. (60) exists and is called long-run variance. It is denoted by $S$, i.e.:
$$S = \lim_{T \to +\infty} T\, E[(\bar{y}_T - \mu)^2].$$

- A natural estimator is:
$$\hat{\gamma}_0 + 2\sum_{\nu=1}^{q} \hat{\gamma}_\nu, \tag{62}$$
where $\hat{\gamma}_\nu = \frac{1}{T}\sum_{t=\nu+1}^{T} (y_t - \bar{y})(y_{t-\nu} - \bar{y})$.
- Problem: in small samples, using Eq. (62) may result in a negative figure...

Definition 8.5 (Newey-West estimator of S)

Newey and West (1987) have proposed the following (nonnegative) consistent estimator for the long-run variance of $y_t$:
$$S^{NW} = \hat{\gamma}_0 + 2\sum_{\nu=1}^{q} \Big(1 - \frac{\nu}{q+1}\Big)\hat{\gamma}_\nu.$$

226 / 295
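A direct transcription of Eq. (62) and of Definition 8.5 in Python, as a sketch (the function name and the choice of the truncation parameter q are mine):

```python
import numpy as np

def long_run_variance(y, q, newey_west=True):
    """Estimate S = gamma_0 + 2*sum_{v>=1} gamma_v, truncating at lag q.
    With newey_west=True, the Bartlett weights (1 - v/(q+1)) of Def. 8.5
    are applied, which guarantees a nonnegative estimate."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()
    gamma = lambda v: (d[v:] * d[:T - v]).sum() / T  # hat gamma_v of Eq. (62)
    w = lambda v: 1.0 - v / (q + 1.0) if newey_west else 1.0
    return gamma(0) + 2.0 * sum(w(v) * gamma(v) for v in range(1, q + 1))
```

Setting `newey_west=False` gives the "natural" estimator of Eq. (62), which can indeed turn negative in small samples.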
Difference equations and Auto-Regressive (AR) processes

- Autoregressive processes constitute an important class of time series processes.
- These processes are flexible and can reproduce the dynamics of a wide range of time-varying variables.
- Advantage: an AR process is defined through a few parameters.
- If one is interested in a particular variable and its dynamics, one can look for an AR specification that is consistent with the considered variable and estimate the parameters of the AR process (see Slide 243).
- Once one has estimated the parameters, the AR model can for instance be used to forecast the variable at hand and to calculate associated confidence intervals (see Slide 254).
- An AR process is defined by means of a difference equation (Def. 8.6 below).

227 / 295
Definition 8.6 (Difference equation)

A linear p-th order difference equation is of the form:
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t, \tag{63}$$
where $\varepsilon_t$ is a sequence of shocks.

- It will prove convenient to introduce the following notations:
$$F = \begin{bmatrix} \phi_1 & \phi_2 & \dots & \phi_{p-1} & \phi_p \\ 1 & 0 & \dots & 0 & 0 \\ 0 & 1 & \dots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix}, \quad \xi_t = \begin{bmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \mathbf{y}_t = \begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix}. \tag{64}$$
- We have:
$$\mathbf{y}_t = F\mathbf{y}_{t-1} + \xi_t \tag{65}$$
- and:
$$\mathbf{y}_{t+j} = \xi_{t+j} + F\xi_{t+j-1} + F^2\xi_{t+j-2} + \dots + F^{j-1}\xi_{t+1} + F^j\mathbf{y}_t.$$

228 / 295
Definition 8.7 (Dynamic multiplier)

The dynamic multiplier of $y_{t+j}$ w.r.t. $\varepsilon_t$ is $\dfrac{\partial y_{t+j}}{\partial \varepsilon_t}$.

- We have: $\dfrac{\partial y_{t+j}}{\partial \varepsilon_t} = (F^j)_{[1,1]}$.
- Let us assume that the eigenvalues of $F$ (see Def. 9.2), denoted by $\lambda_1, \dots, \lambda_p$, are distinct.
- Then, there exists a nonsingular matrix $P$ such that:
$$F = P \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ & & \ddots & \\ 0 & \dots & 0 & \lambda_p \end{bmatrix} P^{-1} = PDP^{-1}.$$
- We have:
$$F^j = P \begin{bmatrix} \lambda_1^j & 0 & \dots & 0 \\ 0 & \lambda_2^j & \dots & 0 \\ & & \ddots & \\ 0 & \dots & 0 & \lambda_p^j \end{bmatrix} P^{-1}.$$

229 / 295
- Let us introduce $z_t = P^{-1}\mathbf{y}_t$ and $\eta_t = P^{-1}\xi_t$. Eq. (65) becomes:
$$z_t = Dz_{t-1} + \eta_t. \tag{66}$$
- The i-th of the p equations of Eq. (66) reads $z_{i,t} = \lambda_i z_{i,t-1} + \eta_{i,t}$.
- We have:
$$z_{i,t+h} = \lambda_i z_{i,t+h-1} + \eta_{i,t+h} = \lambda_i(\lambda_i z_{i,t+h-2} + \eta_{i,t+h-1}) + \eta_{i,t+h} = \eta_{i,t+h} + \lambda_i\eta_{i,t+h-1} + \lambda_i^2\eta_{i,t+h-2} + \dots + \lambda_i^{h-1}\eta_{i,t+1} + \lambda_i^h z_{i,t}.$$
- If $\varepsilon_t$ is a white noise sequence, then $\eta_{i,t}$ also is. Let us denote by $\sigma_i^2$ the variance of $\eta_{i,t}$. We have:
$$Var\big[\eta_{i,t+h} + \lambda_i\eta_{i,t+h-1} + \dots + \lambda_i^{h-1}\eta_{i,t+1}\big] = \sigma_i^2\big(1 + \lambda_i^2 + \dots + \lambda_i^{2(h-1)}\big).$$
- For $h \to +\infty$, this series converges iff $|\lambda_i| < 1$.
- If $|\lambda_i| > 1$, $z_{i,t}$ "explodes".

230 / 295
- If one of the $\lambda_i$'s is such that $|\lambda_i| > 1$, then $F^j = PD^jP^{-1}$ "explodes" when $j$ gets large, and $\Big|\dfrac{\partial y_{t+j}}{\partial \varepsilon_t}\Big| \underset{j \to \infty}{\to} \infty$.

Remark 8.3 (The eigenvalues of F)

- It can be shown that the eigenvalues of $F$ are the solutions of:
$$\lambda^p - \phi_1\lambda^{p-1} - \dots - \phi_{p-1}\lambda - \phi_p = 0. \tag{67}$$
- Hence, the process defined by Eq. (63) is "not stable" (explodes) if the roots of the previous equation lie outside the unit circle.
- "Stability" is related to the formal notion of covariance-stationarity (Def. 8.3).

231 / 295
Example 8.2

- Consider the 2nd-order difference equation:
$$y_t = 0.9y_{t-1} - 0.2y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, 1).$$
- The eigenvalues of $F$ are 0.5 and 0.4.

[Figure: left panel, dynamic multipliers $\partial y_{t+j}/\partial\varepsilon_t$ decaying monotonically from 1 towards 0 for $j = 0, \dots, 20$; right panel, a simulated sample of length 100.]

Web-interface illustration (use the "ARMA(p,q)" panel, with q=0)

232 / 295
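These eigenvalues and dynamic multipliers are easy to reproduce numerically; a sketch (numpy only, variable names mine):

```python
import numpy as np

phi1, phi2 = 0.9, -0.2             # Example 8.2 coefficients
F = np.array([[phi1, phi2],
              [1.0,  0.0]])        # companion matrix of Eq. (64)
print(np.linalg.eigvals(F))        # -> [0.5, 0.4]

# dynamic multipliers d y_{t+j} / d eps_t = (F^j)[1,1] (top-left entry)
Fj = np.eye(2)
for j in range(6):
    print(j, round(Fj[0, 0], 4))   # 1, 0.9, 0.61, 0.369, ...
    Fj = Fj @ F
```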
Example 8.3

- Consider the 2nd-order difference equation:
$$y_t = 1.1y_{t-1} - 0.3y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, 1).$$
- The eigenvalues of $F$ are 0.6 and 0.5.

[Figure: left panel, dynamic multipliers for $j = 0, \dots, 20$; right panel, a simulated sample of length 100.]

Web-interface illustration (use the "ARMA(p,q)" panel, with q=0)

233 / 295
Example 8.4

- Consider the 2nd-order difference equation:
$$y_t = 1.4y_{t-1} - 0.7y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, 1).$$
- The eigenvalues of $F$ are complex conjugates: $0.7 \pm 0.46i$ (modulus $\sqrt{0.7} \approx 0.84 < 1$).

[Figure: left panel, oscillating dynamic multipliers for $j = 0, \dots, 20$; right panel, a simulated sample of length 100.]

Web-interface illustration (use the "ARMA(p,q)" panel, with q=0)

234 / 295
Example 8.5

- Consider the 2nd-order difference equation:
$$y_t = 0.9y_{t-1} + 0.2y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, 1).$$
- The eigenvalues of $F$ are 1.084 and −0.184.

[Figure: left panel, exploding dynamic multipliers for $j = 0, \dots, 20$; right panel, a simulated sample of length 100 that diverges (reaching values above 15000).]

Web-interface illustration (use the "ARMA(p,q)" panel, with q=0)

235 / 295
Definition 8.8 (First-order AR process)

Process $y_t$ is an auto-regressive process of order one (AR(1)) if:
$$y_t = c + \phi y_{t-1} + \varepsilon_t,$$
where $\{\varepsilon_t\}_{t \in ]-\infty,+\infty[}$ is a white noise process (see Def. 8.2).

- If $|\phi| \geq 1$, $y_t$ is "explosive" (see Slide 230). Its variance does not exist; it is not stationary (see Def. 8.3).
- If $|\phi| < 1$, one can see that:
$$y_t = c + \varepsilon_t + \phi(c + \varepsilon_{t-1}) + \phi^2(c + \varepsilon_{t-2}) + \dots + \phi^k(c + \varepsilon_{t-k}) + \dots$$
- If $|\phi| < 1$, the unconditional mean and variance of $y_t$ are:
$$E(y_t) = \frac{c}{1-\phi} =: \mu, \qquad Var(y_t) = \frac{\sigma^2}{1-\phi^2}.$$

Web-interface illustration (use the "AR(1)" panel)

236 / 295
- j-th autocovariance:
$$E([y_t - \mu][y_{t-j} - \mu]) = E\big([\varepsilon_t + \phi\varepsilon_{t-1} + \phi^2\varepsilon_{t-2} + \dots + \phi^j\varepsilon_{t-j} + \phi^{j+1}\varepsilon_{t-j-1} + \dots] \times [\varepsilon_{t-j} + \phi\varepsilon_{t-j-1} + \phi^2\varepsilon_{t-j-2} + \dots + \phi^k\varepsilon_{t-j-k} + \dots]\big)$$
$$= E(\phi^j\varepsilon_{t-j}^2 + \phi^{j+2}\varepsilon_{t-j-1}^2 + \phi^{j+4}\varepsilon_{t-j-2}^2 + \dots) = \frac{\phi^j\sigma^2}{1-\phi^2}.$$
- Therefore $\rho_j = \phi^j$.
- By what precedes, we have:

Proposition 8.2 (Covariance-stationarity of an AR(1) process)

The AR(1) process defined in Def. 8.8 is covariance-stationary (see Def. 8.3) iff $|\phi| < 1$.

Web-interface illustration (use the "AR(1)" panel)

237 / 295
Figure 59: AR(1) process, $y_t = 0.6y_{t-1} + \varepsilon_t$

[Figure: left panel, a simulated path of length 100; right panel, scatter plot of $y(t)$ against $y(t-1)$.]

Web-interface illustration (use the "AR(1)" panel)

238 / 295
Figure 60: AR(1) process, $y_t = 0.95y_{t-1} + \varepsilon_t$

[Figure: left panel, a simulated path of length 100, markedly more persistent than in Figure 59; right panel, scatter plot of $y(t)$ against $y(t-1)$.]

Web-interface illustration (use the "AR(1)" panel)

239 / 295
Definition 8.9 (p-th-order AR process)

Process $y_t$ follows a p-th-order autoregressive process (AR(p)) if:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t, \tag{68}$$
where $\{\varepsilon_t\}_{t \in ]-\infty,+\infty[}$ is a white noise process (see Def. 8.2).

Proposition 8.3 (Covariance-stationarity of an AR(p) process)

The AR(p) process defined in Def. 8.9 is covariance-stationary iff (i) the eigenvalues of $F$ (as defined by Eq. 64) lie within the unit circle, i.e. iff (ii) the roots of
$$\lambda^p - \phi_1\lambda^{p-1} - \dots - \phi_{p-1}\lambda - \phi_p = 0 \tag{69}$$
lie within the unit circle.

Proof. The fact that (i) is equivalent to (ii) is Remark 8.3. Slide 230 provides the idea of the proof in the case where $F$ can be diagonalized. ∎

- If the AR(p) defined in Def. 8.9 is covariance-stationary, we have $\mu = E(y_t) = \dfrac{c}{1 - \phi_1 - \dots - \phi_p}$.
- The autocovariances are derived from the Yule-Walker equations (see next slide).

240 / 295
Computation of the autocovariances of an AR(p) process

- To compute the autocovariances of $y_t$ (assumed to be covariance-stationary), let's rewrite Eq. (68):
$$(y_t - \mu) = \phi_1(y_{t-1} - \mu) + \phi_2(y_{t-2} - \mu) + \dots + \phi_p(y_{t-p} - \mu) + \varepsilon_t.$$
- Multiplying both sides by $y_{t-j} - \mu$ and taking expectations leads to the Yule-Walker equations:
$$\gamma_j = \begin{cases} \phi_1\gamma_{j-1} + \phi_2\gamma_{j-2} + \dots + \phi_p\gamma_{j-p} & \text{if } j > 0 \\ \phi_1\gamma_1 + \phi_2\gamma_2 + \dots + \phi_p\gamma_p + \sigma^2 & \text{for } j = 0. \end{cases} \tag{70}$$
- Using $\gamma_j = \gamma_{-j}$ (Prop. 8.1), one can solve for $(\gamma_0, \gamma_1, \dots, \gamma_p)$ as functions of $(\sigma^2, \phi_1, \dots, \phi_p)$.

241 / 295
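Rather than solving Eq. (70) by hand, one convenient numerical route (a sketch of mine, not from the slides) uses the companion form of Eq. (65): the variance $\Gamma$ of $\mathbf{y}_t$ solves $\Gamma = F\Gamma F' + \Sigma_\xi$, a linear system in the elements of $\Gamma$; higher-order autocovariances then follow from the first line of Eq. (70):

```python
import numpy as np

def ar_autocovariances(phi, sigma2, n_lags):
    """gamma_0 .. gamma_{n_lags} of a covariance-stationary AR(p)."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    F = np.zeros((p, p)); F[0, :] = phi
    if p > 1:
        F[1:, :-1] = np.eye(p - 1)               # companion matrix, Eq. (64)
    Sxi = np.zeros((p, p)); Sxi[0, 0] = sigma2   # Var(xi_t)
    # vec(Gamma) = (I - F kron F)^{-1} vec(Sigma_xi)
    G = np.linalg.solve(np.eye(p * p) - np.kron(F, F), Sxi.ravel()).reshape(p, p)
    gam = list(G[0, :])                          # gamma_0 .. gamma_{p-1}
    while len(gam) <= n_lags:                    # Yule-Walker recursion, Eq. (70)
        gam.append(sum(phi[i] * gam[-1 - i] for i in range(p)))
    return np.array(gam[:n_lags + 1])

print(ar_autocovariances([0.6], 1.0, 3))  # AR(1): gamma_j = 0.6**j / (1 - 0.36)
```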
Figure 61: AR(3) process, $y_t = 1.5y_{t-1} - 0.4y_{t-2} - 0.2y_{t-3} + \varepsilon_t$

[Figure: left panel, a simulated path of length 100; right panel, scatter plot of $y(t)$ against $y(t-1)$.]

Web-interface illustration (use the "AR(1)" panel)

242 / 295
MLE of AR processes

Context and Objectives

- Context:
$$y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t, \tag{71}$$
where $\{\varepsilon_t\}$ is an i.i.d. white noise process (see Def. 8.2).
- Objective: to estimate the vector of population parameters $\theta$, where:
$$\theta = [c, \phi_1, \dots, \phi_p, \sigma^2].$$
- Observations: $\mathbf{y} = [y_1, \dots, y_T]'$.
- Probability density function (p.d.f.): $f_{Y_T,\dots,Y_1}(y_T, \dots, y_1; \theta)$.
- Log-likelihood function: $\log \mathcal{L}(\theta; \mathbf{y}) = \log f_{Y_T,\dots,Y_1}(y_T, \dots, y_1; \theta)$.
- Maximum likelihood estimate of $\theta$ (see Section 6):
$$\theta^{MLE} = \arg\max_\theta \mathcal{L}(\theta; \mathbf{y}) = \arg\max_\theta \log \mathcal{L}(\theta; \mathbf{y}).$$

243 / 295
- We use Bayes' formula:
$$P(x_2 = x, x_1 = y) = P(x_2 = x \,|\, x_1 = y)\,P(x_1 = y)\dots$$
- ... to get this key decomposition of our likelihood function:
$$f_{Y_T,\dots,Y_1}(y_T, \dots, y_1; \theta) = f_{Y_T|Y_{T-1},\dots,Y_1}(y_T, \dots, y_1; \theta) \times f_{Y_{T-1},\dots,Y_1}(y_{T-1}, \dots, y_1; \theta).$$
- Recursive application:
$$f_{Y_T,\dots,Y_1}(y_T, \dots, y_1; \theta) = f_{Y_1}(y_1; \theta)\prod_{t=2}^{T} f_{Y_t|Y_{t-1},\dots,Y_1}(y_t, \dots, y_1; \theta).$$
- Log-likelihood:
$$\log \mathcal{L}(\theta; \mathbf{y}) = \log f_{Y_1}(y_1; \theta) + \sum_{t=2}^{T}\log f_{Y_t|Y_{t-1},\dots,Y_1}(y_t, \dots, y_1; \theta).$$

244 / 295
MLE of a Gaussian AR(1) process

- For $t > 1$:
$$f_{Y_t|Y_{t-1},\dots,Y_1}(y_t, \dots, y_1; \theta) = f_{Y_t|Y_{t-1}}(y_t, y_{t-1}; \theta).$$
- If $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$ and $y_t = c + \phi_1 y_{t-1} + \varepsilon_t$:
$$f_{Y_t|Y_{t-1}}(y_t, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y_t - c - \phi_1 y_{t-1})^2}{2\sigma^2}\right).$$
- What about $f_{Y_1}(y_1; \theta)$? We may use the marginal distribution of $y_t$. A second possibility (a little less efficient but more convenient) is to "sacrifice" this first observation and to consider the approximate likelihood:
$$\log \mathcal{L}^*(\theta; \mathbf{y}) = \log \mathcal{L}(\theta; \mathbf{y}) - \log f_{Y_1}(y_1; \theta) = \sum_{t=2}^{T}\log f_{Y_t|Y_{t-1},\dots,Y_1}(y_t, \dots, y_1; \theta). \tag{72}$$
- Exact and approximate MLE have the same large-sample distribution (because, if the process is stationary, $f_{Y_1}(y_1; \theta)$ makes a relatively negligible contribution to $\log \mathcal{L}(\theta; \mathbf{y})$).
- Practical advantage of the conditional MLE: simply obtained by OLS.

245 / 295
MLE of a Gaussian AR(1) process

- With the notations:
$$Y = \begin{bmatrix} y_2 \\ \vdots \\ y_T \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} 1 & y_1 \\ \vdots & \vdots \\ 1 & y_{T-1} \end{bmatrix},$$
Eq. (72) becomes:
$$\log \mathcal{L}^*(\theta; \mathbf{y}) = -\frac{T-1}{2}\log(2\pi) - (T-1)\log(\sigma) - \frac{1}{2\sigma^2}(Y - X[c, \phi_1]')'(Y - X[c, \phi_1]'), \tag{73}$$
which is maximised for:
$$[\hat{c}, \hat{\phi}_1]' = (X'X)^{-1}X'Y \tag{74}$$
$$\hat{\sigma}^2 = \frac{1}{T-1}\sum_{t=2}^{T}(y_t - \hat{c} - \hat{\phi}_1 y_{t-1})^2 = \frac{1}{T-1}\,Y'(I - X(X'X)^{-1}X')Y. \tag{75}$$

246 / 295
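The conditional (approximate) MLE of Eqs. (74)-(75) in Python, as a sketch (function name mine):

```python
import numpy as np

def ar1_conditional_mle(y):
    """OLS of y_t on (1, y_{t-1}) for t = 2..T, i.e. Eqs. (74)-(75)."""
    y = np.asarray(y, dtype=float)
    Y = y[1:]                                       # y_2, ..., y_T
    X = np.column_stack([np.ones(len(Y)), y[:-1]])  # rows [1, y_{t-1}]
    c_hat, phi_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # Eq. (74)
    resid = Y - X @ np.array([c_hat, phi_hat])
    sigma2_hat = resid @ resid / (len(y) - 1)       # Eq. (75): divide by T-1
    return c_hat, phi_hat, sigma2_hat
```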
MLE of a Gaussian AR(p) process

- For an AR(p) process, we have:
$$\log \mathcal{L}(\theta; \mathbf{y}) = \log f_{Y_p,\dots,Y_1}(y_p, \dots, y_1; \theta) + \underbrace{\sum_{t=p+1}^{T}\log f_{Y_t|Y_{t-1},\dots,Y_{t-p}}(y_t, \dots, y_{t-p}; \theta)}_{\log \mathcal{L}^*(\theta;\, \mathbf{y})}.$$
- Maximisation of the approximate log-likelihood $\log \mathcal{L}^*(\theta; \mathbf{y})$: OLS, using Eqs. (74) and (75) with:
$$Y = \begin{bmatrix} y_{p+1} \\ \vdots \\ y_T \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} 1 & y_p & \dots & y_1 \\ \vdots & \vdots & & \vdots \\ 1 & y_{T-1} & \dots & y_{T-p} \end{bmatrix}.$$
- For stationary processes, approximate and exact MLE have the same large-sample distribution.

247 / 295
MLE of a Gaussian AR(p) process

Remark 8.4

- The regression is:
$$y_t = \beta'x_t + \varepsilon_t, \tag{76}$$
where $x_t = [1, y_{t-1}, \dots, y_{t-p}]'$ and $\beta = [c, \phi_1, \dots, \phi_p]'$.
- The fact that $x_t$ correlates with the $\varepsilon_s$ for $s < t$ implies that the OLS estimator $b$ of $\beta$ is biased in small samples.

Proof (of the second point in Remark 8.4). We have:
$$b = \beta + (X'X)^{-1}X'\varepsilon. \tag{77}$$
Because of the specific form of $X$, we have non-zero correlation between $x_t$ and $\varepsilon_s$ for $s < t$, therefore $E[(X'X)^{-1}X'\varepsilon] \neq 0$. ∎

248 / 295
MLE of a Gaussian AR(p) process

Proposition 8.4 (Large-sample properties of the OLS estimator)

Assume $\{y_t\}$ follows the AR(p) process:
$$y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t,$$
where $\{\varepsilon_t\}$ is an i.i.d. white noise process. If $b$ is the OLS estimator of $\beta$ of Eq. (76), we have:
$$\sqrt{T}(b - \beta) = \underbrace{\left[\frac{1}{T}\sum_{t=p}^{T} x_t x_t'\right]^{-1}}_{\overset{p}{\to}\, Q^{-1}}\;\underbrace{\sqrt{T}\left[\frac{1}{T}\sum_{t=1}^{T} x_t\varepsilon_t\right]}_{\overset{d}{\to}\, \mathcal{N}(0,\,\sigma^2 Q)},$$
where $Q = \text{plim}\,\frac{1}{T}\sum_{t=p}^{T} x_t x_t' = \text{plim}\,\frac{1}{T}\sum_{t=1}^{T} x_t x_t'$ is given by:
$$Q = \begin{bmatrix} 1 & \mu & \mu & \dots & \mu \\ \mu & \gamma_0 + \mu^2 & \gamma_1 + \mu^2 & \dots & \gamma_{p-1} + \mu^2 \\ \mu & \gamma_1 + \mu^2 & \gamma_0 + \mu^2 & \dots & \gamma_{p-2} + \mu^2 \\ \vdots & \vdots & \vdots & & \vdots \\ \mu & \gamma_{p-1} + \mu^2 & \gamma_{p-2} + \mu^2 & \dots & \gamma_0 + \mu^2 \end{bmatrix}. \tag{78}$$

249 / 295
MLE of a Gaussian AR(p) process

Proof (of Proposition 8.4). Rearranging Eq. (77), we have:
$$\sqrt{T}(b - \beta) = (X'X/T)^{-1}\sqrt{T}(X'\varepsilon/T).$$
Let us consider the autocovariances of $v_t = x_t\varepsilon_t$, denoted by $\gamma_j^v$. Using the fact that $x_t$ is a linear combination of past $\varepsilon_t$'s and that $\varepsilon_t$ is a white noise, we get that $E(\varepsilon_t x_t) = 0$. Therefore
$$\gamma_j^v = E(\varepsilon_t\varepsilon_{t-j}x_t x_{t-j}').$$
If $j > 0$, we have $E(\varepsilon_t\varepsilon_{t-j}x_t x_{t-j}') = E(E[\varepsilon_t\varepsilon_{t-j}x_t x_{t-j}' \,|\, \varepsilon_{t-j}, x_t, x_{t-j}]) = E(\varepsilon_{t-j}x_t x_{t-j}'\,E[\varepsilon_t \,|\, \varepsilon_{t-j}, x_t, x_{t-j}]) = 0$. Note that we have $E[\varepsilon_t \,|\, \varepsilon_{t-j}, x_t, x_{t-j}] = 0$ because $\{\varepsilon_t\}$ is an i.i.d. white noise sequence (see Remark 1.1). If $j = 0$, we have:
$$\gamma_0^v = E(\varepsilon_t^2 x_t x_t') = E(\varepsilon_t^2)E(x_t x_t') = \sigma^2 Q.$$
The convergence in distribution of $\sqrt{T}(X'\varepsilon/T) = \sqrt{T}\frac{1}{T}\sum_{t=1}^{T} v_t$ results from Theorem 8.1 (applied to $v_t = x_t\varepsilon_t$), using the $\gamma_j^v$ computed above. ∎

250 / 295
Information Criteria and Lag length selection

- Compromise:
  - Not enough lags: potential omission of valuable information contained in older lags.
  - Too many lags: "overfitting"/misspecification implying additional estimation errors (in forecasts).
- Different possibilities:
  - F-statistics approach: for an AR model,
    (a) Start with a large number of lags ($p_0$).
    (b) If the t-statistic associated with the $p_0$-th lag is low, consider a model with $p_0 - 1$ lags (i.e. go back to (a) with $p_0 - 1$ instead of $p_0$).
  - Information criteria: AIC, BIC.
    Idea: minimisation of a loss function equal to the sum of
    (a) the sum of squared residuals (SSR) and
    (b) a penalty term depending (positively) on the number of parameters.

251 / 295
Definition 8.10 (Information criteria)

The Akaike (AIC), Hannan-Quinn (HQ) and Schwarz (BIC) information criteria are of the form
$$c^{(i)}(k) = \underbrace{\frac{-2\log \mathcal{L}(\hat{\theta}_T(k); \mathbf{y})}{T}}_{\text{decreases w.r.t. } k} + \underbrace{\frac{k\,\varphi^{(i)}(T)}{T}}_{\text{increases w.r.t. } k},$$
with $(i) \in \{AIC, HQ, BIC\}$ and where $\hat{\theta}_T(k)$ denotes the ML estimate of $\theta_0(k)$, a vector of parameters of length $k$. For $q \geq 0$, the model specified by $k$ parameters is a particular case of a model of $k + q$ parameters.

Criterion        (i)    φ^(i)(T)
Akaike           AIC    2
Hannan-Quinn     HQ     2 log(log(T))
Schwarz          BIC    log(T)

The lag suggested by criterion (i) is given by:
$$\hat{k}^{(i)} = \underset{k}{\text{argmin}}\; c^{(i)}(k).$$

252 / 295
Case of a normal linear regression

- We consider a linear regression with normal disturbances:
$$y_t = x_t'\beta + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, \sigma^2).$$
The associated log-likelihood is of the form of Eq. (72).
- In that case, we have:
$$c^{(i)}(k) = \frac{-2\log \mathcal{L}(\hat{\theta}_T(k); \mathbf{y})}{T} + \frac{k\,\varphi^{(i)}(T)}{T} \approx \log(2\pi) + \log(\hat{\sigma}^2) + \frac{1}{T}\sum_{t=1}^{T}\frac{\varepsilon_t^2}{\hat{\sigma}^2} + \frac{k\,\varphi^{(i)}(T)}{T}.$$
- For a large $T$, for all consistent estimation schemes, we have:
$$\hat{\sigma}^2 \approx \frac{1}{T}\sum_{t=1}^{T}\varepsilon_t^2 = SSR/T.$$
- Hence $\hat{k}^{(i)} \approx \underset{k}{\text{argmin}}\;\log(SSR/T) + \dfrac{k\,\varphi^{(i)}(T)}{T}$.
- In the case of an ARMA(p,q) process, $k = 2 + p + q$.

253 / 295
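A sketch of the resulting lag-selection rule (assumptions mine: the same estimation sample $t = p_{\max}+1, \dots, T$ is used for all candidate lags so that the criteria are comparable, and $k = 2 + p$ for an AR(p)):

```python
import numpy as np

def select_lag(y, p_max, criterion="BIC"):
    """Return argmin_p of log(SSR/T) + k*phi(T)/T over p = 1..p_max."""
    y = np.asarray(y, dtype=float)
    T = len(y) - p_max                    # common effective sample size
    phi_T = {"AIC": 2.0,
             "HQ": 2.0 * np.log(np.log(T)),
             "BIC": np.log(T)}[criterion]
    best_p, best_c = None, np.inf
    for p in range(1, p_max + 1):
        Y = y[p_max:]
        X = np.column_stack([np.ones(T)] +
                            [y[p_max - j:len(y) - j] for j in range(1, p + 1)])
        b, ssr = np.linalg.lstsq(X, Y, rcond=None)[:2]
        c = np.log(ssr[0] / T) + (2 + p) * phi_T / T
        if c < best_c:
            best_p, best_c = p, c
    return best_p
```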
Forecasting with AR models

- Macroeconomic forecasts are produced in many places: public administrations (notably Treasuries), central banks, international institutions (e.g. IMF, OECD), banks, big firms.
- These institutions are interested in the point estimates (~ most likely value) of the variable of interest. They are also interested in measuring the uncertainty (~ dispersion of relatively likely outcomes) associated with the point estimates.

Available forecasts

- Philly Fed Survey of Professional Forecasters.
- ECB Survey of Professional Forecasters.
- IMF World Economic Outlook.
- OECD Global Economic Outlook.
- European Commission Economic Forecasts.

254 / 295
Figure 62: "Fan Chart" from the Bank of England

[Figure: two fan charts of CPI inflation projections (percentage increase in prices on a year earlier, 2011-19), conditioned on market interest rate expectations and £375 billion of purchased assets. The fan bands are constructed so that inflation is expected to lie within the fans on 90 out of 100 occasions.]

Source: Bank of England Inflation Report, Feb. 2016.

255 / 295
Context and Objectives

- Context: we have observations of $y_t$ and $x_t$ (potentially multi-dimensional) variables; $x_t$ may contain lagged values of $y_t$.
- Objective: forecasting $y_{t+1}$ based on a set of information $x_t$.
- A loss function is necessary to assess the forecasting method. Example: quadratic loss function:
$$\underbrace{E([y_{t+1} - y_{t+1}^*]^2)}_{\text{Mean square error (MSE)}}$$
where $y_{t+1}^*$ is the forecast of $y_{t+1}$ (function of $x_t$).

256 / 295
Theorem 8.2 (Smallest MSE)

The smallest MSE is obtained with the expectation of $y_{t+1}$ conditional on $x_t$.

Proof (sketch of). Consider $y_{t+1}^* = g(x_t)$. We have:
$$E([y_{t+1} - y_{t+1}^*]^2) = E\big([\{y_{t+1} - E(y_{t+1}|x_t)\} + \{E(y_{t+1}|x_t) - y_{t+1}^*\}]^2\big).$$
Expand. Use the law of iterated expectations on the double product (conditioning on $x_t$); the latter is 0. It implies that $E([y_{t+1} - y_{t+1}^*]^2)$ is always larger than $E([y_{t+1} - E(y_{t+1}|x_t)]^2)$, with equality if $E(y_{t+1}|x_t) = y_{t+1}^*$. ∎

257 / 295
Forecasting an AR(p) process

- Consider the process:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t, \quad \text{with } \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, \sigma^2).$$
- Using the notation of Eq. (64), we have:
$$\mathbf{y}_t - \boldsymbol{\mu} = F(\mathbf{y}_{t-1} - \boldsymbol{\mu}) + \xi_t.$$
Hence: $\mathbf{y}_{t+h} - \boldsymbol{\mu} = \xi_{t+h} + F\xi_{t+h-1} + \dots + F^{h-1}\xi_{t+1} + F^h(\mathbf{y}_t - \boldsymbol{\mu})$.
- Therefore:
$$E(\mathbf{y}_{t+h}|y_t, y_{t-1}, \dots) = \boldsymbol{\mu} + F^h(\mathbf{y}_t - \boldsymbol{\mu})$$
$$Var([\mathbf{y}_{t+h} - E(\mathbf{y}_{t+h}|y_t, y_{t-1}, \dots)]) = \Sigma + F\Sigma F' + \dots + F^{h-1}\Sigma(F^{h-1})',$$
where:
$$\Sigma = \begin{bmatrix} \sigma^2 & 0 & \dots \\ 0 & 0 & \\ \vdots & & \ddots \end{bmatrix}.$$

Web-interface illustration

258 / 295
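A sketch of these formulas (assumption mine: the state is ordered $[y_t, y_{t-1}, \dots, y_{t-p+1}]$ as in Eq. 64, and the parameters are taken as given):

```python
import numpy as np

def ar_forecast(y_recent, c, phi, sigma2, h):
    """Point forecast E(y_{t+h}|y_t, y_{t-1}, ...) and forecast-error variance
    for an AR(p); y_recent = [y_t, y_{t-1}, ..., y_{t-p+1}]."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    F = np.zeros((p, p)); F[0, :] = phi
    if p > 1:
        F[1:, :-1] = np.eye(p - 1)
    mu = c / (1.0 - phi.sum())
    Sigma = np.zeros((p, p)); Sigma[0, 0] = sigma2
    V = np.zeros((p, p)); Fk = np.eye(p)
    for _ in range(h):                  # V = Sigma + F Sigma F' + ... (h terms)
        V += Fk @ Sigma @ Fk.T
        Fk = Fk @ F                     # after the loop, Fk = F^h
    point = mu + (Fk @ (np.asarray(y_recent, dtype=float) - mu))[0]
    return point, V[0, 0]

# 95% interval under Gaussian shocks: point +/- 1.96 * sqrt(variance)
pt, var = ar_forecast([0.3, 0.1], c=0.0, phi=[0.9, -0.2], sigma2=1.0, h=4)
print(pt, pt - 1.96 * np.sqrt(var), pt + 1.96 * np.sqrt(var))
```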
Figure 63: Forecasts and 95% confidence intervals when shocks are Gaussian

[Figure: a simulated path up to the date of forecast (date t), followed by the point forecast $E_t(y_{t+h})$ and the bands $E_t(y_{t+h}) \pm 1.96\sqrt{Var_t(y_{t+h})}$, together with the unconditional mean $E(y_t)$.]

Web-interface illustration
Assessing the performances of a forecasting model

- Once one has fitted a model on a given set of data (of length T, say), one computes the MSE (mean square error) to evaluate the performance of the model.
- But this MSE is the in-sample one.
- It is easy to reduce the in-sample MSE. Typically, if the model is estimated by OLS, adding covariates mechanically reduces the MSE.
  ⇒ Even if additional data are "irrelevant", the R² of the regression increases.
- One also has to analyse the out-of-sample performances of the forecasting model:
  a. Estimate the model on a sample of reduced size (1, ..., T*, with T* < T).
  b. Use the remaining available periods (T* + 1, ..., T) to compute out-of-sample forecasting errors.

260 / 295
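A sketch of steps (a)-(b) for an AR(p) (assumptions mine: parameters are estimated once on $1, \dots, T^*$ and then held fixed; a recursive scheme would re-estimate at each date):

```python
import numpy as np

def out_of_sample_mse(y, p, T_star):
    """Fit an AR(p) by OLS on y_1..y_{T*}; return the MSE of one-step-ahead
    forecast errors over T*+1..T."""
    y = np.asarray(y, dtype=float)
    Y = y[p:T_star]
    X = np.column_stack([np.ones(len(Y))] +
                        [y[p - j:T_star - j] for j in range(1, p + 1)])
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    errors = []
    for t in range(T_star, len(y)):
        x_t = np.r_[1.0, y[t - p:t][::-1]]   # [1, y_{t-1}, ..., y_{t-p}]
        errors.append(y[t] - x_t @ b)
    return np.mean(np.square(errors))
```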
Figure 64: In-sample versus out-of-sample forecasting

[Figure: four panels (p = 1, 2, 4, 7), each showing U.S. GDP growth over 1970-2010 together with forecasts.]

Data: U.S. GDP growth. The blue lines are in-sample forecasts; they are based on AR(p) models estimated on the whole sample. The red lines are out-of-sample forecasts; they are based on reduced samples (from the first date to the date at which the red line begins).

261 / 295
- The previous (out-of-sample) exercise is a form of backtesting.
- Other forms of backtesting aim at testing whether the realisations of the variable are consistent with (previous) confidence intervals.
- Typically, if realised (ex post) variables tend to be far outside (ex ante) confidence intervals, we may deem the model inadequate.

Figure 65: Backtesting a forecasting approach

[Figure: two panels ("Dubious model" and "Reasonable model") showing U.S. GDP growth over 1970-2010 with model-implied 95% confidence intervals in red.]

262 / 295
Appendix

263 / 295
Definition 9.1 (Skewness and Kurtosis)

Let $Y$ be a random variable whose fourth moment exists. The expectation of $Y$ is denoted by $\mu$.
- The skewness of $Y$ is given by:
$$\frac{E[(Y-\mu)^3]}{\{E[(Y-\mu)^2]\}^{3/2}}.$$
- The kurtosis of $Y$ is given by:
$$\frac{E[(Y-\mu)^4]}{\{E[(Y-\mu)^2]\}^2}.$$

264 / 295
Definition 9.2 (Eigenvalues)

The eigenvalues of a matrix $M$ are the numbers $\lambda$ for which:
$$|M - \lambda I| = 0,$$
where $|\bullet|$ is the determinant operator.

Useful properties of the determinant $|\bullet|$:
- $|MN| = |M| \times |N|$.
- $|M^{-1}| = |M|^{-1}$.
- If $M$ admits the diagonal representation $M = TDT^{-1}$, where $D$ is a diagonal matrix whose diagonal entries are $\{\lambda_i\}_{i=1,\dots,n}$, then:
$$|M - \lambda I| = \prod_{i=1}^{n}(\lambda_i - \lambda).$$

265 / 295
Definition 9.3 (Moore-Penrose inverse)

If $M \in \mathbb{R}^{m \times n}$, then its Moore-Penrose pseudo-inverse (exists and) is the unique matrix $M^* \in \mathbb{R}^{n \times m}$ that satisfies:
(i) $MM^*M = M$
(ii) $M^*MM^* = M^*$
(iii) $(MM^*)' = MM^*$
(iv) $(M^*M)' = M^*M$.

Remark 9.1 (Some properties)

- If $M$ is invertible then $M^* = M^{-1}$.
- The pseudo-inverse of a zero matrix is its transpose.
- The pseudo-inverse of the pseudo-inverse is the original matrix.

266 / 295
Definition 9.4 (F statistics)

Consider $n = n_1 + n_2$ i.i.d. $\mathcal{N}(0,1)$ r.v. $X_i$. If the r.v. $F$ is defined by:
$$F = \frac{\sum_{i=1}^{n_1} X_i^2}{\sum_{j=n_1+1}^{n_1+n_2} X_j^2}\cdot\frac{n_2}{n_1},$$
then $F \sim \mathcal{F}(n_1, n_2)$. (Table)

Definition 9.5 (Student-t distribution)

$Z$ follows a Student-t (or t) distribution with $\nu$ degrees of freedom (d.f.) if:
$$Z = X_0 \Big/ \sqrt{\frac{\sum_{i=1}^{\nu} X_i^2}{\nu}}, \quad X_i \sim \text{i.i.d. } \mathcal{N}(0,1).$$
We have $E(Z) = 0$, and $Var(Z) = \frac{\nu}{\nu-2}$ if $\nu > 2$. (Table)

Definition 9.6 (χ² distribution)

$Z$ follows a $\chi^2$ distribution with $\nu$ d.f. if $Z = \sum_{i=1}^{\nu} X_i^2$ where $X_i \sim \text{i.i.d. } \mathcal{N}(0,1)$. We have $E(Z) = \nu$. (Table)

267 / 295
Definition 9.7 (Cauchy distribution)

The probability distribution function of the Cauchy distribution defined by a location parameter $\mu$ and a scale parameter $\gamma$ is:
$$f(x) = \frac{1}{\pi\gamma\left(1 + \left[\frac{x-\mu}{\gamma}\right]^2\right)}.$$
The mean and variance of this distribution are undefined.

[Figure: density of the Cauchy distribution (location = 0, scale = 1) on [−5, 5].]

268 / 295
Definition 9.8 (Idempotent matrix)

Matrix $M$ is idempotent if $M^2 = M$. If $M$ is a symmetric idempotent matrix, then $M'M = M$.

Proposition 9.1 (Roots of an idempotent matrix)

The eigenvalues of an idempotent matrix are either 1 or 0.

Proof. If $\lambda$ is an eigenvalue of an idempotent matrix $M$ then $\exists x \neq 0$ s.t. $Mx = \lambda x$. Hence $M^2x = \lambda Mx \Rightarrow (1-\lambda)Mx = 0$. Either all elements of $Mx$ are zero, in which case $\lambda = 0$, or at least one element of $Mx$ is nonzero, in which case $\lambda = 1$. ∎

Proposition 9.2 (Idempotent matrix and χ² distribution)

The rank of a symmetric idempotent matrix is equal to its trace.

Proof. The result follows from Prop. 9.1, combined with the fact that the rank of a symmetric matrix is equal to the number of its nonzero eigenvalues. ∎

269 / 295
Proposition 9.3 (Constrained least squares)

The solution of the following optimisation problem:
$$\min_\beta \|y - X\beta\|^2 \quad \text{subject to } R\beta = q$$
is given by:
$$\beta^r = \beta_0 - (X'X)^{-1}R'\{R(X'X)^{-1}R'\}^{-1}(R\beta_0 - q),$$
where $\beta_0 = (X'X)^{-1}X'y$.

Proof. See for instance Jackman, 2007. ∎

270 / 295
Theorem 9.1 (Chebychev's inequality)

$$\forall\epsilon > 0, \quad P(|X - c| > \epsilon) \leq \frac{E[(X-c)^2]}{\epsilon^2}.$$

Theorem 9.2 (Generalized Chebychev's inequality)

If $E(|X|^r)$ is finite for some $r > 0$ then, $\forall\epsilon > 0$,
$$P(|X - c| > \epsilon) \leq \frac{E[|X-c|^r]}{\epsilon^r}.$$

Proof. Remark that $\epsilon^r\,\mathbb{1}_{\{|X| \geq \epsilon\}} \leq |X|^r$ and take the expectation of both sides. ∎

271 / 295
Definition 9.9 (Convergence in probability, $x_n \overset{p}{\to} c$)

- The random variable sequence $x_n$ converges in probability to a constant $c$ if $\forall\epsilon$, $\lim_{n\to\infty} P(|x_n - c| > \epsilon) = 0$.
- It is denoted as: $\text{plim}\; x_n = c$.

Definition 9.10 (Convergence in the $L^r$ norm)

$x_n$ converges in the r-th mean (or in the $L^r$-norm) towards $x$ if $E(|x_n|^r)$ and $E(|x|^r)$ exist and if
$$\lim_{n\to\infty} E(|x_n - x|^r) = 0.$$
In particular, for $r = 2$, this convergence is called mean-square convergence.

Definition 9.11 (Almost sure convergence, $x_n \overset{a.s.}{\to} c$)

The random variable sequence $x_n$ converges almost surely to $c$ if $P(\lim_{n\to\infty} x_n = c) = 1$.

272 / 295
Definition 9.12 (Convergence in distribution, $x_n \overset{d}{\to} x$)

$x_n$ is said to converge in distribution (or in law) to $x$ if
$$\lim_{n\to\infty} F_{x_n}(s) = F_x(s)$$
for all $s$ at which $F_x$ (the cumulative distribution of $x$) is continuous.

Proposition 9.4 (Rules for limiting distributions)

(i) Slutsky's theorem: if $x_n \overset{d}{\to} x$ and $y_n \overset{p}{\to} c$, then
$$x_n y_n \overset{d}{\to} xc, \qquad x_n + y_n \overset{d}{\to} x + c, \qquad x_n/y_n \overset{d}{\to} x/c \;(\text{if } c \neq 0).$$
(ii) Continuous mapping theorem: if $x_n \overset{d}{\to} x$ and $g$ is a continuous function, then $g(x_n) \overset{d}{\to} g(x)$.

273 / 295
Proposition 9.5 (Implications of stochastic convergences)

$$\overset{L^s}{\to} \;\Rightarrow\; \overset{L^r}{\to} \;(1 \leq r \leq s), \qquad \overset{L^r}{\to} \;\Rightarrow\; \overset{p}{\to}, \qquad \overset{a.s.}{\to} \;\Rightarrow\; \overset{p}{\to} \;\Rightarrow\; \overset{d}{\to}.$$

Proof of the fact that $\big(\overset{p}{\to}\big) \Rightarrow \big(\overset{d}{\to}\big)$.
Assume that $X_n \overset{p}{\to} X$. Denoting by $F$ and $F_n$ the c.d.f. of $X$ and $X_n$, respectively:
$$F_n(x) = P(X_n \leq x, X \leq x+\epsilon) + P(X_n \leq x, X > x+\epsilon) \leq F(x+\epsilon) + P(|X_n - X| > \epsilon). \tag{79}$$
Besides,
$$F(x-\epsilon) = P(X \leq x-\epsilon, X_n \leq x) + P(X \leq x-\epsilon, X_n > x) \leq F_n(x) + P(|X_n - X| > \epsilon),$$
which implies:
$$F(x-\epsilon) - P(|X_n - X| > \epsilon) \leq F_n(x). \tag{80}$$
Eqs. (79) and (80) imply:
$$F(x-\epsilon) - P(|X_n - X| > \epsilon) \leq F_n(x) \leq F(x+\epsilon) + P(|X_n - X| > \epsilon).$$
Taking limits as $n \to \infty$ yields
$$F(x-\epsilon) \leq \liminf_{n\to\infty} F_n(x) \leq \limsup_{n\to\infty} F_n(x) \leq F(x+\epsilon).$$
The result is then obtained by taking limits as $\epsilon \to 0$ (if $F$ is continuous at $x$). ∎

274 / 295
Proposition 9.6 (Convergence in distribution to a constant)

If $X_n$ converges in distribution to a constant $c$, then $X_n$ converges in probability to $c$.

Proof. If $\epsilon > 0$, we have $P(X_n < c-\epsilon) \underset{n\to\infty}{\to} 0$, i.e. $P(X_n \geq c-\epsilon) \underset{n\to\infty}{\to} 1$, and $P(X_n < c+\epsilon) \underset{n\to\infty}{\to} 1$. Therefore $P(c-\epsilon \leq X_n < c+\epsilon) \underset{n\to\infty}{\to} 1$. ∎

Example of plim but not $L^r$ convergence

- Let $\{x_n\}_{n\in\mathbb{N}}$ be a series of random variables defined by:
$$x_n = n u_n,$$
where the $u_n$ are independent random variables s.t. $u_n \sim \mathcal{B}(1/n)$ (Bernoulli with success probability $1/n$).
- We have $x_n \overset{p}{\to} 0$ but $x_n$ does not converge to 0 in the $L^r$ norm, because $E(|x_n - 0|) = E(x_n) = 1$.

275 / 295
Theorem 9.3 (Cauchy criterion (non-stochastic case))

$\sum_{i=0}^{T} a_i$ converges ($T \to \infty$) iff, for any $\eta > 0$, there exists an integer $N$ such that, for all $M \geq N$,
$$\left|\sum_{i=N+1}^{M} a_i\right| < \eta.$$

Theorem 9.4 (Cauchy criterion (stochastic case))

$\sum_{i=0}^{T} \theta_i\varepsilon_{t-i}$ converges in mean square ($T \to \infty$) to a random variable iff, for any $\eta > 0$, there exists an integer $N$ such that, for all $M \geq N$,
$$E\left[\left(\sum_{i=N+1}^{M} \theta_i\varepsilon_{t-i}\right)^2\right] < \eta.$$

276 / 295
Definition 9.13 (Characteristic function)

For any real-valued random variable $X$, the characteristic function is defined by:
$$\phi_X : u \mapsto E[\exp(iuX)].$$

Theorem 9.5 (Law of large numbers)

The sample mean is a consistent estimator of the population mean.

Proof. Let's denote by $\phi_{X_i}$ the characteristic function of a r.v. $X_i$. If the mean of $X_i$ is $\mu$ then the Taylor expansion of the characteristic function is:
$$\phi_{X_i}(u) = E(\exp(iuX)) = 1 + iu\mu + o(u).$$
The properties of the characteristic function (see Def. 9.13) imply that:
$$\phi_{\frac{1}{n}(X_1+\dots+X_n)}(u) = \prod_{i=1}^{n}\left(1 + i\frac{u}{n}\mu + o\left(\frac{u}{n}\right)\right) \to e^{iu\mu}.$$
The facts that (a) $e^{iu\mu}$ is the characteristic function of the constant $\mu$ and (b) a characteristic function uniquely characterises a distribution imply that the sample mean converges in distribution to the constant $\mu$, which further implies that it converges in probability to $\mu$ [Exercise]. ∎

277 / 295
Theorem 9.6 (Lindeberg-Levy Central limit theorem, CLT)

If $x_n$ is an i.i.d. sequence of random variables with mean $\mu$ and variance $\sigma^2$ ($\in ]0, +\infty[$), then:
$$\sqrt{n}(\bar{x}_n - \mu) \overset{d}{\to} \mathcal{N}(0, \sigma^2), \quad \text{where } \bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Proof. Let us introduce the r.v. $Y_n := \sqrt{n}(\bar{X}_n - \mu)$. We have $\phi_{Y_n}(u) = \left[E\left(\exp\left(i\frac{1}{\sqrt{n}}u(X_1 - \mu)\right)\right)\right]^n$. We have:
$$\left[E\left(\exp\left(i\frac{1}{\sqrt{n}}u(X_1-\mu)\right)\right)\right]^n = \left[E\left(1 + i\frac{1}{\sqrt{n}}u(X_1-\mu) - \frac{1}{2n}u^2(X_1-\mu)^2 + o(u^2)\right)\right]^n = \left(1 - \frac{1}{2n}u^2\sigma^2 + o(u^2)\right)^n.$$
Therefore $\phi_{Y_n}(u) \underset{n\to\infty}{\to} \exp\left(-\frac{1}{2}u^2\sigma^2\right)$, which is the characteristic function of $\mathcal{N}(0, \sigma^2)$. ∎

278 / 295
Proposition 9.7 (Inverse of a partitioned matrix)

We have:
$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1} & -A_{11}^{-1}A_{12}(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1} \\ -(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1} & (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1} \end{bmatrix}.$$

279 / 295
Proposition 9.8 (Independence of a linear and a quadratic form)

If $A$ is idempotent and if $x$ is Gaussian, $Lx$ and $x'Ax$ are independent if $LA = 0$.

Proof. If $LA = 0$, then the two Gaussian vectors $Lx$ and $Ax$ are independent. This implies the independence of any function of $Lx$ and any function of $Ax$. The result then follows from the observation that $x'Ax = (Ax)'(Ax)$, which is a function of $Ax$. ∎

Proposition 9.9 (Inner product of a multivariate Gaussian variable)

Let $X$ be an n-dimensional multivariate Gaussian variable: $X \sim \mathcal{N}(0, \Sigma)$. We have:
$$X'\Sigma^{-1}X \sim \chi^2(n).$$

Proof. Because $\Sigma$ is a symmetric positive definite matrix, it admits the spectral decomposition $PDP'$ where $P$ is an orthogonal matrix (i.e. $PP' = Id$) and $D$ is a diagonal matrix with non-negative entries. Denoting by $\sqrt{D^{-1}}$ the diagonal matrix whose diagonal entries are the inverses of the square roots of those of $D$, it is easily checked that the covariance matrix of $Y := \sqrt{D^{-1}}P'X$ is $Id$. Therefore $Y$ is a vector of uncorrelated Gaussian variables. The properties of Gaussian variables imply that the components of $Y$ are then also independent. Hence $Y'Y = \sum_i Y_i^2 \sim \chi^2(n)$. It remains to note that $Y'Y = X'PD^{-1}P'X = X'Var(X)^{-1}X$ to conclude. ∎

280 / 295
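A quick Monte Carlo illustration of Proposition 9.9, as a sketch (the covariance matrix, seed and sample size are arbitrary choices of mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, N = 3, 100_000
A = rng.standard_normal((n, n))
Sigma = A @ A.T                                   # an arbitrary SPD covariance
X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)
Q = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)  # X' Sigma^{-1} X
print(Q.mean())                                   # ~ n, the mean of chi2(n)
print(np.quantile(Q, 0.95), stats.chi2(n).ppf(0.95))      # both ~ 7.81
```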
Theorem 9.7 (Cauchy-Schwarz inequality)

We have:
$$|Cov(X, Y)| \leq \sqrt{Var(X)\,Var(Y)},$$
and, if $X \neq 0$ and $Y \neq 0$, the equality holds iff $X$ and $Y$ are the same up to an affine transformation.

Proof. If $Var(X) = 0$, this is trivial. If this is not the case, then let's define $Z$ as $Z = Y - \frac{Cov(X,Y)}{Var(X)}X$. It is easily seen that $Cov(X, Z) = 0$. Then, the variance of $Y = Z + \frac{Cov(X,Y)}{Var(X)}X$ is equal to the sum of the variance of $Z$ and of the variance of $\frac{Cov(X,Y)}{Var(X)}X$, that is:
$$Var(Y) = Var(Z) + \left(\frac{Cov(X,Y)}{Var(X)}\right)^2 Var(X) \geq \left(\frac{Cov(X,Y)}{Var(X)}\right)^2 Var(X).$$
The equality holds iff $Var(Z) = 0$, i.e. iff $Y = \frac{Cov(X,Y)}{Var(X)}X + cst$. ∎

281 / 295
Definition 9.14 (Matrix derivatives)

Consider a function $f : \mathbb{R}^K \to \mathbb{R}$. Its first-order derivative is:
$$\frac{\partial f}{\partial b}(b) = \begin{bmatrix} \frac{\partial f}{\partial b_1}(b) \\ \vdots \\ \frac{\partial f}{\partial b_K}(b) \end{bmatrix}.$$
We use the notation:
$$\frac{\partial f}{\partial b'}(b) = \left(\frac{\partial f}{\partial b}(b)\right)'.$$

Proposition 9.10

- If $f(b) = A'b$ where $A$ is a $K \times 1$ vector, then $\frac{\partial f}{\partial b}(b) = A$.
- If $f(b) = b'Ab$ where $A$ is a symmetric $K \times K$ matrix, then $\frac{\partial f}{\partial b}(b) = 2Ab$.

282 / 295
Definition 9.15 (Asymptotic level)

An asymptotic test with critical region $\Omega_n$ has an asymptotic level equal to $\alpha$ if:
$$\sup_{\theta\in\Theta}\lim_{n\to\infty} P_\theta(S_n \in \Omega_n) = \alpha,$$
where $S_n$ is the test statistic and $\Theta$ is such that the null hypothesis $H_0$ is equivalent to $\theta \in \Theta$.

Definition 9.16 (Asymptotically consistent test)

An asymptotic test with critical region $\Omega_n$ is consistent if:
$$\forall\theta \in \Theta^c, \quad P_\theta(S_n \in \Omega_n) \to 1,$$
where $S_n$ is the test statistic and $\Theta^c$ is such that the null hypothesis $H_0$ is equivalent to $\theta \notin \Theta^c$.

283 / 295
Understanding two-sided tests

Figure 66: Understanding two-sided tests (1/3)

[Figure: density of the test statistic X under H0 (here X ~ t(5)). The critical region (sum of the two tail areas = α) lies outside the interval $[\Phi_X^{-1}(\alpha/2), \Phi_X^{-1}(1-\alpha/2)]$; α is the level (or size) of the test.]

Web interface producing these charts

284 / 295
Understanding two-sided tests

Figure 67: Understanding two-sided tests (2/3)

[Figure: same density (X ~ t(5) under H0), with a (sample-based) test statistic X falling inside $[\Phi_X^{-1}(\alpha/2), \Phi_X^{-1}(1-\alpha/2)]$; the sum of the tail areas beyond ±|X| is the p-value. Here p-value > α, which implies that we cannot reject H0.]

Web interface producing these charts

285 / 295
Understanding two-sided tests

Figure 68: Understanding two-sided tests (3/3)

[Figure: same density (X ~ t(5) under H0), with a (sample-based) test statistic X falling outside $[\Phi_X^{-1}(\alpha/2), \Phi_X^{-1}(1-\alpha/2)]$; the sum of the tail areas is the p-value. Here p-value < α, which implies that we reject H0.]

Web interface producing these charts

286 / 295
Understanding one-sided tests

Figure 69: Understanding one-sided tests (1/3)

[Figure: density of the test statistic X under H0 (here X ~ χ²(5)). The critical region (area = α) lies to the right of $\Phi_X^{-1}(1-\alpha)$; α is the level (or size) of the test.]

Web interface producing these charts

287 / 295
Understanding one-sided tests

Figure 70: Understanding one-sided tests (2/3)

[Figure: same density (X ~ χ²(5) under H0), with a (sample-based) test statistic X below $\Phi_X^{-1}(1-\alpha)$; the area to the right of X is the p-value. Here p-value > α, which implies that we cannot reject H0.]

Web interface producing these charts

288 / 295
Understanding one-sided tests

Figure 71: Understanding one-sided tests (3/3)

[Figure: same density (X ~ χ²(5) under H0), with a (sample-based) test statistic X beyond $\Phi_X^{-1}(1-\alpha)$; the area to the right of X is the p-value. Here p-value < α, which implies that we reject H0.]

Web interface producing these charts

289 / 295
Statistical Tables

290 / 295
Statistical Tables

Figure 72: What is reported in distribution tables?

Statistical tables usually report either probabilities α or quantiles z, the two being related as follows: $\alpha = P(0 < X \leq z)$.

[Figure: left panel, for random variables on $\mathbb{R}$: the shaded area between 0 and z equals α. Right panel, the same for random variables on $\mathbb{R}^+$.]

291 / 295
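The entries of Tables 22-24 below can be reproduced with scipy, as a sketch (the use of scipy.stats is an assumption of mine, not part of the course material):

```python
from scipy import stats

# Table 22: P(0 < X <= z) for X ~ N(0,1), e.g. z = 1.96
print(stats.norm.cdf(1.96) - 0.5)       # -> 0.475

# Table 23: z such that alpha* = P(-z < X <= z) for X ~ t(nu)
print(stats.t.ppf(0.975, df=5))         # nu = 5, alpha* = 0.95 -> 2.571

# Table 24: z such that alpha = P(X <= z) for X ~ chi2(nu)
print(stats.chi2.ppf(0.95, df=5))       # nu = 5, alpha = 0.95 -> 11.070
```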
Table 22: Normal distribution N(0, 1)

z = y + x
y ↓ \ x →   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0         0.000  0.004  0.008  0.012  0.016  0.020  0.024  0.028  0.032  0.036
0.1         0.040  0.044  0.048  0.052  0.056  0.060  0.064  0.067  0.071  0.075
0.2         0.079  0.083  0.087  0.091  0.095  0.099  0.103  0.106  0.110  0.114
0.3         0.118  0.122  0.126  0.129  0.133  0.137  0.141  0.144  0.148  0.152
0.4         0.155  0.159  0.163  0.166  0.170  0.174  0.177  0.181  0.184  0.188
0.5         0.191  0.195  0.198  0.202  0.205  0.209  0.212  0.216  0.219  0.222
0.6         0.226  0.229  0.232  0.236  0.239  0.242  0.245  0.249  0.252  0.255
0.7         0.258  0.261  0.264  0.267  0.270  0.273  0.276  0.279  0.282  0.285
0.8         0.288  0.291  0.294  0.297  0.300  0.302  0.305  0.308  0.311  0.313
0.9         0.316  0.319  0.321  0.324  0.326  0.329  0.331  0.334  0.336  0.339
1.0         0.341  0.344  0.346  0.348  0.351  0.353  0.355  0.358  0.360  0.362
1.1         0.364  0.367  0.369  0.371  0.373  0.375  0.377  0.379  0.381  0.383
1.2         0.385  0.387  0.389  0.391  0.393  0.394  0.396  0.398  0.400  0.401
1.3         0.403  0.405  0.407  0.408  0.410  0.411  0.413  0.415  0.416  0.418
1.4         0.419  0.421  0.422  0.424  0.425  0.426  0.428  0.429  0.431  0.432
1.5         0.433  0.434  0.436  0.437  0.438  0.439  0.441  0.442  0.443  0.444
1.6         0.445  0.446  0.447  0.448  0.449  0.451  0.452  0.453  0.454  0.454
1.7         0.455  0.456  0.457  0.458  0.459  0.460  0.461  0.462  0.462  0.463
1.8         0.464  0.465  0.466  0.466  0.467  0.468  0.469  0.469  0.470  0.471
1.9         0.471  0.472  0.473  0.473  0.474  0.474  0.475  0.476  0.476  0.477
2.0         0.477  0.478  0.478  0.479  0.479  0.480  0.480  0.481  0.481  0.482
2.1         0.482  0.483  0.483  0.483  0.484  0.484  0.485  0.485  0.485  0.486
2.2         0.486  0.486  0.487  0.487  0.487  0.488  0.488  0.488  0.489  0.489
2.3         0.489  0.490  0.490  0.490  0.490  0.491  0.491  0.491  0.491  0.492
2.4         0.492  0.492  0.492  0.492  0.493  0.493  0.493  0.493  0.493  0.494
2.5         0.494  0.494  0.494  0.494  0.494  0.495  0.495  0.495  0.495  0.495

Notes: If z = x + y and if X ∼ N(0, 1), the cell of the table corresponding to x and y gives P(0 < X ≤ z) (see Figure 72, left chart).
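Any cell of Table 22 can be reproduced in one line (a scipy sketch; the cell y = 1.9, x = 0.06 is picked arbitrarily):

    from scipy.stats import norm

    y, x = 1.9, 0.06
    z = y + x                            # z = 1.96
    cell = norm.cdf(z) - norm.cdf(0.0)   # P(0 < X <= z) for X ~ N(0, 1)
    print(f"{cell:.3f}")                 # 0.475, matching the table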
Table 23: Student-t distribution t(ν) (Def. 9.5)

α →    0.2500  0.4000  0.4500  0.4750  0.4950  0.4995
α* →   0.5000  0.8000  0.9000  0.9500  0.9900  0.9990
ν ↓
1      1.000   3.078   6.314   12.706  63.657  636.619
2      0.816   1.886   2.920   4.303   9.925   31.599
3      0.765   1.638   2.353   3.182   5.841   12.924
4      0.741   1.533   2.132   2.776   4.604   8.610
5      0.727   1.476   2.015   2.571   4.032   6.869
6      0.718   1.440   1.943   2.447   3.707   5.959
7      0.711   1.415   1.895   2.365   3.499   5.408
8      0.706   1.397   1.860   2.306   3.355   5.041
9      0.703   1.383   1.833   2.262   3.250   4.781
10     0.700   1.372   1.812   2.228   3.169   4.587
20     0.687   1.325   1.725   2.086   2.845   3.850
30     0.683   1.310   1.697   2.042   2.750   3.646
40     0.681   1.303   1.684   2.021   2.704   3.551
50     0.679   1.299   1.676   2.009   2.678   3.496
60     0.679   1.296   1.671   2.000   2.660   3.460
70     0.678   1.294   1.667   1.994   2.648   3.435
80     0.678   1.292   1.664   1.990   2.639   3.416
90     0.677   1.291   1.662   1.987   2.632   3.402
100    0.677   1.290   1.660   1.984   2.626   3.390
200    0.676   1.286   1.653   1.972   2.601   3.340
500    0.675   1.283   1.648   1.965   2.586   3.310

Notes: This table reports the quantiles of Student-t distributions with ν degrees of freedom. If X ∼ t(ν), the cell of the table corresponding to ν and α gives z such that α = P(0 < X ≤ z) (see Figure 72, left chart). z is also such that α* = P(−z < X ≤ z).
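An entry of Table 23 can be checked the same way (a scipy sketch; ν = 10 and α* = 0.95 are picked arbitrarily). Since α* = P(−z < X ≤ z), the quantile sits at probability 0.5 + α*/2:

    from scipy.stats import t

    nu, alpha_star = 10, 0.95
    z = t.ppf(0.5 + alpha_star / 2, df=nu)   # = t.ppf(0.975, 10)
    print(f"{z:.3f}")                        # 2.228, matching the table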
Table 24: χ² distribution χ²(ν) (Def. 9.6)

α →   0.0500   0.1000   0.7500   0.9000   0.9500   0.9750   0.9900   0.9990
ν ↓
1     0.004    0.016    1.323    2.706    3.841    5.024    6.635    10.828
2     0.103    0.211    2.773    4.605    5.991    7.378    9.210    13.816
3     0.352    0.584    4.108    6.251    7.815    9.348    11.345   16.266
4     0.711    1.064    5.385    7.779    9.488    11.143   13.277   18.467
5     1.145    1.610    6.626    9.236    11.070   12.833   15.086   20.515
6     1.635    2.204    7.841    10.645   12.592   14.449   16.812   22.458
7     2.167    2.833    9.037    12.017   14.067   16.013   18.475   24.322
8     2.733    3.490    10.219   13.362   15.507   17.535   20.090   26.124
9     3.325    4.168    11.389   14.684   16.919   19.023   21.666   27.877
10    3.940    4.865    12.549   15.987   18.307   20.483   23.209   29.588
20    10.851   12.443   23.828   28.412   31.410   34.170   37.566   45.315
30    18.493   20.599   34.800   40.256   43.773   46.979   50.892   59.703
40    26.509   29.051   45.616   51.805   55.758   59.342   63.691   73.402
50    34.764   37.689   56.334   63.167   67.505   71.420   76.154   86.661
60    43.188   46.459   66.981   74.397   79.082   83.298   88.379   99.607
70    51.739   55.329   77.577   85.527   90.531   95.023   100.425  112.317
80    60.391   64.278   88.130   96.578   101.879  106.629  112.329  124.839
90    69.126   73.291   98.650   107.565  113.145  118.136  124.116  137.208
100   77.929   82.358   109.141  118.498  124.342  129.561  135.807  149.449
200   168.279  174.835  213.102  226.021  233.994  241.058  249.445  267.541
500   449.147  459.926  520.950  540.930  553.127  563.852  576.493  603.446

Notes: This table reports the quantiles of χ² distributions with ν degrees of freedom. If X ∼ χ²(ν), the cell of the table corresponding to ν and α gives z such that α = P(0 < X ≤ z) (see Figure 72, right chart).
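The same check for Table 24 (a scipy sketch; ν = 5 and α = 0.95 are picked arbitrarily). As X ≥ 0 here, α = P(0 < X ≤ z) is the full c.d.f., so z is a plain quantile:

    from scipy.stats import chi2

    nu, alpha = 5, 0.95
    z = chi2.ppf(alpha, df=nu)   # z such that alpha = P(0 < X <= z)
    print(f"{z:.3f}")            # 11.070, matching the table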
Table 25: F distribution (Def. 9.4)

n1 →   1      2      3      4      5      6      7      8      9      10
n2 ↓
α = 0.90
5      4.06   3.78   3.62   3.52   3.45   3.40   3.37   3.34   3.32   3.30
10     3.29   2.92   2.73   2.61   2.52   2.46   2.41   2.38   2.35   2.32
15     3.07   2.70   2.49   2.36   2.27   2.21   2.16   2.12   2.09   2.06
20     2.97   2.59   2.38   2.25   2.16   2.09   2.04   2.00   1.96   1.94
50     2.81   2.41   2.20   2.06   1.97   1.90   1.84   1.80   1.76   1.73
100    2.76   2.36   2.14   2.00   1.91   1.83   1.78   1.73   1.69   1.66
500    2.72   2.31   2.09   1.96   1.86   1.79   1.73   1.68   1.64   1.61
α = 0.95
5      6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74
10     4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98
15     4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54
20     4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35
50     4.03   3.18   2.79   2.56   2.40   2.29   2.20   2.13   2.07   2.03
100    3.94   3.09   2.70   2.46   2.31   2.19   2.10   2.03   1.97   1.93
500    3.86   3.01   2.62   2.39   2.23   2.12   2.03   1.96   1.90   1.85
α = 0.99
5      16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05
10     10.04  7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85
15     8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80
20     8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37
50     7.17   5.06   4.20   3.72   3.41   3.19   3.02   2.89   2.78   2.70
100    6.90   4.82   3.98   3.51   3.21   2.99   2.82   2.69   2.59   2.50
500    6.69   4.65   3.82   3.36   3.05   2.84   2.68   2.55   2.44   2.36

Notes: This table reports the quantiles of F distributions with (n1, n2) degrees of freedom. If X ∼ F(n1, n2), the cell of the table corresponding to n1, n2 and a given α gives z such that α = P(0 < X ≤ z) (see Figure 72, right chart). [The entry for α = 0.90, n1 = 8, n2 = 20 read 1.10 in the source; it has been corrected to the standard tabulated value 2.00.]
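And for Table 25 (a scipy sketch; n1 = 5, n2 = 10 and α = 0.95 are picked arbitrarily):

    from scipy.stats import f

    n1, n2, alpha = 5, 10, 0.95
    z = f.ppf(alpha, dfn=n1, dfd=n2)   # z such that alpha = P(0 < X <= z)
    print(f"{z:.2f}")                  # 3.33, matching the table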
