
Statistics and Econometrics II

Jean-Paul Renne

HEC Lausanne
Organisational issues

- Different types of sessions:
  - Lectures (#19)
  - Exercise sessions (#7)
  - Mid-term exam + correction (#2)

- Teaching Assistants: Giacomo Mangiante and Elena Sudan.

- Moodle page.

- Evaluation:
  - Two exams: mid-term and final exam.
  - Mid-term exam: 40% of the total grade if it is higher than the grade obtained at the final exam (0% otherwise).
  - Pen-and-paper exams.
  - A4 sheet (home-made) with pertinent information is allowed; calculator not allowed.

Outline of course

1. Key Concepts (revisions)
2. Central Limit Theorem
3. Tests
4. Linear Regressions
5. Panel Regressions
6. Maximum Likelihood Estimation
7. Binary-Choice Models
8. Introduction to Time Series

Key Concepts (revisions)

Definition 1.1 (Cumulative distribution function (c.d.f.))

The random variable (r.v.) X admits the cumulative distribution function F if, for all a:

    F(a) = P(X ≤ a), ∀a.

Definition 1.2 (Probability density function (p.d.f.))

A continuous random variable X admits the probability density function f if:

    P(a < X ≤ b) = ∫_a^b f(x) dx, ∀a ≤ b,

where f(x) ≥ 0 for all x.

- In particular (if this limit exists), we have:

    f(x) = lim_{ε→0} P(x < X ≤ x + ε)/ε = lim_{ε→0} [F(x + ε) − F(x)]/ε,    (1)

  and

    F(a) = ∫_{−∞}^a f(x) dx.

Definition 1.3 (Joint cumulative distribution function (c.d.f.))

The random variables X and Y admit the joint cumulative distribution function F_XY if, for all a and b:

    F_XY(a, b) = P(X ≤ a, Y ≤ b).

Definition 1.4 (Joint probability density function (p.d.f.))

The continuous random variables X and Y admit the joint p.d.f. f_XY, where f_XY(x, y) ≥ 0 for all x and y, if:

    P(a < X ≤ b, c < Y ≤ d) = ∫_a^b ∫_c^d f_XY(x, y) dy dx, ∀a ≤ b, c ≤ d.

- In particular (if this limit exists), we have:

    f_XY(x, y) = lim_{ε→0} P(x < X ≤ x + ε, y < Y ≤ y + ε)/ε²
               = lim_{ε→0} [F_XY(x + ε, y + ε) − F_XY(x, y + ε) − F_XY(x + ε, y) + F_XY(x, y)]/ε².

Figure 1: Joint normal distribution: c.d.f. illustration

[3-D surface plot of the joint density of (X, Y), for X and Y ranging over [−3, 3].]

- The volume between the horizontal plane (z = 0) and the surface is equal to F_XY(0.5, 1).

Figure 2: Joint normal distribution

[3-D surface plot of the joint density of (X, Y), with a black column of base ε × ε.]

- Assume that the basis of the black column is defined by those points whose x-coordinates are between x and x + ε and y-coordinates are between y and y + ε.
  ⇒ Then the volume of the black column is equal to P(x < X ≤ x + ε, y < Y ≤ y + ε), which is approximately equal to f_XY(x, y)ε² if ε is small.

Definition 1.5 (Conditional probability distribution function)

If X and Y are continuous r.v., then the distribution of X conditional on Y = y is denoted by f_{X|Y}(x, y) and satisfies:

    f_{X|Y}(x, y) = lim_{ε→0} P(x < X ≤ x + ε | Y = y)/ε.

Proposition 1.1 (Conditional probability distribution function)

We have:

    f_{X|Y}(x, y) = f_XY(x, y)/f_Y(y).

Proof We have:

    f_{X|Y}(x, y) = lim_{ε→0} (1/ε) P(x < X ≤ x + ε | Y = y)
                  = lim_{ε→0} (1/ε) P(x < X ≤ x + ε | y < Y ≤ y + ε)
                  = lim_{ε→0} (1/ε) P(x < X ≤ x + ε, y < Y ≤ y + ε) / P(y < Y ≤ y + ε)
                  = lim_{ε→0} (1/ε) ε² f_XY(x, y) / (ε f_Y(y)). ∎

Definition 1.6 (Independent random variables)

Consider two r.v., X and Y, with c.d.f.s and p.d.f.s F_X, F_Y, f_X and f_Y. These random variables are independent if and only if (iff) the joint c.d.f. of X and Y (see Def. 1.3) is given by:

    F_XY(x, y) = F_X(x) × F_Y(y),

or, equivalently, iff the joint p.d.f. of (X, Y) (see Def. 1.4) is given by:

    f_XY(x, y) = f_X(x) × f_Y(y).

Remark 1.1
- If X and Y are independent, f_{X|Y}(x, y) = f_X(x), implying, in particular, E(g(X)|Y) = E(g(X)) where g is any function.
- If X and Y are independent, then E(g(X)h(Y)) = E(g(X))E(h(Y)) and Cov(g(X), h(Y)) = 0 where g and h are any functions.
- If X and Y are uncorrelated, they are not necessarily independent [Example: X = Y² − 1 and Y ~ N(0, 1)].

Proposition 1.2 (Law of iterated expectations)

If X and Y are two random variables and if E(|X|) < ∞, we have:

    E(X) = E(E(X|Y)).

Proof (in the case where the p.d.f. of (X, Y) exists) Let us denote by f_X, f_Y and f_XY the probability distribution functions (p.d.f.) of X, Y and (X, Y), respectively. We have:

    f_X(x) = ∫ f_XY(x, y) dy.

Besides, we have (Bayes equality, Prop. 1.1):

    f_XY(x, y) = f_{X|Y}(x, y) f_Y(y).

Therefore:

    E(X) = ∫ x f_X(x) dx = ∫ x (∫ f_XY(x, y) dy) dx = ∫∫ x f_{X|Y}(x, y) f_Y(y) dy dx
         = ∫ (∫ x f_{X|Y}(x, y) dx) f_Y(y) dy = E(E[X|Y]),

where the inner integral ∫ x f_{X|Y}(x, y) dx is E[X|Y = y]. ∎

Example 1.1 (Mixture of Gaussian distributions (1/2))

X is drawn from a mixture of Gaussian distributions if:

    X = B × Y1 + (1 − B) × Y2,

where B, Y1 and Y2 are three independent variables: B ~ Bernoulli(p), Y1 ~ N(µ1, σ1²) and Y2 ~ N(µ2, σ2²).

[Three density plots for parameter values: (p=0.5, µ1=−1, σ1=1, µ2=2, σ2=1); (p=0.1, µ1=−2, σ1=0.5, µ2=2, σ2=1); (p=0.3, µ1=−2, σ1=2, µ2=1, σ2=1).]

The law of iterated expectations gives (using 1{B=1} = B and 1{B=0} = 1 − B):

    E(X) = E(E(X|B)) = E(1{B=1} µ1 + 1{B=0} µ2) = pµ1 + (1 − p)µ2.

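A quick way to make this result concrete is to simulate the mixture and compare the sample mean with pµ1 + (1 − p)µ2. A minimal R sketch, using the first panel's parameter values:

    set.seed(123)
    n  <- 1e6
    p <- 0.5; mu1 <- -1; sigma1 <- 1; mu2 <- 2; sigma2 <- 1  # first panel's values
    B  <- rbinom(n, size = 1, prob = p)        # B ~ Bernoulli(p)
    Y1 <- rnorm(n, mean = mu1, sd = sigma1)    # Y1 ~ N(mu1, sigma1^2)
    Y2 <- rnorm(n, mean = mu2, sd = sigma2)    # Y2 ~ N(mu2, sigma2^2)
    X  <- B * Y1 + (1 - B) * Y2                # mixture draws
    mean(X)                  # sample mean: close to ...
    p * mu1 + (1 - p) * mu2  # ... the theoretical value p*mu1 + (1-p)*mu2 = 0.5
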
Example 1.2 (Buffon (1733)'s needles and π)

Suppose we have a floor made of parallel strips of wood, each the same width [w = 1], and we drop a needle onto the floor. What is the probability that the needle lands on a line? Assumption: needles' length = 1/2.

Let's define the random variable X by

    X = 1 if the needle crosses a line, 0 otherwise.

[Diagram: needle at angle θ, at distance d from the nearest line; strip width w = 1.]

Conditionally on θ:

    E(X|θ) = cos(θ)/2.

θ is uniformly distributed on [−π/2, π/2], therefore:

    E(X) = E(E(X|θ)) = E(cos(θ)/2) = ∫_{−π/2}^{π/2} (1/2) cos(θ) (dθ/π) = 1/π.

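Since E(X) = 1/π, the crossing frequency yields a Monte Carlo estimator of π. A minimal R sketch of the experiment (strip width 1, needle length 1/2, as above; θ is measured relative to the direction perpendicular to the lines, consistent with E(X|θ) = cos(θ)/2):

    set.seed(1)
    n     <- 1e6
    theta <- runif(n, -pi/2, pi/2)  # needle angle w.r.t. the perpendicular to the lines
    d     <- runif(n, 0, 1/2)       # distance from needle centre to the nearest line
    X     <- as.numeric(d <= cos(theta) / 4)  # needle of length 1/2 crosses a line
    mean(X)      # estimate of E(X) = 1/pi
    1 / mean(X)  # Monte Carlo estimate of pi
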
Example 1.3 (Davis and Lo's (2001) model)

Davis and Lo (2001) introduce a contagion model and propose applications in terms of credit-risk management.

They consider n entities. The state of entity i is denoted by d_i: d_i = 0 if entity i stays non-infected and 1 if it gets infected. There are two ways of getting infected:
- direct, or autonomous, infection;
- indirect infection, or contamination.

Direct infection and contamination are modelled through two types of random variables: the X_i's (n of them) and the Y_ji's (n² of them). These n(n + 1) variables are independently Bernoulli distributed. The Bernoulli parameter is p for the X_i's and q for the Y_ji's. We have:

    d_i = X_i + (1 − X_i)(1 − ∏_{j≠i} (1 − X_j Y_ji)),

where the first term captures autonomous infection and the second one contamination.

In order to determine the distribution of the number of infected entities, it is convenient to first reason conditionally on the number of autonomous infections r, defined by r = ∑_i X_i.

Proposition 1.3 (Law of total variance)

If X and Y are two random variables and if the variance of X is finite, we have:

    Var(X) = E(Var(X|Y)) + Var(E(X|Y)).

Proof

    Var(X) = E(X²) − E(X)²
           = E(E(X²|Y)) − E(X)²
           = E(E(X²|Y) − E(X|Y)²) + E(E(X|Y)²) − E(X)²
           = E(E(X²|Y) − E(X|Y)²) + E(E(X|Y)²) − E(E(X|Y))²,

where the first term is E(Var(X|Y)) and the last two terms form Var(E(X|Y)). ∎

Example 1.4 (Mixture of Gaussian distributions (2/2))

Consider the case of a mixture of Gaussian distributions (Example 1.1). We have:

    Var(X) = E(Var(X|B)) + Var(E(X|B))
           = pσ1² + (1 − p)σ2² + p(1 − p)(µ1 − µ2)².

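As for the mean in Example 1.1, this variance formula can be checked by simulation. A minimal R sketch, reusing the third panel's parameter values:

    set.seed(42)
    n <- 1e6
    p <- 0.3; mu1 <- -2; sigma1 <- 2; mu2 <- 1; sigma2 <- 1  # third panel's values
    B <- rbinom(n, 1, p)
    X <- B * rnorm(n, mu1, sigma1) + (1 - B) * rnorm(n, mu2, sigma2)
    var(X)  # sample variance: close to ...
    p * sigma1^2 + (1 - p) * sigma2^2 + p * (1 - p) * (mu1 - mu2)^2  # ... 3.79
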
About consistent estimators

- The objective of econometrics is to estimate parameters out of data observations (samples).
- Except in degenerate cases, the estimates are different from the "true" value.
- A good estimator is expected to "converge" to the true value when the sample size increases: consistency.
- Denote by θ̂_n the estimate of θ based on a sample of length n.
- θ̂ is consistent if, for any ε > 0 (even if very small), the probability that θ̂_n is in [θ − ε, θ + ε] goes to 1 when n goes to ∞:

    lim_{n→+∞} P(θ̂_n ∈ [θ − ε, θ + ε]) = 1.

- That is, θ̂ is a consistent estimator if θ̂_n converges in probability (Def. 9.9) to θ.

Example 1.5

Examples of parameters to be estimated: causal effect of a variable on another, elasticities, parameters defining some distribution of interest, preference parameters (risk aversion)...

Example 1.6 (Example of non-convergent estimator)

- Assume that X_i ~ i.i.d. Cauchy with a location parameter of 1 and a scale parameter of 1 (Def. 9.7).
- The sample mean

    X̄_n = (1/n) ∑_{i=1}^n X_i

  does not converge in probability. (This is because a Cauchy distribution has no mean, hence the law of large numbers cannot apply.)

[Two panels: trajectories of X̄_n against the sample size n, up to n = 5000; the running means keep jumping and never settle.]

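The two panels can be reproduced in a few lines of R; a sketch of one running-mean trajectory:

    set.seed(1)
    n    <- 5000
    x    <- rcauchy(n, location = 1, scale = 1)  # i.i.d. Cauchy draws
    xbar <- cumsum(x) / (1:n)                    # running sample mean
    plot(xbar, type = "l", xlab = "sample size (n)", ylab = "running mean")
    abline(h = 1, lty = 2)  # the location parameter (not a mean: the mean does not exist)
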
Central Limit Theorem

Definition 2.1 (Sample and Population)

In statistics, a sample refers to a set of observations drawn from a population.

Remarks
- Most of the time, it is necessary to use samples for research, because it is impractical to study the whole population.
- Suppose one wants to know the average height of 20-year-old Swiss students: it is not possible to get the heights of all of the 20-year-old students in Switzerland, but it is possible to get them for a sample of them.

Figure 3: Data (red bars) and estimated density of men heights

[Histogram of heights, roughly 160 to 200 cm, with an estimated density curve; y-axis: frequency.]

Source of the heights data: Davis (1990). The distribution is estimated by employing the kernel approach (see extra material on p.d.f. estimation by kernel approach).

- The central limit theorem (CLT) and the law of large numbers (LLN) are (the) two fundamental theorems of probability.
- LLN: The sample mean converges to the population mean (when it exists).
- CLT: The distribution of the sum (or average) of a large number of independent, identically distributed (i.i.d.) variables with finite variance is approximately normal, regardless of the underlying distribution.
- These theorems are at the basis of most statistical procedures.

Example 2.1
- Consider N classes of 20-year-old students. For each of the N classes, we compute the average height of the students. These N values can be seen as draws from an (approximate) Gaussian distribution.

Theorem 2.1 (Law of large numbers)

The sample mean is a consistent estimator of the population mean.

Remarks
- Consistent estimator: convergence in probability (Slide 16, Def. 9.9).
- The assumption of finite variance is not necessary.

Proof. See extra material.

- How many students should we measure to "correctly" estimate the average height µ?
- It depends on the required precision.
- The LLN states that the sample average x̄_n = (1/n) ∑_{i=1}^n x_i converges to µ.
- The CLT tells us about the accuracy of the estimator x̄_n.

Consider Buffon's experiment (Example 1.2)
- What is the minimal number n of needles that one has to throw to get an estimate π̂ (= 1/X̄_n) of π such that there is a 95% chance that π ∈ [π̂ − 0.01, π̂ + 0.01]?
- The central limit theorem will help find an approximate solution.

- How far can we go without the CLT?
- Assume x_i ~ i.i.d. L(µ, σ²), where L(µ, σ²) denotes a distribution of mean µ and variance σ².
- We have:

    E(x̄_n) = µ and Var(µ − x̄_n) = σ²/n → 0 as n → ∞.

- The Chebychev inequality (Thm. 9.1) implies that, for any ε > 0 (even very small):

    P(|x̄_n − µ| > ε) ≤ Var(µ − x̄_n)/ε² = σ²/(nε²) → 0 as n → ∞.

- How does the CLT complement this information?

Theorem 2.2 (Lindeberg-Levy Central limit theorem, CLT)

If x_n is an i.i.d. sequence of random variables with mean µ and variance σ² (∈ ]0, +∞[), then:

    √n (x̄_n − µ) →d N(0, σ²), where x̄_n = (1/n) ∑_{i=1}^n x_i.

["→d" denotes convergence in distribution (see Def. 9.12).]

Proof. See online extra material.

History of CLT
- First version of this theorem: postulated by the mathematician Abraham de Moivre. In an article published in 1733, he used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin.
- This finding was nearly forgotten until Pierre-Simon Laplace rescued it from obscurity in his work Théorie analytique des probabilités, published in 1812.
- Laplace expanded De Moivre's finding by approximating the binomial distribution with the normal distribution.

Figure 4: Distribution of sample means, heights example (see Fig. 3)

[Two densities: the (centered) height distribution and the distribution of the √n-scaled simulated sample means; x-axis: (centered) height; y-axis: frequency.]

Notes: To obtain the black distribution, we simulate a large number (4000) of samples of n = 100 points using the red distribution. For each sample, we compute the sample mean. The black curve is the distribution of the (simulated) sample means, after having multiplied them by √n = 10. (A simulation sketch follows.)

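The construction described in the notes can be sketched in R. Since the Davis (1990) data are not reproduced here, the sketch below uses a N(178, 7²) stand-in for the height distribution (an assumption, for illustration only):

    set.seed(1)
    n.samples <- 4000; n <- 100
    mu <- 178; sigma <- 7  # assumed stand-in for the men-heights distribution
    sample.means <- replicate(n.samples, mean(rnorm(n, mu, sigma)))
    scaled <- sqrt(n) * (sample.means - mu)  # sqrt(n)-scaled, centred sample means
    hist(scaled, breaks = 50, freq = FALSE, xlab = "(centered) height",
         main = "Distribution of scaled sample means")
    curve(dnorm(x, mean = 0, sd = sigma), add = TRUE, col = "red")  # CLT limit N(0, sigma^2)
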
Theorem 2.3 (Multivariate Central limit theorem, CLT)

If x_n is an i.i.d. sequence of m-dimensional random variables with mean µ and variance Σ, where Σ is positive definite and finite, then:

    √n (x̄_n − µ) →d N(0, Σ), where x̄_n = (1/n) ∑_{i=1}^n x_i.

- Whatever the distribution L of the x_i's (it just has to feature a mean µ and a variance σ²), the error between x̄_n and µ can be seen as a random variable that is normally distributed with a mean of zero and a standard deviation of σ/√n.
- We say that the convergence rate is "in √n".
- The good news is: we know a lot about the normal distribution (contrary to L in the general case).
- This allows us to design statistical tests.

Computer-free simulation of N(0, 1)
- Assume you have a lot of time and a coin. How would you approximate the 100 percentiles of the N(0, 1) distribution?

Example 2.2 (Comparison of sample means)

- The question is: Is the average of hourly wage rates the same for men (µ_m) and women (µ_w)?
- Wage sample for men: {m_1, ..., m_{n_m}}.
- Wage sample for women: {w_1, ..., w_{n_w}}.
- CLT:

    √n_m (m̄ − µ_m) ~ N(0, σ_m²) and √n_w (w̄ − µ_w) ~ N(0, σ_w²),

  or

    m̄ ~ N(µ_m, σ_m²/n_m) and w̄ ~ N(µ_w, σ_w²/n_w).

- Because m̄ and w̄ are based on randomly selected samples, they are independent (Def. 1.6) and:

    m̄ − w̄ ~ N(µ_m − µ_w, σ²), with σ² = σ_m²/n_m + σ_w²/n_w.

- In particular, under the assumption that µ_m − µ_w = 0, the probability that m̄ − w̄ is in [−2σ, 2σ] is approximately 0.95.

Example 2.2 (Comparison of sample means (continued))

Figure 5: Distributions of hourly wage rates

[Two panels of estimated densities for men and women: (Case A) education > 12 years; (Case B) education > 18 years; x-axis: hourly wage rate; y-axis: frequency.]

Source of the data: Fox (2008), Canadian Survey of Labour and Income Dynamics, Ontario province.

Example 2.2 (Comparison of sample means (continued))

- Case A (Education > 12 years): We have m̄ − w̄ = 3.44 and √σ² = 0.33.
  - If µ_m − µ_w = 0, P(m̄ − w̄ ∉ [−0.65, 0.65]) ≈ 5%.
  - If µ_m − µ_w = 0, P(m̄ − w̄ ∉ [−0.85, 0.85]) ≈ 1%.
  - If µ_m − µ_w = 0, P(m̄ − w̄ > 3.44) ≈ 0.0000001%; this last probability is the p-value.

- Case B (Education > 18 years): We have m̄ − w̄ = 2.34 and √σ² = 1.22.
  - If µ_m − µ_w = 0, P(m̄ − w̄ ∉ [−2.39, 2.39]) ≈ 5%.
  - If µ_m − µ_w = 0, P(m̄ − w̄ ∉ [−3.14, 3.14]) ≈ 1%.
  - If µ_m − µ_w = 0, P(m̄ − w̄ > 2.34) ≈ 2.7%; this last probability is the p-value.

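The Case A/Case B computations follow the same pattern in R. The sketch below uses hypothetical wage vectors m and w in place of the survey data, which are not reproduced here:

    set.seed(1)
    m <- rnorm(3000, mean = 20, sd = 10)  # hypothetical men's hourly wages
    w <- rnorm(2500, mean = 17, sd = 9)   # hypothetical women's hourly wages
    diff.means <- mean(m) - mean(w)
    sigma <- sqrt(var(m) / length(m) + var(w) / length(w))  # sd of (mbar - wbar)
    c(diff = diff.means, band.5pct = 1.96 * sigma)  # 5% two-sided band under H0
    1 - pnorm(diff.means / sigma)  # one-sided p-value under H0: mu_m = mu_w
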
Tests

Context and objective
- Context: We consider a vector of parameters θ.
- Objective: We want to test a specific hypothesis about θ.

Definition 3.1 (Hypothesis)
- Hypothesis = conjecture about a given property of a population.
- We consider:
  - a null hypothesis (H0: {θ ∈ Θ}) and
  - an alternative hypothesis (H1: {θ ∉ Θ}, i.e. H1: {θ ∈ Θ^c}).

- Researchers may want to find support for the null hypothesis H0 as well as for the alternative hypothesis H1.
- General idea:
  - The null is assumed to be true;
  - Under this assumption, one looks for the distribution of a given statistic;
  - The statistic is computed using observed data;
  - If the obtained statistic is located in some "implausible" parts of the distribution, H0 is rejected; otherwise it is accepted.

- When testing a hypothesis, one can make two types of errors:
  - Type I: Reject H0 while it is true = False Positive (FP).
  - Type II: Accept H0 while it is false = False Negative (FN).

Example 3.1 (A practical illustration of type-I and type-II errors)
- Early warning systems (EWS): approaches designed to warn policy makers of potential future economic and financial crises. See e.g. ECB (2014).
- Researchers look for signals/approaches forecasting crises. H1: A crisis is coming.
- Four cases:
  - True Positive: The approach indicates that a crisis is coming, which turns out to be true.
  - False Positive: The approach indicates that a crisis is coming, which turns out to be false.
  - True Negative: The approach indicates that no crisis is coming, which turns out to be true.
  - False Negative: The approach indicates that no crisis is coming, which turns out to be false.

Definition 3.2 (Size and Power of a test)

For a given test,
- the probability of type-I errors, denoted by α, is called the size (or the significance level) of the test,
- the power of a test is equal to 1 − β, where β is the probability of type-II errors.

- For a given size, we prefer tests with high power. Power = probability that the test will lead to the rejection of the null hypothesis if the latter is false (almost ruling out false negatives).
- Decreasing the size reduces the probability of type-I errors but, often, increases the probability of type-II errors.

- A test involves:
  - a vector of (unknown) parameters (θ),
  - a test statistic (say S) and
  - a critical region (say Γ).
- H0 expresses some constraints on θ, formulated as h(θ) = 0.
- The computation of S is based on a set of observed data Y (i.e. S ≡ S(Y)).
- Decision:
  - H0 is accepted if S ∉ Γ;
  - H0 is rejected if S ∈ Γ.
- Notations:

    α = P(S ∈ Γ | H0)  (proba. of false positive),    (2)
    β = P(S ∉ Γ | H1)  (proba. of false negative).    (3)

Definition 3.3 (Different types of tests)
- Two-tailed test: The region of rejection is on both sides of the sampling distribution (Fig. 66). Typically used if the sampling distribution is symmetrical (e.g. N(0, 1)). Then, the critical region is

    ]−∞, Φ^{-1}(α/2)] ∪ [Φ^{-1}(1 − α/2), +∞[.

- One-tailed test: The region of rejection is on one side of the sampling distribution (Fig. 69). Typically used if the sampling distribution has a support on R+ (e.g. χ²).

Example 3.2 (Factory example (1/2))
- Consider a factory that produces metal cylinders whose diameter has to be equal to 1cm. The tolerance is a = 0.01cm, and more than 90% of the pieces have to satisfy the tolerance for the whole production (say 1,000,000 pieces) to be bought by the client.
- The production technology is such that a proportion θ of the pieces does not satisfy the tolerance.
- θ could be computed by measuring all the pieces but this would be costly.
- Instead, it is decided that n << 1,000,000 pieces will be measured.
- The null hypothesis H0 is θ < 10%.

Example 3.2 (Factory example (2/2))
- Let us denote by d_i the binary indicator defined as:

    d_i = 0 if the diameter of the i-th cylinder is in [1 − a, 1 + a]; 1 otherwise.

- We set x_n = ∑_{i=1}^n d_i, i.e. x_n is the number of measured pieces that do not satisfy the tolerance (out of n).
- A decision rule is: accept H0 if x_n/n ≤ b, reject otherwise.
- In that case:
  - the test statistic is S_n = x_n/n and
  - the critical region is Γ = ]b, 1].
- The probability to reject H0 is:

    P_θ(S_n ∈ Γ) = ∑_{i=bn+1}^{n} C_n^i θ^i (1 − θ)^{n−i}.

Figure 6: Factory example

[Probability to reject H0 as a function of the "true" θ, for N = 100 and thresholds b = 0.05, 0.10, 0.20. To the left of θ = 0.1 ("we would like to accept H0"), the curve height gives the size of the test; to the right ("we would like to reject H0"), it gives the power of the test.]

Figure 6 (continued): Factory example

[Same figure, probability to reject H0, for N = 400.]

Figure 6 (continued): Factory example

[Same figure, probability to reject H0, for N = 1000. A computation sketch follows.]

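The rejection probability has a closed form under the binomial model, so the curves of Figure 6 can be computed exactly. A minimal R sketch for one sample size and one threshold:

    n <- 100; b <- 0.10
    theta <- seq(0, 0.4, by = 0.005)
    # P(reject H0) = P(x_n > n*b), with x_n ~ Binomial(n, theta):
    p.reject <- 1 - pbinom(floor(n * b), size = n, prob = theta)
    plot(theta, p.reject, type = "l",
         xlab = "\"true\" theta", ylab = "Probability to reject H0")
    abline(v = 0.1, lty = 2)  # boundary between H0 (theta < 0.1) and its complement
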
Definition 3.4 (Asymptotic level)

An asymptotic test with critical region Γ_n has an asymptotic level equal to α if:

    sup_{θ∈Θ} lim_{n→∞} P_θ(S_n ∈ Γ_n) = α.

Definition 3.5 (Asymptotically consistent test)

An asymptotic test with critical region Γ_n is consistent if:

    ∀θ ∈ Θ^c, P_θ(S_n ∈ Γ_n) → 1.

Back to the Factory example (Exmpl. 3.2): Asymptotic level

- Because S_n = d̄_n, and since E(d_i) = θ and Var(d_i) = θ(1 − θ), the CLT (Thm 9.6) leads to:

    S_n ~ N(θ, θ(1 − θ)/n),  or  √n (S_n − θ)/√(θ(1 − θ)) ~ N(0, 1).

- Hence, P_θ(S_n ∈ Γ_n) = P_θ(S_n > b) ≈ 1 − Φ(√n (b − θ)/√(θ(1 − θ))).

- θ ↦ 1 − Φ(√n (b − θ)/√(θ(1 − θ))) increases w.r.t. θ, therefore:

    sup_{θ∈Θ=[0,0.1]} P_θ(S_n > b_n) = P_{θ=0.1}(S_n ∈ Γ_n) ≈ 1 − Φ(√n (b_n − 0.1)/0.3).

⇒ If we set b_n = 0.1 + 0.3 Φ^{-1}(1 − α)/√n, then we have sup_{θ∈Θ=[0,0.1]} P_θ(S_n > b_n) ≈ α for large values of n.

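A numerical sketch of this critical value in R, with α = 5% and n = 100:

    alpha <- 0.05
    n     <- 100
    b.n   <- 0.1 + 0.3 * qnorm(1 - alpha) / sqrt(n)  # approximate critical value
    b.n                                              # about 0.149
    # implied (approximate) size at the boundary theta = 0.1:
    1 - pnorm(sqrt(n) * (b.n - 0.1) / 0.3)           # = alpha = 0.05
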
Back to the Factory example (Exmpl. 3.2): Asymptotic consistency
- We proceed under the assumption that θ > 0.1 and we consider b_n = b = 0.1.
- We still have:

    P_θ(S_n ∈ Γ_n) = P_θ(S_n > b) ≈ 1 − Φ(√n (b − θ)/√(θ(1 − θ))).

- Because √n (b − θ)/√(θ(1 − θ)) → −∞ as n → ∞, the Φ(·) term goes to 0 and we have

    P_θ(S_n > b) ≈ 1 − Φ(√n (b − θ)/√(θ(1 − θ))) → 1 as n → ∞.

- Therefore, with b_n = b = 0.1, the test is consistent.

Normality tests

- Let f be the p.d.f. of Y. The k-th standardized moment of Y is defined as:

    ψ_k = µ_k / (√Var(Y))^k,

  where E(Y) = µ and

    µ_k = E[(Y − µ)^k] = ∫_{−∞}^{∞} (y − µ)^k f(y) dy

  is the k-th central moment of Y.
- In particular, µ_2 = Var(Y) = σ², say. Therefore:

    ψ_k = µ_k / (µ_2^{1/2})^k.

- The skewness corresponds to ψ_3 and the kurtosis to ψ_4 (Def. 9.1).

Proposition 3.1 (Normal distribution)

For a Gaussian variable, the skewness (ψ_3) is 0 and the kurtosis (ψ_4) is 3.

Proof For a centered Gaussian distribution, (−y)³f(−y) = −y³f(y). This implies that

    ∫_{−∞}^{∞} y³f(y)dy = ∫_{−∞}^{0} y³f(y)dy + ∫_{0}^{∞} y³f(y)dy = −∫_{0}^{∞} y³f(y)dy + ∫_{0}^{∞} y³f(y)dy = 0,

which leads to the skewness result. Moreover, for a standard Gaussian distribution, df(y)/dy = −y f(y) and therefore d(y³f(y))/dy = 3y²f(y) − y⁴f(y). Partial integration leads to the kurtosis result. ∎

- k-th central sample moment of Y:

    m_k = (1/n) ∑_{i=1}^n (y_i − ȳ)^k.

- k-th standardized sample moment of Y:

    g_k = m_k / m_2^{k/2}.

Proposition 3.2 (Consistency of central sample moments)

If the k-th central moment of Y exists, then the sample central moment m_k is a consistent estimate of the central moment µ_k.

Proposition 3.3 (Asymptotic distribution of g_3 when Y ~ N)

If Y ~ N(µ, σ²), then √n g_3 →d N(0, 6).

Proof See e.g. Lehmann (1999).

Proposition 3.4 (Asymptotic distribution of g_4 when Y ~ N)

If Y ~ N(µ, σ²), then √n (g_4 − 3) →d N(0, 24).

Proposition 3.5 (Joint asymptotic distribution of g_3 and g_4)

Asymptotically, the vector (√n g_3, √n (g_4 − 3)) is bivariate Gaussian. Further, its elements are uncorrelated and therefore independent.

The Jarque-Bera statistic is defined by:

    JB = n (g_3²/6 + (g_4 − 3)²/24) = (n/6) (g_3² + (g_4 − 3)²/4).

Proposition 3.6 (Jarque-Bera asympt. distri.)

If Y is Gaussian, JB →d χ²(2).

Proof This directly derives from Proposition 3.5. ∎

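The statistic is straightforward to compute from a sample. A minimal R sketch, using base R only (jarque.bera is a hypothetical helper name, not a library function):

    jarque.bera <- function(y) {
      n  <- length(y)
      z  <- y - mean(y)
      g3 <- mean(z^3) / mean(z^2)^(3/2)  # 3rd standardized sample moment (skewness)
      g4 <- mean(z^4) / mean(z^2)^2      # 4th standardized sample moment (kurtosis)
      JB <- n * (g3^2 / 6 + (g4 - 3)^2 / 24)
      c(JB = JB, p.value = 1 - pchisq(JB, df = 2))  # asymptotic chi-squared(2) p-value
    }
    set.seed(1)
    jarque.bera(rnorm(500))  # Gaussian sample: small JB, large p-value
    jarque.bera(runif(500))  # uniform sample: normality clearly rejected
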
Figure 7: The distribution of the JB statistic in small samples (Y ~ N)

[Six histograms of the JB statistic for sample sizes n = 5, 10, 20, 50, 100, 500, each compared with the χ²(2) density (red line).]

10,000 samples of size n are simulated (from a normal distribution). For each sample, the JB statistic is computed. The red line shows the χ²(2) density. This figure illustrates the convergence of the JB statistic to the χ²(2) distribution under H0 (the null hypothesis of normality).

Figure 8: The distribution of the JB statistic in small samples (Y ~ U)

[Six histograms of the JB statistic for sample sizes n = 5, 10, 20, 50, 100, 500, each compared with the χ²(2) density (red line).]

10,000 samples of size n are simulated (from a uniform distribution). For each sample, the JB statistic is computed. The red line shows the χ²(2) density. This figure illustrates the absence of convergence of the JB statistic to the χ²(2) distribution under H1.

Linear Regressions and Ordinary Least Squares (OLS)

Example 4.1 (Infant mortality rate and GDP/capita)

Figure 9: Infant mortality rate (deaths per 1000 births) and GDP/capita

[Three scatter plots: infant mortality rate vs GDP per capita; infant mortality rate vs log(GDP per capita); log(infant mortality rate) vs log(GDP per capita).]

Source of the data: United Nations (1998), Social indicators.

Example 4.2 (Income, Seniority and Education)

Figure 10: Income, Seniority and Education

[Scatter plots of Income against Education and against Seniority, and a 3-D scatter plot of Income against both.]

Source of the data: An Introduction to Statistical Learning, with applications in R (Springer, 2013), with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.

Example 4.3 (Advertising and Sales)

Figure 11: Advertising and Sales

[Three scatter plots of Sales against advertising budgets: TV, Radio and Newspaper.]

Source of the data: An Introduction to Statistical Learning, with applications in R (Springer, 2013), with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.

    y_i = x_{i,1} β_1 + ... + x_{i,K} β_K + ε_i,    (4)

where y_i is the regressand, x_{i,k} the k-th regressor, and ε_i the disturbance.

- Using the notation x_i = [x_{i,1}, ..., x_{i,K}]′, we get:

    y_i = x_i′ β + ε_i,

  where y_i and ε_i are scalars and β is (K × 1).
- β is the vector of unknown parameters.
- Regressors are also called independent variables, explicative variables, or covariates.
- We have a sample of n observations for y_i and for x_i.

Remark 4.1
- It is important to distinguish between population quantities (β and ε_i) and their sample estimates (denoted by b and e_i).
- In general we will have:

    y_i = x_i′ β + ε_i = x_i′ b + e_i, but β ≠ b and ε_i ≠ e_i.

Context and Objective
- Context: Eq. (4).
- Objective: Estimate β and test hypotheses based on this vector.

Notations

We will use the notations y, ε and X, where:

    y = [y_1, ..., y_n]′ (n × 1),  ε = [ε_1, ..., ε_n]′ (n × 1),

and X is the (n × K) matrix whose i-th row is x_i′.

Assumptions of the classical linear regression model

Assumption A.1 (Full rank)

There is no exact linear relationship among the independent variables (the x_i's).

Example of exact linear relationship among the x_i's: x_{i,1} = 3x_{i,2} − x_{i,5}.

Assumption A.2 (The conditional mean-zero assumption)

    E(ε|X) = 0.

Implications

Under Assumption A.2:
i) E(ε_i) = 0;
ii) the x_{ij}'s and the ε_i's are uncorrelated, i.e. ∀i, j, Corr(x_{ij}, ε_i) = 0.

Proof i) By the law of iterated expectations (Prop. 1.2): E(ε) = E(E(ε|X)) = E(0) = 0.
ii) E(x_{ij} ε_i) = E(E(x_{ij} ε_i | X)) = E(x_{ij} E(ε_i | X)) = 0, since E(ε_i | X) = 0. ∎

Assumption A.3 (Homoskedasticity)

    ∀i, Var(ε_i|X) = σ².

Figure 12: Heteroskedasticity (left) and homoskedasticity (right)

[Two scatter plots of Y against X.]

X_i ~ N(0, 1) and ε*_i ~ N(0, 1).
Left-hand side (heteroskedasticity): Y_i = 2 + 2X_i + (2·1{X_i<0} + 0.2·1{X_i≥0}) ε*_i.
Right-hand side (homoskedasticity): Y_i = 2 + 2X_i + ε*_i.

Figure 13: Salaries versus years since end of PhD

[Scatter plot of Salary against years since PhD.]

Data from Fox, J. and Weisberg, S. (2011).

Assumption A.4 (Non-correlated residuals)

    ∀i ≠ j, Cov(ε_i, ε_j|X) = 0.

Remark 4.2 (Implication of A.3 and A.4)

If A.3 and A.4 hold, then:

    Var(ε|X) = σ² Id,

where Id is the n × n identity matrix.

Assumption A.5 (Normal distribution)

    ∀i, ε_i ~ N(0, σ²).

About normality

As we will see later (Slide 108), even if this assumption is not satisfied, the normal distribution will show up in asymptotic results (by virtue of the Central Limit Theorem 9.6).

The least squares coefficient vector

- For a given vector of coefficients b, the sum of squared residuals is:

    f(b) = ∑_{i=1}^n (y_i − ∑_{j=1}^K x_{i,j} b_j)² = ∑_{i=1}^n (y_i − x_i′b)².

- Minimizing the sum of squared residuals amounts to minimizing:

    f(b) = (y − Xb)′(y − Xb).

- We have (see Def. 9.14 for the computation of ∂f(b)/∂b):

    ∂f(b)/∂b = −2X′y + 2X′Xb.

- Necessary first-order condition (FOC):

    X′Xb = X′y.    (5)

- Under Assumption A.1, X′X is invertible. Hence:

    b = (X′X)^{-1} X′y.

- b minimises the sum of squared residuals. (f is a non-negative quadratic function; it admits a minimum.) A computation sketch follows.

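The formula can be applied directly; a minimal R sketch on simulated data, checked against lm():

    set.seed(1)
    n <- 200
    X <- cbind(1, rnorm(n), rnorm(n))         # constant + two simulated regressors
    beta <- c(1, 2, -0.5)
    y <- drop(X %*% beta + rnorm(n))
    b <- drop(solve(t(X) %*% X, t(X) %*% y))  # b = (X'X)^{-1} X'y
    rbind(manual = b, lm = coef(lm(y ~ X[, 2] + X[, 3])))  # identical estimates
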
Example 4.4 (Income, Seniority and Education (Example 4.2))

Figure 14: Income, Seniority and Education

[3-D scatter plot of Income against Education and Seniority, with the fitted regression plane.]

                 Estimate   Std. Error   t value   Pr(>|t|)
    (Intercept)   -50.086        5.999    -8.349      0.000
    Education       5.896        0.357    16.513      0.000
    Seniority       0.173        0.024     7.079      0.000

Table 1: Income, Education and Seniority

- The estimated residuals are:

    e = y − X(X′X)^{-1}X′y = My,

  where M := I − X(X′X)^{-1}X′ is called the residual-maker matrix.
- We have MX = 0: if one regresses one of the explanatory variables on X, the residuals are null.
- We have My = Mε (because y = Xβ + ε and MX = 0).
- The fitted values are:

    ŷ = X(X′X)^{-1}X′y = Py,

  where P = X(X′X)^{-1}X′ is a projection matrix. (ŷ is the projection of the vector y onto the vectorial space spanned by the columns of X.)

Remark 4.3
- It can be shown that each column x̃_k of X is orthogonal to e.
  ⇒ If intercepts are included in the regression (x_{i,1} ≡ 1), the average of the residuals is null.

Properties of M and P
- M is symmetric (M = M′) and idempotent (M = M² = M^k for k > 0).
- P is symmetric and idempotent.
- PX = X.
- PM = MP = 0.
- y = Py + My (two orthogonal parts).

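These properties are easy to verify numerically; a minimal R sketch on a simulated design matrix:

    set.seed(1)
    X <- cbind(1, matrix(rnorm(200), nrow = 100))  # 100 x 3 design matrix
    P <- X %*% solve(t(X) %*% X) %*% t(X)          # projection matrix
    M <- diag(100) - P                             # residual-maker matrix
    max(abs(M %*% X))      # ~ 0: MX = 0
    max(abs(P %*% P - P))  # ~ 0: P is idempotent
    max(abs(P %*% M))      # ~ 0: PM = MP = 0
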
Proposition 4.1 (Properties of the OLS estimator)

(i) Under Assumptions A.1 and A.2, the OLS estimator is linear and unbiased.
(ii) Under Assumptions A.1 to A.4, the conditional covariance matrix of b is: Var(b|X) = σ²(X′X)^{-1}.

Proof Under A.1, X′X can be inverted. We have:

    b = (X′X)^{-1}X′y = β + (X′X)^{-1}X′ε.

(i) Let us consider the expectation of the last term, i.e. E((X′X)^{-1}X′ε). Using the law of iterated expectations (Prop. 1.2), we obtain:

    E((X′X)^{-1}X′ε) = E(E[(X′X)^{-1}X′ε | X]) = E((X′X)^{-1}X′ E[ε|X]).

By A.2, we have E[ε|X] = 0. Hence E((X′X)^{-1}X′ε) = 0 and result (i) follows.
(ii) Var(b|X) = (X′X)^{-1}X′ E(εε′|X) X(X′X)^{-1}. By Remark 4.2, if A.3 and A.4 hold, then we have E(εε′|X) = Var(ε|X) = σ² Id. The result follows. ∎

Example 4.5 (Bivariate case (K = 2))
- If K = 2 and if the regression includes a constant (x_{i,1} = 1), X is an n × 2 matrix whose i-th row is [1, x_{i,2}]. Let us use the notation w_i = x_{i,2}:

    X′X = [ n        ∑_i w_i  ]
          [ ∑_i w_i  ∑_i w_i² ],

    (X′X)^{-1} = 1/(n ∑_i w_i² − (∑_i w_i)²) [ ∑_i w_i²  −∑_i w_i ]
                                             [ −∑_i w_i   n       ],

    (X′X)^{-1}X′y = 1/(n ∑_i w_i² − (∑_i w_i)²) [ ∑_i w_i² ∑_i y_i − ∑_i w_i ∑_i w_i y_i ]
                                                [ −∑_i w_i ∑_i y_i + n ∑_i w_i y_i      ]

                  = 1/((1/n) ∑_i (w_i − w̄)²) [ ȳ (1/n) ∑_i w_i² − w̄ (1/n) ∑_i w_i y_i ]
                                              [ (1/n) ∑_i (w_i − w̄)(y_i − ȳ)           ].

- It can be seen that the second element of b = (X′X)^{-1}X′y is:

    b_2 = Cov(W, Y)/Var(W),

  where Cov(W, Y) and Var(W) are sample estimates.
- Since there is a constant in the regression, we have b_1 = ȳ − b_2 w̄.

Theorem 4.1 (Gauss-Markov Theorem)

Under Assumptions A.1 to A.4, for any vector w, the minimum-variance linear unbiased estimator of w′β is w′b, where b is the least squares estimator. (BLUE: Best Linear Unbiased Estimator.)

Proof: Consider b* = Cy, another linear unbiased estimator of β. Since it is unbiased, we must have E(Cy|X) = E(CXβ + Cε|X) = β. We have E(Cε|X) = C E(ε|X) = 0 (by A.2). Therefore b* is unbiased if E(CX)β = β. This has to be the case for any β, which implies that we must have CX = I. Let us compute Var(b*|X). For this, we introduce D = C − (X′X)^{-1}X′, which is such that Dy = b* − b. The fact that CX = I implies that DX = 0. We have Var(b*|X) = Var(Cy|X) = Var(Cε|X) = σ²CC′ (by Assumptions A.3 and A.4, see Remark 4.2). Using C = D + (X′X)^{-1}X′ and exploiting the fact that DX = 0 leads to:

    Var(b*|X) = σ² [D + (X′X)^{-1}X′][D + (X′X)^{-1}X′]′ = Var(b|X) + σ²DD′.

Therefore, we have Var(w′b*|X) = w′Var(b|X)w + σ²w′DD′w ≥ w′Var(b|X)w = Var(w′b|X). ∎

Remark 4.4

Prop. 4.1, Theorems 4.1 and 4.2 (next slide) hold for any sample size.

Notations 4.1

Consider the linear least squares regression of y on X. We introduce the notations:
- b_{y/X}: OLS estimates of β,
- M_X: residual-maker matrix of any regression on X (see Slide 68),
- P_X: projection matrix of any regression on X (see Slide 68).

- Consider the case where we have two sets of explanatory variables: X = [X_1, X_2].
- With obvious notations: b_{y/X} = [b_1′, b_2′]′.

Theorem 4.2 (Frisch-Waugh Theorem)

We have:

    b_2 = b_{M_{X1} y / M_{X1} X_2}.

Proof The minimization of the least squares leads to (these are first-order conditions, see Eq. 5):

    [ X_1′X_1  X_1′X_2 ] [ b_1 ]   [ X_1′y ]
    [ X_2′X_1  X_2′X_2 ] [ b_2 ] = [ X_2′y ].

Use the first-row block of equations to solve for b_1 first; it comes as a function of b_2. Then use the second set of equations to solve for b_2, which leads to:

    b_2 = [X_2′X_2 − X_2′X_1(X_1′X_1)^{-1}X_1′X_2]^{-1} X_2′(Id − X_1(X_1′X_1)^{-1}X_1′)y = [X_2′M_{X1}X_2]^{-1} X_2′M_{X1}y.

Using the fact that M_{X1} is idempotent and symmetric leads to the result. ∎

Remark 4.5

This suggests a second way of estimating b_2 (see the sketch below):
1. Regress y on X_1; regress X_2 on X_1.
2. Regress the former residuals on the latter.

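The two-step procedure of Remark 4.5 can be verified numerically; a minimal R sketch on simulated data:

    set.seed(1)
    n  <- 200
    x1 <- rnorm(n)
    x2 <- 0.5 * x1 + rnorm(n)          # regressor correlated with x1
    y  <- 1 + 2 * x1 - x2 + rnorm(n)
    b2.direct <- coef(lm(y ~ x1 + x2))["x2"]
    e.y  <- resid(lm(y ~ x1))          # residuals of y regressed on [1, x1]
    e.x2 <- resid(lm(x2 ~ x1))         # residuals of x2 regressed on [1, x1]
    b2.fw <- coef(lm(e.y ~ e.x2 - 1))  # residuals on residuals (no constant)
    c(b2.direct, b2.fw)                # the two coefficients coincide
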
Figure 15: Google searches for "parapluie" (red) versus precipitations (black)

[Monthly time series, 2005–2011.]

                 Estimate   Std. Error   t value   Pr(>|t|)
    (Intercept)     5.539        2.055     2.695      0.009
    precip          0.160        0.031     5.150      0.000

Table 2: Regression of Google searches on precipitations

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        21.762       1.954   11.138     0.000
monthly.dummy1     -9.011       2.763   -3.261     0.002
monthly.dummy2    -13.337       2.763   -4.827     0.000
monthly.dummy3     -8.188       2.763   -2.963     0.004
monthly.dummy4     -5.590       2.763   -2.023     0.048
monthly.dummy5     -3.457       2.763   -1.251     0.216
monthly.dummy6     -5.425       2.763   -1.964     0.054
monthly.dummy7     -9.893       2.763   -3.581     0.001
monthly.dummy8     -9.378       2.763   -3.394     0.001
monthly.dummy9     -6.154       2.763   -2.227     0.030
monthly.dummy10    -5.906       2.763   -2.137     0.037
monthly.dummy11     2.793       2.763    1.011     0.316

Table 3: Deseasonalization regression of Google searches
76 / 295
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)       0.000       0.465    0.000     1.000
deseas.precip     0.108       0.025    4.271     0.000

Table 4: Regression of deseas. Google searches on deseas. precipitations

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        14.233       2.601    5.472     0.000
precip              0.108       0.027    3.921     0.000
monthly.dummy1     -7.282       2.521   -2.889     0.005
monthly.dummy2    -11.507       2.525   -4.557     0.000
monthly.dummy3     -7.429       2.489   -2.984     0.004
monthly.dummy4     -4.346       2.502   -1.737     0.088
monthly.dummy5     -4.118       2.487   -1.656     0.103
monthly.dummy6     -4.247       2.500   -1.699     0.095
monthly.dummy7     -8.455       2.509   -3.370     0.001
monthly.dummy8     -8.553       2.491   -3.434     0.001
monthly.dummy9     -5.374       2.490   -2.159     0.035
monthly.dummy10    -5.580       2.483   -2.247     0.028
monthly.dummy11     2.036       2.489    0.818     0.417

Table 5: Regression of Google searches on [precipitations + Dummies]
77 / 295
Partial regression coefficients

Special case: x₂ is scalar

▶ When b₂ is scalar (and then X₂ is of dimension n × 1), Theorem 4.2 leads to:

b₂ = (X₂′M_{X₁}y) / (X₂′M_{X₁}X₂)   (partial regression coefficient).
78 / 295
Goodness of fit

▶ Total variation in y = sum of squared deviations:

TSS = Σᵢ₌₁ⁿ (yᵢ − ȳ)².

▶ We have:
y = Xb + e = ŷ + e.

▶ In the following, we assume that the regression includes a constant (i.e. for all i, x_{i,1} = 1).
▶ Let us denote by M⁰ the matrix that transforms observations into deviations from sample means. Using that M⁰e = e and that X′e = 0, we have:

y′M⁰y = (Xb + e)′M⁰(Xb + e) = b′X′M⁰Xb + e′e,

where y′M⁰y is the total sum of squares, b′X′M⁰Xb the "explained" sum of squares, and e′e the sum of squared residuals:

TSS = Expl.SS + SSR.
79 / 295
Coefficient of determination R²

Coefficient of determination = Expl.SS/TSS = 1 − SSR/TSS = 1 − e′e/(y′M⁰y).   (6)

Remark 4.6

▶ It can be shown [Greene, 2012, Section 3.5] that:

Coefficient of determination = [Σᵢ₌₁ⁿ (yᵢ − ȳ)(ŷᵢ − ȳ)]² / [Σᵢ₌₁ⁿ (yᵢ − ȳ)² × Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²].

⇒ R² is the sample squared correlation between y and the (regression-implied) predictions of y.
80 / 295
Figure 16: Bivariate regressions with low/high R²
[Two scatter plots of Y against X with fitted lines: left panel "Low R²", right panel "High R²".]
81 / 295
Definition 4.1 (Partial correlation coefficient)

The partial correlation between y and z, controlling for some variables X, is the sample correlation between y* and z*, where the latter two variables are the residuals in regressions of y on X and of z on X, respectively.

This correlation is denoted by r_{yz}^X. By definition, we have:

r_{yz}^X = z*′y* / √((z*′z*)(y*′y*)).   (7)
82 / 295
Proposition 4.2 (Change in SSR (see Slide 79) when a variable is added)

We have:
u′u = e′e − c²(z*′z*)   (≤ e′e)   (8)

where (i) u and e are the residuals in the regressions of y on [X, z] and of y on X, respectively, (ii) c is the regression coefficient on z in the former regression, and where z* are the residuals in the regression of z on X.

Proof: The OLS estimates [d′, c]′ in the regression of y on [X, z] satisfy (first-order cond., Eq. 5):

[ X′X  X′z ] [ d ]   [ X′y ]
[ z′X  z′z ] [ c ] = [ z′y ].

Hence, in particular, d = b − (X′X)⁻¹X′zc, where b is the OLS estimate of y on X. Substituting in u = y − Xd − zc, we get u = e − z*c. We therefore have:

u′u = (e − z*c)′(e − z*c) = e′e + c²(z*′z*) − 2c z*′e.   (9)

Now z*′e = z*′(y − Xb) = z*′y because z* are the residuals in an OLS regression on X. Since c = (z*′z*)⁻¹z*′y* (by an application of Theorem 4.2), we have (z*′z*)c = z*′y* = z*′y and, therefore, z*′e = (z*′z*)c. Inserting this in Eq. (9) leads to the result. ∎
83 / 295
Proposition 4.3 (Change in R² when a variable is added)

Denoting by R²_W the coefficient of determination in the regression of y on some variables W, we have:

R²_{X,z} = R²_X + (1 − R²_X)(r_{yz}^X)²,

where r_{yz}^X is the coefficient of partial correlation (Def. 4.1).

Proof: Let's use the same notations as in Prop. 4.2. Theorem 4.2 implies that c = (z*′z*)⁻¹z*′y*. Using this in Eq. (8) gives u′u = e′e − (z*′y*)²/(z*′z*). Using the definition of the partial correlation (Eq. 7), we get u′u = e′e(1 − (r_{yz}^X)²). The result is obtained by dividing both sides of the previous equation by y′M⁰y. ∎

▶ The previous proposition shows that we necessarily increase the R² if we add variables, even if they are irrelevant.
▶ The adjusted R², denoted by R̄², is a fit measure that penalizes large numbers of regressors:

R̄² = 1 − [e′e/(n − K)] / [y′M⁰y/(n − 1)] = 1 − [(n − 1)/(n − K)](1 − R²).
84 / 295
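The last two bullet points are easy to check numerically. A minimal R sketch on simulated data (variable names are illustrative): adding a pure-noise regressor mechanically increases R², while the adjusted R̄² typically falls.

set.seed(2)
n <- 100
x <- rnorm(n)
y <- 1 + x + rnorm(n)
noise <- rnorm(n)                     # irrelevant regressor

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ x + noise)

summary(fit1)$r.squared     < summary(fit2)$r.squared      # TRUE: R² cannot decrease
summary(fit1)$adj.r.squared > summary(fit2)$adj.r.squared  # typically TRUE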
Inference and Prediction

▶ Under the normality assumption (Assumption A.5), we know the distribution of b (conditional on X).
▶ Indeed, (b|X) ≡ (X′X)⁻¹X′y is multivariate Gaussian:

b|X ∼ N(β, σ²(X′X)⁻¹).   (10)

▶ Problem: In practice, we do not know σ² (population parameter).

Proposition 4.4 (Conditional expectation of s²)

Under A.1 to A.4, an unbiased estimate of σ² is given by:

s² = e′e/(n − K).   (11)

(It is sometimes denoted by σ²_OLS.)

Proof: E(e′e|X) = E(ε′Mε|X) = E(Tr(ε′Mε)|X) = Tr(M E(εε′|X)) = σ²Tr(M). (Note that we have E(εε′|X) = σ²Id by Assumptions A.3 and A.4, see Remark 4.2.) Finally:
Tr(M) = n − Tr(X(X′X)⁻¹X′) = n − Tr((X′X)⁻¹X′X) = n − Tr(Id_{K×K}) = n − K (see also Prop. 9.2). ∎
85 / 295
▶ Two results will prove important to perform hypothesis testing:
(i) We know the distribution of s² (Prop. 4.5).
(ii) s² and b are independent random variables (Prop. 4.6).

Proposition 4.5 (Distribution of s²)

Under A.1 to A.5, we have: s²/σ² |X ∼ χ²(n − K)/(n − K).

Proof: ▶ We have e′e = ε′Mε. M is an idempotent symmetric matrix. Therefore it can be decomposed as PDP′, where D is a diagonal matrix and P is an orthogonal matrix. As a result, e′e = (P′ε)′D(P′ε), i.e. e′e is a weighted sum of independent squared Gaussian variables (the entries of P′ε are independent because they are Gaussian – under A.5 – and uncorrelated). The variance of each of these i.i.d. Gaussian variables is σ².
▶ Because M is an idempotent symmetric matrix, its eigenvalues are either 0 or 1 (Prop. 9.1) and its rank equals its trace (Prop. 9.2). Further, its trace is equal to n − K (see proof of Eq. (11)). Therefore D has n − K entries equal to 1 and K equal to 0.
▶ Hence, e′e = (P′ε)′D(P′ε) is a sum of n − K squared independent Gaussian variables of variance σ².
▶ Therefore e′e/σ² = (n − K)s²/σ² is a sum of n − K squared i.i.d. standard normal variables. ∎
86 / 295
Proposition 4.6 (Independence of s² and b)

Under A.1 to A.5, b and s² are independent.

Proof: We have b = β + [X′X]⁻¹X′ε and s² = ε′Mε/(n − K). Hence b is an affine combination of ε and s² is a quadratic combination of the same Gaussian shocks. One can write s² as s² = (Mε)′Mε/(n − K) and b as β + Tε, where T = (X′X)⁻¹X′. Since TM = 0, Tε and Mε are independent (because two uncorrelated Gaussian variables are independent); therefore b and s², which are functions of respective independent variables, are independent. ∎
87 / 295
▶ Under A.1 to A.5, let us consider b_k, the kth entry of b:

b_k|X ∼ N(β_k, σ²v_k),

where v_k is the kth component of the diagonal of (X′X)⁻¹.
▶ Besides, we have (Prop. 4.5):

(n − K)s²/σ² |X ∼ χ²(n − K).

▶ As a result (using Props. 4.5 and 4.6), we have:

t_k = [(b_k − β_k)/√(σ²v_k)] / √[(n − K)s²/(σ²(n − K))] = (b_k − β_k)/√(s²v_k) ∼ t(n − K),   (12)

where t(n − K) denotes a t distribution with n − K degrees of freedom (Def. 9.5).
▶ Remark: (b_k − β_k)/√(σ²v_k) |X ∼ N(0, 1) and (n − K)s²/σ² |X ∼ χ²(n − K). These two distributions do not depend on X ⇒ the marginal distribution of t_k is also t.
88 / 295
Remark 4.7

▶ s²v_k is not exactly the conditional variance of b_k: the variance of b_k conditional on X is σ²v_k.
▶ However, s²v_k is an unbiased estimate of σ²v_k (by Prop. 4.4).

▶ The previous result (Eq. 12) can be extended to any linear combination of the elements of b (Eq. 12 is for its kth component only).
▶ Let us consider α′b, the OLS estimate of α′β.
▶ From Eq. (10), we have:

α′b|X ∼ N(α′β, σ²α′(X′X)⁻¹α).

▶ Therefore:
(α′b − α′β)/√(σ²α′(X′X)⁻¹α) |X ∼ N(0, 1).

▶ Using the same approach as the one used to derive Eq. (12), one can show that Props. 4.5 and 4.6 imply that:

(α′b − α′β)/√(s²α′(X′X)⁻¹α) ∼ t(n − K).   (13)
89 / 295
Figure 17: Normal and Student-t distributions
[Density plot comparing N(0,1) with t(3), t(7) and t(20) over the range −3 to 3.]
Note: See Def. 9.5 of the t distribution. The chart shows that the higher the degree of freedom ν, the closer the distribution of t(ν) gets to the normal distribution. See also Table 23.
90 / 295
Confidence interval of β_k

▶ Assume we want to compute a (symmetrical) confidence interval [I_{d,1−α}, I_{u,1−α}] that is such that P(β_k ∈ [I_{d,1−α}, I_{u,1−α}]) = 1 − α.
▶ In particular, we want to have: P(β_k < I_{d,1−α}) = α/2.
▶ For this purpose, we make use of t_k = (b_k − β_k)/√(s²v_k) ∼ t(n − K) (Eq. 12).
▶ We have: P(β_k < I_{d,1−α}) = α/2 ⟺

P[(b_k − β_k)/√(s²v_k) > (b_k − I_{d,1−α})/√(s²v_k)] = α/2 ⟺ P[t_k > (b_k − I_{d,1−α})/√(s²v_k)] = α/2 ⟺

1 − P[t_k ≤ (b_k − I_{d,1−α})/√(s²v_k)] = α/2 ⟺ (b_k − I_{d,1−α})/√(s²v_k) = Φ⁻¹_{t(n−K)}(1 − α/2),

where Φ_{t(n−K)} is the c.d.f. of the t(n − K) distribution (Table 23).
▶ Doing the same for I_{u,1−α}, we obtain:

[I_{d,1−α}, I_{u,1−α}] = [b_k − Φ⁻¹_{t(n−K)}(1 − α/2)√(s²v_k), b_k + Φ⁻¹_{t(n−K)}(1 − α/2)√(s²v_k)].
91 / 295
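This interval is what R's confint() computes for an lm fit. A minimal check on simulated data (illustrative names):

set.seed(3)
n <- 50; K <- 2
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)

bk   <- coef(fit)["x"]
s2vk <- vcov(fit)["x", "x"]            # s² * v_k
q    <- qt(1 - 0.05/2, df = n - K)     # quantile of the t(n - K) distribution

c(bk - q * sqrt(s2vk), bk + q * sqrt(s2vk))
confint(fit, "x", level = 0.95)        # same interval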
Example 4.6 (Income, Seniority and Education (Example 4.2))

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -50.086       5.999   -8.349     0.000
Education      5.896       0.357   16.513     0.000
Seniority      0.173       0.024    7.079     0.000

Table 6: Income, Education and Seniority

The data are the same as in Example 4.2.

Example 4.7 (Advertising and sales (Example 4.3))

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    2.939       0.312    9.422     0.000
TV             0.046       0.001   32.809     0.000
radio          0.189       0.009   21.893     0.000
newspaper     -0.001       0.006   -0.177     0.860

Table 7: Advertising and sales

The data are the same as in Example 4.3.
92 / 295
▶ In Examples 4.6 and 4.7, the last two columns give the test statistics and p-values associated with the test (Section 3) whose null hypothesis is:

H₀: β_k = 0.

▶ The t-statistic, that is b_k/√(s²v_k), is the test statistic of this test.
▶ Under H₀, the t-statistic is distributed as t(n − K) (see Eq. 12). Hence, the critical region (see Slide 38) for the test of size α is:

(−∞, −Φ⁻¹_{t(n−K)}(1 − α/2)] ∪ [Φ⁻¹_{t(n−K)}(1 − α/2), +∞).

▶ The p-value is defined as the probability that |Z| > |t|, where t is the (computed) t-statistic and where Z ∼ t(n − K). That is, the p-value is given by 2(1 − Φ_{t(n−K)}(|t_k|)).

Charts on tests (critical region, level, p-value)
93 / 295
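A minimal R check on simulated data (illustrative names) that the p-value reported by summary(lm) equals 2(1 − Φ_{t(n−K)}(|t_k|)):

set.seed(4)
n <- 40; K <- 2
x <- rnorm(n); y <- 0.3 * x + rnorm(n)
fit <- lm(y ~ x)
tk  <- summary(fit)$coefficients["x", "t value"]

2 * (1 - pt(abs(tk), df = n - K))              # manual p-value
summary(fit)$coefficients["x", "Pr(>|t|)"]     # same value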
Set of linear restrictions

▶ We consider the following model:

y = Xβ + ε,  ε ∼ i.i.d. N(0, σ²),

and we want to test for the joint validity of a set of restrictions involving the components of β in a linear way.
▶ Set of linear restrictions:

r_{1,1}β₁ + ··· + r_{1,K}β_K = q₁
   ⋮                           ⋮         (14)
r_{J,1}β₁ + ··· + r_{J,K}β_K = q_J,

that can be written in matrix form:

Rβ = q.   (15)

Example 4.8 (Advertising and sales (Examples 4.3 and 4.7))

Is the effect of advertising on TV (one unit) equal to that of advertising on radio (one unit)?
⇒ R = [0, 1, −1, 0], q = 0.
94 / 295
▶ Discrepancy vector: m = Rb − q.
▶ Under the null hypothesis:

E(m|X) = Rβ − q = 0 and Var(m|X) = R Var(b|X) R′.

▶ Under A.1 to A.4, Var(m|X) = σ²R(X′X)⁻¹R′ (see Prop. 4.1).
▶ Test:
H₀: Rβ − q = 0 against H₁: Rβ − q ≠ 0.   (16)

▶ We could perform a Wald test. The test statistic is denoted by W. Under A.1 to A.5 (we need the normality assumption) and under H₀, we have:

W = m′Var(m|X)⁻¹m ∼ χ²(J)   (17)

(see Prop. 9.9).
▶ However, σ² is unknown. Hence we cannot compute W. We can however approximate it by replacing σ² with s². The distribution of this new statistic is not χ²(J) any more; it is an F distribution, and the test is called the F test (see next slide).
95 / 295
Proposition 4.7 (Definition and distribution of the F stat.)

Under Assumptions A.1 to A.5 and if Eq. (16) holds, we have:

F = (W/J)(σ²/s²) = m′(R(X′X)⁻¹R′)⁻¹m / (s²J) ∼ F(J, n − K),   (18)

where F(J, n − K) is the distribution of the F-statistic (see Def. 9.4).

Proof: According to Eq. (17), W/J ∼ χ²(J)/J. Consider the denominator s²/σ². We have s² = e′e/(n − K) (Eq. (11)) and e′e = ε′Mε. By Prop. 9.2, e′e/σ² ∼ χ²(n − K). Therefore, F is the ratio of a r.v. distributed as χ²(J)/J and another distributed as χ²(n − K)/(n − K). It remains to verify that these r.v. are independent.
Under H₀, we have m = R(b − β) = R(X′X)⁻¹X′ε. Therefore m′(R(X′X)⁻¹R′)⁻¹m is of the form ε′Tε with T = D′CD, where D = R(X′X)⁻¹X′ and C = (R(X′X)⁻¹R′)⁻¹.
Under A.1 to A.4, the covariance between Tε and Mε is σ²TM = 0. Therefore, under A.5, these variables are Gaussian variables with 0 covariance. Hence they are independent. ∎

Remark 4.8

For large n − K, the F(J, n − K) distribution converges to F(J, ∞) = χ²(J)/J.
96 / 295
Proposition 4.8 (Other expressions of the F-statistic)

The F-statistic defined by Eq. (18) is also equal to:

F = [(R² − R*²)/J] / [(1 − R²)/(n − K)] = [(SSR_restr − SSR_unrestr)/J] / [SSR_unrestr/(n − K)],   (19)

where R*² is the coef. of determination (Eq. 6) of the “restricted regression” (SSR: sum of squared residuals, see Slide 79).

Proof: Let's denote by e* = y − Xb* the vector of residuals associated with the "restricted regression" (i.e. Rb* = q). We have e* = e − X(b* − b). Using e′X = 0, we get e*′e* = e′e + (b* − b)′X′X(b* − b) ≥ e′e.
By Prop. 9.3, we know that b* − b = −(X′X)⁻¹R′{R(X′X)⁻¹R′}⁻¹(Rb − q). Therefore:

e*′e* − e′e = (Rb − q)′[R(X′X)⁻¹R′]⁻¹(Rb − q).

This implies that the F statistic defined in Prop. 4.7 is also equal to:

[(e*′e* − e′e)/J] / [e′e/(n − K)]. ∎

Definition 4.2 (F-test of size α)

The null hypothesis H₀ (Eq. 16) of the F-test is rejected if F –defined by Eq. (18) or (19)– is higher than F_{1−α}(J, n − K). [one-sided test: see Slide 287]
97 / 295
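A minimal R sketch of Eq. (19) on simulated data (illustrative names): the F statistic computed from the restricted and unrestricted sums of squared residuals matches the one reported by anova() for nested models.

set.seed(5)
n <- 100
tv <- rnorm(n); radio <- rnorm(n)
y  <- 1 + 0.5 * tv + 0.5 * radio + rnorm(n)

unrestr <- lm(y ~ tv + radio)       # K = 3 coefficients
restr   <- lm(y ~ I(tv + radio))    # imposes beta_tv = beta_radio (J = 1)

SSRu <- sum(resid(unrestr)^2)
SSRr <- sum(resid(restr)^2)
J <- 1; K <- 3
((SSRr - SSRu) / J) / (SSRu / (n - K))   # F statistic, Eq. (19)

anova(restr, unrestr)                    # reports the same F and its p-value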
Prediction

▶ We consider the model:

y_i = x_i′β + ε_i,  ε_i ∼ N(0, σ²).

▶ Prediction exercise: we want to predict y*, which would be associated with a vector of explanatory variables x* (but this “entity *” is not in the estimation sample).
▶ The Gauss-Markov Theorem (4.1) implies that, under A.1 to A.4, ŷ* = x*′b is the minimum-variance linear unbiased estimator of E(y*|x*) = x*′β, where x* is a specific (deterministic) realization of x.
▶ Forecast error: e* = y* − ŷ* = y* − x*′b = x*′(β − b) + ε*.
▶ Prediction variance:

Var(e*|X, x = x*) = x*′{σ²(X′X)⁻¹}x* + σ².   (20)

▶ The prediction variance is estimated by replacing σ² with s².
98 / 295
▶ Let's focus on the specific case where the regression includes a constant term and a single regressor, i.e. each row of X is of the form [1, w_i]. (This is Example 4.5.)
▶ In this case, we have x* = [1, w*]′ and, using the expression of (X′X)⁻¹ computed in Example 4.5 (Eq. 20), we have:

Var(y* − x*′b |X, x = x*) = σ²[1 + 1/n + (w* − w̄)²/Σᵢ(w_i − w̄)²],   (21)

where y* − x*′b is the forecast error.
▶ The farther the considered w* is from the center of observations (i.e. w̄), the lower the confidence in the predicted y*.
▶ Conditional on (X, x = x*), the forecast error e* = x*′(β − b) + ε* is Gaussian (under A.1 to A.5, using in particular Eq. 10). Its expectation is 0 and its variance is given by Eq. (21). Hence:

y*|X, x = x* ∼ N(x*′b, Var(e*|X, x = x*)),

where the variance is given by Eq. (21).
▶ Hence, under A.1 to A.5, a 95% confidence interval for y* is (see the R sketch below):

[x*′b − 1.96√Var(e*|X, x = x*), x*′b + 1.96√Var(e*|X, x = x*)].
Figure 18: Prediction – Three cases: low, medium and large dispersion of the x_i's
[Three scatter plots of y against x (low, medium and large dispersion of the x_i's); each panel shows the theoretical and the estimated regression lines.]
100 / 295
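In R, predict() with interval = "prediction" implements the interval of Eq. (21) (using the exact t(n − K) quantile in place of 1.96). A minimal sketch on simulated data (illustrative names):

set.seed(6)
n <- 100
w <- rnorm(n)
y <- 1 + 2 * w + rnorm(n)
fit <- lm(y ~ w)

new <- data.frame(w = 2.5)   # a point far from the center of the w_i's
predict(fit, newdata = new, interval = "prediction", level = 0.95)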
Figure 19: Seniority and income (data of Example 4.2)
[Scatter plot of Income (0–150) against Seniority (0–300).]
101 / 295
Potential common pitfalls

Remark 4.9 (Multicollinearity)

▶ Model: y_i = β₁x_{i,1} + β₂x_{i,2} + ε_i, where all variables are zero-mean and Var(ε_i) = σ².
▶ We have:

X′X = [ Σᵢ x²_{i,1}        Σᵢ x_{i,1}x_{i,2} ]
      [ Σᵢ x_{i,1}x_{i,2}  Σᵢ x²_{i,2}       ],   therefore

(X′X)⁻¹ = 1/[Σᵢ x²_{i,1} Σᵢ x²_{i,2} − (Σᵢ x_{i,1}x_{i,2})²] × [ Σᵢ x²_{i,2}         −Σᵢ x_{i,1}x_{i,2} ]
                                                               [ −Σᵢ x_{i,1}x_{i,2}  Σᵢ x²_{i,1}        ].

▶ The inverse of the upper-left element of (X′X)⁻¹ is:

Σᵢ x²_{i,1} − (Σᵢ x_{i,1}x_{i,2})²/Σᵢ x²_{i,2} = Σᵢ x²_{i,1}(1 − correl²₁,₂),   (22)

where correl₁,₂ is the sample correlation between x₁ and x₂.
▶ Hence, the closer correl₁,₂ is to one, the higher the variance of b₁ (recall that the variance of b₁ is the upper-left component of σ²(X′X)⁻¹). The R sketch below illustrates this variance inflation.
102 / 295
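A minimal R sketch of Remark 4.9 on simulated data (illustrative names): the standard error of b₁ blows up as correl(x₁, x₂) approaches 1.

set.seed(7)
n <- 200
se_b1 <- sapply(c(0, 0.9, 0.99), function(rho) {
  x1 <- rnorm(n)
  x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)   # correl(x1, x2) close to rho
  y  <- x1 + x2 + rnorm(n)
  summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]
})
names(se_b1) <- c("rho = 0", "rho = 0.9", "rho = 0.99")
se_b1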
Potential common pitfalls

Remark 4.10 (Omitted variables)

▶ “True model”:
y = X₁β₁ + X₂β₂ + ε,
where X₁ is n × K₁, β₁ is K₁ × 1, X₂ is n × K₂ and β₂ is K₂ × 1.
▶ Then, if one computes b₁ by regressing y on X₁ only:

b₁ = (X₁′X₁)⁻¹X₁′y = β₁ + (X₁′X₁)⁻¹X₁′X₂β₂ + (X₁′X₁)⁻¹X₁′ε.

▶ Omitted-variable formula:

E(b₁|X) = β₁ + (X₁′X₁)⁻¹(X₁′X₂) β₂,

where (X₁′X₁)⁻¹(X₁′X₂) is K₁ × K₂ (each column of (X₁′X₁)⁻¹(X₁′X₂) contains the OLS coefficients obtained when regressing the corresponding column of X₂ on X₁).
103 / 295
Potential common pitfalls

Example 4.9 (Omitted variables & Effect of education on wages)

▶ Consider the “true model”:

wage_i = β₀ + β₁edu_i + β₂ability_i + ε_i,  ε_i ∼ i.i.d. N(0, σ²_ε).   (23)

▶ Further, we assume that the edu variable is correlated with ability. Specifically:

edu_i = α₀ + α₁ability_i + η_i,  η_i ∼ i.i.d. N(0, σ²_η).

▶ Assume we mistakenly run the regression omitting the ability variable:

wage_i = γ₀ + γ₁edu_i + ξ_i.   (24)

▶ It can be seen that ξ_i = ε_i − (β₂/α₁)η_i ∼ i.i.d. N(0, σ²_ε + (β₂/α₁)²σ²_η) and that the population regression coefficient is γ₁ = β₁ + β₂/α₁ ≠ β₁.
104 / 295
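A minimal R simulation in the spirit of Example 4.9 (all parameter values are illustrative): omitting ability biases the OLS coefficient on edu.

set.seed(8)
n <- 1e5
beta1 <- 0.08; beta2 <- 0.5; alpha1 <- 2
ability <- rnorm(n)
edu  <- 10 + alpha1 * ability + rnorm(n)
wage <- 1 + beta1 * edu + beta2 * ability + rnorm(n)

coef(lm(wage ~ edu))["edu"]            # biased upward by the omitted 'ability'
coef(lm(wage ~ edu + ability))["edu"]  # close to beta1 = 0.08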
Potential common pitfalls

Example 4.10 (Omitted variables & Effect of class size on performances)

Figure 20: California school district data
[Scatter plot of Test score (620–700) against Student-teacher ratio (14–26).]

Source of the data: Stock and Watson (2004).
105 / 295
Potential common pitfalls

Example 4.10 (Effect of class size on performances (cont’d))

                       Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)             698.933       9.467   73.825     0.000
StockWatsonEduc$str      -2.280       0.480   -4.751     0.000

Table 8: Regression of Test score on Student-teacher ratio.

                       Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)             686.032       7.411   92.566     0.000
StockWatsonEduc$str      -1.101       0.380   -2.896     0.004
StockWatsonEduc$elpct    -0.650       0.039  -16.516     0.000

Table 9: Regression of Test score on Student-teacher ratio.

                        Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)               700.15        4.69   149.42      0.00
StockWatsonEduc$str        -1.00        0.24    -4.18      0.00
StockWatsonEduc$elpct      -0.12        0.03    -3.76      0.00
StockWatsonEduc$mealpct    -0.55        0.02   -25.34      0.00

Table 10: Regression of Test score on Student-teacher ratio.
106 / 295
Potential common pitfalls

Remark 4.11 (Irrelevant variable)

▶ “True model”:
y = X₁β₁ + ε
▶ “Estimated model”:
y = X₁β₁ + X₂β₂ + ε
▶ The estimate is unbiased. Problem however: the variance of the estimate of β₁ resulting from this regression is higher than that stemming from the correct regression (unless the correlation between X₁ and X₂ is null, see Eq. 22).
⇒ The estimator is inefficient, i.e. there exists an alternative consistent estimator whose variance is lower.
▶ The inefficiency problem can have serious consequences when testing hypotheses of the type H₀: β₁ = 0, due to the loss of power: we might infer that there are no relevant variables when there truly are (Type-II error; False Negative, see Slide 36).
107 / 295
Least Squares: Large Sample Properties

▶ Even if we relax the normality assumption (Assumption A.5), we can approximate the finite-sample behavior of the estimators by using large-sample or asymptotic properties.
▶ To begin with, we proceed under Assumptions A.1 to A.4. (We will see later how to deal with –partial– relaxations of Assumptions A.3 and A.4.)

Preview of OLS large-sample properties under A.1 to A.4
▶ Under A.1 to A.4, even if the residuals are not normally distributed, the least squares estimators can be asymptotically normal, and inference can be performed as in small samples when A.1 to A.5 hold.
▶ This derives from Prop. 4.9 (below).
▶ The F-test (Prop. 4.8) and the t-test described in Slide 93 can then be performed.
108 / 295
Proposition 4.9 (Asymp. distri. of b with independent observations)

Under Assumptions A.1 to A.4, and assuming further that:

Q = plim_{n→∞} X′X/n,   (25)

and that the (x_i, ε_i)'s are independent (across entities i), we have:

√n(b − β) →d N(0, σ²Q⁻¹).   (26)

Proof: Since b = β + (X′X/n)⁻¹(X′ε/n), we have: √n(b − β) = (X′X/n)⁻¹ (1/√n)X′ε. Since f: A → A⁻¹ is a continuous function (for A ≠ 0), plim_{n→∞}(X′X/n)⁻¹ = Q⁻¹ (see Prop. 9.4(ii), continuous mapping theorem). Let us denote by V_i the vector x_iε_i. Because the (x_i, ε_i)'s are independent, the V_i's are independent as well. Their covariance matrix is σ²E(x_ix_i′) = σ²Q. Applying the multivariate central limit theorem (Theorem 2.3) to the V_i's gives √n(1/n Σᵢ₌₁ⁿ x_iε_i) = (1/√n)X′ε →d N(0, σ²Q). An application of Prop. 9.4(i) (Slutsky’s theorem) then leads to the result. ∎

▶ In practice, σ² is estimated with e′e/(n − K) (Eq. 11), and Q⁻¹ with n(X′X)⁻¹.
▶ Eq. (25) and Eq. (26) respectively correspond to convergences in probability (Def. 9.9) and in distribution (Def. 9.12).
109 / 295
Instrumental Variables

▶ Here, we want to relax Assumption A.2 (conditional mean zero assumption, implying in particular that x_i and ε_i are uncorrelated).
▶ Model:
y_i = x_i′β + ε_i,  where E(ε_i) = 0 and x_i ⊥̸ ε_i.   (27)

Definition 4.3 (Valid set of instruments)

The L-dimensional random variable z_i is a valid set of instruments if:
(a) z_i is correlated to x_i;
(b) we have E(ε|Z) = 0; and
(c) the orthogonal projections of the x_i's on the z_i's are not multicollinear.

Example 4.11 (A situation where E(ε_i) = 0 and x_i ⊥̸ ε_i)

▶ Let us make the assumption x_i ⊥̸ ε_i in Model (27) more precise:

E(ε_i) = 0 and E(ε_ix_i) = γ;

by the law of large numbers (Theorem 2.1), plim_{n→∞} X′ε/n = γ.
▶ If Q_{xx} := plim X′X/n, the OLS estimator is not consistent because

b = β + (X′X)⁻¹X′ε →p β + Q⁻¹_{xx}γ ≠ β.
110 / 295
▶ If z_i is a valid set of instruments, we have:

plim (Z′y/n) = plim (Z′(Xβ + ε)/n) = plim (Z′X/n) β.

Indeed, by the law of large numbers (Theorem 2.1), Z′ε/n →p E(z_iε_i) = 0.
▶ If L = K, the matrix Z′X/n is of dimension K × K and we have:

[plim (Z′X/n)]⁻¹ plim (Z′y/n) = β.

▶ By continuity of the inverse funct.: [plim (Z′X/n)]⁻¹ = plim (Z′X/n)⁻¹.
▶ The Slutsky Theorem (Prop. 9.4) further implies that

plim (Z′X/n)⁻¹ plim (Z′y/n) = plim ((Z′X/n)⁻¹(Z′y/n)).

▶ Hence b_iv is consistent if it is defined by:

b_iv = (Z′X)⁻¹Z′y.
111 / 295
Proposition 4.10 (Asymp. distri. of b_iv)

If z_i is an L-dimensional random variable that constitutes a valid set of instruments (see Def. 4.3) and if L = K, then the asymptotic distribution of b_iv is:

b_iv →d N(β, (σ²/n)[Q_{xz}Q⁻¹_{zz}Q_{zx}]⁻¹),

where plim Z′Z/n =: Q_{zz}, plim Z′X/n =: Q_{zx}, plim X′Z/n =: Q_{xz}.

Proof: The proof is very similar to that of Prop. 4.9, the starting point being that b_iv = β + (Z′X)⁻¹Z′ε. ∎

▶ When L = K, we have:

[Q_{xz}Q⁻¹_{zz}Q_{zx}]⁻¹ = Q⁻¹_{zx}Q_{zz}Q⁻¹_{xz}.

▶ In practice, to estimate Var(b_iv) = (σ²/n) Q⁻¹_{zx}Q_{zz}Q⁻¹_{xz}, we replace σ² by:

s²_iv = (1/n) Σᵢ₌₁ⁿ (y_i − x_i′b_iv)².
112 / 295
▶ And when L > K?
▶ Idea: First regress X on the space spanned by Z, and then regress y on the fitted values X̂ := Z(Z′Z)⁻¹Z′X. That is, b_iv = (X̂′X̂)⁻¹X̂′y:

b_iv = [X′Z(Z′Z)⁻¹Z′X]⁻¹ X′Z(Z′Z)⁻¹Z′y.   (28)

▶ In this case, Prop. 4.10 still holds, with b_iv given by Eq. (28).
▶ b_iv is also the result of the regression of y on X*, where the columns of X* are the (orthogonal) projections of those of X on Z, i.e. X* = P_Z X (using Notations 4.1). Hence the other name of this estimator: Two-Stage Least Squares (TSLS). (See the R sketch below.)

Definition 4.4 (Weak instruments)

▶ If the instruments do not properly satisfy Condition (a) in Def. 4.3 (i.e. if x_i and z_i are only loosely related), the instruments are said to be weak.
▶ This problem is for instance discussed in Stock and Yogo (2003). See also Stock and Watson pp. 489-490.
113 / 295
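A minimal R sketch of TSLS (Eq. 28) on simulated data (illustrative names): x is endogenous, and z₁, z₂ are valid instruments (L = 2 > K for the endogenous regressor).

set.seed(9)
n <- 1000
z1 <- rnorm(n); z2 <- rnorm(n)
u  <- rnorm(n)                       # common shock creating endogeneity
x  <- z1 + z2 + u + rnorm(n)
y  <- 1 + 0.5 * x + u + rnorm(n)     # true beta = 0.5; E(x * eps) != 0

coef(lm(y ~ x))["x"]                 # OLS: biased upward

# Two stages: regress x on [1, z1, z2], then y on the fitted values.
xhat <- fitted(lm(x ~ z1 + z2))
coef(lm(y ~ xhat))["xhat"]           # TSLS: close to 0.5
# (The AER package's ivreg(y ~ x | z1 + z2) gives the same point estimate,
#  together with correct standard errors.)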
Definition 4.5 (Hausman test)

▶ The Hausman test can be used to test whether IV estimation is necessary.
▶ IV techniques are required if plim_{n→∞} X′ε/n ≠ 0.
▶ Hausman (1978) proposes a test of the efficiency of estimators.
▶ Under the null hypothesis, two estimators, b₀ and b₁, are consistent, but b₀ is (asymptotically) efficient relative to b₁. Under the alternative hypothesis, b₁ (IV in the present case) remains consistent but not b₀ (OLS in the present case).
▶ The test statistic is:

H = (b₁ − b₀)′ MPI(Var(b₁) − Var(b₀)) (b₁ − b₀),

where MPI is the Moore-Penrose pseudo-inverse (see Def. 9.3).
▶ Under the null hypothesis, H ∼ χ²(q), where q is the rank of Var(b₁) − Var(b₀).
114 / 295
Example 4.12 (Estimation of price elasticity)

▶ See e.g. WHO and estimation of tobacco price elasticity of demand.
▶ We want to estimate the effect on demand of an exogenous increase in prices of cigarettes (say).
▶ Model:

q^d_t = α₀ + α₁ p_t + α₂ w_t + ε^d_t,

where q^d_t is log(demand), p_t is log(price) and w_t is income;

q^s_t = γ₀ + γ₁ p_t + γ₂ y_t + ε^s_t,

where q^s_t is log(supply) and y_t gathers cost factors; y_t, w_t, ε^s_t ∼ N(0, σ²_s) and ε^d_t ∼ N(0, σ²_d) are independent.
▶ Equilibrium: q^d_t = q^s_t. This implies that prices are endogenous:

p_t = (α₀ + α₂ w_t + ε^d_t − γ₀ − γ₂ y_t − ε^s_t)/(γ₁ − α₁).

▶ In particular, we have E(p_t ε^d_t) = σ²_d/(γ₁ − α₁) ≠ 0.
⇒ Regressing q^d_t on p_t by OLS gives biased estimates (see Example 4.11).
115 / 295
Example 4.12 (Estimation of price elasticity, continued)

Figure 21: Demand and Supply
[Chart in the (log(price), log(quantities)) plane, displaying demand curves (Demand 1–4) and supply curves (Supply 1–4).]
116 / 295
Example 4.12 (Estimation of price elasticity, continued)

▶ Estimation of the price elasticity of cigarette demand. Instrument: real tax on cigarettes arising from the state’s general sales tax. Presumption: in states with a larger general sales tax, cigarette prices are higher, but the general tax is not determined by other forces affecting ε^d_t.

Table 11: IV Regression Results

Dependent variable: log(packs)
log(rprice)          −1.277***  (0.263)
log(rincome)          0.280     (0.239)
Constant              9.895***  (1.059)
Observations              48
R²                     0.429
Adjusted R²            0.404
Residual Std. Error    0.188 (df = 45)
Note: * p<0.1; ** p<0.05; *** p<0.01
117 / 295
Example 4.13 (Education and Wages)

▶ Potential problem with OLS regression of wages on education: omitted variables (see Example 4.9).
▶ Card (1993)’s idea: distance to college used as an instrument.

Table 12: Effect of education on wage

Dependent variable: wage
                            OLS (1)     OLS (2)     instrumental variable (3)
urbanyes                    0.070       0.046       0.046
                            (0.045)     (0.045)     (0.060)
genderfemale               −0.085**    −0.071*     −0.071
                            (0.037)     (0.037)     (0.050)
ethnicityafam              −0.556***   −0.227***   −0.227**
                            (0.052)     (0.073)     (0.099)
ethnicityhispanic          −0.544***   −0.351***   −0.351***
                            (0.049)     (0.057)     (0.077)
unemp                       0.133***    0.139***    0.139***
                            (0.007)     (0.007)     (0.009)
education                   0.005                   0.647***
                            (0.010)                 (0.136)
ed.pred                                 0.647***
                                        (0.101)
Observations                4,739       4,739       4,739
Adjusted R²                 0.109       0.116      −0.614
F Statistic (df = 6; 4732)  97.274***   104.971***
Note: * p<0.1; ** p<0.05; *** p<0.01
118 / 295
General Regression Model

▶ We want to relax the assumption according to which the disturbances are uncorrelated with each other (Assumption A.4), or the homoskedasticity Assumption A.3.
▶ We replace the latter two assumptions by the general formulation:

E(εε′|X) = Ω.   (29)

▶ Note that Eq. (29) is more general than Assumptions A.3 and A.4 because the diagonal entries of Ω may be different (not the case under A.3), and the non-diagonal entries of Ω can be ≠ 0 (contrary to A.4).

Definition 4.6 (General Regression Model)

Assumptions A.1 and A.2, together with Eq. (29), form the General Regression Model (GRM) framework.

Remark 4.12 (A specific case)

A regression model where Assumptions A.1 to A.4 hold is a specific case of the General Regression Model (GRM) framework.
119 / 295
▶ The GRM context notably allows to model heteroskedasticity and autocorrelation.
▶ Heteroskedasticity:

Ω = [ σ₁²   0   …    0  ]
    [  0   σ₂²       0  ]
    [  ⋮         ⋱   ⋮  ]
    [  0    …    0  σₙ² ]   (30)

▶ Autocorrelation:

Ω = σ² [  1     ρ₂,₁  …   ρₙ,₁  ]
       [ ρ₂,₁    1         ⋮    ]
       [  ⋮           ⋱  ρₙ,ₙ₋₁ ]
       [ ρₙ,₁   ρₙ,₂  …    1    ]   (31)
120 / 295
Example 4.14 (Autocorrelation in the time-series context)

▶ Autocorrelation is, in particular, a recurrent problem when time-series data are used (see Section 8).
▶ In a time-series context, subscript i refers to a date.
▶ Assume for instance that:

y_i = x_i′β + ε_i   (32)

with
ε_i = ρε_{i−1} + v_i,  v_i ∼ N(0, σ²_v).   (33)

▶ In this case, we are in the GRM context, with:

Ω = σ²_v/(1 − ρ²) × [    1       ρ     …  ρ^{n−1} ]
                    [    ρ       1         ⋮      ]
                    [    ⋮            ⋱    ρ      ]
                    [ ρ^{n−1}  ρ^{n−2} …   1      ]   (34)
121 / 295
Definition 4.7 (Generalized Least Squares)

▶ Assume Ω is known (the GLS estimator is then “feasible”).
▶ Because Ω is symmetric positive, it admits a spectral decomposition of the form Ω = CΛC′, where C is an orthogonal matrix (i.e. CC′ = Id) and Λ is a diagonal matrix (the diagonal entries are the eigenvalues of Ω).
▶ Ω = (PP′)⁻¹ with P = CΛ^{−1/2}.
▶ Consider the transformed model:

P′y = P′Xβ + P′ε   or   y* = X*β + ε*.

▶ The variance of ε* is I. In the transformed model, OLS is BLUE (Gauss-Markov Theorem 4.1).

b_GLS = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y   (35)

▶ b_GLS = Generalized least squares estimator of β.
▶ We have:
Var(b_GLS|X) = (X′Ω⁻¹X)⁻¹.
122 / 295
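A minimal R sketch of Eq. (35) on simulated AR(1)-error data (illustrative names; Ω is built as in Eq. 34 with ρ and σ_v assumed known):

set.seed(10)
n <- 100; rho <- 0.8; sigma_v <- 1
x <- cbind(1, rnorm(n))
eps <- as.numeric(arima.sim(list(ar = rho), n = n, sd = sigma_v))
y <- x %*% c(1, 2) + eps

Omega <- (sigma_v^2 / (1 - rho^2)) * rho^abs(outer(1:n, 1:n, "-"))  # Eq. (34)
Oinv  <- solve(Omega)

b_gls <- solve(t(x) %*% Oinv %*% x, t(x) %*% Oinv %*% y)   # Eq. (35)
b_ols <- solve(t(x) %*% x, t(x) %*% y)
cbind(b_ols, b_gls)   # both consistent; GLS is the efficient one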
▶ When Ω is unknown, the GLS estimator is said to be infeasible. Some structure is required.
▶ Assume Ω admits a parametric form Ω(θ).
▶ The estimation becomes "feasible" (FGLS) if one replaces Ω(θ) by Ω(θ̂).
▶ If θ̂ is a consistent estimator of θ, then the FGLS is asymptotically efficient (see Example 4.15).
▶ By contrast, when Ω has no obvious structure, the OLS (or IV) is the only estimator available. It remains unbiased, consistent, and asymptotically normally distributed, but not efficient. Standard inference procedures are not appropriate any longer.

Example 4.15 (Autocorrelation in the time-series context (continued))

▶ Consider the case presented in Example 4.14.
▶ Because the OLS estimate b of β is consistent, the estimates e_i of the ε_i also are.
▶ Consistent estimators of ρ and σ_v are then obtained by regressing the e_i's on the e_{i−1}'s.
▶ Using these estimates in Eq. (34) provides a consistent estimate of Ω.
123 / 295
Proposition 4.11 (Finite sample properties of b in the GRM (Def. 4.6))

Conditionally on X, we have:

Var(b|X) = (1/n)(X′X/n)⁻¹(X′ΩX/n)(X′X/n)⁻¹.   (36)

Under A.5, since b is linear in ε,

b|X ∼ N(β, (X′X)⁻¹(X′ΩX)(X′X)⁻¹).   (37)

Remark 4.13

The variance of the estimator is not σ²(X′X)⁻¹ any more, so using s²(X′X)⁻¹ for inference may be misleading.

Proposition 4.12 (OLS consistency in the GRM (Def. 4.6))

If plim (X′X/n) and plim (X′ΩX/n) are finite positive definite matrices, then plim (b) = β.

Proof: Var(b) = E[Var(b|X)] + Var[E(b|X)]. Since E(b|X) = β, Var[E(b|X)] = 0. Eq. (36) implies that Var(b|X) → 0. Hence b converges in mean square and therefore in probability (Prop. 9.5). ∎
124 / 295
Proposition 4.13 (Asymptotic distri. of b in the GRM (Def. 4.6))

If Q_{xx} = plim (X′X/n) and Q_{xΩx} = plim (X′ΩX/n) are finite positive definite matrices, then:

√n(b − β) →d N(0, Q⁻¹_{xx} Q_{xΩx} Q⁻¹_{xx}).

Proposition 4.14 (Asymptotic distri. of b_iv in the GRM)

If regressors and IV variables are “well-behaved”, then:

b_iv ∼a N(β, V_iv),

where
V_iv = (1/n) Q* plim (Z′ΩZ/n) Q*′,

with
Q* = [Q_{xz}Q⁻¹_{zz}Q_{zx}]⁻¹ Q_{xz}Q⁻¹_{zz}.
125 / 295
▶ For practical purposes, one needs to have estimates of Ω in Props. 4.11, 4.13 or 4.14.
▶ Idea: instead of estimating Ω (dimension n × n) directly, one can estimate (1/n)X′ΩX (dimension K × K), since it is this expression (X′ΩX) that eventually appears in the formulas (for instance in Eq. 36).
▶ We have:

(1/n)X′ΩX = (1/n) Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ σ_{i,j} x_i x_j′.   (38)

▶ Robust estimations of asymptotic covariance matrices look for estimates of the previous matrix.
▶ Their computation is based on the fact that if b is consistent, then the e_i's are consistent (pointwise) estimators of the ε_i's.
126 / 295
First case: heteroskedasticity (Eq. 30)

▶ This is the case of Eq. (30).
▶ We then need to estimate (1/n) Σᵢ₌₁ⁿ σᵢ² x_i x_i′.
▶ White (1980): Under general conditions:

plim [(1/n) Σᵢ₌₁ⁿ σᵢ² x_i x_i′] = plim [(1/n) Σᵢ₌₁ⁿ eᵢ² x_i x_i′].   (39)

▶ The estimator of (1/n)X′ΩX therefore is:

(1/n) X′E²X,

where E is an n × n diagonal matrix whose diagonal elements are the estimated residuals e_i.
127 / 295
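A minimal R sketch of the White estimator (Eq. 39) on simulated heteroskedastic data (illustrative names):

set.seed(11)
n <- 200
x <- cbind(1, rnorm(n))
eps <- rnorm(n, sd = abs(x[, 2]))       # Var(eps_i) depends on x_i
y <- x %*% c(0, 1) + eps

b <- solve(t(x) %*% x, t(x) %*% y)
e <- as.numeric(y - x %*% b)

XX_inv  <- solve(t(x) %*% x)
meat    <- t(x) %*% diag(e^2) %*% x     # = X'E²X
V_white <- XX_inv %*% meat %*% XX_inv   # sandwich: (X'X)⁻¹ X'E²X (X'X)⁻¹

sqrt(diag(V_white))                     # robust standard errors
# (sandwich::vcovHC(lm(y ~ x[, 2]), type = "HC0") gives the same matrix.)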
Figure 22: Salaries versus years since PhD
[Scatter plot of Salary (roughly 50,000–200,000) against years since PhD (0–50).]
Data from Fox J. and Weisberg, S. (2011).
128 / 295
▶ Let us illustrate the influence of heteroskedasticity using simulations.
▶ We consider the following model:

y_i = x_i + ε_i,  ε_i ∼ N(0, x_i²),

where the x_i's are i.i.d. t(4).
▶ Here is a simulated sample (n = 200) of this model:

Figure 23: Simul.: y_i = x_i + ε_i, ε_i ∼ i.i.d. N(0, x_i²) and x_i ∼ i.i.d. t(4)
[Scatter plot of y against x for the simulated sample.]
129 / 295
▶ We simulate 1000 samples of the same model with n = 200.
▶ For each sample, we compute the OLS estimate of β (= 1).
▶ Using these 1000 estimates of b, we construct an approximated (kernel-based) distribution of this OLS estimator (in red on the figure).
▶ For each of the 1000 OLS estimations, we employ the standard OLS variance formula (s²(X′X)⁻¹) to estimate the variance of b. The blue curve is a normal distribution centred on 1 and whose variance is the average of the 1000 previous variance estimates.

Figure 24: Heteroskedasticity and the inappropriateness of the standard OLS variance formula
[Density plot of b over 0.0–2.0: the kernel-based density of the simulated estimates (red) against the normal density implied by the standard OLS variance formula (blue).]
130 / 295
▶ The variance of the simulated b's is 0.040 (that is the “true” one); the average of the estimated variances based on the standard OLS formula is 0.005 (“bad” estimate); the average of the estimated variances based on the White robust covariance matrix is 0.030 (“better” estimate).
▶ The standard OLS formula for the variance of b overestimates the precision of this estimator.
▶ For almost 50% of the simulations, 1 is not included in the 95% confidence interval of β when the computation of the interval is based on the standard OLS formula for the variance of b.
▶ When the White robust covariance matrix is used, 1 is not in the 95% confidence interval of β for less than 10% of the simulations.
131 / 295
Second case: HAC (heteroskedasticity and autocorrelation consistent)

▶ This includes the cases of Eqs. (30) and (31).
▶ Newey and West (1987): If the correlation between terms i and j gets sufficiently small when |i − j| increases:

plim [(1/n) Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ σ_{i,j} x_i x_j′] =   (40)
plim [(1/n) Σₜ₌₁ⁿ e_t² x_t x_t′ + (1/n) Σ_{ℓ=1}^L Σ_{t=ℓ+1}^n w_ℓ e_t e_{t−ℓ} (x_t x_{t−ℓ}′ + x_{t−ℓ} x_t′)],

where w_ℓ = 1 − ℓ/(L + 1).
132 / 295
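A minimal R sketch of Eq. (40) (illustrative names), here with L lags:

nw_meat <- function(x, e, L) {
  n <- nrow(x)
  S <- crossprod(x * e) / n                       # (1/n) Σ e_t² x_t x_t'
  for (l in 1:L) {
    w <- 1 - l / (L + 1)                          # Bartlett weight w_l
    G <- t(x[(l + 1):n, , drop = FALSE] * e[(l + 1):n]) %*%
           (x[1:(n - l), , drop = FALSE] * e[1:(n - l)]) / n
    S <- S + w * (G + t(G))                       # symmetrized lag-l term
  }
  S
}
# Usage: with X the regressor matrix and e the OLS residuals, Var(b) is
# estimated by n * (X'X)⁻¹ %*% nw_meat(X, e, L) %*% (X'X)⁻¹ (cf. Eq. 36).
# (The sandwich package's NeweyWest() provides a ready-made version.)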
▶ Let us illustrate the influence of autocorrelation using simulations.
▶ We consider the following model:

y_i = x_i + ε_i,   (41)

where the x_i's and the ε_i's are such that:

x_i = 0.8x_{i−1} + u_i and ε_i = 0.8ε_{i−1} + v_i,   (42)

where the u_i's and the v_i's are i.i.d. N(0, 1).
▶ Here is a simulated sample (n = 200) of this model:

Figure 25: Simulation of Eqs. (41) and (42) on 200 periods
[Left panel: scatter plot of y_t against x_t. Right panel: time series of x_t (red) and ε_t (blue).]
133 / 295
▶ We simulate 1000 samples of the same model with n = 200.
▶ For each sample, we compute the OLS estimate of β (= 1).
▶ Using these 1000 estimates of b, we construct an approximated (kernel-based) distribution of this OLS estimator (in red on the figure).
▶ For each of the 1000 OLS estimations, we employ the standard OLS variance formula (s²(X′X)⁻¹) to estimate the variance of b. The blue curve is a normal distribution centred on 1 and whose variance is the average of the 1000 previous variance estimates.

Figure 26: Auto-correlation and the inappropriateness of the standard OLS variance formula
[Density plot of b over 0.4–1.6: the kernel-based density of the simulated estimates (red) against the normal density implied by the standard OLS variance formula (blue).]
134 / 295
▶ The variance of the simulated b's is 0.020 (that is the “true” one); the average of the estimated variances based on the standard OLS formula is 0.005 (“bad” estimate); the average of the estimated variances based on the White robust covariance matrix is 0.015 (“better” estimate).
▶ The standard OLS formula for the variance of b overestimates the precision of this estimator.
▶ For about 35% of the simulations, 1 is not included in the 95% confidence interval of β when the computation of the interval is based on the standard OLS formula for the variance of b.
▶ When the Newey-West robust covariance matrix is used, 1 is not in the 95% confidence interval of β for about 13% of the simulations.
135 / 295
▶ For the sake of comparison, let us consider a model with no auto-correlation (x_i ∼ i.i.d. N(0, 2.8) and ε_i ∼ i.i.d. N(0, 2.8)).

[Top panels: scatter plot of y_t against x_t, and time series of x_t (red) and ε_t (blue). Bottom panel: density function of b over 0.8–1.3.]
136 / 295
How to detect autocorrelation in residuals?

▶ Consider the usual regression (say Eq. 32).
▶ The Durbin-Watson test is a typical autocorrelation test. Its test statistic is:

DW = Σᵢ₌₂ⁿ (e_i − e_{i−1})² / Σᵢ₌₁ⁿ e_i² = 2(1 − r) − (e₁² + e_n²)/Σᵢ₌₁ⁿ e_i²,

where the last term →p 0 and where r is the slope in the regression of the e_i's on the e_{i−1}'s, i.e.:

r = Σᵢ₌₂ⁿ e_i e_{i−1} / Σᵢ₌₁ⁿ⁻¹ e_i².

(r is a consistent estimator of Cor(ε_i, ε_{i−1}), i.e. ρ in Eq. 33.)
▶ Critical values depend only on T and K: see e.g. tables.
▶ The one-sided test for H₀: ρ = 0 against H₁: ρ > 0 is carried out by comparing DW to values d_L(T, K) and d_U(T, K):
  ▶ If DW < d_L, the null hypothesis is rejected;
  ▶ if DW > d_U, the hypothesis is not rejected;
  ▶ if d_L ≤ DW ≤ d_U, no conclusion is drawn.
137 / 295
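A sketch of this test in R, reusing the fitted lm object from the snippet above: the statistic is one line by hand, and lmtest::dwtest runs the test directly (note that dwtest reports a p-value rather than using the d_L/d_U tables):

# Minimal sketch: Durbin-Watson statistic, by hand and via lmtest::dwtest
library(lmtest)

e  <- residuals(fit)                 # residuals of the regression above
DW <- sum(diff(e)^2) / sum(e^2)      # sum_{i>=2}(e_i - e_{i-1})^2 / sum e_i^2
DW                                   # close to 2(1 - r); well below 2 if rho > 0

dwtest(fit, alternative = "greater") # H0: rho = 0 against H1: rho > 0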
Table 13: Properties of the OLS estimator

                                            Under Assumptions A.1 +
                                            A.2          A.3          A.4          A.5
                                            (condit.     (homo-       (uncorrel.   (normality of
                                            mean-zero)   skedast.)    residuals)   disturbances)
b normal in small sample (Slide 85)         X            X            X            X
b is BLUE (Thm 4.1)                         X            X            X
b unbiased in small sample (Prop. 4.1)      X
b consistent (Prop. 4.12)*                  X
b ∼ normal in large sample (Prop. 4.13)*    X

*: see however Prop. 4.12 and Prop. 4.13 for additional hypotheses. Specifically, X′X/n and X′ΩX/n must converge in probability to finite positive definite matrices (Ω is defined in Eq. 29).
Introduction to Panel Regressions
Context and Objective
- Context:
  Lots of entities (n) but a small number of periods (T): "longitudinal datasets".
  Regression of the form:

  y_{i,t} = x_{i,t}' \underbrace{\beta}_{K \times 1} + \underbrace{z_i'\alpha}_{\text{individual effects}} + \varepsilon_{i,t}

- Objective:
  Estimate the previous equation.
Figure 27: Simulated Panel Data
[Figure: scatter of y against x; lines relate the two dots of each entity.]
The model is y_{i,t} = α_i + βx_{i,t} + ε_{i,t}, t ∈ {1, 2}, with β = 1. Blue dots are for t = 1, red dots are for t = 2. The lines relate the dots associated with the same entity i.
Figure 28: Simulated Panel Data
[Figure: scatter of y against x; lines relate the two dots of each entity.]
The model is y_{i,t} = α_i + βx_{i,t} + ε_{i,t}, t ∈ {1, 2}, with β = 30. Blue dots are for t = 1, red dots are for t = 2. The lines relate the dots associated with the same entity i.
Figure 29: Consumption of cigarettes versus price
[Figure: scatter of log(Nb. of packs) against log(price).]
Cigarettes data from Stock and Watson (2003). Data for U.S. states, 2 years: 1985 and 1995.
Figure 30: Consumption of cigarettes versus price
[Figure: same scatter of log(Nb. of packs) against log(price), coloured by state.]
Cigarettes data from Stock and Watson (2003). Data for U.S. states, 2 years: 1985 and 1995. Each colour corresponds to a given state.
Figure 31: Airlines: cost versus fuel price
[Figure: scatter of log(cost) against log(fuel price), coloured by airline.]
Airlines data from Greene (2003). Each colour corresponds to a given airline.
Notations

y_i = \begin{bmatrix} y_{i,1} \\ \vdots \\ y_{i,T} \end{bmatrix}_{T\times 1}, \quad
\varepsilon_i = \begin{bmatrix} \varepsilon_{i,1} \\ \vdots \\ \varepsilon_{i,T} \end{bmatrix}_{T\times 1}, \quad
x_i = \begin{bmatrix} x_{i,1}' \\ \vdots \\ x_{i,T}' \end{bmatrix}_{T\times K}, \quad
X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}_{(nT)\times K}.

\tilde{y}_i = \begin{bmatrix} y_{i,1} - \bar{y}_i \\ \vdots \\ y_{i,T} - \bar{y}_i \end{bmatrix}, \quad
\tilde{\varepsilon}_i = \begin{bmatrix} \varepsilon_{i,1} - \bar{\varepsilon}_i \\ \vdots \\ \varepsilon_{i,T} - \bar{\varepsilon}_i \end{bmatrix}, \quad
\tilde{x}_i = \begin{bmatrix} x_{i,1}' - \bar{x}_i' \\ \vdots \\ x_{i,T}' - \bar{x}_i' \end{bmatrix}, \quad
\tilde{X} = \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{bmatrix}, \quad
\tilde{Y} = \begin{bmatrix} \tilde{y}_1 \\ \vdots \\ \tilde{y}_n \end{bmatrix},

where

\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{i,t}, \quad
\bar{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{i,t} \quad \text{and} \quad
\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{i,t}.
Three standard cases
- Pooled regression: z_i ≡ 1.
- Fixed Effects: z_i is unobserved, but correlates with x_i ⇒ b is biased and inconsistent in the OLS regression of y on X (omitted variable, see Remark 4.10).
- Random Effects: z_i is unobserved, but uncorrelated with x_i. The model writes:

  y_{i,t} = x_{i,t}'\beta + \alpha + \underbrace{u_i + \varepsilon_{i,t}}_{\text{compound disturbance}},

  where α = E(z_i'α) and u_i = z_i'α − E(z_i'α) ⊥ x_i.
  Least squares are consistent (but inefficient, see GLS Box 4.7).
Estimation of Fixed Effects Models

Assumption A.6 (Fixed Effects regression assumptions)
(i) There is no perfect multicollinearity among the regressors.
(ii) E(ε_{i,t}|X) = 0, for all i, t.
(iii) E(ε_{i,t} ε_{j,s}|X) = σ² if i = j and s = t, and 0 otherwise.

Remark 5.1
These assumptions are the same as those introduced in the standard linear regression:
(i) ↔ A.1, (ii) ↔ A.2, (iii) ↔ A.3 + A.4.
- In matrix form, for a given i, the model writes:

  y_i = X_i \beta + \mathbf{1}\alpha_i + \varepsilon_i,

  where 1 is a T-dimensional vector of ones.
- This is the Least Squares Dummy Variable (LSDV) model:

  y = [X \; D] \begin{bmatrix} \beta \\ \alpha \end{bmatrix} + \varepsilon, \qquad (43)

  with:

  D = \begin{bmatrix} \mathbf{1} & 0 & \dots & 0 \\ 0 & \mathbf{1} & \dots & 0 \\ & & \ddots & \\ 0 & 0 & \dots & \mathbf{1} \end{bmatrix}_{(nT) \times n}.

- The linear regression (43), with the dummy variables, satisfies the Gauss-Markov conditions (Theorem 4.1). Hence, in this context, the OLS estimator is the best linear unbiased estimator.
- Denoting by Z the matrix [X D], and by b and a the respective OLS estimates of β and α, we have:

  \begin{bmatrix} b \\ a \end{bmatrix} = [Z'Z]^{-1} Z'y. \qquad (44)

- The asymptotic distribution of [b′, a′]′ derives from the standard OLS context: Prop. 4.9 can be used after having replaced X by Z = [X D].
- We have:

  \begin{bmatrix} b \\ a \end{bmatrix} \overset{d}{\to} \mathcal{N}\left( \begin{bmatrix} \beta \\ \alpha \end{bmatrix}, \frac{\sigma^2}{nT} Q^{-1} \right), \qquad (45)

  where

  Q = \text{plim}_{nT \to \infty} \frac{1}{nT} Z'Z.

- In practice, an estimator of the covariance matrix of [b′, a′]′ is:

  s^2 (Z'Z)^{-1} \quad \text{with} \quad s^2 = \frac{e'e}{nT - K - n},

  where e is the (nT) × 1 vector of OLS residuals.
- There is an alternative way of expressing the LSDV estimators.
- It is based on the matrix M_D = I − D(D′D)⁻¹D′, which acts as an operator that removes entity-specific means, i.e. (with the notation of Slide 146):

  \tilde{Y} = M_D Y, \quad \tilde{X} = M_D X \quad \text{and} \quad \tilde{\varepsilon} = M_D \varepsilon.

- With these notations, using Theorem 4.2, we get:

  b = [X' M_D X]^{-1} X' M_D y. \qquad (46)

- This amounts to regressing the ỹ_{i,t}'s (= y_{i,t} − ȳ_i) on the x̃_{i,t}'s (= x_{i,t} − x̄_i).
- The estimates of α are given by:

  a = (D'D)^{-1} D'(y - Xb), \qquad (47)

  which is obtained by developing the second row of

  \begin{bmatrix} X'X & X'D \\ D'X & D'D \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} X'Y \\ D'Y \end{bmatrix},

  i.e. the first-order conditions resulting from the least squares problem (see Eq. 5).
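A minimal R sketch of this equivalence, on simulated data (the design below, entity index id included, is an illustrative assumption): the LSDV regression of Eq. (43) and the within regression of Eq. (46) deliver the same slope.

# Minimal sketch: LSDV vs. within (demeaned) estimation on simulated data
set.seed(2)
n <- 50; T <- 5
id    <- rep(1:n, each = T)            # entity index
alpha <- rnorm(n)[id]                  # entity-specific intercepts
x     <- alpha + rnorm(n * T)          # regressor correlated with the effects
y     <- alpha + 1 * x + rnorm(n * T)  # true beta = 1

# LSDV: OLS with one dummy per entity (Eq. 43)
lsdv <- lm(y ~ x + factor(id) - 1)
coef(lsdv)["x"]

# Within estimator: regress demeaned y on demeaned x (Eq. 46)
y_til <- y - ave(y, id)                # y_{i,t} - ybar_i
x_til <- x - ave(x, id)                # x_{i,t} - xbar_i
coef(lm(y_til ~ x_til - 1))            # same slope as the LSDV regression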
                       Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)               9.517       0.229   41.514     0.000
log(airlines$output)      0.883       0.013   66.599     0.000
log(airlines$pf)          0.454       0.020   22.359     0.000
airlines$lf              -1.628       0.345   -4.713     0.000

Table 14: Dep. variable: log(cost). OLS regression.

Regression of the log of airline costs on log(output), log(fuel price) and capacity utilization of the fleet (data from Greene (2003), see Fig. 31).
                       Estimate  Std. Error  t value  Pr(>|t|)
log(airlines$output)      0.919       0.030   30.756     0.000
log(airlines$pf)          0.417       0.015   27.468     0.000
airlines$lf              -1.070       0.202   -5.307     0.000
G1                        9.706       0.193   50.258     0.000
G2                        9.665       0.199   48.571     0.000
G3                        9.497       0.225   42.217     0.000
G4                        9.890       0.242   40.910     0.000
G5                        9.730       0.261   37.288     0.000
G6                        9.793       0.264   37.142     0.000

Table 15: Dep. variable: log(cost). LSDV regression.
                             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                    10.067       0.516   19.525     0.000
log(CigarettesSW$rincome)       0.318       0.136    2.337     0.022
log(CigarettesSW$rprice)       -1.334       0.135   -9.856     0.000

Table 16: Dep. variable: Nb of packs. OLS regression.

                             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                    10.067       0.516   19.525     0.000
log(CigarettesSW$rincome)       0.121       0.213    0.567     0.571
log(CigarettesSW$rprice)       -1.210       0.140   -8.669     0.000

Table 17: Dep. variable: Nb of packs. LSDV HAC fixed-effect regression.

Regression of the number of cigarette packs (log) on real income (log) and real price of cigarettes (log) (data from Stock and Watson (2003), see Fig. 29).
Extension: Fixed time and group effects
- Time effects are introduced:

  y_{i,t} = x_{i,t}'\beta + \alpha_i + \gamma_t + \varepsilon_{i,t}.

- The LSDV regression (Eq. 43) can be extended:

  y = [X \; D \; C] \begin{bmatrix} \beta \\ \alpha \\ \gamma \end{bmatrix} + \varepsilon, \qquad (48)

  with:

  C = \begin{bmatrix} \delta_1 & \delta_2 & \dots & \delta_{T-1} \\ \vdots & \vdots & & \vdots \\ \delta_1 & \delta_2 & \dots & \delta_{T-1} \end{bmatrix},

  where the T-dimensional vector δ_t is [0, …, 0, 1, 0, …, 0]′, with the 1 in the t-th entry.
Example 5.1 (Geo-located data)
- Geo-located data from Airbnb, Zürich (Airbnb, date 22 June 2017).

               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      76.807       3.370   22.793     0.000
accommodates     15.765       1.021   15.447     0.000
studioTRUE       -5.540       3.303   -1.677     0.094

Table 18: Regression of prices on the number of people the apartment can accommodate and a dummy indicating whether the apartment is a studio.
Example 5.1 (Geo-located data)

                       Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)             102.810       5.273   19.496     0.000
accommodates             15.450       0.989   15.626     0.000
studioTRUE               -7.880       3.217   -2.450     0.014
neighborhoodKreis 10    -43.499       6.081   -7.154     0.000
neighborhoodKreis 11    -33.803       5.832   -5.796     0.000
neighborhoodKreis 12    -42.158       9.270   -4.548     0.000
neighborhoodKreis 2     -26.317       6.199   -4.245     0.000
neighborhoodKreis 3     -30.135       5.069   -5.945     0.000
neighborhoodKreis 4     -18.820       5.180   -3.633     0.000
neighborhoodKreis 5     -15.078       5.883   -2.563     0.010
neighborhoodKreis 6     -27.448       5.801   -4.731     0.000
neighborhoodKreis 7     -19.763       5.963   -3.314     0.001
neighborhoodKreis 8     -14.434       5.589   -2.582     0.010
neighborhoodKreis 9     -46.737       6.285   -7.436     0.000

Table 19: Regression of prices on the number of people the apartment can accommodate, a dummy indicating whether the apartment is a studio, and dummies for the 12 city areas.
[Figure. Source: Airbnb, date 22 June 2017. Regression of price for Entire home/apt on number of bedrooms and number of people that can be accommodated.]
[Figure. Source: Airbnb, date 22 June 2017. Regression of price for Entire home/apt on number of bedrooms, number of people that can be accommodated, and dummies for the 12 city areas.]
[Figure. Source: Airbnb, date 22 June 2017. Map of the city's districts.]
Estimation of random effects models
- Here, the individual effects are assumed not to be correlated with the other variables (the x_i's).
- Random-effect models write:

  y_{i,t} = x_{i,t}'\beta + (\alpha + \underbrace{u_i}_{\text{random heterogeneity}}) + \varepsilon_{i,t}

  with

  E(ε_{i,t}|X) = E(u_i|X) = 0,
  E(ε_{i,t} ε_{j,s}|X) = σ_ε² if i = j and s = t, and 0 otherwise,
  E(u_i u_j|X) = σ_u² if i = j, and 0 otherwise,
  E(ε_{i,t} u_j|X) = 0 for all i, j and t.
- Notation: η_{i,t} = u_i + ε_{i,t} and η_i = [η_{i,1}, η_{i,2}, …, η_{i,T}]′.
- We have E(η_i|X) = 0 and Var(η_i|X) = Σ, where

  \Sigma = \begin{bmatrix}
  \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \dots & \sigma_u^2 \\
  \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \dots & \sigma_u^2 \\
  \vdots & & \ddots & \vdots \\
  \sigma_u^2 & \sigma_u^2 & \dots & \sigma_\varepsilon^2 + \sigma_u^2
  \end{bmatrix} = \sigma_\varepsilon^2 I + \sigma_u^2 \mathbf{1}\mathbf{1}'.

- Denoting by Ω the covariance matrix of η = [η_1′, …, η_n′]′, we have:

  \Omega = I \otimes \Sigma.

- If we knew Ω, we would apply (feasible) GLS (Eq. 35):

  \hat{\beta} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}y

  (recall that this amounts to regressing Ω^{-1/2}y on Ω^{-1/2}X).
- It can be checked that Ω^{-1/2} = I ⊗ Σ^{-1/2}, where

  \Sigma^{-1/2} = \frac{1}{\sigma_\varepsilon}\left( I - \frac{\theta}{T} \mathbf{1}\mathbf{1}' \right)

  with

  \theta = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T\sigma_u^2}}.

- Hence, if we knew Ω, we would transform the data as follows:

  \Sigma^{-1/2} y_i = \frac{1}{\sigma_\varepsilon} \begin{bmatrix} y_{i,1} - \theta\bar{y}_i \\ y_{i,2} - \theta\bar{y}_i \\ \vdots \\ y_{i,T} - \theta\bar{y}_i \end{bmatrix}.
- What about when Ω is unknown?
- Idea: taking deviations from group means removes the heterogeneity:

  y_{i,t} - \bar{y}_i = [x_{i,t} - \bar{x}_i]'\beta + (\varepsilon_{i,t} - \bar{\varepsilon}_i).

- The previous equation can be consistently estimated by OLS (the residuals are correlated across t within an entity, but the OLS estimates remain consistent, see Prop. 4.12).
- We have E\left[\sum_{t=1}^{T} (\varepsilon_{i,t} - \bar{\varepsilon}_i)^2\right] = (T-1)\sigma_\varepsilon^2.
- The ε_{i,t}'s are not observed, but b is a consistent estimator of β; with an adjustment for the degrees of freedom:

  \hat{\sigma}_e^2 = \frac{1}{nT - n - K} \sum_{i=1}^{n}\sum_{t=1}^{T} (e_{i,t} - \bar{e}_i)^2.

- And for σ_u²? We can exploit the fact that OLS is consistent in the pooled regression:

  \text{plim} \; s_{pooled}^2 = \text{plim} \frac{e'e}{nT - K - 1} = \sigma_u^2 + \sigma_\varepsilon^2,

  and therefore use s_{pooled}^2 - \hat{\sigma}_e^2 as an approximation to σ_u² (see the sketch below).
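In practice these panel estimators are implemented in R's plm package. A minimal sketch, reusing the simulated objects (id, x, y, n, T) from the LSDV snippet above, where the data-frame column names are illustrative assumptions:

# Minimal sketch: pooled, fixed-effects and random-effects estimation with plm
library(plm)

df <- data.frame(id = id, t = rep(1:T, n), x = x, y = y)

pooled <- plm(y ~ x, data = df, index = c("id", "t"), model = "pooling")
within <- plm(y ~ x, data = df, index = c("id", "t"), model = "within")
random <- plm(y ~ x, data = df, index = c("id", "t"), model = "random")

sapply(list(pooled = pooled, within = within, random = random),
       function(m) coef(m)["x"])   # compare the slope estimates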
Maximum Likelihood Estimation
Context and Objective
- Context:
  - You observe a sample y = {y_1, …, y_n}.
  - You know that these data have been generated by a model parameterized by θ_0 ∈ R^K.
- Objective:
  You want to estimate θ_0.
- Intuition behind Maximum Likelihood Estimation:
  The estimator is the value of θ under which the probability of having observed y is the highest possible.
- Assume that the time periods between the arrivals of two customers in a shop, denoted by y_i, are i.i.d. and follow an exponential distribution, i.e. y_i ∼ E(λ).
- You have observed these arrivals for some time, thereby constituting a sample {y_1, …, y_n}. You want to estimate λ (i.e., in that case, the vector of parameters is simply θ = λ).
- The density of Y is f(y; λ) = (1/λ) exp(−y/λ). Fig. 32 represents this density function for different values of λ.
- Your 200 observations are reported at the bottom of Fig. 33 (red).
- You build the histogram and report it on the same chart (Fig. 34).
Figure 32: Densities of exponential distribution for different λ
[Figure: densities for λ = 1, 3, 5, 7; x-axis: time period between two arrivals.]

Figure 33: Densities of exponential distribution for different λ
[Figure: same densities, with the 200 observations reported at the bottom of the chart (red).]

Figure 34: Densities of exponential distribution for different λ
[Figure: same densities, together with the histogram of the 200 observations.]
- What is your estimate of λ?
- Now, assume that you have only four observations: y_1 = 1.1, y_2 = 2.2, y_3 = 0.7 and y_4 = 5.0.
- What was the probability of observing, for a small ε,
  1.1 − ε ≤ Y_1 < 1.1 + ε,
  2.2 − ε ≤ Y_2 < 2.2 + ε,
  0.7 − ε ≤ Y_3 < 0.7 + ε and
  5.0 − ε ≤ Y_4 < 5.0 + ε?
- Because the y_i's are i.i.d., this probability is \prod_{i=1}^{4} (2\varepsilon f(y_i; \lambda)).
- The next plot shows this probability (divided by 16ε⁴) as a function of λ.
Figure 35: Proba. that y_i − ε ≤ Y_i < y_i + ε, i ∈ {1, 2, 3, 4}
[Figure: the probability (divided by 16ε⁴) as a function of λ.]

Figure 36: Log of proba. that y_i − ε ≤ Y_i < y_i + ε, i ∈ {1, 2, 3, 4}
[Figure: the log of the same probability as a function of λ.]

Figure 37: Back to the example with 200 observations
[Figure: the log-likelihood of the 200 observations as a function of λ.]
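The curves above can be reproduced with a few lines of R. The sketch below evaluates the exponential log-likelihood of the four observations over a grid of λ values and confirms that it peaks at the sample mean, which is the analytical MLE of λ under this parameterization:

# Minimal sketch: exponential log-likelihood over a grid of lambda
y      <- c(1.1, 2.2, 0.7, 5.0)
lambda <- seq(0.1, 10, by = 0.01)

# log L(lambda; y) = sum_i log f(y_i; lambda), with f(y; lambda) = exp(-y/lambda)/lambda
logL <- sapply(lambda, function(l) sum(dexp(y, rate = 1 / l, log = TRUE)))

plot(lambda, logL, type = "l", xlab = "lambda", ylab = "log(Probability)")
lambda[which.max(logL)]   # approx. 2.25 = mean(y), the MLE of lambda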
Notations
- f(y; θ) denotes the probability density function (p.d.f.) of a random variable Y, which depends on a set of parameters θ.
- Density of n independent and identically distributed (i.i.d.) observations of Y:

  f(y_1, \dots, y_n; \theta) = \prod_{i=1}^{n} f(y_i; \theta).

- y denotes the vector of observations; y = {y_1, …, y_n}.

Definition 6.1 (Likelihood function)
L : θ → L(θ; y) = f(y_1, …, y_n; θ) is the likelihood function.

- We often work with log L, the log-likelihood function.

Example 6.1
If y_i ∼ N(µ, σ²), then

  \log L(\theta; \mathbf{y}) = -\frac{1}{2} \sum_{i=1}^{n} \left( \log \sigma^2 + \log 2\pi + \frac{(y_i - \mu)^2}{\sigma^2} \right).
Definition 6.2 (Score)
The score S(y; θ) is given by ∂ log f(y; θ)/∂θ.

Example 6.2
If y_i ∼ N(µ, σ²), then

  \frac{\partial \log f(y; \theta)}{\partial \theta} =
  \begin{bmatrix} \dfrac{\partial \log f(y; \theta)}{\partial \mu} \\[1ex] \dfrac{\partial \log f(y; \theta)}{\partial \sigma^2} \end{bmatrix} =
  \begin{bmatrix} \dfrac{y - \mu}{\sigma^2} \\[1ex] \dfrac{1}{2\sigma^2}\left( \dfrac{(y - \mu)^2}{\sigma^2} - 1 \right) \end{bmatrix}.
Proposition 6.1 (Score expectation)
The expectation of the score is zero.

Proof. We have:

  E\left( \frac{\partial \log f(Y; \theta)}{\partial \theta} \right)
  = \int \frac{\partial \log f(y; \theta)}{\partial \theta} f(y; \theta)\, dy
  = \int \frac{\partial f(y; \theta)/\partial \theta}{f(y; \theta)} f(y; \theta)\, dy
  = \frac{\partial}{\partial \theta} \int f(y; \theta)\, dy = \partial 1 / \partial \theta = 0. \blacksquare

Definition 6.3 (Information matrix)
The information matrix is (minus) the expectation of the second derivatives of the log-likelihood function:

  \mathcal{I}_Y(\theta) = -E\left( \frac{\partial^2 \log f(Y; \theta)}{\partial \theta \partial \theta'} \right).
Proposition 6.2 (Information matrix)
We have \mathcal{I}_Y(\theta) = E\left[ \left( \frac{\partial \log f(Y;\theta)}{\partial \theta} \right) \left( \frac{\partial \log f(Y;\theta)}{\partial \theta} \right)' \right] = Var[S(Y; \theta)].

Proof. \frac{\partial^2 \log f(Y;\theta)}{\partial \theta \partial \theta'} = \frac{1}{f(Y;\theta)} \frac{\partial^2 f(Y;\theta)}{\partial \theta \partial \theta'} - \frac{\partial \log f(Y;\theta)}{\partial \theta} \frac{\partial \log f(Y;\theta)}{\partial \theta'}. The expectation of the first right-hand-side term is \partial^2 1/(\partial \theta \partial \theta') = 0. \blacksquare

Example
If y_i ∼ N(µ, σ²), let θ = [µ, σ²]′; then

  \frac{\partial \log f(y; \theta)}{\partial \theta} = \left[ \frac{y - \mu}{\sigma^2} \quad \frac{1}{2\sigma^2}\left( \frac{(y - \mu)^2}{\sigma^2} - 1 \right) \right]'

and

  \mathcal{I}_Y(\theta) = \begin{bmatrix} 1/\sigma^2 & 0 \\ 0 & 1/(2\sigma^4) \end{bmatrix}.
Proposition 6.3 (Additive property of the Info. mat.)
The information matrix resulting from two independent experiments is the sum of the information matrices:

  \mathcal{I}_{X,Y}(\theta) = \mathcal{I}_X(\theta) + \mathcal{I}_Y(\theta).

Theorem 6.1 (Fréchet-Darmois-Cramér-Rao bound)
Consider an unbiased estimator of θ denoted by \hat{\theta}(Y). The variance of the random variable \omega'\hat{\theta} (which is a linear combination of the components of \hat{\theta}) is larger than:

  (\omega'\omega)^2 / (\omega' \mathcal{I}_Y(\theta)\, \omega).

Proof. The Cauchy-Schwarz inequality (Theorem 9.7) implies that
Var(\omega'\hat{\theta}(Y))\, Var(\omega' S(Y; \theta)) \ge |\omega' Cov[\hat{\theta}(Y), S(Y; \theta)]\, \omega|^2. Now,
Cov[\hat{\theta}(Y), S(Y; \theta)] = \int_y \hat{\theta}(y) \frac{\partial \log f(y;\theta)}{\partial \theta'} f(y; \theta)\, dy = \frac{\partial}{\partial \theta'} \int_y \hat{\theta}(y) f(y; \theta)\, dy = I because \hat{\theta} is unbiased.
Therefore Var(\omega'\hat{\theta}(Y)) \ge Var(\omega' S(Y; \theta))^{-1} (\omega'\omega)^2. Prop. 6.2 leads to the result. \blacksquare
Definition 6.4 (Identification)
The vector of parameters θ is identifiable if, for any other vector θ*:

  θ* ≠ θ ⇒ L(θ*; y) ≠ L(θ; y).

Definition 6.5 (Maximum Likelihood Estimator (MLE))
The maximum likelihood estimator (MLE) is the vector θ that maximizes the likelihood function. Formally:

  \theta_{MLE} = \arg\max_\theta L(\theta; \mathbf{y}) = \arg\max_\theta \log L(\theta; \mathbf{y}). \qquad (49)

Definition 6.6 (Likelihood equation)
Necessary condition for maximizing the likelihood function:

  \frac{\partial \log L(\theta; \mathbf{y})}{\partial \theta} = 0. \qquad (50)
Assumption A.7 (Regularity Assumptions)
(i) θ ∈ Θ, where Θ is compact.
(ii) θ_0 is identified.
(iii) The log-likelihood function is continuous in θ.
(iv) E_{θ_0}(log f(Y; θ)) exists.
(v) The log-likelihood function is such that (1/n) log L(θ; y) converges almost surely to E_{θ_0}(log f(Y; θ)), uniformly in θ ∈ Θ.
(vi) The log-likelihood function is twice continuously differentiable in an open neighborhood of θ_0.
(vii) The matrix \mathcal{I}(\theta_0) = -E_0\left( \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} \right) (Fisher Information matrix) exists and is nonsingular.
Proposition 6.4 (Properties of MLE)
Under regularity conditions (Assumptions A.7), the MLE is:
(a) Consistent: plim θ_MLE = θ_0 (θ_0 is the true vector of parameters).
(b) Asymptotically normal: θ_MLE ∼ N(θ_0, \mathcal{I}(θ_0)^{-1}), where

  \mathcal{I}(\theta_0) = -E_0\left( \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} \right) = n \mathcal{I}_Y(\theta_0). \quad \text{(Fisher Info. matrix)}

(c) Asymptotically efficient: θ_MLE is asymptotically efficient and achieves the Fréchet-Darmois-Cramér-Rao lower bound for consistent estimators.
(d) Invariant: the MLE of g(θ_0) is g(θ_MLE) if g is a continuous and continuously differentiable function.

Proof: see online additional material.

Remark 6.1
(b) also writes:

  \sqrt{n}(\theta_{MLE} - \theta_0) \overset{d}{\to} \mathcal{N}(0, \mathcal{I}_Y(\theta_0)^{-1}). \qquad (51)
- The asymptotic covariance matrix of the MLE is:

  [\mathcal{I}(\theta_0)]^{-1} = \left[ -E_0\left( \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} \right) \right]^{-1}.

- A direct (analytical) evaluation of this expectation is often out of reach.
- It can however be estimated by either:

  \hat{\mathcal{I}}_1^{-1} = \left( -\frac{\partial^2 \log L(\theta_{MLE}; \mathbf{y})}{\partial \theta \partial \theta'} \right)^{-1}, \qquad (52)

  \hat{\mathcal{I}}_2^{-1} = \left( \sum_{i=1}^{n} \frac{\partial \log L(\theta_{MLE}; y_i)}{\partial \theta} \frac{\partial \log L(\theta_{MLE}; y_i)}{\partial \theta'} \right)^{-1}. \qquad (53)
To sum up: MLE in practice
- A parametric model (depending on the vector of parameters θ, whose "true" value is θ_0) is specified.
- i.i.d. sources of randomness are identified.
- The density associated with one observation y_i is computed analytically (as a function of θ): f(y; θ).
- The log-likelihood is log L(θ; y) = Σ_i log f(y_i; θ).
- The MLE estimator results from the optimization problem (this is Eq. 49):

  \theta_{MLE} = \arg\max_\theta \log L(\theta; \mathbf{y}). \qquad (54)

- We have θ_MLE ∼ N(θ_0, \mathcal{I}(θ_0)^{-1}), where \mathcal{I}(θ_0)^{-1} is estimated by means of Eq. (52) or Eq. (53). Most of the time, this computation is numerical (see the sketch below).
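A minimal numerical-MLE sketch in R, under assumed simulated Gaussian data: optim maximizes the log-likelihood of Eq. (54) and returns the Hessian, whose inverse gives the variance estimate of Eq. (52).

# Minimal sketch: numerical MLE of (mu, sigma2) for a Gaussian sample,
# with standard errors from the inverse Hessian (Eq. 52)
set.seed(3)
y <- rnorm(200, mean = 1, sd = 2)

neg_logL <- function(theta) {   # theta = (mu, log(sigma2)); log keeps sigma2 > 0
  mu <- theta[1]; sigma2 <- exp(theta[2])
  -sum(dnorm(y, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

fit <- optim(c(0, 0), neg_logL, hessian = TRUE)
fit$par                          # MLE of (mu, log(sigma2))
V <- solve(fit$hessian)          # inverse Hessian of -log L at the MLE
sqrt(diag(V))                    # asymptotic standard errors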
Example 6.3 (MLE estimation of a mixture of Gaussian distributions)
- Consider the returns of the SMI index used in Fig. 38.
- Let's assume that these returns are independently drawn from a mixture of Gaussian distributions. The p.d.f. f(x; θ), with θ = [µ_1, σ_1, µ_2, σ_2, p]′, is given by:

  p \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left( -\frac{(x - \mu_1)^2}{2\sigma_1^2} \right) + (1 - p) \frac{1}{\sqrt{2\pi\sigma_2^2}} \exp\left( -\frac{(x - \mu_2)^2}{2\sigma_2^2} \right).

- The maximum likelihood estimate is θ_MLE = [0.30, 1.40, −1.45, 3.61, 0.87]′.
- The first two entries of the diagonal of \hat{\mathcal{I}}_1^{-1} are 0.00528 and 0.00526. They are the estimates of Var(µ_{1,MLE}) and of Var(σ_{1,MLE}), respectively. 95% confidence intervals for µ_1 and σ_1 are, respectively:

  0.30 \pm 1.96 \underbrace{\sqrt{0.00528}}_{=0.0726} \quad \text{and} \quad 1.40 \pm 1.96 \underbrace{\sqrt{0.00526}}_{=0.0725}.
Example 6.3 (MLE estim. of a mixture of Gaussian distri. (cont'd))

Figure 38: Density of 5-day returns on SMI index
[Figure: left panel, the 5-day return on the SMI index (in percent); right panel, the return's density (solid line).]
Gaussian kernel, h = 0.5 (in percent). The data span the period from 2 June 2006 to 23 February 2016 at the daily frequency. Left-hand plot: the blue lines indicate µ ± 2σ, where µ is the sample mean of the returns and σ is their sample standard deviation. Right-hand plot: the red dotted line is the density N(µ, σ²).
Example 6.3 (MLE estim. of a mixture of Gaussian distri. (cont'd))

Figure 39: Estimated density (vs kernel-based estimate)
[Figure: kernel estimate (non-parametric), mixture of Gaussians (MLE, parametric) and fitted normal distribution; x-axis: returns, in percent.]
Binary Choice Models
Context and Objectives
- Context:
  In many instances, the variables to be explained (the y_i's) have only two possible values (0 and 1, say).
  Assume we suspect some variable x_i (K × 1) to be able to account for the probability that y_i = 1.
  The model reads:

  y_i | X ∼ B(g(x_i; θ)),

  where g(x_i; θ) is the parameter of the Bernoulli distribution. In other words, conditionally on X:

  y_i = 1 with probability g(x_i; θ), and y_i = 0 with probability 1 − g(x_i; θ),

  where θ is a vector of parameters to be estimated.
- Objective:
  To estimate the vector of population parameters θ.
Example 7.1
Binary-choice models can be used to account for...
- any binary decision (e.g. in referendums, being owner or renter, living in the city or in the countryside, being in or out of the labour force, ...),
- contamination (disease or default),
- success/failure (exams).
- A possibility would be to run a linear regression:

  y_i = θ'x_i + ε_i.

- Such a regression could be consistent with Assumptions A.1, A.2 and A.4, but less easily with A.3, and certainly not with A.5 (because y_i ∈ {0, 1}).
- At best, using a linear regression to study the relationship between x_i and y_i would be inefficient.

Figure 40: Linear regression in the Probit case
[Figure: observations and fitted linear regression.]
The model is P(y_i = 1|x_i) = Φ(0.5 + 2x_i), where Φ is the c.d.f. of the normal distribution.
- There are two prominent models to tackle this situation. In both models, we have:

  P(y_i = 1|x_i; θ) = g(θ'x_i).

- Probit model:

  g(z) = \Phi(z), \qquad (55)

  where Φ is the c.d.f. of the normal distribution.
- Logit model:

  g(z) = \frac{1}{1 + \exp(-z)}. \qquad (56)

Figure 41: Probit and Logit functions
[Figure: the two functions g(x) plotted against x.]
Figure 42: Linear regression in the Probit case
[Figure: observations, fitted linear regression and the true curve P(y_i = 1|x_i).]
The model is P(y_i = 1|x_i) = Φ(0.5 + 2x_i), where Φ is the c.d.f. of the normal distribution.
- These models have an interpretation in terms of latent variables.
- Consider the Probit case. We have:

  P(y_i = 1|x_i; θ) = Φ(θ'x_i) = P(−ε_i < θ'x_i),

  where ε_i ∼ N(0, 1). That is:

  P(y_i = 1|x_i; θ) = P(0 < y_i^*),

  where y_i^* = θ'x_i + ε_i, with ε_i ∼ N(0, 1).
- y_i^* is a (latent) variable that determines y_i: y_i = I_{\{y_i^* > 0\}}.

Figure 43: Distribution of y_i^* conditional on x_i
[Figure: density of y_i^* conditional on x_i, split into the regions corresponding to P(y_i = 0|x_i; θ) and P(y_i = 1|x_i; θ); the marker θ'x_i indicates the centre of the distribution.]
Estimation
- These two models can be estimated by Maximum Likelihood approaches (see Section 6).
- The x_i are considered to be deterministic.
- The random variables are assumed to be independent across entities i.
- How to write the likelihood here?
- It can be seen that (key equation):

  f(y_i|x_i; \theta) = y_i\, g(\theta'x_i) + (1 - y_i)(1 - g(\theta'x_i))
                     = g(\theta'x_i)^{y_i} (1 - g(\theta'x_i))^{1 - y_i}. \qquad (57)

- Therefore, if the observations (x_i, y_i) are independent (across entities i), then:

  \log L(\theta; \mathbf{y}) = \sum_{i=1}^{n} y_i \log[g(\theta'x_i)] + (1 - y_i)\log[1 - g(\theta'x_i)].
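As an illustration, the sketch below (on assumed simulated data) codes this log-likelihood for the Probit case and checks that maximizing it numerically reproduces R's built-in glm probit fit:

# Minimal sketch: probit log-likelihood maximized by hand vs. glm()
set.seed(4)
n <- 500
x <- rnorm(n)
y <- as.numeric(0.5 + 2 * x + rnorm(n) > 0)   # latent-variable representation

neg_logL <- function(theta) {                 # theta = (intercept, slope)
  p <- pnorm(theta[1] + theta[2] * x)         # g(theta'x) = Phi(theta'x)
  -sum(y * log(p) + (1 - y) * log(1 - p))
}
optim(c(0, 0), neg_logL)$par                  # close to (0.5, 2)

coef(glm(y ~ x, family = binomial(link = "probit")))  # same estimates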
- The likelihood equation (Def. 6.6), i.e. the first-order condition of the optimization program, is:

  \frac{\partial \log L(\theta; \mathbf{y})}{\partial \theta} = 0,

  that is:

  \sum_{i=1}^{n} y_i \frac{g'(\theta'x_i)}{g(\theta'x_i)} x_i - (1 - y_i) \frac{g'(\theta'x_i)}{1 - g(\theta'x_i)} x_i = 0.

  ⇒ This is a nonlinear equation that generally has to be solved numerically (no closed-form formula available).
- Under regularity conditions (Assumptions A.7), we have (Prop. 6.4):

  \theta_{MLE} \sim \mathcal{N}(\theta_0, \mathcal{I}(\theta_0)^{-1}),

  where

  \mathcal{I}(\theta_0) = -E_0\left( \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} \right) = n \mathcal{I}_Y(\theta_0).

- For finite samples, we can approximate \mathcal{I}(\theta_0)^{-1} by (Eq. 52):

  \mathcal{I}(\theta_0)^{-1} \approx \left( -\frac{\partial^2 \log L(\theta_{MLE}; \mathbf{y})}{\partial \theta \partial \theta'} \right)^{-1}.
- In the Probit case (Eq. 55), it can be shown that we have:

  \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} = -\sum_{i=1}^{n} g'(\theta'x_i)[x_i x_i'] \times
  \left[ y_i \frac{g'(\theta'x_i) + \theta'x_i \, g(\theta'x_i)}{g(\theta'x_i)^2} + (1 - y_i) \frac{g'(\theta'x_i) - \theta'x_i (1 - g(\theta'x_i))}{(1 - g(\theta'x_i))^2} \right].

- In the Logit case (Eq. 56), it can be shown that we have:

  \frac{\partial^2 \log L(\theta; \mathbf{y})}{\partial \theta \partial \theta'} = -\sum_{i=1}^{n} g'(\theta'x_i)\, x_i x_i',

  where g'(x) = \frac{\exp(-x)}{(1 + \exp(-x))^2}.

  (Note: because g'(x) > 0, it can be checked that -\partial^2 \log L(\theta; \mathbf{y})/\partial \theta \partial \theta' is positive definite.)
Marginal effects
- How to measure marginal effects, i.e. the effect on the probability that y_i = 1 of a marginal increase in x_{i,k}?
- It is given by:

  \frac{\partial P(y_i = 1|x_i; \theta)}{\partial x_{i,k}} = \underbrace{g'(\theta'x_i)}_{>0} \theta_k.

- This effect is of the same sign as θ_k.
- It can be estimated by g'(θ'_{MLE} x_i) θ_{MLE,k}.
- Note that this marginal effect depends on x_i.
  ⇒ One-unit increases in x_{i,k} (entity i) and in x_{j,k} (entity j) do not necessarily have the same effects on P(y_i = 1|x_i; θ) and on P(y_j = 1|x_j; θ), respectively.
- Measure of "average" marginal effect? Two solutions, for each explanatory variable k (see the sketch below):
  (i) Denoting by x̄ the sample average of the x_i's, compute g'(θ'_{MLE} x̄) θ_{MLE,k}.
  (ii) Compute the average (across i) of g'(θ'_{MLE} x_i) θ_{MLE,k}.
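A sketch of both measures, continuing the simulated probit example above (for a probit, g' is the standard normal density, dnorm in R):

# Minimal sketch: marginal effects of x in the probit fit above
fit   <- glm(y ~ x, family = binomial(link = "probit"))
theta <- coef(fit)

# (i) marginal effect evaluated at the sample average of x
dnorm(theta[1] + theta[2] * mean(x)) * theta[2]

# (ii) average (across i) of the individual marginal effects
mean(dnorm(theta[1] + theta[2] * x)) * theta[2]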
Goodness of fit
- There is no commonly accepted version of R² (see Eq. 6) for binary-choice models.
- Existing measures are called pseudo-R² measures.
- Let us denote by log L_0(y) the (maximum) log-likelihood that would be obtained for a model containing only a constant term (i.e. with x_i = 1 for all i).
- McFadden's pseudo-R² is given by:

  R^2_{MF} = 1 - \frac{\log L(\theta; \mathbf{y})}{\log L_0(\mathbf{y})}.
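With a fitted glm object this is a one-liner; a sketch, continuing the probit example (the intercept-only model serves as the benchmark):

# Minimal sketch: McFadden's pseudo-R^2 for the probit fit above
fit0 <- glm(y ~ 1, family = binomial(link = "probit"))   # constant-only model
1 - as.numeric(logLik(fit)) / as.numeric(logLik(fit0))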
Example 7.2 (Labor market participation)

              Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)      3.749       1.407    2.665     0.008
income          -0.667       0.132   -5.054     0.000
age              2.075       0.405    5.119     0.000
education        0.019       0.018    1.071     0.284
youngkids       -0.714       0.100   -7.117     0.000
oldkids         -0.147       0.051   -2.888     0.004
foreignyes       0.714       0.121    5.888     0.000
I(age^2)        -0.294       0.050   -5.893     0.000

Table 20: Women labor force participation in Switzerland

Source: Gerfin (1996).

Marginal effects:
income: −0.22, age: 0.69, education: 0.006, youngkids: −0.24, oldkids: −0.05, foreignyes: 0.24, age²: −0.10.
Example 7.2 (Labor market participation (cont'd))

Figure 44: Labor market participation vs. age and education
[Figure: two panels showing participation (yes/no) against age and against education.]
Example 7.3 (Credit risk)

Figure 45: Defaults, interest rates and ratings
[Figure: left panel, share of defaulted entities by rating (or grade); right panel, interest rate by rating (or grade).]
Source: Lending Club (large online peer-to-peer loan platform). About 38,000 loans, 2007 to 2011.
Example 7.3 (Credit risk (cont'd))

                      Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)             -0.185       0.117   -1.585     0.113
amt.2.income             0.144       0.013   11.508     0.000
term 60 months           0.513       0.018   28.094     0.000
home_ownershipOWN        0.058       0.032    1.771     0.077
home_ownershipRENT       0.104       0.017    6.064     0.000
delinq_2yrs              0.062       0.016    3.981     0.000
issue_Y2008             -0.095       0.101   -0.945     0.345
issue_Y2009             -0.268       0.096   -2.783     0.005
issue_Y2010             -0.410       0.095   -4.322     0.000
issue_Y2011             -0.346       0.094   -3.673     0.000

Table 21: Predicting defaults with probit regression

amt2income = (monthly payment owed by the borrower)/(self-reported annual income); term is 36 or 60 months; delinq2yrs = number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years; home_ownership is either MORTGAGE, OWN or RENT.

Marginal effects:
amt2income: 0.031; term 60 months: 0.11; home_ownershipRENT: 0.023; delinq2yrs: 0.014.
Example 7.4 (Credit risk)

Figure 46: Comparing probit-based predictions and ratings
[Figure: probit-based probabilities of default plotted against the rating (or grade), A to G.]
Source: Lending Club (large online peer-to-peer loan platform). About 38,000 loans, 2007 to 2011.
Likelihood ratio tests
- Because the models are estimated by MLE, one can easily conduct likelihood ratio (LR) tests in the context of binary-choice models.
- In LR tests, we compare a restricted model (R) and an unrestricted model (U).
- Restricted model = constrained version of the unrestricted one.
- Let us denote by J the number of constraints imposed on R (w.r.t. U).
- We denote by log L_U (respectively log L_R) the maximum likelihood resulting from the maximization of the unrestricted log-likelihood (resp. the restricted log-likelihood). We have (mechanically):

  \log L_U > \log L_R. \qquad (58)

- The null hypothesis (see Section 3) of the LR test is H0: the "true" model is the restricted one.
- Even under H0, Eq. (58) holds. However, if H0 is true, one expects log L_U not to be too far from log L_R.
- Formally, it can be proven that, asymptotically (see the sketch below):

  \underbrace{2(\log L_U - \log L_R)}_{\text{LR test statistic}} \overset{d}{\to} \chi^2(J).
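A sketch of such a test in R, continuing the probit example (restricting the slope on x to zero, so J = 1):

# Minimal sketch: LR test of H0 "the slope on x is zero" (J = 1 constraint)
fit_U <- glm(y ~ x, family = binomial(link = "probit"))  # unrestricted
fit_R <- glm(y ~ 1, family = binomial(link = "probit"))  # restricted

LR <- 2 * (as.numeric(logLik(fit_U)) - as.numeric(logLik(fit_R)))
1 - pchisq(LR, df = 1)           # p-value under the chi2(1) asymptotics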
Introduction to Time Series

- Time series tools are useful to:
  - forecast (calculation of confidence intervals);
  - test economic theories, exploiting the time dimension;
  - calibrate models used for policy analysis or risk management;
  - price assets.
Definition 8.1 (Time series)
- A time series: {y_t}_{t=-∞}^{+∞} = {…, y_{-2}, y_{-1}, y_0, y_1, …, y_t, …}, with y_t ∈ R^k.
- Typical sample: {y_1, …, y_T}.

Example 8.1 (Examples of time series)
- Constant: y_t = c.
- Trend: y_t = t.
- Gaussian white noise (Def. 8.2): y_t = ε_t, ε_t ∼ i.i.d. N(0, σ²).
- Time series specifications are often defined by a difference equation (Def. 8.6 below).
Figure 47: Federal fund rate
[Figure: time series of the federal fund rate.]
Source of the data: FRED database.
Figure 48: Federal fund rate and unemployment rate
[Figure: time series of the federal fund rate and the unemployment rate.]
Source of the data: FRED database. Unemployment rate in red.
Figure 49: U.S. Gross Domestic Product
[Figure: top panel, Real Gross Domestic Product (bns of chained 2009 USD); bottom panel, annual GDP (log) growth.]
Source of the data: FRED database.
Figure 50: Intraday data
[Figure: IBM stock price (in USD), 11 February 2016, NASDAQ.]
Source of the data: Google Finance.
Figure 51: Seasonality illustration (1/2)
[Figure: number of deaths in France (monthly).]
Source of the data: FRED database.
Figure 52: Seasonality illustration (2/2)
[Figure: number of Google searches for "depression".]
Source of the data: Google Trends.
- Standard time series models are built using "basic" shocks that we will often denote by ε_t. Typically: E(ε_t) = 0.
- A basic form of shocks: i.i.d. shocks (independence: Def. 1.6).
- The following definition describes a more general form of basic shocks (not necessarily independent):

Definition 8.2 (White noise)
The process {ε_t}_{t ∈ ]-∞,+∞[} is a white noise if, for all t, (a) E(ε_t) = 0, (b) Var(ε_t) = E(ε_t²) = σ² < ∞, and (c) for all s ≠ t, E(ε_t ε_s) = 0.

Remark 8.1 (A white noise process is not necessarily i.i.d.)
- Consider ε_t ∼ N(0, 1 + 0.5ε²_{t-1}). We have:

  E(ε_t) = E[E(ε_t|ε_{t-1})] = 0 \quad \text{and} \quad Var(ε_t|ε_{t-1}) = E(ε_t²|ε_{t-1}) = 1 + 0.5ε²_{t-1}.

- By the law of total variance (Prop. 1.3):

  Var(ε_t) = Var[E(ε_t|ε_{t-1})] + E[Var(ε_t|ε_{t-1})] = 0 + E(1 + 0.5ε²_{t-1}).

- Therefore Var(ε_t) = 2 < ∞ (the unconditional variance v solves v = 1 + 0.5v). It can further be shown that E(ε_t ε_s) = 0 if s ≠ t. Hence ε_t is a white noise. But the ε_t's are not independent, for instance because E(ε_t²|ε_{t-1}) ≠ E(ε_t²) = Var(ε_t).
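A quick simulation sketch of this process in R: the levels ε_t look serially uncorrelated, while the squares ε_t² are clearly autocorrelated, which is exactly what rules out independence.

# Minimal sketch: a white noise that is not i.i.d.
# eps_t | eps_{t-1} ~ N(0, 1 + 0.5 * eps_{t-1}^2)
set.seed(5)
T   <- 10000
eps <- numeric(T)
for (t in 2:T) eps[t] <- rnorm(1, sd = sqrt(1 + 0.5 * eps[t - 1]^2))

var(eps)                          # close to 2, as derived above
acf(eps,   plot = FALSE)$acf[2]   # lag-1 autocorrelation of eps:   approx. 0
acf(eps^2, plot = FALSE)$acf[2]   # lag-1 autocorrelation of eps^2: clearly > 0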
- In the general case, the expectation and the variance of y_t may depend on time t.
- However, in the following, we will focus on those processes y_t whose first two moments do not depend on time.
- This class of processes already allows us to accommodate the features of a wide range of time-varying variables.
- These processes are said to be covariance-stationary.

Remark 8.2 (Unconditional and conditional moments)
- When not stated otherwise, "expectation" means "unconditional expectation", or "marginal expectation" (i.e. E(y_t)).
- As we will see later, in general:

  \underbrace{E(y_t)}_{\text{unconditional expectation}} \ne \underbrace{E(y_t|y_{t-1}, y_{t-2}, \dots)}_{\text{conditional expectation}}.
Definition 8.3 (Covariance-stationarity)
The process y_t is covariance-stationary if, for all t and j,

  E(y_t) = µ \quad \text{and} \quad E[(y_t - µ)(y_{t-j} - µ)] = γ_j < ∞.

Proposition 8.1
If y_t is covariance-stationary, then γ_j = γ_{-j}.

Proof. Since y_t is covariance-stationary, the covariance between y_t and y_{t-j} (i.e. γ_j) is the same as that between y_{t+j} and y_{t+j-j} (i.e. γ_{-j}). \blacksquare
Figure 53: U.S. GDP growth
[Figure: annual GDP (log) growth.]
Blue solid line: sample mean; blue dashed lines: sample mean ± 2 standard deviations.
Figure 54: First-order auto-covariance
[Figure: scatter of Growth[t] against Growth[t-1].]
The slope of the blue line is γ̂_1 / V̂ar(growth_t) (see bivariate OLS case 4.5).
Figure 55: Second-order auto-covariance
[Figure: scatter of Growth[t] against Growth[t-2].]
The slope of the blue line is γ̂_2 / V̂ar(growth_t) (see bivariate OLS case 4.5).
Figure 56: Fifth-order auto-covariance
[Figure: scatter of Growth[t] against Growth[t-5].]
The slope of the blue line is γ̂_5 / V̂ar(growth_t) (see bivariate OLS case 4.5).
Figure 57: An example of nonstationary process: $y_t = 0.05 \times t + \varepsilon_t$

[Figure: simulated trajectory trending upward, from about 0 to about 12, over 200 periods.]

222 / 295
- It is very common to work with auto-correlated time series.
- Auto-correlation affects, among many other things, the estimation of the moments of a variable.
- Typically, assume you have a sample of observations of a time series $y_t$ that, you know, is covariance-stationary.
- If you want to estimate $\mu = E(y_t)$, you will naturally compute $\bar{y}_T = \frac{1}{T}\sum_{t=1}^{T} y_t$.
- If you want to compute a confidence interval for $\mu$, you are tempted to use the CLT (Theorem 9.6).
- But this is inappropriate, because your $y_t$'s are not i.i.d.
- Intuitively, the more auto-correlated your $y_t$'s are, the less precise your estimate of $\mu$ (see next slide).
- There exists an "adapted" version of the CLT for time series processes. This theorem (8.1 below) involves the sequence of autocovariances $\{\gamma_j\}_{j \in ]-\infty,\infty[}$. Both theorems (9.6 and 8.1) coincide when $\gamma_j = 0$ for all $j \neq 0$, i.e. when $y_t$ is a white noise (Def. 8.2).

223 / 295
Figure 58: Processes with same first two unconditional moments

[Figure: three simulated paths of length 100, labelled Case A, Case B and Case C.]

Note: The three samples have been simulated using the following data generating process:
$$y_t = \mu + \rho(y_{t-1} - \mu) + \sqrt{1 - \rho^2}\,\varepsilon_t, \quad \text{where } \varepsilon_t \sim \mathcal{N}(0, 1).$$
Case A: $\rho = 0$; Case B: $\rho = 0.9$; Case C: $\rho = 0.99$.
In the three cases, $E(y_t) = \mu = 2$ and $Var(y_t) = 1$.

Web-interface illustration (use the "AR(1)" panel)

224 / 295
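As an illustration, here is a minimal Python sketch of this data generating process (the function name, seed, and use of numpy are mine, not part of the slides; drawing $y_0$ from the stationary distribution keeps the two moments exact at every date):

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed

def simulate_ar1(mu, rho, T=100):
    """y_t = mu + rho*(y_{t-1} - mu) + sqrt(1 - rho^2)*eps_t, eps_t ~ N(0,1).
    By construction E(y_t) = mu and Var(y_t) = 1 for any |rho| < 1."""
    y = np.empty(T)
    y[0] = mu + rng.standard_normal()  # y_0 drawn from the stationary N(mu, 1)
    for t in range(1, T):
        y[t] = mu + rho * (y[t - 1] - mu) \
               + np.sqrt(1 - rho**2) * rng.standard_normal()
    return y

for rho in (0.0, 0.9, 0.99):  # Cases A, B, C
    y = simulate_ar1(mu=2.0, rho=rho)
    print(f"rho={rho}: sample mean={y.mean():.2f}, sample var={y.var():.2f}")
```

Despite identical population moments, the sample mean tends to drift away from 2 as $\rho$ grows; this is exactly the loss of precision discussed above.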
Theorem 8.1 (Limit theorem for covar. statio. processes)

If process $y_t$ is covariance-stationary and if the sequence of autocovariances is absolutely summable ($\sum_{j=-\infty}^{+\infty} |\gamma_j| < \infty$), then:
$$\bar{y}_T \overset{p}{\to} \mu = E(y_t) \tag{59}$$
$$\lim_{T \to +\infty} T\, E[(\bar{y}_T - \mu)^2] = \sum_{j=-\infty}^{+\infty} \gamma_j \tag{60}$$
$$\sqrt{T}(\bar{y}_T - \mu) \overset{d}{\to} \mathcal{N}\Big(0, \sum_{j=-\infty}^{+\infty} \gamma_j\Big). \tag{61}$$

Proof. Extra online material. ∎

225 / 295
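A quick Monte Carlo check of Eq. (60), as a sketch (the parametrisation is mine: an AR(1) $y_t = \phi y_{t-1} + \varepsilon_t$ with $\phi = 0.6$ and $\sigma = 1$, for which $\sum_j \gamma_j = \sigma^2/(1-\phi)^2 = 6.25$; `scipy.signal.lfilter` starts the recursion from a zero state, so the match is only approximate):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
phi, T, N = 0.6, 1_000, 5_000
eps = rng.standard_normal((N, T))
# y_t = phi*y_{t-1} + eps_t, simulated N times (zero initial state)
y = lfilter([1.0], [1.0, -phi], eps, axis=1)
ybar = y.mean(axis=1)
# T * Var(ybar_T) should be close to sum_j gamma_j = 1/(1-phi)^2 = 6.25
print(T * ybar.var())
```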
Definition 8.4 (Long-run variance)

Under the assumptions of Theorem 8.1, the limit appearing in Eq. (60) exists and is called long-run variance. It is denoted by $S$, i.e.:
$$S = \lim_{T \to +\infty} T\, E[(\bar{y}_T - \mu)^2].$$

- A natural estimator is:
$$\hat{\gamma}_0 + 2\sum_{\nu=1}^{q} \hat{\gamma}_\nu, \tag{62}$$
where $\hat{\gamma}_\nu = \frac{1}{T}\sum_{t=\nu+1}^{T} (y_t - \bar{y})(y_{t-\nu} - \bar{y})$.
- Problem: in small samples, using Eq. (62) may result in a negative figure...

Definition 8.5 (Newey-West estimator of S)

Newey and West (1987) have proposed the following (nonnegative) consistent estimator for the long-run variance of $y_t$:
$$S^{NW} = \hat{\gamma}_0 + 2\sum_{\nu=1}^{q} \Big(1 - \frac{\nu}{q+1}\Big)\hat{\gamma}_\nu.$$

226 / 295
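A direct transcription of Eq. (62) and of Definition 8.5 in Python, as a sketch (the function name and the choice of the truncation parameter q are mine):

```python
import numpy as np

def long_run_variance(y, q, newey_west=True):
    """Estimate S = gamma_0 + 2*sum_{v>=1} gamma_v, truncating at lag q.
    With newey_west=True, the Bartlett weights (1 - v/(q+1)) of Def. 8.5
    are applied, which guarantees a nonnegative estimate."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()
    gamma = lambda v: (d[v:] * d[:T - v]).sum() / T  # hat gamma_v of Eq. (62)
    w = lambda v: 1.0 - v / (q + 1.0) if newey_west else 1.0
    return gamma(0) + 2.0 * sum(w(v) * gamma(v) for v in range(1, q + 1))
```

Setting `newey_west=False` gives the "natural" estimator of Eq. (62), which can indeed turn negative in small samples.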
Difference equations and Auto-Regressive (AR) processes

- Autoregressive processes constitute an important class of time series processes.
- These processes are flexible and can reproduce the dynamics of a wide range of time-varying variables.
- Advantage: an AR process is defined through a few parameters.
- If one is interested in a particular variable and its dynamics, one can look for an AR specification that is consistent with the considered variable and estimate the parameters of the AR process (see Slide 243).
- Once one has estimated the parameters, the AR model can for instance be used to forecast the variable at hand and to calculate associated confidence intervals (see Slide 254).
- An AR process is defined by means of a difference equation (Def. 8.6 below).

227 / 295
Definition 8.6 (Difference equation)

A linear p-th order difference equation is of the form:
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t, \tag{63}$$
where $\varepsilon_t$ is a sequence of shocks.

- It will prove convenient to introduce the following notations:
$$F = \begin{bmatrix} \phi_1 & \phi_2 & \dots & \phi_{p-1} & \phi_p \\ 1 & 0 & \dots & 0 & 0 \\ 0 & 1 & \dots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix}, \quad \xi_t = \begin{bmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \mathbf{y}_t = \begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix}. \tag{64}$$
- We have:
$$\mathbf{y}_t = F\mathbf{y}_{t-1} + \xi_t \tag{65}$$
- and:
$$\mathbf{y}_{t+j} = \xi_{t+j} + F\xi_{t+j-1} + F^2\xi_{t+j-2} + \dots + F^{j-1}\xi_{t+1} + F^j\mathbf{y}_t.$$

228 / 295
Definition 8.7 (Dynamic multiplier)

The dynamic multiplier of $y_{t+j}$ w.r.t. $\varepsilon_t$ is $\dfrac{\partial y_{t+j}}{\partial \varepsilon_t}$.

- We have: $\dfrac{\partial y_{t+j}}{\partial \varepsilon_t} = (F^j)_{[1,1]}$.
- Let us assume that the eigenvalues of $F$ (see Def. 9.2), denoted by $\lambda_1, \dots, \lambda_p$, are distinct.
- Then, there exists a nonsingular matrix $P$ such that:
$$F = P \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ & & \ddots & \\ 0 & \dots & 0 & \lambda_p \end{bmatrix} P^{-1} = PDP^{-1}.$$
- We have:
$$F^j = P \begin{bmatrix} \lambda_1^j & 0 & \dots & 0 \\ 0 & \lambda_2^j & \dots & 0 \\ & & \ddots & \\ 0 & \dots & 0 & \lambda_p^j \end{bmatrix} P^{-1}.$$

229 / 295
- Let us introduce $z_t = P^{-1}\mathbf{y}_t$ and $\eta_t = P^{-1}\xi_t$. Eq. (65) becomes:
$$z_t = Dz_{t-1} + \eta_t. \tag{66}$$
- The i-th of the p equations of Eq. (66) reads $z_{i,t} = \lambda_i z_{i,t-1} + \eta_{i,t}$.
- We have:
$$z_{i,t+h} = \lambda_i z_{i,t+h-1} + \eta_{i,t+h} = \lambda_i(\lambda_i z_{i,t+h-2} + \eta_{i,t+h-1}) + \eta_{i,t+h} = \eta_{i,t+h} + \lambda_i\eta_{i,t+h-1} + \lambda_i^2\eta_{i,t+h-2} + \dots + \lambda_i^{h-1}\eta_{i,t+1} + \lambda_i^h z_{i,t}.$$
- If $\varepsilon_t$ is a white noise sequence, then $\eta_{i,t}$ also is. Let us denote by $\sigma_i^2$ the variance of $\eta_{i,t}$. We have:
$$Var\big[\eta_{i,t+h} + \lambda_i\eta_{i,t+h-1} + \dots + \lambda_i^{h-1}\eta_{i,t+1}\big] = \sigma_i^2\big(1 + \lambda_i^2 + \dots + \lambda_i^{2(h-1)}\big).$$
- For $h \to +\infty$, this series converges iff $|\lambda_i| < 1$.
- If $|\lambda_i| > 1$, $z_{i,t}$ "explodes".

230 / 295
- If one of the $\lambda_i$'s is such that $|\lambda_i| > 1$, then $F^j = PD^jP^{-1}$ "explodes" when $j$ gets large, and $\Big|\dfrac{\partial y_{t+j}}{\partial \varepsilon_t}\Big| \underset{j \to \infty}{\to} \infty$.

Remark 8.3 (The eigenvalues of F)

- It can be shown that the eigenvalues of $F$ are the solutions of:
$$\lambda^p - \phi_1\lambda^{p-1} - \dots - \phi_{p-1}\lambda - \phi_p = 0. \tag{67}$$
- Hence, the process defined by Eq. (63) is "not stable" (explodes) if the roots of the previous equation lie outside the unit circle.
- "Stability" is related to the formal notion of covariance-stationarity (Def. 8.3).

231 / 295
Example 8.2

- Consider the 2nd-order difference equation:
$$y_t = 0.9y_{t-1} - 0.2y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, 1).$$
- The eigenvalues of $F$ are 0.5 and 0.4.

[Figure: left panel, dynamic multipliers $\partial y_{t+j}/\partial\varepsilon_t$ decaying monotonically from 1 towards 0 for $j = 0, \dots, 20$; right panel, a simulated sample of length 100.]

Web-interface illustration (use the "ARMA(p,q)" panel, with q=0)

232 / 295
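These eigenvalues and dynamic multipliers are easy to reproduce numerically; a sketch (numpy only, variable names mine):

```python
import numpy as np

phi1, phi2 = 0.9, -0.2             # Example 8.2 coefficients
F = np.array([[phi1, phi2],
              [1.0,  0.0]])        # companion matrix of Eq. (64)
print(np.linalg.eigvals(F))        # -> [0.5, 0.4]

# dynamic multipliers d y_{t+j} / d eps_t = (F^j)[1,1] (top-left entry)
Fj = np.eye(2)
for j in range(6):
    print(j, round(Fj[0, 0], 4))   # 1, 0.9, 0.61, 0.369, ...
    Fj = Fj @ F
```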
Example 8.3

- Consider the 2nd-order difference equation:
$$y_t = 1.1y_{t-1} - 0.3y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, 1).$$
- The eigenvalues of $F$ are 0.6 and 0.5.

[Figure: left panel, dynamic multipliers for $j = 0, \dots, 20$; right panel, a simulated sample of length 100.]

Web-interface illustration (use the "ARMA(p,q)" panel, with q=0)

233 / 295
Example 8.4

- Consider the 2nd-order difference equation:
$$y_t = 1.4y_{t-1} - 0.7y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, 1).$$
- The eigenvalues of $F$ are complex conjugates: $0.7 \pm 0.46i$ (modulus $\sqrt{0.7} \approx 0.84 < 1$).

[Figure: left panel, oscillating dynamic multipliers for $j = 0, \dots, 20$; right panel, a simulated sample of length 100.]

Web-interface illustration (use the "ARMA(p,q)" panel, with q=0)

234 / 295
Example 8.5

- Consider the 2nd-order difference equation:
$$y_t = 0.9y_{t-1} + 0.2y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, 1).$$
- The eigenvalues of $F$ are 1.084 and −0.184.

[Figure: left panel, exploding dynamic multipliers for $j = 0, \dots, 20$; right panel, a simulated sample of length 100 that diverges (reaching values above 15000).]

Web-interface illustration (use the "ARMA(p,q)" panel, with q=0)

235 / 295
Definition 8.8 (First-order AR process)

Process $y_t$ is an auto-regressive process of order one (AR(1)) if:
$$y_t = c + \phi y_{t-1} + \varepsilon_t,$$
where $\{\varepsilon_t\}_{t \in ]-\infty,+\infty[}$ is a white noise process (see Def. 8.2).

- If $|\phi| \geq 1$, $y_t$ is "explosive" (see Slide 230). Its variance does not exist; it is not stationary (see Def. 8.3).
- If $|\phi| < 1$, one can see that:
$$y_t = c + \varepsilon_t + \phi(c + \varepsilon_{t-1}) + \phi^2(c + \varepsilon_{t-2}) + \dots + \phi^k(c + \varepsilon_{t-k}) + \dots$$
- If $|\phi| < 1$, the unconditional mean and variance of $y_t$ are:
$$E(y_t) = \frac{c}{1-\phi} =: \mu, \qquad Var(y_t) = \frac{\sigma^2}{1-\phi^2}.$$

Web-interface illustration (use the "AR(1)" panel)

236 / 295
- j-th autocovariance:
$$E([y_t - \mu][y_{t-j} - \mu]) = E\big([\varepsilon_t + \phi\varepsilon_{t-1} + \phi^2\varepsilon_{t-2} + \dots + \phi^j\varepsilon_{t-j} + \phi^{j+1}\varepsilon_{t-j-1} + \dots] \times [\varepsilon_{t-j} + \phi\varepsilon_{t-j-1} + \phi^2\varepsilon_{t-j-2} + \dots + \phi^k\varepsilon_{t-j-k} + \dots]\big)$$
$$= E(\phi^j\varepsilon_{t-j}^2 + \phi^{j+2}\varepsilon_{t-j-1}^2 + \phi^{j+4}\varepsilon_{t-j-2}^2 + \dots) = \frac{\phi^j\sigma^2}{1-\phi^2}.$$
- Therefore $\rho_j = \phi^j$.
- By what precedes, we have:

Proposition 8.2 (Covariance-stationarity of an AR(1) process)

The AR(1) process defined in Def. 8.8 is covariance-stationary (see Def. 8.3) iff $|\phi| < 1$.

Web-interface illustration (use the "AR(1)" panel)

237 / 295
Figure 59: AR(1) process, $y_t = 0.6y_{t-1} + \varepsilon_t$

[Figure: left panel, a simulated path of length 100; right panel, scatter plot of $y(t)$ against $y(t-1)$.]

Web-interface illustration (use the "AR(1)" panel)

238 / 295
Figure 60: AR(1) process, $y_t = 0.95y_{t-1} + \varepsilon_t$

[Figure: left panel, a simulated path of length 100, markedly more persistent than in Figure 59; right panel, scatter plot of $y(t)$ against $y(t-1)$.]

Web-interface illustration (use the "AR(1)" panel)

239 / 295
Definition 8.9 (p-th-order AR process)

Process $y_t$ follows a p-th-order autoregressive process (AR(p)) if:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t, \tag{68}$$
where $\{\varepsilon_t\}_{t \in ]-\infty,+\infty[}$ is a white noise process (see Def. 8.2).

Proposition 8.3 (Covariance-stationarity of an AR(p) process)

The AR(p) process defined in Def. 8.9 is covariance-stationary iff (i) the eigenvalues of $F$ (as defined by Eq. 64) lie within the unit circle, i.e. iff (ii) the roots of
$$\lambda^p - \phi_1\lambda^{p-1} - \dots - \phi_{p-1}\lambda - \phi_p = 0 \tag{69}$$
lie within the unit circle.

Proof. The fact that (i) is equivalent to (ii) is Remark 8.3. Slide 230 provides the idea of the proof in the case where $F$ can be diagonalized. ∎

- If the AR(p) defined in Def. 8.9 is covariance-stationary, we have $\mu = E(y_t) = \dfrac{c}{1 - \phi_1 - \dots - \phi_p}$.
- The autocovariances are derived from the Yule-Walker equations (see next slide).

240 / 295
Computation of the autocovariances of an AR(p) process

- To compute the autocovariances of $y_t$ (assumed to be covariance-stationary), let's rewrite Eq. (68):
$$(y_t - \mu) = \phi_1(y_{t-1} - \mu) + \phi_2(y_{t-2} - \mu) + \dots + \phi_p(y_{t-p} - \mu) + \varepsilon_t.$$
- Multiplying both sides by $y_{t-j} - \mu$ and taking expectations leads to the Yule-Walker equations:
$$\gamma_j = \begin{cases} \phi_1\gamma_{j-1} + \phi_2\gamma_{j-2} + \dots + \phi_p\gamma_{j-p} & \text{if } j > 0 \\ \phi_1\gamma_1 + \phi_2\gamma_2 + \dots + \phi_p\gamma_p + \sigma^2 & \text{for } j = 0. \end{cases} \tag{70}$$
- Using $\gamma_j = \gamma_{-j}$ (Prop. 8.1), one can solve for $(\gamma_0, \gamma_1, \dots, \gamma_p)$ as functions of $(\sigma^2, \phi_1, \dots, \phi_p)$.

241 / 295
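Rather than solving Eq. (70) by hand, one convenient numerical route (a sketch of mine, not from the slides) uses the companion form of Eq. (65): the variance $\Gamma$ of $\mathbf{y}_t$ solves $\Gamma = F\Gamma F' + \Sigma_\xi$, a linear system in the elements of $\Gamma$; higher-order autocovariances then follow from the first line of Eq. (70):

```python
import numpy as np

def ar_autocovariances(phi, sigma2, n_lags):
    """gamma_0 .. gamma_{n_lags} of a covariance-stationary AR(p)."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    F = np.zeros((p, p)); F[0, :] = phi
    if p > 1:
        F[1:, :-1] = np.eye(p - 1)               # companion matrix, Eq. (64)
    Sxi = np.zeros((p, p)); Sxi[0, 0] = sigma2   # Var(xi_t)
    # vec(Gamma) = (I - F kron F)^{-1} vec(Sigma_xi)
    G = np.linalg.solve(np.eye(p * p) - np.kron(F, F), Sxi.ravel()).reshape(p, p)
    gam = list(G[0, :])                          # gamma_0 .. gamma_{p-1}
    while len(gam) <= n_lags:                    # Yule-Walker recursion, Eq. (70)
        gam.append(sum(phi[i] * gam[-1 - i] for i in range(p)))
    return np.array(gam[:n_lags + 1])

print(ar_autocovariances([0.6], 1.0, 3))  # AR(1): gamma_j = 0.6**j / (1 - 0.36)
```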
Figure 61: AR(3) process, $y_t = 1.5y_{t-1} - 0.4y_{t-2} - 0.2y_{t-3} + \varepsilon_t$

[Figure: left panel, a simulated path of length 100; right panel, scatter plot of $y(t)$ against $y(t-1)$.]

Web-interface illustration (use the "AR(1)" panel)

242 / 295
MLE of AR processes

Context and Objectives

- Context:
$$y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t, \tag{71}$$
where $\{\varepsilon_t\}$ is an i.i.d. white noise process (see Def. 8.2).
- Objective: to estimate the vector of population parameters $\theta$, where:
$$\theta = [c, \phi_1, \dots, \phi_p, \sigma^2].$$
- Observations: $\mathbf{y} = [y_1, \dots, y_T]'$.
- Probability density function (p.d.f.): $f_{Y_T,\dots,Y_1}(y_T, \dots, y_1; \theta)$.
- Log-likelihood function: $\log \mathcal{L}(\theta; \mathbf{y}) = \log f_{Y_T,\dots,Y_1}(y_T, \dots, y_1; \theta)$.
- Maximum likelihood estimate of $\theta$ (see Section 6):
$$\theta^{MLE} = \arg\max_\theta \mathcal{L}(\theta; \mathbf{y}) = \arg\max_\theta \log \mathcal{L}(\theta; \mathbf{y}).$$

243 / 295
- We use Bayes' formula:
$$P(x_2 = x, x_1 = y) = P(x_2 = x \,|\, x_1 = y)\,P(x_1 = y)\dots$$
- ... to get this key decomposition of our likelihood function:
$$f_{Y_T,\dots,Y_1}(y_T, \dots, y_1; \theta) = f_{Y_T|Y_{T-1},\dots,Y_1}(y_T, \dots, y_1; \theta) \times f_{Y_{T-1},\dots,Y_1}(y_{T-1}, \dots, y_1; \theta).$$
- Recursive application:
$$f_{Y_T,\dots,Y_1}(y_T, \dots, y_1; \theta) = f_{Y_1}(y_1; \theta)\prod_{t=2}^{T} f_{Y_t|Y_{t-1},\dots,Y_1}(y_t, \dots, y_1; \theta).$$
- Log-likelihood:
$$\log \mathcal{L}(\theta; \mathbf{y}) = \log f_{Y_1}(y_1; \theta) + \sum_{t=2}^{T}\log f_{Y_t|Y_{t-1},\dots,Y_1}(y_t, \dots, y_1; \theta).$$

244 / 295
MLE of a Gaussian AR(1) process

- For $t > 1$:
$$f_{Y_t|Y_{t-1},\dots,Y_1}(y_t, \dots, y_1; \theta) = f_{Y_t|Y_{t-1}}(y_t, y_{t-1}; \theta).$$
- If $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$ and $y_t = c + \phi_1 y_{t-1} + \varepsilon_t$:
$$f_{Y_t|Y_{t-1}}(y_t, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y_t - c - \phi_1 y_{t-1})^2}{2\sigma^2}\right).$$
- What about $f_{Y_1}(y_1; \theta)$? We may use the marginal distribution of $y_t$. A second possibility (a little less efficient but more convenient) is to "sacrifice" this first observation and to consider the approximate likelihood:
$$\log \mathcal{L}^*(\theta; \mathbf{y}) = \log \mathcal{L}(\theta; \mathbf{y}) - \log f_{Y_1}(y_1; \theta) = \sum_{t=2}^{T}\log f_{Y_t|Y_{t-1},\dots,Y_1}(y_t, \dots, y_1; \theta). \tag{72}$$
- Exact and approximate MLE have the same large-sample distribution (because, if the process is stationary, $f_{Y_1}(y_1; \theta)$ makes a relatively negligible contribution to $\log \mathcal{L}(\theta; \mathbf{y})$).
- Practical advantage of the conditional MLE: simply obtained by OLS.

245 / 295
MLE of a Gaussian AR(1) process

- With the notations:
$$Y = \begin{bmatrix} y_2 \\ \vdots \\ y_T \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} 1 & y_1 \\ \vdots & \vdots \\ 1 & y_{T-1} \end{bmatrix},$$
Eq. (72) becomes:
$$\log \mathcal{L}^*(\theta; \mathbf{y}) = -\frac{T-1}{2}\log(2\pi) - (T-1)\log(\sigma) - \frac{1}{2\sigma^2}(Y - X[c, \phi_1]')'(Y - X[c, \phi_1]'), \tag{73}$$
which is maximised for:
$$[\hat{c}, \hat{\phi}_1]' = (X'X)^{-1}X'Y \tag{74}$$
$$\hat{\sigma}^2 = \frac{1}{T-1}\sum_{t=2}^{T}(y_t - \hat{c} - \hat{\phi}_1 y_{t-1})^2 = \frac{1}{T-1}\,Y'(I - X(X'X)^{-1}X')Y. \tag{75}$$

246 / 295
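The conditional (approximate) MLE of Eqs. (74)-(75) in Python, as a sketch (function name mine):

```python
import numpy as np

def ar1_conditional_mle(y):
    """OLS of y_t on (1, y_{t-1}) for t = 2..T, i.e. Eqs. (74)-(75)."""
    y = np.asarray(y, dtype=float)
    Y = y[1:]                                       # y_2, ..., y_T
    X = np.column_stack([np.ones(len(Y)), y[:-1]])  # rows [1, y_{t-1}]
    c_hat, phi_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # Eq. (74)
    resid = Y - X @ np.array([c_hat, phi_hat])
    sigma2_hat = resid @ resid / (len(y) - 1)       # Eq. (75): divide by T-1
    return c_hat, phi_hat, sigma2_hat
```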
MLE of a Gaussian AR(p) process

- For an AR(p) process, we have:
$$\log \mathcal{L}(\theta; \mathbf{y}) = \log f_{Y_p,\dots,Y_1}(y_p, \dots, y_1; \theta) + \underbrace{\sum_{t=p+1}^{T}\log f_{Y_t|Y_{t-1},\dots,Y_{t-p}}(y_t, \dots, y_{t-p}; \theta)}_{\log \mathcal{L}^*(\theta;\, \mathbf{y})}.$$
- Maximisation of the approximate log-likelihood $\log \mathcal{L}^*(\theta; \mathbf{y})$: OLS, using Eqs. (74) and (75) with:
$$Y = \begin{bmatrix} y_{p+1} \\ \vdots \\ y_T \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} 1 & y_p & \dots & y_1 \\ \vdots & \vdots & & \vdots \\ 1 & y_{T-1} & \dots & y_{T-p} \end{bmatrix}.$$
- For stationary processes, approximate and exact MLE have the same large-sample distribution.

247 / 295
MLE of a Gaussian AR(p) process

Remark 8.4

- The regression is:
$$y_t = \beta'x_t + \varepsilon_t, \tag{76}$$
where $x_t = [1, y_{t-1}, \dots, y_{t-p}]'$ and $\beta = [c, \phi_1, \dots, \phi_p]'$.
- The fact that $x_t$ correlates with the $\varepsilon_s$ for $s < t$ implies that the OLS estimator $b$ of $\beta$ is biased in small samples.

Proof (of the second point in Remark 8.4). We have:
$$b = \beta + (X'X)^{-1}X'\varepsilon. \tag{77}$$
Because of the specific form of $X$, we have non-zero correlation between $x_t$ and $\varepsilon_s$ for $s < t$, therefore $E[(X'X)^{-1}X'\varepsilon] \neq 0$. ∎

248 / 295
MLE of a Gaussian AR(p) process

Proposition 8.4 (Large-sample properties of the OLS estimator)

Assume $\{y_t\}$ follows the AR(p) process:
$$y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t,$$
where $\{\varepsilon_t\}$ is an i.i.d. white noise process. If $b$ is the OLS estimator of $\beta$ of Eq. (76), we have:
$$\sqrt{T}(b - \beta) = \underbrace{\left[\frac{1}{T}\sum_{t=p}^{T} x_t x_t'\right]^{-1}}_{\overset{p}{\to}\, Q^{-1}}\;\underbrace{\sqrt{T}\left[\frac{1}{T}\sum_{t=1}^{T} x_t\varepsilon_t\right]}_{\overset{d}{\to}\, \mathcal{N}(0,\,\sigma^2 Q)},$$
where $Q = \text{plim}\,\frac{1}{T}\sum_{t=p}^{T} x_t x_t' = \text{plim}\,\frac{1}{T}\sum_{t=1}^{T} x_t x_t'$ is given by:
$$Q = \begin{bmatrix} 1 & \mu & \mu & \dots & \mu \\ \mu & \gamma_0 + \mu^2 & \gamma_1 + \mu^2 & \dots & \gamma_{p-1} + \mu^2 \\ \mu & \gamma_1 + \mu^2 & \gamma_0 + \mu^2 & \dots & \gamma_{p-2} + \mu^2 \\ \vdots & \vdots & \vdots & & \vdots \\ \mu & \gamma_{p-1} + \mu^2 & \gamma_{p-2} + \mu^2 & \dots & \gamma_0 + \mu^2 \end{bmatrix}. \tag{78}$$

249 / 295
MLE of a Gaussian AR(p) process

Proof (of Proposition 8.4). Rearranging Eq. (77), we have:
$$\sqrt{T}(b - \beta) = (X'X/T)^{-1}\sqrt{T}(X'\varepsilon/T).$$
Let us consider the autocovariances of $v_t = x_t\varepsilon_t$, denoted by $\gamma_j^v$. Using the fact that $x_t$ is a linear combination of past $\varepsilon_t$'s and that $\varepsilon_t$ is a white noise, we get that $E(\varepsilon_t x_t) = 0$. Therefore
$$\gamma_j^v = E(\varepsilon_t\varepsilon_{t-j}x_t x_{t-j}').$$
If $j > 0$, we have $E(\varepsilon_t\varepsilon_{t-j}x_t x_{t-j}') = E(E[\varepsilon_t\varepsilon_{t-j}x_t x_{t-j}' \,|\, \varepsilon_{t-j}, x_t, x_{t-j}]) = E(\varepsilon_{t-j}x_t x_{t-j}'\,E[\varepsilon_t \,|\, \varepsilon_{t-j}, x_t, x_{t-j}]) = 0$. Note that we have $E[\varepsilon_t \,|\, \varepsilon_{t-j}, x_t, x_{t-j}] = 0$ because $\{\varepsilon_t\}$ is an i.i.d. white noise sequence (see Remark 1.1). If $j = 0$, we have:
$$\gamma_0^v = E(\varepsilon_t^2 x_t x_t') = E(\varepsilon_t^2)E(x_t x_t') = \sigma^2 Q.$$
The convergence in distribution of $\sqrt{T}(X'\varepsilon/T) = \sqrt{T}\frac{1}{T}\sum_{t=1}^{T} v_t$ results from Theorem 8.1 (applied to $v_t = x_t\varepsilon_t$), using the $\gamma_j^v$ computed above. ∎

250 / 295
Information Criteria and Lag length selection

- Compromise:
  - Not enough lags: potential omission of valuable information contained in older lags.
  - Too many lags: "overfitting"/misspecification implying additional estimation errors (in forecasts).
- Different possibilities:
  - F-statistics approach: for an AR model,
    (a) Start with a large number of lags ($p_0$).
    (b) If the t-statistic associated with the $p_0$-th lag is low, consider a model with $p_0 - 1$ lags (i.e. go back to (a) with $p_0 - 1$ instead of $p_0$).
  - Information criteria: AIC, BIC.
    Idea: minimisation of a loss function equal to the sum of
    (a) the sum of squared residuals (SSR) and
    (b) a penalty term depending (positively) on the number of parameters.

251 / 295
Definition 8.10 (Information criteria)

The Akaike (AIC), Hannan-Quinn (HQ) and Schwarz (BIC) information criteria are of the form
$$c^{(i)}(k) = \underbrace{\frac{-2\log \mathcal{L}(\hat{\theta}_T(k); \mathbf{y})}{T}}_{\text{decreases w.r.t. } k} + \underbrace{\frac{k\,\varphi^{(i)}(T)}{T}}_{\text{increases w.r.t. } k},$$
with $(i) \in \{AIC, HQ, BIC\}$ and where $\hat{\theta}_T(k)$ denotes the ML estimate of $\theta_0(k)$, a vector of parameters of length $k$. For $q \geq 0$, the model specified by $k$ parameters is a particular case of a model of $k + q$ parameters.

Criterion        (i)    φ^(i)(T)
Akaike           AIC    2
Hannan-Quinn     HQ     2 log(log(T))
Schwarz          BIC    log(T)

The lag suggested by criterion (i) is given by:
$$\hat{k}^{(i)} = \underset{k}{\text{argmin}}\; c^{(i)}(k).$$

252 / 295
Case of a normal linear regression

- We consider a linear regression with normal disturbances:
$$y_t = x_t'\beta + \varepsilon_t, \quad \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, \sigma^2).$$
The associated log-likelihood is of the form of Eq. (72).
- In that case, we have:
$$c^{(i)}(k) = \frac{-2\log \mathcal{L}(\hat{\theta}_T(k); \mathbf{y})}{T} + \frac{k\,\varphi^{(i)}(T)}{T} \approx \log(2\pi) + \log(\hat{\sigma}^2) + \frac{1}{T}\sum_{t=1}^{T}\frac{\varepsilon_t^2}{\hat{\sigma}^2} + \frac{k\,\varphi^{(i)}(T)}{T}.$$
- For a large $T$, for all consistent estimation schemes, we have:
$$\hat{\sigma}^2 \approx \frac{1}{T}\sum_{t=1}^{T}\varepsilon_t^2 = SSR/T.$$
- Hence $\hat{k}^{(i)} \approx \underset{k}{\text{argmin}}\;\log(SSR/T) + \dfrac{k\,\varphi^{(i)}(T)}{T}$.
- In the case of an ARMA(p,q) process, $k = 2 + p + q$.

253 / 295
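A sketch of the resulting lag-selection rule (assumptions mine: the same estimation sample $t = p_{\max}+1, \dots, T$ is used for all candidate lags so that the criteria are comparable, and $k = 2 + p$ for an AR(p)):

```python
import numpy as np

def select_lag(y, p_max, criterion="BIC"):
    """Return argmin_p of log(SSR/T) + k*phi(T)/T over p = 1..p_max."""
    y = np.asarray(y, dtype=float)
    T = len(y) - p_max                    # common effective sample size
    phi_T = {"AIC": 2.0,
             "HQ": 2.0 * np.log(np.log(T)),
             "BIC": np.log(T)}[criterion]
    best_p, best_c = None, np.inf
    for p in range(1, p_max + 1):
        Y = y[p_max:]
        X = np.column_stack([np.ones(T)] +
                            [y[p_max - j:len(y) - j] for j in range(1, p + 1)])
        b, ssr = np.linalg.lstsq(X, Y, rcond=None)[:2]
        c = np.log(ssr[0] / T) + (2 + p) * phi_T / T
        if c < best_c:
            best_p, best_c = p, c
    return best_p
```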
Forecasting with AR models

- Macroeconomic forecasts are produced in many places: public administrations (notably Treasuries), central banks, international institutions (e.g. IMF, OECD), banks, big firms.
- These institutions are interested in the point estimates (~ most likely value) of the variable of interest. They are also interested in measuring the uncertainty (~ dispersion of relatively likely outcomes) associated with the point estimates.

Available forecasts

- Philly Fed Survey of Professional Forecasters.
- ECB Survey of Professional Forecasters.
- IMF World Economic Outlook.
- OECD Global Economic Outlook.
- European Commission Economic Forecasts.

254 / 295
Figure 62: "Fan Chart" from the Bank of England

[Figure: two fan charts of CPI inflation projections (percentage increase in prices on a year earlier, 2011-19), conditioned on market interest rate expectations and £375 billion of purchased assets. The fan bands are constructed so that inflation is expected to lie within the fans on 90 out of 100 occasions.]

Source: Bank of England Inflation Report, Feb. 2016.

255 / 295
Context and Objectives

- Context: we have observations of $y_t$ and $x_t$ (potentially multi-dimensional) variables; $x_t$ may contain lagged values of $y_t$.
- Objective: forecasting $y_{t+1}$ based on a set of information $x_t$.
- A loss function is necessary to assess the forecasting method. Example: quadratic loss function:
$$\underbrace{E([y_{t+1} - y_{t+1}^*]^2)}_{\text{Mean square error (MSE)}}$$
where $y_{t+1}^*$ is the forecast of $y_{t+1}$ (function of $x_t$).

256 / 295
Theorem 8.2 (Smallest MSE)

The smallest MSE is obtained with the expectation of $y_{t+1}$ conditional on $x_t$.

Proof (sketch of). Consider $y_{t+1}^* = g(x_t)$. We have:
$$E([y_{t+1} - y_{t+1}^*]^2) = E\big([\{y_{t+1} - E(y_{t+1}|x_t)\} + \{E(y_{t+1}|x_t) - y_{t+1}^*\}]^2\big).$$
Expand. Use the law of iterated expectations on the double product (conditioning on $x_t$); the latter is 0. It implies that $E([y_{t+1} - y_{t+1}^*]^2)$ is always larger than $E([y_{t+1} - E(y_{t+1}|x_t)]^2)$, with equality if $E(y_{t+1}|x_t) = y_{t+1}^*$. ∎

257 / 295
Forecasting an AR(p) process

- Consider the process:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t, \quad \text{with } \varepsilon_t \sim \text{i.i.d. } \mathcal{N}(0, \sigma^2).$$
- Using the notation of Eq. (64), we have:
$$\mathbf{y}_t - \boldsymbol{\mu} = F(\mathbf{y}_{t-1} - \boldsymbol{\mu}) + \xi_t.$$
Hence: $\mathbf{y}_{t+h} - \boldsymbol{\mu} = \xi_{t+h} + F\xi_{t+h-1} + \dots + F^{h-1}\xi_{t+1} + F^h(\mathbf{y}_t - \boldsymbol{\mu})$.
- Therefore:
$$E(\mathbf{y}_{t+h}|y_t, y_{t-1}, \dots) = \boldsymbol{\mu} + F^h(\mathbf{y}_t - \boldsymbol{\mu})$$
$$Var([\mathbf{y}_{t+h} - E(\mathbf{y}_{t+h}|y_t, y_{t-1}, \dots)]) = \Sigma + F\Sigma F' + \dots + F^{h-1}\Sigma(F^{h-1})',$$
where:
$$\Sigma = \begin{bmatrix} \sigma^2 & 0 & \dots \\ 0 & 0 & \\ \vdots & & \ddots \end{bmatrix}.$$

Web-interface illustration

258 / 295
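A sketch of these formulas (assumption mine: the state is ordered $[y_t, y_{t-1}, \dots, y_{t-p+1}]$ as in Eq. 64, and the parameters are taken as given):

```python
import numpy as np

def ar_forecast(y_recent, c, phi, sigma2, h):
    """Point forecast E(y_{t+h}|y_t, y_{t-1}, ...) and forecast-error variance
    for an AR(p); y_recent = [y_t, y_{t-1}, ..., y_{t-p+1}]."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    F = np.zeros((p, p)); F[0, :] = phi
    if p > 1:
        F[1:, :-1] = np.eye(p - 1)
    mu = c / (1.0 - phi.sum())
    Sigma = np.zeros((p, p)); Sigma[0, 0] = sigma2
    V = np.zeros((p, p)); Fk = np.eye(p)
    for _ in range(h):                  # V = Sigma + F Sigma F' + ... (h terms)
        V += Fk @ Sigma @ Fk.T
        Fk = Fk @ F                     # after the loop, Fk = F^h
    point = mu + (Fk @ (np.asarray(y_recent, dtype=float) - mu))[0]
    return point, V[0, 0]

# 95% interval under Gaussian shocks: point +/- 1.96 * sqrt(variance)
pt, var = ar_forecast([0.3, 0.1], c=0.0, phi=[0.9, -0.2], sigma2=1.0, h=4)
print(pt, pt - 1.96 * np.sqrt(var), pt + 1.96 * np.sqrt(var))
```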
Figure 63: Forecasts and 95% confidence intervals when shocks are Gaussian

[Figure: a simulated path up to the date of forecast (date t), followed by the point forecast $E_t(y_{t+h})$ and the bands $E_t(y_{t+h}) \pm 1.96\sqrt{Var_t(y_{t+h})}$, together with the unconditional mean $E(y_t)$.]

Web-interface illustration
Assessing the performances of a forecasting model

- Once one has fitted a model on a given set of data (of length T, say), one computes the MSE (mean square error) to evaluate the performance of the model.
- But this MSE is the in-sample one.
- It is easy to reduce the in-sample MSE. Typically, if the model is estimated by OLS, adding covariates mechanically reduces the MSE.
  ⇒ Even if additional data are "irrelevant", the R² of the regression increases.
- One also has to analyse the out-of-sample performances of the forecasting model:
  a. Estimate the model on a sample of reduced size (1, ..., T*, with T* < T).
  b. Use the remaining available periods (T* + 1, ..., T) to compute out-of-sample forecasting errors.

260 / 295
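A sketch of steps (a)-(b) for an AR(p) (assumptions mine: parameters are estimated once on $1, \dots, T^*$ and then held fixed; a recursive scheme would re-estimate at each date):

```python
import numpy as np

def out_of_sample_mse(y, p, T_star):
    """Fit an AR(p) by OLS on y_1..y_{T*}; return the MSE of one-step-ahead
    forecast errors over T*+1..T."""
    y = np.asarray(y, dtype=float)
    Y = y[p:T_star]
    X = np.column_stack([np.ones(len(Y))] +
                        [y[p - j:T_star - j] for j in range(1, p + 1)])
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    errors = []
    for t in range(T_star, len(y)):
        x_t = np.r_[1.0, y[t - p:t][::-1]]   # [1, y_{t-1}, ..., y_{t-p}]
        errors.append(y[t] - x_t @ b)
    return np.mean(np.square(errors))
```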
Figure 64: In-sample versus out-of-sample forecasting

[Figure: four panels (p = 1, 2, 4, 7), each showing U.S. GDP growth over 1970-2010 together with forecasts.]

Data: U.S. GDP growth. The blue lines are in-sample forecasts; they are based on AR(p) models estimated on the whole sample. The red lines are out-of-sample forecasts; they are based on reduced samples (from the first date to the date at which the red line begins).

261 / 295
- The previous (out-of-sample) exercise is a form of backtesting.
- Other forms of backtesting aim at testing whether the realisations of the variable are consistent with (previous) confidence intervals.
- Typically, if realised (ex post) variables tend to be far outside (ex ante) confidence intervals, we may deem the model inadequate.

Figure 65: Backtesting a forecasting approach

[Figure: two panels ("Dubious model" and "Reasonable model") showing U.S. GDP growth over 1970-2010 with model-implied 95% confidence intervals in red.]

262 / 295
Appendix

263 / 295
Definition 9.1 (Skewness and Kurtosis)

Let $Y$ be a random variable whose fourth moment exists. The expectation of $Y$ is denoted by $\mu$.
- The skewness of $Y$ is given by:
$$\frac{E[(Y-\mu)^3]}{\{E[(Y-\mu)^2]\}^{3/2}}.$$
- The kurtosis of $Y$ is given by:
$$\frac{E[(Y-\mu)^4]}{\{E[(Y-\mu)^2]\}^2}.$$

264 / 295
Definition 9.2 (Eigenvalues)

The eigenvalues of a matrix $M$ are the numbers $\lambda$ for which:
$$|M - \lambda I| = 0,$$
where $|\bullet|$ is the determinant operator.

Useful properties of the determinant $|\bullet|$:
- $|MN| = |M| \times |N|$.
- $|M^{-1}| = |M|^{-1}$.
- If $M$ admits the diagonal representation $M = TDT^{-1}$, where $D$ is a diagonal matrix whose diagonal entries are $\{\lambda_i\}_{i=1,\dots,n}$, then:
$$|M - \lambda I| = \prod_{i=1}^{n}(\lambda_i - \lambda).$$

265 / 295
Definition 9.3 (Moore-Penrose inverse)

If $M \in \mathbb{R}^{m \times n}$, then its Moore-Penrose pseudo-inverse (exists and) is the unique matrix $M^* \in \mathbb{R}^{n \times m}$ that satisfies:
(i) $MM^*M = M$
(ii) $M^*MM^* = M^*$
(iii) $(MM^*)' = MM^*$
(iv) $(M^*M)' = M^*M$.

Remark 9.1 (Some properties)

- If $M$ is invertible then $M^* = M^{-1}$.
- The pseudo-inverse of a zero matrix is its transpose.
- The pseudo-inverse of the pseudo-inverse is the original matrix.

266 / 295
Definition 9.4 (F statistics)

Consider $n = n_1 + n_2$ i.i.d. $\mathcal{N}(0,1)$ r.v. $X_i$. If the r.v. $F$ is defined by:
$$F = \frac{\sum_{i=1}^{n_1} X_i^2}{\sum_{j=n_1+1}^{n_1+n_2} X_j^2}\cdot\frac{n_2}{n_1},$$
then $F \sim \mathcal{F}(n_1, n_2)$. (Table)

Definition 9.5 (Student-t distribution)

$Z$ follows a Student-t (or t) distribution with $\nu$ degrees of freedom (d.f.) if:
$$Z = X_0 \Big/ \sqrt{\frac{\sum_{i=1}^{\nu} X_i^2}{\nu}}, \quad X_i \sim \text{i.i.d. } \mathcal{N}(0,1).$$
We have $E(Z) = 0$, and $Var(Z) = \frac{\nu}{\nu-2}$ if $\nu > 2$. (Table)

Definition 9.6 (χ² distribution)

$Z$ follows a $\chi^2$ distribution with $\nu$ d.f. if $Z = \sum_{i=1}^{\nu} X_i^2$ where $X_i \sim \text{i.i.d. } \mathcal{N}(0,1)$. We have $E(Z) = \nu$. (Table)

267 / 295
Definition 9.7 (Cauchy distribution)

The probability distribution function of the Cauchy distribution defined by a location parameter $\mu$ and a scale parameter $\gamma$ is:
$$f(x) = \frac{1}{\pi\gamma\left(1 + \left[\frac{x-\mu}{\gamma}\right]^2\right)}.$$
The mean and variance of this distribution are undefined.

[Figure: density of the Cauchy distribution (location = 0, scale = 1) on [−5, 5].]

268 / 295
Definition 9.8 (Idempotent matrix)

Matrix $M$ is idempotent if $M^2 = M$. If $M$ is a symmetric idempotent matrix, then $M'M = M$.

Proposition 9.1 (Roots of an idempotent matrix)

The eigenvalues of an idempotent matrix are either 1 or 0.

Proof. If $\lambda$ is an eigenvalue of an idempotent matrix $M$ then $\exists x \neq 0$ s.t. $Mx = \lambda x$. Hence $M^2x = \lambda Mx \Rightarrow (1-\lambda)Mx = 0$. Either all elements of $Mx$ are zero, in which case $\lambda = 0$, or at least one element of $Mx$ is nonzero, in which case $\lambda = 1$. ∎

Proposition 9.2 (Idempotent matrix and χ² distribution)

The rank of a symmetric idempotent matrix is equal to its trace.

Proof. The result follows from Prop. 9.1, combined with the fact that the rank of a symmetric matrix is equal to the number of its nonzero eigenvalues. ∎

269 / 295
Proposition 9.3 (Constrained least squares)

The solution of the following optimisation problem:
$$\min_\beta \|y - X\beta\|^2 \quad \text{subject to } R\beta = q$$
is given by:
$$\beta^r = \beta_0 - (X'X)^{-1}R'\{R(X'X)^{-1}R'\}^{-1}(R\beta_0 - q),$$
where $\beta_0 = (X'X)^{-1}X'y$.

Proof. See for instance Jackman, 2007. ∎

270 / 295
Theorem 9.1 (Chebychev's inequality)

$$\forall\epsilon > 0, \quad P(|X - c| > \epsilon) \leq \frac{E[(X-c)^2]}{\epsilon^2}.$$

Theorem 9.2 (Generalized Chebychev's inequality)

If $E(|X|^r)$ is finite for some $r > 0$ then, $\forall\epsilon > 0$,
$$P(|X - c| > \epsilon) \leq \frac{E[|X-c|^r]}{\epsilon^r}.$$

Proof. Remark that $\epsilon^r\,\mathbb{1}_{\{|X| \geq \epsilon\}} \leq |X|^r$ and take the expectation of both sides. ∎

271 / 295
Definition 9.9 (Convergence in probability, $x_n \overset{p}{\to} c$)

- The random variable sequence $x_n$ converges in probability to a constant $c$ if $\forall\epsilon$, $\lim_{n\to\infty} P(|x_n - c| > \epsilon) = 0$.
- It is denoted as: $\text{plim}\; x_n = c$.

Definition 9.10 (Convergence in the $L^r$ norm)

$x_n$ converges in the r-th mean (or in the $L^r$-norm) towards $x$ if $E(|x_n|^r)$ and $E(|x|^r)$ exist and if
$$\lim_{n\to\infty} E(|x_n - x|^r) = 0.$$
In particular, for $r = 2$, this convergence is called mean-square convergence.

Definition 9.11 (Almost sure convergence, $x_n \overset{a.s.}{\to} c$)

The random variable sequence $x_n$ converges almost surely to $c$ if $P(\lim_{n\to\infty} x_n = c) = 1$.

272 / 295
Definition 9.12 (Convergence in distribution, $x_n \overset{d}{\to} x$)

$x_n$ is said to converge in distribution (or in law) to $x$ if
$$\lim_{n\to\infty} F_{x_n}(s) = F_x(s)$$
for all $s$ at which $F_x$ (the cumulative distribution of $x$) is continuous.

Proposition 9.4 (Rules for limiting distributions)

(i) Slutsky's theorem: if $x_n \overset{d}{\to} x$ and $y_n \overset{p}{\to} c$, then
$$x_n y_n \overset{d}{\to} xc, \qquad x_n + y_n \overset{d}{\to} x + c, \qquad x_n/y_n \overset{d}{\to} x/c \;(\text{if } c \neq 0).$$
(ii) Continuous mapping theorem: if $x_n \overset{d}{\to} x$ and $g$ is a continuous function, then $g(x_n) \overset{d}{\to} g(x)$.

273 / 295
Proposition 9.5 (Implications of stochastic convergences)

$$\overset{L^s}{\to} \;\Rightarrow\; \overset{L^r}{\to} \;(1 \leq r \leq s), \qquad \overset{L^r}{\to} \;\Rightarrow\; \overset{p}{\to}, \qquad \overset{a.s.}{\to} \;\Rightarrow\; \overset{p}{\to} \;\Rightarrow\; \overset{d}{\to}.$$

Proof of the fact that $\big(\overset{p}{\to}\big) \Rightarrow \big(\overset{d}{\to}\big)$.
Assume that $X_n \overset{p}{\to} X$. Denoting by $F$ and $F_n$ the c.d.f. of $X$ and $X_n$, respectively:
$$F_n(x) = P(X_n \leq x, X \leq x+\epsilon) + P(X_n \leq x, X > x+\epsilon) \leq F(x+\epsilon) + P(|X_n - X| > \epsilon). \tag{79}$$
Besides,
$$F(x-\epsilon) = P(X \leq x-\epsilon, X_n \leq x) + P(X \leq x-\epsilon, X_n > x) \leq F_n(x) + P(|X_n - X| > \epsilon),$$
which implies:
$$F(x-\epsilon) - P(|X_n - X| > \epsilon) \leq F_n(x). \tag{80}$$
Eqs. (79) and (80) imply:
$$F(x-\epsilon) - P(|X_n - X| > \epsilon) \leq F_n(x) \leq F(x+\epsilon) + P(|X_n - X| > \epsilon).$$
Taking limits as $n \to \infty$ yields
$$F(x-\epsilon) \leq \liminf_{n\to\infty} F_n(x) \leq \limsup_{n\to\infty} F_n(x) \leq F(x+\epsilon).$$
The result is then obtained by taking limits as $\epsilon \to 0$ (if $F$ is continuous at $x$). ∎

274 / 295
Proposition 9.6 (Convergence in distribution to a constant)

If $X_n$ converges in distribution to a constant $c$, then $X_n$ converges in probability to $c$.

Proof. If $\epsilon > 0$, we have $P(X_n < c-\epsilon) \underset{n\to\infty}{\to} 0$, i.e. $P(X_n \geq c-\epsilon) \underset{n\to\infty}{\to} 1$, and $P(X_n < c+\epsilon) \underset{n\to\infty}{\to} 1$. Therefore $P(c-\epsilon \leq X_n < c+\epsilon) \underset{n\to\infty}{\to} 1$. ∎

Example of plim but not $L^r$ convergence

- Let $\{x_n\}_{n\in\mathbb{N}}$ be a series of random variables defined by:
$$x_n = n u_n,$$
where the $u_n$ are independent random variables s.t. $u_n \sim \mathcal{B}(1/n)$ (Bernoulli with success probability $1/n$).
- We have $x_n \overset{p}{\to} 0$ but $x_n$ does not converge to 0 in the $L^r$ norm, because $E(|x_n - 0|) = E(x_n) = 1$.

275 / 295
Theorem 9.3 (Cauchy criterion (non-stochastic case))

$\sum_{i=0}^{T} a_i$ converges ($T \to \infty$) iff, for any $\eta > 0$, there exists an integer $N$ such that, for all $M \geq N$,
$$\left|\sum_{i=N+1}^{M} a_i\right| < \eta.$$

Theorem 9.4 (Cauchy criterion (stochastic case))

$\sum_{i=0}^{T} \theta_i\varepsilon_{t-i}$ converges in mean square ($T \to \infty$) to a random variable iff, for any $\eta > 0$, there exists an integer $N$ such that, for all $M \geq N$,
$$E\left[\left(\sum_{i=N+1}^{M} \theta_i\varepsilon_{t-i}\right)^2\right] < \eta.$$

276 / 295
Definition 9.13 (Characteristic function)

For any real-valued random variable $X$, the characteristic function is defined by:
$$\phi_X : u \mapsto E[\exp(iuX)].$$

Theorem 9.5 (Law of large numbers)

The sample mean is a consistent estimator of the population mean.

Proof. Let's denote by $\phi_{X_i}$ the characteristic function of a r.v. $X_i$. If the mean of $X_i$ is $\mu$ then the Taylor expansion of the characteristic function is:
$$\phi_{X_i}(u) = E(\exp(iuX)) = 1 + iu\mu + o(u).$$
The properties of the characteristic function (see Def. 9.13) imply that:
$$\phi_{\frac{1}{n}(X_1+\dots+X_n)}(u) = \prod_{i=1}^{n}\left(1 + i\frac{u}{n}\mu + o\left(\frac{u}{n}\right)\right) \to e^{iu\mu}.$$
The facts that (a) $e^{iu\mu}$ is the characteristic function of the constant $\mu$ and (b) a characteristic function uniquely characterises a distribution imply that the sample mean converges in distribution to the constant $\mu$, which further implies that it converges in probability to $\mu$ [Exercise]. ∎

277 / 295
Theorem 9.6 (Lindeberg-Levy Central limit theorem, CLT)

If $x_n$ is an i.i.d. sequence of random variables with mean $\mu$ and variance $\sigma^2$ ($\in ]0, +\infty[$), then:
$$\sqrt{n}(\bar{x}_n - \mu) \overset{d}{\to} \mathcal{N}(0, \sigma^2), \quad \text{where } \bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Proof. Let us introduce the r.v. $Y_n := \sqrt{n}(\bar{X}_n - \mu)$. We have $\phi_{Y_n}(u) = \left[E\left(\exp\left(i\frac{1}{\sqrt{n}}u(X_1 - \mu)\right)\right)\right]^n$. We have:
$$\left[E\left(\exp\left(i\frac{1}{\sqrt{n}}u(X_1-\mu)\right)\right)\right]^n = \left[E\left(1 + i\frac{1}{\sqrt{n}}u(X_1-\mu) - \frac{1}{2n}u^2(X_1-\mu)^2 + o(u^2)\right)\right]^n = \left(1 - \frac{1}{2n}u^2\sigma^2 + o(u^2)\right)^n.$$
Therefore $\phi_{Y_n}(u) \underset{n\to\infty}{\to} \exp\left(-\frac{1}{2}u^2\sigma^2\right)$, which is the characteristic function of $\mathcal{N}(0, \sigma^2)$. ∎

278 / 295
Proposition 9.7 (Inverse of a partitioned matrix)

We have:
$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1} & -A_{11}^{-1}A_{12}(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1} \\ -(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1} & (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1} \end{bmatrix}.$$

279 / 295
Proposition 9.8 (Independence of a linear and a quadratic form)

If $A$ is idempotent and if $x$ is Gaussian, $Lx$ and $x'Ax$ are independent if $LA = 0$.

Proof. If $LA = 0$, then the two Gaussian vectors $Lx$ and $Ax$ are independent. This implies the independence of any function of $Lx$ and any function of $Ax$. The result then follows from the observation that $x'Ax = (Ax)'(Ax)$, which is a function of $Ax$. ∎

Proposition 9.9 (Inner product of a multivariate Gaussian variable)

Let $X$ be an n-dimensional multivariate Gaussian variable: $X \sim \mathcal{N}(0, \Sigma)$. We have:
$$X'\Sigma^{-1}X \sim \chi^2(n).$$

Proof. Because $\Sigma$ is a symmetric positive definite matrix, it admits the spectral decomposition $PDP'$ where $P$ is an orthogonal matrix (i.e. $PP' = Id$) and $D$ is a diagonal matrix with non-negative entries. Denoting by $\sqrt{D^{-1}}$ the diagonal matrix whose diagonal entries are the inverses of the square roots of those of $D$, it is easily checked that the covariance matrix of $Y := \sqrt{D^{-1}}P'X$ is $Id$. Therefore $Y$ is a vector of uncorrelated Gaussian variables. The properties of Gaussian variables imply that the components of $Y$ are then also independent. Hence $Y'Y = \sum_i Y_i^2 \sim \chi^2(n)$. It remains to note that $Y'Y = X'PD^{-1}P'X = X'Var(X)^{-1}X$ to conclude. ∎

280 / 295
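A quick Monte Carlo illustration of Proposition 9.9, as a sketch (the covariance matrix, seed and sample size are arbitrary choices of mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, N = 3, 100_000
A = rng.standard_normal((n, n))
Sigma = A @ A.T                                   # an arbitrary SPD covariance
X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)
Q = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)  # X' Sigma^{-1} X
print(Q.mean())                                   # ~ n, the mean of chi2(n)
print(np.quantile(Q, 0.95), stats.chi2(n).ppf(0.95))      # both ~ 7.81
```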
Theorem 9.7 (Cauchy-Schwarz inequality)

We have:
$$|Cov(X, Y)| \leq \sqrt{Var(X)\,Var(Y)},$$
and, if $X \neq 0$ and $Y \neq 0$, the equality holds iff $X$ and $Y$ are the same up to an affine transformation.

Proof. If $Var(X) = 0$, this is trivial. If this is not the case, then let's define $Z$ as $Z = Y - \frac{Cov(X,Y)}{Var(X)}X$. It is easily seen that $Cov(X, Z) = 0$. Then, the variance of $Y = Z + \frac{Cov(X,Y)}{Var(X)}X$ is equal to the sum of the variance of $Z$ and of the variance of $\frac{Cov(X,Y)}{Var(X)}X$, that is:
$$Var(Y) = Var(Z) + \left(\frac{Cov(X,Y)}{Var(X)}\right)^2 Var(X) \geq \left(\frac{Cov(X,Y)}{Var(X)}\right)^2 Var(X).$$
The equality holds iff $Var(Z) = 0$, i.e. iff $Y = \frac{Cov(X,Y)}{Var(X)}X + cst$. ∎

281 / 295
Definition 9.14 (Matrix derivatives)

Consider a function $f : \mathbb{R}^K \to \mathbb{R}$. Its first-order derivative is:
$$\frac{\partial f}{\partial b}(b) = \begin{bmatrix} \frac{\partial f}{\partial b_1}(b) \\ \vdots \\ \frac{\partial f}{\partial b_K}(b) \end{bmatrix}.$$
We use the notation:
$$\frac{\partial f}{\partial b'}(b) = \left(\frac{\partial f}{\partial b}(b)\right)'.$$

Proposition 9.10

- If $f(b) = A'b$ where $A$ is a $K \times 1$ vector, then $\frac{\partial f}{\partial b}(b) = A$.
- If $f(b) = b'Ab$ where $A$ is a symmetric $K \times K$ matrix, then $\frac{\partial f}{\partial b}(b) = 2Ab$.

282 / 295
Definition 9.15 (Asymptotic level)

An asymptotic test with critical region $\Omega_n$ has an asymptotic level equal to $\alpha$ if:
$$\sup_{\theta\in\Theta}\lim_{n\to\infty} P_\theta(S_n \in \Omega_n) = \alpha,$$
where $S_n$ is the test statistic and $\Theta$ is such that the null hypothesis $H_0$ is equivalent to $\theta \in \Theta$.

Definition 9.16 (Asymptotically consistent test)

An asymptotic test with critical region $\Omega_n$ is consistent if:
$$\forall\theta \in \Theta^c, \quad P_\theta(S_n \in \Omega_n) \to 1,$$
where $S_n$ is the test statistic and $\Theta^c$ is such that the null hypothesis $H_0$ is equivalent to $\theta \notin \Theta^c$.

283 / 295
Understanding two-sided tests

Figure 66: Understanding two-sided tests (1/3)

[Figure: density of the test statistic X under H0 (here X ~ t(5)). The critical region (sum of the two tail areas = α) lies outside the interval $[\Phi_X^{-1}(\alpha/2), \Phi_X^{-1}(1-\alpha/2)]$; α is the level (or size) of the test.]

Web interface producing these charts

284 / 295
Understanding two-sided tests

Figure 67: Understanding two-sided tests (2/3)

[Figure: same density (X ~ t(5) under H0), with a (sample-based) test statistic X falling inside $[\Phi_X^{-1}(\alpha/2), \Phi_X^{-1}(1-\alpha/2)]$; the sum of the tail areas beyond ±|X| is the p-value. Here p-value > α, which implies that we cannot reject H0.]

Web interface producing these charts

285 / 295
Understanding two-sided tests

Figure 68: Understanding two-sided tests (3/3)

[Figure: same density (X ~ t(5) under H0), with a (sample-based) test statistic X falling outside $[\Phi_X^{-1}(\alpha/2), \Phi_X^{-1}(1-\alpha/2)]$; the sum of the tail areas is the p-value. Here p-value < α, which implies that we reject H0.]

Web interface producing these charts

286 / 295
Understanding one-sided tests

Figure 69: Understanding one-sided tests (1/3)

[Figure: density of the test statistic X under H0 (here X ~ χ²(5)). The critical region (area = α) lies to the right of $\Phi_X^{-1}(1-\alpha)$; α is the level (or size) of the test.]

Web interface producing these charts

287 / 295
Understanding one-sided tests

Figure 70: Understanding one-sided tests (2/3)

[Figure: same density (X ~ χ²(5) under H0), with a (sample-based) test statistic X below $\Phi_X^{-1}(1-\alpha)$; the area to the right of X is the p-value. Here p-value > α, which implies that we cannot reject H0.]

Web interface producing these charts

288 / 295
Understanding one-sided tests

Figure 71: Understanding one-sided tests (3/3)

[Figure: same density (X ~ χ²(5) under H0), with a (sample-based) test statistic X beyond $\Phi_X^{-1}(1-\alpha)$; the area to the right of X is the p-value. Here p-value < α, which implies that we reject H0.]

Web interface producing these charts

289 / 295
Statistical Tables

290 / 295
Statistical Tables

Figure 72: What is reported in distribution tables?

Statistical tables usually report either probabilities α or quantiles z, the two being related as follows: $\alpha = P(0 < X \leq z)$.

[Figure: left panel, for random variables on $\mathbb{R}$: the shaded area between 0 and z equals α. Right panel, the same for random variables on $\mathbb{R}^+$.]

291 / 295
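The entries of Tables 22-24 below can be reproduced with scipy, as a sketch (the use of scipy.stats is an assumption of mine, not part of the course material):

```python
from scipy import stats

# Table 22: P(0 < X <= z) for X ~ N(0,1), e.g. z = 1.96
print(stats.norm.cdf(1.96) - 0.5)       # -> 0.475

# Table 23: z such that alpha* = P(-z < X <= z) for X ~ t(nu)
print(stats.t.ppf(0.975, df=5))         # nu = 5, alpha* = 0.95 -> 2.571

# Table 24: z such that alpha = P(X <= z) for X ~ chi2(nu)
print(stats.chi2.ppf(0.95, df=5))       # nu = 5, alpha = 0.95 -> 11.070
```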
Table 22: Normal distribution N(0, 1)

z = y + x
y ↓ \ x →   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0         0.000  0.004  0.008  0.012  0.016  0.020  0.024  0.028  0.032  0.036
0.1         0.040  0.044  0.048  0.052  0.056  0.060  0.064  0.067  0.071  0.075
0.2         0.079  0.083  0.087  0.091  0.095  0.099  0.103  0.106  0.110  0.114
0.3         0.118  0.122  0.126  0.129  0.133  0.137  0.141  0.144  0.148  0.152
0.4         0.155  0.159  0.163  0.166  0.170  0.174  0.177  0.181  0.184  0.188
0.5         0.191  0.195  0.198  0.202  0.205  0.209  0.212  0.216  0.219  0.222
0.6         0.226  0.229  0.232  0.236  0.239  0.242  0.245  0.249  0.252  0.255
0.7         0.258  0.261  0.264  0.267  0.270  0.273  0.276  0.279  0.282  0.285
0.8         0.288  0.291  0.294  0.297  0.300  0.302  0.305  0.308  0.311  0.313
0.9         0.316  0.319  0.321  0.324  0.326  0.329  0.331  0.334  0.336  0.339
1.0         0.341  0.344  0.346  0.348  0.351  0.353  0.355  0.358  0.360  0.362
1.1         0.364  0.367  0.369  0.371  0.373  0.375  0.377  0.379  0.381  0.383
1.2         0.385  0.387  0.389  0.391  0.393  0.394  0.396  0.398  0.400  0.401
1.3         0.403  0.405  0.407  0.408  0.410  0.411  0.413  0.415  0.416  0.418
1.4         0.419  0.421  0.422  0.424  0.425  0.426  0.428  0.429  0.431  0.432
1.5         0.433  0.434  0.436  0.437  0.438  0.439  0.441  0.442  0.443  0.444
1.6         0.445  0.446  0.447  0.448  0.449  0.451  0.452  0.453  0.454  0.454
1.7         0.455  0.456  0.457  0.458  0.459  0.460  0.461  0.462  0.462  0.463
1.8         0.464  0.465  0.466  0.466  0.467  0.468  0.469  0.469  0.470  0.471
1.9         0.471  0.472  0.473  0.473  0.474  0.474  0.475  0.476  0.476  0.477
2.0         0.477  0.478  0.478  0.479  0.479  0.480  0.480  0.481  0.481  0.482
2.1         0.482  0.483  0.483  0.483  0.484  0.484  0.485  0.485  0.485  0.486
2.2         0.486  0.486  0.487  0.487  0.487  0.488  0.488  0.488  0.489  0.489
2.3         0.489  0.490  0.490  0.490  0.490  0.491  0.491  0.491  0.491  0.492
2.4         0.492  0.492  0.492  0.492  0.493  0.493  0.493  0.493  0.493  0.494
2.5         0.494  0.494  0.494  0.494  0.494  0.495  0.495  0.495  0.495  0.495

Notes: If z = x + y and if X ∼ N(0, 1), the cell of the table corresponding to x and y gives P(0 < X ≤ z) (see Figure 72, left chart).
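Any cell of Table 22 can be reproduced in one line (a scipy sketch; the cell y = 1.9, x = 0.06 is picked arbitrarily):

    from scipy.stats import norm

    y, x = 1.9, 0.06
    z = y + x                            # z = 1.96
    cell = norm.cdf(z) - norm.cdf(0.0)   # P(0 < X <= z) for X ~ N(0, 1)
    print(f"{cell:.3f}")                 # 0.475, matching the table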
Table 23: Student-t distribution t(ν) (Def. 9.5)

α →    0.2500  0.4000  0.4500  0.4750  0.4950  0.4995
α* →   0.5000  0.8000  0.9000  0.9500  0.9900  0.9990
ν ↓
1      1.000   3.078   6.314   12.706  63.657  636.619
2      0.816   1.886   2.920   4.303   9.925   31.599
3      0.765   1.638   2.353   3.182   5.841   12.924
4      0.741   1.533   2.132   2.776   4.604   8.610
5      0.727   1.476   2.015   2.571   4.032   6.869
6      0.718   1.440   1.943   2.447   3.707   5.959
7      0.711   1.415   1.895   2.365   3.499   5.408
8      0.706   1.397   1.860   2.306   3.355   5.041
9      0.703   1.383   1.833   2.262   3.250   4.781
10     0.700   1.372   1.812   2.228   3.169   4.587
20     0.687   1.325   1.725   2.086   2.845   3.850
30     0.683   1.310   1.697   2.042   2.750   3.646
40     0.681   1.303   1.684   2.021   2.704   3.551
50     0.679   1.299   1.676   2.009   2.678   3.496
60     0.679   1.296   1.671   2.000   2.660   3.460
70     0.678   1.294   1.667   1.994   2.648   3.435
80     0.678   1.292   1.664   1.990   2.639   3.416
90     0.677   1.291   1.662   1.987   2.632   3.402
100    0.677   1.290   1.660   1.984   2.626   3.390
200    0.676   1.286   1.653   1.972   2.601   3.340
500    0.675   1.283   1.648   1.965   2.586   3.310

Notes: This table reports the quantiles of Student-t distributions with ν degrees of freedom. If X ∼ t(ν), the cell of the table corresponding to ν and α gives z such that α = P(0 < X ≤ z) (see Figure 72, left chart). z is also such that α* = P(−z < X ≤ z).
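An entry of Table 23 can be checked the same way (a scipy sketch; ν = 10 and α* = 0.95 are picked arbitrarily). Since α* = P(−z < X ≤ z), the quantile sits at probability 0.5 + α*/2:

    from scipy.stats import t

    nu, alpha_star = 10, 0.95
    z = t.ppf(0.5 + alpha_star / 2, df=nu)   # = t.ppf(0.975, 10)
    print(f"{z:.3f}")                        # 2.228, matching the table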
Table 24: χ² distribution χ²(ν) (Def. 9.6)

α →   0.0500   0.1000   0.7500   0.9000   0.9500   0.9750   0.9900   0.9990
ν ↓
1     0.004    0.016    1.323    2.706    3.841    5.024    6.635    10.828
2     0.103    0.211    2.773    4.605    5.991    7.378    9.210    13.816
3     0.352    0.584    4.108    6.251    7.815    9.348    11.345   16.266
4     0.711    1.064    5.385    7.779    9.488    11.143   13.277   18.467
5     1.145    1.610    6.626    9.236    11.070   12.833   15.086   20.515
6     1.635    2.204    7.841    10.645   12.592   14.449   16.812   22.458
7     2.167    2.833    9.037    12.017   14.067   16.013   18.475   24.322
8     2.733    3.490    10.219   13.362   15.507   17.535   20.090   26.124
9     3.325    4.168    11.389   14.684   16.919   19.023   21.666   27.877
10    3.940    4.865    12.549   15.987   18.307   20.483   23.209   29.588
20    10.851   12.443   23.828   28.412   31.410   34.170   37.566   45.315
30    18.493   20.599   34.800   40.256   43.773   46.979   50.892   59.703
40    26.509   29.051   45.616   51.805   55.758   59.342   63.691   73.402
50    34.764   37.689   56.334   63.167   67.505   71.420   76.154   86.661
60    43.188   46.459   66.981   74.397   79.082   83.298   88.379   99.607
70    51.739   55.329   77.577   85.527   90.531   95.023   100.425  112.317
80    60.391   64.278   88.130   96.578   101.879  106.629  112.329  124.839
90    69.126   73.291   98.650   107.565  113.145  118.136  124.116  137.208
100   77.929   82.358   109.141  118.498  124.342  129.561  135.807  149.449
200   168.279  174.835  213.102  226.021  233.994  241.058  249.445  267.541
500   449.147  459.926  520.950  540.930  553.127  563.852  576.493  603.446

Notes: This table reports the quantiles of χ² distributions with ν degrees of freedom. If X ∼ χ²(ν), the cell of the table corresponding to ν and α gives z such that α = P(0 < X ≤ z) (see Figure 72, right chart).
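The same check for Table 24 (a scipy sketch; ν = 5 and α = 0.95 are picked arbitrarily). As X ≥ 0 here, α = P(0 < X ≤ z) is the full c.d.f., so z is a plain quantile:

    from scipy.stats import chi2

    nu, alpha = 5, 0.95
    z = chi2.ppf(alpha, df=nu)   # z such that alpha = P(0 < X <= z)
    print(f"{z:.3f}")            # 11.070, matching the table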
Table 25: F distribution (Def. 9.4)

n1 →   1      2      3      4      5      6      7      8      9      10
n2 ↓
α = 0.90
5      4.06   3.78   3.62   3.52   3.45   3.40   3.37   3.34   3.32   3.30
10     3.29   2.92   2.73   2.61   2.52   2.46   2.41   2.38   2.35   2.32
15     3.07   2.70   2.49   2.36   2.27   2.21   2.16   2.12   2.09   2.06
20     2.97   2.59   2.38   2.25   2.16   2.09   2.04   2.00   1.96   1.94
50     2.81   2.41   2.20   2.06   1.97   1.90   1.84   1.80   1.76   1.73
100    2.76   2.36   2.14   2.00   1.91   1.83   1.78   1.73   1.69   1.66
500    2.72   2.31   2.09   1.96   1.86   1.79   1.73   1.68   1.64   1.61
α = 0.95
5      6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74
10     4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98
15     4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54
20     4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35
50     4.03   3.18   2.79   2.56   2.40   2.29   2.20   2.13   2.07   2.03
100    3.94   3.09   2.70   2.46   2.31   2.19   2.10   2.03   1.97   1.93
500    3.86   3.01   2.62   2.39   2.23   2.12   2.03   1.96   1.90   1.85
α = 0.99
5      16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05
10     10.04  7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85
15     8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80
20     8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37
50     7.17   5.06   4.20   3.72   3.41   3.19   3.02   2.89   2.78   2.70
100    6.90   4.82   3.98   3.51   3.21   2.99   2.82   2.69   2.59   2.50
500    6.69   4.65   3.82   3.36   3.05   2.84   2.68   2.55   2.44   2.36

Notes: This table reports the quantiles of F distributions with (n1, n2) degrees of freedom. If X ∼ F(n1, n2), the cell of the table corresponding to n1, n2 and a given α gives z such that α = P(0 < X ≤ z) (see Figure 72, right chart). [The entry for α = 0.90, n1 = 8, n2 = 20 read 1.10 in the source; it has been corrected to the standard tabulated value 2.00.]
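And for Table 25 (a scipy sketch; n1 = 5, n2 = 10 and α = 0.95 are picked arbitrarily):

    from scipy.stats import f

    n1, n2, alpha = 5, 10, 0.95
    z = f.ppf(alpha, dfn=n1, dfd=n2)   # z such that alpha = P(0 < X <= z)
    print(f"{z:.2f}")                  # 3.33, matching the table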
