
Advanced Econometrics

➢ Instructor: Zhihong Chen
➢ Office: Keyan Building 725
➢ Office Hours: Wednesday 2:00PM-4:30PM
➢ Email: scybczh@aliyun.com
➢ Course Material: siteeco603@aliyun.com / WeChat
➢ Teaching Philosophy:
1. Treat a person as he is, and he will remain as he is. Treat
him as he could be, and he will become what he should be.
2. Education is what people do to you, learning is what you
do to yourself.

Overview

➢ This is an advanced econometrics course for Ph.D. students.
➢ The course has two objectives:
----to provide a theoretical foundation useful for further study of econometrics
----to gain practical experience in analyzing economic data with econometric methods. The use of the popular software Stata for econometric analysis will also be taught.
➢ Prerequisites: matrix algebra, probability and statistics
Textbook
➢ William Greene: Econometric Analysis

Textbook
➢ Joshua D. Angrist, Jörn-Steffen Pischke: Mostly Harmless Econometrics: An Empiricist's Companion

Reference Book

➢ Introductory:
1. Peter Kennedy: A Guide to Econometrics (Baby book)
2. Stock and Watson: Introduction to Econometrics
3. Michael P. Murray: Econometrics: A Modern Introduction
4. Philip H. Franses: Enjoyable Econometrics
5. Joshua D. Angrist, Jörn-Steffen Pischke: Mastering 'Metrics: The Path from Cause to Effect

➢ Intermediate:
1. Jeffrey Wooldridge: Introductory Econometrics: A Modern Approach

➢ Advanced:
1. Jeffrey Wooldridge: Econometric Analysis of Cross Section and Panel Data
2. Bruce Hansen: Econometrics

Syllabus--Grading
➢ Class Participation and Homework (15%), Data Visualization Exercise (15%), Two Reading Reports (20%), Final Exam (50%).
➢ All assignments and exams must be submitted on time; late work receives no credit. If there is a verifiable medical reason and arrangements are made before the exam, an adjustment may be made. Please take academic integrity seriously.

What is Econometrics: Examples

1. What is the effect of reducing class size on student achievement?
2. Is there racial discrimination in the market for home loans?
3. How much do cigarette taxes reduce smoking?
4. What will inflation be next year?

Quantitative Features of Modern Economics

➢ Features:
-mathematical modeling for economic theory
-empirical analysis for economic phenomena
➢ General methodology of modern economic research:
1. Data collection and summary of empirical stylized facts.
2. Development of economic theories/models.
3. Empirical verification of economic models.
4. Applications: to test economic theory or hypotheses, to forecast future evolution of the economy, and to make policy recommendations.

What is Econometrics: Definition
Frisch (1933):
Econometrics is by no means the same as economic statistics. Nor is it identical with what we call general economic theory, although a considerable portion of this theory has a definitely quantitative character. Nor should econometrics be taken as synonymous with the application of mathematics to economics. Experience has shown that each of these three viewpoints, that of statistics, economic theory, and mathematics, is a necessary, but not by itself a sufficient, condition for a real understanding of the quantitative relations in modern economic life. It is the unification of all three that is powerful. And it is this unification that constitutes econometrics.

Limitations of Econometrics
➢ Econometrics is the analysis of the "average behavior" of a large number of realizations. However, economic data are not produced by a large number of repeated random experiments, because an economy is not a controlled experiment:
1. An economic theory or model can capture only the main or most important factors.
2. An economy is an irreversible, non-repeatable system.
3. Economic relationships often change over time.
4. Data quality is limited.
Background for Learning this Course

Greene: Appendix A-D


APPENDIX A: Matrix Algebra
➢ Algebraic Manipulation of Matrices
➢ Geometry of Matrices
➢ Solution of a System of Linear Equations
➢ Partitioned Matrices
➢ Characteristic Roots and Vectors
➢ Quadratic Forms and Definite Matrices
➢ Calculus and Matrix Algebra

Background for Learning this Course
APPENDIX B: Probability and Distribution Theory
➢ Random Variables
➢ Expectations of a Random Variable
➢ Some Specific Probability Distributions
➢ The Distribution of a Function of a Random Variable
➢ Representations of a Probability Distribution
➢ Joint Distributions
➢ Conditioning in a Bivariate Distribution
➢ The Bivariate Normal Distribution
➢ Multivariate Distributions
➢ Moments
➢ The Multivariate Normal Distribution

Background for Learning this Course

APPENDIX C: Estimation and Inference


➢ Statistics as Estimators—Sampling Distributions
➢ Point Estimation of Parameters; Interval Estimation
➢ Hypothesis Testing

APPENDIX D: Large Sample Distribution Theory (introduced briefly)

Causal Effects and Idealized Experiments

➢ Causality means that a specific action leads to a specific, measurable consequence.
➢ Randomized controlled experiment: control group (receives no treatment), treatment group (receives treatment).
➢ A causal effect is defined as the effect on an outcome of a given action or treatment.
➢ You don't need to know a causal relationship to make a good forecast.

Correlation or Causation?
(By Vali Chandrasekaran)
http://www.businessweek.com/magazine/correlation-or-causation-12012011-gfx.html
Extended Reading

It ushers in three big shifts: more, messy, and correlations (the book's chapters 2, 3, and 4).
Instead of trying to uncover causality, the reasons behind things, it is often sufficient to simply uncover practical answers. So if some combination of aspirin and orange juice puts a deadly disease into remission, it is less important to know what the biological mechanism is than to just drink the potion. For many things, with big data it is faster, cheaper, and good enough to learn "what," not "why."

Data: Source and Types

➢ Experimental data come from experiments designed to evaluate a treatment or policy or to investigate a causal effect.
➢ Non-experimental data are obtained by observing actual behavior outside an experimental setting. Such observational data pose major challenges for econometrics.

Data: Source and Types
➢ There are several different kinds of economic data
sets:
➢ Cross-sectional data
➢ Time series data
➢ Pooled cross sections
➢ Panel/Longitudinal data
➢ Econometric methods depend on the nature of the
data used. Use of inappropriate methods may lead
to misleading results.

Types of Data – Cross-sectional Data
➢ Cross-sectional data is a random sample.
➢ Each observation is a new individual, firm, etc. with information at a point in time.
➢ If the data is not a random sample, we have a sample-selection problem.
➢ The analysis of cross-sectional data is closely aligned with the applied microeconomics fields, such as labor economics, industrial organization, and health economics.
➢ The fact that the ordering of the data does not matter for econometric analysis is a key feature of cross-sectional data sets obtained from random sampling.
Types of Data – Time Series
➢ This includes observations of a variable or several variables
over time.
➢ Typical applications include applied macroeconomics and
finance. Examples include stock prices, money supply,
consumer price index, gross domestic product, annual
homicide rates, automobile sales, and so on.
➢ Time series observations are typically serially correlated.
➢ Ordering of observations conveys important information.
➢ Data frequency may include daily, weekly, monthly,
quarterly, annually, and so on.
➢ Typical features of time series include trends and seasonality.
Types of Data – Pooled Cross Sections
➢ Two or more cross sections are combined in one data set.
➢ Cross sections are drawn independently of each other.
➢ Pooled cross sections are often used to evaluate policy
changes.
➢ Example: Evaluating effect of change in property taxes on
house prices:
--Random sample of house prices for the year 1993.
--A new random sample of house prices for the year 1995.
--Compare before/after (1993: before reform, 1995: after
reform).

This is NOT a panel data set!

Types of Data – Panel or Longitudinal Data
➢ The same cross-sectional units are followed over time.
➢ Panel data have a cross-sectional and a time series
dimension.
➢ Panel data can be used to account for time-invariant
unobservables.
➢ Panel data can be used to model lagged responses.
➢ Example: City crime statistics; each city is observed in two
years.
--Time-invariant unobserved city characteristics may be modeled.
-- Effect of police on crime rates may exhibit time lag.

This IS a panel data set!

Classical Linear Regression Model

Ch 2: Assumptions of the classical linear regression model (CLRM)
Linear Regression Model

$y = \beta_0 + \beta_1 x + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
A Simple Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Example: Reed Auto Sales

Simple Linear Regression

Number of TV Ads    Number of Cars Sold
        1                   14
        3                   24
        2                   18
        1                   17
        3                   27

Sample?
Observation?
Dependent Variable?
Independent Variable?
Model?
Example: Reed Auto Sales
➢ Scatter diagram: TV Ads (x-axis, 0 to 4) vs. Cars Sold (y-axis, 0 to 30)
Question: Do districts with smaller classes (lower STR) have higher test scores?
[Scatter plot: Test score (y-axis) vs. STR, the student-teacher ratio (x-axis)]
Estimation Process

Regression model: $y = \beta_0 + \beta_1 x + \varepsilon$, with unknown parameters $\beta_0, \beta_1$.
Regression equation: $E(y) = \beta_0 + \beta_1 x$.
Sample data: $(x_1, y_1), \ldots, (x_n, y_n)$.
Estimated regression equation: $\hat{y} = b_0 + b_1 x$.
The sample statistics $b_0$ and $b_1$ provide estimates of the parameters $\beta_0$ and $\beta_1$.
Matrix Form: $Y = X\beta + \varepsilon$

$y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + \varepsilon_1$
$y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + \varepsilon_2$
$\vdots$
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i$
$\vdots$
$y_n = \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + \varepsilon_n$

Define:
$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$
Assumptions of
the Classical Linear Regression Model
➢ A1. Linearity (in parameters)
➢ A2. Full rank: there is no exact linear relationship among the independent variables in the model.
➢ A3. Exogeneity of the independent variables: $E[\varepsilon \mid X] = 0$
➢ A4. Homoscedasticity and nonautocorrelation: $\mathrm{Var}[\varepsilon \mid X] = \sigma^2 I$
➢ A5. Data generation
➢ A6. Normal distribution: the disturbances are normally distributed
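To make A1-A6 concrete, here is a minimal simulation sketch of a data-generating process that satisfies all six assumptions. The course software is Stata, but NumPy is used here to keep the linear algebra explicit; all variable names and parameter values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2

# A5 (data generation): draw the regressors; a leading column of ones is the constant.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

# A2 (full rank): no exact linear relationship among the columns of X.
assert np.linalg.matrix_rank(X) == k + 1

# A1 (linearity in parameters): y = X @ beta + eps. Coefficients are arbitrary.
beta = np.array([1.0, 0.5, -2.0])

# A3, A4, A6: disturbances drawn independently of X, i.i.d. N(0, sigma^2),
# so E[eps|X] = 0 and Var[eps|X] = sigma^2 * I.
sigma = 1.5
eps = rng.normal(scale=sigma, size=n)

y = X @ beta + eps
```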
Classical Linear Regression Model

Ch 3: Estimation and explanation of the CLRM
3.2 Least Squares Regression

➢ Given the intuitive idea of fitting a line, we can set up a formal minimization problem.
➢ That is, we want to choose our parameter estimates so as to minimize the sum of squared residuals:

$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2$
Deriving OLS Estimates: Least Squares
Estimating the Coefficients

➢ Objective: $\min_{b_0, b_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

➢ The OLS estimator minimizes the average squared difference between the actual values $y_i$ and the predictions (fitted values) based on the estimated line.

➢ Take derivatives and set them to zero (first-order conditions):

$n^{-1}\sum_{i=1}^{n}\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right) = 0$
$n^{-1}\sum_{i=1}^{n} x_i\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right) = 0$

➢ Therefore:

$\hat{b}_0 = \bar{y} - \hat{b}_1\bar{x}, \qquad \hat{b}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$
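As a check on these formulas, the following NumPy sketch applies them to the Reed Auto data above (illustrative only; in class the same computation would be done in Stata). It reproduces the fitted line $\hat{y} = 10 + 5x$ shown on the next slide.

```python
import numpy as np

# Reed Auto data from the table above
x = np.array([1, 3, 2, 1, 3], dtype=float)       # number of TV ads
y = np.array([14, 24, 18, 17, 27], dtype=float)  # number of cars sold

# Closed-form OLS estimates for the simple regression
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # 10.0 5.0, i.e. y_hat = 10 + 5x
```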
Example: Reed Auto Sales
➢ Estimated regression equation: $\hat{y} = 10 + 5x$
➢ [Scatter diagram with fitted line: TV Ads (x-axis, 0 to 4) vs. Cars Sold (y-axis, 0 to 30)]
Matrix Form

Define $b = \arg\min_{b_0} (Y - Xb_0)'(Y - Xb_0)$
$\quad = \arg\min_{b_0}\left(Y'Y - b_0'X'Y - Y'Xb_0 + b_0'X'Xb_0\right)$
(the two middle terms are scalars and equal)
$\quad = \arg\min_{b_0}\left(Y'Y - 2Y'Xb_0 + b_0'X'Xb_0\right)$

FOC: $\dfrac{\partial Q}{\partial b_0} = -2X'Y + 2X'Xb_0 = 0$

$\Rightarrow\ X'Xb_0 = X'Y$ (normal equations)
$\Rightarrow\ \hat{b} = (X'X)^{-1}X'Y$ ☆ the LS estimator
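A minimal NumPy sketch (arbitrary simulated data) that solves the normal equations $X'Xb = X'Y$ directly and confirms the answer against a standard least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# LS estimator from the normal equations X'Xb = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# The same answer from a numerically stable least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_lstsq))  # True
```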
Projection

$\hat{Y} = X\hat{b} = X(X'X)^{-1}X'Y$ (fitted values)

$e = Y - \hat{Y} = Y - X\hat{b}$ (residuals; note that $\varepsilon = Y - X\beta$ is the error)
$\quad = Y - X(X'X)^{-1}X'Y = \left[I - X(X'X)^{-1}X'\right]Y$
Projection

Define $P = X(X'X)^{-1}X'$ (projection matrix) and $M = I - X(X'X)^{-1}X' = I - P$ (residual maker).

$P$ and $M$ are symmetric and idempotent.

Note: if $X = (1\ 1\ \cdots\ 1)'$ (a column of ones), then $M^0 = I - \dfrac{1}{n}\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix}$, the matrix that transforms data into deviations from the mean.
Projection

① $PM = MP = 0$
② $PX = X$, $MX = 0$, $X'e = X'MY = 0$
③ $Y = PY + MY = X\hat{b} + e$
④ $Y'Y = Y'P'PY + Y'M'MY = \hat{Y}'\hat{Y} + e'e$
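These properties are easy to verify numerically. A short illustrative sketch with arbitrary simulated data (forming $P$ explicitly is fine for a demonstration, though it would never be done with large data in practice):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix
M = np.eye(n) - P                      # residual maker

assert np.allclose(P, P.T) and np.allclose(P @ P, P)    # symmetric, idempotent
assert np.allclose(M @ M, M) and np.allclose(P @ M, 0)  # idempotent, PM = 0
assert np.allclose(P @ X, X) and np.allclose(M @ X, 0)  # PX = X, MX = 0

y_hat, e = P @ y, M @ y                # Y = PY + MY = fitted + residual
assert np.allclose(X.T @ e, 0)         # X'e = 0
assert np.allclose(y @ y, y_hat @ y_hat + e @ e)  # Y'Y = Yhat'Yhat + e'e
print("all projection identities hold")
```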
3.3 Partitioned Regression and Partial Regression

Suppose that the regression involves two sets of variables $X_1$ and $X_2$. Then $y = Xb + e = X_1 b_1 + X_2 b_2 + e$.
The normal equations are
$(X'X)b = X'y$
or:
$\begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}$
Then:
$X_1'X_1 b_1 + X_1'X_2 b_2 = X_1'y \quad (1)$
$X_2'X_1 b_1 + X_2'X_2 b_2 = X_2'y \quad (2)$
3.3 Partitioned Regression and Partial Regression

From (2):
$b_2 = (X_2'X_2)^{-1}X_2'(y - X_1 b_1)$
Similarly, from (1):
$b_1 = (X_1'X_1)^{-1}X_1'(y - X_2 b_2)$   (3-18)

What is this? The regression of $(y - X_2 b_2)$ on $X_1$: if we knew $b_2$, this would be the solution for $b_1$.

Theorem 3.1: If $X_2'X_1 = 0$, then $b_1 = (X_1'X_1)^{-1}X_1'y$ and $b_2 = (X_2'X_2)^{-1}X_2'y$.
3.3 Partitioned Regression and Partial Regression
➢ What if $X_2'X_1 \neq 0$?
Substitute $b_1 = (X_1'X_1)^{-1}X_1'(y - X_2 b_2)$
into $X_2'X_1 b_1 + X_2'X_2 b_2 = X_2'y$:

$X_2'X_1(X_1'X_1)^{-1}X_1'y - X_2'X_1(X_1'X_1)^{-1}X_1'X_2 b_2 + X_2'X_2 b_2 = X_2'y$

Collect the similar terms:

$X_2'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2\, b_2 = X_2'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]y$

Finally:

$b_2 = \left[X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)X_2\right]^{-1} X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)y = \left[X_2'M_1X_2\right]^{-1}X_2'M_1 y$

➢ Applications: Corollary 3.2.1, detrending, fixed effects (FE)
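This is the classical Frisch-Waugh(-Lovell) result: $b_2$ can be obtained by partialling $X_1$ out of both $X_2$ and $y$. A minimal numerical sketch with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2)) + 0.5 * X1[:, [1]]   # correlated with X1
X = np.hstack([X1, X2])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# b2 from the full regression of y on [X1, X2]
b = np.linalg.solve(X.T @ X, X.T @ y)

# b2 from the partialled-out formula (X2'M1X2)^{-1} X2'M1y
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
b2 = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)
print(np.allclose(b[2:], b2))  # True
```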
Two Theorems

3.4 Partial Regression and Partial Correlation Coefficients

➢ The interpretation of a partial regression coefficient: "net of the effect of ...".
➢ Partial correlation coefficients: correlations between sets of residuals.
➢ Partial correlations and coefficients can have signs and magnitudes that differ greatly from gross correlations and simple regression coefficients.
3.4 Partial Regression and Partial Correlation Coefficients

Theorem 3.5: Change in the Sum of Squares When a Variable Is Added to a Regression

Let $u$ = the residuals in the regression of $y$ on $[X, z]$, and
$e$ = the residuals in the regression of $y$ on $X$ alone. Then
$u'u = e'e - c^2 (z^{*\prime}z^{*}) \le e'e$,
where $z^{*} = M_X z$ and $c$ is the coefficient on $z$ in the regression of $y$ on $[X, z]$.
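A quick numerical check of this identity (illustrative NumPy sketch, arbitrary simulated data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = rng.normal(size=n)
y = X @ np.array([1.0, 0.5, -0.5]) + 0.3 * z + rng.normal(size=n)

def resid(A, v):
    """Residuals from regressing v on the columns of A."""
    return v - A @ np.linalg.lstsq(A, v, rcond=None)[0]

e = resid(X, y)                                # y on X alone
Xz = np.column_stack([X, z])
u = resid(Xz, y)                               # y on [X, z]
c = np.linalg.lstsq(Xz, y, rcond=None)[0][-1]  # coefficient on z
z_star = resid(X, z)                           # z* = M_X z

print(np.allclose(u @ u, e @ e - c**2 * (z_star @ z_star)))  # True
```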
3.5 Goodness of Fit and the Analysis of Variance

$\mathrm{SST} = \sum_{i=1}^{n}(Y_i - \bar{Y})^2, \quad \mathrm{SSR} = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2, \quad \mathrm{SSE} = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$

$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$
Analysis of Variance
$\mathrm{SST} = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = Y'M^0 Y$
$\quad = Y'M^0 Xb + Y'M^0 e$
$\quad = (Xb + e)'M^0 Xb + (Xb + e)'M^0 e$
$\quad = b'X'M^0 Xb + e'M^0 Xb + b'X'M^0 e + e'M^0 e$
$\quad = b'X'M^0 Xb + e'M^0 e$
$\quad = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$
$\quad = \mathrm{SSR} + \mathrm{SSE}$
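The decomposition can be confirmed numerically; note that it requires a constant term in the regression. An illustrative sketch with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant included
y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
print(np.allclose(SST, SSR + SSE))  # True (only with the constant term)
print("R^2 =", SSR / SST)
```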
Goodness of Fit

$R^2 = \dfrac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \dfrac{\mathrm{SSE}}{\mathrm{SST}}$

$R^2$ is bounded by zero and one only if:
(a) there is a constant term in $X$, and
(b) the line is computed by linear least squares.

There is no absolute basis for comparing $R^2$.
Adding Variables

➢ $R^2$ never falls when a variable $z$ is added to the regression.
➢ Theorem 3.6 (change in $R^2$ when a variable is added): $R^2_{Xz}$, the $R^2$ with both $X$ and the variable $z$, equals $R^2_X$, the $R^2$ with $X$ alone, plus the increase in fit due to $z$ after $X$ is accounted for:

$R^2_{Xz} = R^2_X + (1 - R^2_X)\, r^{*2}_{yz|X}$

where $r^{*}_{yz|X}$ is the partial correlation between $y$ and $z$, controlling for $X$.
Goodness of Fit

Adjusted $R^2$:

$\bar{R}^2 = 1 - \dfrac{(n-1)(1-R^2)}{n-k}$

In a multiple regression, $\bar{R}^2$ will fall (rise) when a variable is deleted from the regression if the square of the t ratio associated with this variable is greater (less) than 1.
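The t-ratio rule can be illustrated numerically. In the sketch below (arbitrary simulated data with a deliberately weak last regressor), adjusted $R^2$ with and without that variable is compared to the square of its t ratio:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 1.0, 0.05]) + rng.normal(size=n)  # weak last regressor

def adj_r2(X, y):
    n, k = X.shape
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    R2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)
    return 1 - (n - 1) * (1 - R2) / (n - k)

# t ratio on the last regressor in the full model
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = (e @ e) / (n - X.shape[1])
t = b[-1] / np.sqrt((s2 * np.linalg.inv(X.T @ X))[-1, -1])

# Adjusted R^2 falls when the variable is deleted iff t^2 > 1
print("t^2 =", t**2)
print("adj R^2 full vs. reduced:", adj_r2(X, y), adj_r2(X[:, :2], y))
```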
3.6 Linearly Transformed Regression

➢ Define $Z = XP$ for a $K \times K$ nonsingular matrix $P$ (a linear transformation). How does the transformation affect the results of least squares?
➢ The transformation does affect the estimates:
Based on $X$: $b = (X'X)^{-1}X'y$.
Based on $Z$: $c = (Z'Z)^{-1}Z'y = (P'X'XP)^{-1}P'X'y = P^{-1}(X'X)^{-1}(P')^{-1}P'X'y = P^{-1}b$.
➢ The transformation does not affect the fit of the model to the data:
--The fitted values are $Zc = (XP)(P^{-1}b) = Xb$. The same!
--The residuals from using $Z$ are $y - Zc = y - Xb$ (we just proved this). The same!
--The sum of squared residuals must be identical, as $y - Xb = e = y - Zc$.
--$R^2$ must also be identical, as $R^2 = 1 - e'e/(y'M^0 y)$.
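A numerical sketch (arbitrary data and transformation, illustrative only) confirming that $c = P^{-1}b$ while the fitted values, residuals, and $R^2$ are unchanged:

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

P = rng.normal(size=(K, K))   # a random (almost surely nonsingular) K x K matrix
Z = X @ P                     # linearly transformed regressors

b = np.linalg.lstsq(X, y, rcond=None)[0]
c = np.linalg.lstsq(Z, y, rcond=None)[0]

print(np.allclose(c, np.linalg.solve(P, b)))  # c = P^{-1} b
print(np.allclose(Z @ c, X @ b))              # identical fitted values, hence
                                              # identical residuals and R^2
```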
