ECMT1020 Notes
Sean Gong
• Case types: cross-section, time series (especially common in economics), panel data
Categorical data
• Time series data: same 'individual' at different points in time (e.g. Australian GDP growth 2000-2009)
1.1.4 Notation
Example: Suppose we have univariate data on x (e.g. x is annual GDP growth)
Multivariate cases
2. Inference: What do these numbers tell us about the parameters we are trying to estimate?
2. Dispersion (e.g. SD)
Central tendency
Sample mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$  (1.1)
• Not very useful for continuous data; can be made artificially discrete (by
grouping)
Dispersion
Quantiles: Quartiles are a special type of percentile (the 25th, 50th and 75th percentiles)
Sample variance
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$  (1.2)
NOTE: n − 1 is the degrees of freedom; we divide by n − 1 rather than n because we use the sample mean x̄, not the population mean µ
Standard deviation
$s = \sqrt{s^2}$  (1.3)
68-95-99.7 rule: for approximately normal data, about 68%, 95% and 99.7% of observations lie within 1, 2 and 3 standard deviations of the mean, respectively
Coefficient of variation
$cv = \frac{s}{\bar{x}}$  (1.4)
• Advantage: unit-free ⇒ can be compared across different variables
Range
$\text{range} = \max - \min$  (1.5)
Interquartile range: $IQR = Q_3 - Q_1$  (1.6)
Symmetry
Symmetry: Similarity when reflected about the median
$\text{Skew}(x) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3$  (1.7)
Kurtosis
Kurtosis: fatness of the tails; how much frequency is distributed to the tails
$\text{Kurt}(x) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^4$  (1.8)
• Excess kurtosis = Kurt(x) − 3
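To make formulas (1.1)-(1.8) concrete, here is a minimal Python sketch (my own illustration, not part of the original notes; the data are made up) computing each statistic exactly as defined above:

```python
import numpy as np

x = np.array([2.1, 3.4, 1.8, 4.0, 2.9, 3.3])  # toy data, e.g. annual GDP growth rates

n = len(x)
xbar = x.sum() / n                        # sample mean (1.1)
s2 = ((x - xbar) ** 2).sum() / (n - 1)    # sample variance (1.2), n-1 degrees of freedom
s = np.sqrt(s2)                           # standard deviation (1.3)
cv = s / xbar                             # coefficient of variation (1.4)
skew = (((x - xbar) / s) ** 3).sum() / n  # skewness (1.7)
kurt = (((x - xbar) / s) ** 4).sum() / n  # kurtosis (1.8)
excess_kurt = kurt - 3                    # excess kurtosis (normal distribution has kurtosis 3)

print(xbar, s2, s, cv, skew, kurt, excess_kurt)
```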
1.2.3 Graphs
Box plot
• Minimum
• Maximum
• Skewness
Histogram
Line graphs
Line graph: plots observations against observation number
• Shows how values changing over time
• Only useful where x-axis is variable with natural ordering
Stata: tsline
Categorical data
Options:
• Frequency table
• Bar chart
• Pie chart
• Notation: X = number of heads in two fair coin flips; outcomes TT, TH, HT, HH
• Values of X: 0, 1, 1, 2 respectively
PMF:
– Pr[X = 0] = 1/4
– Pr[X = 1] = 1/4 + 1/4 = 1/2
– Pr[X = 2] = 1/4
CDF:
– Pr[X ≤ 0] = 1/4
– Pr[X ≤ 1] = 1/4 + 1/4 + 1/4 = 3/4
– Pr[X ≤ 2] = 3/4 + 1/4 = 1
$E[X] = 0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} = 1$  (2.2)
⇒ the expected value is 1 head
This is a population quantity ⇒ E[X] is the population mean of X, denoted µ
$\sigma^2 = (0-1)^2 \cdot \frac{1}{4} + (1-1)^2 \cdot \frac{1}{2} + (2-1)^2 \cdot \frac{1}{4} = \frac{1}{2}$  (2.4)
$\text{Population SD} = \sigma = \sqrt{1/2} = \frac{1}{\sqrt{2}}$  (2.5)
Population standard deviation of X: $\sigma = \sqrt{\sigma^2}$
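A quick sketch verifying the two-coin-flip calculations above (my own illustration):

```python
# X = number of heads in two fair coin flips
values = [0, 1, 2]
probs = [1/4, 1/2, 1/4]  # PMF from the four equally likely outcomes TT, TH/HT, HH

mu = sum(v * p for v, p in zip(values, probs))               # E[X] = 1, eq. (2.2)
var = sum((v - mu) ** 2 * p for v, p in zip(values, probs))  # sigma^2 = 1/2, eq. (2.4)
sd = var ** 0.5                                              # sigma = 1/sqrt(2), eq. (2.5)
print(mu, var, sd)  # 1.0 0.5 0.707...
```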
For the linear transformation Y = a + bX:
• Mean of Y = a + bµ
• Variance of Y = b²σ²
• SD of Y = $\sqrt{b^2\sigma^2} = |b| \cdot \sigma$
NOTE: b ∈ R, σ = SD ≥ 0
2.1.6 Standardisation
$\frac{X-\mu}{\sigma}$ always has mean 0 and SD 1 (a linear transformation):
• $Y = \frac{X-\mu}{\sigma}$, i.e. Y = a + bX
• $a = \frac{-\mu}{\sigma}$, $b = \frac{1}{\sigma}$
• Mean of Y = $a + b\mu = \frac{-\mu}{\sigma} + \frac{\mu}{\sigma} = 0$
• SD of Y = $|b| \cdot \sigma = \frac{1}{\sigma} \cdot \sigma = 1$
Y is the standardised form of X
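A short numerical check of standardisation (an illustrative sketch; the parameters are made up): any variable transformed as (X − µ)/σ has mean 0 and SD 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)  # X with mu = 5, sigma = 2

y = (x - x.mean()) / x.std()  # standardised form of X
print(y.mean(), y.std())      # approximately 0 and 1
```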
x̄ ⇒ µ, s² ⇒ σ²  (2.9)
KEY: Sample statistics are themselves random variables (they are functions of the random sample X₁, ..., Xₙ)
⇒ Sample statistics have a distribution
It makes sense to think of it as: $\bar{X} = \frac{1}{n}(X_1 + X_2 + ... + X_n)$
x̄ is a realisation of X̄
Expected value of X̄
Given that:
• $\bar{X} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + ... + \frac{1}{n}X_n$
we get $E[\bar{X}] = \frac{1}{n}\mu + \frac{1}{n}\mu + ... + \frac{1}{n}\mu = \mu$
Variance of X̄
$Var[\bar{X}] = Var\left[\frac{1}{n}X_1 + \frac{1}{n}X_2 + ... + \frac{1}{n}X_n\right] = \frac{1}{n^2}\sigma^2 + \frac{1}{n^2}\sigma^2 + ... + \frac{1}{n^2}\sigma^2 = n \cdot \frac{1}{n^2}\sigma^2 = \frac{\sigma^2}{n}$  (2.11)
SD of X̄ is $\frac{\sigma}{\sqrt{n}}$
The z statistic: difference between sample mean and population mean as pro-
portion of SD; how many SDs away from the population mean?
$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$  (2.13)
NOTE: Z ∼ N (0, 1)
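The claims that X̄ has mean µ, SD σ/√n, and that z in (2.13) is approximately N(0, 1) can be checked by simulation (a sketch, not from the notes; all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 3.0, 50, 20_000

# Draw many samples of size n and compute the sample mean of each
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbars.mean())       # close to mu
print(xbars.std())        # close to sigma / sqrt(n)
z = (xbars - mu) / (sigma / np.sqrt(n))
print(z.mean(), z.std())  # close to 0 and 1, consistent with Z ~ N(0, 1)
```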
The p value
Critical values
Critical region: Values of t such that H0 should be rejected
1. H0 : µ ≤ 40000
2. Ha : µ > 40000
1. p-value approach
Difference in means
Example: Male average annual salary higher than female average annual salary?
We have:
• Hypotheses
– H0: µ₁ − µ₂ ≤ 0
– Ha: µ₁ − µ₂ > 0
t statistic:
Hence:
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$  (3.3)
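A minimal sketch of the t-statistic in (3.3) under H0: µ₁ − µ₂ ≤ 0 (the data and group names are my own toy illustration):

```python
import numpy as np

male = np.array([62.0, 58.5, 71.2, 64.8, 69.9, 55.1])    # toy salaries ($'000)
female = np.array([57.3, 60.1, 54.8, 63.0, 52.2, 58.7])

t = (male.mean() - female.mean() - 0) / np.sqrt(
    male.var(ddof=1) / len(male) + female.var(ddof=1) / len(female)
)  # eq. (3.3) with mu1 - mu2 = 0 under H0
print(t)  # reject H0 for large positive t
```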
Example:
Proportions
Example: Proportion of loans in default
• H0: p = p* vs Ha: p ≠ p*
• SE of p̂: $\sqrt{\frac{p^*(1-p^*)}{n}}$
A logarithm is an exponent:
Usefulness of logarithms/exponentials
• Proportionate change: $\frac{\Delta x}{x_t}$
• Logarithmic transformation: $\Delta \ln(x) \approx \frac{\Delta x}{x_t}$
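The approximation Δln(x) ≈ Δx/xₜ is easy to verify numerically (my own sketch); it is accurate for small proportionate changes and drifts for large ones:

```python
import numpy as np

x_old = 100.0
for pct in [0.01, 0.05, 0.20, 0.50]:
    x_new = x_old * (1 + pct)
    dlog = np.log(x_new) - np.log(x_old)  # logarithmic change
    prop = (x_new - x_old) / x_old        # proportionate change
    print(f"{pct:.0%}: dln(x) = {dlog:.4f}, dx/x = {prop:.4f}")
# 1%: 0.0100 vs 0.0100 (nearly exact); 50%: 0.4055 vs 0.5000 (approximation breaks down)
```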
4.1 Notation
Bivariate data: 2 variables
• We assume X influences Y
It is common to include:
$Pr[X = x \mid Y = y] = \frac{Pr[X = x, Y = y]}{Pr[Y = y]}$  (4.3)
Also:
$Pr[Y = y \mid X = x] = \frac{Pr[X = x, Y = y]}{Pr[X = x]}$  (4.4)
Combining, we get Bayes' Rule:
$Pr[X = x \mid Y = y] = \frac{Pr[Y = y \mid X = x] \cdot Pr[X = x]}{Pr[Y = y]}$
4.6 Independence
Dependence: Knowing the value of one variable changes the probability distribution of the other variable.
Independence: Knowing the value of one variable does not change the probability distribution of the other variable, i.e. $Pr[X = x \mid Y = y] = Pr[X = x]$ for all x, y.
4.8 Covariance
Covariance: a (scale-dependent) measure of linear dependence
$\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$  (4.8)
• σ_XY > 0: X and Y tend to be either both large or both small
• σ_XY < 0: when one is large, the other tends to be small
• X and Y independent ⇒ σ_XY = 0
NOTICE THIS IS A ONE-WAY STATEMENT: σ_XY = 0 does not imply independence
NOTE: $Var[X] = \sigma_X^2 = \sigma_{XX}$
NOTE: Covariance only measures linear dependence (σ_XY can be ≈ 0 even when there is a non-linear relationship).
NOTE: Covariance is NOT scale-free (e.g. measuring X in metres instead of km multiplies σ_XY by 1000).
Sample covariance:
$s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$  (4.9)
4.9 Correlation
Correlation: scale-free measure of linear dependence
$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \cdot \sigma_Y}$  (4.10)
Sample correlation:
$r_{XY} = \frac{s_{XY}}{s_X \cdot s_Y}$  (4.11)
Correlation: the no. of SDs that y changes by when x changes by 1 SD
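A sketch computing (4.9) and (4.11) by hand and checking the scale-dependence of covariance noted above (data made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # e.g. distance in km
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])

n = len(x)
s_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)  # sample covariance (4.9)
r_xy = s_xy / (x.std(ddof=1) * y.std(ddof=1))             # sample correlation (4.11)

# Covariance is not scale-free: measuring x in metres multiplies s_xy by 1000...
xm = 1000 * x
s_xy_m = ((xm - xm.mean()) * (y - y.mean())).sum() / (n - 1)
# ...but correlation is unchanged
r_xy_m = s_xy_m / (xm.std(ddof=1) * y.std(ddof=1))
print(s_xy, s_xy_m, r_xy, r_xy_m)
```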
4.10 Regression
Regression: Finding line of best fit for scatter plot
ŷi = b1 + b2 xi (4.12)
• ŷ are fitted values (y values when subbed into straight line formula)
• b1 is intercept, b2 is slope
ei = yi − ŷi (4.13)
1. Minimise $\sum_{i=1}^{n} e_i$: does not work as positive and negative residuals cancel out
2. Minimise $\sum_{i=1}^{n} |e_i|$: mathematically complicated and has poor statistical properties
3. Minimise $\sum_{i=1}^{n} e_i^2$: THIS WORKS; least squares regression
Note that for b₂, when numerator and denominator are each multiplied by $\frac{1}{n-1}$, the numerator is the covariance and the denominator is the variance. Hence, we get:
$b_2 = \frac{s_{xy}}{s_x^2} = \frac{s_{xy} \cdot s_y}{s_x \cdot s_x \cdot s_y} = r_{xy} \cdot \frac{s_y}{s_x}$  (4.16)
b₂: The slope
$R^2 = r_{xy}^2 = r_{y\hat{y}}^2$  (4.23)
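Putting (4.12)-(4.23) together, a least-squares line can be fitted in a few lines (a sketch with made-up data, not from the notes):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 2.8, 4.5, 4.1, 5.9, 6.3])

b2 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()  # slope (4.16)
b1 = y.mean() - b2 * x.mean()                                               # intercept
yhat = b1 + b2 * x  # fitted values (4.12)
e = y - yhat        # residuals (4.13)

r_xy = np.corrcoef(x, y)[0, 1]
print(b1, b2, r_xy ** 2)  # R^2 = r_xy^2, eq. (4.23)
```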
That is, the variance of the error term is constant as x changes. Otherwise, the errors are heteroskedastic, e.g.:
$u_i, u_j$ independent  (4.28)
That is, where yi is relative to the regression line does not affect where yi+1 will
be relative to the line.
This is often not true for time-series data (e.g. if unemployment is high this
year, it will likely also be high next year):
We need to estimate σ_u².
E[s_e²] = σ_u². Hence, replace σ_u² with s_e².
We call this result the Gauss-Markov theorem: our OLS estimators are
BLUE (Best Linear Unbiased Estimators).
5.1.2 Heteroskedasticity
Heteroskedasticity: Drop Assumption 3
• Var[u_i | x_i] is not constant at σ_u² for each i.
• The formula for se(b₂) is incorrect as it depends on σ_u², which no longer exists.
39
5.1.3 Autocorrelation
Autocorrelation: Drop Assumption 4
• Estimate Cov[u_i, u_j] by e_i e_j
• Choosing m (2 methods):
– $m \approx \sqrt[3]{T}$, where T is the sample size of the time-series data
– By observation (e.g. if corr[e_t, e_{t−8}] > 0.2 but corr[e_t, e_{t−9}] < 0.2, choose m = 8)
– NOTE: m = 0 ⇒ heteroskedasticity-robust standard error
5.1.4 Clustering
Clustering: Special type of autocorrelation
• Panel data: correlation across time, but not individuals (i.e. ui,t correlated
with ui,t−1 , but not ui−1,t )
5.2 Prediction
5.2.1 Point forecasts
Forecast: given a data point x*, what is our best prediction of the corresponding y*?
To build confidence intervals, we need the standard error of the forecast. How
to find depends on whether we want to forecast:
1. Conditional mean: β1 + β2 x∗
2. Actual value: β1 + β2 x∗ + u∗
Stata:
• Add x*, leave y* empty, regress y x
• Execute predict yhat; yhat is the point forecast ŷ*
• Find the standard error of yhat with predict se_yhat, stdp, where stdp gives the standard error of the (conditional-mean) prediction
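For the bivariate model, the two standard errors can also be computed directly. A sketch using the standard textbook formulas (not spelled out in the notes above): the conditional-mean forecast SE corresponds to Stata's stdp, and the actual-value forecast adds the error variance (Stata's stdf); the data here are made up.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 2.8, 4.5, 4.1, 5.9, 6.3])
x_star = 7.0

n = len(x)
b2 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b1 = y.mean() - b2 * x.mean()
e = y - (b1 + b2 * x)
s_e = np.sqrt((e ** 2).sum() / (n - 2))  # regression standard error

y_star = b1 + b2 * x_star                # point forecast
h = 1 / n + (x_star - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum()
se_mean = s_e * np.sqrt(h)               # SE for the conditional mean (like stdp)
se_actual = s_e * np.sqrt(1 + h)         # SE for the actual value (adds u*)
print(y_star, se_mean, se_actual)
```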
5.2.4 Comparison
6.1 Introduction
Data transformations: f (y) = β1 + β2 g(x) + u instead of y = β1 + β2 x + u
$y = \beta_1 + \beta_2 x + u \;\Rightarrow\; y = \beta_1 + \frac{\beta_2}{k}(kx) + u$
y = β1 + β2 x + u ⇒ cy = cβ1 + cβ2 x + cu
6.2 Transformations of x
6.2.1 Dummy variables
Dummy variables introduction
Dummy variable: variable only taking the values 0 or 1 (denoted di )
• Hence H0: β₂ = 0 is equivalent to H0: µ₁ = µ₂
Example Do mean wages differ significantly between those who did and didn’t
graduate high school?
$\hat{y} = b_1 + b_2 x$: $b_2 = \frac{\Delta\hat{y}}{\Delta x}$, the expected change in y given a unit change in x
$\hat{y} = b_1 + b_2 g(x)$: $b_2 = \frac{\Delta\hat{y}}{\Delta g(x)}$, e.g. the expected wage difference between 2 groups
$\frac{\Delta\hat{y}}{\Delta x} = b_2 \cdot \frac{\Delta g(x)}{\Delta x}$  (6.1)
g(x) can be anything, but it is usually:
1. Dummy indicator function
2. Natural logarithm
Note that because $\Delta\ln(x) \approx \frac{\Delta x}{x}$, we have $\frac{\Delta y}{\Delta\ln(x)} \approx \frac{\Delta y}{\Delta x / x}$,
i.e. b₂ is approximately equal to the change in y given a relative change in x
• Rearranging the above, we also get $\Delta\hat{y} \approx b_2 \cdot \frac{\Delta x}{x}$
$ME(g(x)) = b_2 \cdot \frac{\Delta g(x)}{\Delta x} = b_2 \cdot \frac{\Delta\ln(x)}{\Delta x} \approx b_2 \cdot \frac{1}{x}$  (6.2)
NOTE: Effect of ∆x on y decreases as x increases
Example: Graduating from high school has a bigger association with wages than a PhD
6.3 Transformations of y
6.3.1 Transforming y instead of x (Log-linear model)
We have:
Marginal effect:
$\frac{\Delta\hat{y}}{\Delta x} = \frac{\Delta\hat{y}}{\Delta\hat{f}(y)} \cdot \frac{\Delta\hat{f}(y)}{\Delta x} = b_2 \cdot \frac{\Delta\hat{y}}{\Delta\hat{f}(y)}$  (6.3)
• Slope coefficient: $b_2 = \frac{\Delta\ln(\hat{y})}{\Delta x}$
• $\Delta\ln(\hat{y}) \approx \frac{\Delta\hat{y}}{\hat{y}} \;\Rightarrow\; b_2 \approx \frac{\Delta\hat{y}/\hat{y}}{\Delta x}$
$ME(\ln(\hat{y})) = b_2 \,/\, \frac{\Delta\ln(\hat{y})}{\Delta\hat{y}} \approx b_2 \,/\, \frac{1}{\hat{y}} = b_2 \cdot \hat{y}$  (6.4)
• Slope coefficient: $b_2 = \frac{\Delta\ln(\hat{y})}{\Delta\ln(x)} \approx \frac{\Delta\hat{y}/\hat{y}}{\Delta x/x}$
• The slope coefficient now measures the relative change in y for every relative change in x
6.7 Summary
Then:
$u = a\beta_3 \cdot Safety + \text{other random errors}, \quad a < 0$  (7.3)
Implication: b2 will be
• Biased
• Inconsistent
– b2 will actually converge to:
$\beta_2 + \beta_3 \cdot \frac{Cov(Speed, Safety)}{Var(Speed)}$  (7.4)
– If β₂ > 0, b₂ can be negative if Cov(Speed, Safety) is strong enough, as a < 0
• Data description
– Plots
– Graphs
7.3.2 Graphs
If we have more than 3 variables, we need other methods:
Y = β1 + β2 X2 + β3 X3 + ... + βk Xk + u (7.5)
7.4.2 Estimation
Our line which fits the data best is:
y = b1 + b2 x2 + b3 x3 + ... + bk xk (7.6)
The OLS estimator for β1 , β2 , ..., βk is the set of values for b1 , b2 , ..., bk that
solves:
$\min_{b_1, b_2, ..., b_k} \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$  (7.9)
or equivalently
$\min_{b_1, b_2, ..., b_k} \frac{1}{n}\sum_{i=1}^{n}\big(y_i - (b_1 + b_2 x_{2i} + ... + b_k x_{ki})\big)^2$  (7.10)
or equivalently
$\min_{b_1, b_2, ..., b_k} \frac{1}{n}\sum_{i=1}^{n} e_i^2$  (7.11)
To minimise, we need to find k different coefficients: b1 , b2 , ..., bk
Implications (verified numerically in the sketch after this list):
• The residuals sum to 0
• Each regressor is orthogonal to the residual
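Both properties follow from the OLS first-order conditions and are easy to check numerically; a minimal sketch with simulated data (all names and numbers my own):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x2, x3 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])  # regressors incl. intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS solves the least-squares problem
e = y - X @ b                              # residuals

print(e.sum())  # ~0: residuals sum to zero (intercept included)
print(X.T @ e)  # ~0 vector: each regressor is orthogonal to the residuals
```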
Condition on the data for this system to have a unique solution: we need adequate variation in the data on all the x values (more on this in the next lecture).
7.4.3 Interpretation
Interpretation for b2 : the partial effect on the predicted value of y when X2
changes by one unit, holding X3 , ..., Xk fixed.
Example:
The aggregate effect of an individual staying at a firm for an extra year (i.e.
both experience and tenure increase by 1) is:
$\Delta\widehat{\ln(Earnings)} = 0.029\,\Delta Experience + 0.011\,\Delta Tenure = 0.029 + 0.011 = 0.040$
≈ a 4% increase in earnings
Adjusted R² ($\bar{R}^2$):
$\bar{R}^2 = 1 - \frac{n-1}{n-k} \cdot \frac{RSS}{TSS}, \quad \bar{R}^2 \in (-\infty, 1]$
8.1 Assumptions
8.1.1 Data assumptions
Conditions needed on data to have a unique solution b1 , b2 , ..., bk :
1. Strictly more than k observations (so n − k > 0)
2. Adequate variation on the regressors - no perfect collinearity
Adequate variation: no regressor in the model can be expressed as an exact
linear combination of the other regressors
Suppose our dataset only includes males, so d_i = 1 for all i, i.e. there is no variation in d_i in our sample.
Earnings = 10 + 2·Age + D
= 10 + 2·Age + 1  (as D = 1)
= 11 + 2·Age
= 11·D + 2·Age  (as D = 1)
= (11 − α)·D + α·D + 2·Age
= (11 − α)·D + α + 2·Age  (as D = 1)
• Intercept coefficient = α
True relation:
Now assume that everyone in our dataset enters school at age 6 and works as soon as they leave school. Then, we get a linear dependence relationship (which we call perfectly collinear regressors):
Which means the variable Experience is irrelevant, thus giving infinitely many
solutions.
Multicollinearity
Perfectly collinear regressors: regressors have a linear relationship
Implications of multicollinearity:
• Good
Population assumptions
Population assumptions: similar to bivariate assumptions
1. Linear model:
3. Homoskedasticity (otherwise: heteroskedasticity)
Interpretation
• Assumptions 1 & 2: ensure estimators are unbiased and consistent
• Assumptions 3 & 4: determine the precision and distribution of estimators
$E[b_j] = \beta_j$  (8.9)
$Var[b_j] = \sigma_{b_j}^2 = \frac{\sigma_u^2}{\sum \tilde{x}_{ji}^2}$  (8.10)
where x̃_ji is the residual from regressing x_ji on an intercept and all regressors other than itself,
i.e. $x_j = \beta_1 + \beta_2 x_2 + ... + \beta_{j-1} x_{j-1} + \beta_{j+1} x_{j+1} + ... + \beta_k x_k$
Example: j = 3. Regress x_{3i} on x_{2i}, x_{4i}, ..., x_{ki} (with intercept) using the same data set. The residuals of this regression are the x̃_{3i}.
Comments on variance:
$se(b_j) = \frac{s_e}{\sqrt{\sum \tilde{x}_{ji}^2}}$  (8.11)
$t = \frac{b_j - \beta_j}{se(b_j)}$  (8.12)
• n→∞
4. If:
• n → ∞, or
• the errors are normally distributed,
then
$b_j \sim N(\beta_j, \sigma_{b_j}^2)$  (8.13)
NOTE: We assume that all other regressors are non-zero in each hypothesis.
Interpretation of the test: "Given the other regressors in our population model, do we still need this regressor?"
T-test:
1. Setting up hypotheses:
H0 : β j = 0
Ha : βj ,= 0
• Critical value:
c = tn−k,α/2 (8.18)
5. Conclusion
6. STATA Commands
Single-tailed t-test:
1. Setting up hypotheses:
H0: β_j ≥ 0
Ha: β_j < 0
• Interpretation
Question: Does an extra year of formal education (S) have the same effect
on the natural logarithm of earnings (ln(earning)) as an extra year of general
workforce experience (exper)?
H0: β₂ = β₃
Ha: β₂ ≠ β₃
t-statistic:
$t = \frac{b_2 - b_3}{se(b_2 - b_3)}$  (9.2)
We will rewrite the model such that the STATA output will directly provide
se(b2 − b3 ). Define:
θ = β2 − β3 (9.3)
ln(earning) = β₁ + β₂S + β₃exper + u
= β₁ + (θ + β₃)S + β₃exper + u
= β₁ + θS + β₃(S + exper) + u
= β₁ + θS + β₃X₄ + u, where X₄ = S + exper
H0: θ = 0
Ha: θ ≠ 0
Our t-statistic:
$t = \frac{\hat{\theta} - 0}{se(\hat{\theta})}$  (9.4)
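A sketch of this reparameterisation trick on simulated data (all numbers made up; the variable names just mirror the example): regress ln(earning) on S and X₄ = S + exper, and the coefficient on S is θ̂ = b₂ − b₃ with its standard error read straight off the output.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
S = rng.integers(8, 18, size=n).astype(float)      # years of schooling
exper = rng.integers(0, 30, size=n).astype(float)  # years of experience
lny = 1.0 + 0.08 * S + 0.03 * exper + rng.normal(scale=0.3, size=n)

X4 = S + exper
X = np.column_stack([np.ones(n), S, X4])  # reparameterised model
b, *_ = np.linalg.lstsq(X, lny, rcond=None)
e = lny - X @ b
s2 = (e ** 2).sum() / (n - 3)             # error variance estimate, k = 3
cov_b = s2 * np.linalg.inv(X.T @ X)       # usual OLS covariance matrix

theta_hat, se_theta = b[1], np.sqrt(cov_b[1, 1])
print(theta_hat / se_theta)               # t-statistic for H0: theta = 0, eq. (9.4)
```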
We are now doing joint hypothesis tests: more than one restriction on the
parameters.
• cigs: average no. of cigs mother smoked per day during pregnancy
Our model:
Our question: should motheduc and fatheduc be excluded from the model
after other variables have been controlled for (2 exclusion restrictions)?
H0 : β5 = 0, β6 = 0
Ha : H0 is false
TEST 1
H0: β₅ = 0
Ha: β₅ ≠ 0
REJECT IF $\left|\frac{b_5}{se(b_5)}\right| \geq t_{n-k,\,2.5\%}$
TEST 2
H0: β₆ = 0
Ha: β₆ ≠ 0
REJECT IF $\left|\frac{b_6}{se(b_6)}\right| \geq t_{n-k,\,2.5\%}$
We reject the joint hypothesis if either test rejects, so the probability of rejecting (when H0 is true) is:
$Pr[|t_5| \geq t_{n-k,2.5\%} \text{ or } |t_6| \geq t_{n-k,2.5\%}]$  (9.6)
IF INDEPENDENT:
$= 1 - Pr[|t_5| < t_{n-k,2.5\%} \text{ and } |t_6| < t_{n-k,2.5\%}]$
$= 1 - Pr[|t_5| < t_{n-k,2.5\%}] \cdot Pr[|t_6| < t_{n-k,2.5\%}]$
$= 1 - 95\% \cdot 95\% = 9.75\%$
IF PERFECTLY CORRELATED: Pr = 5%
Restricted/Unrestricted models
Unrestricted model (UR): Complete/original model
NOTE:
• RSS of Model R ≥ RSS of Model UR
• R2 of Model UR ≥ R2 of Model R
The F-statistic
F-statistic:
$F = \frac{(RSS_R - RSS_{UR})/q}{RSS_{UR}/(n-k)}$  (9.9)
where:
$F = \frac{(R_{UR}^2 - R_R^2)/q}{(1 - R_{UR}^2)/(n-k)}$  (9.10)
TIP: To handle changes in sign, always make the larger value minus the smaller
value
• n→∞
F-distribution can be denoted F (v1 , v2 ) or Fv1 ,v2 where v1 , v2 are the first (q)
and second (n − k) degrees of freedom
In our example:
• $R_{UR}^2 = 0.0387$, $R_R^2 = 0.0364$
• So $F = \frac{(R_{UR}^2 - R_R^2)/q}{(1 - R_{UR}^2)/(n-k)} = 1.42$
In our example:
• q = 2, n − k = 1191 − 6 = 1185, α = 5%
• Critical value: $F_{2,1185;\,5\%} \approx 3.00$; since F = 1.42 < 3.00, do not reject H0 (see the check below)
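A quick numerical check of this example (a sketch; scipy is used only to look up the critical value):

```python
from scipy import stats

r2_ur, r2_r = 0.0387, 0.0364
q, n, k = 2, 1191, 6

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k))  # eq. (9.10)
crit = stats.f.ppf(0.95, q, n - k)                  # 5% critical value of F(2, 1185)
print(F, crit, F >= crit)  # ~1.42, ~3.00, False -> do not reject H0
```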
Hypotheses:
Models:
Y = β1 + β2 X2 + ... + βk Xk + u (UR)
Model with q restrictions (R)
F-statistic:
$F = \frac{(RSS_R - RSS_{UR})/q}{RSS_{UR}/(n-k)} = \frac{(R_{UR}^2 - R_R^2)/q}{(1 - R_{UR}^2)/(n-k)}$ if $TSS_{UR} = TSS_R$  (9.12)
Critical value:
Conclude:
$H_0: \beta_{k-q+1} = 0, ..., \beta_k = 0$
Ha : H0 is false
Interpretation:
H0: β₂ = 0, ..., β_k = 0
Ha : H0 false
Models:
Y = β1 + β2 X2 + ... + βk Xk + u (UR)
Y = β1 + u (R)
F-statistic: $R_R^2 = 0$, q = k − 1 for overall significance tests
$F = \frac{(R_{UR}^2 - R_R^2)/q}{(1 - R_{UR}^2)/(n-k)} = \frac{R_{UR}^2/(k-1)}{(1 - R_{UR}^2)/(n-k)}$  (9.14)
H0 : β j = 0
Ha : βj ,= 0
In this case:
• F-statistic = (t-statistic)²
• Critical value of the F-test = (critical value of the t-test)²
• $Pr[|T_{n-k}| > |t|] = Pr[F_{1,n-k} > t^2]$, i.e. the p-values are the same
$Y = AK^{\alpha}L^{\beta}$  (10.1)
$\widehat{\ln(Y)} = \widehat{\ln(A)} + \hat{\alpha}\ln(K) + \hat{\beta}\ln(L)$  (10.3)
$Y = \beta_1 + \beta_2 X + \beta_3 X^2 + u$  (10.5)
Interpretation
In the quadratic model
$Y = \beta_1 + \beta_2 X + \beta_3 X^2$  (10.7)
we do not interpret the individual coefficients as the partial effects. This is
because all regressors are necessarily dependent. Instead, we look at the:
NOTE:
$Y = \beta_1 + \beta_2 X + \beta_3 X^2$  (10.8)
$\Delta Y = (\beta_1 + \beta_2 \cdot 2 + \beta_3 \cdot 2^2) - (\beta_1 + \beta_2 \cdot 1 + \beta_3 \cdot 1^2) = \beta_2 + 3\beta_3$
Marginal Effect
Marginal effect: Predicted change ∆Y when X changes by a very small ∆X
$ME = \frac{\Delta Y}{\Delta X}$  (10.9)
This also depends on the value of X. We can interpret ME as the slope of
the E[Y |X] curve at X (derivative).
Quadratic regression:
Estimated model:
$\widehat{earnings} = 29252.89 - 3830.641 \cdot education + 439.5283 \cdot education^2$  (10.11)
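Since the marginal effect is the slope of the fitted curve, for a quadratic model it is ME = b₂ + 2b₃·education (the derivative). A small sketch evaluating it with the coefficients from (10.11) (my own illustrative code):

```python
b2, b3 = -3830.641, 439.5283  # coefficients from the estimated model (10.11)

def marginal_effect(educ):
    """Slope of predicted earnings at a given education level: ME = b2 + 2*b3*educ."""
    return b2 + 2 * b3 * educ

for educ in [4, 8, 12, 16]:
    print(educ, marginal_effect(educ))
# ME is negative for low education and turns positive once educ > -b2/(2*b3) ~ 4.36
```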
NOTE:
• If $X^p$ is included, it makes more sense to include all $X^m$ for m ≤ p
• Including many powers can be considered a nonparametric way of esti-
mating E[Y |X]
• The F-test can help us decide how many powers to use (generally overes-
timates)
We can include other regressors in our polynomial model:
Polynomial models are still multiple regression models. They just use
transformed data (similar to linear restriction transformations).
δ₁ is used as the coefficient for the female variable to denote that it is a dummy variable:
$female_i = \begin{cases} 1 & \text{if individual } i \text{ is female} \\ 0 & \text{if individual } i \text{ is male} \end{cases}$  (11.2)
11.1.1 Interpretation
We have:
δ1 represents an intercept shift between the regression lines for males and
females
where
$male_i = \begin{cases} 1 & \text{if individual } i \text{ is male} \\ 0 & \text{if individual } i \text{ is female} \end{cases}$  (11.6)
Then, we have
$female_i + male_i = 1$  (11.7)
for all individuals i ⇒ perfectly collinear.
So we have:
• β2 remains the same
• $\tilde{\beta}_1 \neq \beta_1$; $\beta_1 = \tilde{\beta}_1 + \delta_2$ (i.e. the intercept in Model 1 is the intercept in Model 2 plus δ₂)
• δ1 = −δ2
• The predicted regression lines for males and females remain the same
NOTE:
• The sum of the dummy variable coefficient and the intercept is always the
intercept of the other model
as this assumes that the differences in ice cream sales across 2 different seasons are fixed at k · β₂ for some integer k ∈ {1, 2, 3}.
Instead, we must have a different dummy variable for each season. So,
now let’s consider the model:
ice = β1 + δ1 dspring + δ2 dsummer + δ3 dautumn + δ4 dwinter + u (11.10)
The problem here is that we have perfect collinearity:
dspring + dsummer + dautumn + dwinter = 1 (11.11)
So, we must either:
• Exclude 1 of the 4 variables
• Keep all 4 variables, exclude the intercept
These options are all equivalent. I.e. The conclusions will be the same.
We call spring the baseline group or the reference group (the coefficient
of spring has essentially become β1 ). We have the model:
$ice = \beta_1 + \delta_2\, dsummer + \delta_3\, dautumn + \delta_4\, dwinter + u$  (11.12)
where
β1 = E[ice|spring]
δ2 = E[ice|summer] − E[ice|spring]
δ3 = E[ice|autumn] − E[ice|spring]
δ4 = E[ice|winter] − E[ice|spring]
This model does not have a problem with collinearity, given that we have ob-
servations from each season in our dataset.
Note: We can calculate the difference in means across groups by finding the
difference in their coefficients
E.g.
These coefficients are shifts in the intercepts of the regression lines for different
groups.
Example Add new dummy variable nonwhite to the earning model to account
for race
The assumption in this model is that the difference in wages between males/females
are the same for white/non-whites.
where
$fe\_nw_i = \begin{cases} 1 & \text{if female and nonwhite} \\ 0 & \text{otherwise} \end{cases}$  (11.16)
So, now we have the model
• (f, nw)
• (f, w)
• (m, nw)
• (m, w)
NOTE:
$H_0: \beta_3 = \delta_{female} = 0$  (11.20)
We have the following models, which affect either the intercept or the slope:
The results will be equivalent to a model including a female dummy and a female/educ interaction term.
• Specify q, n, k
• Conclude
using data on females only to get RSSU R,f emale and then males only to get
RSSU R,male . Then, we use
using the full sample. We use the RSS of this model as RSSU R .
NOTE: The Chow test can only be used where all coefficients are the
same across groups. Otherwise, we have to use the F-test.
i.e. We can only use the Chow test for the null where all β_i, for i = 1, ..., k, are the same across groups.
Example: We want to test whether β₂, β₃ are the same for males/females while allowing β₁ to differ in the model
• Unrestricted model includes the dummy variable and the full set of
interaction terms (between dummy and other variables)
12.1.2 Outliers
Outlier: an observation whose value is unusual in the context of the rest of the
data
Outliers:
Take care if your results change significantly when a few key observations (’in-
fluential observations’) are dropped. Results with these observations dropped
may be included as a robustness check.
– Regressors (Endogeneity)
Y = β 1 + β2 X2 + u (12.1)
Y = β 1 + β2 X2 + β 3 X3 + u (12.2)
$X_3 = \gamma_1 + \gamma_2 X_2$
then we get
$E[\breve{b}_2] = \beta_2 + \beta_3\gamma_2$ (the error of the short model is $u_i = u + \beta_3 X_3$)
Upward bias: $E[\breve{b}_2] > \beta_2$ when
• γ₂, β₃ > 0
• γ₂, β₃ < 0
Downward bias: E[b˘2 ] < β2 when
• γ2 > 0, β3 < 0
• γ2 < 0, β3 > 0
No bias: E[b˘2 ] = β2 when
• γ2 = 0
• β3 = 0
Dealing with OVB:
• Simplest: Include the omitted variable in our regression
• Advanced techniques: instrumental variables, natural experiments, etc.
• Last resort: determine the sign of the bias
Irrelevant variables
Implications of including irrelevant variables, or too many variables:
• OLS estimators remain unbiased and consistent
• OLS estimators are less precise
– Multicollinearity may be a problem
– Multicollinearity → se increases → t-test less accurate when n is
small
12.2.2 Endogeneity
In the regression model
$Y = \beta_1 + \beta_2 X_2 + ... + \beta_k X_k + u$  (12.5)
if a regressor is correlated with the error u (endogeneity), the OLS estimators are:
• Biased
• Inconsistent
Implications:
• se increases
Example 1
True population model:
Our model:
Y = β1 + β2 educ + β3 age + u
Example 2
True population model:
Our model:
To examine the functional form of a single regressor, look at the scatter plot
of:
Y = β1 + β2 X2 + ... + βk Xk + u (R)
The RESET test adds polynomials of the OLS fitted values Ŷ, usually Ŷ² and Ŷ³.
• Ŷ² and Ŷ³ are two particular combinations of higher-order terms and interaction terms of the regressors
• If the model is correct, adding Ŷ² and Ŷ³ should not have any significant effect
RESET Test:
2. Estimate unrestricted model (UR) with Yˆ2 and Yˆ3 as additional regressors
3. Perform F-test
• H0: γ₁ = γ₂ = 0 and Ha: H0 false
• In this case, F-statistic is F (2, n − (k + 2)) distributed
• If we reject, (R) is likely to be misspecified
Model 1:
• UR: $price = \beta_1 + \beta_2\, lotsize + \beta_3\, sqrft + \beta_4\, bdrms + \gamma_1\, \widehat{price}^2 + \gamma_2\, \widehat{price}^3 + u$
• n = 88, q = 2, k = 4, F2,88−4−2
• F = 4.67
Model 2:
• UR: $\ln(price) = \beta_1 + \beta_2\ln(lotsize) + \beta_3\ln(sqrft) + \beta_4\, bdrms + \gamma_1\, \widehat{\ln(price)}^2 + \gamma_2\, \widehat{\ln(price)}^3 + u$
• n = 88, q = 2, k = 4, F2,88−4−2
• F = 2.56
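A sketch of the RESET mechanics on simulated data (not the housing data above, which we don't have): fit the restricted model, add Ŷ² and Ŷ³, and F-test the two added terms.

```python
import numpy as np

def reset_test(y, X):
    """RESET: F-statistic for adding yhat^2 and yhat^3 to the model y = X b + u."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ b
    rss_r = ((y - yhat) ** 2).sum()                    # restricted RSS

    X_ur = np.column_stack([X, yhat ** 2, yhat ** 3])  # add fitted-value powers
    b_ur, *_ = np.linalg.lstsq(X_ur, y, rcond=None)
    rss_ur = ((y - X_ur @ b_ur) ** 2).sum()            # unrestricted RSS

    q = 2
    return ((rss_r - rss_ur) / q) / (rss_ur / (n - (k + q)))  # F(2, n-(k+2))

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 10, size=n)
y = 2 + 0.5 * x ** 2 + rng.normal(size=n)  # true relation is quadratic

X_lin = np.column_stack([np.ones(n), x])   # misspecified linear model
print(reset_test(y, X_lin))                # large F -> reject, model misspecified
```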
12.2.4 Heteroskedasticity
Heteroskedasticity: Errors for different observations have different variances.
Implications of heteroskedasticity:
• Heteroskedasticity does NOT:
– Cause bias or inconsistency
– Affect estimated coefficients
– Affect goodness of fit (R², R̄²)
• Heteroskedasticity does:
– Invalidate the usual variance formulas/estimators ⇒ standard errors of estimators are invalid ⇒ inferences invalid
Fix:
regress y x, robust
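A Python analogue of Stata's robust option, as a sketch (statsmodels' HC1 covariance is, to my understanding, the same correction Stata's `, robust` applies by default; the data are simulated with variance growing in x):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(0, 10, size=n)
u = rng.normal(scale=0.5 + 0.3 * x, size=n)  # error variance grows with x
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                     # usual (invalid) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")    # heteroskedasticity-robust, like ", robust"
print(ols.bse, robust.bse)                   # coefficients identical, SEs differ
```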
12.2.5 Autocorrelation
Implications: