• Overall significance of the model:
• With heteroskedasticity:
• ARCH:
• Implies that:
○ Only how the average value of y changes with x.
○ Not that the relationship holds exactly for all units in the population.
○ By assumption:
• R²: ratio of the explained variation to the total variation (never negative - p. 83). A low R² indicates a low correlation between the dependent variable and the regressors in the sample.
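For reference, the definition behind this bullet, in the usual notation (SST total, SSE explained, SSR residual sum of squares):

$$ R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}, \qquad SST = SSE + SSR $$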
• Including irrelevant variables in a multiple linear regression
does not increase the variance of the OLS estimator of the
remaining parameters if the irrelevant variables are
uncorrelated with the relevant variables.
• If we minimise the SSR, we are maximizing the R².
• and are both positive.
• is smaller than .
○
○ Just need to show that
○
is always positive.
Can't be estimated when
• A low R² has nothing to do with the assumption of zero conditional mean.
• In econometric applications the R² can be low, but it can also be high.
• Disadvantage of R²: it always increases when another regressor is added.
• Relevant variable (having a partial effect): a variable whose population coefficient is nonzero.
• Being statistically significant is a different concept.
• What we do to get the result:
○ Compute
○ Take
○ We assume that the sample is random: the individuals that we pick from the population are picked randomly.
○ We take conditional expectations and then we take them off (law of iterated expectations).
• Fact:
• Level-level: y = β0 + β1·x. When x changes by one unit, it is predicted that y changes by β1 units, ceteris paribus.
• Level-log: y = β0 + β1·log(x).
○ If x changes by 1%, it is predicted that y changes by β1/100 units, ceteris paribus.
• Log-level: log(y) = β0 + β1·x.
○ β1 is a semi-elasticity.
○ If x changes by one unit, it is predicted that y will change by (100·β1)%.
• Log-log: log(y) = β0 + β1·log(x).
○ β1 is an elasticity.
○ If x changes by 1%, it is predicted that y will change by β1%, ceteris paribus.
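A worked log-level example with made-up numbers: if the estimated coefficient on educ in a log(wage) equation were 0.08, then

$$ \widehat{\log(wage)} = \hat\beta_0 + 0.08\,educ \;\Rightarrow\; \%\Delta\widehat{wage} \approx 100 \cdot 0.08 = 8\% \text{ per extra year of education.} $$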
○ Unbiased only if the omitted regressor is irrelevant or uncorrelated with the included one.
○ The simple and the multiple regression estimates coincide necessarily if:
The two regressors are uncorrelated in the sample, or
The estimated effect of the extra regressor is equal to zero.
• If the two regressors are uncorrelated in the sample:
○ It's okay: no bias and the variance is the same.
• If the extra regressor is irrelevant (its population coefficient is zero):
○ This model is better than the usual one.
• If the extra regressor is relevant and correlated with the included one:
○ More efficient but biased.
• Given the variance of the OLS estimators in the matrix form it is possible to:
○ Know the variance of the individual estimators as well as their covariances.
○ Derive the variance of a sum of individual estimators.
○ Derive the variance of a linear combination of estimators.
• In a multiple linear regression model, the random sampling assumption implies that the variance-covariance matrix of the errors is diagonal (all the elements off the diagonal are equal to zero).
•
○ If MLR.4 holds, it is only
•
•
• Variances as vectors.
○ Simple model (omitting the second regressor):
If x1 and x2 are uncorrelated, including or not including x2 is irrelevant in terms of unbiasedness.
If x1 and x2 are correlated, the estimator is biased (in general, this is what happens).
○ If x1 and x2 are uncorrelated, x1 and the redefined error term are uncorrelated.
○ Bias depends on the correlation between the regressors (education, ability) and on the effect of the omitted variable:
If there is low correlation: the bias is small.
If the effect of the omitted variable is low: the bias is small.
• Original model:
• Transformed model:
•
○
• Suppose x2 and x3 are uncorrelated, but x1 is correlated with x3 (the omitted variable):
○ Bias:
○ x1 is correlated with the omitted variable, but x2 is not.
○ x2 is uncorrelated with x3.
○ Both β̂1 and β̂2 will normally be biased.
○ Only exception: when x1 and x2 are also uncorrelated.
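For the two-regressor case above (true model with x1 and x2, but x2 omitted from the estimated equation), the standard omitted-variable algebra gives:

$$ \tilde\beta_1 = \hat\beta_1 + \hat\beta_2\,\tilde\delta_1, \qquad E(\tilde\beta_1) = \beta_1 + \beta_2\,\tilde\delta_1 $$

where δ̃1 is the slope from regressing x2 on x1; the bias term β2·δ̃1 vanishes if β2 = 0 or if x1 and x2 are uncorrelated.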
1. Error variance (σ²): a larger σ² means larger variances for the OLS estimators.
○ It has nothing to do with the sample size (it is a feature of the population).
○ Unknown component.
○ Only one solution to reduce this: add more explanatory variables to the equation.
2. Total sample variation in x_j (SST_j): the larger it is, the smaller is the variance of the OLS estimator.
○ We prefer as much sample variation in x_j as possible.
○ Solution: increase the sample size.
3. Linear relationships among the independent variables (R_j²).
○ Proportion of the total variation in x_j that can be explained by the other independent variables appearing in the equation.
○ Best case: R_j² = 0 (when x_j has zero sample correlation with every other independent variable).
○ Worst case: R_j² close to 1 (high, but not perfect, correlation between two or more independent variables - multicollinearity).
It is better to have less correlation between x_j and the other independent variables.
The problem can be mitigated by collecting more data.
Some questions may be too subtle for the available data to answer with any precision.
Another option is to change the scope of the analysis by lumping related variables together.
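The three components listed above come from the usual sampling-variance formula under MLR.1-MLR.5:

$$ \operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}, \qquad j = 1, \dots, k $$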
• Variances in misspecified models:
○ Choice whether to include a particular variable in a regression model can be made by analysing
the trade off between bias and variance.
• Estimator: rule that can be applied to any sample of data to produce an estimate.
• Unbiased:
• Linear: an estimator is linear if, and only if, it can be expressed as a linear function of the data on
the dependent variable.
○ Gauss-Markov theorem: when the standard set of assumptions holds, we do not need to look
for alternative unbiased estimators of the parameters: none will be better than OLS.
If we are presented with an estimator that is both linear and unbiased, then we know
that the variance of this estimator is at least as large as the OLS variance; no additional
calculation is needed to show this.
Theorem justifies the use of OLS to estimate multiple regression models.
If any of the Gauss-Markov assumptions fail, then this theorem no longer holds.
• Dropping assumptions has a cost in terms of the precision of the estimation.
• Independence is stronger than the zero conditional mean assumption.
○ You can tell me everything about x, but it tells me nothing about u.
○ It means that independence is stronger than the second condition.
○ Implies MLR.4 and MLR.5.
• If u is independent of the x's, information on the x's says nothing about u.
○ Knowing x changes nothing.
• Classical linear model (CLM) assumptions:
○ MLR.1 to MLR.6.
○ Contain all the Gauss-Markov assumptions plus the assumption of a normally distributed
error term.
○ Stronger efficiency property.
• β̂_j follows a normal distribution, conditional on the regressors.
• Any linear combination of the β̂_j is also normally distributed, and any subset of the β̂_j has a joint normal distribution.
• Consistency:
• When negative values of the parameter are known to be unreasonable, we can use a one-sided alternative hypothesis.
○ Need to define a range outside of which we will reject the null.
• Significance level: the probability of rejecting H0 when it is true.
• We set the rejection region in a symmetric way, so that the probability of rejecting the null on the right and on the left is the same.
○ Probability statement about the random interval (the estimator), not about β_j: since β̂_j is an estimator, the CI is just a random interval.
○ The probability that this interval covers the parameter (β_j) is 1 − α.
○ Sample 1:
○ …
○ Sample 3:
○ The true parameter will be inside 95% of these intervals.
○ Example: my sample gives this confidence interval at the 95% level.
There is no probability associated with this particular interval: it either contains β_j or it does not, because the computed CI is not random, it's a number.
We would like to get one of the intervals that contain the real parameter.
• If random samples were obtained over and over again, with the confidence interval computed each time, then the (unknown) population value would lie in the interval for 95% of the samples.
○ For the single sample that we use to construct the CI, we do not know whether β_j is actually contained in the interval.
• We cannot assign a probability to the event that the true parameter value lies inside that particular interval.
• If the null hypothesis is H0: β_j = a_j, then H0 is rejected against H1: β_j ≠ a_j at the 5% significance level if, and only if, a_j is not in the 95% confidence interval.
• This random interval "covers" β_j with probability 95%.
• If I take a bunch of samples (all possible), 95% of these confidence intervals will have the true β_j inside.
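A quick way to see this "coverage" interpretation is to simulate it; the sketch below uses an invented data-generating process and sample size purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta0, beta1, n, reps = 1.0, 0.5, 100, 5000   # invented true values
covered = 0

for _ in range(reps):
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS estimates
    resid = y - X @ b
    sigma2 = resid @ resid / (n - 2)                   # error variance estimate
    se1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    c = stats.t.ppf(0.975, df=n - 2)                   # 95% two-sided critical value
    covered += (b[1] - c * se1 <= beta1 <= b[1] + c * se1)

print(covered / reps)   # should be close to 0.95
```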
• In testing multiple restrictions in the multiple regression model under the Classical assumptions,
we are more likely to reject the null that some coefficients are zero if:
○ The R-squared of the unrestricted model is large relative to the R-squared of the restricted
model.
○ The residual sum of squares of the restricted model is very large relative to that of the unrestricted model.
• When using an F test for multiple exclusion restrictions in a multiple linear regression model, the intuition behind it is:
○ Under the null hypothesis the F statistic is likely to be small; hence, the null hypothesis is rejected if the residual sum of squares decreases enough upon inclusion of the regressors that should be excluded under the null.
• Restricted model:
• If we don't reject the null hypothesis: we must look for other variables to explain y.
• The test of overall significance of the model is a test to ascertain if all coefficients, excluding the
intercept, are equal to zero.
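The statistic behind these bullets, in its SSR and R-squared forms (q = number of restrictions, n − k − 1 = degrees of freedom of the unrestricted model):

$$ F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)} $$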
•
• Consistent: as the sample size increases, the probability of getting an estimate arbitrarily close to the true value increases towards 1.
○ β̂_j consistent for β_j: plim β̂_j = β_j.
• Consistency and unbiasedness don't imply each other.
• If
○ Now, if
○ So, by the Law of the iterated expectations:
○ Conclusion:
○ Any function of the explanatory variables is uncorrelated with u.
○ Each x_j is uncorrelated with u.
•
•
○ Using the law of large numbers.
• asymptotic bias.
•
•
○
Bigger sample sizes are better.
• degrees of freedom.
• We reject the null if the observed test statistic is too high.
1. Regress y on the restricted set of independent variables and save the residuals, ũ.
2. Regress ũ on all of the independent variables and obtain the R-squared, say, R²_ũ (to distinguish it from the R-squareds obtained with y as the dependent variable).
3. Compute LM = n·R²_ũ [the sample size times the R-squared obtained from step (ii)].
4. Compare LM to the appropriate critical value, c, in a χ²_q distribution; if LM > c, the null hypothesis is rejected. Even better, obtain the p-value as the probability that a χ²_q random variable exceeds the value of the test statistic. If the p-value is less than the desired significance level, then H0 is rejected. If not, we fail to reject H0. The rejection rule is essentially the same as for F testing.
• Degrees of freedom are not important here, because of the asymptotic nature of the LM statistic.
○ All that matters is the number of restrictions being tested (q), the size of the auxiliary R-squared, and the sample size (n).
• A seemingly low value of the R-squared can still lead to joint significance if n is large.
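A minimal sketch of the LM procedure in Python (statsmodels); the data frame, variable names and the two tested restrictions are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

# Invented example: y depends on x1 and x2; we test H0: coefficients on x3, x4 are zero.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(500, 4)), columns=["x1", "x2", "x3", "x4"])
df["y"] = 1 + 0.5 * df["x1"] - 0.3 * df["x2"] + rng.normal(size=500)

X_restricted = sm.add_constant(df[["x1", "x2"]])
X_full = sm.add_constant(df[["x1", "x2", "x3", "x4"]])

u_tilde = sm.OLS(df["y"], X_restricted).fit().resid   # step 1: restricted residuals
aux_r2 = sm.OLS(u_tilde, X_full).fit().rsquared       # step 2: auxiliary R-squared
lm = len(df) * aux_r2                                  # step 3: LM = n * R^2
p_value = stats.chi2.sf(lm, df=2)                      # step 4: chi-square with q = 2 df
print(lm, p_value)
```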
• We run the regression on all the variables.
○ There is evidence against the null.
• Cauchy-Schwarz inequality:
• If the regression model is to have different intercepts for g groups or categories, we need to include (g − 1) dummy variables in the model along with an intercept.
○ The intercept for the base group is the overall intercept in the model, and the dummy variable coefficient for a particular group represents the estimated difference in intercepts between that group and the base group.
○ Including g dummy variables along with an intercept will result in the dummy variable trap.
○ An alternative is to include g dummy variables and to exclude an overall intercept.
More difficult to test for differences relative to a base group.
Regression packages usually change the way R-squared is computed when an overall intercept is not included.
• For low levels of education, the average wage for men is higher than for women.
• The effect of education is stronger for women than for men.
○ So, for higher levels of education, females have higher wages on average.
•
○ Difference in the slopes of the regressions for the two groups defined by the dummy variable.
○ Difference in the intercepts of the regressions for the two groups defined by the dummy variable.
•
○ Dependent variable is a binary variable: 0 or 1.
○ Changes completely the interpretation we have seen so far.
○ How the probability of y = 1 changes with the explanatory variables.
• Interpretation:
• Example:
○ Probability of getting a job.
○ The probability of getting a job decreases 15 percentage points as we get one year older.
○ if the individual gets older by 1 year, the probability of getting a job decreases by 15 percentage
points.
• It is heteroskedastic.
○ LPMs are intrinsically heteroskedastic (the variance depends on the explanatory variables), unless all the slope coefficients on the regressors are equal to zero.
○ Correct for heteroskedasticity:
• OLS is not able to ensure that all the fitted probabilities are between 0 and 1.
○ We need to ensure that all the fitted values are in the unit interval.
○ If they are not between 0 and 1:
If we have a large sample: drop those individuals.
If we have a small sample: set them near 0 or 1, depending on which is nearer.
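Why the LPM is intrinsically heteroskedastic, and the weight used when correcting by WLS (standard result for a binary y):

$$ \operatorname{Var}(y \mid \mathbf{x}) = p(\mathbf{x})\,[1 - p(\mathbf{x})], \qquad p(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k $$

so the feasible weights are ĥ_i = ŷ_i(1 − ŷ_i), computed from fitted values that have been kept inside (0, 1).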
• Why can't we estimate the LPM directly by WLS?
• Why should we take a two-step procedure to estimate the LPM by WLS?
•
○
• They might be biased, but still consistent.
• Logs cannot be applied if a variable takes on zero or negative values.
○ Nonnegative variables that can equal zero: log(1 + y) is sometimes used.
• Drawback: it is more difficult to predict the original variable.
○ The estimated model allows us to predict log(y), not y.
○ It is not legitimate to compare R-squareds from models where y is the dependent variable in one case and log(y) is the dependent variable in the other.
These R-squareds measure explained variation in different variables.
• Interested in elasticities or semi-elasticities: we should apply logs.
• Models using log(y) as the dependent variable often satisfy the CLM assumptions more closely than models using the level of y.
○ Strictly positive variables often have conditional distributions that are heteroskedastic or skewed; taking the log can mitigate, if not eliminate, both problems.
• Taking logs usually narrows the range of the variable.
○ Estimates are less sensitive to outlying (or extreme) observations on the dependent or
independent variables.
• Beyond some point, more rooms would be bad for the house price.
• Adjusted R²: preferred measure to compare the goodness of fit of models with the same dependent variable.
• It does not correct the bias of R² (the ratio of two unbiased estimators is not an unbiased estimator).
○ Imposes a penalty for adding additional independent variables to a model.
○ If we add a new independent variable to a regression equation, the adjusted R² increases if, and only if, the t statistic on the new variable is greater than one in absolute value (or, for several variables, if the F statistic for joint significance of the new variables is greater than unity).
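For reference, the formula the penalty comes from:

$$ \bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} = 1 - \frac{(1-R^2)(n-1)}{n-k-1} $$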
• Controlling for too many factors in Regression Analysis:
○ Sometimes it makes no sense to hold some factors fixed precisely because they should be
allowed to change when a policy changes.
• If we remember that different models serve different purposes, and we focus on the ceteris
paribus interpretation of regression, then we will not include the wrong factors in a regression
model.
• We should always include independent variables that affect y and are uncorrelated with all of the independent variables of interest.
○ Adding such a variable does not induce multicollinearity in the population.
○ It reduces the error variance.
•
○
○ y
•
• We want
• Our model becomes:
○ Run the regression on
• Because the variance of the intercept estimator is smallest when each explanatory variable has zero sample mean, it follows that the variance of the prediction is smallest at the mean values of the x_j.
○ c_j = x̄_j for all j.
○ As the values of the c_j get farther away from the x̄_j, the variance of the prediction gets larger and larger.
• Confidence interval for the average value of y for the subpopulation with a given set of covariates.
○ A confidence interval for the average person in the subpopulation is not the same as a confidence interval for a particular unit from the population.
○ In forming a confidence interval for an unknown outcome on y, we must account for another very important source of variation: the variance in the unobserved error, which measures our ignorance of the unobserved factors that affect y.
• The variance is the sum of the two variances, because the individual is not included in the sample.
• I won't be able to know u⁰, so we will just treat him as the average guy.
○ Assume that this guy is not in the sample.
• In order to estimate y⁰ we will use the estimated regression with the respective values of the covariates.
○ u⁰ has mean zero.
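The "sum of the variances" bullet corresponds to the usual prediction-error variance (standard result under the CLM assumptions):

$$ \hat{e}^0 = y^0 - \hat{y}^0, \qquad \operatorname{se}(\hat{e}^0) = \left[\operatorname{se}(\hat{y}^0)^2 + \hat{\sigma}^2\right]^{1/2} $$

and an approximate 95% prediction interval is ŷ⁰ ± t₀.₀₂₅ · se(ê⁰).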
• After estimating our model, we should look at the residuals and see which information is there.
○ To see whether the actual value of the dependent variable is above or below the predicted
value.
• Homoskedasticity fails whenever the variance of the unobservables changes across different
segments of the population, where the segments are determined by the different values of the
explanatory variables.
• For low levels of education, the distribution of wages is concentrated around the average.
• As the education grows, the wages are more spread around the mean.
○ The average wage is higher, but the variance is also higher.
• The usual OLS statistics do not have their usual distributions in the presence of heteroskedasticity.
○ The problem is not resolved by using large sample sizes.
○ t statistics are no longer t distributed, and the LM statistic no longer has an asymptotic chi-square distribution.
○ The statistics we used to test hypotheses under the Gauss-Markov assumptions are not valid in the presence of heteroskedasticity.
• If Var(u|x) is not constant, OLS is no longer BLUE.
○ It is no longer asymptotically efficient (with relatively large sample sizes, it might not be so important to obtain an efficient estimator).
• The error variance depends on the individual (one variance for each individual).
• Without homoskedasticity we can still compute the variance of the OLS estimator.
• Substitute: by
○ .
○
○ As simple as the original one, but has more parameters, so it is more accurate.
• Weakness of the White test: it uses too many degrees of freedom for models with just a moderate
number of independent variables.
• Special form of the White test:
○ Regress û² on ŷ and ŷ².
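A minimal sketch of that special form of the White test in Python (statsmodels); `model` is assumed to be an already-fitted OLS results object.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def special_white_test(model):
    u2 = model.resid ** 2                          # squared OLS residuals
    yhat = model.fittedvalues
    Z = sm.add_constant(np.column_stack([yhat, yhat ** 2]))
    aux = sm.OLS(u2, Z).fit()                      # auxiliary regression of u^2 on yhat, yhat^2
    lm = len(u2) * aux.rsquared                    # LM = n * R^2, chi-square with 2 df
    return lm, stats.chi2.sf(lm, df=2)
```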
• If we have correctly specified the form of the variance (as a function of explanatory variables), then weighted least squares is more efficient than OLS, and WLS leads to new t and F statistics that have t and F distributions.
• We derived it so we could obtain estimators of the β_j that have better efficiency properties than OLS.
• If the original equation satisfies the first four GM assumptions, then the transformed equation satisfies them too.
• If u_i has a normal distribution, then the transformed error u_i/√h_i has a normal distribution with variance σ².
• Less weight is given to observations with a higher error variance.
This estimator is BLUE.
We get the variance of the estimators in the usual way.
○ Estimate the transformed model: weighted least squares.
WLS weights each squared residual by the inverse of the conditional variance of u_i given x_i.
Lower variance than the OLS estimator.
○ Interpretation: in the original model.
○ Efficient procedure.
○ Regression of transformed variable one on all the other transformed variables.
Total variation of transformed variable one.
○ The R² of the transformed model: R² of the regression of transformed variable 1 on all the other "transformed" variables.
○ As this estimator is BLUE, its variance must be lower than the variance of the OLS estimator.
If there is no heteroskedasticity, the variance is the same (the weighting function would be 1, so it makes no sense to do it).
• Typically, along with the dependent and independent variables in the original model, we specify the weighting function h(x).
○ This forces us to interpret weighted least squares estimates in the original model.
○ Estimates and standard errors will be different from OLS, but the interpretation is the same.
• Feasible GLS: when we don't know the variance function.
○ v has a mean of unity, conditional on x.
○ Assume that v is actually independent of x.
○ If instead we first transform all variables and run OLS, each variable gets multiplied by 1/√h_i, including the intercept.
• If we could use h_i rather than ĥ_i in the procedure, our estimators would be unbiased.
○ They would be the best linear unbiased estimators.
• The FGLS estimator is consistent and asymptotically more efficient than OLS.
• The null hypothesis must be stronger than homoskedasticity: u and x must be independent.
• In order to get the regression model out of the model of the variance of the error term.
• equal to all the constants that we get when we take the logs.
○
○ Let me put:
○
○ This forecast is efficient.
○ If we compute , our new and will converge to the real ones.
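A minimal sketch of the feasible GLS procedure just described (regress log(û²) on the regressors, exponentiate the fitted values, and use 1/ĥ as WLS weights); `ols_res`, `X` and `y` are assumed to come from the original OLS estimation, with no residual exactly equal to zero.

```python
import numpy as np
import statsmodels.api as sm

def fgls(ols_res, X, y):
    log_u2 = np.log(ols_res.resid ** 2)            # log of squared OLS residuals
    aux = sm.OLS(log_u2, X).fit()                  # regress log(u^2) on the regressors
    h_hat = np.exp(aux.fittedvalues)               # estimated variance function h(x)
    return sm.WLS(y, X, weights=1.0 / h_hat).fit() # WLS with weights 1/h_hat
```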
• Some correlation between close observations.
○ Positive or negative.
• There is a temporal order.
○ The past can affect the future, but no vice-versa.
• Most time series processes are correlated over time, and many of them strongly correlated.
○ They cannot be independent across observations, which simply represent different time periods.
○ Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to
be independently distributed.
• Stochastic process or a time series process: sequence of random variables indexed by time is called a
stochastic process or a time series process. (“Stochastic” is a synonym for random.) When we collect a
time series data set, we obtain one possible outcome, or realization, of the stochastic process.
○ We can only see a single realization, because we cannot go back in time and start the process over
again.
However, if certain conditions in history had been different, we would generally obtain a different
realization for the stochastic process: we think of time series data as the outcome of random
variables.
○ Population: set of all possible realizations of a time series process.
○ Sample size: number of time periods over which we observe the variables of interest.
• Example:
○ Can put , and such that
○ does not have more information than both regressors together.
○
○
○ Seeing the relation between what happens in with what happened in t.
○ If we want to evaluate this model,
○ To make the exercise of predict , we only see the data until then.
We do not know information about the present or the previous quarter.
• Example 1: Negative correlation between the unemployment rate and the Dow Jones.
○ If the unemployment rate is low: firms are optimistic about the future, firms will invest and people will invest in the firms (everyone is happy and the Dow will go up).
Firms are hiring people, making a lot of investment.
○ If something affects employment, everyone is pessimistic, people are losing their jobs, demand falls and every price will fall.
○ News about the behaviour of the employment market strongly affects the Dow of that day.
• We cannot say where is the causality coming from.
○ Just by observing the data we can conclude many things: correlation and trend.
First Models
• Static models:
○ Relate variables and data at the same time t (contemporaneous).
○ Modeling a contemporaneous relationship between y and z.
○ When a change in z at time t is believed to have an immediate effect on y.
○ Measuring the tradeoff between y and z.
• Finite distributed lag models:
○ Allow past values of z to affect the current y.
○ Finite: finite number of lags.
• Specifying a linear model for time series data.
Example 1:
• Assumption: zero correlation between u_t and the regressors at time t (only the regressors at time t).
• With this last assumption, we guarantee not unbiasedness but consistency.
• Under Assumptions TS.1 through TS.6, the CLM assumptions for time series, the OLS estimators are normally distributed, conditional on X.
○ Under the null hypothesis, each t statistic has a t distribution, and each F statistic has an F distribution.
○ The usual construction of confidence intervals is also valid.
• It is very hard to argue causation effects between y and x if they have a trend.
• Nothing about trending variables necessarily violates the classical linear model assumptions TS.1 through TS.6.
• Unobserved, trending factors that affect y might also be correlated with the explanatory variables.
○ If we ignore this possibility, we may find a spurious relationship between y and one or more explanatory variables (finding a relationship between two or more trending variables simply because each is growing over time).
•
○ Allowing for the trend explicitly recognizes that y may be growing or shrinking over time for reasons essentially unrelated to the explanatory variables.
○ If the model satisfies assumptions TS.1, TS.2, and TS.3: then omitting the trend from the regression and regressing y on the explanatory variables will generally yield biased estimators of the slope parameters.
We omitted an important variable, the time trend, from the regression.
Especially true if the explanatory variables are themselves trending: they can be highly correlated with the trend.
• Sometimes, adding a time trend can make a key explanatory variable more significant.
○ If the dependent and independent variables have different kinds of trends.
• Spurious regression: we may find a statistically significant relation between y and x just because they are trending together.
○ Phenomenon of finding a relationship between two or more trending variables simply because each is growing over time.
• If we don't include the trend in the model: we will still get unbiased estimators if:
y doesn't have a trend, or its trend is not correlated with the regressors (there is no trend in the regressors).
○ We are in trouble if y has a trend or the regressors have a trend.
○ The y might seem to be related to one or more of the x_j simply because each contains a trend.
○ If the trend is statistically significant, and the results change in important ways when a time trend is added to a regression: the initial results without a trend should be treated with suspicion.
○ Include a trend in the regression if any independent variable is trending, even if y is not.
• R-squareds in time series regressions are often very high: time series data often come in aggregate
form (aggregates are often easier to explain than outcomes on individuals, families, or firms, which is
often the nature of cross-sectional data).
○ Usual and adjusted R-squareds for time series regressions can be artificially high when the
dependent variable is trending.
• The usual total sum of squares can substantially overestimate the variance in y_t when y_t is trending.
• Simplest method: compute the usual R-squared in a regression where the dependent variable has already been detrended.
○ When y_t contains a strong linear time trend, the R-squared of the detrended model can be much less than the usual one.
○ It nets out the effect of the time trend.
• In computing the R-squared form of an F statistic for testing multiple hypotheses: use the usual R-squareds without any detrending.
○ The R-squared form of the F statistic is just a computational device: the usual formula is always appropriate.
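A sketch of the detrending computation described above; `y`, `X` (regressors including a constant) and `t` (a linear time trend) are assumed to be aligned numpy arrays.

```python
import numpy as np
import statsmodels.api as sm

def detrended_r2(y, X, t):
    trend = sm.add_constant(t)
    y_detrended = sm.OLS(y, trend).fit().resid        # net out the linear trend from y
    full = sm.OLS(y, np.column_stack([X, t])).fit()   # original regression including the trend
    ssr = np.sum(full.resid ** 2)
    return 1.0 - ssr / np.sum(y_detrended ** 2)       # R^2 against the detrended total variation
```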
○
○ For observations two periods apart there is no common shock: there is no correlation.
•
•
•
○
○
○
I don't know values, so we have to forecast it.
○
○ if
• and this can only happen if
•
○ Since
○ ρ: correlation coefficient between any two adjacent terms in the sequence.
• Although y_t and y_{t+h} are correlated for any h, this correlation gets very small for large h, because it decays like ρ^h.
• If we include the trend in the model, we can still use this set of assumptions.
○ We can skip stationarity including deterministic stationarity.
○ Stationary around the trend.
• not stationary, because it depends on
○ But independent of
○
○ Stationary around the trend: if we remove the trend, they are stationary.
• Weak dependence: the data behave well (in a similar way across time).
○ Stationary: the dynamics are similar across time.
○ Observations become independent if we are really far apart.
○ Trend stationary (just need to include the trend in the model).
○ The law of large numbers and the central limit theorem can be applied to sample averages.
• TS.2': If we define this assumption carefully we can get rid of the strict exogeneity of the X's.
○ The following consistency result only requires u_t to have zero unconditional mean and to be uncorrelated with each regressor x_tj:
• We are basing our model on a crazy set of assumptions.
• Contemporaneous exogeneity: we can have correlation between today's regressors and future
error terms.
• Strict exogeneity:
○ If there is correlation with past error terms, we should include past error terms.
○ In general, we don't have the actual event in the future affecting present facts.
We just have expectations.
○ contemporaneous exogeneity.
Much weaker and can be more reasonable.
• With these assumptions we can prove that the OLS estimator is consistent.
○
○ Not necessarily unbiased.
• Example:
○
○
The error terms depend on t.
• Example 2:
○
○
• y today is highly correlated with y even in the distant future.
• Random walk with drift:
○ y_t = α0 + y_{t−1} + e_t.
○ α0: drift term.
○ To generate y_t, the constant α0 is added along with the random noise e_t to the previous value y_{t−1}.
○ The expected value of y_t follows a linear time trend (by repeated substitution): E(y_t) = α0·t + E(y_0).
○ Therefore, if y_0 = 0: the expected value of y_t is growing over time if α0 > 0 and shrinking over time if α0 < 0.
○ E(y_{t+h} | y_t) = α0·h + y_t, and so the best prediction of y_{t+h} at time t is y_t plus the drift α0·h.
○ The variance of y_t is the same as it was in the pure random walk case.
• Transform random walks with or without drift into a weakly dependent process by differencing.
• Weakly dependent processes are said to be integrated of order zero, or I(0).
○ Nothing needs to be done to such series before using them in regression analysis: averages of such sequences already satisfy the standard limit theorems.
• Unit root processes, such as a random walk (with or without drift), are said to be integrated of order one, or I(1).
○ The first difference of the process is weakly dependent (and often stationary).
○ A time series that is I(1) is often said to be a difference-stationary process.
○ Differencing time series before using them in regression analysis has another benefit: it removes any linear time trend.
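A small simulation of a random walk with drift and of its first difference (the drift and sample size are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(42)
T, alpha0 = 200, 0.2
e = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha0 + y[t - 1] + e[t]   # y_t = alpha0 + y_{t-1} + e_t: an I(1) process

dy = np.diff(y)                        # first difference: alpha0 + e_t, weakly dependent / I(0)
print(y[-1], dy.mean())                # dy.mean() should be near alpha0
```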
• In the presence of serial correlation: the OLS estimator is no longer BLUE.
•
○
○ Remaining dynamics will go to
• Violates
○ We are requiring the regressors to explain all the dynamics of y, so it is very strict.
○ Everything else is in the error.
• |ρ| < 1 in order to have stability.
• Check or :
○
○ did not show
rigorously.
This cannot be zero.
○ Past don't affect present ones.
• , and
• Provided the data are stationary and weakly dependent: R-squared and adjusted R-squared are valid.
○ The variances of both the errors and the dependent variable do not change over time.
○ By the law of large numbers: the estimators of the error variance and of the variance of y are still consistent.
•
○
○
○ i.i.d. satisfies the no serial correlation assumption.
○ If we rewrite the model with we kill the serial correlation.
• Assumptions that we can use when we use lag variables as independent variables.
○ We cannot rely on strict exogeneity.
• Null hypothesis: there is no serial correlation.
○ That is why we don't need the intercept: it is just a slightly different average.
○ If u_t were observed and i.i.d., we could apply the asymptotic normality results immediately.
○ If |ρ| < 1, u_t is clearly weakly dependent.
○ This does not work, because the errors are never observed.
We replace the error with the corresponding OLS residual.
Because of the strict exogeneity assumption, the large-sample distribution of the t statistic is not affected by using the OLS residuals in place of the errors.
•
○ Quarterly data with seasonality.
• Testing for AR(1) Serial Correlation with Strictly Exogenous Regressors:
1. Run the OLS regression of y_t on the regressors and obtain the OLS residuals, û_t.
2. Run the regression of û_t on û_{t−1}, obtaining the coefficient ρ̂ on û_{t−1} and its t statistic.
This regression may or may not contain an intercept; the t statistic for ρ̂ will be slightly affected, but it is asymptotically valid either way.
3. Use that t statistic to test H0: ρ = 0 against H1: ρ ≠ 0 in the usual way.
Since ρ > 0 is often expected a priori, the alternative can be H1: ρ > 0.
Typically, we conclude that serial correlation is a problem to be dealt with only if H0 is rejected at the 5% level.
It is best to report the p-value for the test.
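A minimal sketch of this test in Python (statsmodels); `res` is assumed to be the fitted OLS results object of the original model.

```python
import numpy as np
import statsmodels.api as sm

u = np.asarray(res.resid)
aux = sm.OLS(u[1:], sm.add_constant(u[:-1])).fit()   # regress u_t on u_{t-1} (with intercept)
rho_hat, t_rho = aux.params[1], aux.tvalues[1]
print(rho_hat, t_rho)   # compare t_rho with the usual critical values
```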
• Under the null hypothesis of no serial correlation, the Durbin-Watson statistic should be close to 2.
• Tests based on the t statistic for ρ̂ and tests based on the Durbin-Watson statistic are conceptually the same.
○ The DW test requires the full set of classical linear model assumptions, including normality of the error terms.
○ Its distribution depends on the values of the independent variables, on the sample size, the number of regressors, and whether the regression contains an intercept.
○ It only allows testing for serial correlation of order 1.
○ It assumes strict exogeneity.
• Interpretation of the DW statistic:
○ DW ≈ 2: no serial correlation of order 1 (says nothing about serial correlation of different orders).
○ DW close to 0: high positive serial correlation.
• Compare DW with two sets of critical values in case of positive serial correlation:
○ d_U: upper critical value.
If DW > d_U: fail to reject H0.
○ d_L: lower critical value.
If DW < d_L: reject H0.
○ If d_L ≤ DW ≤ d_U: the test is inconclusive.
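For reference, the statistic itself and its link to ρ̂:

$$ DW = \frac{\sum_{t=2}^{n}(\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n}\hat{u}_t^2} \approx 2\,(1 - \hat{\rho}) $$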
• The error term used is the one from the original model, not the one from the auxiliary regression of the test we are running.
• Test for serial correlation without strict exogeneity:
○ Including all the regressors makes the test valid without strict exogeneity (only contemporaneous exogeneity is needed).
• Problem with the other tests: they only control for û_{t−1}.
○ They implicitly assumed that the regressors are uncorrelated with u_{t−1}; without strict exogeneity, x_t could be correlated with it.
• Get the residuals û by OLS and run them on their own lag and the other regressors.
○
• There is no serial
•
correlation between
•
and
• As we don't know we cannot compute the regression for
• Homoskedasticity is
•
being violated.
○ The error terms are serial uncorrelated.
○ We cannot make
○
the
○ If we knew we could estimate and by regressing on . transformation
○ Provided we divide the estimated intercept by for , so we
• We must multiply by to get errors with the same variance. use the .
○ We use this error
BLUE estimators of and under assumptions TS.1-4 and the AR(1) model for
•
• Unless ρ = 0, the GLS estimator (OLS on the transformed data) will generally be different from the original OLS estimator.
○ GLS is BLUE, and t and F statistics from the transformed equation are valid.
• This transformed model satisfies all the assumptions and thus we can use OLS.
• For inference, we will need stationarity and weak dependence (which are assumed).
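A minimal Cochrane-Orcutt style sketch of the feasible version of this transformation (no Prais-Winsten first-observation correction); `y` and `X` are assumed to be numpy arrays, with a constant column in `X`.

```python
import numpy as np
import statsmodels.api as sm

def cochrane_orcutt(y, X, n_iter=10):
    rho = 0.0
    for _ in range(n_iter):
        y_q = y[1:] - rho * y[:-1]                             # quasi-difference the data
        X_q = X[1:] - rho * X[:-1]                             # the constant column becomes (1 - rho)
        res = sm.OLS(y_q, X_q).fit()
        u = y - X @ res.params                                 # residuals in the original model
        rho = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)     # update the AR(1) coefficient
    return res, rho
```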
•
○ With time series data, we can only do inference if we correct both problems at the same time.
• This does not violate the homoskedasticity assumption.
• The variance of my error term today is correlated with the variance of the error term of the past.
•
• variance of u_t conditional on X.
•
○ Dummy: 1 if and 0 otherwise.
•
•
• This model was estimated for a different time period from before.
•
○ It does not happen.
• If the squared residuals are the dependent variable, it means that we are looking for the variance of our
model and testing for heteroskedasticity.
•
•
○ White test.
○
○
○ The model is heteroskedastic: the test statistic is really high, so we clearly reject the null of homoskedasticity.
○ We reject the null hypothesis.
○ We are just looking at the overall significance of the model.
• Testing for serial correlation without strict exogeneity:
○ How can we know if the error term is correlated with the error terms of past periods?
○ The F statistic for this null hypothesis is not equivalent to the F statistic for the overall significance of the model.
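One way to run this test is the Breusch-Godfrey test in statsmodels, which regresses the OLS residuals on their own lags and the original regressors, so it stays valid without strict exogeneity; `res` is assumed to be a fitted OLS results object, and two lags are used here only as an example.

```python
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(lm_pval, f_pval)   # small p-values indicate serial correlation up to order 2
```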
• Test for ARCH problem.
○ Take the residuals from the original model and square the residuals.
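A minimal ARCH(1) LM-test sketch along those lines; `res` is assumed to be the fitted OLS results object of the original model.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

u2 = np.asarray(res.resid) ** 2                               # squared residuals
aux = sm.OLS(u2[1:], sm.add_constant(u2[:-1])).fit()          # regress u_t^2 on u_{t-1}^2
lm = len(u2[1:]) * aux.rsquared                               # LM = n * R^2
print(lm, stats.chi2.sf(lm, df=1))   # small p-value: reject the no-ARCH null
```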
• Much of the time serial correlation is viewed as the most important problem: it usually has a larger
impact on standard errors and the efficiency of estimators than does heteroskedasticity.
○ Obtaining tests for serial correlation that are robust to arbitrary heteroskedasticity is fairly
straightforward.
○ If we detect serial correlation using such a test, we can employ the Cochrane-Orcutt (or Prais-
Winsten) transformation and, in the transformed equation, use heteroskedasticity robust
standard errors and test statistics.
○ Or, we can test for heteroskedasticity using the Breusch-Pagan or White tests.
• We need to correct for both at the same time.
○ Through a combined weighted least squares AR(1) procedure.
• Correcting for both problems: serial correlation and heteroskedasticity.
• The error is heteroskedastic in addition of containing serial correlation.
○
• The transformed equation has ARCH errors.
○
○ If we know the particular kind of heteroskedasticity we can estimate the using standard
CO or PW methods.
• Almost any factor simultaneously influences inflation and the unemployment rate.
• Test for AR(1) serial correlation: we take the residuals from the previous regression and run a regression.
○
○ Regress the previous residuals on one period lag residuals.
• Robust to serial correlation: include on this regression the original regressors.
• Most likely, the no serial correlation assumption is not verified, so we are going to try to correct for it.