• Overall significance of the model:
• With heteroskedasticity:
• ARCH:
• Implies that:
○ Only how the average value of y changes with x.
○ Not that the relationship holds exactly for all units in the population.
○ By assumption:
• R²: ratio of the explained variation to the total variation (never negative - p. 83). A low R² indicates a low correlation between the dependent variable and the regressors in the sample.
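For reference, the definition behind this bullet, in the usual notation (SST total, SSE explained, SSR residual sum of squares):

$$ R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}, \qquad SST = SSE + SSR $$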
• Including irrelevant variables in a multiple linear regression
does not increase the variance of the OLS estimator of the
remaining parameters if the irrelevant variables are
uncorrelated with the relevant variables.
• If we minimise the SSR, we are maximizing the R².
• and are both positive.
• is smaller than .
○
○ Just need to show that
○
is always positive.
Can't be estimated when
• A low R² has nothing to do with the assumption of zero conditional mean.
• In econometric applications the R² can be low, but it can also be high.
• Disadvantage of R²: it always increases when another regressor is added.
• Relevant variable (having a partial effect): a variable whose population coefficient is nonzero.
• Being statistically significant is a different concept.
• What we do to get the result:
○ Compute
○ Take
○ We assume that the sample is random: the individuals that we pick from the population are picked randomly.
○ We take conditional expectations and then we take them off (law of iterated expectations).
• Fact:
• Level-level: y = β0 + β1·x. When x changes by one unit, it is predicted that y changes by β1 units, ceteris paribus.
• Level-log: y = β0 + β1·log(x).
○ If x changes by 1%, it is predicted that y changes by β1/100 units, ceteris paribus.
• Log-level: log(y) = β0 + β1·x.
○ β1 is a semi-elasticity.
○ If x changes by one unit, it is predicted that y will change by (100·β1)%.
• Log-log: log(y) = β0 + β1·log(x).
○ β1 is an elasticity.
○ If x changes by 1%, it is predicted that y will change by β1%, ceteris paribus.
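A worked log-level example with made-up numbers: if the estimated coefficient on educ in a log(wage) equation were 0.08, then

$$ \widehat{\log(wage)} = \hat\beta_0 + 0.08\,educ \;\Rightarrow\; \%\Delta\widehat{wage} \approx 100 \cdot 0.08 = 8\% \text{ per extra year of education.} $$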
○ Unbiased only if the omitted regressor is irrelevant or uncorrelated with the included one.
○ The simple and the multiple regression estimates coincide necessarily if:
The two regressors are uncorrelated in the sample, or
The estimated effect of the extra regressor is equal to zero.
• If the two regressors are uncorrelated in the sample:
○ It's okay: no bias and the variance is the same.
• If the extra regressor is irrelevant (its population coefficient is zero):
○ This model is better than the usual one.
• If the extra regressor is relevant and correlated with the included one:
○ More efficient but biased.
• Given the variance of the OLS estimators in the matrix form it is possible to:
○ Know the variance of the individual estimators as well as their covariances.
○ Derive the variance of a sum of individual estimators.
○ Derive the variance of a linear combination of estimators.
• In a multiple linear regression model, the random sampling assumption implies that the variance-covariance matrix of the errors is diagonal (all the elements off the diagonal are equal to zero).
•
○ If MLR.4 holds, it is only
•
•
• Variances as vectors.
○ Simple model (omitting the second regressor):
If x1 and x2 are uncorrelated, including or not including x2 is irrelevant in terms of unbiasedness.
If x1 and x2 are correlated, the estimator is biased (in general, this is what happens).
○ If x1 and x2 are uncorrelated, x1 and the redefined error term are uncorrelated.
○ Bias depends on the correlation between the regressors (education, ability) and on the effect of the omitted variable:
If there is low correlation: the bias is small.
If the effect of the omitted variable is low: the bias is small.
• Original model:
• Transformed model:
•
○
• Suppose x2 and x3 are uncorrelated, but x1 is correlated with x3 (the omitted variable):
○ Bias:
○ x1 is correlated with the omitted variable, but x2 is not.
○ x2 is uncorrelated with x3.
○ Both β̂1 and β̂2 will normally be biased.
○ Only exception: when x1 and x2 are also uncorrelated.
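For the two-regressor case above (true model with x1 and x2, but x2 omitted from the estimated equation), the standard omitted-variable algebra gives:

$$ \tilde\beta_1 = \hat\beta_1 + \hat\beta_2\,\tilde\delta_1, \qquad E(\tilde\beta_1) = \beta_1 + \beta_2\,\tilde\delta_1 $$

where δ̃1 is the slope from regressing x2 on x1; the bias term β2·δ̃1 vanishes if β2 = 0 or if x1 and x2 are uncorrelated.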
1. Error variance (σ²): a larger σ² means larger variances for the OLS estimators.
○ It has nothing to do with the sample size (it is a feature of the population).
○ Unknown component.
○ Only one solution to reduce this: add more explanatory variables to the equation.
2. Total sample variation in x_j (SST_j): the larger it is, the smaller is the variance of the OLS estimator.
○ We prefer as much sample variation in x_j as possible.
○ Solution: increase the sample size.
3. Linear relationships among the independent variables (R_j²).
○ Proportion of the total variation in x_j that can be explained by the other independent variables appearing in the equation.
○ Best case: R_j² = 0 (when x_j has zero sample correlation with every other independent variable).
○ Worst case: R_j² close to 1 (high, but not perfect, correlation between two or more independent variables - multicollinearity).
It is better to have less correlation between x_j and the other independent variables.
The problem can be mitigated by collecting more data.
Some questions may be too subtle for the available data to answer with any precision.
Another option is to change the scope of the analysis by lumping related variables together.
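The three components listed above come from the usual sampling-variance formula under MLR.1-MLR.5:

$$ \operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}, \qquad j = 1, \dots, k $$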
• Variances in misspecified models:
○ Choice whether to include a particular variable in a regression model can be made by analysing
the trade off between bias and variance.
• Estimator: rule that can be applied to any sample of data to produce an estimate.
• Unbiased:
• Linear: an estimator is linear if, and only if, it can be expressed as a linear function of the data on
the dependent variable.
○ Gauss-Markov theorem: when the standard set of assumptions holds, we do not need to look
for alternative unbiased estimators of the parameters: none will be better than OLS.
If we are presented with an estimator that is both linear and unbiased, then we know
that the variance of this estimator is at least as large as the OLS variance; no additional
calculation is needed to show this.
Theorem justifies the use of OLS to estimate multiple regression models.
If any of the Gauss-Markov assumptions fail, then this theorem no longer holds.
• Dropping assumptions has a cost in terms of the precision of the estimation.
• Independence is stronger than the zero conditional mean assumption.
○ You can tell me everything about x, but it tells me nothing about u.
○ It means that independence is stronger than the second condition.
○ Implies MLR.4 and MLR.5.
• If u is independent of the x's, information on the x's says nothing about u.
○ Knowing x changes nothing.
• Classical linear model (CLM) assumptions:
○ MLR.1 to MLR.6.
○ Contain all the Gauss-Markov assumptions plus the assumption of a normally distributed
error term.
○ Stronger efficiency property.
• β̂_j follows a normal distribution, conditional on the regressors.
• Any linear combination of the β̂_j is also normally distributed, and any subset of the β̂_j has a joint normal distribution.
• Consistency:
• When negative values of the parameter are known to be unreasonable, we can use a one-sided alternative hypothesis.
○ Need to define a range outside of which we will reject the null.
• Significance level: the probability of rejecting H0 when it is true.
• We set the rejection region in a symmetric way, so that the probability of rejecting the null on the right and on the left is the same.
○ Probability statement about the random interval (the estimator), not about β_j: since β̂_j is an estimator, the CI is just a random interval.
○ The probability that this interval covers the parameter (β_j) is 1 − α.
○ Sample 1:
○ …
○ Sample 3:
○ The true parameter will be inside 95% of these intervals.
○ Example: my sample gives this confidence interval at the 95% level.
There is no probability associated with this particular interval: it either contains β_j or it does not, because the computed CI is not random, it's a number.
We would like to get one of the intervals that contain the real parameter.
• If random samples were obtained over and over again, with the confidence interval computed each time, then the (unknown) population value would lie in the interval for 95% of the samples.
○ For the single sample that we use to construct the CI, we do not know whether β_j is actually contained in the interval.
• We cannot assign a probability to the event that the true parameter value lies inside that particular interval.
• If the null hypothesis is H0: β_j = a_j, then H0 is rejected against H1: β_j ≠ a_j at the 5% significance level if, and only if, a_j is not in the 95% confidence interval.
• This random interval "covers" β_j with probability 95%.
• If I take a bunch of samples (all possible), 95% of these confidence intervals will have the true β_j inside.
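A quick way to see this "coverage" interpretation is to simulate it; the sketch below uses an invented data-generating process and sample size purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta0, beta1, n, reps = 1.0, 0.5, 100, 5000   # invented true values
covered = 0

for _ in range(reps):
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS estimates
    resid = y - X @ b
    sigma2 = resid @ resid / (n - 2)                   # error variance estimate
    se1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    c = stats.t.ppf(0.975, df=n - 2)                   # 95% two-sided critical value
    covered += (b[1] - c * se1 <= beta1 <= b[1] + c * se1)

print(covered / reps)   # should be close to 0.95
```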
• In testing multiple restrictions in the multiple regression model under the Classical assumptions,
we are more likely to reject the null that some coefficients are zero if:
○ The R-squared of the unrestricted model is large relative to the R-squared of the restricted
model.
○ The residual sum of squares of the restricted model is very large relative to that of the unrestricted model.
• When using an F test for multiple exclusion restrictions in a multiple linear regression model, the intuition behind it is:
○ Under the null hypothesis the F statistic is likely to be small; hence, the null hypothesis is rejected if the residual sum of squares decreases enough upon inclusion of the regressors that should be excluded under the null.
• Restricted model:
• If we don't reject the null hypothesis: we must look for other variables to explain y.
• The test of overall significance of the model is a test to ascertain if all coefficients, excluding the
intercept, are equal to zero.
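The statistic behind these bullets, in its SSR and R-squared forms (q = number of restrictions, n − k − 1 = degrees of freedom of the unrestricted model):

$$ F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)} $$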
•
• Consistent: as the sample size increases, the probability of getting an estimate arbitrarily close to the true value increases towards 1.
○ β̂_j consistent for β_j: plim β̂_j = β_j.
• Consistency and unbiasedness don't imply each other.
• If
○ Now, if
○ So, by the Law of the iterated expectations:
○ Conclusion:
○ Any function of the explanatory variables is uncorrelated with u.
○ Each x_j is uncorrelated with u.
•
•
○ Using the law of large numbers.
• asymptotic bias.
•
•
○
Bigger sample sizes are better.
• degrees of freedom.
• We reject the null if the observed test statistic is too high.
1. Regress y on the restricted set of independent variables and save the residuals, ũ.
2. Regress ũ on all of the independent variables and obtain the R-squared, say, R²_ũ (to distinguish it from the R-squareds obtained with y as the dependent variable).
3. Compute LM = n·R²_ũ [the sample size times the R-squared obtained from step (ii)].
4. Compare LM to the appropriate critical value, c, in a χ²_q distribution; if LM > c, the null hypothesis is rejected. Even better, obtain the p-value as the probability that a χ²_q random variable exceeds the value of the test statistic. If the p-value is less than the desired significance level, then H0 is rejected. If not, we fail to reject H0. The rejection rule is essentially the same as for F testing.
• Degrees of freedom are not important here, because of the asymptotic nature of the LM statistic.
○ All that matters is the number of restrictions being tested (q), the size of the auxiliary R-squared, and the sample size (n).
• A seemingly low value of the R-squared can still lead to joint significance if n is large.
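A minimal sketch of the LM procedure in Python (statsmodels); the data frame, variable names and the two tested restrictions are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

# Invented example: y depends on x1 and x2; we test H0: coefficients on x3, x4 are zero.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(500, 4)), columns=["x1", "x2", "x3", "x4"])
df["y"] = 1 + 0.5 * df["x1"] - 0.3 * df["x2"] + rng.normal(size=500)

X_restricted = sm.add_constant(df[["x1", "x2"]])
X_full = sm.add_constant(df[["x1", "x2", "x3", "x4"]])

u_tilde = sm.OLS(df["y"], X_restricted).fit().resid   # step 1: restricted residuals
aux_r2 = sm.OLS(u_tilde, X_full).fit().rsquared       # step 2: auxiliary R-squared
lm = len(df) * aux_r2                                  # step 3: LM = n * R^2
p_value = stats.chi2.sf(lm, df=2)                      # step 4: chi-square with q = 2 df
print(lm, p_value)
```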
• We run the regression on all the variables.
○ There is evidence against the null.
• Cauchy-Schwarz inequality:
• If the regression model is to have different intercepts for g groups or categories, we need to include (g − 1) dummy variables in the model along with an intercept.
○ The intercept for the base group is the overall intercept in the model, and the dummy variable coefficient for a particular group represents the estimated difference in intercepts between that group and the base group.
○ Including g dummy variables along with an intercept will result in the dummy variable trap.
○ An alternative is to include g dummy variables and to exclude an overall intercept.
More difficult to test for differences relative to a base group.
Regression packages usually change the way R-squared is computed when an overall intercept is not included.
• For low levels of education, the average wage for men is higher than for women.
• The effect of education is stronger for women than for men.
○ So, for higher levels of education, females have higher wages on average.
•
○ Difference in the slopes of the regressions for the two groups defined by the dummy variable.
○ Difference in the intercepts of the regressions for the two groups defined by the dummy variable.
•
○ Dependent variable is a binary variable: 0 or 1.
○ Changes completely the interpretation we have seen so far.
○ How the probability of y = 1 changes with the explanatory variables.
• Interpretation:
• Example:
○ Probability of getting a job.
○ The probability of getting a job decreases 15 percentage points as we get one year older.
○ if the individual gets older by 1 year, the probability of getting a job decreases by 15 percentage
points.
• It is heteroskedastic.
○ LPMs are intrinsically heteroskedastic (the variance depends on the explanatory variables), unless all the slope coefficients on the regressors are equal to zero.
○ Correct for heteroskedasticity:
• OLS is not able to ensure that all the fitted probabilities are between 0 and 1.
○ We need to ensure that all the fitted values are in the unit interval.
○ If they are not between 0 and 1:
If we have a large sample: drop those individuals.
If we have a small sample: set them near 0 or 1, depending on which is nearer.
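Why the LPM is intrinsically heteroskedastic, and the weight used when correcting by WLS (standard result for a binary y):

$$ \operatorname{Var}(y \mid \mathbf{x}) = p(\mathbf{x})\,[1 - p(\mathbf{x})], \qquad p(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k $$

so the feasible weights are ĥ_i = ŷ_i(1 − ŷ_i), computed from fitted values that have been kept inside (0, 1).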
• Why can't we estimate the LPM directly by WLS?
• Why should we take a two-step procedure to estimate the LPM by WLS?
•
○
• They might be biased, but still consistent.
• Logs cannot be applied if a variable takes on zero or negative values.
○ Nonnegative variables that can equal zero: log(1 + y) is sometimes used.
• Drawback: it is more difficult to predict the original variable.
○ The estimated model allows us to predict log(y), not y.
○ It is not legitimate to compare R-squareds from models where y is the dependent variable in one case and log(y) is the dependent variable in the other.
These R-squareds measure explained variation in different variables.
• Interested in elasticities or semi-elasticities: we should apply logs.
• Models using log(y) as the dependent variable often satisfy the CLM assumptions more closely than models using the level of y.
○ Strictly positive variables often have conditional distributions that are heteroskedastic or skewed; taking the log can mitigate, if not eliminate, both problems.
• Taking logs usually narrows the range of the variable.
○ Estimates are less sensitive to outlying (or extreme) observations on the dependent or
independent variables.
• Beyond some point, more rooms would be bad for the house price.
• Adjusted R²: preferred measure to compare the goodness of fit of models with the same dependent variable.
• It does not correct the bias of R² (the ratio of two unbiased estimators is not an unbiased estimator).
○ Imposes a penalty for adding additional independent variables to a model.
○ If we add a new independent variable to a regression equation, the adjusted R² increases if, and only if, the t statistic on the new variable is greater than one in absolute value (or, for several variables, if the F statistic for joint significance of the new variables is greater than unity).
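For reference, the formula the penalty comes from:

$$ \bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} = 1 - \frac{(1-R^2)(n-1)}{n-k-1} $$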
• Controlling for too many factors in Regression Analysis:
○ Sometimes it makes no sense to hold some factors fixed precisely because they should be
allowed to change when a policy changes.
• If we remember that different models serve different purposes, and we focus on the ceteris
paribus interpretation of regression, then we will not include the wrong factors in a regression
model.
• We should always include independent variables that affect y and are uncorrelated with all of the independent variables of interest.
○ Adding such a variable does not induce multicollinearity in the population.
○ It reduces the error variance.
•
○
○ y
•
• We want
• Our model becomes:
○ Run the regression on
• Because the variance of the intercept estimator is smallest when each explanatory variable has zero sample mean, it follows that the variance of the prediction is smallest at the mean values of the x_j.
○ c_j = x̄_j for all j.
○ As the values of the c_j get farther away from the x̄_j, the variance of the prediction gets larger and larger.
• Confidence interval for the average value of y for the subpopulation with a given set of covariates.
○ A confidence interval for the average person in the subpopulation is not the same as a confidence interval for a particular unit from the population.
○ In forming a confidence interval for an unknown outcome on y, we must account for another very important source of variation: the variance in the unobserved error, which measures our ignorance of the unobserved factors that affect y.
• The variance is the sum of the two variances, because the individual is not included in the sample.
• I won't be able to know u⁰, so we will just treat him as the average guy.
○ Assume that this guy is not in the sample.
• In order to estimate y⁰ we will use the estimated regression with the respective values of the covariates.
○ u⁰ has mean zero.
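The "sum of the variances" bullet corresponds to the usual prediction-error variance (standard result under the CLM assumptions):

$$ \hat{e}^0 = y^0 - \hat{y}^0, \qquad \operatorname{se}(\hat{e}^0) = \left[\operatorname{se}(\hat{y}^0)^2 + \hat{\sigma}^2\right]^{1/2} $$

and an approximate 95% prediction interval is ŷ⁰ ± t₀.₀₂₅ · se(ê⁰).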
• After estimating our model, we should look at the residuals and see which information is there.
○ To see whether the actual value of the dependent variable is above or below the predicted
value.
• Homoskedasticity fails whenever the variance of the unobservables changes across different
segments of the population, where the segments are determined by the different values of the
explanatory variables.
• For low levels of education, the distribution of wages is concentrated around the average.
• As the education grows, the wages are more spread around the mean.
○ The average wage is higher, but the variance is also higher.
• The usual OLS statistics do not have their usual distributions in the presence of heteroskedasticity.
○ The problem is not resolved by using large sample sizes.
○ t statistics are no longer t distributed, and the LM statistic no longer has an asymptotic chi-square distribution.
○ The statistics we used to test hypotheses under the Gauss-Markov assumptions are not valid in the presence of heteroskedasticity.
• If Var(u|x) is not constant, OLS is no longer BLUE.
○ It is no longer asymptotically efficient (with relatively large sample sizes, it might not be so important to obtain an efficient estimator).
• The error variance depends on the individual (one variance for each individual).
• Without homoskedasticity we can still compute the variance of the OLS estimator.
• Substitute: by
○ .
○
○ As simple as the original one, but has more parameters, so it is more accurate.
• Weakness of the White test: it uses too many degrees of freedom for models with just a moderate
number of independent variables.
• Special form of the White test:
○ Regress û² on ŷ and ŷ².
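A minimal sketch of that special form of the White test in Python (statsmodels); `model` is assumed to be an already-fitted OLS results object.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def special_white_test(model):
    u2 = model.resid ** 2                          # squared OLS residuals
    yhat = model.fittedvalues
    Z = sm.add_constant(np.column_stack([yhat, yhat ** 2]))
    aux = sm.OLS(u2, Z).fit()                      # auxiliary regression of u^2 on yhat, yhat^2
    lm = len(u2) * aux.rsquared                    # LM = n * R^2, chi-square with 2 df
    return lm, stats.chi2.sf(lm, df=2)
```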
• If we have correctly specified the form of the variance (as a function of explanatory variables), then weighted least squares is more efficient than OLS, and WLS leads to new t and F statistics that have t and F distributions.
• We derived it so we could obtain estimators of the β_j that have better efficiency properties than OLS.
• If the original equation satisfies the first four GM assumptions, then the transformed equation satisfies them too.
• If u_i has a normal distribution, then the transformed error u_i/√h_i has a normal distribution with variance σ².
• Less weight is given to observations with a higher error variance.
This estimator is BLUE.
We get the variance of the estimators in the usual way.
○ Estimate the transformed model: weighted least squares.
WLS weights each squared residual by the inverse of the conditional variance of u_i given x_i.
Lower variance than the OLS estimator.
○ Interpretation: in the original model.
○ Efficient procedure.
○ Regression of transformed variable one on all the other transformed variables.
Total variation of transformed variable one.
○ The R² of the transformed model: R² of the regression of transformed variable 1 on all the other "transformed" variables.
○ As this estimator is BLUE, its variance must be lower than the variance of the OLS estimator.
If there is no heteroskedasticity, the variance is the same (the weighting function would be 1, so it makes no sense to do it).
• Typically, along with the dependent and independent variables in the original model, we specify the weighting function h(x).
○ This forces us to interpret weighted least squares estimates in the original model.
○ Estimates and standard errors will be different from OLS, but the interpretation is the same.
• Feasible GLS: when we don't know the variance function.
○ v has a mean of unity, conditional on x.
○ Assume that v is actually independent of x.
○ If instead we first transform all variables and run OLS, each variable gets multiplied by 1/√h_i, including the intercept.
• If we could use h_i rather than ĥ_i in the procedure, our estimators would be unbiased.
○ They would be the best linear unbiased estimators.
• The FGLS estimator is consistent and asymptotically more efficient than OLS.
• The null hypothesis must be stronger than homoskedasticity: u and x must be independent.
• In order to get the regression model out of the model of the variance of the error term.
• equal to all the constants that we get when we take the logs.
○
○ Let me put:
○
○ This forecast is efficient.
○ If we compute , our new and will converge to the real ones.
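A minimal sketch of the feasible GLS procedure just described (regress log(û²) on the regressors, exponentiate the fitted values, and use 1/ĥ as WLS weights); `ols_res`, `X` and `y` are assumed to come from the original OLS estimation, with no residual exactly equal to zero.

```python
import numpy as np
import statsmodels.api as sm

def fgls(ols_res, X, y):
    log_u2 = np.log(ols_res.resid ** 2)            # log of squared OLS residuals
    aux = sm.OLS(log_u2, X).fit()                  # regress log(u^2) on the regressors
    h_hat = np.exp(aux.fittedvalues)               # estimated variance function h(x)
    return sm.WLS(y, X, weights=1.0 / h_hat).fit() # WLS with weights 1/h_hat
```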
• Some correlation between close observations.
○ Positive or negative.
• There is a temporal order.
○ The past can affect the future, but no vice-versa.
• Most time series processes are correlated over time, and many of them strongly correlated.
○ They cannot be independent across observations, which simply represent different time periods.
○ Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to
be independently distributed.
• Stochastic process or a time series process: sequence of random variables indexed by time is called a
stochastic process or a time series process. (“Stochastic” is a synonym for random.) When we collect a
time series data set, we obtain one possible outcome, or realization, of the stochastic process.
○ We can only see a single realization, because we cannot go back in time and start the process over
again.
However, if certain conditions in history had been different, we would generally obtain a different
realization for the stochastic process: we think of time series data as the outcome of random
variables.
○ Population: set of all possible realizations of a time series process.
○ Sample size: number of time periods over which we observe the variables of interest.
• Example:
○ Can put , and such that
○ does not have more information than both regressors together.
○
○
○ Seeing the relation between what happens in with what happened in t.
○ If we want to evaluate this model,
○ To make the exercise of predict , we only see the data until then.
We do not know information about the present or the previous quarter.
• Example 1: Negative correlation between the unemployment rate and the Dow Jones.
○ If the unemployment rate is low: firms are optimistic about the future, firms will invest and people will invest in the firms (everyone is happy and the Dow will go up).
Firms are hiring people, making a lot of investment.
○ If something affects employment, everyone is pessimistic, people are losing their jobs, demand falls and every price will fall.
○ News about the behaviour of the employment market strongly affects the Dow of that day.
• We cannot say where is the causality coming from.
○ Just by observing the data we can conclude many things: correlation and trend.
First Models
• Static models:
○ Relate variables and data at the same time t (contemporaneous).
○ Modeling a contemporaneous relationship between y and z.
○ When a change in z at time t is believed to have an immediate effect on y.
○ Measuring the tradeoff between y and z.
• Finite distributed lag models:
○ Allow past values of z to affect the current y.
○ Finite: finite number of lags.
• Specifying a linear model for time series data.
Example 1:
• Assumption: zero correlation between u_t and the regressors at time t (only the regressors at time t).
• With this last assumption, we guarantee not unbiasedness but consistency.
• Under Assumptions TS.1 through TS.6, the CLM assumptions for time series, the OLS estimators are normally distributed, conditional on X.
○ Under the null hypothesis, each t statistic has a t distribution, and each F statistic has an F distribution.
○ The usual construction of confidence intervals is also valid.
• It is very hard to argue causation effects between y and x if they have a trend.
• Nothing about trending variables necessarily violates the classical linear model assumptions TS.1 through TS.6.
• Unobserved, trending factors that affect y might also be correlated with the explanatory variables.
○ If we ignore this possibility, we may find a spurious relationship between y and one or more explanatory variables (finding a relationship between two or more trending variables simply because each is growing over time).
•
○ Allowing for the trend explicitly recognizes that y may be growing or shrinking over time for reasons essentially unrelated to the explanatory variables.
○ If the model satisfies assumptions TS.1, TS.2, and TS.3: then omitting the trend from the regression and regressing y on the explanatory variables will generally yield biased estimators of the slope parameters.
We omitted an important variable, the time trend, from the regression.
Especially true if the explanatory variables are themselves trending: they can be highly correlated with the trend.
• Sometimes, adding a time trend can make a key explanatory variable more significant.
○ If the dependent and independent variables have different kinds of trends.
• Spurious regression: we may find a statistically significant relation between y and x just because they are trending together.
○ Phenomenon of finding a relationship between two or more trending variables simply because each is growing over time.
• If we don't include the trend in the model: we will still get unbiased estimators if:
y doesn't have a trend, or its trend is not correlated with the regressors (there is no trend in the regressors).
○ We are in trouble if y has a trend or the regressors have a trend.
○ The y might seem to be related to one or more of the x_j simply because each contains a trend.
○ If the trend is statistically significant, and the results change in important ways when a time trend is added to a regression: the initial results without a trend should be treated with suspicion.
○ Include a trend in the regression if any independent variable is trending, even if y is not.
• R-squareds in time series regressions are often very high: time series data often come in aggregate
form (aggregates are often easier to explain than outcomes on individuals, families, or firms, which is
often the nature of cross-sectional data).
○ Usual and adjusted R-squareds for time series regressions can be artificially high when the
dependent variable is trending.
• The usual total sum of squares can substantially overestimate the variance in y_t when y_t is trending.
• Simplest method: compute the usual R-squared in a regression where the dependent variable has already been detrended.
○ When y_t contains a strong linear time trend, the R-squared of the detrended model can be much less than the usual one.
○ It nets out the effect of the time trend.
• In computing the R-squared form of an F statistic for testing multiple hypotheses: use the usual R-squareds without any detrending.
○ The R-squared form of the F statistic is just a computational device: the usual formula is always appropriate.
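A sketch of the detrending computation described above; `y`, `X` (regressors including a constant) and `t` (a linear time trend) are assumed to be aligned numpy arrays.

```python
import numpy as np
import statsmodels.api as sm

def detrended_r2(y, X, t):
    trend = sm.add_constant(t)
    y_detrended = sm.OLS(y, trend).fit().resid        # net out the linear trend from y
    full = sm.OLS(y, np.column_stack([X, t])).fit()   # original regression including the trend
    ssr = np.sum(full.resid ** 2)
    return 1.0 - ssr / np.sum(y_detrended ** 2)       # R^2 against the detrended total variation
```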
○
○ For observations two periods apart there is no common shock: there is no correlation.
•
•
•
○
○
○
I don't know values, so we have to forecast it.
○
○ if
• and this can only happen if
•
○ Since
○ ρ: correlation coefficient between any two adjacent terms in the sequence.
• Although y_t and y_{t+h} are correlated for any h, this correlation gets very small for large h, because it decays like ρ^h.
• If we include the trend in the model, we can still use this set of assumptions.
○ We can skip stationarity including deterministic stationarity.
○ Stationary around the trend.
• not stationary, because it depends on
○ But independent of
○
○ Stationary around the trend: if we remove the trend, they are stationary.
• Weak dependence: the data behave well (in a similar way across time).
○ Stationary: the dynamics are similar across time.
○ Observations become independent if we are really far apart.
○ Trend stationary (just need to include the trend in the model).
○ The law of large numbers and the central limit theorem can be applied to sample averages.
• TS.2': If we define this assumption carefully we can get rid of the strict exogeneity of the X's.
○ The following consistency result only requires u_t to have zero unconditional mean and to be uncorrelated with each regressor x_tj:
• We are basing our model on a crazy set of assumptions.
• Contemporaneous exogeneity: we can have correlation between today's regressors and future
error terms.
• Strict exogeneity:
○ If there is correlation with past error terms, we should include past error terms.
○ In general, we don't have the actual event in the future affecting present facts.
We just have expectations.
○ contemporaneous exogeneity.
Much weaker and can be more reasonable.
• With these assumptions we can prove that the OLS estimator is consistent.
○
○ Not necessarily unbiased.
• Example:
○
○
The error terms depend on t.
• Example 2:
○
○
• y today is highly correlated with y even in the distant future.
• Random walk with drift:
○ y_t = α0 + y_{t−1} + e_t.
○ α0: drift term.
○ To generate y_t, the constant α0 is added along with the random noise e_t to the previous value y_{t−1}.
○ The expected value of y_t follows a linear time trend (by repeated substitution): E(y_t) = α0·t + E(y_0).
○ Therefore, if y_0 = 0: the expected value of y_t is growing over time if α0 > 0 and shrinking over time if α0 < 0.
○ E(y_{t+h} | y_t) = α0·h + y_t, and so the best prediction of y_{t+h} at time t is y_t plus the drift α0·h.
○ The variance of y_t is the same as it was in the pure random walk case.
• Transform random walks with or without drift into a weakly dependent process by differencing.
• Weakly dependent processes are said to be integrated of order zero, or I(0).
○ Nothing needs to be done to such series before using them in regression analysis: averages of such sequences already satisfy the standard limit theorems.
• Unit root processes, such as a random walk (with or without drift), are said to be integrated of order one, or I(1).
○ The first difference of the process is weakly dependent (and often stationary).
○ A time series that is I(1) is often said to be a difference-stationary process.
○ Differencing time series before using them in regression analysis has another benefit: it removes any linear time trend.
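A small simulation of a random walk with drift and of its first difference (the drift and sample size are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(42)
T, alpha0 = 200, 0.2
e = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha0 + y[t - 1] + e[t]   # y_t = alpha0 + y_{t-1} + e_t: an I(1) process

dy = np.diff(y)                        # first difference: alpha0 + e_t, weakly dependent / I(0)
print(y[-1], dy.mean())                # dy.mean() should be near alpha0
```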
• In the presence of serial correlation: the OLS estimator is no longer BLUE.
•
○
○ Remaining dynamics will go to
• Violates
○ We are requiring the regressors to explain all the dynamics of y, so it is very strict.
○ Everything else is in the error.
• |ρ| < 1 in order to have stability.
• Check or :
○
○ did not show
rigorously.
This cannot be zero.
○ Past don't affect present ones.
• , and
• Provided the data are stationary and weakly dependent: R-squared and adjusted R-squared are valid.
○ The variances of both the errors and the dependent variable do not change over time.
○ By the law of large numbers: the estimators of the error variance and of the variance of y are still consistent.
•
○
○
○ i.i.d. satisfies the no serial correlation assumption.
○ If we rewrite the model with we kill the serial correlation.
• Assumptions that we can use when we use lag variables as independent variables.
○ We cannot rely on strict exogeneity.
• Null hypothesis: there is no serial correlation.
○ That is why we don't need the intercept: it is just a slightly different average.
○ If u_t were observed and i.i.d., we could apply the asymptotic normality results immediately.
○ If |ρ| < 1, u_t is clearly weakly dependent.
○ This does not work, because the errors are never observed.
We replace the error with the corresponding OLS residual.
Because of the strict exogeneity assumption, the large-sample distribution of the t statistic is not affected by using the OLS residuals in place of the errors.
•
○ Quarterly data with seasonality.
• Testing for AR(1) Serial Correlation with Strictly Exogenous Regressors:
1. Run the OLS regression of y_t on the regressors and obtain the OLS residuals, û_t.
2. Run the regression of û_t on û_{t−1}, obtaining the coefficient ρ̂ on û_{t−1} and its t statistic.
This regression may or may not contain an intercept; the t statistic for ρ̂ will be slightly affected, but it is asymptotically valid either way.
3. Use that t statistic to test H0: ρ = 0 against H1: ρ ≠ 0 in the usual way.
Since ρ > 0 is often expected a priori, the alternative can be H1: ρ > 0.
Typically, we conclude that serial correlation is a problem to be dealt with only if H0 is rejected at the 5% level.
It is best to report the p-value for the test.
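A minimal sketch of this test in Python (statsmodels); `res` is assumed to be the fitted OLS results object of the original model.

```python
import numpy as np
import statsmodels.api as sm

u = np.asarray(res.resid)
aux = sm.OLS(u[1:], sm.add_constant(u[:-1])).fit()   # regress u_t on u_{t-1} (with intercept)
rho_hat, t_rho = aux.params[1], aux.tvalues[1]
print(rho_hat, t_rho)   # compare t_rho with the usual critical values
```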
• Under the null hypothesis of no serial correlation, the Durbin-Watson statistic should be close to 2.
• Tests based on the t statistic for ρ̂ and tests based on the Durbin-Watson statistic are conceptually the same.
○ The DW test requires the full set of classical linear model assumptions, including normality of the error terms.
○ Its distribution depends on the values of the independent variables, on the sample size, the number of regressors, and whether the regression contains an intercept.
○ It only allows testing for serial correlation of order 1.
○ It assumes strict exogeneity.
• Interpretation of the DW statistic:
○ DW ≈ 2: no serial correlation of order 1 (says nothing about serial correlation of different orders).
○ DW close to 0: high positive serial correlation.
• Compare DW with two sets of critical values in case of positive serial correlation:
○ d_U: upper critical value.
If DW > d_U: fail to reject H0.
○ d_L: lower critical value.
If DW < d_L: reject H0.
○ If d_L ≤ DW ≤ d_U: the test is inconclusive.
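For reference, the statistic itself and its link to ρ̂:

$$ DW = \frac{\sum_{t=2}^{n}(\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n}\hat{u}_t^2} \approx 2\,(1 - \hat{\rho}) $$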
• The error term used is the one from the original model, not the one from the auxiliary regression of the test we are running.
• Test for serial correlation without strict exogeneity:
○ Including all the regressors makes the test valid without strict exogeneity (only contemporaneous exogeneity is needed).
• Problem with the other tests: they only control for û_{t−1}.
○ They implicitly assumed that the regressors are uncorrelated with u_{t−1}; without strict exogeneity, x_t could be correlated with it.
• Get the residuals û by OLS and run them on their own lag and the other regressors.
○
• There is no serial
•
correlation between
•
and
• As we don't know we cannot compute the regression for
• Homoskedasticity is
•
being violated.
○ The error terms are serial uncorrelated.
○ We cannot make
○
the
○ If we knew we could estimate and by regressing on . transformation
○ Provided we divide the estimated intercept by for , so we
• We must multiply by to get errors with the same variance. use the .
○ We use this error
BLUE estimators of and under assumptions TS.1-4 and the AR(1) model for
•
• Unless ρ = 0, the GLS estimator (OLS on the transformed data) will generally be different from the original OLS estimator.
○ GLS is BLUE, and t and F statistics from the transformed equation are valid.
• This transformed model satisfies all the assumptions and thus we can use OLS.
• For inference, we will need stationarity and weak dependence (which are assumed).
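A minimal Cochrane-Orcutt style sketch of the feasible version of this transformation (no Prais-Winsten first-observation correction); `y` and `X` are assumed to be numpy arrays, with a constant column in `X`.

```python
import numpy as np
import statsmodels.api as sm

def cochrane_orcutt(y, X, n_iter=10):
    rho = 0.0
    for _ in range(n_iter):
        y_q = y[1:] - rho * y[:-1]                             # quasi-difference the data
        X_q = X[1:] - rho * X[:-1]                             # the constant column becomes (1 - rho)
        res = sm.OLS(y_q, X_q).fit()
        u = y - X @ res.params                                 # residuals in the original model
        rho = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)     # update the AR(1) coefficient
    return res, rho
```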
•
○ With time series data, we can only do inference if we correct both problems at the same time.
• This does not violate the homoskedasticity assumption.
• The variance of my error term today is correlated with the variance of the error term of the past.
•
• variance of u_t conditional on X.
•
○ Dummy: 1 if and 0 otherwise.
•
•
• This model was estimated for a different time period from before.
•
○ It does not happen.
• If the squared residuals are the dependent variable, it means that we are looking for the variance of our
model and testing for heteroskedasticity.
•
•
○ White test.
○
○
○ The model is heteroskedastic: the test statistic is really high, so we clearly reject the null of homoskedasticity.
○ We reject the null hypothesis.
○ We are just looking at the overall significance of the model.
• Testing for serial correlation without strict exogeneity:
○ How can we know if the error term is correlated with the error terms of past periods?
○ The F statistic for this null hypothesis is not equivalent to the F statistic for the overall significance of the model.
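One way to run this test is the Breusch-Godfrey test in statsmodels, which regresses the OLS residuals on their own lags and the original regressors, so it stays valid without strict exogeneity; `res` is assumed to be a fitted OLS results object, and two lags are used here only as an example.

```python
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(lm_pval, f_pval)   # small p-values indicate serial correlation up to order 2
```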
• Test for ARCH problem.
○ Take the residuals from the original model and square the residuals.
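A minimal ARCH(1) LM-test sketch along those lines; `res` is assumed to be the fitted OLS results object of the original model.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

u2 = np.asarray(res.resid) ** 2                               # squared residuals
aux = sm.OLS(u2[1:], sm.add_constant(u2[:-1])).fit()          # regress u_t^2 on u_{t-1}^2
lm = len(u2[1:]) * aux.rsquared                               # LM = n * R^2
print(lm, stats.chi2.sf(lm, df=1))   # small p-value: reject the no-ARCH null
```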
• Much of the time serial correlation is viewed as the most important problem: it usually has a larger
impact on standard errors and the efficiency of estimators than does heteroskedasticity.
○ Obtaining tests for serial correlation that are robust to arbitrary heteroskedasticity is fairly
straightforward.
○ If we detect serial correlation using such a test, we can employ the Cochrane-Orcutt (or Prais-
Winsten) transformation and, in the transformed equation, use heteroskedasticity robust
standard errors and test statistics.
○ Or, we can test for heteroskedasticity using the Breusch-Pagan or White tests.
• We need to correct for both at the same time.
○ Through a combined weighted least squares AR(1) procedure.
• Correcting for both problems: serial correlation and heteroskedasticity.
• The error is heteroskedastic in addition of containing serial correlation.
○
• The transformed equation has ARCH errors.
○
○ If we know the particular kind of heteroskedasticity we can estimate the using standard
CO or PW methods.
• Almost any factor simultaneously influences inflation and the unemployment rate.
• Test for AR(1) serial correlation: we take the residuals from the previous regression and run a regression.
○
○ Regress the previous residuals on one period lag residuals.
• Robust to serial correlation: include on this regression the original regressors.
• Most likely, the no serial correlation assumption is not verified, so we are going to try to correct for it.