ECONOMETRICS
QMT533
ASSESSMENT 2
Multicollinearity arises when two or more independent variables in a regression model are highly correlated with one another. A multicollinearity test helps determine whether a model suffers from this problem.
1.2.1 Problem
1. The data collection method employed, for example sampling over a limited range of the values taken by the regressors in the population.
2. Model specification, for example adding a polynomial term to a regression model, especially when the range of X is small.
3. Improper use of dummy variables, such as failing to exclude one category when the model has a dummy variable for each category or group plus an intercept.
4. An overdetermined model, which happens when the number of independent variables exceeds the sample size.
1. Multicollinearity does not violate the OLS assumptions. The OLS estimates are still unbiased and BLUE (Best Linear Unbiased Estimators); in general, it does not inhibit our ability to obtain a good fit, nor does it tend to affect inferences about the mean response or the prediction of new observations.
2. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult.
3. Confidence intervals for the coefficients tend to be much wider, leading to acceptance of the “zero null hypothesis”.
4. The t statistics of one or more coefficients tend to be very small, so those coefficients appear statistically insignificant.
5. Although the t ratio of one or more coefficients is statistically insignificant, R² can be very high.
6. The OLS estimators and their standard errors can be sensitive to small changes in the data.
7. The usual interpretation of a regression coefficient, as the change in the mean of the response variable when the given independent variable increases by one unit while all other predictor variables are held constant, is no longer fully applicable.
● The estimated regression coefficient of each predictor changes depending on which other predictors are included in the model.
● As more predictors are added, the precision of the estimated regression coefficients declines.
● The marginal contribution of any predictor variable to reducing the error sum of squares depends on the other predictors already in the model.
● The outcome of testing the null hypothesis βk = 0 depends on which other predictors are included in the model.
1.4 DETECTING MULTICOLLINEARITY
1. None of the t-ratios for the individual coefficients is statistically significant, yet the overall F statistic is.
2. High pairwise correlation among regressors. A rule of thumb is that if the pairwise correlation between two regressors is high, say in excess of 0.8, then multicollinearity is a serious problem.
3. Scatterplots. It is good practice to use scatter plots to see how the various variables in a regression model are related.
4. In particular, as variables are added, look for changes in the signs of effects (e.g. switches from positive to negative) that seem theoretically questionable.
5. Auxiliary regressions. Klein's rule of thumb suggests that multicollinearity may be a troublesome problem only if the R² obtained from an auxiliary regression is greater than the overall R².
6. Tolerance (TOL). The closer the TOL is to zero, the greater the degree of collinearity of that variable with the other independent variables.
7. Variance inflation factor (VIF). If VIF > 10, the variable is said to be highly correlated with the other regressors.
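The pairwise-correlation, tolerance, and VIF checks above can be sketched with synthetic data. The variable names and the data below are illustrative, not taken from the assignment's dataset; TOL and VIF are computed from auxiliary regressions, using TOL_j = 1 − R_j² and VIF_j = 1/TOL_j.

```python
import numpy as np

def tol_and_vif(X):
    """Tolerance and VIF for each column of X via auxiliary regressions.

    TOL_j = 1 - R_j^2, where R_j^2 comes from regressing column j on the
    other columns (plus an intercept); VIF_j = 1 / TOL_j.
    """
    n, k = X.shape
    results = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        results.append((1.0 - r2, 1.0 / (1.0 - r2)))
    return results

# Synthetic regressors: x1 and x2 nearly collinear, x3 independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Pairwise correlation matrix: |r| > 0.8 flags a problem pair
R = np.corrcoef(X, rowvar=False)

# Tolerance near zero / VIF above 10 flags a collinear variable
tol_vif = tol_and_vif(X)
```

With this construction the x1–x2 pair exceeds the 0.8 correlation rule of thumb and their VIFs exceed 10, while the independent x3 has TOL near 1 and a small VIF.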
1.5.1 Correlation Matrix
Based on the Figure 1.0, show that there are numerous pairs of independent variables with
correlations greater than 0.8 or -0.8. Wish is, HP-MPGF, HPWT, HP-SP, MPF-MPGF, MPG-WT,
and MPGF-WT. Therefore, multicollinearity exists for all the aforementioned pairs.
As seen in Figure 1.1, the Tolerance (TOL) value for each independent variable is small, close to zero. As a result, multicollinearity exists for each of the variables. Moreover, only the variable VOL has a Variance Inflation Factor (VIF) below 10; the remaining variables have very large (effectively infinite) VIF values, so all of the variables except VOL can be taken to be multicollinear.
Figure 1.3 shows a relatively high condition number (2.752458e+08) and large variance proportions for the variables HP, MPGF, WT, and SP, so multicollinearity is very likely.
The eigenvalues must be positive when determining their maximum and minimum values. As shown in Figure @@, the maximum eigenvalue is 5.668374 and the minimum is 7.481992e-17.
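The eigenvalue-based check can be sketched as follows, using simulated data rather than the assignment's dataset. The condition number k is the ratio of the largest to the smallest eigenvalue of X'X, and the condition index CI is its square root, matching the k > 1000 and CI > 30 thresholds applied below.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 1e-3 * rng.normal(size=100)   # near-exact collinearity with x1
X = np.column_stack([np.ones(100), x1, x2])

eigvals = np.linalg.eigvalsh(X.T @ X)    # eigenvalues of X'X (all positive)
k = eigvals.max() / eigvals.min()        # condition number
ci = np.sqrt(k)                          # condition index
severe = (k > 1000) and (ci > 30)        # rule-of-thumb thresholds
```

The tiny smallest eigenvalue produced by the near-collinear pair drives the condition number far past the threshold, mirroring the pattern reported in the figure.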
Since the condition number k exceeds 1000 and the condition index CI exceeds 30, there is severe multicollinearity in the linear model.
1.6 REMEDIAL MEASURES
1. Increase the sample size. This will usually decrease standard errors and attenuate the collinearity problem.
2. Drop variables. When faced with severe multicollinearity, one of the simplest remedies is to drop one of the collinear variables. But if the variable really belongs in the model, this can lead to specification error, which can be even worse than multicollinearity.
3. Use multivariate statistical techniques such as factor analysis and principal components.
4. Use ridge regression. Ridge regression is one of several methods that have been proposed to remedy multicollinearity by modifying the method of least squares to allow biased estimators of the regression coefficients.
5. Do nothing. Simply recognise that multicollinearity is present and be aware of its consequences.
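The ridge regression remedy above can be sketched in closed form: the ridge estimator is β(k) = (X'X + kI)⁻¹X'y, which is biased but has smaller variance than OLS when regressors are collinear. The data below are simulated for illustration only.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator: beta = (X'X + k*I)^{-1} X'y.

    k = 0 recovers ordinary least squares; k > 0 shrinks the coefficients,
    trading a little bias for a large reduction in variance.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)   # nearly collinear regressors
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=100)

b_ols = ridge(X, y, 0.0)     # unstable individual coefficients
b_ridge = ridge(X, y, 10.0)  # shrunken, stabilised coefficients
```

Because the ridge penalty shrinks the coefficient vector, its norm is smaller than that of the OLS solution, which is exactly how it tames the inflated variances caused by collinearity.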
1.7 CONCLUSION
Therefore, based on the checks above, it can be said that the data have a multicollinearity problem, as every indicator examined points to it. Since every test shows that multicollinearity exists, it can be concluded that multicollinearity is present, and further investigation and action, such as dropping variables or observations, are needed to address the problem.
2.0 AUTOCORRELATION
The degree of similarity between a given time series and a lagged version of itself over
successive time intervals is mathematically represented by autocorrelation. It's conceptually
similar to the correlation between two different time series, but autocorrelation uses the same
time series twice, once in its original form and once lagged one or more time periods.
For example, the data indicates that it is more likely to rain tomorrow if it is raining today
than if it is clear today. A stock may have a large positive autocorrelation of returns when it
comes to investing, meaning that if it is "up" today, it is more likely to be up tomorrow as well.
Naturally, autocorrelation can be a useful tool for traders to utilize; particularly for technical
analysts.
A common method for detecting autocorrelation is the Durbin-Watson test, which is used in regression analysis to identify first-order autocorrelation. The Durbin-Watson statistic always lies between 0 and 4. Values closer to 0 imply positive autocorrelation, values closer to 4 imply negative autocorrelation, and values near the midpoint of 2 reflect little autocorrelation.
2.3 ASSUMPTION RELATED TO THE AUTOCORRELATION
- A linear regression model assumes independent error terms. This indicates that
one observation's error term is unaffected by another observation's error term. If
not, it is referred to as autocorrelation. It is generally observed in time series
data. Time series data consists of observations for which data is collected at
discrete points in time. Usually, observations at adjacent time intervals will have
correlated errors.
There are many statistical tests which help to identify the presence of autocorrelation.
We can also identify autocorrelation visually through ACF plots. We will discuss them one by
one.
1. Durbin-Watson Test: A very well-known test used to identify the presence of autocorrelation is the Durbin-Watson (DW) test. The DW test statistic is expressed as:
d = Σ(eₜ − eₜ₋₁)² / Σeₜ²
where eₜ is the error term at period t. This formula returns a value between 0 and 4. A value of 2 indicates no autocorrelation; a value greater than 2 and closer to 4 indicates negative autocorrelation, and a value less than 2 and closer to 0 indicates positive autocorrelation. The hypotheses of this test are:
H₀: No first-order autocorrelation exists among the residuals.
H₁: The residuals are autocorrelated.
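The DW formula above can be computed directly. The sketch below applies it to simulated AR(1) residuals (an assumption for illustration, not the assignment's data); positively autocorrelated residuals push d well below 2, since d ≈ 2(1 − ρ).

```python
import numpy as np

def durbin_watson_stat(e):
    """d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, the DW statistic."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(4)
# AR(1) residuals with rho = 0.8 -> strong positive autocorrelation
e = np.zeros(500)
for t in range(1, 500):
    e[t] = 0.8 * e[t - 1] + rng.normal()

d = durbin_watson_stat(e)   # expected near 2 * (1 - 0.8) = 0.4
```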
2. Ljung-Box Q Test: Another very popular test is the Ljung-Box Q test. The null and alternative hypotheses for this test are as follows:
H₀: The autocorrelations up to lag k are all 0.
H₁: Some autocorrelation up to lag k differs from 0.
For this test, if the resulting p-value is less than the chosen level of significance, we reject the null hypothesis and conclude that there is autocorrelation in the residuals.
3. ACF plots: A plot of the autocorrelation of a time series by lag is called the Auto-Correlation Function (ACF) plot. In an ACF plot we plot the correlation at each lag along with a confidence band. In simple terms, it describes how well the present value of the series is related to its past values.
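The quantities behind an ACF plot can be computed by hand: the sample autocorrelation at lag k, together with the approximate 95% confidence band ±1.96/√n that the plot draws. The series below is simulated for illustration.

```python
import numpy as np

def acf_values(x, nlags):
    """Sample autocorrelations r_0..r_nlags:
    r_k = sum_t (x_t - xbar)(x_{t+k} - xbar) / sum_t (x_t - xbar)^2."""
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array(
        [np.sum(x[: len(x) - k] * x[k:]) / denom for k in range(nlags + 1)]
    )

rng = np.random.default_rng(6)
# Simulated AR(1) series with rho = 0.6
e = np.zeros(400)
for t in range(1, 400):
    e[t] = 0.6 * e[t - 1] + rng.normal()

r = acf_values(e, nlags=5)
band = 1.96 / np.sqrt(len(e))        # approximate 95% confidence band
significant = np.abs(r[1:]) > band   # lags poking outside the band
```

Lags whose bars fall outside the band on an ACF plot correspond to `significant` entries here; for an AR(1) series the early lags clearly do.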
2.5 FINDINGS ON AUTOCORRELATION
Figure 2.0 above, the residual plot, shows the residuals against the fitted values. The residuals lie between −5 and 5, which suggests constant variance. However, there are two outliers, with fitted values greater than 10. The pattern in the residuals also suggests that autocorrelation is present.
2.5.2 Durbin-Watson test
From Figure 2.1 above, the Durbin-Watson statistic is 0.99184. This is low because it falls below the acceptable lower bound of 1.50. Therefore, there is positive autocorrelation.
2.6 REMEDIAL MEASURES
If the diagnostic tests suggest that there is an autocorrelation problem, we have several options:
1. If it is pure autocorrelation, one can transform the original model so that the new model does not have the problem of pure autocorrelation.
2. For large samples, we can use the Newey-West method to obtain standard errors of the OLS estimators that are corrected for autocorrelation. It is an extension of White's heteroscedasticity-consistent standard error method.
2.7 CONCLUSION
Autocorrelation is important because it can help us uncover patterns in our data, select the best prediction model, and correctly evaluate the effectiveness of our model. From the findings, there is positive autocorrelation, as suggested by the pattern in the residual plot. This is confirmed by the Durbin-Watson statistic, whose value of 0.99184 is below 1.50. Both checks indicate positive autocorrelation.
3.0 HETEROSCEDASTICITY
Heteroscedasticity occurs when the variance of the error term is not constant. The OLS assumptions are met only when the errors have constant variance (homoscedasticity).
1. Following the error-learning model: as people learn, their errors of behaviour become smaller over time. In this case, σᵢ² is expected to decrease.
2. As income increases, people have more discretionary income and hence more scope for choice about the disposition of their income. Hence, σᵢ² is likely to increase with income.
3.2.2 Consequences of Heteroscedasticity
Figure 3.1 shows that heteroscedasticity exists: the points are not all randomly scattered around zero, and they show a decreasing and then increasing pattern.
Glejser Test
Hypothesis:
H₀: There is no heteroscedasticity in the error variance
H₁: There is heteroscedasticity present in the error variance
Figure 3.2 shows that the p-value is 0.000000114, lower than α = 0.05, so we reject H₀. This indicates that heteroscedasticity exists in this dataset.
Goldfeld-Quandt Test
Hypothesis:
H₀: There is no heteroscedasticity in the error variance
H₁: There is heteroscedasticity present in the error variance
Figure 3.3 shows that the p-value is 1, higher than α = 0.05, so we fail to reject H₀. This indicates that no heteroscedasticity exists in this dataset.
Breusch-Pagan Test
Hypothesis:
H₀: There is no heteroscedasticity in the error variance
H₁: There is heteroscedasticity present in the error variance
Figure 3.4 shows that the p-value is 0.00003023, lower than α = 0.05, so we reject H₀. This indicates that heteroscedasticity exists in this dataset.
Hypothesis:
H₀: There is no heteroscedasticity in the error variance
H₁: There is heteroscedasticity present in the error variance
Figure 3.5 shows that the p-value is 0.0000156, lower than α = 0.05, so we reject H₀. This indicates that heteroscedasticity exists in this dataset.
3.5 REMEDIAL MEASURES
Weighted least squares is simply ordinary least squares, where each observation is
adjusted for the expected size of its error term.
WHITE’S HETEROSCEDASTICITY-CONSISTENT STANDARD ERRORS
3.6 CONCLUSION