What happens if X2 and X3 are independent? What happens if X2 and X3 are perfectly linearly related?
If X2 and X3 are independent variables in the model Yi = β̂1 + β̂2X2 + β̂3X3 + ûi, there is no linear relationship between X2 and X3. In this case, multicollinearity is not an issue because the independent variables are not correlated with each other.
However, if X2 and X3 are perfectly linearly related, there is a perfect correlation between the two variables. In this scenario, perfect multicollinearity becomes a problem.
When there is perfect collinearity between X2 and X3, it is impossible to estimate the individual coefficients β̂2 and β̂3 separately, because their effects on the dependent variable (Yi) cannot be distinguished from each other. When X2 and X3 are perfectly linearly related, the model Yi = β̂1 + β̂2X2 + β̂3X3 + ûi is not identifiable, and the individual coefficients are not estimable.
Multicollinearity can lead to unstable and unreliable coefficient estimates, as well as inflated
standard errors. In such cases, even small changes in the data can result in large changes in
the estimated coefficients. It can also make it difficult to interpret the individual effects of
the independent variables on the dependent variable.
What happens to |X'X| when X2 and X3 are perfectly linearly related?
When X2 and X3 are perfectly linearly related, it means that there is a perfect correlation
between the two variables. In this case, the matrix X'X becomes singular, which means its
determinant (|X'X|) becomes zero.
The matrix X'X is the product of the transpose of the design matrix X with X itself. It appears in the ordinary least squares (OLS) formulas β̂ = (X'X)⁻¹X'y and cov(β̂) = σ²(X'X)⁻¹, so inverting X'X is necessary to obtain β̂ and cov(β̂).
However, when X2 and X3 are perfectly linearly related, the columns of X become linearly dependent, so X'X is singular. A singular matrix has no inverse, and its determinant is zero. Therefore |X'X| = 0, and β̂ and cov(β̂) cannot be calculated from the formulas above.
In such cases of perfect multicollinearity, where two or more independent variables are
perfectly correlated, it is not possible to estimate the individual coefficients and their
covariance matrix using the ordinary least squares method. Alternative methods, such as
dropping one of the perfectly correlated variables or using techniques like ridge regression
or principal component analysis, may be used to address the issue of multicollinearity.
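The singularity can be checked numerically. Below is a minimal sketch in Python with NumPy, using made-up data in which X3 = 2·X2 (the variable names and numbers are illustrative only): the rank of X drops, |X'X| collapses to zero, and the OLS formula cannot be applied.

```python
import numpy as np

# Made-up data in which X3 is an exact multiple of X2 (X3 = 2 * X2),
# so the columns of the design matrix X are linearly dependent.
rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x3 = 2.0 * x2                                   # perfect linear relation
X = np.column_stack([np.ones(n), x2, x3])       # intercept, X2, X3

XtX = X.T @ X
print("rank of X     :", np.linalg.matrix_rank(X))   # 2 instead of 3
print("det(X'X)      :", np.linalg.det(XtX))         # zero up to rounding error
print("condition no. :", np.linalg.cond(XtX))        # astronomically large

# The OLS formula (X'X)^(-1) X'y either raises LinAlgError or yields
# numerically meaningless coefficients.
y = 1.0 + 0.5 * x2 + rng.normal(size=n)
try:
    beta_hat = np.linalg.inv(XtX) @ X.T @ y
    print("'estimates'   :", beta_hat)
except np.linalg.LinAlgError:
    print("X'X is singular; beta_hat cannot be computed")
```

Dropping one of the two perfectly correlated columns restores full column rank and makes the OLS formula usable again.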
Sources of MC
1. Data collection method: Multicollinearity can arise due to the way data is collected. If there is a systematic relationship or dependency among the independent variables during data collection, it can lead to multicollinearity. For example, if data is collected from a survey that asks similar or related questions, it can result in high correlation between the variables.
2. Constraints on the model: Multicollinearity can occur when there are constraints imposed
on the model. These constraints can be theoretical or practical in nature. For instance, if a
researcher is required to include specific variables in the model due to theoretical
considerations or external requirements, it can lead to multicollinearity if those variables are
highly correlated with existing variables in the model.
3. Model specification: The way the model is specified can also contribute to
multicollinearity. This includes the choice of independent variables and the functional form
of the model. Including highly correlated variables in the model or using variables that are
derived from the same underlying concept can lead to multicollinearity. Additionally, using
polynomial terms or interaction terms that are highly correlated with the original variables
can also introduce multicollinearity.
Consequences of MC
1. Although the OLS estimators remain BLUE, their variances are inflated.
Intuition? What is the impact of inflated variances?
Intuition:
Multicollinearity occurs when independent variables in a regression model are highly
correlated with each other. As a result, the information contained in the correlated variables
overlaps, making it difficult for the model to distinguish their individual effects on the
dependent variable. This leads to instability in the estimation process.
Because the variances of the estimators are inflated, confidence intervals widen and t-ratios shrink, so coefficients that genuinely matter may appear statistically insignificant. Overall, the inflated variances resulting from multicollinearity undermine the reliability, stability, and interpretability of the regression model. It is essential to address multicollinearity through techniques like variable transformation, feature selection, or regularization methods to obtain more accurate and meaningful results.
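A small simulation can make this concrete. The sketch below (Python with NumPy and statsmodels; the correlation level, coefficients, and sample size are made up) fits the same data-generating process twice, once with nearly uncorrelated regressors and once with highly correlated ones, and compares the standard errors and p-values of the slope estimates.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative simulation (made-up coefficients and correlation levels):
# fit y = 1 + 0.5*X2 + 0.5*X3 + noise with regressors that are either
# nearly uncorrelated (rho = 0) or highly correlated (rho = 0.99).
rng = np.random.default_rng(42)
n = 200

def fit_with_correlation(rho):
    x2 = rng.normal(size=n)
    x3 = rho * x2 + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
    y = 1.0 + 0.5 * x2 + 0.5 * x3 + rng.normal(size=n)
    X = sm.add_constant(np.column_stack([x2, x3]))
    return sm.OLS(y, X).fit()

for rho in (0.0, 0.99):
    res = fit_with_correlation(rho)
    print(f"rho={rho:4.2f}  R^2={res.rsquared:.3f}  "
          f"se(b2)={res.bse[1]:.3f}  se(b3)={res.bse[2]:.3f}  "
          f"p-values={np.round(res.pvalues[1:], 3)}")
```

With rho = 0.99 the overall fit (R²) stays good, but the standard errors of the individual slopes balloon and their p-values can become insignificant, which is exactly the "high R² but few significant t-ratios" symptom discussed below.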
Detection of MC
1. High R2 but few significant t ratios.
2. High pairwise correlation between explanatory variables
3. Examination of partial correlation
4. Auxiliary regressions
5. Eigenvalues and condition index
6. Variance inflation factors (and tolerance)
7. Plots
1. High R2 but few significant t-ratios: Multicollinearity can manifest as a high coefficient of
determination (R2) in the overall model, indicating a good fit. However, when examining the
individual t-ratios for each coefficient, multicollinearity can be suspected if only a few
variables have statistically significant coefficients while others are insignificant or have
unexpected signs.
3. Examination of partial correlation: Partial correlation analysis helps assess the relationship
between two variables while controlling for the effects of other variables. High partial
correlations between two independent variables, after accounting for the effects of other
variables, can indicate multicollinearity.
5. Eigenvalues and condition index: Eigenvalues and the condition index provide a numerical measure of multicollinearity. Very small eigenvalues (relative to the largest), or equivalently a large condition index (the square root of the ratio of the largest to the smallest eigenvalue), indicate the presence of multicollinearity; a condition index above a threshold such as 30 is usually taken as a warning sign. These measures are derived from the correlation matrix or the variance-covariance matrix of the independent variables (a numerical sketch follows below).
6. Variance inflation factors (VIF) and tolerance: VIF quantifies the extent of multicollinearity by measuring how much the variance of an estimated regression coefficient is inflated by correlation with the other independent variables; for regressor j, VIFj = 1/(1 − R²j), where R²j is obtained from regressing Xj on the remaining regressors. High VIF values (typically above 5 or 10) suggest multicollinearity, and low tolerance values (tolerance = 1/VIF) indicate a high degree of collinearity.
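As a rough sketch of items 5 and 6, the condition index and the VIFs can be computed with NumPy and statsmodels as follows. The data, variable names, and the 0.95/0.05 mix are hypothetical and chosen only to produce near-collinearity.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical data: X3 is almost (but not exactly) a multiple of X2.
rng = np.random.default_rng(1)
df = pd.DataFrame({"X2": rng.normal(size=100)})
df["X3"] = 0.95 * df["X2"] + 0.05 * rng.normal(size=100)

X = sm.add_constant(df[["X2", "X3"]]).to_numpy()

# Item 5: eigenvalues of X'X and the condition index sqrt(lambda_max / lambda_min)
eigvals = np.linalg.eigvalsh(X.T @ X)
print("eigenvalues    :", np.round(eigvals, 4))
print("condition index:", np.sqrt(eigvals.max() / eigvals.min()))

# Item 6: VIF_j = 1 / (1 - R^2_j) for each column of X (column 0 is the constant)
for j, name in enumerate(["const", "X2", "X3"]):
    print(f"VIF({name}) = {variance_inflation_factor(X, j):.2f}")
```

A condition index far above 30 and VIFs far above 10 for X2 and X3 would both point to the same near-collinearity.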
Remedial measures
1. Use a-priori information
2. Extending/expanding data
3. Removing variables
4. Transformation of variables
5. Ridge regression, factor analysis regression, etc. (a sketch appears at the end of this section)
1. Use a-priori information: Prior knowledge or theory about the relationships between
variables can help guide the selection and inclusion of variables in the model. By considering
the theoretical relevance and importance of variables, you can prioritize including variables
that have the most meaningful and independent impact on the dependent variable.
2. Extending/expanding data: Increasing the sample size by collecting more data can
sometimes alleviate multicollinearity. A larger dataset can provide more variability and
reduce the chances of high correlation among variables. However, this may not always be
feasible or practical.
3. Removing variables: If variables are highly correlated, one possible approach is to remove
one or more of the correlated variables from the model. This can be done based on their
theoretical significance, statistical significance, or other criteria. By eliminating redundant
variables, you can reduce multicollinearity.
These remedial measures aim to mitigate the negative effects of multicollinearity and
improve the stability, reliability, and interpretability of the regression model. The choice of
remedial measure depends on the specific characteristics of the dataset, the research
objectives, and the underlying assumptions of the regression model.
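To illustrate remedy 5, here is a minimal ridge-regression sketch using scikit-learn on made-up, nearly collinear data. The penalty value alpha = 1.0 is an arbitrary illustrative choice, and ridge trades a small amount of bias for a large reduction in the variance of the estimates; it is not presented here as the preferred remedy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Made-up, nearly collinear data; alpha = 1.0 is an arbitrary illustrative value.
rng = np.random.default_rng(7)
n = 100
x2 = rng.normal(size=n)
x3 = 0.98 * x2 + 0.02 * rng.normal(size=n)          # nearly collinear regressors
X = np.column_stack([x2, x3])
y = 1.0 + 0.5 * x2 + 0.5 * x3 + rng.normal(size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)                   # L2 penalty stabilises the estimates

print("OLS coefficients  :", np.round(ols.coef_, 3))   # can be large and off-setting
print("Ridge coefficients:", np.round(ridge.coef_, 3)) # shrunk toward more stable values
```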