Original Title: Chapt19 multicollinearity

Uploaded by Asif Sultan

Attribution Non-Commercial (BY-NC)

Multicollinearity is a problem which occurs if one of the columns of the X matrix is exactly or nearly a linear combination of the other columns. Exact multicollinearity is rare, but could happen, for example, if we include a dummy (0-1) variable for "Male", another one for "Female", and a column of ones.
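The dummy-variable case can be checked by hand. The sketch below uses a small hypothetical design matrix (four rows, invented for illustration) with a column of ones, a "Male" dummy and a "Female" dummy. Because the two dummies sum to the column of ones, one column is an exact linear combination of the others and the determinant of X'X is zero.

```python
# Hypothetical 4-row design matrix: column of ones, Male dummy, Female dummy.
X = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
]

def gram(X):
    """Return X'X as a list of lists."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)]
            for i in range(k)]

def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

XtX = gram(X)

# ones - Male - Female = 0 in every row, so X c = 0 for c = (1, -1, -1).
c = [1, -1, -1]
Xc = [sum(x * cj for x, cj in zip(row, c)) for row in X]

print(XtX)        # [[4, 2, 2], [2, 2, 0], [2, 0, 2]]
print(det3(XtX))  # 0, so X'X has no inverse
print(Xc)         # [0, 0, 0, 0]
```

Dropping any one of the three columns removes the exact dependence.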

More typically, multicollinearity will be approximate, arising from the fact that our explanatory variables are correlated with each other (i.e. they essentially measure the same thing). For example, if we try to describe consumption in households ($y$) in terms of income ($x_1$) and net worth ($x_2$), then it will be hard to identify the separate effects of $x_1$ and $x_2$ on $y$. The estimated regression coefficients $b_1$ and $b_2$ will be hard to interpret. The variances of $b_1$ and $b_2$ will be very large, so the corresponding $t$-statistics will tend to be insignificant, even though the $F$ for the model as a whole is significant and $R^2$ is high. Further, the coefficient of $x_1$, and the corresponding $t$-statistic, may change dramatically if the seemingly insignificant variable $x_2$ is deleted from the model.
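For the two-variable case, the size of the variance can be made explicit with a standard result for a regression with an intercept and two explanatory variables. The symbols $\sigma^2$ (error variance), $r_{12}$ (sample correlation between the two explanatory variables) and $n$ (number of observations) are notation introduced here for illustration, not taken from the notes.

```latex
\operatorname{Var}(b_1)
  = \frac{\sigma^2}{\left(1 - r_{12}^2\right)\sum_{i=1}^{n}\left(x_{i1} - \bar{x}_1\right)^2},
\qquad
\operatorname{Var}(b_2)
  = \frac{\sigma^2}{\left(1 - r_{12}^2\right)\sum_{i=1}^{n}\left(x_{i2} - \bar{x}_2\right)^2}.
```

As $r_{12}^2$ approaches 1, the factor $1/(1 - r_{12}^2)$ grows without bound. For instance, $r_{12} = 0.9$ multiplies each variance by about 5.3 relative to uncorrelated explanatory variables.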

For a numerical example, consider a data set on the monthly sales of backyard satellite antennas ($y$) in nine randomly selected districts, together with the number of households ($x_1$) in the district and the number of owner-occupied households ($x_2$) in the district. (Both $x_1$ and $x_2$ are measured in units of 10,000 households.) In the multiple regression of $y$ on $x_1$ and $x_2$, neither individual $t$-test is significant, which taken at face value would indicate that neither variable is linearly related to $y$. However, $R^2 = .9279$, and the overall $F$-test is highly significant, indicating that at least one of $x_1$ and $x_2$ is linearly related to $y$.

District           1    2    3    4    5    6    7    8    9
Sales (y)         50   73   32  121  156   98   62   51   80
Households (x1)   14   28   10   30   48   30   20   16   25

The reason why the results of the two $t$-tests are so different from the result of the $F$-test is that collinearity has destroyed the $t$-tests by strongly reducing their power. The Pearson correlation coefficient between $x_1$ and $x_2$ is $r = .985$, so the two variables are highly collinear. A simple regression of $y$ on $x_1$ gives a $t$-statistic for $b_1$ of 9.35 (highly significant), while a simple regression of $y$ on $x_2$ gives a $t$-statistic for $b_2$ of 8.62 (also highly significant). Note also that the $R^2$ values for these two simple regressions are .9259 and .9139, respectively, both of which are almost as high as the multiple $R^2$ for the full model, .9279.
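The simple regression of $y$ on $x_1$ can be reproduced from the tabulated data. The sketch below assumes that the two data rows in the table are the sales ($y$) and the number of households ($x_1$); this is an assumption, but it is consistent with the $R^2$ of .9259 and the $t$-statistic of 9.35 quoted above. The function name `simple_regression` is hypothetical.

```python
import math

# Assumed reading of the table: first data row is y, second is x1.
y = [50, 73, 32, 121, 156, 98, 62, 51, 80]
x1 = [14, 28, 10, 30, 48, 30, 20, 16, 25]

def simple_regression(x, y):
    """Least squares fit of y on x with an intercept.

    Returns the slope, its t-statistic, and R^2.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    sse = syy - slope * sxy           # residual sum of squares
    s2 = sse / (n - 2)                # estimate of the error variance
    t = slope / math.sqrt(s2 / sxx)   # slope divided by its standard error
    r2 = 1 - sse / syy
    return slope, t, r2

slope, t, r2 = simple_regression(x1, y)
print(round(r2, 4))  # 0.9259
print(round(t, 2))   # 9.35
```

On its own, $x_1$ is clearly related to $y$; it only looks insignificant when its near-duplicate $x_2$ is in the model as well.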

To get some mathematical insight into the general problem, we use the spectral decomposition (Jobson, p. 576) to write

$$(X'X)^{-1} = \sum_{i=0}^{p} \lambda_i^{-1} p_i p_i' ,$$

where the $\lambda_i$ are the eigenvalues of $X'X$ and $P = [p_0, \ldots, p_p]$ is an orthogonal matrix of eigenvectors of $X'X$. If there is exact multicollinearity, then for some $(p+1) \times 1$ vector $\gamma \neq 0$, we must have $X\gamma = 0$, so that $\gamma$ is an eigenvector of $X'X$, and the corresponding eigenvalue is zero. Therefore, one of the $\lambda_i$ must be zero. In this case, $X'X$ is not invertible, since $(X'X)^{-1}$ would have to satisfy

$$\gamma = (X'X)^{-1}(X'X)\gamma = (X'X)^{-1}\,0 = 0 ,$$

which contradicts $\gamma \neq 0$.

Our computer will (hopefully) be unable to calculate the least squares estimator $b$, since $b$ is no longer uniquely defined, and $(X'X)^{-1}$ does not exist. Due to roundoff and other numerical errors, however, some packages will be able to carry out their calculations without any obvious catastrophe (e.g. dividing by zero), and will therefore produce output that is completely inappropriate and useless.

If there is approximate multicollinearity, then one or more of the $\lambda_i$ will be very close to zero, so that the entries of

$$(X'X)^{-1} = \sum_{i=0}^{p} \lambda_i^{-1} p_i p_i'$$

will be very large. Since the covariance matrix of $b$ is proportional to $(X'X)^{-1}$, approximate multicollinearity tends to inflate the estimated variance of $b_j$ for one or more (perhaps all) $j$. As a result, the $t$-statistics will tend to be insignificant. The overall $F$ is not adversely affected by multicollinearity, so it may be significant even if none of the individual $b_j$ is. It can also be shown that the prediction variance (incurred in "predicting" either the response surface or a future value of $y$ at a particular value of the explanatory variables) will not be disastrously affected by multicollinearity, as long as the entries of the vector of explanatory-variable values at which we predict obey the same approximate multicollinearities as the columns of $X$.
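The two claims about variances can be written out using the spectral decomposition. In this sketch, $\sigma^2$ (error variance), $p_{ji}$ (the $j$th entry of the eigenvector $p_i$) and $x_0$ (the vector of explanatory-variable values at which we predict, including the leading 1) are notation introduced here, not taken from the notes.

```latex
\operatorname{Var}(b) = \sigma^2 (X'X)^{-1}
  = \sigma^2 \sum_{i=0}^{p} \lambda_i^{-1} p_i p_i' ,
\qquad
\operatorname{Var}(b_j) = \sigma^2 \sum_{i=0}^{p} \frac{p_{ji}^2}{\lambda_i} ,
\qquad
\operatorname{Var}(x_0' b) = \sigma^2 \sum_{i=0}^{p} \frac{(x_0' p_i)^2}{\lambda_i} .
```

A near-zero $\lambda_i$ inflates $\operatorname{Var}(b_j)$ whenever $p_{ji}$ is not negligible. In the prediction variance, the same small $\lambda_i$ is harmless if $x_0' p_i$ is also close to zero, which is what it means for $x_0$ to obey the same approximate linear relation as the rows of $X$.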

Keep in mind, though, that multicollinearity often arises because we are trying to use too many explanatory variables. This tends to inflate the prediction variance. (See the handout on model selection.)

So, although the effect of multicollinearity on the predictions may not be disastrous, we will still typically be able to improve the quality of the predictions by using fewer variables.

In my opinion, the best remedy for multicollinearity is to use fewer variables. This can be achieved by a combination of thinking about the problem, transformation and combination of variables, and model selection.

Two methods of diagnosing multicollinearity in a given data set are:

(1) Look at the Pearson correlation coefficients of all pairs of explanatory variables.
(2) Look at the ratio $\lambda_{\max}/\lambda_{\min}$ of the largest to the smallest eigenvalue of $(X'X)^{-1}$.
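Both diagnostics can be sketched in a few lines. For simplicity, the example applies the eigenvalue ratio to the 2x2 correlation matrix of two explanatory variables rather than to $(X'X)^{-1}$ itself. That is a simplifying assumption which amounts to standardizing the variables and leaving out the intercept. The ratio is the same for a matrix and its inverse. The vectors `u` and `v` are hypothetical data, and the helper names are invented for illustration.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def eig_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius

# Diagnostic (1): pairwise correlation, shown on hypothetical data.
u = [1, 2, 3, 4, 5]
v = [2.1, 3.9, 6.2, 7.8, 10.1]
r_uv = pearson(u, v)

# Diagnostic (2): eigenvalue ratio for r = .985, the correlation between
# x1 and x2 reported in the satellite-antenna example.
r = 0.985
lam_min, lam_max = eig_sym_2x2(1.0, r, 1.0)  # eigenvalues are 1 - r and 1 + r
ratio = lam_max / lam_min

print(r_uv)
print(lam_min, lam_max, ratio)  # ratio is about 132
```

With only two explanatory variables the two diagnostics carry the same information, since the ratio equals $(1 + |r|)/(1 - |r|)$. With three or more variables, the eigenvalue ratio can reveal a near-dependence that no single pairwise correlation shows.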

For those who insist on working with a multicollinear data set, there are biased estimation techniques (e.g. ridge regression) which may have a lower mean squared error than least squares.
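To see why ridge regression helps, its estimator can be written in the same spectral form as before. The ridge constant $k > 0$ and the notation $b(k)$ are introduced here for illustration. In practice the explanatory variables are usually standardized first and the intercept is not shrunk, details that this sketch ignores.

```latex
b(k) = (X'X + kI)^{-1} X'y
     = \sum_{i=0}^{p} \frac{1}{\lambda_i + k}\, p_i p_i' X'y .
```

Each factor $1/\lambda_i$ in the least squares solution is replaced by $1/(\lambda_i + k)$, which can never exceed $1/k$ even when $\lambda_i$ is nearly zero. The price is bias, since $b(k)$ no longer has expectation $\beta$ for $k > 0$.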
