A New Approach of Principal Component Regression Estimator with

Applications to Collinear Data
Article · July 2020

DOI: 10.37624/IJERT/13.7.2020.1616-1622

International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 13, Number 7 (2020), pp. 1616-1622
© International Research Publication House. https://dx.doi.org/10.37624/IJERT/13.7.2020.1616-1622
A New Approach of Principal Component Regression Estimator with Applications to Collinear Data

Kayode Ayinde¹, Adewale F. Lukman², Olusegun O. Alabi¹ and Hamidu A. Bello¹
¹ Department of Statistics, Federal University of Technology, Akure, Nigeria.
² Department of Physical Sciences, Landmark University, Omu-Aran, Nigeria.
Corresponding Author Email: adewale.folaranmi@lmu.edu.ng
ORCID: 0000-0002-2993-6203 (Hamidu A. Bello), ORCID: 0000-0003-2881-1297 (Lukman)
Abstract

Principal Component Analysis (PCA) is a method of variable reduction that has found relevance in combating the multicollinearity problem in Multiple Linear Regression Models. In this paper, a new approach to estimating the model parameters in Principal Components (PCs) is developed. The method requires using the PCs as regressors to predict the dependent variable and then using the predicted variable as the dependent variable in an Ordinary Least Square (OLS) regression on the original explanatory variables. The resulting parameter estimates are the same as the estimates from the usual Principal Components Regression Estimator. The sampling properties of the new estimator are proved to be the same as the existing ones. The new approach is simpler and easier to use. Real-life data sets are used to illustrate both the conventional and the developed method.

Keywords: Multiple Linear Regression Models, Multicollinearity, Principal Components, Ordinary Least Square

1. INTRODUCTION

Violation of the independence of explanatory variables in multiple linear regression models results in the multicollinearity problem (Gujarati, 2005). The literature reports several consequences of using the Ordinary Least Square (OLS) estimator in the presence of multicollinearity: it produces unbiased but inefficient estimates of the model parameters, the estimated regression coefficients can have wrong signs and high standard deviations, and the parameter estimates and their standard errors become extremely sensitive to slight changes in the data points (Chatterjee and Hadi, 2006; Lukman et al., 2018; Ayinde et al., 2018). These consequences tend to inflate the estimated variance of the predicted values (Fomby et al., 1984; Maddala, 1988; Ayinde et al., 2015).

Several other techniques are in use to combat this problem, and Principal Components Analysis (PCA) is one. It is a multivariate technique developed by Hotelling (1933) for explaining a set of correlated variables by a reduced number of uncorrelated ones with maximum variances called Principal Components (PCs) (Naes and Marten, 1988; Chatterjee and Hadi, 2006). Other methods range from this method to more specialized techniques for regularization (Naes and Marten, 1988). The PCR method has been proposed as an alternative to the OLS estimator when the assumption of independent explanatory variables is not satisfied (Massy, 1965; Lee et al., 2015).

2. LITERATURE REVIEW

Consider the standard linear regression model defined as:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + u_i, \quad i = 1, 2, \dots, n \tag{2.1}$$

and defined in matrix form as

$$y = X\beta + u \tag{2.2}$$

where $y$ is the $(n \times 1)$ vector of observations of the dependent variable; $X$ is the $(n \times p)$ matrix of observations of the explanatory variables; $\beta$ is the $(p \times 1)$ vector of regression coefficients; and $u$ is the $(n \times 1)$ vector of error terms, assumed to follow a multivariate normal distribution with mean zero and variance $\sigma^2 I_n$. Thus, the parameters of the model can be estimated using the Ordinary Least Square (OLS) estimator obtained as:

$$\hat{\beta}_{OLS} = (X'X)^{-1} X'y \tag{2.3}$$

The unbiased estimate of the variance-covariance matrix of the OLS estimator is given as:

$$Var(\hat{\beta}_{OLS}) = \hat{\sigma}^2 (X'X)^{-1} \tag{2.4}$$

where $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p}$ (Maddala, 1988; Kibria, 2003).

Lukman et al. (2020) reported the OLS estimator as the Best Linear Unbiased Estimator (BLUE) provided that multicollinearity among the explanatory variables is not perfect. However, the estimator is still not efficient when there is multicollinearity (Chatterjee and Price, 1977; Gujarati,
2005; Lukman et al., 2019; Lukman et al., 2020). Among the estimators commonly used to address this problem are the estimator based on Principal Component Regression developed by Massy (1965), the Ridge Regression Estimator (RRE) (Hoerl and Kennard, 1970), the estimator based on Partial Least Square (PLS) Regression (Wold, 1985), the Liu estimator (Liu, 1993) and, recently, the KL estimator by Kibria and Lukman (2020).

The estimator based on Principal Component Regression requires reducing the explanatory variables into a smaller number of summary variables, often called Principal Components (or factors), that explain most of the variation in the data. Chatterjee and Price (1977) summarized the procedure, based on the correlation matrix, as follows:

i. Standardize both the response and the explanatory variables. The standardized variables respectively become:

$$y_i^{1} = \frac{y_i - \bar{y}}{s_y} \tag{2.5}$$

where $\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n}$ and $s_y = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$, and

$$x_{ij}^{1} = \frac{x_{ij} - \bar{x}_j}{s_{x_j}} \tag{2.6}$$

where $\bar{x}_j = \dfrac{\sum_{i=1}^{n} x_{ij}}{n}$ and $s_{x_j} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}{n - 1}}$ for $j = 1, 2, 3, \dots, p$.

ii. Obtain the correlation matrix of the explanatory variables, say $R$. Thus,

$$R = \begin{bmatrix} 1 & r_{12} & r_{13} & r_{14} & \cdots & r_{1p} \\ r_{12} & 1 & r_{23} & r_{24} & \cdots & r_{2p} \\ r_{13} & r_{23} & 1 & r_{34} & \cdots & r_{3p} \\ r_{14} & r_{24} & r_{34} & 1 & \cdots & r_{4p} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{1p} & r_{2p} & r_{3p} & r_{4p} & \cdots & 1 \end{bmatrix} \tag{2.7}$$

where

$$r_{jk} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sqrt{\left(\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2\right)\left(\sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2\right)}}, \quad j = 1, 2, 3, \dots, p; \; k = 1, 2, 3, \dots, p$$

iii. Calculate the eigenvalues and hence the eigenvectors of the correlation matrix, and collect the eigenvectors in a matrix, say $W$. These become the coefficients (weights) that are attached to the standardized explanatory variables. Thus,

$$W = \begin{bmatrix} w_{11} & w_{12} & w_{13} & w_{14} & \cdots & w_{1p} \\ w_{21} & w_{22} & w_{23} & w_{24} & \cdots & w_{2p} \\ w_{31} & w_{32} & w_{33} & w_{34} & \cdots & w_{3p} \\ w_{41} & w_{42} & w_{43} & w_{44} & \cdots & w_{4p} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{p1} & w_{p2} & w_{p3} & w_{p4} & \cdots & w_{pp} \end{bmatrix} \tag{2.8}$$

iv. Obtain the Principal Components (PCs) using the linear equation given as:

$$PC_{ij} = \sum_{k=1}^{p} w_{kj} x_{ik}^{1}, \quad i = 1, 2, \dots, n; \; j = 1, 2, \dots, p \tag{2.9}$$

These PCs are $p$ orthogonal variables.

v. Determine the number of PCs to use. Since there are $p$ of them, and using all of them produces the same results as OLS, only a subset has to be chosen. Kaiser (1960) suggested the use of PCs whose eigenvalue is more than one. Constantin (2014) suggested using any component that accounts for at least 5% or 10% of the total variance. Some other authors emphasized that enough components should be selected so that the cumulative per cent of variance accounted for is high, between 80% and 90% (Naes and Marten, 1988; Ali and Robert, 1998; Abdul-Wahab et al., 2005).

vi. Run the OLS regression (without intercept) of the standardized dependent variable on the selected $m$ PCs. Thus,

$$y_i^{1} = f(PC_1, PC_2, \dots, PC_m) \tag{2.10}$$

and obtain the estimates of the regression coefficients as $\gamma_1, \gamma_2, \dots, \gamma_m$.
vii. Obtain the true regression coefficients of the linear model (2.1 or 2.2) based on the Principal Component Regression method using:

$$\beta_j^{PCR} = \frac{s_y}{s_{x_j}} \left[ \sum_{t=1}^{m} w_{jt} \gamma_t \right], \quad j = 1, 2, \dots, p \tag{2.11}$$

Also, the intercept can be obtained using:

$$\beta_0^{PCR} = \bar{y} - \sum_{j=1}^{p} \beta_j^{PCR} \bar{x}_j \tag{2.12}$$

Alternatively, reconsider model (2.2). Then $X'X$ is a $(p \times p)$ matrix whose eigenvectors form the $(p \times p)$ orthogonal matrix $T = (t_1, t_2, \dots, t_p) = [T_r, T_{p-r}]$. Utilizing this and following the studies of Ozkale (2009), Batah et al. (2009), and Chang and Yang (2012), (2.2) can be written as:

$$y = XTT'\beta + u = XT_r T_r'\beta + XT_{p-r} T_{p-r}'\beta + u = Z_r \alpha_r + Z_{p-r} \alpha_{p-r} + u \tag{2.13}$$

where $Z_r = XT_r$, $\alpha_r = T_r'\beta$, $Z_{p-r} = XT_{p-r}$, $\alpha_{p-r} = T_{p-r}'\beta$.

The $n \times p$ matrix of the principal components $Z = (z_1, z_2, \dots, z_p) = [Z_r, Z_{p-r}]$ is obtained using $z_i = Xt_i$, $i = 1, 2, \dots, p$. However, $Z_{p-r}$ contains the PCs corresponding to near-zero eigenvalues, which are thus discarded for the Principal Component Regression. The estimator of the parameters used in the Principal Component Regression and its true estimator in the light of model (2.2) are respectively given as:

$$\hat{\alpha}_{PCR} = (Z_r'Z_r)^{-1} Z_r'y = (T_r'X'XT_r)^{-1} T_r'X'y \tag{2.14}$$

$$\hat{\beta}_{PCR} = T_r \hat{\alpha}_{PCR} = T_r (T_r'X'XT_r)^{-1} T_r'X'y \tag{2.15}$$

The mean and variance-covariance matrix of $\hat{\beta}_{PCR}$ are respectively obtained as:

$$E[\hat{\beta}_{PCR}] = E[T_r \hat{\alpha}_{PCR}] = T_r E[\hat{\alpha}_{PCR}] = T_r T_r'\beta \tag{2.16}$$

$$D(\hat{\beta}_{PCR}) = D[T_r \hat{\alpha}_{PCR}] = \sigma_r^2 T_r (T_r'X'XT_r)^{-1} T_r' \tag{2.17}$$

3. A NEW APPROACH

Theorem 1: Consider the Linear Regression Model $y = X\beta + u$, where $y$ is the $(n \times 1)$ vector of observations of the dependent variable; $X$ is the $(n \times p)$ matrix of observations of the explanatory variables; $\beta$ is the $(p \times 1)$ vector of regression coefficients; and $u$ is the $(n \times 1)$ vector of error terms, assumed to follow a multivariate normal distribution with mean zero and variance $\sigma^2 I_n$. The Principal Component Regression Estimator that uses $r$ Principal Components, $r \le p$, can be obtained as $\hat{\beta}_{PCR} = (X'X)^{-1} X'\hat{y}_r$, where $\hat{y}_r$ is the predicted variable when $y$ is regressed on the $r$ principal components.

Proof:

Recall that the estimator of the parameters of the Principal Component Regression that uses $r$ Principal Components is given in (2.14) as:

$$\hat{\alpha}_{PCR} = (Z_r'Z_r)^{-1} Z_r'y = (T_r'X'XT_r)^{-1} T_r'X'y \tag{3.1}$$

Then, the predicted value of the dependent variable becomes:

$$\hat{y}_r = Z_r \hat{\alpha}_{PCR} = XT_r (T_r'X'XT_r)^{-1} T_r'X'y \tag{3.2}$$

Multiplying both sides by $X'$, (3.2) becomes:

$$X'\hat{y}_r = X'XT_r (T_r'X'XT_r)^{-1} T_r'X'y \tag{3.3}$$

But from (2.15), $\hat{\beta}_{PCR} = T_r (T_r'X'XT_r)^{-1} T_r'X'y$. Therefore,

$$X'\hat{y}_r = X'X\hat{\beta}_{PCR} \tag{3.4}$$

Hence, it follows that

$$\hat{\beta}_{PCR} = (X'X)^{-1} X'\hat{y}_r \tag{3.5}$$

This completes the proof.

Theorem 2: Following from Theorem 1, the mean, variance, bias and mean square error matrix of the true estimator of the Principal Component Regression parameters are respectively:

(i) $E(\hat{\beta}_{PCR}) = E[(X'X)^{-1} X'\hat{y}_r] = T_r T_r'\beta$

(ii) $D(\hat{\beta}_{PCR}) = D[(X'X)^{-1} X'\hat{y}_r] = \sigma_r^2 T_r (T_r'X'XT_r)^{-1} T_r'$

(iii) $Bias(\hat{\beta}_{PCR}) = (T_r T_r' - I)\beta$

(iv) $MSEM(\hat{\beta}_{PCR}) = \sigma_r^2 T_r (T_r'X'XT_r)^{-1} T_r' + (T_r T_r' - I)\beta\beta'(T_r T_r' - I)'$
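Proof:

(i) Using $\hat{y}_r = Z_r \hat{\alpha}_{PCR}$ and $Z_r = XT_r$,

$$\begin{aligned} E(\hat{\beta}_{PCR}) &= E[(X'X)^{-1} X'\hat{y}_r] = (X'X)^{-1} X'E(\hat{y}_r) \\ &= (X'X)^{-1} X'E(Z_r \hat{\alpha}_{PCR}) \\ &= (X'X)^{-1} X'Z_r E(\hat{\alpha}_{PCR}) \\ &= (X'X)^{-1} X'XT_r E(\hat{\alpha}_{PCR}) \\ &= T_r E(\hat{\alpha}_{PCR}) = T_r T_r'\beta \end{aligned} \tag{3.6}$$

(ii)

$$\begin{aligned} D(\hat{\beta}_{PCR}) &= D[(X'X)^{-1} X'\hat{y}_r] \\ &= (X'X)^{-1} X'D(\hat{y}_r)X(X'X)^{-1} \\ &= (X'X)^{-1} X'D(Z_r \hat{\alpha}_{PCR})X(X'X)^{-1} \\ &= (X'X)^{-1} X'Z_r D(\hat{\alpha}_{PCR})Z_r'X(X'X)^{-1} \\ &= (X'X)^{-1} X'XT_r \, \sigma_r^2 (T_r'X'XT_r)^{-1} \, T_r'X'X(X'X)^{-1} \\ &= \sigma_r^2 T_r (T_r'X'XT_r)^{-1} T_r' \end{aligned} \tag{3.7}$$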
Consequently, the bias and the mean square error matrix (MSEM) follow from (3.6) and (3.7) as:

$$Bias(\hat{\beta}_{PCR}) = (T_r T_r' - I)\beta \tag{3.8}$$

$$MSEM(\hat{\beta}_{PCR}) = \sigma_r^2 T_r (T_r'X'XT_r)^{-1} T_r' + (T_r T_r' - I)\beta\beta'(T_r T_r' - I)' \tag{3.9}$$

Thus, this new approach is simpler and easier to apply. The PCs are used as regressors to predict the dependent variable, and the fitted values obtained are then regressed on the original explanatory variables using the Ordinary Least Square (OLS) estimator. The results obtained are theoretically the same as those of the conventional approach, including the sampling properties of the estimators.

4. APPLICATIONS TO REAL LIFE DATA

4.1 Application to Longley Data

Longley (1967) initially adopted this dataset in his study. It comprises six independent variables and one dependent variable. Other authors that have applied the data include Walker and Birch (1988), Ayinde et al. (2015), Yasin and Murat (2016) and, recently, Lukman and Ayinde (2018). Walker and Birch (1988) provided the scaled condition number of the data set as 43,275, which shows that the model suffers from severe multicollinearity. This finding agrees with the result in Table 4.1, where most of the variance inflation factors are greater than 10. Table 4.1 further shows that the results of the conventional principal component estimates (r = 2) and the proposed ones are either the same or very close, and they return the same standard errors. They are also more efficient than the OLS estimator.
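The variance inflation factors (VIF) used above as a multicollinearity diagnostic can be computed as the diagonal of the inverse correlation matrix of the regressors. The following is a minimal Python sketch, assuming NumPy and synthetic data rather than the Longley data; the function name `vif`, the simulated regressors and the cut-off variable `flagged` are illustrative choices, not part of the paper.

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (no constant column)."""
    R = np.corrcoef(X, rowvar=False)   # correlation matrix of the regressors
    return np.diag(np.linalg.inv(R))   # VIF_j is the j-th diagonal entry of R^{-1}

# Synthetic regressors (an assumption for illustration): x2 is nearly a copy of x1,
# while x3 is generated independently of both.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
flagged = vifs > 10   # the rule of thumb referred to in the text
print("VIF:", np.round(vifs, 2))
print("VIF > 10:", flagged)
```

In this sketch the two nearly collinear columns are flagged while the independent one is not, which mirrors how the VIF column of a regression table is usually read.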
Table 4.1: The regression output with Longley Data (standard errors in parentheses)

| Parameter | OLS estimate | OLS P-value | VIF | PCR estimate, conventional (PC12) | PCR estimate, new approach |
|---|---|---|---|---|---|
| β1 | -3482258.63 (890420.38) | 0.004 | | 0.000014515 (2.24540D-06) | 6.95234 (2.24540D-06) |
| β2 | 15.06 (84.91) | 0.863 | 135.53 | 0.00079584 (0.00011456) | 0.00021828 (0.00011456) |
| β3 | -0.0358 (0.0335) | 0.313 | 1788.51 | -0.0026323 (0.0027006) | -0.0026321 (0.0027006) |
| β4 | -0.0511 (0.2261) | 0.826 | 399.15 | 0.57053 (0.0089953) | 0.57053 (0.0089953) |
| β5 | -2.0202 (0.4884) | 0.003 | 33.62 | -0.65725 (0.18121) | -0.65724 (0.18121) |
| β6 | -1.0332 (0.2142) | 0.001 | 3.5889 | 0.53145 (0.14274) | 0.53146 (0.14274) |
| β7 | 1829.15 (455.48) | 0.003 | 758.98 | 0.027786 (0.004261) | 0.024236 (0.0042611) |
| MSE | 7.92846D+11 | | | 0.053316 | 0.053316 |

NOTE: The Eigenvalues of X'X: 2.76779D+12, 7.03914D+09, 1.16090D+07, 2504761.02102, 1738.35637, 13.30921, 1.17218D-07
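The equivalence stated in Theorem 1, which underlies the two PCR columns of Table 4.1, can be checked numerically. The sketch below is a hedged illustration in Python with NumPy on synthetic collinear data (not the Longley data); the variable names, the simulated design and the choice r = 2 are assumptions made only for this example. It computes the conventional estimator from (2.14) and (2.15), then the new approach by regressing the fitted values on X with OLS.

```python
import numpy as np

rng = np.random.default_rng(42)
n, r = 60, 2

# Synthetic collinear design (an assumption for illustration): two pairs of
# nearly identical columns, so X'X has two large and two near-zero eigenvalues.
z = rng.normal(size=(n, 2))
X = np.column_stack([
    z[:, 0],
    z[:, 0] + 0.01 * rng.normal(size=n),
    z[:, 1],
    z[:, 1] + 0.01 * rng.normal(size=n),
])
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.normal(scale=0.5, size=n)

# Eigenvectors of X'X; T_r holds those with the r largest eigenvalues.
eigvals, T = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
T_r = T[:, order[:r]]

# Conventional PCR: alpha_hat from (2.14), beta_hat = T_r alpha_hat from (2.15).
Z_r = X @ T_r
alpha_hat = np.linalg.solve(Z_r.T @ Z_r, Z_r.T @ y)
beta_conventional = T_r @ alpha_hat

# New approach (Theorem 1): OLS of the fitted values y_hat_r on the original X.
# lstsq is used instead of forming (X'X)^{-1} explicitly, for numerical stability.
y_hat_r = Z_r @ alpha_hat
beta_new = np.linalg.lstsq(X, y_hat_r, rcond=None)[0]

max_abs_diff = np.max(np.abs(beta_conventional - beta_new))
print("conventional:", beta_conventional)
print("new approach:", beta_new)
print("max |difference|:", max_abs_diff)
```

The two coefficient vectors agree up to floating-point error in this sketch. On extremely ill-conditioned designs the second-step OLS can be numerically delicate, which is why a least-squares solver is preferred here over an explicit inverse.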
4.2 Application to Gruber Data

To further illustrate the performance of the new approach and the conventional method, we use the data discussed in Gruber (1998). This data set is on Total National Research and Development Expenditures, as a per cent of the gross national product, by country from 1972 to 1986, comprising four (4) independent variables. It has since been widely analyzed in the literature by many authors, including Akdeniz and Erol (2003), Li and Yang (2010), and Chang and Yang (2012). Table 4.2 gives a summary of the results.
Table 4.2: The regression output with Gruber Data (standard errors in parentheses)

| Parameter | OLS estimate | OLS P-value | VIF | PCR estimate, conventional (PC123) | PCR estimate, new approach |
|---|---|---|---|---|---|
| β1 | 0.692076 (0.864522) | 0.46 | | -0.018174 (0.028513) | -0.018173 (0.028513) |
| β2 | 0.625848 (0.176534) | 0.016 | 6.91 | 0.53588 (0.14183) | 0.53588 (0.14183) |
| β3 | -0.115382 (0.312401) | 0.727 | 21.58 | -0.034363 (0.1181) | -0.034362 (0.1181) |
| β4 | 0.286602 (0.226894) | 0.262 | 29.76 | 0.2838 (0.031305) | 0.2838 (0.031305) |
| β5 | 0.02564 (0.170113) | 0.886 | 1.98 | 0.21042 (0.033692) | 0.21042 (0.033692) |
| MSE | 0.95658 | | | 0.036989 | 0.036989 |

NOTE: The Eigenvalues of X'X: 312.9320, 0.7536, 0.0453, 0.0372, 0.0019.
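The eigenvalues listed in the NOTE can be summarized to show how unevenly they are spread. The sketch below, in Python with NumPy, uses only the five values printed above; the variable names are illustrative. Because these are eigenvalues of X'X rather than of the correlation matrix, the shares are only indicative and are not the criterion used in the paper, which takes r = 3 following Chang and Yang (2012).

```python
import numpy as np

# Eigenvalues of X'X as printed in the NOTE under Table 4.2.
eigvals = np.array([312.9320, 0.7536, 0.0453, 0.0372, 0.0019])

# Share of the eigenvalue sum carried by each component, and the running total.
share = eigvals / eigvals.sum()
cumulative_share = np.cumsum(share)

# Ratio of the largest to the smallest eigenvalue, a rough indicator of
# how ill-conditioned X'X is.
eigenvalue_ratio = eigvals.max() / eigvals.min()

for k, (s, c) in enumerate(zip(share, cumulative_share), start=1):
    print(f"component {k}: share = {s:.6f}, cumulative = {c:.6f}")
print(f"largest / smallest eigenvalue = {eigenvalue_ratio:.1f}")
```

The very large ratio between the first and last eigenvalues is consistent with the multicollinearity indicated by the VIF column of the table.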
The parameters of the model in this example are estimated with r chosen to be three, as done by Chang and Yang (2012), and the results obtained are shown in Table 4.2. The results of the conventional estimator as reported by Chang and Yang (2012) are the same as those of the new approach. We also observed that the estimates obtained with the Principal Component Estimator have smaller standard errors and MSE than those of the OLS estimator, showing that the former is more efficient. We can see that the new approach is applicable in practice.

CONCLUSION

The paper has established another way of using Principal Component Regression to address the multicollinearity problem in the linear regression model. The method gives the same results as the conventional method in terms of the sampling properties of the estimators and has the advantage of computational ease. The Principal Component Regression Estimator is more efficient than the OLS estimator. Two numerical examples support these findings.
REFERENCES

[1] Abdul-Wahab, S. A., Bakheit, C. S. and Al-Alawi, S. M. (2005). Principal component and multiple regression analysis in modelling of ground-level ozone and factors affecting its concentrations. Environmental Modelling and Software, 20, 1263–1271.
[2] Akdeniz, F. and Erol, H. (2003). Mean squared error matrix comparisons of some biased estimators in linear regression. Communication in Statistics Theory and Methods, 32:2389–2413.
[3] Ali, S. and Robert, F. L. (1998). Some Cautionary Notes on the Use of Principal Components Regression. The American Statistician, 52(1), 15–19.
[4] Ayinde, K., Lukman, A. F. and Arowolo, O. T. (2015). Robust regression diagnostics of influential observations in linear regression model. Open Journal of Statistics, 5, 1–11.
[5] Ayinde, K., Lukman, A. F., Samuel, O. O. and Attah, O. M. (2018). Some New Adjusted Ridge Estimators of Linear Regression Model. International Journal of Civil Engineering and Technology, 9(11), 2838–2852.
[6] Batah, F. M., Ozkale, M. R. and Gore, S. D. (2009). Combining unbiased ridge and principal component regression estimators. Communication in Statistics Theory and Methods, 38:2201–2209.
[7] Chang, X. and Yang, H. (2012). Combining two-parameter and principal component regression estimators. Stat. Papers, 53:549–562.
[8] Chatterjee, S. and Hadi, A. S. (2006). Regression Analysis by Example (Fourth Edition).
[9] Fomby, T. B., Hill, R. C. and Johnson, S. R. (1984). Advanced Econometric Methods. NY, Springer-Verlag.
[10] Gruber, M. H. J. (1998). Improving efficiency by shrinkage: the James Stein and ridge regression estimators. Marcel Dekker, New York.
[11] Gujarati, D. N. (2005). Basic Econometrics, McGraw-Hill, New York.
[12] Lee, H., Park, Y. M. and Lee, S. (2015). Principal Component Regression by Principal Component Selection. Communications for Statistical Applications and Methods, 22(2), 173–180. DOI: http://dx.doi.org/10.5351/CSAM.2015.22.2.173.
[13] Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67.
[14] Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441, and 498–520.
[15] Kibria, B. M. (2003). Performance of some new ridge regression estimators. Communications in Statistics-Simulation and Computation, 32(2), 419–435.
[16] Maddala, G. S. (1988). Introduction to Econometrics, Macmillan, New York.
[17] Massy, W. F. (1965). Principal components regression in exploratory statistical research. Journal of the American Statistical Association, 60:234–266.
[18] Naes, T. and Marten, H. (1988). Principal Component Regression in NIR analysis: View points, Background Details Selection of components. Journal of Chemometrics, 2, 155–167.
[19] Constantin, C. (2014). Principal Component Analysis - A Powerful Tool in Computing Marketing Information. Bulletin of the Transilvania University of Brasov, 7(56), 2.
[20] Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141–151.
[21] Liu, K. (1993). A new class of biased estimate in linear regression. Communication in Statistics Theory and Methods, 22:393–402.
[22] Li, Y. and Yang, H. (2010). A new stochastic mixed ridge estimator in linear regression model. Stat. Pap., 51:315–323.
[23] Longley, J. W. (1967). An appraisal of least squares programs for electronic computer from the point of view of the user. Journal of American Statistical Association, 62:819–841.
[24] Lukman, A. F., Ayinde, K., Aladeitan, B. B. and Rasak, B. (2020). An Unbiased Estimator with Prior Information. Arab Journal of Basic and Applied Sciences, 27:1, 45–55.
[25] Lukman, A. F., Ayinde, K., Binuomote, S. and Onate, A. C. (2019). Modified ridge-type estimator to combat multicollinearity: Application to chemical data. Journal of Chemometrics, e3125. https://doi.org/10.1002/cem.3125
[26] Lukman, A. F. and Ayinde, K. (2018). Detecting Influential Observations in Two-Parameter Liu-Ridge Estimator. Journal of Data Science, 16(2), 207–218.
[27] Lukman, A. F., Ayinde, K., Okunola, A. O., Akanbi, O. B. and Onate, C. A. (2018). Classification-Based Ridge Estimation Techniques of Alkhamisi Methods. Journal of Probability and Statistical Sciences, 16(2), 165–181.
[28] Ozkale, M. R. (2009). Principal components regression estimator and a test for the restrictions. Statistics, 43:541–551.
[29] Walker, E. and Birch, J. B. (1988). Influence Measures in Ridge Regression. Technometrics, 30(2), 221–227.
[30] Wold, Herman (1985). "Partial least squares". In Kotz, Samuel; Johnson, Norman L. (eds.). Encyclopedia of statistical sciences. 6. New York: Wiley. 581–591.
[31] Yasin, A. and Murat, E. (2016). Influence Diagnostics in Two-Parameter Ridge Regression. Journal of Data Science, 14, 33–52.
[32] Kibria, G. B. M. and Lukman, A. F. (2020). A New Ridge-Type Estimator for the Linear Regression Model: Simulations and Applications. Scientifica. https://doi.org/10.1155/2020/9758378.