All content following this page was uploaded by Adewale Lukman on 27 November 2020.
Kayode Ayinde¹, Adewale F. Lukman², Olusegun O. Alabi¹ and Hamidu A. Bello¹
¹ Department of Statistics, Federal University of Technology, Akure, Nigeria.
² Department of Physical Sciences, Landmark University, Omu-Aran, Nigeria.
Corresponding Author Email: adewale.folaranmi@lmu.edu.ng
ORCID: 0000-0002-2993-6203 (Hamidu A. Bello), ORCID: 0000-0003-2881-1297 (Lukman)
regression coefficients having wrong signs and high standard deviation, and the parameter estimates and their standard errors become extremely sensitive to slight changes in the data points (Chatterjee and Hadi, 2006; Lukman et al., 2018; Ayinde et al., 2018). This consequence tends to inflate the

$$\operatorname{Var}(\hat{\beta}_{OLS}) = \hat{\sigma}^{2} (X'X)^{-1} \tag{2.4}$$

$$\sum^{n} (y_i - \hat{y}_i)^{2}$$
International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 13, Number 7 (2020), pp. 1616-1622
© International Research Publication House. https://dx.doi.org/10.37624/IJERT/13.7.2020.1616-1622
where $\bar{x}_j = \dfrac{\sum_{i=1}^{n} x_{ij}}{n}$ and $s_{x_j} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^{2}}{n-1}}$ for $j = 1, 2, 3, \ldots, p$.

ii. Obtain the correlation matrix of the explanatory variables, say R. Thus,

$$R = \begin{bmatrix}
1 & r_{12} & r_{13} & r_{14} & \cdots & r_{1p} \\
r_{12} & 1 & r_{23} & r_{24} & \cdots & r_{2p} \\
r_{13} & r_{23} & 1 & r_{34} & \cdots & r_{3p} \\
r_{14} & r_{24} & r_{34} & 1 & \cdots & r_{4p} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
r_{1p} & r_{2p} & r_{3p} & r_{4p} & \cdots & 1
\end{bmatrix} \tag{2.7}$$

These PCs are p orthogonal variables.

v. Determine the number of PCs to use: Since there are p PCs, and using all of them produces the same results as OLS, only a subset of them should be retained. Kaiser (1960) suggested the use of PCs whose eigenvalue is greater than one. Constantin (2014) suggested using any component that accounts for at least 5% or 10% of the total variance. Some other authors emphasized that enough components should be selected so that the cumulative per cent of variance accounted for is high, between 80% and 90% (Naes and Marten, 1988; Ali and Robert, 1998; Abdul–Wahab et al., 2005).

vi. Run the OLS regression (without intercept) of the standardized dependent variable on the selected m PCs. Thus,

$$y_{i1} = f(PC_1, PC_2, \ldots, PC_m) \tag{2.10}$$
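The steps listed so far can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the authors' code: the data, the variable names (`Xs`, `ys`, `W`, `PCs`, `m_kaiser`, `m_cum80`) and the eigen-decomposition of R used to form the PCs are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, deliberately collinear data (illustrative only, not from the paper).
n, p = 50, 4
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + X[:, 1] + 0.05 * rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# Step i: standardize with the sample mean and the (n - 1) standard deviation.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
ys = (y - y.mean()) / y.std(ddof=1)

# Step ii: correlation matrix R of the explanatory variables.
R = np.corrcoef(X, rowvar=False)

# Assumed intermediate steps: eigen-decompose R and form the PCs.
eigvals, W = np.linalg.eigh(R)            # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # reorder to descending
eigvals, W = eigvals[order], W[:, order]
PCs = Xs @ W                              # p mutually orthogonal columns

# Step v: two of the selection rules mentioned in the text.
m_kaiser = int(np.sum(eigvals > 1.0))                 # eigenvalue greater than one
cum_share = np.cumsum(eigvals) / eigvals.sum()
m_cum80 = int(np.searchsorted(cum_share, 0.80) + 1)   # cumulative share >= 80%

# Step vi: OLS without intercept of standardized y on the first m PCs.
m = max(m_kaiser, 1)
gamma, *_ = np.linalg.lstsq(PCs[:, :m], ys, rcond=None)

print("eigenvalues:", np.round(eigvals, 4))
print("m (Kaiser):", m_kaiser, " m (80% cumulative):", m_cum80)
```

The two rules need not agree on the number of components; which one to use is a modelling choice.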
vii. Obtain the true regression coefficients of the linear model (2.1 or 2.2) based on the Principal Component Regression method using:

$$\beta_{PCR_j} = \frac{s_y}{s_{x_j}} \left[ \sum_{t=1}^{m} w_{jt} \gamma_t \right], \quad j = 1, 2, \ldots, p \tag{2.11}$$

3. A NEW APPROACH

Theorem 1: The Principal Component Regression Estimator of a Linear Regression Model $y = X\beta + u$, where y is an (n × 1) vector of observations of the dependent variable; X is an (n × p) matrix of observations of explanatory

$$\hat{\beta}_{PCR} = (X'X)^{-1} X' \hat{y}_r$$
(ii) $D(\hat{\beta}_{PCR}) = D\left[(X'X)^{-1} X' \hat{y}_r\right] = \sigma_r^{2}\, T_r (T_r' X'X T_r)^{-1} T_r'$

(iii) $\operatorname{Bias}(\hat{\beta}_{PCR}) = (T_r T_r' - I)\beta$

(iv) $\operatorname{MSEM}(\hat{\beta}_{PCR}) = \sigma_r^{2}\, T_r (T_r' X'X T_r)^{-1} T_r' + (T_r T_r' - I)\beta\beta'(T_r T_r' - I)'$
Proof:

(i)
$$\begin{aligned}
E(\hat{\beta}_{PCR}) &= E\left[(X'X)^{-1} X' \hat{y}_r\right] = (X'X)^{-1} X' E(\hat{y}_r) \\
&= (X'X)^{-1} X' E(Z_r \hat{\alpha}_r) \\
&= (X'X)^{-1} X' Z_r E(\hat{\alpha}_r) \\
&= (X'X)^{-1} X'X T_r E(\hat{\alpha}_r) \\
&= T_r T_r' \beta
\end{aligned} \tag{3.6}$$
(ii)
$$\begin{aligned}
D(\hat{\beta}_{PCR}) &= D\left[(X'X)^{-1} X' \hat{y}_r\right] \\
&= (X'X)^{-1} X' D(\hat{y}_r) X (X'X)^{-1} \\
&= (X'X)^{-1} X' D(Z_r \hat{\alpha}_r) X (X'X)^{-1} \\
&= (X'X)^{-1} X' Z_r D(\hat{\alpha}_r) Z_r' X (X'X)^{-1} \\
&= (X'X)^{-1} X'X T_r\, \sigma_r^{2} (T_r' X'X T_r)^{-1} T_r' X'X (X'X)^{-1} \\
&= \sigma_r^{2}\, T_r (T_r' X'X T_r)^{-1} T_r'
\end{aligned} \tag{3.7}$$
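The chain of equalities in (ii) can be checked numerically. The sketch below is only an illustration on synthetic data: it assumes that $T_r$ holds the eigenvectors of $X'X$ belonging to the r largest eigenvalues, that $Z_r = XT_r$, and it uses a hypothetical value for $\sigma_r^{2}$. The identity itself does not depend on that value.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic collinear design matrix (illustrative only).
n, p, r = 40, 4, 2
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=n)
sigma2 = 1.5                                  # hypothetical sigma_r^2

XtX = X.T @ X
eigvals, T = np.linalg.eigh(XtX)
T_r = T[:, np.argsort(eigvals)[::-1][:r]]     # eigenvectors of the r largest eigenvalues
Z_r = X @ T_r

# D(alpha_hat_r) = sigma_r^2 (Z_r' Z_r)^{-1} = sigma_r^2 (T_r' X'X T_r)^{-1}
D_alpha = sigma2 * np.linalg.inv(Z_r.T @ Z_r)

# Sandwich form: (X'X)^{-1} X' Z_r D(alpha_hat_r) Z_r' X (X'X)^{-1}
A = np.linalg.solve(XtX, X.T)                 # A = (X'X)^{-1} X'
D_sandwich = A @ Z_r @ D_alpha @ Z_r.T @ A.T

# Closed form: sigma_r^2 T_r (T_r' X'X T_r)^{-1} T_r'
D_closed = sigma2 * T_r @ np.linalg.inv(T_r.T @ XtX @ T_r) @ T_r.T

print(np.max(np.abs(D_sandwich - D_closed)))  # difference at rounding-error level
```

The key simplification is that $(X'X)^{-1}X'Z_r = T_r$, which is why the outer factors collapse.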
Consequently, the bias and the mean square error matrix (MSEM) follow from (3.6) and (3.7) as:

$$\operatorname{Bias}(\hat{\beta}_{PCR}) = (T_r T_r' - I)\beta \tag{3.8}$$

$$\operatorname{MSEM}(\hat{\beta}_{PCR}) = \sigma_r^{2}\, T_r (T_r' X'X T_r)^{-1} T_r' + (T_r T_r' - I)\beta\beta'(T_r T_r' - I)' \tag{3.9}$$

Thus, we observe that this new approach is simpler and easier to apply. Here, the PCs are used as regressors to predict the dependent variable. The fitted values obtained are then regressed on the original explanatory variables using the Ordinary Least Squares (OLS) estimator. The results obtained are theoretically the same as those of the conventional approach, even in terms of the sampling properties of the estimators.

4. APPLICATIONS TO REAL LIFE DATA

4.1 Application to Longley Data

Longley (1967) initially adopted this dataset in his study. It comprises six independent variables and one dependent variable. Other authors that have applied the data include Walker and Birch (1988), Ayinde et al. (2015), Yasin and Murat (2016) and, recently, Lukman and Ayinde (2018). Walker and Birch (1988) provided the scaled condition number of the data set as 43,275, which shows that the model suffers from severe multicollinearity. This finding agrees with the result in Table 4.1, where the variance inflation factor is greater than 10. Table 4.1 further shows that the conventional principal component estimates (r=2) and the proposed estimates are either the same or very close, and they return the same standard error. They are also more efficient than the OLS estimator.
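The agreement between the two routes can be illustrated numerically. The sketch below is a minimal check on synthetic data (not the Longley data), assuming that $T_r$ holds the eigenvectors of $X'X$ for the r largest eigenvalues and $Z_r = XT_r$ as in the proof. The conventional estimate is taken here in the unstandardized form $T_r\hat{\alpha}_r$, which is a simplification of the standardized steps; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic collinear data (illustrative only).
n, p, r = 60, 5, 2
X = rng.normal(size=(n, p))
X[:, 4] = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=n)
beta = np.array([1.0, -2.0, 0.5, 3.0, 1.5])
y = X @ beta + rng.normal(size=n)

# Principal components of X'X; keep the r leading ones.
eigvals, T = np.linalg.eigh(X.T @ X)
T_r = T[:, np.argsort(eigvals)[::-1][:r]]
Z_r = X @ T_r

# Conventional route (assumed form): regress y on the PCs, map back with T_r.
alpha_hat, *_ = np.linalg.lstsq(Z_r, y, rcond=None)
beta_conventional = T_r @ alpha_hat

# New approach: fitted values from the PC regression, then OLS on the original X.
y_hat_r = Z_r @ alpha_hat
beta_new, *_ = np.linalg.lstsq(X, y_hat_r, rcond=None)

print(np.max(np.abs(beta_new - beta_conventional)))  # rounding-error level
```

Because $\hat{y}_r$ lies in the column space of X, the second regression reproduces it exactly, which is why the two coefficient vectors coincide up to rounding.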
NOTE: The eigenvalues of X'X are 2.76779 × 10^12, 7.03914 × 10^9, 1.16090 × 10^7, 2504761.02102, 1738.35637, 13.30921 and 1.17218 × 10^-7.
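As a rough illustration of how these eigenvalues signal multicollinearity, they can be turned into condition indices. The sketch below assumes the common definition $\sqrt{\lambda_{\max}/\lambda_j}$, which is not stated in the paper, and it works on the unscaled eigenvalues from the note, so its result is not expected to match the scaled condition number quoted from Walker and Birch (1988).

```python
import math

# Eigenvalues of X'X as listed in the note (D+12 read as x 10^12, and so on).
eigenvalues = [
    2.76779e12, 7.03914e09, 1.16090e07,
    2504761.02102, 1738.35637, 13.30921, 1.17218e-07,
]

lam_max = max(eigenvalues)

# Assumed definition: condition index_j = sqrt(lambda_max / lambda_j).
condition_indices = [math.sqrt(lam_max / lam) for lam in eigenvalues]
condition_number = max(condition_indices)

for lam, ci in zip(eigenvalues, condition_indices):
    print(f"{lam:.5e}  ->  {ci:.4e}")
print(f"condition number: {condition_number:.4e}")
```

The spread of more than nineteen orders of magnitude between the largest and smallest eigenvalue is what drives the very large index for the last component.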
4.2 Application to Gruber Data

To further illustrate the performance of the new approach and the conventional method, we use the data discussed in Gruber (1998). This data set is on Total National Research and Development Expenditures, as a per cent of the gross national product, by country from 1972 to 1986, comprising four (4) independent variables. It has since been widely analyzed in the literature by many authors, including Akdeniz and Erol (2003), Li and Yang (2010), and Chang and Yang (2012). Table 4.2 gives a summary of the results.