

J. J. Appl. Sci., Vol. 10, No. 2 (2008)

A Monte Carlo Evaluation of Some Ridge Estimators

Yazid Mohammad Al-Hassan
Ministry of Education, Jordan*
Received: 1/6/2008    Accepted: 28/8/2008

Al-Hassan, Yazid M. (2008) A Monte Carlo Evaluation of Some Ridge Estimators. J. J. Appl. Sci.: Natural Sciences Series 10 (2): 101-110.

* Author's e-mail address: yazid_12111980@yahoo.com

Abstract: Several studies concerning ridge regression have dealt with the choice of the ridge parameter k, and many algorithms for selecting it have been proposed in the statistical literature. In this article, seven methods for estimating the ridge parameter are considered. A simulation study was carried out to evaluate the performance of these estimators based on the minimum mean squared error (MSE) criterion. The simulation study indicates that, under certain conditions, two estimators, GM and HKB, perform well in almost all situations.

Keywords: Ridge parameter, MSE, simulation.

Introduction
Consider the standard model for multiple linear regression

y = Xβ + e, ………………………………………………………………… (1)

where y is an n × 1 column vector of observations on the dependent variable, X is an n × p fixed matrix of observations on the explanatory variables and is of full rank p (p ≤ n), β is a p × 1 unknown column vector of regression coefficients, and e is an n × 1 vector of random errors with E(e) = 0 and E(ee′) = σ²Iₙ, where Iₙ denotes the n × n identity matrix. The variables are assumed to be standardized so that X′X is in the form of a correlation matrix, and the vector X′y is the vector of correlation coefficients of the dependent variable with each explanatory variable.
The least squares (LS) estimator of the parameters is given by

β̂ = (X′X)⁻¹X′y ………………………………………………………………… (2)
In multiple linear regression models, we usually assume that the explanatory variables are independent. In practice, however, there may be strong or nearly strong linear relationships among the explanatory variables. In that case the independence assumption is no longer valid, which causes the problem of multicollinearity.
In the presence of multicollinearity, it is impossible to estimate the unique effects of individual variables in the regression equation. Moreover, the LS estimates are likely to be too large in absolute value and, possibly, of the wrong sign. Therefore, multicollinearity is one of the most serious problems in linear regression analysis.
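As a quick numerical illustration of this point (our own sketch, not part of the original study; the data-generating settings are arbitrary), fitting LS to two nearly collinear predictors typically produces grossly inflated, wrong-signed coefficients:

```python
# Illustrative sketch: least squares under severe multicollinearity.
import numpy as np

rng = np.random.default_rng(0)
n = 30
z = rng.standard_normal(n)
# Two explanatory variables that are almost perfectly correlated.
x1 = z + 0.01 * rng.standard_normal(n)
x2 = z + 0.01 * rng.standard_normal(n)
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.standard_normal(n)

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
# The true coefficients are (1, 1); the LS estimates are typically far too
# large in absolute value, often with one of them of the wrong sign.
print(beta_ls)
```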
Several methods have been suggested to solve this problem. "Ridge regression" is the most popular one, as it has many benefits in real-life applications. The ridge regression method was proposed by Hoerl and Kennard,[1,2] and since then numerous papers have been written suggesting different ways of estimating the ridge parameter, comparing ridge with LS, and evaluating the performance of different ridge parameter estimates.
Hoerl and Kennard[1] suggested the use of X′X + kIₚ (k ≥ 0), rather than X′X, in the estimation of β (Eq. (2)). The resulting estimators of β are known in the literature as the ridge regression estimators, given by

β̂(k) = (X′X + kIₚ)⁻¹X′y ……………………………………………………… (3)

The constant k is known as a biasing or ridge parameter. As k increases from zero toward infinity, the regression estimates tend toward zero. Although these estimators are biased, for certain values of k they yield a smaller mean squared error (MSE) than the LS estimator.[1] However, MSE(β̂(k)) depends on the unknown parameters k, β and σ², and so cannot be calculated in practice; k has to be estimated from the real data instead.
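The following minimal sketch of Eq. (3) (our own illustration; the toy data are arbitrary) computes the ridge estimate directly and shows the shrinkage toward zero as k grows:

```python
# Sketch of the ridge estimator of Eq. (3): beta(k) = (X'X + k*I)^(-1) X'y.
import numpy as np

def ridge_estimate(X: np.ndarray, y: np.ndarray, k: float) -> np.ndarray:
    """Return the ridge regression estimate for a given ridge parameter k."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Example: the estimates shrink toward zero as k increases from 0.
rng = np.random.default_rng(1)
X = rng.standard_normal((25, 4))
y = X @ np.array([1.0, -0.5, 0.8, 0.3]) + rng.standard_normal(25)
for k in (0.0, 0.1, 1.0, 10.0, 100.0):
    print(k, np.round(ridge_estimate(X, y, k), 3))
```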
Several methods for estimating k have been proposed by several researchers (e.g.,
Hoerl and Kennard,[1] Hoerl et al.,[3] McDonald and Galarneau,[4] Lawless and
Wang,[5] Hocking et al.,[6] Wichern and Churchill,[7] Nordberg,[8] Saleh and
Kibria,[9] Singh and Tracy,[10] Wencheko,[11] Kibria[12] and Khalaf and
Shukur[13]).
The objective of this article is to investigate some of the existing methods that are available in the literature and to compare them on the basis of their mean squared error properties.

Estimating Methods and Performance Criteria


In order to describe these methods, it is convenient to write the linear regression model (1) in canonical form. Suppose there exists an orthogonal matrix D such that D′CD = Λ, where C = X′X and Λ = diag(λ₁, λ₂, ..., λₚ) contains the eigenvalues of the matrix C. The canonical form of the model (1) is

y = X*α + e, ………………………………………………………………… (4)

where X* = XD and α = D′β. Then the ridge regression estimator is given as follows:

α̂(K) = (X*′X* + K)⁻¹X*′y, …………………………………………………… (5)

where K = diag(k₁, k₂, ..., kₚ), kᵢ > 0, and α̂ = Λ⁻¹X*′y is the ordinary least squares (OLS) estimate of α.

Eq. (5) is called the general form of ridge regression.[1] It follows from Hoerl and Kennard[1] that the value of kᵢ which minimizes MSE(α̂(K)), where

MSE(α̂(K)) = σ² Σᵢ₌₁ᵖ λᵢ/(λᵢ + kᵢ)² + Σᵢ₌₁ᵖ kᵢ²αᵢ²/(λᵢ + kᵢ)², ………… (6)

is

kᵢ = σ²/αᵢ², …………………………………………………………………… (7)
where σ² represents the error variance of model (1) and αᵢ is the ith element of α. Equation (7) gives a value of kᵢ that depends entirely on the unknowns σ² and αᵢ, which must therefore be estimated from the observed data. Hoerl and Kennard[1] suggested replacing σ² and αᵢ by their corresponding unbiased estimators, that is,

k̂ᵢ = σ̂²/α̂ᵢ², ………………………………………………………………… (8)

where σ̂² = Σeᵢ²/(n − p) is the residual mean square, which is an unbiased estimator of σ², and α̂ᵢ is the ith element of α̂, which is an unbiased estimator of α.
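These quantities are straightforward to compute from data. The sketch below (our own illustration; the function name is hypothetical) follows the definitions above, using the eigendecomposition of X′X:

```python
# Sketch of Eq. (8): estimate sigma^2 and alpha, then k_i = sigma2_hat / alpha_hat_i^2.
import numpy as np

def canonical_quantities(X, y):
    """Return (eigenvalues, alpha_hat, sigma2_hat) for the canonical form (4)."""
    n, p = X.shape
    lam, D = np.linalg.eigh(X.T @ X)      # lam = eigenvalues, D = orthogonal matrix
    X_star = X @ D                        # X* = XD
    alpha_hat = (X_star.T @ y) / lam      # alpha_hat = Lambda^{-1} X*'y (since X*'X* = Lambda)
    resid = y - X_star @ alpha_hat
    sigma2_hat = resid @ resid / (n - p)  # residual mean square
    return lam, alpha_hat, sigma2_hat

# k_i of Eq. (8), one biasing constant per canonical coefficient:
# lam, a, s2 = canonical_quantities(X, y); k_i = s2 / a**2
```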
In the following, we present some methods for estimating the ridge parameter k.

1- Hoerl and Kennard Method (k̂_HK or HK)

Hoerl and Kennard[1] found that the best method for achieving a better estimate β̂(k) is to use kᵢ = k for all i, and they suggested k to be

k̂_HK = σ̂²/max(α̂ᵢ²) …………………………………………………………… (9)

2- Hoerl, Kennard and Baldwin Method (k̂_HKB or HKB)

Hoerl, Kennard and Baldwin[3] proposed a different estimator of k by taking the harmonic mean of the k̂ᵢ in Eq. (8). That is,

k̂_HKB = pσ̂² / Σᵢ₌₁ᵖ α̂ᵢ² ……………………………………………………… (10)

3- Lawless and Wang Method (k̂_LW or LW)

Lawless and Wang[5] proposed the following estimator:

k̂_LW = pσ̂² / Σᵢ₌₁ᵖ λᵢα̂ᵢ² …………………………………………………… (11)

4- Hocking, Speed and Lynn Method (k̂_HSL or HSL)

Hocking, Speed and Lynn[6] suggested the following estimator for k:

k̂_HSL = σ̂² Σᵢ₌₁ᵖ (λᵢα̂ᵢ)² / (Σᵢ₌₁ᵖ λᵢα̂ᵢ²)² ………………………………… (12)

5- Arithmetic Mean Method (k̂_AM or AM)

Kibria[12] proposed an estimator of k by using the arithmetic mean of the k̂ᵢ in Eq. (8), which produces the following estimator:

k̂_AM = (1/p) Σᵢ₌₁ᵖ σ̂²/α̂ᵢ² …………………………………………………… (13)

6- Geometric Mean Method (k̂_GM or GM)

Kibria[12] proposed estimating k by using the geometric mean of the k̂ᵢ in Eq. (8), which produces the following estimator:

k̂_GM = σ̂² / (∏ᵢ₌₁ᵖ α̂ᵢ²)^(1/p) ……………………………………………… (14)

7- Khalaf and Shukur Method (k̂_KS or KS)

Khalaf and Shukur[13] suggested a new approach for choosing the ridge parameter k by modifying the denominator of the right-hand side of Eq. (9) with a term involving λ_max, the largest eigenvalue of X′X, which is a function of the correlation between the independent variables. The proposed estimator is

k̂_KS = λ_max σ̂² / [(n − p − 1)σ̂² + λ_max max(α̂ᵢ²)] ……………………… (15)
To compare the proposed estimators, a criterion for measuring the "goodness" of an estimator is needed. Following Lawless and Wang,[5] Gibbons[14] and Kibria,[12] the mean squared error (MSE) criterion is used throughout our study to measure the goodness of an estimator.
From Eq. (6), the MSE depends heavily on λᵢ, αᵢ and σ². Since the estimators in Eqs. (9)-(15) are very hard to compare theoretically, we compare them through a simulation study, which is discussed in the following section.
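Since all seven estimators in Eqs. (9)-(15) are simple functions of σ̂², the α̂ᵢ and the eigenvalues λᵢ, they can be computed side by side. The sketch below is our own rendering of those formulas (the function name is hypothetical; Eq. (15) also requires n):

```python
# Sketch of the seven ridge-parameter estimators, Eqs. (9)-(15).
# Inputs: lam (eigenvalues of X'X), a (alpha_hat), s2 (sigma2_hat), n (sample size).
import numpy as np

def ridge_parameters(lam, a, s2, n):
    p = len(lam)
    a2 = a ** 2
    return {
        "HK":  s2 / a2.max(),                                             # Eq. (9)
        "HKB": p * s2 / a2.sum(),                                         # Eq. (10)
        "LW":  p * s2 / (lam * a2).sum(),                                 # Eq. (11)
        "HSL": s2 * ((lam * a) ** 2).sum() / (lam * a2).sum() ** 2,       # Eq. (12)
        "AM":  (s2 / a2).mean(),                                          # Eq. (13)
        "GM":  s2 / np.exp(np.log(a2).sum() / p),                         # Eq. (14)
        "KS":  lam.max() * s2 / ((n - p - 1) * s2 + lam.max() * a2.max()) # Eq. (15)
    }
```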

The Monte Carlo Simulations


In this section, we discuss the simulation study that compares the performance of the different estimators under several degrees of multicollinearity. Following McDonald and Galarneau,[4] Wichern and Churchill,[7] Gibbons[14] and Kibria,[12] the explanatory variables were generated using the device

xᵢⱼ = (1 − γ²)^(1/2) zᵢⱼ + γzᵢₚ,  i = 1, 2, ..., n;  j = 1, 2, ..., p, ………… (16)


where the zᵢⱼ are independent standard normal pseudo-random numbers, and γ is specified so that the correlation between any two explanatory variables is given by γ². The variables are then standardized so that X′X and X′y are in correlation form. Different sets of correlations are considered, corresponding to γ = 0.7, 0.8, 0.9 and 0.99. These values of γ cover a wide range of low, moderate and high correlations between variables. The n observations for the dependent variable y are determined by

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ... + βₚxᵢₚ + eᵢ,  i = 1, ..., n, ………………… (17)

where the eᵢ are independent normal (0, σ²) pseudo-random numbers and β₀ is taken to be identically zero. We varied the sample size between 15, 25 and 30, and the number of explanatory variables between 5, 10 and 20.
For each set of explanatory variables, one choice of the coefficient vector is considered. Newhouse and Oman[15] stated that if the mean squared error is a function of β, σ² and k, and if the explanatory variables are fixed, then the MSE is minimized when β is the normalized eigenvector corresponding to the largest eigenvalue of the X′X matrix, subject to the constraint β′β = 1. Hence, we selected the coefficients β₁, β₂, ..., βₚ as the normalized eigenvector corresponding to the largest eigenvalue of the X′X matrix, so that β′β = 1. We did not use the normalized eigenvector corresponding to the smallest eigenvalue, because the conclusions about the performance of the estimators would not change greatly in either case.[12]
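A sketch of this data-generating step (our own implementation of Eqs. (16)-(17) together with the Newhouse-Oman choice of β; the function name and the default σ = 1 are assumptions, as the text does not fix σ here):

```python
# Sketch of one simulated data set: Eq. (16) for X, the Newhouse-Oman beta,
# and Eq. (17) for y (with beta_0 = 0 and normal pseudo-random errors).
import numpy as np

def generate_data(n, p, gamma, sigma=1.0, rng=None):
    rng = rng or np.random.default_rng()
    z = rng.standard_normal((n, p + 1))
    # Eq. (16): the shared component z_{ip} induces pairwise correlation gamma^2.
    X = np.sqrt(1.0 - gamma ** 2) * z[:, :p] + gamma * z[:, [p]]
    # Standardize so that X'X is in correlation form (unit column sums of squares).
    X = (X - X.mean(axis=0)) / (X.std(axis=0) * np.sqrt(n))
    # beta = normalized eigenvector of X'X for the largest eigenvalue (beta'beta = 1).
    lam, D = np.linalg.eigh(X.T @ X)
    beta = D[:, -1]                   # eigh sorts eigenvalues in ascending order
    # Eq. (17) with beta_0 = 0.
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta
```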
For given values of p, n and γ, the experiment was repeated 5000 times by generating 5000 samples. For each replicate r (r = 1, 2, ..., 5000), the values of k for the different proposed estimators and the corresponding ridge estimators were calculated using

α̂(k̂) = (Λ + k̂Iₚ)⁻¹X*′y,  k̂ = k̂_HK, k̂_HKB, ..., k̂_GM ……………………… (18)

Then the MSEs of the estimators were calculated as follows:

MSE(k̂) = (1/5000) Σᵣ₌₁⁵⁰⁰⁰ (α̂₍ᵣ₎ − α)′(α̂₍ᵣ₎ − α) ………………………… (19)

The simulated MSEs and ridge parameters (k̂'s) are summarized in Tables 1-3.
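Putting the pieces together, the replication loop of Eqs. (18)-(19) might look as follows (our own sketch, reusing the hypothetical generate_data and ridge_parameters helpers from the earlier sketches, and far fewer replicates than the 5000 used in the study):

```python
# Sketch of the Monte Carlo loop: for each replicate, compute every k-hat,
# form the ridge estimate of Eq. (18), and average squared errors as in Eq. (19).
import numpy as np

def simulate_mse(n, p, gamma, reps=500, rng=None):
    rng = rng or np.random.default_rng(2008)
    sse = {}
    for _ in range(reps):
        X, y, beta = generate_data(n, p, gamma, rng=rng)   # Eqs. (16)-(17)
        lam, D = np.linalg.eigh(X.T @ X)
        X_star, alpha = X @ D, D.T @ beta                  # canonical form (4)
        a = (X_star.T @ y) / lam                           # canonical OLS estimate
        s2 = (y - X_star @ a) @ (y - X_star @ a) / (n - p) # residual mean square
        for name, k in ridge_parameters(lam, a, s2, n).items():
            a_k = (X_star.T @ y) / (lam + k)               # Eq. (18)
            sse[name] = sse.get(name, 0.0) + (a_k - alpha) @ (a_k - alpha)
    return {name: total / reps for name, total in sse.items()}  # Eq. (19)
```

With matched settings (e.g. simulate_mse(15, 5, 0.7)), such a sketch should broadly reproduce the orderings in Table 1, although the exact figures also depend on the error variance used in the study.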


Table 1: Estimated MSEs and k̂'s with p = 5 and n = 15

γ             HK        HKB       LW        HSL       KS        AM        GM        λ₁/λₚ
0.7   MSE  0.032149  0.028267  0.031002  0.032890  0.032305  0.030369  0.026100    20.36
      k̂    0.011974  0.032452  0.017708  0.008705  0.011301  135.85    0.081711
0.8   MSE  0.024247  0.021202  0.023732  0.024425  0.024280  0.025838  0.019211    34.73
      k̂    0.005324  0.017813  0.007295  0.004642  0.005212  57.85     0.056898
0.9   MSE  0.018660  0.016479  0.018519  0.018682  0.018665  0.025773  0.014892    78.86
      k̂    0.001718  0.006895  0.002055  0.001664  0.001709  23.20     0.036353
0.99  MSE  0.015531  0.013896  0.015523  0.015532  0.015532  0.037429  0.012766   888.80
      k̂    0.000117  0.000521  0.000119  0.000117  0.000117  4.85      0.005032

Table 2: Estimated MSEs and k̂'s with p = 10 and n = 25

γ             HK         HKB        LW         HSL        KS         AM        GM         λ₁/λₚ
0.7   MSE  0.017104   0.014731   0.016700   0.017172   0.017110   0.020987  0.012676     57.30
      k̂    0.0025940  0.0172282  0.0046657  0.0022568  0.0025690  715.35    0.0740455
0.8   MSE  0.011996   0.010688   0.011894   0.012003   0.011997   0.019178  0.008934    101.91
      k̂    0.0009706  0.0079269  0.0014605  0.0009362  0.0009681  483.60    0.0573716
0.9   MSE  0.008819   0.008133   0.008800   0.008819   0.008819   0.021929  0.007106    238.32
      k̂    0.0002851  0.0026139  0.0003489  0.0002835  0.0002849  403.35    0.0401009
0.99  MSE  0.006916   0.006535   0.006915   0.006916   0.006916   0.034109  0.007335   2680.55
      k̂    0.0000179  0.0001726  0.0000182  0.0000179  0.0000179  377.87    0.0069720


Table 3: Estimated MSEs and k̂'s with p = 20 and n = 30

γ             HK         HKB        LW         HSL        KS         AM        GM        λ₁/λₚ
0.7   MSE  0.01337    0.01029    0.01307    0.01339    0.01337    0.01981   0.00732     480.08
      k̂    0.000415   0.006648   0.000827   0.000396   0.000415   6391.06   0.080017
0.8   MSE  0.008697   0.007336   0.008640   0.008698   0.008697   0.018969  0.005831    860.51
      k̂    0.000141   0.002570   0.000221   0.000140   0.000141   4505.67   0.065381
0.9   MSE  0.006142   0.005515   0.006133   0.006143   0.006142   0.020860  0.006396   2039.75
      k̂    0.000038   0.000736   0.000047   0.000038   0.000038   920.683   0.048189
0.99  MSE  0.004810   0.004484   0.004809   0.004810   0.004810   0.028701  0.010029  23199.10
      k̂    0.0000023  0.0000455  0.0000024  0.0000023  0.0000023  466.87    0.0092948

Results of the Simulation Study


In this section we present the results of our Monte Carlo experiments. Our primary interest lies in comparing the MSEs of the considered estimators. The main results of the simulation are presented in Tables 1-3. To compare the performance of the considered estimators, we consider the following criteria.
1- Performance as a Function of γ and λ₁/λₚ

For given n and p, GM performs better than the other estimators when the correlations between the explanatory variables are low or moderate, but for high correlations HKB becomes better than GM, and for extremely high correlation (i.e. γ = 0.99) all estimators (except AM) perform better than or as well as GM. All estimators perform far better than AM, especially for high correlations.
For given n and p, the MSEs of most of the estimators decrease as the ratio λ₁/λₚ increases.

2- Performance as a Function of k
For given n and p, k̂ decreases for most of the estimators as the ratio λ₁/λₚ increases, which means that for most of the estimators there is a direct relation between the MSE and k̂. For given n, p and γ, the best estimator, apart from AM and GM, is the one with the largest k̂.


3- Performance as a Function of n and p

For a given γ, as the sample size and the number of explanatory variables increase, the MSEs of all estimators, except GM at γ = 0.99, decrease. It is observed that for large sample sizes and high correlations, GM performs worse than most of the other estimators.

Discussion and Conclusions


Several procedures for constructing ridge estimators have been proposed in the literature. These procedures necessarily hinge on a method for selecting the constant k. In this article we have evaluated the performance of several ridge estimators by comparing them on the basis of the mean squared error criterion. The evaluation was carried out using Monte Carlo simulations in which the level of correlation, the number of variables and the number of observations were varied. For each combination, we used 5000 replications. Given the results of our simulation study, certain conclusions emerge. In general, these conclusions are restricted to the set of experimental conditions investigated.
It is evident from the simulation results that for low or moderate correlations, GM performs better than the other estimators. However, for high correlations, HKB performs better than GM and the other estimators. Moreover, for high correlations and large sample sizes, most of the estimators perform better than GM.
Most of the time, the performance of the estimators HK, HSL and KS is almost equivalent; these estimators are better than AM, but they perform worse than LW. It is observed that the values of k̂_AM are extremely large in comparison with the other k̂'s, which makes AM a highly biased estimator. Because of that, Kibria[12] advised against using AM in practice.

Simulation Comparisons
Several extensive studies have been conducted to evaluate the performance of ridge estimators, among them Hoerl and Kennard,[1] Hoerl et al.,[3] McDonald and Galarneau,[4] Lawless and Wang,[5] Hocking et al.,[6] Wichern and Churchill,[7] Gibbons,[14] Saleh and Kibria,[9] Singh and Tracy,[10] Kibria[12] and Khalaf and Shukur.[13] Each of these studies evaluated the estimators most recently proposed at that time, and the present study likewise evaluates the most recent estimators available at the time it was conducted.
This section compares the results of some simulation studies and points out areas
of agreement (or disagreement) regarding the relative performance of the
estimators evaluated here.
As in the present study, the estimator GM did well in the simulation comparison
of Kibria.[12] The estimator HKB performed well in this study and in those of
Hoerl, Kennard and Baldwin,[3] Gibbons,[14] Kibria[12] and Al-Hassan.[16] Its
performance was criticized, however, by Lawless[17] and Wichern and
Churchill,[7] who could not recommend its use without further study.
The LW estimator was not singled out as one of the best estimators in this study, which agrees with the results of Gibbons[14] and Al-Hassan.[16] However, most of the time the LW estimator performed better than the estimators HK, HSL and AM. This result is similar to that of Kibria.[12]
The estimator HK was the first proposed ridge estimator, so it has been evaluated in various studies. Most of the estimators proposed after HK perform better than it (e.g. HKB, LW and GM) or almost equivalently to it (e.g. HSL and KS for low error variance). As for HK, our results were almost identical to those of Hoerl, Kennard and Baldwin,[3] Wichern and Churchill,[7] Kibria[12] and Al-Hassan.[16]
The HSL estimator was included in the simulation studies of Hocking, Speed and Lynn,[6] Kibria[12] and Al-Hassan.[16] The present study followed the same format as these studies in this regard.
In addition to this study, the estimator AM was included in those of Kibria[12] and Al-Hassan.[16] The performance of this estimator in our study is identical to that in Al-Hassan.[16] Considering the performance of AM in the sense of having a smaller MSE, our results disagreed with those of Kibria;[12] however, AM was a highly biased estimator in both studies.
The estimator KS was included in the studies of Khalaf and Shukur[13] and Al-Hassan.[16] With regard to this estimator, our results were identical to those of Al-Hassan.[16] In the case of low error variance, our results were also similar to those of Khalaf and Shukur.[13]

References
[1] Hoerl, A. E. & Kennard, R. W. (1970a) Ridge regression: biased estimation
for nonorthogonal problems. Technometrics 12 (1), 55-67.
[2] Hoerl, A. E. & Kennard, R. W. (1970b) Ridge regression: applications to
nonorthogonal problems. Technometrics 12 (1), 69-82.
[3] Hoerl, A. E., Kennard, R. W. & Baldwin, K. F. (1975) Ridge regression:
some simulations. Communications in Statistics 4 (2), 105-123.
[4] McDonald, G. C. & Galarneau, D. I. (1975) A Monte Carlo evaluation of
some ridge-type estimators. Journal of the American Statistical Association
70 (350), 407-412.
[5] Lawless, J. F. & Wang, P. (1976) A simulation study of ridge and other
regression estimators. Communications in Statistics-Theory and Methods 5
(4), 307–323.
[6] Hocking, R. R., Speed, F. M. & Lynn, M. J. (1976) A class of biased
estimators in linear regression. Technometrics 18 (4), 425-437.
[7] Wichern, D. & Churchill, G. (1978) A comparison of ridge estimators.
Technometrics 20 (2), 301–311.
[8] Nordberg, L. (1982) A procedure for determination of a good ridge
parameter in linear regression. Communications in Statistics-Simulation and
Computation 11 (3), 285–309.
[9] Saleh, A. K. & Kibria, B. M. (1993) Performances of some new preliminary
test ridge regression estimators and their properties. Communications in
Statistics-Theory and Methods 22 (10), 2747–2764.

[10] Singh, S. & Tracy, D. S. (1999) Ridge-regression using scrambled
responses. Metrika 41 (2), 147-157.
[11] Wencheko, E. (2000) Estimation of the signal-to-noise in the linear
regression model. Statistical Papers 41 (3), 327–343.
[12] Kibria, B. M. (2003) Performance of some new ridge regression estimators.
Communications in Statistics-Simulation and Computation 32 (2), 419-435.
[13] Khalaf, G. & Shukur, G. (2005) Choosing ridge parameter for regression
problems. Communications in Statistics-Theory and Methods 34 (5),
1177-1182.
[14] Gibbons, D. G. (1981) A simulation study of some ridge estimators. Journal
of the American Statistical Association 76 (373), 131–139.
[15] Newhouse, J. P. & Oman, S. D. (1971) An evaluation of ridge estimators.
Rand Corporation R-716-PR, 1-28.
[16] Al-Hassan, Y. M. (2007) A comparison between ridge and principal
components regression methods using simulation technique. Unpublished
M.S. thesis, Al al-Bayt University, Jordan.
[17] Lawless, J. F. (1978) Ridge and related estimation procedures.
Communications in Statistics-Theory and Methods 7 (2), 139-164.
