Communications in Statistics—Theory and Methods, 34: 1177–1182, 2005
Copyright © Taylor & Francis, Inc.
ISSN: 0361-0926 print/1532-415X online
DOI: 10.1081/STA-200056836

Regression Analysis and Time Series

Choosing Ridge Parameter for Regression Problems

GHADBAN KHALAF1 AND GHAZI SHUKUR2


1Department of Mathematics, King Khalid University, Saudi Arabia
2Departments of Economics & Statistics, Jönköping University and Växjö University, Sweden

Hoerl and Kennard (1970a) introduced the ridge regression estimator as an alternative to the
ordinary least squares estimator in the presence of multicollinearity. In this article, a new
approach for choosing the ridge parameter ($K$), when multicollinearity among the columns of the
design matrix exists, is suggested and evaluated by simulation techniques in terms of mean
squared error (MSE). A number of factors that may affect the properties of these methods have
been varied. The MSE from this approach has been shown to be smaller than that obtained using
Hoerl and Kennard (1970a) in almost all situations.

Keywords Multicollinearity; Ridge Regression; Simulations.

Mathematics Subject Classification 62J07.

1. Introduction
In situations where the explanatory variables in a multiple regression analysis are highly
intercorrelated, one may not be able to obtain decisive answers to the questions one poses,
because the standard errors are very high or the t-ratios are very low. This situation is
referred to as the problem of multicollinearity. One of the solutions to this problem is to use
the so-called ridge regression proposed by Hoerl and Kennard (1970a). The authors proposed ridge
regression estimators to deal with the problem of multicollinearity, and since then numerous
papers have been written, either suggesting different ways of estimating the ridge parameter,
comparing ridge with least squares (LS), or discussing the merits of ridge regression techniques.
Among the many ridge estimators that have been advocated, the estimator of Hoerl et al. (1975)
has performed fairly well in many simulation studies comparing ridge estimators among themselves
and with the LS estimator (see the references cited above).

Received December 18, 2002; Accepted November 5, 2004


Address correspondence to Ghazi Shukur, Departments of Statistics, Jönköping
University, Sweden; E-mail: ghazi.shukur@ehv.vxu.se


In this article we introduce an alternative ridge estimator and study its


performance by means of simulation techniques. Comparisons are made with other
ridge-type estimators evaluated elsewhere, and the estimators to be included in this
study are described in Sec. 2. In Sec. 3 we illustrate the simulation technique that
we have adopted in the study. The results of the simulations appear in Table 1 and are discussed
in Sec. 4. In Sec. 5, we give a brief summary and conclusions.

2. The Estimators
This section consists of two parts. In Subsec. 2.1, we introduce the notation, the linear model
used here, and the necessary background. The estimator we suggest, together with some formulas
for determining the ridge parameter, is given in Subsec. 2.2.

2.1. Some Background


Consider the multiple regression model:

$$Y = XB + e, \tag{1}$$

where $Y$ is an $n \times 1$ vector of observations, $B$ is a $p \times 1$ vector of regression
coefficients, $X$ is an $n \times p$ design matrix, and, finally, $e$ is an $n \times 1$ vector of
random errors distributed as $N(0, \sigma^2 I_n)$, where $I_n$ is the identity matrix of order $n$.
We are primarily interested in the estimation of $B$. The usual LS estimator of $B$ depends
heavily on the characteristics of the matrix $X'X$. If the $X'X$ matrix is ill-conditioned, the
LS estimators are sensitive to a number of errors; in particular, there is an "explosion" of the
sampling variance of the estimators, and meaningful statistical inference becomes impossible for
practitioners. To overcome this problem, Hoerl and Kennard (1970a) suggested the use of
$X'X + KI_p$, $K \geq 0$, rather than $X'X$, in the estimation of $B$. The resulting estimators of
$B$ are known in the literature as the ridge regression estimators, given by:

$$\tilde{B} = (X'X + KI_p)^{-1}X'Y.$$

Although these estimators are biased, under certain conditions the reduction in variance
outweighs the increase in squared bias and, therefore, these estimators dominate, in the sense of
mean squared error (MSE), the LS estimator of the vector of unknown regression coefficients.
Attempts have been made to reduce the bias of ridge regression estimators.
Vinod and Ullah (1981, p. 306) introduced a general class of improved biased
estimators of regression coefficients. Singh et al. (1986) used the jack-knife
procedure to reduce the bias of the ridge estimator. They proposed the almost
unbiased generalized ridge regression estimator. Kadiyala (1984) proposed a class
of almost unbiased shrinkage estimators of regression coefficients. However,
these estimators are not operational since the bias correction term includes
unknown parameters.
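As a small illustration of how the ridge estimator above can be computed, the following sketch
(our own, in Python with NumPy; the function name and setup are illustrative assumptions, not part
of the original article) solves the system $(X'X + KI_p)\tilde{B} = X'Y$ directly:

```python
import numpy as np

def ridge_estimator(X, y, K):
    """Ridge regression estimate (X'X + K*I_p)^(-1) X'y for a given K >= 0.
    Setting K = 0 reproduces the ordinary least squares (LS) estimate."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + K * np.eye(p), X.T @ y)
```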

2.2. The New Suggested Method


Let us assume that the model (1) is taken to be in canonical form, so that $X'X = T_p$, where
$T_p$ is a $p \times p$ diagonal matrix with $i$th diagonal element $t_i > 0$. The LS estimator of
the $i$th element of $B$ (say $B_i$) is:

$$\hat{B}_i = \frac{X_i'Y}{t_i}, \qquad i = 1, 2, \ldots, p, \tag{2}$$

where $X_i$ is the $i$th column vector of $X$. The version of the generalized ridge regression
estimator suggested by Hoerl and Kennard (1970a,b) is written as:

$$\tilde{B}_i = \left(\frac{t_i}{t_i + K}\right)\hat{B}_i, \qquad i = 1, 2, \ldots, p, \tag{3}$$

where $\hat{B} = T_p^{-1}X'Y$.
The authors showed that a sufficient condition for $\tilde{B}_i$ to have smaller MSE than that of
$\hat{B}_i$ is:

$$0 < K < K^*, \quad \text{where } K^* = S^2/\hat{B}^2_{\max}, \tag{4}$$

and $\hat{B}^2_{\max}$ is the largest squared element of $\hat{B}$. Here, $S^2$ is the usual
estimate of $\sigma^2$, defined by:

$$S^2 = \frac{(Y - X\hat{B})'(Y - X\hat{B})}{n - p - 1}.$$

We will henceforth refer to the estimator (4) as HK.


The second estimator is our new estimate of the optimal shrinkage parameter $K^*$, given in (4),
obtained as a single quantity rather than by estimating $\sigma^2$ and $B^2_{\max}$ separately.
The suggested estimator is:

$$K^{**} = \frac{t_{\max} S^2}{(n - p - 1)S^2 + t_{\max}\hat{B}^2_{\max}}, \tag{5}$$

where $t_{\max}$ is the largest eigenvalue of the matrix $X'X$, and $0 < K^{**} < K^*$ (McDonald
and Galarneau, 1975). For our suggested estimator, defined by (5), we use the acronym KS (where KS
denotes the method suggested by Khalaf and Shukur).
The modification we suggest is accomplished by adding the amount $S^2/t_{\max}$ to the denominator
of Eq. (4), which is a function of the correlation between the independent variables. This amount,
however, varies with the size of the sample used (i.e., the number of observations) and, to keep
this kind of variation fixed, we multiply $S^2/t_{\max}$ by the number of degrees of freedom,
$(n - p - 1)$. Proceeding in this manner, we can separate out the variation caused by the size of
the sample and keep only the variation that depends on the strength of the multicollinearity.
This leads to the denominator of the alternative estimator given by (5) being greater than that of
Hoerl and Kennard by $(n - p - 1)S^2/t_{\max}$. Our approach is examined by means of simulation
techniques, which we present in the next section.

3. The Simulation Study


In this section we present our Monte Carlo study regarding the properties of the LS estimator, the
ridge estimator with the $K^*$ parameter, and the ridge estimator with our $K^{**}$ parameter.
These properties will then be compared in terms of MSE. A number of factors can affect these
properties. The sample size $T$, the degree of correlation between the explanatory variables, and
the error variance are three such factors. In this article we will study the consequences of
varying $T$, the degree of correlation, and the error variance, while the true values of the
regression coefficients are all chosen to be one. In our model, we include only two explanatory
variables. We also mainly concentrate on the case where the errors and the explanatory variables
are normally distributed.
The design of a good Monte Carlo study is dependent on (i) what factors are
expected to affect the properties of the tests/estimators under investigation and (ii)
what criteria are being used to judge the results. The first question was treated
above, and we will return to the values used in our experiment shortly. The second
question also needs to be looked at, however.
The properties of these estimators will be compared in terms of MSE. First, we relate (by a ratio)
the MSE from the first ridge method, using $K^*$, to the MSE from the LS estimates. Second, we
relate the MSE from the second ridge estimate, using $K^{**}$, to that of the LS. Finally, we also
relate the MSEs obtained by using $K^*$ and $K^{**}$ to each other. To compare these three
methods, we prefer the one that gives the smallest ratio.
Now, our primary interest lies in investigating the properties of our proposed approach to
minimizing the MSE, and thus different degrees of correlation between the variables included in
the model have been used. We set these values equal to 0.7, 0.8, 0.9, 0.95, and 0.99, which cover
a wide range of moderate and strong correlation between the variables. The number of equations to
be estimated is of central importance.
The variance of the error term can, of course, take an infinite number of forms. We have, however,
restricted the errors to have low and fairly high variances, equal to 1 and 10, respectively. To
investigate the effect of sample size, we used samples of size 50, 75, and 100, which may cover
situations of small, moderate, and large samples.
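The following sketch outlines the kind of Monte Carlo comparison described above: two standard
normal regressors with a given correlation, true coefficients equal to one, and MSE ratios of HK
and KS relative to LS. It reuses ridge_estimator and hk_and_ks_parameters from the sketches in
Sec. 2; the replication count and function names are illustrative assumptions rather than the
authors' exact setup.

```python
import numpy as np

rng = np.random.default_rng(12345)

def mse_ratios(n=50, rho=0.9, sigma2=1.0, reps=1000):
    """Return (MSE_HK/MSE_LS, MSE_KS/MSE_LS) estimated over `reps` replications."""
    beta = np.ones(2)                               # true coefficients set to one
    cov = np.array([[1.0, rho], [rho, 1.0]])        # correlation between x1 and x2
    sse_ls = sse_hk = sse_ks = 0.0
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
        b_ls = np.linalg.solve(X.T @ X, X.T @ y)
        k_hk, k_ks = hk_and_ks_parameters(X, y)     # sketch from Sec. 2.2
        b_hk = ridge_estimator(X, y, k_hk)          # sketch from Sec. 2.1
        b_ks = ridge_estimator(X, y, k_ks)
        sse_ls += np.sum((b_ls - beta) ** 2)
        sse_hk += np.sum((b_hk - beta) ** 2)
        sse_ks += np.sum((b_ks - beta) ** 2)
    return sse_hk / sse_ls, sse_ks / sse_ls

# Example: ratios for n = 50, correlation 0.9, and sigma^2 = 1
print(mse_ratios(n=50, rho=0.9, sigma2=1.0))
```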

4. The Results
In this section we present the results of our Monte Carlo experiment concerning the properties of
our proposed approach for choosing the ridge parameter $K$ when multicollinearity among the
columns of the design matrix exists. Our primary interest lies in comparing the MSEs of the two
suggested methods for choosing the ridge parameter $K$ that are used in this study, i.e., HK and
KS. The results of our study are presented in Table 1. The comparison is mainly done by
calculating ratios: first, the MSE of HK to the MSE of LS; second, the MSE of KS to the MSE of LS;
and, finally, the MSE of KS to the MSE of HK. Proceeding in this manner, we consider the method
that leads to the minimum ratio to be the best from the MSE point of view.
Table 1
Ratios of MSEs between the two ridge methods and the LS, and between the two ridge methods
together

Correlation      Number of           σ² = 1                       σ² = 10
between          observations   HK/LS   KS/LS   KS/HK      HK/LS   KS/LS   KS/HK
x1 and x2
0.70             50             0.050   0.050   0.997      1.021   0.048   0.047
                 75             0.191   0.171   0.894      3.111   0.162   0.052
                 100            0.555   0.519   0.935      4.060   0.503   0.124
0.80             50             0.050   0.050   0.995      1.028   0.048   0.047
                 75             0.195   0.171   0.878      3.041   0.161   0.053
                 100            0.562   0.519   0.925      4.072   0.499   0.123
0.90             50             0.0505  0.050   0.993      1.027   0.048   0.047
                 75             0.198   0.171   0.863      2.961   0.159   0.054
                 100            0.569   0.521   0.915      4.052   0.496   0.122
0.95             50             0.0505  0.0501  0.992      1.025   0.048   0.047
                 75             0.199   0.171   0.856      2.919   0.159   0.054
                 100            0.573   0.521   0.909      4.034   0.495   0.123
0.99             50             0.051   0.050   0.991      1.023   0.048   0.047
                 75             0.201   0.171   0.850      2.885   0.159   0.055
                 100            0.575   0.521   0.906      4.017   0.494   0.123

Looking at the first part of this table (i.e., when the error variance is chosen to be low,
$\sigma^2 = 1$), we can see that both HK and KS are better than the LS, and that KS produces
somewhat lower MSEs than HK. The results also reveal that in small samples, i.e., when the number
of observations is equal to 50, HK and KS perform far better than the LS, and the difference
between the two is minimal. On the other hand, when the sample sizes become larger, the ratios
also become larger, which means that the performance of the LS becomes better but never superior
to that of HK and KS.
Looking at the second part of the table, where the error variance is quite large, i.e.,
$\sigma^2 = 10$, we see that the HK method performs worse than the LS, especially in large
samples. Our suggested method, on the other hand, performs much better than both of the others.
This result may be taken to show the good performance of our method and its robustness in
situations where the other methods behave badly.

5. Brief Summary and Conclusions


In this article we have studied the properties of a newly proposed approach for choosing the ridge
parameter $K$ when multicollinearity among the columns of the design matrix exists. The
investigation has been carried out using Monte Carlo methods where, in addition to different
levels of multicollinearity, the number of observations and the error variances have been varied.
For each combination, we have used 10,000 replications. The evaluation of our method has been done
by comparing the MSEs of our proposed method and that of Hoerl and Kennard (1970a). We find that
our method outperforms the other in almost all cases. The properties of the other method become
poorer in situations where the error variance is high, whereas our method remains robust.

References
Hoerl, A. E., Kennard, R. W. (1970a). Ridge regression: biased estimation for nonorthogonal
problems. Technometrics 12:55–67.
Hoerl, A. E., Kennard, R. W. (1970b). Ridge regression: applications to nonorthogonal problems.
Technometrics 12:69–82.
Hoerl, A. E., Kennard, R. W., Baldwin, K. F. (1975). Ridge regression: some simulations. Commun.
Statist. 4:105–123.
Kadiyala, K. (1984). A class of almost unbiased and efficient estimators of regression
coefficients. Econ. Lett. 16:293–296.
McDonald, G. C., Galarneau, D. I. (1975). A Monte Carlo evaluation of some ridge-type estimators.
J. Amer. Statist. Assoc. 70:407–416.
Singh, B., Chaubey, Y. P., Dwivedi, T. D. (1986). An almost unbiased ridge estimator. Sankhyā
48:342–346.
Vinod, H. D., Ullah, A. (1981). Recent Advances in Regression Methods. New York: Marcel Dekker.
