To cite this article: Ghadban Khalaf & Ghazi Shukur (2005). Choosing Ridge Parameter for Regression Problems. Communications in Statistics - Theory and Methods, 34:5, 1177-1182. DOI: 10.1081/STA-200056836
1. Introduction
In situations where the explanatory variables in a multiple regression analysis are highly intercorrelated, one may not be able to obtain decisive answers to the questions one poses, because the standard errors are very high or the t-ratios are very low. This situation is referred to as the problem of multicollinearity. One of the solutions to this problem is to use the so-called ridge regression proposed by Hoerl and Kennard (1970a). The authors proposed ridge regression estimators to deal with the problem of multicollinearity, and since then, numerous papers have been written, either suggesting different ways of estimating the ridge parameter, comparing ridge with least squares (LS), or discussing the merits of ridge regression techniques. Among the many ridge estimators that have been advocated, the estimator of Hoerl et al. (1975) has performed fairly well in many simulation studies comparing ridge estimators among themselves and with the LS estimator (see the above cited references).
2. The Estimators
This section consists of two parts. In Subsec. 1, we introduce the notations, the
linear model used here, and the necessary background. The estimator we suggest
together with some formulas for determining the ridge parameters are given in
Subsec. 2.
Consider the multiple linear regression model:

$$Y = X\beta + e, \qquad (1)$$

where $Y$ is an $(n \times 1)$ vector of observations on the dependent variable, $X$ is an $(n \times p)$ matrix of observations on the explanatory variables, $\beta$ is a $(p \times 1)$ vector of unknown regression coefficients, and $e$ is an $(n \times 1)$ vector of random errors.
Although these estimators are biased, under certain conditions the reduction in
variance overpowers the increase in squared bias and, therefore, these estimators
dominate, in the sense of mean squared error (MSE), the LS estimator of the vector
of unknown regression coefficients.
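This tradeoff can be made explicit. Writing $t_1, \ldots, t_p$ for the eigenvalues of $X'X$ and $K \ge 0$ for the ridge parameter (the notation used in Sec. 2 below), the MSE of the ridge estimator $\tilde{\beta}$ of Eq. (3) splits into a variance term that decreases in $K$ and a squared-bias term that increases in $K$; this decomposition is standard (Hoerl and Kennard, 1970a):

$$\operatorname{MSE}(\tilde{\beta}) = \sigma^2 \sum_{i=1}^{p} \frac{t_i}{(t_i + K)^2} + K^2 \sum_{i=1}^{p} \frac{\beta_i^2}{(t_i + K)^2}.$$

At $K = 0$ this reduces to the LS value $\sigma^2 \sum_{i=1}^{p} 1/t_i$, and for small $K > 0$ the variance reduction outweighs the added squared bias.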
Attempts have been made to reduce the bias of ridge regression estimators.
Vinod and Ullah (1981, p. 306) introduced a general class of improved biased
estimators of regression coefficients. Singh et al. (1986) used the jack-knife
procedure to reduce the bias of the ridge estimator. They proposed the almost
unbiased generalized ridge regression estimator. Kadiyala (1984) proposed a class
of almost unbiased shrinkage estimators of regression coefficients. However,
these estimators are not operational since the bias correction term includes
unknown parameters.
Assume that the columns of $X$ have been orthogonalized, so that $X'X = T_p = \operatorname{diag}(t_1, \ldots, t_p)$. The LS estimator of the $i$th regression coefficient can then be written as:

$$\hat{\beta}_i = \frac{X_i' Y}{t_i}, \qquad i = 1, 2, \ldots, p, \qquad (2)$$

where $X_i$ is the $i$th column vector of $X$. The version of the generalized ridge regression estimator suggested by Hoerl and Kennard (1970a,b) is written as:

$$\tilde{\beta}_i = \frac{t_i}{t_i + K}\, \hat{\beta}_i, \qquad i = 1, 2, \ldots, p, \qquad (3)$$

where $\hat{\beta} = T_p^{-1} X' Y$.
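As an illustration, the following is a minimal Python sketch of Eqs. (2) and (3); it assumes, as above, that the columns of $X$ have been orthogonalized, and the function name is ours rather than the paper's:

```python
import numpy as np

def generalized_ridge(X, Y, K):
    """Eqs. (2)-(3): LS and ridge coefficients for orthogonalized X.

    Assumes X'X = diag(t_1, ..., t_p), so the LS estimator
    decouples coordinate-wise as in Eq. (2).
    """
    t = np.sum(X ** 2, axis=0)           # t_i = X_i'X_i, the eigenvalues of X'X
    beta_hat = (X.T @ Y) / t             # Eq. (2): beta_hat_i = X_i'Y / t_i
    beta_tilde = t / (t + K) * beta_hat  # Eq. (3): shrink each coordinate by t_i/(t_i+K)
    return beta_hat, beta_tilde
```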
The authors showed that a sufficient condition for $\tilde{\beta}_i$ to have smaller MSE than that of $\hat{\beta}_i$ is $0 < K < K^{**}$, where

$$K^{**} = \frac{\sigma^2}{\beta_{\max}^2}$$

and $\beta_{\max}$ is the largest element of $\beta$ (see also McDonald and Galarneau, 1975). Replacing $\sigma^2$ and $\beta_{\max}^2$ with their estimates gives the operational choice of Hoerl and Kennard, for which we use the acronym HK:

$$\hat{K}_{HK} = \frac{S^2}{\hat{\beta}_{\max}^2}. \qquad (4)$$

Here, $S^2$ is the usual estimate of $\sigma^2$, defined by:

$$S^2 = \frac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{n - p - 1}.$$

Our suggested estimator is:

$$\hat{K}_{KS} = \frac{t_{\max} S^2}{(n - p - 1)S^2 + t_{\max}\,\hat{\beta}_{\max}^2}, \qquad (5)$$

where $t_{\max}$ is the largest eigenvalue of the matrix $X'X$. For our suggested estimator, defined by (5), we use the acronym KS (where KS denotes the method suggested by Khalaf and Shukur).
The modification we suggest is accomplished by adding the amount $S^2/t_{\max}$ to the denominator of Eq. (4), which is a function of the correlation between the independent variables. This amount, however, varies with the size of the sample used (i.e., the number of observations), so to keep this kind of variation fixed, we multiply $S^2/t_{\max}$ by the number of degrees of freedom, $n - p - 1$. Proceeding in this manner, we can separate out the variation caused by the size of the sample and keep only the variation that depends on the strength of the multicollinearity. This leads to the denominator of the alternative estimator given by (5) being greater than that of Hoerl and Kennard by $(n - p - 1)S^2/t_{\max}$. Our approach is examined by means of simulation techniques, which we present in the next section.
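As an illustration of Eqs. (4) and (5) as reconstructed above, the following minimal Python sketch computes both ridge parameters from data; the function name is ours, and for a general design matrix we obtain the orthogonalized coefficients through the eigendecomposition of $X'X$:

```python
import numpy as np

def ridge_parameters(X, Y):
    """Return (K_HK, K_KS), the ridge parameters of Eqs. (4) and (5)."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # LS fit
    resid = Y - X @ beta_hat
    s2 = float(resid @ resid) / (n - p - 1)            # S^2, estimate of sigma^2

    t, T = np.linalg.eigh(X.T @ X)                     # eigenvalues/vectors of X'X
    t_max = t.max()                                    # largest eigenvalue
    b2_max = np.max((T.T @ beta_hat) ** 2)             # largest squared coefficient
                                                       # in the orthogonalized frame
    K_HK = s2 / b2_max                                       # Eq. (4)
    K_KS = t_max * s2 / ((n - p - 1) * s2 + t_max * b2_max)  # Eq. (5)
    return K_HK, K_KS
```

Since the denominator of $\hat{K}_{KS}$ exceeds that of $\hat{K}_{HK}$ by $(n - p - 1)S^2/t_{\max}$, the sketch always returns a KS value smaller than the HK value.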
4. The Results
In this section we present the results of our Monte Carlo experiment concerning the properties of our proposed approach for choosing the ridge parameter $K$ when multicollinearity exists among the columns of the design matrix. Our primary interest lies in comparing the MSEs of the two methods for choosing the ridge parameter $K$ used in this study, i.e., HK and KS. The results of our study are presented in Table 1. The comparison is made by calculating three ratios: first, the MSE of HK to the MSE of LS; second, the MSE of KS to the MSE of LS; and third, the MSE of KS to the MSE of HK. Proceeding in this manner, we consider the method that leads to the minimum ratio to be the best from the MSE point of view.
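For concreteness, the following rough Python sketch shows how such ratios can be computed, reusing ridge_parameters from the sketch in Sec. 2; the data-generating design (two regressors sharing a common factor, in the spirit of McDonald and Galarneau, 1975), the true coefficients, and the number of replications are illustrative assumptions, not the paper's exact experimental design:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n, sigma2, reps = 0.90, 50, 1.0, 1000   # illustrative settings
beta = np.array([1.0, 1.0])                  # illustrative true coefficients
gamma = np.sqrt(rho)                         # common-factor loading: pairwise corr = rho

sse = {"LS": 0.0, "HK": 0.0, "KS": 0.0}      # accumulated squared estimation error
for _ in range(reps):
    z = rng.standard_normal((n, 3))
    X = np.sqrt(1.0 - gamma**2) * z[:, :2] + gamma * z[:, [2]]
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

    K_HK, K_KS = ridge_parameters(X, Y)      # from the sketch in Sec. 2
    for name, K in (("LS", 0.0), ("HK", K_HK), ("KS", K_KS)):
        b = np.linalg.solve(X.T @ X + K * np.eye(2), X.T @ Y)
        sse[name] += float(np.sum((b - beta) ** 2))

print("HK/LS:", sse["HK"] / sse["LS"])       # ratios as reported in Table 1
print("KS/LS:", sse["KS"] / sse["LS"])
print("KS/HK:", sse["KS"] / sse["HK"])
```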
Looking at the first part of Table 1 (i.e., when the error variance is chosen to be low, $\sigma^2 = 1$), we can see that both HK and KS are better than LS, and that KS produces somewhat lower MSEs than HK.
Table 1
Ratios of MSEs between the two ridge methods and the LS, and between the two ridge methods themselves

Correlation        Number of            σ² = 1                      σ² = 10
between x1 and x2  observations   HK/LS   KS/LS   KS/HK     HK/LS   KS/LS   KS/HK
0.70               50             0.050   0.050   0.997     1.021   0.048   0.047
                   75             0.191   0.171   0.894     3.111   0.162   0.052
                   100            0.555   0.519   0.935     4.060   0.503   0.124
0.80               50             0.050   0.050   0.995     1.028   0.048   0.047
                   75             0.195   0.171   0.878     3.041   0.161   0.053
                   100            0.562   0.519   0.925     4.072   0.499   0.123
0.90               50             0.0505  0.050   0.993     1.027   0.048   0.047
                   75             0.198   0.171   0.863     2.961   0.159   0.054
                   100            0.569   0.521   0.915     4.052   0.496   0.122
0.95               50             0.0505  0.0501  0.992     1.025   0.048   0.047
                   75             0.199   0.171   0.856     2.919   0.159   0.054
                   100            0.573   0.521   0.909     4.034   0.495   0.123
0.99               50             0.051   0.050   0.991     1.023   0.048   0.047
                   75             0.201   0.171   0.850     2.885   0.159   0.055
                   100            0.575   0.521   0.906     4.017   0.494   0.123
The results also reveal that in small samples, i.e., when the number of observations is equal to 50, HK and KS perform far better than LS, and that the difference between the two is minimal. On the other hand, when the sample sizes become larger, the ratios also become larger, which means that the performance of LS improves but never becomes superior to that of HK and KS.
Looking at the second part of the table, where the error variance is quite large, i.e., $\sigma^2 = 10$, we see that the HK method performs worse than LS, especially in large samples. Our suggested method, on the other hand, performs much better than both of the others. This result illustrates the good performance of our method and its robustness in situations where the other methods behave badly.
References
Hoerl, A. E., Kennard, R. W. (1970a). Ridge regression: biased estimation for non-orthogonal problems. Technometrics 12:55-67.
Hoerl, A. E., Kennard, R. W. (1970b). Ridge regression: applications to non-orthogonal problems. Technometrics 12:69-82.
Hoerl, A. E., Kennard, R. W., Baldwin, K. F. (1975). Ridge regression: some simulations. Commun. Statist. 4:105-123.
Kadiyala, K. (1984). A class of almost unbiased and efficient estimators of regression coefficients. Econ. Lett. 16:293-296.
McDonald, G. C., Galarneau, D. I. (1975). A Monte Carlo evaluation of some ridge-type estimators. J. Amer. Statist. Assoc. 70:407-416.
Singh, B., Chaubey, Y. P., Dwivedi, T. D. (1986). An almost unbiased ridge estimator. Sankhya B 48:342-346.
Vinod, H. D., Ullah, A. (1981). Recent Advances in Regression Methods. New York: Marcel Dekker.