Determination of the Selection Statistics and Best Significance Level in Backward Stepwise Logistic Regression

Qinggang Wang, John J. Koval, Catherine A. Mills & Kang-In David Lee

Communications in Statistics - Simulation and Computation (2007) 37:1, 62-72. DOI: 10.1080/03610910701723625
1. Introduction
A common problem in statistical analysis is the selection of those independent
(predictor) variables in a regression model that might influence the outcome
variable. The process of selecting a subset variables from a large number of variables
is called model-building. One of the purposes of model-building in logistic regression
is prediction. In epidemiologic studies, stepwise logistic regression has been widely
used for model-building. This procedure involves selection and stopping criteria,
and the stepwise approach is a method in which variables are selected either for
inclusion or exclusion from the model in a sequential fashion (Draper and Smith,
1998; Hosmer and Lemeshow, 2000). There are many variations on this approach
but the three main versions of the stepwise procedure are: forward selection (FS),
backward elimination (BE), and Efroymson’s procedure (a combination of FS and
BE; Efroymson, 1960). For the forward selection method, Lee and Koval (1997)
showed that the overall best α varied from 0.05 to 0.40. In this article, we consider
backward elimination. Three selection criteria, namely the Log-likelihood ratio
statistic, the Score statistic (Rao, 1973), and Wald's statistic (Wald, 1943), were used
with a standard stopping criterion, a χ² test based on a fixed α level; for example,
with α = 0.05 a variable is removed from the model only if the p-value of its exclusion
test exceeds 0.05. Levels of α can vary from α = 1 (all predictors in the logistic
model) to α = 0 (no predictors in the logistic model). The conventional value for α
has been 0.05. Monte Carlo simulations with a multivariate normal distribution for
the predictor variables were used to determine the best α level for the stopping
criterion and to compare the three selection criteria for backward stepwise logistic
regression in terms of the estimated true error rate of prediction (ERR).
2. Performance Criterion
In this study, our aim is to predict the outcome accurately; interpretation of
coefficients is secondary. The estimated true error rate of prediction (ERR) is the
performance criterion and is defined as
$$\widehat{\mathrm{ERR}} = \mathrm{ARR} + \hat{\omega},$$
where ARR is the apparent error rate of prediction and $\hat{\omega}$ denotes an estimate
of the bias of ARR. The apparent error rate (Hills, 1966) is estimated by the
resubstitution method; it tends to underestimate the true error rate because the
data are used twice (Glick, 1972, 1973).
There are several nonparametric methods of estimating bias including cross-
validation, jackknifing, and bootstrapping (Efron, 1983; Gong, 1986). However, for
all three nonparametric methods, estimates of bias must be computed at each step
in the backward stepwise procedure, and may require substantial computing time.
These methods of estimating the bias are not used in our study.
Efron (1986) derived a parametric estimator for the bias of ARR in the
general exponential family linear model, which includes logistic regression. Since
estimates of $\omega$ from Efron's formula may easily be calculated at each step of
backward stepwise logistic regression, Efron's estimator was used in this study.
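To make the criterion concrete, the following is a minimal Python sketch (not the paper's SAS/IML code) that computes ARR by resubstitution and adds a bias estimate. For simplicity the bias is estimated here by a parametric bootstrap, which is only a stand-in for Efron's (1986) closed-form estimator; the helper names (`fit_logistic`, `apparent_error_rate`, `bias_estimate`) are ours.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        # Newton step: (X' W X)^{-1} X' (y - p)
        beta += np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - p))
    return beta

def apparent_error_rate(X, y, beta):
    """ARR: misclassification rate when the fitted rule is reapplied to its own training data."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return np.mean((p >= 0.5).astype(int) != y)

def bias_estimate(X, y, beta, B=200, seed=0):
    """Parametric-bootstrap estimate of the bias (optimism) of ARR -- a stand-in
    for Efron's (1986) closed-form estimator used in the paper."""
    rng = np.random.default_rng(seed)
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    total = 0.0
    for _ in range(B):
        y_star = rng.binomial(1, p_hat)      # new responses drawn from the fitted model
        b_star = fit_logistic(X, y_star)     # refit on the bootstrap data
        arr_star = apparent_error_rate(X, y_star, b_star)
        # expected error of the refitted rule, judged under the fitted "truth" p_hat
        c_star = 1.0 / (1.0 + np.exp(-X @ b_star)) >= 0.5
        err_star = np.mean(np.where(c_star, 1.0 - p_hat, p_hat))
        total += err_star - arr_star
    return total / B

# ERR = ARR + bias estimate:
# beta = fit_logistic(X, y)
# ERR = apparent_error_rate(X, y, beta) + bias_estimate(X, y, beta)
```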
3. Selection Statistics

The selection statistics considered at each step are the Log-likelihood ratio statistic (LR),
Rao's Score statistic (SC), and Wald's statistic (WD).
Suppose that there are $k$ variables in the model and the $k$th variable is to
be considered for exclusion. Then the components of $\beta$ may be partitioned as
$\beta = (\beta_{k-1}, \beta_k)$, where $\beta_{k-1}$ collects the first $k-1$ coefficients. The hypotheses of
interest are $H_0\colon \beta_k = 0$ and $H_1\colon \beta_k \neq 0$. Let $\hat{\beta}_0$ denote the maximum likelihood
estimate under the null hypothesis and $\hat{\beta}_1$ the maximum likelihood estimate under
the alternative hypothesis, that is, $\hat{\beta}_0 = (\hat{\beta}_{k-1}, 0)$ and $\hat{\beta}_1 = (\hat{\beta}_{k-1}, \hat{\beta}_k)$.

The LR statistic is
$$\mathrm{LR} = 2\,[L(\hat{\beta}_1) - L(\hat{\beta}_0)] = 2\,[L(\hat{\beta}_{k-1}, \hat{\beta}_k) - L(\hat{\beta}_{k-1}, 0)],$$
and Wald's statistic is
$$\mathrm{WD} = \hat{\beta}_k\,[C_{k \times k}(\hat{\beta}_1)]^{-1}\,\hat{\beta}_k,$$
where $C_{k \times k}(\hat{\beta}_1)$ denotes the estimated variance of $\hat{\beta}_k$ evaluated at $\hat{\beta}_1$.
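A sketch of how the three exclusion statistics can be computed for $H_0\colon \beta_k = 0$, reusing the hypothetical `fit_logistic` from the earlier sketch. The score statistic is written in the standard Rao form $U'I^{-1}U$ evaluated at the null MLE, since the paper's own expression for SC is not reproduced above.

```python
import numpy as np
from scipy.stats import chi2

def loglik(X, y, beta):
    """Logistic log-likelihood L(beta)."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

def exclusion_tests(X, y, k):
    """LR, SC, and WD statistics (with chi-square p-values, 1 d.f.) for
    H0: beta_k = 0, where column k of X is the candidate for removal."""
    b1 = fit_logistic(X, y)                                           # MLE under H1
    b0 = np.insert(fit_logistic(np.delete(X, k, axis=1), y), k, 0.0)  # MLE under H0

    # Log-likelihood ratio: LR = 2[L(beta_1) - L(beta_0)]
    LR = 2.0 * (loglik(X, y, b1) - loglik(X, y, b0))

    # Rao's score statistic at the null MLE: U' I^{-1} U
    p0 = 1.0 / (1.0 + np.exp(-X @ b0))
    U = X.T @ (y - p0)                           # score vector (zero except entry k)
    I = (X * (p0 * (1.0 - p0))[:, None]).T @ X   # Fisher information
    SC = U @ np.linalg.solve(I, U)

    # Wald's statistic: beta_k^2 divided by its estimated variance at beta_1
    p1 = 1.0 / (1.0 + np.exp(-X @ b1))
    C = np.linalg.inv((X * (p1 * (1.0 - p1))[:, None]).T @ X)
    WD = b1[k] ** 2 / C[k, k]

    return {name: (s, chi2.sf(s, df=1)) for name, s in
            {"LR": LR, "SC": SC, "WD": WD}.items()}
```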
$$\lambda_i = a\,V^{\,i-1} + 0.1 \quad \text{for } i = 1, 2, \ldots, P,$$
where
$$a = \begin{cases} 0.9\,P\,(1 - V)/(1 - V^P) & \text{if } 0 < V < 1,\\ 0.9 & \text{if } V = 1, \end{cases}$$
and
$$\beta_i^{*} = \left(b\,D^{\,i-1}\right)^{1/2} \quad \text{for } i = 1, 2, \ldots, P \text{ and } 0 < D \le 1,$$
where
$$b = \begin{cases} M\,(1 - D)/(1 - D^P) & \text{if } 0 < D < 1,\\ M/P & \text{if } D = 1. \end{cases}$$
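Under the reconstruction above, both patterns are easy to compute: the scaling constant $a$ makes the $\lambda_i$ sum to $P$, and $b$ makes the squared $\beta_i^{*}$ sum to $M$. A small sketch follows; the function and variable names (`eigen_pattern`, `coef_pattern`, `lam`, `beta_star`) are our labels, not the paper's.

```python
import numpy as np

def eigen_pattern(P, V):
    """lambda_i = a * V**(i-1) + 0.1, with a chosen so that sum(lambda) = P."""
    a = 0.9 if V == 1 else 0.9 * P * (1 - V) / (1 - V ** P)
    i = np.arange(1, P + 1)
    return a * V ** (i - 1) + 0.1

def coef_pattern(P, M, D):
    """beta*_i = sqrt(b * D**(i-1)), with b chosen so that sum(beta*^2) = M."""
    b = M / P if D == 1 else M * (1 - D) / (1 - D ** P)
    i = np.arange(1, P + 1)
    return np.sqrt(b * D ** (i - 1))

# Center point of the design in Table 1: P = 15, V = 0.6, M = 2.0, D = 0.625
lam = eigen_pattern(15, 0.6)               # lam.sum() is approximately 15.0
beta_star = coef_pattern(15, 2.0, 0.625)   # (beta_star**2).sum() is approximately 2.0
```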
Table 1
Values of the factors in the response surface design for the multivariate normal case

Factor   Low star (−2)   Low factorial (−1)   Center (0)   High factorial (1)   High star (2)
P        5               10                   15           20                   25
V        0.2             0.4                  0.6          0.8                  1.0
M        1.0             1.5                  2.0          2.5                  3.0
D        0.375           0.5                  0.625        0.75                 0.875
N        100             200                  300          400                  500
Table 2
An illustrative example of backward stepwise logistic regression for one sampling situation: P = 5, V = 0.6, M = 2.0, D = 0.625, and N = 300

            Full model   Removal Step 1    Removal Step 2    Removal Step 3    Removal Step 4    Removal Step 5
Predictor   coeff.       coeff.   p-value  coeff.   p-value  coeff.   p-value  coeff.   p-value  coeff.   p-value
X1          0.8923       0.8923   <0.001   0.8886   <0.001   0.8947   <0.001   0.9055   <0.001   1.1274   <0.001
X2          0.6576       0.6576   <0.001   0.6519   <0.001   0.6587   <0.001   0.7960   <0.001
X3          0.2141       0.2141   0.125    0.2228   0.046    0.2997   0.016
X4          0.1129       0.1129   0.410
X5          0.0944       0.0944   0.398    0.1587   0.202
Variable
removed                  X4                X5                X3                X2                X1
One sample generated from this factor combination was investigated to determine
the best α level for the χ² stopping criterion, with the Log-likelihood ratio
criterion used as the selection criterion. As this example shows, the full model does
not have better prediction performance than the reduced models: the minimal ERR,
0.2278, occurred when X4 was removed from the full model. In this case, the best α
would be 0.40, the next possible α-value below X4's p-value of 0.410.
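The removal steps in Table 2 follow directly from the stopping rule. Below is a minimal sketch of backward elimination at a fixed α with the LR exclusion test, reusing the hypothetical `fit_logistic` and `exclusion_tests` helpers from the earlier sketches.

```python
def backward_eliminate(X, y, alpha, keep=(0,)):
    """Backward stepwise elimination with a chi-square stopping rule at level alpha:
    at each step, refit and drop the variable with the largest removal p-value,
    stopping once every remaining p-value is <= alpha. Columns listed in `keep`
    (here the intercept, column 0) are never candidates for removal."""
    cols = list(range(X.shape[1]))
    while True:
        candidates = [c for c in cols if c not in keep]
        if not candidates:
            break
        Xc = X[:, cols]
        # removal p-value (LR test) for each candidate, as in Table 2
        pvals = {c: exclusion_tests(Xc, y, cols.index(c))["LR"][1] for c in candidates}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= alpha:
            break           # all remaining variables are significant at level alpha
        cols.remove(worst)  # one removal step
    return cols
```

In the run of Table 2, α = 0.40 stops after Removal Step 1: X4 is dropped (p = 0.410 > 0.40), and the largest remaining p-value at Step 2 is 0.202 ≤ 0.40.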
Table 3
The best α levels for 48 sampling situations in the multivariate normal case for LR

Sampling                     Sampling
situation   ERR      α       situation   ERR      α
 1          0.2633   0.316   25          0.2652   0.318
 2          0.2582   0.263   26          0.2648   0.276
 3          0.2657   0.318   27          0.2688   0.340
 4          0.2571   0.240   28          0.2658   0.272
 5          0.2085   0.303   29          0.2123   0.311
 6          0.2056   0.251   30          0.2096   0.269
 7          0.2095   0.298   31          0.2148   0.343
 8          0.2077   0.259   32          0.2119   0.269
 9          0.2646   0.307   33          0.2350   0.263
10          0.2593   0.239   34          0.2351   0.280
11          0.2699   0.303   35          0.2361   0.252
12          0.2599   0.270   36          0.2359   0.256
13          0.2112   0.300   37          0.2358   0.281
14          0.2059   0.259   38          0.2353   0.288
15          0.2134   0.317   39          0.2381   0.403
16          0.2071   0.259   40          0.2355   0.272
17          0.2674   0.317   41          0.3000   0.285
18          0.2640   0.248   42          0.2329   0.269
19          0.2692   0.322   43          0.2246   0.256
20          0.2649   0.266   44          0.2315   0.240
21          0.2112   0.303   45          0.2378   0.292
22          0.2094   0.262   46          0.1881   0.274
23          0.2139   0.299   47          0.2357   0.262
24          0.2100   0.235   48          0.2361   0.262
To examine the effect of EPV (events per variable), we conducted two sub-studies,
one for P = 5 and one for P = 25, with the other four factors taking the same values
as in the full study, as indicated in the last four rows of Table 1. For P = 5, the
sample size N was chosen to be 40, 50, 100, 200, and 400, and for P = 25 it was
200, 250, 500, 1,000, and 2,000, so that the EPV ratios are 1:4, 1:5, 1:10, 1:20, and
1:40 for both P = 5 and P = 25. We again used the second-order central composite
response surface design strategy for these two sub-studies.
Table 4
Response surface analysis of the best α levels for P, V, M, D, and N in the multivariate normal case for LR

Parameter   d.f.   β̂          s.e.      t-value   p-value
P           1      −0.0222    0.0045    −5.00     <.0001
V           1      −0.1437    0.1112    −1.29     0.2074
M           1      −0.0689    0.0494    −1.40     0.1740
D           1      −0.2008    0.2199    −0.91     0.3692
N           1      5.76E−5    0.0002    0.26      0.7975
P²          1      0.0005     9.24E−5   5.73      <.0001
V²          1      0.0822     0.0577    1.42      0.1659
M²          1      0.0109     0.0092    1.19      0.2463
D²          1      −0.0528    0.1478    −0.36     0.7235
N²          1      −2.38E−7   2.31E−7   −1.03     0.3124
PV          1      −0.0019    0.0023    −0.85     0.4001
PM          1      0.0007     0.0009    0.77      0.4484
PD          1      0.0012     0.0036    0.34      0.7390
PN          1      −2.21E−6   4.50E−6   −0.49     0.6269
VM          1      −0.0077    0.0225    −0.34     0.7338
VD          1      0.1518     0.0900    1.69      0.1034
VN          1      2.62E−5    0.0001    0.23      0.8179
MD          1      0.0407     0.0360    1.13      0.2683
MN          1      −3.68E−5   4.50E−5   −0.82     0.4206
DN          1      0.0003     0.0002    1.93      0.0644

Factor   d.f.   SSR      MSE      F-value   p-value
P        6      0.0419   0.0070   43.02     <.0001
V        6      0.0012   0.0002   1.25      0.3107
M        6      0.0009   0.0001   0.93      0.4891
D        6      0.0018   0.0003   1.84      0.1266
N        6      0.0016   0.0003   1.64      0.1733

Lack-of-fit test: F(22, 5) = 0.67 (p-value = 0.7683).
five factors P, V, M, D, and N on the best α level; and (3) to compare the three
selection criteria: LR, SC, and WD. SAS/IML 9.1 was used for all the computer
programs (SAS/IML Software, 2004; SAS/STAT, 2004).
Table 3 gives the best α levels of the Log-likelihood ratio criterion for 48 sampling
situations, where the best α level for each sampling situation is defined as the mean
of the α levels for which the ERR is a minimum over 300 replications. The best α
level from LR lies between 0.24 and 0.40, identical to that of the Score test. However,
the best α level from Wald's test differs from those of LR and SC, with 0.24 < α < 0.31.
The results of the analysis of the response surface design for the Log-likelihood
ratio criterion are given in Table 4. It was assumed that the effects of third-order
and fourth-order interactions could be ignored (Montgomery, 2005). Since
the lack-of-fit test is not significant (p-value = 0.7683), it may be concluded that
the quadratic surface fits the data well. The main effect of factor P is statistically
significant (p < 0.0001). Figure 1 (the best α level against P for the three selection
criteria) shows that the best α for the Score test is almost identical to that for the
Log-likelihood ratio test, but not to that for Wald's test.
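The response surface model in Table 4 is an ordinary quadratic regression of the best α on the five factors. A minimal sketch, assuming the 48 best-α values `alpha_best` and the 48 × 5 matrix `Z` of factor levels are available as arrays (both names are ours):

```python
import numpy as np
from itertools import combinations

def quadratic_design(Z):
    """Expand an n x 5 factor matrix into the full second-order model:
    intercept, 5 linear terms, 5 pure quadratics, and 10 two-way interactions
    (21 columns, matching the 20 parameters plus intercept in Table 4)."""
    n, k = Z.shape
    cols = [np.ones(n)]
    cols += [Z[:, j] for j in range(k)]                                 # P, V, M, D, N
    cols += [Z[:, j] ** 2 for j in range(k)]                            # P^2, ..., N^2
    cols += [Z[:, i] * Z[:, j] for i, j in combinations(range(k), 2)]   # PV, ..., DN
    return np.column_stack(cols)

# Least-squares fit of the quadratic surface:
# coef, *_ = np.linalg.lstsq(quadratic_design(Z), alpha_best, rcond=None)
```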
The results obtained from the two sub-studies with P = 5 and P = 25 showed that
EPV does not have a strong effect on the choice of the best α. In the response surface
analysis, although the effect of factor N is significant in the P = 25 case, the ranges
of best α values for P = 5 and P = 25 are very narrow: 0.35 ≤ α ≤ 0.38 and
0.23 ≤ α ≤ 0.27, respectively. This confirms the finding of Derksen and Keselman
(1992) that the sample size (or the number of events) is of little practical importance
for selection algorithms. However, as suggested by Hosmer and Lemeshow (2000)
and Peduzzi et al. (1996), we still recommend the use of EPV values of 10 or greater
in logistic regression.
Figure 1. The best α levels over the number of independent (predictor) variables P
for the LR, SC, and WD selection criteria in the multivariate normal case.
Acknowledgments
This study was supported by a postgraduate scholarship from the Natural Sciences
and Engineering Research Council (Canada), and by grants from the National
Cancer Institute of Canada (015046), and the Natural Sciences and Engineering
Research Council (Canada) (9280-03). We thank the editor and referees for their
comments which significantly improved the quality of this article.
References
Ambler, G., Brady, A. R., Royston, P. (2002). Simplifying a prognostic model: a simulation
study based on clinical data. Statistics in Medicine 21:3803–3822.
Bendel, R. B., Afifi, A. A. (1977). Comparison of stopping rules in forward stepwise
regression. Journal of the American Statistical Association 72(357):46–53.
Cochran, W. G., Cox, G. M. (1957). Experimental Designs. 2nd ed. New York: Wiley.
Costanza, M. C., Afifi, A. A. (1979). Comparison of stopping rules in forward stepwise
discriminant analysis. Journal of the American Statistical Association 74(368):777–785.
Derksen, S., Keselman, H. J. (1992). Backward, forward and stepwise automated subset
selection algorithms: frequency of obtaining authentic and noise variables. British Journal
of Mathematical and Statistical Psychology 45:265–282.
Draper, N. R., Smith, H. (1998). Applied Regression Analysis. 3rd ed. New York: Wiley.
Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-
validation. Journal of the American Statistical Association 78:316–331.
Efron, B. (1986). How biased is the apparent error rate of a prediction rule? Journal of the
American Statistical Association 81(394):461–470.
Efroymson, M. A. (1960). Multiple regression analysis. In: Ralston, A., Wilf, H. S., eds.
Mathematical Methods for Digital Computers. New York: Wiley.
Glick, N. (1972). Sample-based classification procedures derived from density estimators.
Journal of the American Statistical Association 67:116–122.
Glick, N. (1973). Sample-based multinomial classification. Biometrics 29(1):241–256.
Gong, G. (1986). Cross-validation, the jackknife, and the bootstrap: excess error
estimation in forward logistic regression. Journal of the American Statistical Association
81(393):108–113.
Harrell, F. E., Lee, K. L., Califf, R. M., Pryor, D. B., Rosati, R. A. (1984). Regression
modelling strategies for improved prognostic prediction. Statistics in Medicine 3:143–152.
Hauck, W. W., Donner, A. (1977). Wald’s test as applied to hypotheses in logit analysis.
Journal of the American Statistical Association 72(360):851–853.
Hills, M. (1966). Allocation rules and their error rates. Journal of the Royal Statistical Society
Series B 28:1–20.
Hosmer, D. W., Lemeshow, S. (2000). Applied Logistic Regression. 2nd ed. New York: Wiley.
Lee, K. I., Koval, J. J. (1997). Determination of the best significance level in forward stepwise
logistic regression. Communications in Statistics—Simulation and Computation 26(2):559–575.
Montgomery, D. C. (2005). Design and Analysis of Experiments. 6th ed. New York: Wiley.
Peduzzi, P., Concato, J., Kemper, E., Holford, T. R., Feinstein, A. R. (1996). A simulation
study of the number of events per variable in logistic regression analysis. Journal of
Clinical Epidemiology 49:1503–1510.
Qaqish, B. F. (2003). A family of multivariate binary distributions for simulating correlated
binary variables with specified marginal means and correlations. Biometrika 90:455–463.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications. 2nd ed. New York: Wiley.
Roecker, E. B. (1991). Prediction error and its estimation for subset-selected models.
Technometrics 33:459–468.
SAS/IML Software. (2004). User's Guide, Version 9.1. Cary, NC: SAS Institute Inc.
SAS/STAT. (2004). User's Guide, Version 9.1. Cary, NC: SAS Institute Inc.
Steyerberg, E. W., Eijkemans, M. J. C., Harrell, F. E., Jr., Habbema, J. D. F. (2000).
Prognostic modeling with logistic regression analysis: a comparison of selection and
estimation methods in small data sets. Statistics in Medicine 19:1059–1079.
Steyerberg, E. W., Eijkemans, M. J. C., Harrell, F. E., Jr., Habbema, J. D. F. (2001).
Prognostic modeling with logistic regression analysis: in search of a sensible strategy in
small data sets. Medical Decision Making 21:45–56.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the
number of observations is large. Transactions of the American Mathematical Society
54:426–482.