
Communications in Statistics - Simulation and

Computation

ISSN: 0361-0918 (Print) 1532-4141 (Online) Journal homepage: https://www.tandfonline.com/loi/lssp20

Determination of the Selection Statistics and Best
Significance Level in Backward Stepwise Logistic
Regression

Qinggang Wang, John J. Koval, Catherine A. Mills & Kang-In David Lee

To cite this article: Qinggang Wang, John J. Koval, Catherine A. Mills & Kang-In David Lee (2007)
Determination of the Selection Statistics and Best Significance Level in Backward Stepwise
Logistic Regression, Communications in Statistics - Simulation and Computation, 37:1, 62-72,
DOI: 10.1080/03610910701723625

To link to this article: https://doi.org/10.1080/03610910701723625

Published online: 03 Jan 2008.

Communications in Statistics—Simulation and Computation® , 37: 62–72, 2008
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0918 print/1532-4141 online
DOI: 10.1080/03610910701723625

Regression Analysis

Determination of the Selection Statistics
and Best Significance Level in Backward
Stepwise Logistic Regression

QINGGANG WANG¹, JOHN J. KOVAL¹,
CATHERINE A. MILLS¹, AND KANG-IN DAVID LEE²

¹Department of Epidemiology and Biostatistics, University of Western Ontario, London, Ontario, Canada
²ApoPharma Inc., Toronto, Ontario, Canada

The process of building a subset model in backward stepwise logistic regression
for the purpose of prediction relies on two separate criteria: selection criteria
and stopping criteria. SAS/IML programs were written to provide Monte Carlo
simulations to determine the best α level for the χ² stopping criterion and to
compare three selection criteria: the Log-likelihood ratio statistic (LR), the Score
statistic (SC), and Wald's statistic (WD). Performance was evaluated using Efron's
(1986) estimated true error rate of prediction. In our study, we found that the best
α varied between 0.24 and 0.40. For the selection criteria LR and SC, the best α
significantly decreased with the number of predictor variables, but for WD it did
not. An overall recommendation is that LR or SC should be used as the selection
criterion, together with a stopping criterion of 0.20 ≤ α ≤ 0.40, with the further
refinement that, with fewer variables, one should use a larger α level.

Keywords Backward stepwise; Error rate; Logistic regression; Selection criteria;
Stopping criteria.

Mathematics Subject Classification 62H30; 62J12.

1. Introduction
A common problem in statistical analysis is the selection of those independent
(predictor) variables in a regression model that might influence the outcome
variable. The process of selecting a subset of variables from a large number of variables
is called model-building. One of the purposes of model-building in logistic regression
is prediction. In epidemiologic studies, stepwise logistic regression has been widely

Received November 6, 2006; Accepted April 27, 2007


Address correspondence to Qinggang Wang, Department of Epidemiology and
Biostatistics, University of Western Ontario, London, Ontario, Canada; E-mail: qingwang@
cancerboard.ab.ca


used for model-building. This procedure involves selection and stopping criteria,
and the stepwise approach is a method in which variables are selected either for
inclusion or exclusion from the model in a sequential fashion (Draper and Smith,
1998; Hosmer and Lemeshow, 2000). There are many variations on this approach
but the three main versions of the stepwise procedure are: forward selection (FS),
backward elimination (BE), and Efroymson’s procedure (a combination of FS and
BE; Efroymson, 1960). For the forward selection method, Lee and Koval (1997)
showed that the overall best α varied from 0.05–0.40. In this article, we consider
backward elimination. Three selection criteria, namely the Log-likelihood ratio
statistic, the Score statistic (Rao, 1973), and Wald's statistic (Wald, 1943), were used
with a standard stopping criterion, a χ² test based on a fixed α level. Levels of α can
vary from α = 1 (all predictors in the logistic model) to α = 0 (no predictors in the
logistic model). The usual conventional value for α has been 0.05. Monte Carlo
simulations with a multivariate normal distribution for the predictor variables were
used to determine the best α level for the stopping criterion and to compare the three
selection criteria for backward stepwise logistic regression in terms of the estimated
true error rate of prediction (ERR).

2. Performance Criterion
In this study, our aim is to predict the outcome accurately; interpretation of
coefficients is secondary. The estimated true error rate of prediction (ERR) is the
performance criterion and is defined as

ERR = ARR + ω̂

where ARR is the apparent error rate of prediction and ω̂ denotes an estimate
of the bias of ARR. The apparent error rate (Hills, 1966) is estimated by the
resubstitution method; this tends to underestimate the true error rate because the
data are used twice (Glick, 1972, 1973).
There are several nonparametric methods of estimating bias including cross-
validation, jackknifing, and bootstrapping (Efron, 1983; Gong, 1986). However, for
all three nonparametric methods, estimates of bias must be computed at each step
in the backward stepwise procedure, and may require substantial computing time.
These methods of estimating ω are not used in our study.
Efron (1986) derived a parametric estimator for the bias of ARR in the
general exponential family linear model, including logistic regression. Since
estimates of ω using Efron's formula may easily be calculated at each step of
backward stepwise logistic regression, Efron's estimator was used in this study.
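As a minimal sketch of the resubstitution step, ARR can be computed from fitted probabilities and observed outcomes; the numbers below are hypothetical, and Efron's bias formula ω̂ itself is not reproduced here.

```python
# Apparent error rate (ARR) by resubstitution: classify each observation
# with the model fitted to the same data, using the cutoff 1/2, and count
# misclassifications. The fitted probabilities here are hypothetical.
p_hat = [0.81, 0.35, 0.62, 0.12, 0.55, 0.44]  # fitted Pr(Y = 1 | X_i)
y_obs = [1,    0,    0,    0,    1,    1]     # observed outcomes

y_pred = [1 if p > 0.5 else 0 for p in p_hat]
arr = sum(yp != yo for yp, yo in zip(y_pred, y_obs)) / len(y_obs)
print(arr)  # 2 of 6 misclassified
```

ERR would then be ARR plus an estimate ω̂ of its optimism, obtained from Efron's (1986) formula.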

3. Selection and Stopping Criteria


Once the maximum likelihood estimates have been obtained for a set of data, a
removal statistic based on three selection criteria is computed for each predictor
variable eligible for removal from the model. The predictor with the smallest value
of the removal statistic is removed from the model and the process is repeated until
the procedure stops. In this section, three test statistics, used to eliminate predictor
variables from the backward stepwise logistic regression model, will be discussed.

These statistics are the Log-likelihood ratio statistic (LR), Rao’s Score statistic (SC),
and Wald’s statistic (WD).
Suppose that there are k variables in the model and the kth variable is to
be considered for exclusion. Then the components of β may be partitioned as
β = (β'_{k−1}, β_k)'. The hypotheses of interest are H_0: β_k = 0 and H_1: β_k ≠ 0. Let β̂_0
denote the maximum likelihood estimate under the null hypothesis, and β̂_1 denote
the maximum likelihood estimate under the alternative hypothesis, that is, β̂_0 =
(β̂'_{k−1}, 0)' and β̂_1 = (β̂'_{k−1}, β̂_k)'.
The LR is

LR = 2[L(β̂_1) − L(β̂_0)]
   = 2[L(β̂_{k−1}, β̂_k) − L(β̂_{k−1}, 0)]

where L is the Log-likelihood function. The Score statistic is

SC = U'(β̂_0) I^{−1}(β̂_0) U(β̂_0)
   = U'(β̂_{k−1}, 0) I^{−1}(β̂_{k−1}, 0) U(β̂_{k−1}, 0)

where U(β) = ∂L/∂β is the efficient score vector, and I(β) = −E[∂U(β)/∂β'] is Fisher's
information matrix. Wald's statistic (WD) is defined as

WD = β̂_k C_{kk}^{−1}(β̂_1) β̂_k

where C(β̂) is the variance–covariance matrix of β̂ and C_{kk} is the element
corresponding to β_k.


These three selection criteria were used with the standard stopping criterion, the χ²
test, in backward stepwise logistic regression; 19 possible values of α were considered,
from 0.95 down to 0.05 in steps of 0.05. At each α, ERR was calculated and the
minimum value of ERR was determined over all α's. The α giving the minimum value
of ERR was denoted α_m. The best α level is defined as the mean of the α_m's over
all 300 replications.
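A minimal NumPy sketch of the three removal statistics for a single candidate variable, assuming a hand-rolled Newton–Raphson (IRLS) fit and an illustrative simulated dataset; the sample size and coefficients below are our assumptions, not values from the paper.

```python
import numpy as np

def fit_logistic(X, y, iters=30):
    """Maximum likelihood fit by Newton-Raphson (equivalently, IRLS)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        W = p * (1.0 - p)
        b = b + np.linalg.solve((X.T * W) @ X, X.T @ (y - p))
    return b

def loglik(X, y, b):
    eta = X @ b
    return y @ eta - np.logaddexp(0.0, eta).sum()

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
p_true = 1.0 / (1.0 + np.exp(-(0.2 + 0.8 * X[:, 1] + 0.5 * X[:, 2])))
y = (rng.random(n) < p_true).astype(float)

# Consider excluding the last (kth) variable: H0: beta_k = 0.
b1 = fit_logistic(X, y)                         # beta_hat_1 (alternative)
b0 = np.append(fit_logistic(X[:, :2], y), 0.0)  # beta_hat_0 = (beta_hat_{k-1}', 0)'

# LR = 2[L(beta_hat_1) - L(beta_hat_0)]
LR = 2.0 * (loglik(X, y, b1) - loglik(X, y, b0))

# SC = U'(beta_hat_0) I^{-1}(beta_hat_0) U(beta_hat_0)
p0 = 1.0 / (1.0 + np.exp(-X @ b0))
U = X.T @ (y - p0)
I = (X.T * (p0 * (1.0 - p0))) @ X
SC = U @ np.linalg.solve(I, U)

# WD = beta_hat_k^2 / C_kk, with C the covariance matrix of beta_hat_1
p1 = 1.0 / (1.0 + np.exp(-X @ b1))
C = np.linalg.inv((X.T * (p1 * (1.0 - p1))) @ X)
WD = b1[2] ** 2 / C[2, 2]

print(LR, SC, WD)  # three asymptotically equivalent chi-square(1) statistics
```

With a clearly non-null coefficient, all three statistics are large; in the stepwise procedure the variable with the smallest removal statistic would be dropped.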

4. Monte Carlo Experimental Design


Real data sets are limited in their usefulness in that they restrict consideration of
the number of predictor variables in the model, the distribution of dependent and
predictor variables, and the effect of different sample sizes. A unique set of data may
also influence conclusions; hence the generalizability of the results may be limited.
Simulation studies can be used to obtain results over a wide range of sampling
situations.
The general design of the simulation experiments used in this article is described
by the following four steps.
Step 1. Generation of predictor variables, X_i, i = 1, 2, …, N.
X_i | Y = 0, i = 1, 2, …, n_0, and X_i | Y = 1, i = 1, 2, …, n_1, are generated from
populations π_0 and π_1, respectively. Note that n_0 = n_1 = N/2.
Step 2. Computation of β̂, the maximum likelihood estimates of β.
β̂_i, i = 0, 1, 2, …, k, are obtained via iteratively reweighted least squares.
Step 3. Computation of estimated logistic probabilities, Pr(Y = 1 | X_i) = π̂_i(X).
π̂_i(X), i = 1, 2, …, N, are computed using X_i from Step 1 and β̂ from Step 2.

Step 4. Generation of the predicted dependent variable, Ŷ_i.
Ŷ_i is equal to 0 if π̂_i(X) ≤ 1/2 and is equal to 1 if π̂_i(X) > 1/2.
In this study, Steps 1–4 were repeated 300 times.

4.1. Generation of Multivariate Normal Predictor Variables


Multivariate normal variables were generated using the reparameterization method
employed by Bendel and Afifi (1977), Costanza and Afifi (1979), and Lee and
Koval (1997). We assume that X ~ N_P(0, Σ) in population π_0 and X ~ N_P(μ, Σ) in
population π_1. Thus, μ and Σ are reparameterized in terms of four factors, P, V, M,
and D, while a fifth factor, N, determines the sample sizes.
The first factor, P, is the number of predictor variables in the model. Given a
value of P, we then specify a value for the second factor, V ∈ (0, 1], and determine
the eigenvalues λ_i of Σ by means of the expression

λ_i = aV^{i−1} + ε   for i = 1, 2, …, P

where

a = 0.9P(1 − V)/(1 − V^P)   if 0 < V < 1,
a = 1 − ε                   if V = 1.

A value of ε = 0.1 was chosen as a lower bound on the smallest eigenvalue
λ_P in order to avoid the numerical difficulties encountered with nearly singular
matrices. The eigenvalues reflect the degree of interdependence among the predictor
variables; the predictor variables are highly interdependent near V = 0 and mutually
independent at V = 1. Since Σ = EΛE', where E is the matrix of eigenvectors of Σ
and Λ is the diagonal matrix of eigenvalues λ_i, once the values of λ_i have been
specified, a random orthogonal matrix E can be generated and used to create Σ.
 
The third factor, M, is the Mahalanobis distance between π_0 and π_1. It may
be expressed as M = Δ² = μ'Σ^{−1}μ, and describes the separation between the two
populations. The fourth factor, D, determines the elements δ_i of the vector δ. As
D varies from 0 to 1, the rate of increase in M decreases as the number of included
variables increases from 1 to P. Let

δ_i = (bD^{i−1})^{1/2}   for i = 1, 2, …, P and 0 < D ≤ 1

where

b = M(1 − D)/(1 − D^P)   if 0 < D < 1,
b = M/P                  if D = 1.

Elements μ_i of μ were then obtained from μ = Rδ, where Σ = RR' is the
Cholesky decomposition of Σ.
To generate a P-variate observation X from N_P(μ, Σ), P independent N(0, 1)
values Z are first generated. The vector Z is then transformed to the required
vector X by X = μ + RZ.
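A NumPy sketch of this generation scheme, using the factor values of the sampling situation in Sec. 4.3; generating the random orthogonal matrix via a QR decomposition is an implementation choice on our part, not specified in the paper.

```python
import numpy as np

P, V, M, D, eps = 5, 0.6, 2.0, 0.625, 0.1
rng = np.random.default_rng(42)

# Eigenvalues lambda_i = a V^(i-1) + eps, with a chosen so trace(Sigma) = P
a = 0.9 * P * (1 - V) / (1 - V**P) if V < 1 else 1 - eps
lam = a * V ** np.arange(P) + eps

# Random orthogonal E from the QR decomposition of a Gaussian matrix
E, _ = np.linalg.qr(rng.standard_normal((P, P)))
Sigma = E @ np.diag(lam) @ E.T

# delta_i = (b D^(i-1))^(1/2), with b chosen so that delta'delta = M
b = M * (1 - D) / (1 - D**P) if D < 1 else M / P
delta = np.sqrt(b * D ** np.arange(P))

# mu = R delta, where Sigma = R R' is the Cholesky decomposition
R = np.linalg.cholesky(Sigma)
mu = R @ delta

# One P-variate draw from population pi_1: X = mu + R Z
Z = rng.standard_normal(P)
X = mu + R @ Z

# The Mahalanobis distance mu' Sigma^-1 mu recovers the factor M exactly
print(mu @ np.linalg.solve(Sigma, mu))  # 2.0 (up to rounding)
```

Because μ'Σ^{−1}μ = δ'δ = b(1 − D^P)/(1 − D) = M, the construction reproduces the specified separation regardless of the random orthogonal matrix drawn.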

4.2. Second-Order Central Response Surface Design


The levels of the five factors P, V, M, D, and N must be specified. In this study, a second-
order central response surface design was used (Cochran and Cox, 1957) in a
manner similar to that used by Bendel and Afifi (1977), Costanza and Afifi (1979),
Lee and Koval (1997), and Roecker (1991). This design allows the fit of a
model with linear and quadratic effects in all factors and first-order interactions
between all factors. In addition, this design permitted the examination of five levels
of all five factors using only 48 sampling situations (combinations of levels). The
alternative of a 3⁵ factorial design would evaluate only 3 levels of each factor yet
demand 243 sampling situations.
In the experimental design, each factor has five levels which, for convenience,
are taken to be spaced equally on suitable scales and coded as −2, −1, 0, +1, +2.
These five levels are termed low star, low factorial, center, high factorial, and high
star, respectively (Lee and Koval, 1997). The experimental design consists of 48
sampling situations of three types: (1) there are 2⁵ = 32 factorial points, which are
all possible combinations of the levels ±1 for each factor; (2) there are six center
points of the form (0, 0, 0, 0, 0); and (3) there are 10 star points of the form
(−2, 0, 0, 0, 0), …, (0, 0, 0, 0, +2). The levels of the five factors in the 48 sampling
situations used in this study are presented in Table 1.
The values of P from 5–25 represent a range from a small number of variables to
a large number of variables. This range was used by Lee and Koval (1997). The factor
V ranges from 0.2 (highly dependent) to 1.0 (perfectly independent) which provides
reasonable coverage of the possible dependence structures of the predictor variables
(Costanza and Afifi, 1979; Lee and Koval, 1997). The factor M ranges from 1.0–3.0
and represents close to well-separated distances between populations (Lee and Koval,
1997). The range of levels for the factor D from 0.375 (fast rate) to 0.875 (slow rate)
provides fair coverage of the rates of increase in M (Costanza and Afifi, 1979). N ,
which ranges from 100–500, represents a small sample to a large one. Also note that
two equal-sized samples were drawn from each population, that is, n0 = n1 = N/2.
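The 48 coded points, and their decoding to factor values using the centers and step sizes implied by Table 1, can be enumerated directly; the decode helper below is illustrative, not code from the paper.

```python
from itertools import product

# 32 factorial points (+/-1), 6 replicated center points, 10 star points
factorial = list(product((-1, 1), repeat=5))
center = [(0, 0, 0, 0, 0)] * 6
star = []
for i in range(5):
    for c in (-2, 2):
        pt = [0] * 5
        pt[i] = c
        star.append(tuple(pt))

design = factorial + center + star
print(len(design))  # 48 sampling situations

# Decode a coded point to (P, V, M, D, N) via center + step * code per factor
centers = (15, 0.6, 2.0, 0.625, 300)
steps = (5, 0.2, 0.5, 0.125, 100)
def decode(pt):
    return tuple(c + s * x for c, s, x in zip(centers, steps, pt))

print(decode((-2, 0, 0, 0, 0)))  # (5, 0.6, 2.0, 0.625, 300): low-star P
```

The center and step values are read off Table 1 (e.g., P runs 5, 10, 15, 20, 25, so its center is 15 and its step is 5), so they are derived rather than quoted.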

4.3. Illustration of the Algorithm by One Sampling Situation


Table 2 illustrates the backward elimination method for one of the 48 sampling
situations: P = 5, V = 0.6, M = 2.0, D = 0.625, and N = 300. A generated data set

Table 1
Values of the factors in the response surface design for the
multivariate normal case
Factor   Low star (−2)   Low factorial (−1)   Center (0)   High factorial (+1)   High star (+2)
P 5 10 15 20 25
V 0.2 0.4 0.6 0.8 1.0
M 1.0 1.5 2.0 2.5 3.0
D 0.375 0.5 0.625 0.75 0.875
N 100 200 300 400 500
Table 2
An illustrative example of backward stepwise logistic regression for one sampling situation: P = 5, V = 0.6,
M = 2.0, D = 0.625, and N = 300

Predictor         | Full model coeff. | Step 1 coeff. (p)  | Step 2 coeff. (p)  | Step 3 coeff. (p)  | Step 4 coeff. (p)  | Step 5 coeff. (p)
X1                | 0.8923            | 0.8923 (<0.001)    | 0.8886 (<0.001)    | 0.8947 (<0.001)    | 0.9055 (<0.001)    | 1.1274 (<0.001)
X2                | 0.6576            | 0.6576 (<0.001)    | 0.6519 (<0.001)    | 0.6587 (<0.001)    | 0.7960 (<0.001)    |
X3                | 0.2141            | 0.2141 (0.125)     | 0.2228 (0.046)     | 0.2997 (0.016)     |                    |
X4                | 0.1129            | 0.1129 (0.410)     |                    |                    |                    |
X5                | 0.0944            | 0.0944 (0.398)     | 0.1587 (0.202)     |                    |                    |
Variable removed  |                   | X4                 | X5                 | X3                 | X2                 | X1
ERR               | 0.2355            | 0.2278             | 0.2316             | 0.2357             | 0.2548             | 0.5008

based on this factor combination was investigated to determine the best α level
for the χ² stopping criterion. The Log-likelihood ratio criterion was used as the
selection criterion. As we can see in this example, the full model does not have
better prediction performance than the reduced models. The minimal ERR, 0.2278,
occurred when X4 was removed from the full model. In this case, the best α would
be 0.40, the largest possible α-value below the p-value of 0.410.
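The choice of the best α in this example can be sketched as a grid search; the p-values 0.410 (X4 in Step 1) and 0.202 (X5 in Step 2) are taken from Table 2, and the remove/stop logic below is our reading of the standard χ² stopping rule (remove while the smallest removal statistic has p > α).

```python
# 19 candidate alpha levels: 0.05, 0.10, ..., 0.95
grid = [round(0.05 * i, 2) for i in range(1, 20)]

p_removed = 0.410  # p-value of X4, whose removal yields the minimal ERR
p_kept = 0.202     # largest remaining p-value after X4 is removed

# Alpha must allow removing X4 (p > alpha) but then stop before removing
# X5 (p <= alpha); the best alpha is the largest grid value that does so.
stopping = [a for a in grid if p_kept <= a < p_removed]
best_alpha = max(stopping)
print(best_alpha)  # 0.4
```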

4.4. Evaluation of the Effect of the Events Per Variable (EPV)


In logistic regression, the number of events is defined as either the number
of failures or the number of successes, whichever is lower. The ratio of events
to predictors is called the events per variable (EPV). A common opinion is that
EPV should not be less than 10 in order to get reasonably stable estimates of
the regression coefficients (Hosmer and Lemeshow, 2000). Several authors have been
concerned about the effect of EPV on the best α level for the stepwise algorithm
(Ambler et al., 2002; Harrell et al., 1984; Steyerberg et al., 2001). To investigate

Table 3
The best α levels for 48 sampling situations in the multivariate
normal case for LR
Sampling                      Sampling
situation    ERR       α      situation    ERR       α
1 0.2633 0.316 25 0.2652 0.318
2 0.2582 0.263 26 0.2648 0.276
3 0.2657 0.318 27 0.2688 0.340
4 0.2571 0.240 28 0.2658 0.272
5 0.2085 0.303 29 0.2123 0.311
6 0.2056 0.251 30 0.2096 0.269
7 0.2095 0.298 31 0.2148 0.343
8 0.2077 0.259 32 0.2119 0.269
9 0.2646 0.307 33 0.2350 0.263
10 0.2593 0.239 34 0.2351 0.280
11 0.2699 0.303 35 0.2361 0.252
12 0.2599 0.270 36 0.2359 0.256
13 0.2112 0.300 37 0.2358 0.281
14 0.2059 0.259 38 0.2353 0.288
15 0.2134 0.317 39 0.2381 0.403
16 0.2071 0.259 40 0.2355 0.272
17 0.2674 0.317 41 0.3000 0.285
18 0.2640 0.248 42 0.2329 0.269
19 0.2692 0.322 43 0.2246 0.256
20 0.2649 0.266 44 0.2315 0.240
21 0.2112 0.303 45 0.2378 0.292
22 0.2094 0.262 46 0.1881 0.274
23 0.2139 0.299 47 0.2357 0.262
24 0.2100 0.235 48 0.2361 0.262

the effect of EPV, we conducted two sub-studies, one for P = 5 and one for P = 25,
with the other four factors taking the same values as in the full study, as indicated
in the last four rows of Table 1. For P = 5, the sample size N was chosen as
40, 50, 100, 200, and 400, and for P = 25, the sample size N was 200, 250, 500, 1,000,
and 2,000, so that EPV is 4, 5, 10, 20, and 40 for both P = 5 and P = 25. We again
used the second-order central response surface design strategy for these two
sub-studies.
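Because the design is balanced (n_0 = n_1 = N/2), the number of events is N/2 and the EPV arithmetic for the two sub-studies can be checked directly:

```python
# Events per variable for the balanced design: events = N/2, EPV = (N/2)/P
def epv(N, P):
    return (N / 2) / P

sub_study_P5  = [epv(N, 5)  for N in (40, 50, 100, 200, 400)]
sub_study_P25 = [epv(N, 25) for N in (200, 250, 500, 1000, 2000)]

print(sub_study_P5)   # [4.0, 5.0, 10.0, 20.0, 40.0]
print(sub_study_P25)  # the same EPV ladder at P = 25
```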

5. Results of the Simulation Experiments


In this section, we analyze the results of the sampling experiments for the
multivariate normal case. The analysis has three main purposes: (1) to recommend
the best α level of significance for the χ² stopping criterion; (2) to investigate the
effects of the

Table 4
Response surface analysis of the best α levels for P, V, M, D, and N
in the multivariate normal case for LR

Parameter   d.f.   β̂          s.e.       t-value   p-value
P           1      −0.0222    0.0045     −5.00     <.0001
V           1      −0.1437    0.1112     −1.29     0.2074
M           1      −0.0689    0.0494     −1.40     0.1740
D           1      −0.2008    0.2199     −0.91     0.3692
N           1      5.76E−5    0.0002     0.26      0.7975
P²          1      0.0005     9.24E−5    5.73      <.0001
V²          1      0.0822     0.0577     1.42      0.1659
M²          1      0.0109     0.0092     1.19      0.2463
D²          1      −0.0528    0.1478     −0.36     0.7235
N²          1      −2.38E−7   2.31E−7    −1.03     0.3124
PV          1      −0.0019    0.0023     −0.85     0.4001
PM          1      0.0007     0.0009     0.77      0.4484
PD          1      0.0012     0.0036     0.34      0.7390
PN          1      −2.21E−6   4.50E−6    −0.49     0.6269
VM          1      −0.0077    0.0225     −0.34     0.7338
VD          1      0.1518     0.0900     1.69      0.1034
VN          1      2.62E−5    0.0001     0.23      0.8179
MD          1      0.0407     0.0360     1.13      0.2683
MN          1      −3.68E−5   4.50E−5    −0.82     0.4206
DN          1      0.0003     0.0002     1.93      0.0644

Factor      d.f.   SSR        MSE        F-value   p-value
P           6      0.0419     0.0070     43.02     <.0001
V           6      0.0012     0.0002     1.25      0.3107
M           6      0.0009     0.0001     0.93      0.4891
D           6      0.0018     0.0003     1.84      0.1266
N           6      0.0016     0.0003     1.64      0.1733

Lack-of-fit test: F(22, 5) = 0.67 (p-value = 0.7683).

five factors P, V, M, D, and N on the best α level; and (3) to compare the three selection
criteria: LR, SC, and WD. SAS/IML 9.1 was used for all the computer programs
(SAS/IML Software, 2004; SAS/STAT, 2004).
Table 3 gives the best α levels of the Log-likelihood ratio criterion for the 48 sampling
situations, where the best α level for each sampling situation is defined as the mean
of the α levels at which ERR is a minimum over the 300 replications. The best α level
for LR lies between 0.24 and 0.40, which is identical to the Score test. However, the
best α level from Wald's test differs from LR and SC, with 0.24 < α < 0.31.
The results of the analysis of the response surface design for the Log-likelihood
ratio criterion are given in Table 4. It was assumed that the effects of third-
order and fourth-order interactions could be ignored (Montgomery, 2005). Since
the lack-of-fit test is not significant (p-value = 0.7683), it may be concluded that
the quadratic surface fits the data well. The main effect of factor P is statistically
significant (p < 0.0001). Figure 1 (the best α level against P for the three selection
criteria) shows that the best α for the Score test is almost identical to that for the
Log-likelihood ratio, but not for Wald's test.
The results obtained from the two sub-studies at P = 5 and P = 25 showed that
EPV does not have a strong effect on the choice of the best α. In the response surface
analysis, although the effect of factor N is significant in the P = 25 case, the
ranges of α values for P = 5 and P = 25 are very narrow, that is, 0.35 ≤ α ≤ 0.38 and
0.23 ≤ α ≤ 0.27, respectively. This confirms the finding by Derksen and Keselman
(1992) that the sample size (or the number of events) is of little practical importance
for selection algorithms. However, as suggested by Hosmer and Lemeshow (2000)
and Peduzzi et al. (1996), we still recommend the use of EPV values of 10 or greater
in logistic regression.

Figure 1. The best α levels over the number of independent (predictor) variables P for the
LR, SC, and WD selection criteria in the multivariate normal case.

6. Discussion and Conclusion


In this article, we have studied the determinants of the best α level for the χ²
stopping criterion for the purpose of prediction. The best choice of α varies between
0.24 and 0.40 for the multivariate normal case. We recommend 0.20 ≤ α ≤ 0.40,
because once the number of predictors goes beyond 25, the best α will decrease
below 0.24.
The Monte Carlo simulations show that the choice of α depends upon the factor
P. The results for the factor P suggest that α should decrease with the number of
predictor variables, at least for P in the range 5–25. This decrease of α with P is in
agreement with the idea that when there are fewer predictor variables, any particular
variable has a greater chance of being removed. In this situation, a larger α level should
be used to lower the chance of removal, so that each variable has a chance of being
retained comparable to that when there is a larger number of variables. The results
agree somewhat with those given by Lee and Koval (1997) for forward stepwise
logistic regression, where they show that only one factor, P, has a significant effect
on the choice of the best α level.
Secondly, the results suggest that application of backward elimination at the 0.05
level is inappropriate and leads to relatively poor model performance (Ambler
et al., 2002; Harrell et al., 1984). Steyerberg et al. recommended that a higher α level
(e.g., 0.50) may be considered to limit the loss of information (Steyerberg et al., 2000,
2001). Their results also showed that a greater α led to better prediction performance
in the case with fewer predictors (8 predictors) than in the case with more predictors
(17 predictors). In our study, the best α level for P = 5 is 0.40, and it decreases as P
increases from 5–25.
However, the effect of EPV mentioned by Ambler et al. (2002), Harrell et al.
(1984), and Steyerberg et al. (2000, 2001) was not found in this article. We conducted
two sub-studies for P = 5 and P = 25, but the ranges of the best α for these two sub-
studies were very narrow, that is, 0.35 ≤ α ≤ 0.38 and 0.23 ≤ α ≤ 0.27, respectively.
Therefore, we conclude that the events per variable has little effect on the choice of
the best α level.
The last point raised by this simulation study is that Wald's test may not
be a good selection criterion for backward elimination logistic regression. See Hauck
and Donner's (1977) comment: "If an effect is very large, the normal approximation
of the Wald test-statistic becomes very inaccurate and its results are unreliable for
binary data." By contrast, the Score statistic can be a good approximation of the
Log-likelihood ratio statistic for the purpose of speeding up the selection of
variables. The poor behavior of Wald's statistic does not bode well for SAS, in
that PROC LOGISTIC uses only Wald's statistic for backward elimination.
Further simulations should be carried out to discover whether the rules given
above are suitable for the multivariate binary case, such as Qaqish (2003); then a
more general recommendation for the χ² stopping criterion can be made.

Acknowledgments
This study was supported by a postgraduate scholarship from the Natural Sciences
and Engineering Research Council (Canada), and by grants from the National
Cancer Institute of Canada (015046), and the Natural Sciences and Engineering
Research Council (Canada) (9280-03). We thank the editor and referees for their
comments which significantly improved the quality of this article.

References
Ambler, G., Brady, A. R., Royston, P. (2002). Simplifying a prognostic model: a simulation
study based on clinical data. Statistics in Medicine 21:3803–3822.
Bendel, R. B., Afifi, A. A. (1977). Comparison of stopping rules in forward stepwise
regression. Journal of the American Statistical Association 72(357):46–53.
Cochran, W. G., Cox, G. M. (1957). Experimental Designs. 2nd ed. New York: Wiley.
Costanza, M. C., Afifi, A. A. (1979). Comparison of stopping rules in forward stepwise
discriminant analysis. Journal of the American Statistical Association 74(368):777–785.
Derksen, S., Keselman, H. J. (1992). Backward, forward and stepwise automated subset
selection algorithms: frequency of obtaining authentic and noise variables. British Journal
of Mathematical and Statistical Psychology 45:265–282.
Draper, N. R., Smith, H. (1998). Applied Regression Analysis. 3rd ed. New York: Wiley.
Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-
validation. Journal of the American Statistical Association 78:316–331.
Efron, B. (1986). How biased is the apparent error rate of a prediction rule? Journal of the
American Statistical Association 81(394):461–470.
Efroymson, M. A. (1960). Multiple regression analysis. In: Ralston, A., Wilf, H. S., eds.
Mathematical Methods for Digital Computers. New York: Wiley.
Glick, N. (1972). Sample-based classification procedures derived from density estimators.
Journal of the American Statistical Association 67:116–122.
Glick, N. (1973). Sample-based multinomial classification. Biometrics 29(1):241–256.
Gong, G. (1986). Cross-validation, the jackknife, and the bootstrap: excess error
estimation in forward logistic regression. Journal of the American Statistical Association
81(393):108–113.
Harrell, F. E., Lee, K. L., Califf, R. M., Pryor, D. B., Rosati, R. A. (1984). Regression
modelling strategies for improved prognostic prediction. Statistics in Medicine 3:143–152.
Hauck, W. W., Donner, A. (1977). Wald’s test as applied to hypotheses in logit analysis.
Journal of the American Statistical Association 72(360):851–853.
Hills, M. (1966). Allocation rules and their error rates. Journal of the Royal Statistical Society
Series B 28:1–20.
Hosmer, D. W., Lemeshow, S. (2000). Applied Logistic Regression. 2nd ed. New York: Wiley.
Lee, K. I., Koval, J. J. (1997). Determination of the best significance level in forward step-wise
logistic regression. Communications in Statistics—Simulation and Computation 26(2):559–575.
Montgomery, D. C. (2005). Design and Analysis of Experiments. 6th ed. New York: Wiley.
Peduzzi, P., Concato, J., Kemper, E., Holford, T. R., Feinstein, A. R. (1996). A simulation
study of the number of events per variable in logistic regression analysis. Journal of
Clinical Epidemiology 49:1503–1510.
Qaqish, B. F. (2003). A family of multivariate binary distributions for simulating correlated
binary variables with specified marginal means and correlations. Biometrika 90:455–463.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications. 2nd ed. New York: Wiley.
Roecker, E. B. (1991). Prediction error and its estimation for subset-selected models.
Technometrics 33:459–468.
SAS/IML Software. (2004). User’s Guide, SAS Institute Inc. Version 9.1, Cary, NC, USA.
SAS/STAT. (2004). User’s Guide, SAS Institute Inc. Version 9.1 Cary, NC, USA.
Steyerberg, E. W., Eijkemans, M. J. C., Harrell, F. E., Jr., Habbema, J. D. F. (2000).
Prognostic modeling with logistic regression analysis: a comparison of selection and
estimation methods in small data sets. Statistics in Medicine 19:1059–1079.
Steyerberg, E. W., Eijkemans, M. J. C., Harrell, F. E., Jr., Habbema, J. D. F. (2001).
Prognostic modeling with logistic regression analysis: in search of a sensible strategy in
small data sets. Medical Decision Making 21:45–56.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the
number of observations is large. Transactions of the American Mathematical Society
54:426–482.
