Some Alternatives To The Box-Cox Regression Model
Institute of Social and Economic Research, Osaka University, Wiley and Economics Department of the University
of Pennsylvania are collaborating with JSTOR to digitize, preserve and extend access to International Economic Review.
http://www.jstor.org
This content downloaded from 141.218.1.105 on Tue, 01 Mar 2016 18:02:39 UTC
All use subject to JSTOR Terms and Conditions
INTERNATIONAL ECONOMIC REVIEW
BY JEFFREY M. WOOLDRIDGE1
type approaches, the proposed estimators of the conditional mean function are
versions of the model are derived. Scale invariant t-statistics are proposed,
scale invariant.
1. INTRODUCTION
XK). Formalizing this notion requires one to decide which aspect of the conditional
x, M(y|x). Especially for nonnegative variables the conditional mean and conditional
median can be very different. Because of its simple algebra (linearity, the law
of iterated expectations), the conditional mean has been used more extensively in
recent example is the learning model for wage determination studied by Farber and
Gibbons (1990). While there are situations where the conditional median is of
interest, the conditional expectation continues to receive the most attention among
theorists and empirical economists. This paper is about estimating models for
E(y|x), and the subsequent discussion assumes that E(y|x) is the function of
interest.
The first model studied in econometrics courses, which postulates that E(y|x) is
Finally, although of less importance for asymptotic inference, the assumption that
In econometrics the most common alternative to the linear model for E(y|x) is a
*Manuscript received May 1989; revised February 1990 and July 1991.
1 This is a condensed and revised version of MIT Department of Economics Working Paper No. 534.
I am grateful to Ernie Berndt, Marcel Dagenais, Jean-Marie Dufour, Zvi Griliches, Dale Poirier, Mark
Showalter, Jeff Zabel, three anonymous referees, and the participants of the Boston College and
University of Montreal econometrics workshop for providing helpful comments. Special thanks to Ernie
Berndt for directing me to the issue of scale invariance, and to Diego Rodriguez for supplying and
linear model for E(log y|x), provided that P(y > 0) = 1. Normality of log y cannot
models with log y as the dependent variable. However, the important issue of
whether the linear model for log y implicitly provides the best description of E(y|x)
depends on the particular application. One cannot even investigate this issue unless
Noting that the identity and logarithmic transformations in linear models are too
restrictive, Box and Cox (1964) suggested a transformation of y that contains the
In the Box-Cox regression model there is a value λ ∈ ℝ such that for some K × 1
documented (see, e.g., Amemiya and Powell 1981). Also, the practice of estimating
various econometricians and statisticians (see, e.g., Amemiya and Powell 1981 and
From a social scientist's point of view there is the more important problem of
interpreting the parameters β and λ. The vector β measures the marginal effects of
economic agents or policy makers then interest typically lies in the expected value
primarily because they also index E(y|x). Poirier and Melino (1978) show that β_j
and ∂E(y|x)/∂x_j have the same sign when y(λ) is assumed to have a plausible
truncated normal distribution, but their expression for E(y|x) relies on the assumed
Distribution-free methods are available for estimating β and λ, e.g. Amemiya and
Powell's (1981) nonlinear two stage least squares estimator. All that is required is
This is less restrictive than the standard Box-Cox model but requires, at a
minimum, bounding the supports of x and ε in ways that depend on the true
parameters (to ensure the inequality ε ≥ −(1/λ + xβ) for λ ≠ 0). Requiring that a
a lot of economic data. But, if one is willing to make such an assumption, Duan's
The motivation underlying this paper combines my belief that E(y|x) is the object
typically used as a device for generalizing functional form. Rather than searching
normality (or independence of the implicit errors and explanatory variables), this
paper attempts the more modest task of specifying a functional form for E(y|x) that
contains the linear, exponential, constant elasticity, and a variety of other regression
models as special cases. The parameters of the conditional mean specification
are easy to interpret and weighted nonlinear least squares (WNLS) estimators or
are likely to be sufficiently precise for many applications. In some cases the WNLS
estimator (or QMLE) is fully efficient. This notwithstanding, it seems that obtaining
The functional form studied here extends those used by others in the literature on
nonlinear estimation (e.g., Mukerji 1963, Mizon 1977, Berndt and Khaled 1979),
and provides a unified framework for analyzing and testing several of the regression
for modelling E(y|x) but, in contrast to Box-Cox type approaches, tests about
Section 2 of the paper presents a case for defining all economic quantities in
basic model for E(y|x) and derives the asymptotic covariance matrix of the WNLS
(QML) estimator. Standard and robust Lagrange multiplier (LM) tests for restricted
versions of the model are derived in Section 4. Section 5 discusses the issue of scale
(1978) housing price data is provided in Section 6, and some remarks about
liberally in the social sciences, often without regard for the implications for
This raises the issue of which of these is the appropriate definition of an elasticity.
differentiable f(·) then the marginal effect of x_j on y is simply ∂f(x)/∂x_j, and the
(f2x) xi
When y and x are random the natural definition of the marginal effect of x_j on y is
(Here it is useful to remind the reader that all quantities could instead be defined in
terms of the conditional median.) The elasticity of y with respect to x_j, keeping
[∂E(y|x)/∂x_j] · [x_j/E(y|x)], E(y|x) ≠ 0,
which is the same as the right-hand side of (2.1) when y and x_j are positive. It is
easily seen that using E(y|x) as the basis for all definitions preserves the relationships
that hold among economic quantities in the deterministic case. Further,
on E(y|x) generally changes as the list of conditioning variables changes, but this is
and h; just define ε ≡ y − μ(x) and η ≡ y/μ(x) (assuming that P[μ(x) > 0] = 1).
var(y|x), so they cannot both be correct. But without such additional assumptions
(2.4) and (2.5) are simply different ways of stating that E(y|x) = μ(x).
transformation φ(y) are useful only if enough structure has been imposed to
Poirier (1978), Poirier and Melino (1978), and Huang and Kelingos (1979) make
essentially the same point in the context of specific models. However, the current
paper is motivated by the fact that the only way to estimate E(y|x) without
directly.
Let y be a nonnegative random variable and let x ≡ (x_1, x_2, ..., x_K) be a 1 ×
something such as (β₀, λ₀) to distinguish the "true" parameters from the generic
parameter vector (β, λ). As this results in a notational nightmare, (β, λ) is used to
denote the true values as well as generic values. As usual, the vector x can contain
and (3.2) from the Box-Cox conditional mean model E[y(λ)|x] = xβ under the
constant variance. However, in what follows only (3.1) and (3.2) are assumed to
hold. In particular, y need not be continuously distributed, so that (3.1) and (3.2)
For (3.1) to be well-defined the inequality 1 + λxβ ≥ 0, with strict inequality for
λ < 0, must hold for all relevant values of x. This is analogous to requiring
case of (3.1) as λ → 0. The exponential model is attractive in that it ensures that
predicted values are well defined and positive for all x and β.
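As a minimal sketch of the mean function in (3.1), assume it takes the form μ(x; θ) = [1 + λxβ]^{1/λ} that appears in later sections, with exp(xβ) as the λ → 0 limit. The Python below evaluates it for a scalar index xb = xβ and enforces the inequality stated above. The names cond_mean, xb, and tol are illustrative and do not come from the paper.

```python
import math

def cond_mean(xb, lam, tol=1e-8):
    """Evaluate mu = [1 + lam*xb]^(1/lam); use the limit exp(xb) when lam is ~0."""
    if abs(lam) < tol:
        return math.exp(xb)
    base = 1.0 + lam * xb
    # The text requires 1 + lam*xb >= 0, strictly positive when lam < 0.
    if base < 0.0 or (lam < 0.0 and base == 0.0):
        raise ValueError("need 1 + lam*xb >= 0 (strictly > 0 when lam < 0)")
    return base ** (1.0 / lam)

linear_case = cond_mean(2.0, 1.0)        # lam = 1 gives the linear model 1 + xb
exponential_case = cond_mean(0.7, 0.0)   # lam = 0 gives the exponential model
```

With lam = 1 the function returns 1 + xb, and for lam close to zero it approaches exp(xb), which is the sense in which the linear and exponential models are special cases.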
function are needed. Define the (K + 1) × 1 parameter vector θ ≡ (β', λ)' and
for future reference, note that ∇_λ μ(x; θ) is a linear combination of ∇_β μ(x; θ) and
μ(x; θ) log [μ(x; θ)]. The derivative of μ(x; θ) with respect to λ at λ = 0 can be
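The gradient expressions themselves are not legible in this copy, so the following is a sketch derived directly from μ = [1 + λxβ]^{1/λ} rather than a transcription of (3.5) and (3.7). Differentiating gives ∂μ/∂(xβ) = μ^{1−λ} and ∂μ/∂λ = (1/λ)[μ^{1−λ}·xβ − μ log μ], which is the linear combination described above; the value at λ = 0, −(xβ)² exp(xβ)/2, comes from a series expansion. The function names are illustrative.

```python
import math

def mu(xb, lam):
    """Conditional mean [1 + lam*xb]^(1/lam), with exp(xb) at lam = 0."""
    return math.exp(xb) if lam == 0.0 else (1.0 + lam * xb) ** (1.0 / lam)

def dmu_dxb(xb, lam):
    """Derivative with respect to the index x*beta; multiply by x_j for beta_j."""
    return mu(xb, lam) ** (1.0 - lam)

def dmu_dlam(xb, lam):
    """Derivative with respect to lambda, a combination of dmu_dxb*xb and mu*log(mu)."""
    m = mu(xb, lam)
    if lam == 0.0:
        return -0.5 * xb * xb * m          # limiting value as lam -> 0
    return (dmu_dxb(xb, lam) * xb - m * math.log(m)) / lam
```

A central finite difference of mu in either argument gives a quick check of both expressions.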
Under (3.1) and (3.2) and standard regularity conditions (e.g. White and
Domowitz 1984), θ can be consistently estimated by NLS. In some cases the NLS
estimator would have a large asymptotic variance matrix due to the substantial
this end, let ω(x; γ) > 0 be a weighting function that can depend on an M × 1 vector
Equation (3.9) is appropriate if var(y|x) is proportional to [E(y|x)]², while (3.10) is
appropriate if var(y|x) is proportional to E(y|x). The weighting functions (3.11) and
(3.12) allow for additional flexibility, and encompass (3.9) and (3.10). It is important
to remember that, except where stated, the weighting function is not assumed to be
correctly specified for var(y|x). In other words it is not assumed that
(3.13) var(y|x) = σ²ω(x; γ) for some γ ∈ ℝ^M and some σ² > 0,
but only that such considerations motivate the choice of ω. The idea is that
nonconstant weighting functions can result in efficiency gains even if (3.13) does not
hold. Because the goal is to test hypotheses about the conditional mean parameters
can be obtained using the approach of White (1980). Let {(x_t, y_t): t = 1, 2, ...} be
a sequence of random vectors following the regression model (3.1) and (3.2), and
suppose there are N observations available. Assume either that these observations
are independent or that they constitute an ergodic time series such that
(x_t can contain lagged dependent variables as well as other explanatory variables,
but ergodicity rules out integrated processes). Equation (3.14) ensures that the
Time series applications for which (3.14) does not hold can be accommodated, but the
formulas for the asymptotic covariance matrix derived below (in particular, the
formula for B_N) would have to be modified along the lines of White and Domowitz
(1984).
needed. Let γ̂ denote an estimator that would be √N-consistent for γ if (3.13) held.
Even if (3.13) does not hold, γ̂ generally has a probability limit, denoted here by
γ* ≡ plim γ̂ and sometimes called the pseudo-true value of γ. If ω is chosen
as in (3.9) or (3.10) then γ̂ could be set to initial estimators of (β, λ), but it is easier
in this case to compute the QMLE using the appropriate linear exponential family.
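To illustrate the QMLE route for the variance assumption behind (3.9), the sketch below fits the exponential special case E(y|x) = exp(xβ) by Fisher scoring on the exponential quasi-log-likelihood −y/μ − log μ, whose score is Σ x_t'(y_t − μ_t)/μ_t. It covers only the λ = 0 case, uses simulated data with assumed true values (0.5, 1.0), and is not the estimation code behind the paper's results. All function names are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the small linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def exp_qmle(X, y, iters=100, tol=1e-10):
    """Exponential QMLE for E(y|x) = exp(x beta); first column of X must be ones."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)   # start at a constant mean
    for _ in range(iters):
        mu = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in X]
        score = [sum(r[i] * (yt - m) / m for r, yt, m in zip(X, y, mu)) for i in range(k)]
        step = solve(xtx, score)        # expected information is X'X for this model
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def simulate(n=2000, seed=0):
    """Simulated data: y = exp(0.5 + x) times a mean-one gamma error (an assumption)."""
    rng = random.Random(seed)
    X = [[1.0, rng.random()] for _ in range(n)]
    y = [math.exp(0.5 + r[1]) * rng.gammavariate(4.0, 0.25) for r in X]
    return X, y
```

Calling exp_qmle(*simulate()) should return estimates near the assumed values; the gamma error here has variance proportional to the squared mean, which is the case where this weighting is appropriate.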
obtained by nonlinear least squares estimation using the squared NLS residual ?2
Let ω̂_t ≡ ω(x_t; γ̂) and let θ̂ denote the WNLS estimator of θ, which solves
min_θ Σ_{t=1}^N (y_t − μ(x_t; θ))²/ω̂_t. Let μ_t(θ) ≡ μ(x_t; θ), ω_t(γ) ≡ ω(x_t; γ), ε_t
(K + 1) × (K + 1) matrices
NN
t=1 t= 1
and assume for simplicity that these have limits A and B, respectively. Then, under
(3.13) holds then γ* = γ and the asymptotic variance of √N(θ̂ − θ) takes the more
A_N^{-1} B_N A_N^{-1}, where
NN
t=1 t=1
asymptotic standard error of θ̂_j is the square root of the jth diagonal element of
formulas are valid in the context of QMLE in the LEF when ω̂_t is the estimated
variance from the LEF distribution. For exponential regression, ω̂_t = ŷ_t² and for
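The displayed definitions of A_N and B_N are not legible in this copy. The sketch below therefore assumes the usual White (1980) sandwich form for weighted nonlinear least squares: A = N⁻¹ Σ ∇μ_t'∇μ_t/ω_t, B = N⁻¹ Σ ε_t² ∇μ_t'∇μ_t/ω_t², and estimated variance A⁻¹BA⁻¹/N. The argument names (grad for the N × P gradient rows, resid for residuals, w for the weighting function values) are illustrative.

```python
def solve(a, b):
    """Solve the small linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def inverse(a):
    """Invert a small square matrix column by column."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def sandwich(grad, resid, w):
    """Robust variance estimate A^{-1} B A^{-1} / N under the assumed A and B."""
    n, p = len(grad), len(grad[0])
    A = [[sum(g[i] * g[j] / wt for g, wt in zip(grad, w)) / n for j in range(p)]
         for i in range(p)]
    B = [[sum(u * u * g[i] * g[j] / (wt * wt) for g, u, wt in zip(grad, resid, w)) / n
          for j in range(p)] for i in range(p)]
    a_inv = inverse(A)
    v = matmul(matmul(a_inv, B), a_inv)
    return [[x / n for x in row] for row in v]
```

Standard errors are the square roots of the diagonal of the returned matrix. When the squared residuals equal the weights exactly, B equals A and the result collapses to A⁻¹/N, the non-robust form.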
The WNLS estimator using weights 1/ω̂_t is the efficient WNLS estimator if
var(y_t|x_t) = σ²ω(x_t; γ) and γ̂ is √N-consistent for γ. In addition, there are
some important cases where the WNLS estimator of θ achieves the asymptotic
with conditional mean μ(x; θ) then the weighting function in (3.9) produces a
directly; the formulas in (3.16) are still valid. For more on QMLE in the LEF, see
discussed in Sections 2 and 3, the Box-Cox model (1.3) and the conditional mean
they produce the same model for E(y|x). When λ is known to equal unity both
procedures lead to OLS estimation of β. When λ = 0 in (1.3) the inefficiency of the
WNLS estimators of the slope parameters relative to the Box-Cox MLE can be
where σ² = exp(η²) − 1. Assume that β is estimated with the restriction λ = 0
imposed. The MLE of β, denoted β̃, is simply the OLS estimator from the
regression log y_t on 1, x_t2, ..., x_tK. Let β̂ denote the exponential QMLE or the
WNLS estimator with weighting function given by (3.9). As can be seen from
(3.17), the plim of β̂₁ is β₁ + η²/2, so the comparison is based on the slope
2N-1 EE(x'xt)
This ratio is, of course, always bigger than unity, and increases without bound as
η → ∞. However, (3.19) is surprisingly close to unity for reasonable values of η. For
η = .50 (empirically a very large value for the conditional standard deviation of log
y), the ratio of the asymptotic standard deviations is 1.066; that is, the WNLS
standard deviations are 6.6 percent larger than the MLE's. For more moderate
values of η, say η = .30, this ratio is 1.023, so that the WNLS estimator standard
deviations are 2.3 percent larger than the MLE's. Estimates of η on the order of .10
are not uncommon in applied work, in which case the inefficiency of WNLS is well
Monte Carlo study could shed light on the efficiency loss of WNLS when λ = 0 and
is estimated along with β, but this is beyond the scope of the current paper.
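The ratio (3.19) is not legible in this copy. As a sketch, assume the variance ratio is σ²/η² = (exp(η²) − 1)/η², so the ratio of asymptotic standard deviations is its square root. That assumption reproduces the figures quoted above for η = .50 and η = .30; the function name sd_ratio is illustrative.

```python
import math

def sd_ratio(eta):
    """Assumed ratio of asymptotic standard deviations, WNLS relative to the MLE."""
    sigma2 = math.exp(eta ** 2) - 1.0    # sigma^2 = exp(eta^2) - 1, as in the text
    return math.sqrt(sigma2 / eta ** 2)

ratios = {eta: round(sd_ratio(eta), 3) for eta in (0.5, 0.3, 0.1)}
```

Under the same assumption, the ratio for η = .10 is below 1.01, so the efficiency loss is under one percent.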
1, ... , N have to be imposed. Before estimating the full model it makes sense to
test whether some easily estimated restricted version is sufficient. The two cases of
primary interest are λ = 1, which leads to a linear model, and λ = 0, which results
Consider first testing the null hypothesis H₀: λ = λ₀ against H₁: λ ≠ λ₀ for
λ₀ ≠ 0. Expressions (3.5) and (3.7) provide the gradient of the regression function
evaluated at the null hypothesis; denote these by ∇_β μ̃_t and ∇_λ μ̃_t, where β̃ is the
restricted WNLS estimator (or QMLE) obtained with λ = λ₀. Let ỹ_t and ε̃_t be the
(unweighted) fitted values and residuals from the WNLS regression. Assuming that
ω̃_t > 0, t = 1, ..., N, the usual LM statistic can be derived from Engle (1984) or
r-squared. Under H₀ and var(y_t|x_t) = σ²ω(x_t; γ), NR² is asymptotically χ²₁. The test of a linear
model occurs when λ₀ = 1, in which case ∇_λ μ̃_t can be taken to be ỹ_t log(ỹ_t)/√ω̃_t.
Further, if ω̃_t ≡ 1 this reduces to the LM form of the t-test for the null of a linear
suggests using a standard t-test in the Box-Cox setup because, under H₀: λ = 1,
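A minimal sketch of the nonrobust LM test of the linear model (λ₀ = 1) with unit weights follows. It assumes the statistic is N times the uncentered r-squared from regressing the OLS residuals on the regressors and the extra term ŷ·log(ŷ), which is what the λ-gradient reduces to at λ = 1 once terms in the span of x are dropped. The regression defining the statistic is not legible in this copy, so this form is an assumption; the helper names are illustrative.

```python
import math

def solve(a, b):
    """Solve the small linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def ols(X, y):
    """OLS coefficients from the normal equations."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def fitted(X, b):
    return [sum(bi * v for bi, v in zip(b, r)) for r in X]

def lm_linear_null(X, y):
    """N * (uncentered R^2) statistic for H0: lambda = 1, unit weights (assumed form)."""
    yhat = fitted(X, ols(X, y))
    if min(yhat) <= 0.0:
        raise ValueError("fitted values must be positive to form yhat*log(yhat)")
    u = [yt - f for yt, f in zip(y, yhat)]
    Z = [r + [f * math.log(f)] for r, f in zip(X, yhat)]   # add the lambda-gradient term
    e = [ut - f for ut, f in zip(u, fitted(Z, ols(Z, u)))]
    r2 = 1.0 - sum(v * v for v in e) / sum(v * v for v in u)
    return len(y) * r2
```

The statistic is compared with a χ²₁ critical value (3.84 at the 5 percent level). Data generated from an exponential mean, for example, should produce a large value.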
The failure of normality has no effect on the asymptotic size of the test, but
misspecification of the conditional variance function can bias the inference toward
or away from the null model. Because the primary goal is to test hypotheses about
var(y|x). Wooldridge (1991a, Procedure 3.1) covers regression-based LM tests that
computed as follows: First, obtain the residuals, say r̃_t, from the regression
Then obtain the sum of squared residuals, SSR, from the OLS regression
The LM test for λ = 0 requires only a slightly different argument. Let β̃ be the
WNLS estimator using conditional mean function exp(x_tβ) and weights 1/ω̃_t (or let
β̃ be the exponential, Poisson, or some other QMLE in the LEF). Let ỹ_t ≡ exp
(x_tβ̃) be the fitted values and let ε̃_t ≡ y_t − ỹ_t be the residuals from the WNLS
The robust form of the test can be computed by first saving the residuals r̃_t from the
regression
and then using N − SSR from the regression (4.3) as asymptotically χ²₁.
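The two auxiliary regressions are not legible in this copy. The sketch below assumes the usual form of this kind of regression-based robust test: (i) regress the weighted λ-gradient on the weighted β-gradient and keep the residuals r_t; (ii) regress 1 on the product of the weighted residual and r_t, and use N − SSR. Inputs are assumed to be weighted already; the function and argument names are illustrative.

```python
def solve(a, b):
    """Solve the small linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def ols_resid(X, y):
    """Residuals from the OLS regression of y on the columns of X."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    return [yt - sum(bi * v for bi, v in zip(b, r)) for r, yt in zip(X, y)]

def robust_lm(u, grad_beta, grad_lam):
    """Heteroskedasticity-robust LM statistic, N - SSR, for one restriction."""
    r = ols_resid(grad_beta, grad_lam)           # step (i): purge grad_lam of grad_beta
    z = [[ut * rt] for ut, rt in zip(u, r)]      # single regressor u_t * r_t
    e = ols_resid(z, [1.0] * len(u))             # step (ii): regress 1 on u_t * r_t
    return len(u) - sum(v * v for v in e)
```

With a single restriction, N − SSR equals (Σ u_t r_t)² / Σ (u_t r_t)², which makes clear that the statistic does not rely on a correctly specified variance function.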
Let β̃ and λ̃ be the estimates computed under δ = 0, so that the fitted values and
∇_β μ̃_t and ∇_λ μ̃_t be the gradients defined by (3.5) and (3.7). The gradient with
If WNLS or QMLE is used, each quantity is weighted by 1/√ω̃_t and the standard
(4.8) ε̃_t on ∇_β μ̃_t, ∇_λ μ̃_t, ∇_δ μ̃_t
It is important to stress that tests about β and λ in the current setup are purely
tests about E(y|x): provided the robust forms of the tests are used, the null
distribution of y given x. In contrast, tests about β and λ in the Box-Cox model, e.g.
those developed by Davidson and MacKinnon (1985), are generally tests about the
entire distribution of y given x. One could reject the null hypothesis because the
One uncomfortable feature of the model (3.1) and (3.2) is that the t-statistics for
the slope coefficients β₂, ..., β_K are not invariant to the scaling of y_t whenever λ
is estimated along with β. Spitzer (1984) noted the analogous feature for the
Suppose that (β̂, λ̂) are the WNLS estimates using y_t as the regressand, and let
(β̂⁺, λ̂⁺) be the corresponding estimates when the regressand is c₀y_t for some
(5.1) λ̂⁺ = λ̂; β̂₁⁺ = (c₀^{λ̂} − 1)/λ̂ + c₀^{λ̂}β̂₁; β̂_j⁺ = c₀^{λ̂}β̂_j, j = 2, ..., K,
and therefore
The estimate of λ is invariant to the scaling of y_t, and the estimates of the other
coefficients change so that the fitted values and residuals in the scaled regression
are simply scaled up versions of the fitted values and residuals in the unscaled
invariant, but tests of whether the population values are zero based on the
t-statistics of β̂₂, ..., β̂_K are not. It turns out that both the usual and robust
exclusion of any 1 × Q vector z_t (see model (4.6)) is invariant to the scaling of y_t.
Therefore, both the usual and robust LM tests for the null H₀: β_j = 0 (j = 2, ...,
K) are scale invariant, whereas the Wald test (based on the t-statistic) is not.
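The scaling relations in (5.1) can be checked numerically. The sketch below assumes the mean function [1 + λxβ]^{1/λ} with the first element of x equal to one and λ ≠ 0: multiplying y by c maps the intercept to (c^λ − 1)/λ + c^λ·β₁ and each slope to c^λ·β_j, leaving λ unchanged, so fitted values are scaled by c. The function names and the numbers in the usage lines are illustrative, not estimates from the paper.

```python
def mean_fn(x, beta, lam):
    """[1 + lam * x'beta]^(1/lam) for lam != 0; x[0] is assumed to be the constant 1."""
    xb = sum(b * v for b, v in zip(beta, x))
    return (1.0 + lam * xb) ** (1.0 / lam)

def rescale_params(beta, lam, c):
    """Parameters implied by (5.1) when the regressand is multiplied by c > 0."""
    cl = c ** lam
    beta_plus = [(cl - 1.0) / lam + cl * beta[0]] + [cl * b for b in beta[1:]]
    return beta_plus, lam

x = [1.0, 0.3, 2.0]
beta, lam, c = [0.5, 0.2, 0.1], 0.4, 100.0
beta_plus, lam_plus = rescale_params(beta, lam, c)
scaled_fit = mean_fn(x, beta_plus, lam_plus)   # equals c * mean_fn(x, beta, lam)
```

Because every slope is multiplied by the estimated quantity c^λ̂, the slope t-statistics change with the units of y even though the fit does not.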
Dagenais and Dufour (1991) provide an interesting general discussion of the scale
One solution to the lack of scale invariance for the t-statistics is to introduce an
where ν is an additional scale parameter. Note that the elasticity and semi-elasticity
contains unity the parameters β, λ, and ν are not separately identifiable from the
WNLS objective function. Instead, for the case that P(y > 0) = 1, take ν to be the
Equations (5.3) and (5.4) represent the modified model. With this choice of ν (5.3)
exp(N⁻¹ Σ_{t=1}^N log(y_t)), and β and λ can be estimated by solving
3,A t=1
{y_t: t = 1, ..., N}, and then model (3.1) and (3.2) is estimated. The estimates of
β are trivially scale invariant because y_t/ν̂ is invariant to scaling. Spitzer (1984)
recommends the same strategy for the Box-Cox model. However, in implementing
this procedure one should recognize that the solutions β̂ and λ̂ to (5.5) depend on
the estimator ν̂, and therefore the asymptotic variance of θ̂ ≡ (β̂', λ̂)' should reflect
this additional source of uncertainty (as different samples of {y_t} are obtained, the
estimator ν̂ usually changes). In the general WNLS case, the simplest approach to
0 t=1
where μ(x; θ, ν) ≡ ν[1 + λxβ]^{1/λ}. Under the additional assumption that {(x_t, y_t)}
solution θ̂ of (5.6), which accounts for the variability of ν̂. Define the (K + 1) × 1
vector
t= 1
where ∇_θ μ̂_t is the same as derived in Section 3 except that it is now multiplied by
ν̂; also, note that ∇_ν μ̂_t = [1 + λ̂x_tβ̂]^{1/λ̂} is simply the fitted value of the scaled
=1 = 1). Note that, in the construction of 9t, v log (yt/Iv) is not weighted by 1I\/&.
Generally speaking, (5.7) differs from the usual robust covariance matrix
estimator
However, as shown in the Appendix, (5.7) and (5.8) produce numerically identical
unaffected by the estimation of ν. The scale corrected standard errors of β̂₁, ...,
β̂_K from (5.7) are generally different from those obtained from (5.8), reflecting the
estimators, the scale invariance results (including those for LM tests) and formula
(5.7) remain valid. Therefore, scale invariant inference is available for exponential
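The geometric-mean normalization used in this section is simple to compute. The sketch below forms the sample geometric mean exp(N⁻¹ Σ log y_t) and divides each observation by it; the normalized series is unchanged when y is multiplied by any positive constant, which is the source of the scale invariance. Function names are illustrative, and the sketch does not include the variance correction in (5.7).

```python
import math

def geometric_mean(y):
    """Sample geometric mean exp(mean(log y)); requires strictly positive data."""
    if min(y) <= 0.0:
        raise ValueError("geometric mean requires y > 0")
    return math.exp(sum(math.log(v) for v in y) / len(y))

def normalize(y):
    """Divide each observation by the sample geometric mean."""
    nu = geometric_mean(y)
    return [v / nu for v in y]

y = [1.0, 4.0, 16.0]
same_units = normalize(y)
other_units = normalize([1000.0 * v for v in y])   # same values up to rounding
```

Because the divisor is estimated from the same sample, the text's point stands: standard errors for the normalized regression should account for that estimation step.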
Harrison and Rubinfeld (1978) (hereafter, HR) estimate a hedonic price equation
for median housing prices in the Boston metropolitan area. Their primary interest
is in the effects of air pollution on housing prices, controlling for a variety of other
town attributes. HR use log of median housing price as the dependent variable in a
discussed below. The data are published and used extensively in the book on
The hedonic price equation is reestimated here using model (3.1) and, for
(see (3.9)). Box-Cox estimates are obtained by maximizing the likelihood function
The results of the estimation are given in Table 1. CRIME is the per capita crime
rate, DIS is a weighted distance to five employment centers in the Boston area,
RAD is an index of highway accessibility, PROPTAX is the property tax rate (per
population which is black, RM is the average number of rooms, and NOX is the
nitrogen oxide concentration (parts per hundred million). With the exception of the
inclusion of CRIME² and RM in Table 1, the variable transformations are the same
housing price and proportion of blacks in the population, but it is not compelling.
Unfortunately, the raw data on B are not readily available; only (B − .63)² is
reported in BKW.
The QMLE and Box-Cox procedures give qualitatively similar results, although
overlooked by HR and BKW. (CRIME² and RM are omitted in the HR model.) The
coefficients on the pollution variable, NOX², are very similar in the two procedures,
even though the estimates of the transformation parameter LAMBDA (.75
for QMLE, .40 for Box-Cox) are somewhat different. The model with NOX
replacing NOX² fits almost as well; including both in the model introduces severe
To give some idea of how elasticity estimates differ between the QMLE and
the average, minimum, and maximum values of NOX in the sample (all other
needed for the Box-Cox model. Rather than performing numerical integration, I use
the simple estimate [1 + λ̂x_tβ̂]^{1/λ̂}. The estimated elasticities for the QMLE,
evaluated at the average, minimum, and maximum values of NOX, are -.391,
-.175, and -1.224, respectively. For Box-Cox, the corresponding quantities are
-.407, -.188, and -1.140. These elasticity estimates are closer than one might
expect, suggesting that similar functional forms are obtained for different slope/
lambda combinations.
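For the mean function [1 + λxβ]^{1/λ}, the elasticity with respect to a regressor x_j works out to [∂(xβ)/∂x_j]·x_j/(1 + λxβ), which follows from ∂μ/∂x_j = μ^{1−λ}·∂(xβ)/∂x_j. The sketch below implements that expression and checks it against a numerical derivative. The index used in the example (a constant plus a coefficient on NOX²) and all numbers are hypothetical; they are not the estimates in Table 1.

```python
import math

def mu(xb, lam):
    """Conditional mean [1 + lam*xb]^(1/lam), with exp(xb) at lam = 0."""
    return math.exp(xb) if lam == 0.0 else (1.0 + lam * xb) ** (1.0 / lam)

def elasticity(xj, dindex_dxj, xb, lam):
    """Elasticity of mu with respect to x_j: d(xb)/dx_j * x_j / (1 + lam*xb)."""
    return dindex_dxj * xj / (1.0 + lam * xb)

def index(nox, b0=1.0, b_nox2=-0.05):
    """Hypothetical index with NOX entering as a squared term."""
    return b0 + b_nox2 * nox ** 2

nox, lam = 3.0, 0.75
analytic = elasticity(nox, 2.0 * -0.05 * nox, index(nox), lam)
```

When a variable enters through a squared term, as NOX does here, the derivative of the index is 2·b·NOX, so the elasticity varies with the level of NOX even for fixed coefficients.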
Each estimate comes with several standard errors. For QMLE, the usual (or
TABLE 1
(0.456) (0.329)
[0.606]
{0.610} {0.329}
(0.0028) (0.0030)
[0.0034]
{0.0035} {0.0030}
(0.00003) (0.00004)
[0.00004]
{0.00004} {0.00005}
(0.027) (0.030)
[0.039]
{0.039} {0.030}
(0.019) (0.019)
[0.019]
{0.019} {0.019}
(0.0001) (0.0001)
[0.0001]
{0.0001} {0.0001}
(0.005) (0.006)
[0.004]
{0.004} {0.006}
(0.023) (0.018)
[0.038]
{0.038} {0.018}
(0.074) (0.095)
[0.097]
{0.097} {0.095}
RM -0.982 -0.667
(0.150) (0.094)
[0.194]
{0.195} {0.094}
(0.013) (0.007)
[0.016]
{0.016} {0.007}
(0.0010) (0.0012)
[0.0012]
{0.0012} {0.0012}
nonrobust) standard errors are in parentheses. These are obtained as the square
roots of the diagonal elements from the inverse of the Hessian of the exponential
log-likelihood, times σ̂ (which is reported in the table). Here, σ̂² is computed with
a degrees-of-freedom adjustment as
TABLE 1
(CONTINUED)
(0.132) (0.056)
[0.160]
{0.160} {0.057}
Notes:
i. For QMLE, standard errors in parentheses are computed from the inverse of the Hessian times
σ̂². The standard errors in brackets are the robust standard errors. In braces are the standard errors
corrected for the estimated geometric mean; these are also robust to variance misspecification.
ii. For Box-Cox, standard errors in parentheses are computed from the outer product of the
score. The quantities in braces are the standard errors corrected for the estimated geometric mean.
iii. The value of the log-likelihood for QMLE is the value of the gamma log-likelihood evaluated
at β̂, λ̂, and σ̂². The gamma density is parameterized such that var(y|x) = σ²[E(y|x)]². This
provides some evidence on how well a gamma distribution fits the data, but recall that the
parameter σ² is not jointly estimated with β and λ. Therefore, this is not the maximized value of
iv. The r-squareds are computed as R² ≡ 1 − Σ_{t=1}^N (y_t − ŷ_t)²/Σ_{t=1}^N (y_t − ȳ)², where the ŷ_t
are the fitted values for y_t: ŷ_t ≡ [1 + λ̂x_tβ̂]^{1/λ̂}. This is the formula used for Box-Cox and QMLE,
even though [1 + λ̂x_tβ̂]^{1/λ̂} is not the conditional mean of y_t given x_t for the Box-Cox model.
NN
t=1 t=I
where P = 13 for the current model. The quantities in brackets are the robust
standard errors, obtained from (5.8). The standard errors corrected for the
estimated geometric mean, obtained from (5.7), are in braces; these are also robust
to variance misspecification.
For Box-Cox, the standard error in parentheses is obtained from the outer
product of the score. Because of recent evidence showing that these estimates can
should be interpreted with caution. There is no robust standard error since the
model (1.3) is assumed to be true. Using an argument similar to that for the QMLE,
the Box-Cox standard errors can be corrected for the estimation of the geometric
Most coefficients are highly statistically significant. Both the QMLE and Box-
rejected along this dimension as well. With the exception of (B − .63)², all
coefficients have (robust) t-statistics greater than three. Interestingly, for the
QMLE, the significance and magnitude of (B − .63)² has declined substantially; the
estimate is .15 with a robust t-statistic of 1.5. (HR obtain an estimate of .36 with a
t-statistic of 3.5.) Thus, rather than adopt HR's tenuous explanation for a positive
effect for large values of B, one might conclude that B is insignificant. Part of the
part is due to the additional nonlinearities allowed for by adding CRIME² and RM
to HR's specification. The estimate and t-statistic are notably higher for the
Box-Cox procedure.
A useful regularity in Table 1 is that the correction to the standard errors for the
presence of the estimated geometric mean has a very small effect. For QMLE, the
robust and scale-corrected standard errors differ in insignificant digits. For Box-
Cox, the usual and scale-corrected standard errors are almost indistinguishable.
Berndt, Showalter, and Wooldridge (1990) also found relatively minor differences
in the two standard errors for a variety of data sets. This suggests that ignoring the
preliminary estimation of scale does not lead one too far astray when performing
data set best. In terms of r-squareds (the amount of variation in y_t explained by x_t),
the two approaches provide almost identical fits. (Incidentally, the largest possible
r-squared for the mean function (3.1) and (3.2) is obtained by performing NLS. For
this data set, the NLS r-squared is .827.) Direct comparison of the exponential and
much too restrictive, requiring the variance to be equal to the mean squared (for
this data set, var(y|x)/[E(y|x)]² is estimated to be .033, well below unity, the value
gamma log-likelihood at the estimated β̂, λ̂, and σ̂², and to compare this to the
Box-Cox log-likelihood. Still, this is biased against the QMLE because the gamma
log-likelihood is not evaluated at its maximized value; recall that, for robustness
reasons, the exponential log-likelihood is maximized to obtain β̂ and λ̂, and then σ̂²
is obtained by (6.1). For this data set, the Box-Cox log-likelihood is 177.47 and the
Berndt, Showalter, and Wooldridge (1992) find notable differences between the
some data sets they examine. Consequently, the finding of similar results for this
7. CONCLUDING REMARKS
functional form in statistics and econometrics, but it is not well suited for all
approaches, this paper has proposed a flexible nonlinear model for E(y|x). No
all of the robust LM tests of the special cases covered in Section 4 are pure
the model for E(y|x), and not as a rejection of some other distributional assump-
tion.
The model studied here can easily incorporate Box-Cox transformations of the
denote the Box-Cox transformation of x_j, as in (1.1) and (1.2). Then (3.1) and (3.2)
are extended by replacing x with x(ρ) ≡ (x_1, ..., x_J, x_{J+1}(ρ_{J+1}), ..., x_K(ρ_K)).
Estimation and inference in this more general model are covered in Wooldridge
(1989).
APPENDIX
All of the results in this appendix assume that x_t1 ≡ 1, t = 1, 2, .... For
notational simplicity, the results are proven only for the unweighted case.
PROOF. Let μ_t(β, λ) ≡ [1 + λx_tβ]^{1/λ}, and let ∇_β μ_t(β, λ) and ∇_λ μ_t(β, λ)
denote the derivatives. Then the first order conditions for (β̂⁺, λ̂⁺) are
t=1
t=1
Because the solutions are assumed to be unique, it suffices to show that β̂⁺ and λ̂⁺
given by (5.1) satisfy (A.1) and (A.2). Then (A.1) reduces to showing
t= 1
t= 1
But (A.3) follows from the first order condition for (β̂, λ̂). Next, straightforward
t= 1
NN
t=1 t=1
The first term on the right-hand side of (A.4) is zero by the first order condition for
(β̂, λ̂). The first order condition for (β̂, λ̂) also implies that the second right-hand
t=1
E [1 + Xxt]{(l1A)-1}(1 + - (i X))
t= 1
+ A > [1 + X- t A)) = 0
t=1
N -1 N
N -I N
that rt is also scaled up by co. Then the expression (A.7) is scale invariant, and then
Consequently, the residuals from the regression VVApt(00) on Vp,/t(00), say ro0,
rt) -1N-1/2 itfl rt~t and the asymptotic variance of A is invariant. That the
computed standard errors are invariant follows because r̂_t and ε̂_t are also scaled up
by c₀. ∎
consider testing H₀: δ = 0. If the regressand is c₀y_t for c₀ > 0 then the gradients
VA ,U ( p A O) = 0VA ?,8 A, 0) + A co
Vsju t+ A +, 0) = c o- aA t 68 I AI 0)
Because the gradients of the scaled data are linear combinations of the gradients for
the unscaled data, and because ε_t⁺ = c₀ε_t, the r-squareds from the regressions
are numerically identical. This shows that the nonrobust LM statistics are numerically
identical. For the robust test, note that the residuals r_t⁺ from the regression
(rt+ = c irt). Consequently, the sum of squared residuals from the regressions 1
standard error of A (yt has been scaled by the geometric mean v).
t= 1
where u (V) = Vi1 + AXtfa] / and VPut(0, v) and VAjut(0, v) are also scaled
N -1 N
t ~~~N12t1Vpjc
t= 1
where all elements are evaluated at (0, v). The second term in (A. 11) is the
contribution due to the estimation of v; the first term is as before. Thus, it suffices
(the element corresponding to A) is identically zero. But (A. 12) is the vector of
The coefficient on VAut is also obtained by first obtaining the residuals rt from the
regression VAut on V0,ut and then computing the coefficient from the regression
V,,ut on rt. Thus, it suffices to show that V,ut and rt are orthogonal. But the
residuals rt are orthogonal to any linear combination of Vpput, i.e. StNl Vpu'rt =
REFERENCES
AMEMIYA, T. AND J. L. POWELL, "A Comparison of the Box-Cox Maximum Likelihood Estimator and the
Non-Linear Two Stage Least Squares Estimator," Journal of Econometrics 17 (1981), 351-381.
ANDREWS, D. F., "A Note on the Selection of Data Transformations," Biometrika 58 (1971), 249-254.
BELSLEY, D. A., E. KUH, AND R. E. WELSCH, Regression Diagnostics: Identifying Influential Data and
BERNDT, E. R. AND M. S. KHALED, "Parametric Productivity Measurement and Choice among Flexible
BICKEL, P. J. AND K. A. DOKSUM, "An Analysis of Transformations Revisited," Journal of the American
BOX, G. E. P. AND D. R. COX, "An Analysis of Transformations," Journal of the Royal Statistical Society
DAVIDSON, R. AND J. G. MACKINNON, "Testing Linear and Loglinear Regressions Against Box-Cox
DAGENAIS, M. G. AND J.-M. DUFOUR, "Invariance, Nonlinear Models, and Asymptotic Tests," Econo-
DUAN, N., "Smearing Estimate: A Nonparametric Retransformation Method," Journal of the American
ENGLE, R. F., "Wald, Likelihood Ratio, and Lagrange Multiplier Statistics in Econometrics," in Z.
FARBER, H. AND R. GIBBONS, "Learning and Wage Determination," mimeo, MIT Department of
Economics, 1990.
HARRISON, D. AND D. L. RUBINFELD, "Hedonic Housing Prices and the Demand for Clean Air," Journal
HUANG, C. J. AND J. A. KELINGOS, "Conditional Mean Function and a General Specification of the
MCCULLAGH, P. AND J. A. NELDER, Generalized Linear Models, 2nd ed. (New York: Chapman and Hall,
1989)
Section Study of Factor Substitution and Returns to Scale," Econometrica 45 (1977), 1221-1242.
MUKERJI, V., "A Generalized SMAC Function with Constant Ratios of Elasticity of Substitution,"
POIRIER, D. J., "The Use of the Box-Cox Transformation in Limited Dependent Variable Models,"
AND A. MELINO, "A Note on the Interpretation of Regression Coefficients within a Class of
SPITZER, J. J., "Variance Estimates in Models with Box-Cox Transformations: Implications for Estima-
tion and Hypothesis Testing," Review of Economics and Statistics 66 (1984), 645-652.
WHITE, H. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for
(1984), 143-162.
WOOLDRIDGE, J. M., "Some Alternatives to the Box-Cox Regression Model," Working Paper No. 534,
(1991b), 29-55.