Some Alternatives To The Box-Cox Regression Model


Author(s): Jeffrey M. Wooldridge
Source: International Economic Review, Vol. 33, No. 4 (Nov., 1992), pp. 935-955
Published by: Wiley for the Economics Department of the University of Pennsylvania and
Institute of Social and Economic Research, Osaka University
Stable URL: http://www.jstor.org/stable/2527151
INTERNATIONAL ECONOMIC REVIEW
Vol. 33, No. 4, November 1992
SOME ALTERNATIVES TO THE BOX-COX REGRESSION MODEL*
BY JEFFREY M. WOOLDRIDGE1
A nonlinear regression model is proposed as an alternative to the Box-Cox
regression model for nonnegative variables. The functional form contains
linear, exponential, and reciprocal models as special cases. Unlike Box-Cox
type approaches, the proposed estimators of the conditional mean function are
robust to conditional variance and other distributional misspecifications.
Computationally simple, robust Lagrange multiplier statistics for restricted
versions of the model are derived. Scale invariant t-statistics are proposed,
and the Lagrange multiplier statistic for exclusion restrictions is shown to be
scale invariant.
1. INTRODUCTION
Economists and other social scientists are often interested in explaining a
nonnegative variable y in terms of some explanatory variables $x \equiv (x_1, x_2, \ldots, x_K)$.
Formalizing this notion requires one to decide which aspect of the conditional
distribution of y given x is of interest. The two leading candidates are the
conditional expectation of y given x, $E(y|x)$, and the conditional median of y given
x, $M(y|x)$. Especially for nonnegative variables the conditional mean and conditional
median can be very different. Because of its simple algebra (linearity, the law
of iterated expectations), the conditional mean has been used more extensively in
developing economic theory, particularly in the context of rational expectations; a
recent example is the learning model for wage determination studied by Farber and
Gibbons (1990). While there are situations where the conditional median is of
interest, the conditional expectation continues to receive the most attention among
theorists and empirical economists. This paper is about estimating models for
$E(y|x)$, and the subsequent discussion assumes that $E(y|x)$ is the function of
interest.
The first model studied in econometrics courses, which postulates that $E(y|x)$ is
a linear function of x, or a linear function of some vector of transformations $\phi(x)$,
often provides an inadequate description of $E(y|x)$. In addition, y is frequently
heteroskedastic, resulting in the usual inference procedures being inappropriate.
Finally, although of less importance for asymptotic inference, the assumption that
y conditional on x is normally distributed is untenable because y is nonnegative.
In econometrics the most common alternative to the linear model for $E(y|x)$ is a
*Manuscript received May 1989; revised February 1990 and July 1991.
1 This is a condensed and revised version of MIT Department of Economics Working Paper No. 534.
I am grateful to Ernie Berndt, Marcel Dagenais, Jean-Marie Dufour, Zvi Griliches, Dale Poirier, Mark
Showalter, Jeff Zabel, three anonymous referees, and the participants of the Boston College and
University of Montreal econometrics workshop for providing helpful comments. Special thanks to Ernie
Berndt for directing me to the issue of scale invariance, and to Diego Rodriguez for supplying and
documenting the data set.
linear model for $E(\log y|x)$, provided that $P(y > 0) = 1$. Normality of log y cannot
be ruled out a priori and heteroskedasticity is often less of a problem in linear
models with log y as the dependent variable. However, the important issue of
whether the linear model for log y implicitly provides the best description of $E(y|x)$
depends on the particular application. One cannot even investigate this issue unless
the estimates of $E(\log y|x)$ can be transformed into estimates of $E(y|x)$.
Noting that the identity and logarithmic transformations in linear models are too
restrictive, Box and Cox (1964) suggested a transformation of y that contains the
identity and logarithmic transformations as special cases. For nonnegative y the
Box-Cox transformation is defined as
(1.1)  $y(\lambda) \equiv (y^{\lambda} - 1)/\lambda, \quad \lambda \neq 0$
(1.2)  $y(\lambda) \equiv \log y, \quad \lambda = 0$.
The case $\lambda \leq 0$ is allowed only if $P(y > 0) = 1$.
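As a minimal sketch of (1.1) and (1.2), the transformation can be written as a scalar function. The name `box_cox` and the error handling are illustrative assumptions, not part of the original article.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transformation of a single nonnegative value y, as in (1.1)-(1.2).

    Hypothetical helper for illustration only.
    """
    if lam <= 0.0 and y <= 0.0:
        # The case lambda <= 0 is allowed only when y is strictly positive.
        raise ValueError("lambda <= 0 requires y > 0")
    if y < 0.0:
        raise ValueError("y must be nonnegative")
    if lam == 0.0:
        return math.log(y)           # (1.2): the logarithmic case
    return (y ** lam - 1.0) / lam    # (1.1): the power case


# lambda = 1 is the identity up to a shift; lambda = 0 is the logarithm.
print(box_cox(2.0, 1.0), box_cox(2.0, 0.0))
```

For small nonzero values of lambda the power case approaches the logarithmic case, which is why (1.2) is the natural definition at zero.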
In the Box-Cox regression model there is a value $\lambda \in \mathbb{R}$ such that for some $K \times 1$
vector $\beta$ and some $\eta^2 > 0$,
(1.3)  $y(\lambda) \mid x \sim N(x\beta, \eta^2)$.
It is well-known that (1.3) cannot strictly be true unless $\lambda = 0$. The inconsistency
of the Box-Cox estimators of $\lambda$, $\beta$, and $\eta^2$ due to the inherent nonnormality is well
documented (see, e.g., Amemiya and Powell 1981). Also, the practice of estimating
$\lambda$ and then performing inference on $\beta$ as if $\lambda$ were known has been criticized by
various econometricians and statisticians (see, e.g., Amemiya and Powell 1981 and
Bickel and Doksum 1981).
From a social scientist's point of view there is the more important problem of
interpreting the parameters $\beta$ and $\lambda$. The vector $\beta$ measures the marginal effects of
the explanatory variables on $E[y(\lambda)|x]$. But if y is the variable important to
economic agents or policy makers then interest typically lies in the expected value
of y given x. The parameters $\beta$, $\lambda$, and $\eta^2$ in a Box-Cox model are of interest
primarily because they also index $E(y|x)$. Poirier and Melino (1978) show that $\beta_j$
and $\partial E(y|x)/\partial x_j$ have the same sign when $y(\lambda)$ is assumed to have a plausible
truncated normal distribution, but their expression for $E(y|x)$ relies on the assumed
distribution for $y(\lambda)$.
Distribution-free methods are available for estimating $\beta$ and $\lambda$, e.g. Amemiya and
Powell's (1981) nonlinear two stage least squares estimator. All that is required is
that the error $\varepsilon \equiv y(\lambda) - x\beta$ has zero expectation conditional on x. However,
estimation of $E(y|x)$ requires something stronger, such as $\varepsilon$ being independent of x.
This is less restrictive than the standard Box-Cox model but requires, at a
minimum, bounding the supports of x and $\varepsilon$ in ways that depend on the true
parameters (to ensure the inequality $\varepsilon \geq -(1/\lambda + x\beta)$ for $\lambda > 0$). Requiring that a
power transformation of y results in a linear conditional expectation with errors
independent of x (which rules out heteroskedasticity conditional on x) is still asking
a lot of economic data. But, if one is willing to make such an assumption, Duan's
(1983) smearing estimator can be used to estimate $E(y|x)$.
The motivation underlying this paper combines my belief that $E(y|x)$ is the object
of primary interest with the observation that the Box-Cox transformation is
typically used as a device for generalizing functional form. Rather than searching
for a (possibly nonexistent) transformation of the explained variable that simultaneously
induces linearity of the conditional expectation, homoskedasticity, and
normality (or independence of the implicit errors and explanatory variables), this
paper attempts the more modest task of specifying a functional form for $E(y|x)$ that
contains the linear, exponential, constant elasticity, and a variety of other regression
models as special cases. The parameters of the conditional mean specification
are easy to interpret and weighted nonlinear least squares (WNLS) estimators or
quasi-maximum likelihood (QML) estimators in the linear exponential family (LEF)
are likely to be sufficiently precise for many applications. In some cases the WNLS
estimator (or QMLE) is fully efficient. This notwithstanding, it seems that obtaining
robust, possibly inefficient estimates of economically interesting parameters is
often preferred to obtaining efficient (under correct specification of the distribution)
but nonrobust estimates of parameters that might not be of interest.
The functional form studied here extends those used by others in the literature on
nonlinear estimation (e.g., Mukerji 1963, Mizon 1977, Berndt and Khaled 1979),
and provides a unified framework for analyzing and testing several of the regression
functions used in applied economics. It is as flexible as the Box-Cox transformation
for modelling $E(y|x)$ but, in contrast to Box-Cox type approaches, tests about
$E(y|x)$ can be carried out without imposing auxiliary distributional assumptions.
Section 2 of the paper presents a case for defining all economic quantities in
terms of $E(y|x)$, where y is the variable to be explained. Section 3 proposes the
basic model for $E(y|x)$ and derives the asymptotic covariance matrix of the WNLS
(QML) estimator. Standard and robust Lagrange multiplier (LM) tests for restricted
versions of the model are derived in Section 4. Section 5 discusses the issue of scale
invariance of test statistics. An empirical application to Harrison and Rubinfeld's
(1978) housing price data is provided in Section 6, and some remarks about
extending the basic model are contained in Section 7.
2. SOME CONSIDERATIONS WHEN CHOOSING FUNCTIONAL FORM
Transformations of the explained and explanatory variables are used quite
liberally in the social sciences, often without regard for the implications for
interpreting parameter estimates. For example, for y and $x_j$ positive random
scalars, it is easy to construct examples such that
(2.1)  $\partial E(\log y|x)/\partial \log x_j \neq \partial \log E(y|x)/\partial \log x_j$.
This raises the issue of which of these is the appropriate definition of an elasticity.
When y and x are deterministically related there is no problem. If $y = f(x)$ for a
differentiable $f(\cdot)$ then the marginal effect of $x_j$ on y is simply $\partial f(x)/\partial x_j$, and the
elasticity of y with respect to $x_j$ is
(2.2)  $\dfrac{\partial f(x)}{\partial x_j} \cdot \dfrac{x_j}{f(x)}, \quad f(x) \neq 0$.
When y and x are random the natural definition of the marginal effect of $x_j$ on y is
the marginal effect of $x_j$ on the expected value of y given x, namely $\partial E(y|x)/\partial x_j$.
(Here it is useful to remind the reader that all quantities could instead be defined in
terms of the conditional median.) The elasticity of y with respect to $x_j$, keeping
$x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_K$ fixed, is
(2.3)  $\dfrac{\partial E(y|x)}{\partial x_j} \cdot \dfrac{x_j}{E(y|x)}, \quad E(y|x) \neq 0$,
which is the same as the right-hand side of (2.1) when y and $x_j$ are positive. It is
easily seen that using $E(y|x)$ as the basis for all definitions preserves the relationships
that hold among economic quantities in the deterministic case. Further,
agreeing that measures should be based on $E(y|x)$ results in straightforward
comparisons across different parametric models. Of course the marginal effect of $x_j$
on $E(y|x)$ generally changes as the list of conditioning variables changes, but this is
always the case in regression-type analyses, even in fully nonparametric settings.
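One concrete case of the inequality in (2.1) is sketched here purely as an illustration. It assumes a heteroskedastic lognormal specification with a single positive regressor $x_1$; the variance function $\sigma^2(x_1)$ is hypothetical and is not taken from the text.

```latex
% Illustrative assumption: \log y \mid x \sim N(\beta_0 + \beta_1 \log x_1, \; \sigma^2(x_1)).
\begin{align*}
E(\log y \mid x) &= \beta_0 + \beta_1 \log x_1, \\
E(y \mid x) &= \exp\{\beta_0 + \beta_1 \log x_1 + \sigma^2(x_1)/2\}, \\
\frac{\partial E(\log y \mid x)}{\partial \log x_1} &= \beta_1, \\
\frac{\partial \log E(y \mid x)}{\partial \log x_1}
  &= \beta_1 + \frac{1}{2}\,\frac{\partial \sigma^2(x_1)}{\partial \log x_1}.
\end{align*}
```

Under this assumed specification the two candidate elasticities agree only when the conditional variance of log y does not depend on $x_1$.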
Another argument for defining economic quantities in terms of $E(y|x)$ is that it
circumvents the issue of whether the "disturbances" in a model are additive or
multiplicative. When $y \geq 0$ and interest centers on $\mu(x) \equiv E(y|x)$, the disturbances
can be multiplicative or additive: The models
(2.4)  $y = \mu(x) + \varepsilon, \quad E(\varepsilon|x) = 0$
(2.5)  $y = \mu(x)\eta, \quad \eta \geq 0, \quad E(\eta|x) = 1$
are both correct and observationally equivalent without further restrictions on $\varepsilon$
and $\eta$; just define $\varepsilon \equiv y - \mu(x)$ and $\eta \equiv y/\mu(x)$ (assuming that $P[\mu(x) > 0] = 1$).
If $\varepsilon$ and $\eta$ are assumed to be independent of x then the models imply different
var(y|x), so they cannot both be correct. But without such additional assumptions
(2.4) and (2.5) are simply different ways of stating that $E(y|x) = \mu(x)$.
In summary, in most empirical studies estimates of $E[\phi(y)|x]$ for some nonlinear
transformation $\phi(y)$ are useful only if enough structure has been imposed to
recover $E(y|x)$ (unless the conditional median is of interest). Goldberger (1968),
Poirier (1978), Poirier and Melino (1978), and Huang and Kelingos (1979) make
essentially the same point in the context of specific models. However, the current
paper is motivated by the fact that the only way to estimate $E(y|x)$ without
imposing additional distributional assumptions is to specify a model for $E(y|x)$
directly.
3. SPECIFICATION AND ESTIMATION OF THE MODEL
Let y be a nonnegative random variable and let $x \equiv (x_1, x_2, \ldots, x_K)$ be a $1 \times K$
vector of explanatory variables. In most cases the first element $x_1$ is unity.
Without any assumptions on the conditional distribution of y given x consider the
following model for $E(y|x)$:
(3.1)  $E(y|x) = [1 + \lambda x\beta]^{1/\lambda}, \quad \lambda \neq 0$
(3.2)  $E(y|x) = \exp(x\beta), \quad \lambda = 0$.
Technically, it would be more precise to replace $(\beta, \lambda)$ in (3.1) and (3.2) by
something such as $(\beta_0, \lambda_0)$ to distinguish the "true" parameters from the generic
parameter vector $(\beta, \lambda)$. As this results in a notational nightmare, $(\beta, \lambda)$ is used to
denote the true values as well as generic values. As usual, the vector x can contain
nonlinear transformations of an underlying set of explanatory variables, whereas y
should be the quantity of interest.
It is straightforward (e.g. Wooldridge 1989) to derive the functional form (3.1)
and (3.2) from the Box-Cox conditional mean model $E[y(\lambda)|x] = x\beta$ under the
plausible assumption that log y conditional on x is normally distributed with
constant variance. However, in what follows only (3.1) and (3.2) are assumed to
hold. In particular, y need not be continuously distributed, so that (3.1) and (3.2)
can be applied to count data.
For (3.1) to be well-defined the inequality $1 + \lambda x\beta \geq 0$, with strict inequality for
$\lambda < 0$, must hold for all relevant values of x. This is analogous to requiring
nonnegativity of the regression function in a Box-Cox model when $\lambda \neq 0$. Equation
(3.2) is the natural definition of the regression function at $\lambda = 0$ as it is the limiting
case of (3.1) as $\lambda \to 0$. The exponential model is attractive in that it ensures that
predicted values are well defined and positive for all x and $\beta$.
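The regression function in (3.1) and (3.2), together with its domain restriction, can be sketched as follows. The function name `cond_mean` and the choice to raise an error when the restriction fails are assumptions made for illustration.

```python
import numpy as np


def cond_mean(X, beta, lam):
    """Evaluate E(y|x) from (3.1)-(3.2) for each row of X.

    Hypothetical helper: X is N x K, beta has length K, lam is a scalar.
    """
    index = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    if lam == 0.0:
        return np.exp(index)                      # (3.2): exponential model
    base = 1.0 + lam * index
    # (3.1) needs 1 + lam * x * beta >= 0, strictly positive when lam < 0.
    if np.any(base < 0.0) or (lam < 0.0 and np.any(base == 0.0)):
        raise ValueError("1 + lam * x @ beta violates the domain restriction")
    return base ** (1.0 / lam)                    # (3.1)


X = [[1.0, 0.5]]
beta = [0.2, 0.4]
for lam in (1.0, 0.0, -1.0):                      # linear, exponential, reciprocal
    print(lam, cond_mean(X, beta, lam))
```

The three values of lambda in the usage lines correspond to the linear, exponential, and reciprocal special cases mentioned in the abstract.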
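To estimate $\beta$ and $\lambda$ by WNLS or QMLE, the derivatives of the regression
function are needed. Define the $(K + 1) \times 1$ parameter vector $\theta \equiv (\beta', \lambda)'$ and
express the parameterized regression function for $E(y|x)$ as
(3.3)  $\mu(x; \theta) \equiv [1 + \lambda x\beta]^{1/\lambda}, \quad \lambda \neq 0$
(3.4)  $\mu(x; \theta) \equiv \exp(x\beta), \quad \lambda = 0$.
The gradient of $\mu(x; \theta)$ with respect to $\beta$ is the $1 \times K$ vector
(3.5)  $\nabla_\beta \mu(x; \theta) = [1 + \lambda x\beta]^{(1/\lambda) - 1} x, \quad \lambda \neq 0$
(3.6)  $\nabla_\beta \mu(x; \beta, 0) = \exp(x\beta)\, x, \quad \lambda = 0$.
For $\lambda \neq 0$, straightforward but tedious algebra yields
(3.7)  $\nabla_\lambda \mu(x; \theta) = [1 + \lambda x\beta]^{1/\lambda} \times [\lambda x\beta - (1 + \lambda x\beta) \log(1 + \lambda x\beta)] / [\lambda^2 (1 + \lambda x\beta)]$;
for future reference, note that $\nabla_\lambda \mu(x; \theta)$ is a linear combination of $\nabla_\beta \mu(x; \theta)$ and
$\mu(x; \theta) \log[\mu(x; \theta)]$. The derivative of $\mu(x; \theta)$ with respect to $\lambda$ at $\lambda = 0$ can be
obtained by L'Hospital's rule; the details are omitted for brevity:
(3.8)  $\nabla_\lambda \mu(x; \beta, 0) = -\exp(x\beta)(x\beta)^2/2$.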
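The gradients (3.5) through (3.8) can be sketched for a single observation and checked against finite differences. The function name `mean_and_gradients` and the numerical values are illustrative assumptions.

```python
import numpy as np


def mean_and_gradients(x, beta, lam):
    """Return mu(x; theta), its gradient in beta, and its derivative in lam.

    Hypothetical helper implementing (3.3)-(3.8) for one 1 x K row x.
    """
    x = np.asarray(x, dtype=float)
    beta = np.asarray(beta, dtype=float)
    u = float(x @ beta)                           # the index x * beta
    if lam == 0.0:
        mu = np.exp(u)
        return mu, mu * x, -mu * u ** 2 / 2.0     # (3.4), (3.6), (3.8)
    base = 1.0 + lam * u
    mu = base ** (1.0 / lam)                      # (3.3)
    grad_beta = base ** (1.0 / lam - 1.0) * x     # (3.5)
    grad_lam = mu * (lam * u - base * np.log(base)) / (lam ** 2 * base)  # (3.7)
    return mu, grad_beta, grad_lam


# Central finite-difference check of (3.7) at an arbitrary point.
x, beta, lam, h = [1.0, 0.5], [0.2, 0.4], 0.5, 1e-6
_, _, analytic = mean_and_gradients(x, beta, lam)
numeric = (mean_and_gradients(x, beta, lam + h)[0]
           - mean_and_gradients(x, beta, lam - h)[0]) / (2.0 * h)
print(analytic, numeric)
```

The two printed numbers should agree to several decimal places, and evaluating the derivative at a small nonzero lambda should be close to the limiting expression (3.8).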
Under (3.1) and (3.2) and standard regularity conditions (e.g. White and
Domowitz 1984), $\theta$ can be consistently estimated by NLS. In some cases the NLS
estimator would have a large asymptotic variance matrix due to the substantial
heteroskedasticity in many nonnegative variables. Gains in efficiency might be
realized by using a weighted NLS procedure or a quasi-likelihood procedure. To
this end, let $\omega(x; \gamma) > 0$ be a weighting function that can depend on an $M \times 1$ vector
of parameters $\gamma$. Setting $\omega(x; \gamma) \equiv 1$ yields NLS. Other possibilities for $\omega$ are
(3.9)  $\omega(x; \gamma) = [\mu(x; \theta)]^2 \quad (\gamma = \theta)$
(3.10)  $\omega(x; \gamma) = \mu(x; \theta) \quad (\gamma = \theta)$
(3.11)  $\omega(x; \gamma) = \exp(x\gamma) \quad (\gamma \neq \beta \text{ necessarily})$
(3.12)  $\omega(x; \gamma) = [1 + \rho x\delta]^{1/\rho}, \quad \rho \neq 0 \quad (\gamma = (\delta', \rho)')$.
Equation (3.9) is appropriate if var(y|x) is proportional to $[E(y|x)]^2$, while (3.10) is
appropriate if var(y|x) is proportional to $E(y|x)$. The weighting functions (3.11) and
(3.12) allow for additional flexibility, and encompass (3.9) and (3.10). It is important
to remember that, except where stated, the weighting function is not assumed to be
correctly specified for var(y|x). In other words it is not assumed that
(3.13)  $\mathrm{var}(y|x) = \sigma^2 \omega(x; \gamma)$ for some $\gamma \in \mathbb{R}^M$ and some $\sigma^2 > 0$,
but only that such considerations motivate the choice of $\omega$. The idea is that
nonconstant weighting functions can result in efficiency gains even if (3.13) does not
hold. Because the goal is to test hypotheses about the conditional mean parameters
$\theta$, inference should be robust to violations of (3.13).
The robust asymptotic variance-covariance matrix of the WNLS estimator of $\theta$
can be obtained using the approach of White (1980). Let $\{(x_t, y_t): t = 1, 2, \ldots\}$ be
a sequence of random vectors following the regression model (3.1) and (3.2), and
suppose there are N observations available. Assume either that these observations
are independent or that they constitute an ergodic time series such that
(3.14)  $E(y_t|x_t) = E(y_t|x_t, y_{t-1}, x_{t-1}, y_{t-2}, x_{t-2}, \ldots), \quad t = 1, 2, \ldots$
($x_t$ can contain lagged dependent variables as well as other explanatory variables,
but ergodicity rules out integrated processes). Equation (3.14) ensures that the
errors $\{\varepsilon_t \equiv y_t - E(y_t|x_t): t = 1, 2, \ldots\}$ are appropriately serially uncorrelated.
Time series applications for which (3.14) does not hold can be accommodated, but the
formulas for the asymptotic covariance matrix derived below, in particular the
formula for $B_N$, would have to be modified along the lines of White and Domowitz
(1984).
To compute the WNLS estimator of $\theta$, estimates of the weighting functions are
needed. Let $\hat{\gamma}$ denote an estimator that would be $\sqrt{N}$-consistent for $\gamma$ if (3.13) held.
Even if (3.13) does not hold, $\hat{\gamma}$ generally has a probability limit, denoted here by
$\gamma^* \equiv \mathrm{plim}\, \hat{\gamma}$ and sometimes called the pseudo-true value of gamma. If $\omega$ is chosen
as in (3.9) or (3.10) then $\hat{\gamma}$ could be set to initial estimators of $(\beta, \lambda)$, but it is easier
in this case to compute the QMLE using the appropriate linear exponential family.
The weighting function (3.9) corresponds to exponential regression, and (3.10)
corresponds to Poisson regression. If $\omega$ is given by (3.11) or (3.12), then $\hat{\gamma}$ can be
obtained by nonlinear least squares estimation using the squared NLS residual $\hat{\varepsilon}_t^2$
as the regressand. A better procedure is to use the exponential QMLE with $\hat{\varepsilon}_t^2$ as
the regressand, as in McCullagh and Nelder (1989, section 10.4).
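Let $\hat{\omega}_t \equiv \omega(x_t; \hat{\gamma})$ and let $\hat{\theta}$ denote the WNLS estimator of $\theta$, which solves
$\min_\theta \sum_{t=1}^{N} (y_t - \mu(x_t; \theta))^2/\hat{\omega}_t$. Let $\mu_t(\theta) \equiv \mu(x_t; \theta)$, $\omega_t(\gamma) \equiv \omega(x_t; \gamma)$,
$\varepsilon_t \equiv y_t - \mu_t(\theta)$, $\omega_t^* \equiv \omega_t(\gamma^*)$, $\tilde{\varepsilon}_t^* \equiv \varepsilon_t/\sqrt{\omega_t^*}$, and $\nabla_\theta \tilde{\mu}_t^* \equiv \nabla_\theta \mu_t(\theta)/\sqrt{\omega_t^*}$. Define the
$(K + 1) \times (K + 1)$ matrices
(3.15)  $A_N \equiv N^{-1} \sum_{t=1}^{N} E[\nabla_\theta \tilde{\mu}_t^{*\prime} \nabla_\theta \tilde{\mu}_t^*], \quad B_N \equiv N^{-1} \sum_{t=1}^{N} E[(\tilde{\varepsilon}_t^*)^2 \nabla_\theta \tilde{\mu}_t^{*\prime} \nabla_\theta \tilde{\mu}_t^*]$
and assume for simplicity that these have limits A and B, respectively. Then, under
general heteroskedasticity of unknown form, $\sqrt{N}(\hat{\theta} - \theta) \overset{a}{\sim} N(0, A^{-1}BA^{-1})$. If
(3.13) holds then $\gamma^* = \gamma$ and the asymptotic variance of $\sqrt{N}(\hat{\theta} - \theta)$ takes the more
familiar form $\sigma^2 A^{-1}$. In the general case a consistent estimator of $V \equiv A^{-1}BA^{-1}$
is given by the White (1980) variance-covariance matrix estimator $\hat{V}_N \equiv \hat{A}_N^{-1} \hat{B}_N \hat{A}_N^{-1}$, where
(3.16)  $\hat{A}_N \equiv N^{-1} \sum_{t=1}^{N} \nabla_\theta \tilde{\mu}_t' \nabla_\theta \tilde{\mu}_t, \quad \hat{B}_N \equiv N^{-1} \sum_{t=1}^{N} \tilde{\varepsilon}_t^2 \nabla_\theta \tilde{\mu}_t' \nabla_\theta \tilde{\mu}_t$,
$\hat{\varepsilon}_t \equiv y_t - \mu_t(\hat{\theta})$, $\nabla_\theta \hat{\mu}_t \equiv \nabla_\theta \mu_t(\hat{\theta})$, $\tilde{\varepsilon}_t \equiv \hat{\varepsilon}_t/\sqrt{\hat{\omega}_t}$, and $\nabla_\theta \tilde{\mu}_t \equiv \nabla_\theta \hat{\mu}_t/\sqrt{\hat{\omega}_t}$. The
asymptotic standard error of $\hat{\theta}_j$ is the square root of the jth diagonal element of
$\hat{V}_N/N$, and one performs inference on $\theta$ as if $\hat{\theta} \sim N(\theta, \hat{V}_N/N)$. These same
formulas are valid in the context of QMLE in the LEF when $\hat{\omega}_t$ is the estimated
variance from the LEF distribution. For exponential regression, $\hat{\omega}_t = \hat{y}_t^2$, and for
Poisson regression, $\hat{\omega}_t = \hat{y}_t$.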
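A minimal sketch of the variance estimator in (3.16), taking the residuals, the gradient rows, and the estimated weights as given inputs. The function name `robust_wnls_variance` and its argument layout are assumptions for illustration; obtaining the WNLS estimates themselves is outside this sketch.

```python
import numpy as np


def robust_wnls_variance(resid, grad, weights):
    """Estimate Avar(theta_hat) as V_N / N with V_N = A^{-1} B A^{-1}, per (3.16).

    Hypothetical helper. resid: length-N unweighted residuals; grad: N x P
    matrix whose rows are the gradient of mu_t at theta_hat; weights: length-N
    values of the weighting function omega_t (all ones gives the NLS case).
    """
    resid = np.asarray(resid, dtype=float)
    grad = np.asarray(grad, dtype=float)
    root_w = np.sqrt(np.asarray(weights, dtype=float))
    n = resid.shape[0]
    grad_w = grad / root_w[:, None]              # weighted gradient rows
    resid_w = resid / root_w                     # weighted residuals
    a_hat = grad_w.T @ grad_w / n
    b_hat = (grad_w * (resid_w ** 2)[:, None]).T @ grad_w / n
    a_inv = np.linalg.inv(a_hat)
    return a_inv @ b_hat @ a_inv / n


# Toy inputs: a single parameter whose gradient is identically one.
v = robust_wnls_variance([1.0, -1.0, 2.0, -2.0], [[1.0]] * 4, [1.0] * 4)
print(np.sqrt(np.diag(v)))                       # robust standard errors
```

With unit weights and a linear mean, this reduces to the familiar heteroskedasticity-robust covariance matrix for OLS.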
The WNLS estimator using weights $1/\hat{\omega}_t$ is the efficient WNLS estimator if
$\mathrm{var}(y_t|x_t) = \sigma^2 \omega(x_t; \gamma)$ and $\hat{\gamma}$ is $\sqrt{N}$-consistent for $\gamma$. In addition, there are
some important cases where the WNLS estimator of $\theta$ achieves the asymptotic
Cramer-Rao lower bound. If the conditional distribution of y given x is exponential
with conditional mean $\mu(x; \theta)$ then the weighting function in (3.9) produces a
WNLS estimator that is asymptotically equivalent to the maximum likelihood
estimator. If y conditional on x has a Poisson distribution with mean $\mu(x; \theta)$ then
$\omega(x; \gamma)$ given by (3.10) produces the WNLS estimator that is asymptotically
efficient. As mentioned above, in these cases it is easier to compute the QMLE
directly; the formulas in (3.16) are still valid. For more on QMLE in the LEF, see
Gourieroux, Monfort, and Trognon (1984) and Wooldridge (1991b).
Before turning to Lagrange multiplier tests a brief discussion of the efficiency of
WNLS or QMLE relative to the Box-Cox approach is warranted. First, as
discussed in Sections 2 and 3, the Box-Cox model (1.3) and the conditional mean
specification (3.1) and (3.2) are usually nonnested. Only when $\lambda = 1$ or $\lambda = 0$ do
they produce the same model for $E(y|x)$. When $\lambda$ is known to equal unity both
procedures lead to OLS estimation of $\beta$. When $\lambda = 0$ in (1.3) the inefficiency of the
WNLS estimators of the slope parameters relative to the Box-Cox MLE can be
tabulated as a function of $\eta^2 = \mathrm{var}(\log y|x)$. Let $\beta$ and $\eta^2$ be defined as in (1.3).
Under (1.3) with $\lambda = 0$ it is well known that
(3.17)  $E(y|x) = \exp[(\beta_1 + \eta^2/2) + \beta_2 x_2 + \cdots + \beta_K x_K]$
(3.18)  $\mathrm{var}(y|x) = \sigma^2 \{\exp[(\beta_1 + \eta^2/2) + \beta_2 x_2 + \cdots + \beta_K x_K]\}^2$,
where $\sigma^2 \equiv \exp(\eta^2) - 1$. Assume that $\beta$ is estimated with the restriction $\lambda = 0$
imposed. The MLE of $\beta$, denoted $\hat{\beta}$, is simply the OLS estimator from the
regression $\log y_t$ on $1, x_{t2}, \ldots, x_{tK}$. Let $\tilde{\beta}$ denote the exponential QMLE or the
WNLS estimator with weighting function given by (3.9). As can be seen from
(3.17), the plim of $\tilde{\beta}_1$ is $\beta_1 + \eta^2/2$, so the comparison is based on the slope
estimators only. The asymptotic variance of $\sqrt{N}(\hat{\beta} - \beta)$ is
$\eta^2 \left[ N^{-1} \sum_{t=1}^{N} E(x_t' x_t) \right]^{-1}$.
A standard calculation shows that the asymptotic variance of $\sqrt{N}(\tilde{\beta} - \beta)$ is
$\sigma^2 \left[ N^{-1} \sum_{t=1}^{N} E(x_t' x_t) \right]^{-1}$.
Thus, the asymptotic standard deviation of $\tilde{\beta}_j$ relative to $\hat{\beta}_j$, $j = 2, \ldots, K$, is
(3.19)  $\mathrm{ASD}(\tilde{\beta}_j)/\mathrm{ASD}(\hat{\beta}_j) = [\exp(\eta^2) - 1]^{1/2}/\eta$.
This ratio is, of course, always bigger than unity, and increases without bound as
$\eta \to \infty$. However, (3.19) is surprisingly close to unity for reasonable values of $\eta$. For
$\eta = .50$ (empirically a very large value for the conditional standard deviation of log
y), the ratio of the asymptotic standard deviations is 1.066; that is, the WNLS
standard deviations are 6.6 percent larger than the MLE's. For more moderate
values of $\eta$, say $\eta = .30$, this ratio is 1.023, so that the WNLS estimator standard
deviations are 2.3 percent larger than the MLE's. Estimates of $\eta$ on the order of .10
are not uncommon in applied work, in which case the inefficiency of WNLS is well
under one percent.
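The figures quoted for (3.19) are easy to reproduce. The short sketch below evaluates the ratio at the three values of eta discussed above; the function name `asd_ratio` is an illustrative assumption.

```python
import math


def asd_ratio(eta: float) -> float:
    """Ratio (3.19): asymptotic SD of the WNLS slope relative to the MLE slope."""
    # expm1(z) computes exp(z) - 1 accurately for small z.
    return math.sqrt(math.expm1(eta ** 2)) / eta


for eta in (0.50, 0.30, 0.10):
    print(f"eta = {eta:.2f}: ratio = {asd_ratio(eta):.4f}")
```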
When $\lambda$ is estimated along with $\beta$ there is no analytical formula like (3.19). A
Monte Carlo study could shed light on the efficiency loss of WNLS when $\lambda = 0$ and
is estimated along with $\beta$, but this is beyond the scope of the current paper.
4. LAGRANGE MULTIPLIER TESTS OF RESTRICTED MODELS
Joint estimation of $\beta$ and $\lambda$ can be difficult if the restrictions $1 + \lambda x_t\beta \geq 0$, $t = 1, \ldots, N$,
have to be imposed. Before estimating the full model it makes sense to
test whether some easily estimated restricted version is sufficient. The two cases of
primary interest are $\lambda = 1$, which leads to a linear model, and $\lambda = 0$, which results
in an exponential model. However, it is no more difficult to consider the general
case $H_0: \lambda = \lambda_0$. Throughout this section it is assumed that $x_{t1} \equiv 1$, $t = 1, 2, \ldots$.
Consider first testing the null hypothesis $H_0: \lambda = \lambda_0$ against $H_1: \lambda \neq \lambda_0$ for
$\lambda_0 \neq 0$. Expressions (3.5) and (3.7) provide the gradient of the regression function
evaluated at the null hypothesis; denote these by $\nabla_\beta \hat{\mu}_t$ and $\nabla_\lambda \hat{\mu}_t$, where $\hat{\beta}$ is the
restricted WNLS estimator (or QMLE) obtained with $\lambda = \lambda_0$. Let $\hat{y}_t$ and $\hat{\varepsilon}_t$ be the
(unweighted) fitted values and residuals from the WNLS regression. Assuming that
$\hat{y}_t > 0$, $t = 1, \ldots, N$, the usual LM statistic can be derived from Engle (1984) or
Wooldridge (1991a). Because of the relationship between $\nabla_\lambda \hat{\mu}_t$ and $\nabla_\beta \hat{\mu}_t$, the LM
statistic is easily computed as $NR_u^2$ from the regression
(4.1)  $\tilde{\varepsilon}_t$ on $\nabla_\beta \tilde{\mu}_t$, $\tilde{y}_t \log(\hat{y}_t)$, $\quad t = 1, 2, \ldots, N$,
where $\tilde{\varepsilon}_t \equiv \hat{\varepsilon}_t/\sqrt{\hat{\omega}_t}$, $\nabla_\beta \tilde{\mu}_t \equiv \nabla_\beta \hat{\mu}_t/\sqrt{\hat{\omega}_t}$, $\tilde{y}_t \equiv \hat{y}_t/\sqrt{\hat{\omega}_t}$, and $R_u^2$ is the uncentered
r-squared. Under $H_0$ and $\mathrm{var}(y_t|x_t) = \sigma^2 \omega(x_t; \gamma)$, $NR_u^2 \overset{a}{\sim} \chi_1^2$. The test of a linear
model occurs when $\lambda_0 = 1$, in which case $\nabla_\beta \tilde{\mu}_t$ can be taken to be $\tilde{x}_t \equiv x_t/\sqrt{\hat{\omega}_t}$.
Further, if $\hat{\omega}_t \equiv 1$ this reduces to the LM form of the t-test for the null of a linear
model against the Box-Cox alternative derived by Andrews (1971). Andrews
suggests using a standard t-test in the Box-Cox setup because, under $H_0: \lambda = 1$,
y|x is normally distributed and homoskedastic.
The failure of normality has no effect on the asymptotic size of the test, but
misspecification of the conditional variance function can bias the inference toward
or away from the null model. Because the primary goal is to test hypotheses about
$E(y|x)$, it is prudent to use the LM test that is robust to misspecification of
var(y|x). Wooldridge (1991a, Procedure 3.1) covers regression-based LM tests that
are robust to variance misspecification. Here, the robust LM statistic can be
computed as follows: First, obtain the residuals, say $\tilde{r}_t$, from the regression
(4.2)  $\tilde{y}_t \log(\hat{y}_t)$ on $\nabla_\beta \tilde{\mu}_t$, $\quad t = 1, 2, \ldots, N$.
Then obtain the sum of squared residuals, SSR, from the OLS regression
(4.3)  1 on $\tilde{\varepsilon}_t \tilde{r}_t$, $\quad t = 1, 2, \ldots, N$;
under $H_0$, $N - \mathrm{SSR}$ is asymptotically $\chi_1^2$ whether or not (3.13) holds.
The LM test for $\lambda = 0$ requires only a slightly different argument. Let $\hat{\beta}$ be the
WNLS estimator using conditional mean function $\exp(x_t\beta)$ and weights $1/\hat{\omega}_t$ (or let
$\hat{\beta}$ be the exponential, Poisson, or some other QMLE in the LEF). Let $\hat{y}_t \equiv \exp(x_t\hat{\beta})$
be the fitted values and let $\hat{\varepsilon}_t \equiv y_t - \hat{y}_t$ be the residuals from the WNLS
estimation. Then, referring to (3.8), the nonrobust LM statistic for testing
$H_0: \lambda = 0$ is obtained as $NR_u^2$ from the regression
(4.4)  $\tilde{\varepsilon}_t$ on $\tilde{y}_t x_t$, $\tilde{y}_t (\log \hat{y}_t)^2$, $\quad t = 1, \ldots, N$.
The robust form of the test can be computed by first saving the residuals $\tilde{r}_t$ from the
regression
(4.5)  $\tilde{y}_t (\log \hat{y}_t)^2$ on $\tilde{y}_t x_t$, $\quad t = 1, \ldots, N$
and then using $N - \mathrm{SSR}$ from the regression (4.3) as asymptotically $\chi_1^2$.
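The two auxiliary regressions behind the robust test of $\lambda = 0$, (4.5) followed by (4.3), can be sketched as below. The function names, the argument layout, and the toy numbers are assumptions for illustration. The restricted estimate `beta_hat` is taken as given, and the statistic is meaningful only if it actually comes from the restricted WNLS or QML fit.

```python
import numpy as np


def robust_lm(eps_w, grad_null_w, grad_test_w):
    """Variance-robust LM statistic: N - SSR from regressing 1 on eps_w * r.

    Hypothetical helper. All inputs are already weighted by 1/sqrt(omega_t).
    grad_null_w holds gradients for parameters estimated under H0, and
    grad_test_w holds gradients for the parameters restricted under H0.
    """
    eps_w = np.asarray(eps_w, dtype=float)
    g_null = np.asarray(grad_null_w, dtype=float)
    g_test = np.asarray(grad_test_w, dtype=float)
    if g_test.ndim == 1:
        g_test = g_test[:, None]
    # Step 1: residuals r from regressing the tested gradients on the null ones.
    coef, *_ = np.linalg.lstsq(g_null, g_test, rcond=None)
    r = g_test - g_null @ coef
    # Step 2: regress a vector of ones on eps_w * r and form N - SSR.
    z = eps_w[:, None] * r
    ones = np.ones(eps_w.shape[0])
    b, *_ = np.linalg.lstsq(z, ones, rcond=None)
    ssr = float(np.sum((ones - z @ b) ** 2))
    return eps_w.shape[0] - ssr


def lm_lambda_zero(y, X, beta_hat, weights=None):
    """Robust LM statistic for H0: lambda = 0 using regressions (4.5) and (4.3)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    y_hat = np.exp(X @ np.asarray(beta_hat, dtype=float))
    w = np.ones_like(y_hat) if weights is None else np.asarray(weights, dtype=float)
    root_w = np.sqrt(w)
    eps_w = (y - y_hat) / root_w
    grad_beta_w = (y_hat[:, None] * X) / root_w[:, None]   # weighted y_hat * x_t
    grad_lam_w = y_hat * np.log(y_hat) ** 2 / root_w       # weighted y_hat * (log y_hat)^2
    return robust_lm(eps_w, grad_beta_w, grad_lam_w)


# Toy numbers only; beta_hat here is not an actual restricted estimate.
print(lm_lambda_zero([1.0, 2.0, 4.0, 3.0],
                     [[1, 0.0], [1, 1.0], [1, 2.0], [1, 1.5]],
                     [0.1, 0.6]))
```

Because `robust_lm` accepts any set of tested gradients, the same helper could in principle be reused for the $\lambda = \lambda_0$ case in (4.2) and (4.3) by passing the weighted $\hat{y}_t \log(\hat{y}_t)$ column instead.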
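The LM test for exclusion restrictions is also easy to derive. Let $z_t$ be a $1 \times Q$
vector of additional variables, and consider testing $H_0: \delta = 0$ in
(4.6)  $E(y_t|x_t, z_t) = [1 + \lambda x_t\beta + \lambda z_t\delta]^{1/\lambda}$.
Let $\hat{\beta}$ and $\hat{\lambda}$ be the estimates computed under $\delta = 0$, so that the fitted values and
residuals are computed under $H_0$ as $\hat{y}_t \equiv [1 + \hat{\lambda} x_t\hat{\beta}]^{1/\hat{\lambda}}$ and $\hat{\varepsilon}_t \equiv y_t - \hat{y}_t$. Let
$\nabla_\beta \hat{\mu}_t$ and $\nabla_\lambda \hat{\mu}_t$ be the gradients defined by (3.5) and (3.7). The gradient with
respect to $\delta$ evaluated under the null is
(4.7)  $\nabla_\delta \hat{\mu}_t = [1 + \hat{\lambda} x_t\hat{\beta}]^{(1/\hat{\lambda}) - 1} z_t$.
If WNLS or QMLE is used, each quantity is weighted by $1/\sqrt{\hat{\omega}_t}$ and the standard
LM test is obtained as $NR_u^2$ from the regression
(4.8)  $\tilde{\varepsilon}_t$ on $\nabla_\beta \tilde{\mu}_t$, $\nabla_\lambda \tilde{\mu}_t$, $\nabla_\delta \tilde{\mu}_t$.
The robust LM test requires the $1 \times Q$ residuals $\tilde{r}_t$ from the regression
(4.9)  $\nabla_\delta \tilde{\mu}_t$ on $\nabla_\beta \tilde{\mu}_t$, $\nabla_\lambda \tilde{\mu}_t$;
then $N - \mathrm{SSR}$ from the regression
(4.10)  1 on $\tilde{\varepsilon}_t \tilde{r}_t$, $\quad t = 1, \ldots, N$
is asymptotically $\chi_Q^2$ under $H_0$ (note that $\tilde{\varepsilon}_t \tilde{r}_t \equiv (\tilde{\varepsilon}_t \tilde{r}_{t1}, \ldots, \tilde{\varepsilon}_t \tilde{r}_{tQ})$ is a $1 \times Q$
vector). Exclusion restriction tests when $\lambda$ is fixed at some $\lambda_0$ are obtained by
omitting $\nabla_\lambda \tilde{\mu}_t$ in (4.8) or (4.9).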
It is important to stress that tests about $\beta$ and $\lambda$ in the current setup are purely
tests about $E(y|x)$: provided the robust forms of the tests are used, the null
hypothesis imposes no assumptions on var(y|x) or any other aspect of the
distribution of y given x. In contrast, tests about $\beta$ and $\lambda$ in the Box-Cox model, e.g.
those developed by Davidson and MacKinnon (1985), are generally tests about the
entire distribution of y given x. One could reject the null hypothesis because the
distribution is misspecified even if $E(y|x)$ is correctly specified.
5. ON THE ISSUE OF SCALE INVARIANCE
One uncomfortable feature of the model (3.1) and (3.2) is that the t-statistics for
the slope coefficients $\beta_2, \ldots, \beta_K$ are not invariant to the scaling of $y_t$ whenever $\lambda$
is estimated along with $\beta$. Spitzer (1984) noted the analogous feature for the
Box-Cox model (1.3).
Suppose that $(\hat{\beta}, \hat{\lambda})$ are the WNLS estimates using $y_t$ as the regressand, and let
$(\hat{\beta}^+, \hat{\lambda}^+)$ be the corresponding estimates when the regressand is $c_0 y_t$ for some
$c_0 > 0$. Throughout this section it is assumed that $x_{t1} \equiv 1$. As shown in the
Appendix, the estimates satisfy
(5.1)  $\hat{\lambda}^+ = \hat{\lambda}; \quad \hat{\beta}_1^+ = (c_0^{\hat{\lambda}} - 1)/\hat{\lambda} + c_0^{\hat{\lambda}} \hat{\beta}_1; \quad \hat{\beta}_j^+ = c_0^{\hat{\lambda}} \hat{\beta}_j, \quad j = 2, \ldots, K$,
and therefore
(5.2)  $[1 + \hat{\lambda}^+ x_t \hat{\beta}^+]^{1/\hat{\lambda}^+} = c_0 [1 + \hat{\lambda} x_t \hat{\beta}]^{1/\hat{\lambda}}$.
The estimate of $\lambda$ is invariant to the scaling of $y_t$, and the estimates of the other
coefficients change so that the fitted values and residuals in the scaled regression
are simply scaled up versions of the fitted values and residuals in the unscaled
regression. Consequently, estimates of elasticities and semi-elasticities are scale
invariant, but tests of whether the population values are zero based on the
t-statistics of $\hat{\beta}_2, \ldots, \hat{\beta}_K$ are not. It turns out that both the usual and robust
standard errors of $\hat{\lambda}$ are invariant to the scale of y (see the Appendix).
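The mapping between the two sets of estimates can be checked numerically. The sketch below applies the transformation to an arbitrary parameter vector and confirms that the implied fitted values are scaled by $c_0$; the function names and numbers are illustrative assumptions, and the $\lambda = 0$ branch uses the limiting intercept shift $\log c_0$, which is added here and is not stated in the text.

```python
import numpy as np


def rescaled_params(beta, lam, c0):
    """Map (beta, lam) to the values implied when y is multiplied by c0 > 0.

    Hypothetical helper; assumes the first regressor is the constant.
    """
    beta = np.asarray(beta, dtype=float)
    scale = c0 ** lam
    beta_plus = scale * beta                      # slopes are multiplied by c0**lam
    # Intercept shift; log(c0) is the limit of (c0**lam - 1)/lam as lam -> 0.
    beta_plus[0] += np.log(c0) if lam == 0.0 else (scale - 1.0) / lam
    return beta_plus, lam                         # lam itself is unchanged


def fitted_values(X, beta, lam):
    """Fitted conditional mean for each row of X under (3.1)-(3.2)."""
    index = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    return np.exp(index) if lam == 0.0 else (1.0 + lam * index) ** (1.0 / lam)


X = [[1.0, 0.5], [1.0, 2.0]]
beta, lam, c0 = [0.2, 0.4], 0.5, 100.0
beta_plus, lam_plus = rescaled_params(beta, lam, c0)
print(fitted_values(X, beta_plus, lam_plus) / fitted_values(X, beta, lam))
```

Every ratio printed in the last line should equal $c_0$, in line with the statement that fitted values in the scaled regression are scaled-up versions of those in the unscaled regression.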
Interestingly, as shown in the Appendix, the Lagrange multiplier statistic for
exclusion of any $1 \times Q$ vector $z_t$ (see model (4.6)) is invariant to the scaling of $y_t$.
Therefore, both the usual and robust LM tests for the null $H_0: \beta_j = 0$ ($j = 2, \ldots, K$)
are scale invariant, whereas the Wald test (based on the t-statistic) is not.
Dagenais and Dufour (1991) provide an interesting general discussion of the scale
invariance issue in the context of maximum likelihood estimation.
One solution to the lack of scale invariance for the t-statistics is to introduce an
additional scaling parameter to the conditional mean model. In place of (3.1)
consider the model
(5.3)  $E(y|x) = \nu [1 + \lambda x\beta]^{1/\lambda}$,
where $\nu$ is an additional scale parameter. Note that the elasticity and semi-elasticity
of y with respect to $x_j$ are independent of the value of $\nu$. Scaling y up or down is
absorbed into the scaling parameter $\nu$, leaving $\beta$ and $\lambda$ unchanged. However, if x
contains unity the parameters $\beta$, $\lambda$, and $\nu$ are not separately identifiable from the
WNLS objective function. Instead, for the case that $P(y > 0) = 1$, take $\nu$ to be the
population geometric mean of y:
(5.4)  $\nu \equiv \exp[E(\log y)]$.
Equations (5.3) and (5.4) represent the modified model. With this choice of $\nu$ (5.3)
is easily operationalized by replacing $\nu$ by its sample counterpart
$\hat{\nu} \equiv \exp(N^{-1} \sum_{t=1}^{N} \log(y_t))$, and $\beta$ and $\lambda$ can be estimated by solving
(5.5)  $\min_{\beta, \lambda} \sum_{t=1}^{N} (y_t/\hat{\nu} - [1 + \lambda x_t\beta]^{1/\lambda})^2$.
Each observation on $y_t$ is simply divided by the sample geometric mean of
$\{y_t: t = 1, \ldots, N\}$, and then model (3.1) and (3.2) is estimated. The estimates of
$\beta$ are trivially scale invariant because $y_t/\hat{\nu}$ is invariant to scaling. Spitzer (1984)
recommends the same strategy for the Box-Cox model. However, in implementing
this procedure one should recognize that the solutions $\hat{\beta}$ and $\hat{\lambda}$ to (5.5) depend on
the estimator $\hat{\nu}$, and therefore the asymptotic variance of $\hat{\theta} \equiv (\hat{\beta}', \hat{\lambda})'$ should reflect
this additional source of uncertainty (as different samples of $\{y_t\}$ are obtained, the
estimator $\hat{\nu}$ usually changes). In the general WNLS case, the simplest approach to
this problem is to view $\hat{\theta}$ as a "two-step" estimator that solves
(5.6)  $\min_{\theta} \sum_{t=1}^{N} (y_t - \mu(x_t; \theta, \hat{\nu}))^2/\hat{\omega}_t$,
where $\mu(x; \theta, \nu) \equiv \nu [1 + \lambda x\beta]^{1/\lambda}$. Under the additional assumption that $\{(x_t, y_t)\}$
is an independent sequence, the Appendix derives the asymptotic variance of the
solution 0 of (5.6), which accounts for the variability of 'v. Define the (K + 1) x 1
vector
CN N (V> t7t)'(VV, t/;)
t= 1
where V ef't is the same as derived in Section 3 except that it is now multiplied by
v; also, note that V 4t = [1 + Ax 8] 1/A is simply the fitted value of the scaled
regressand yt/v. A consistent estimate of the asymptotic variance of 0 is
(5.7) | VFV 1 t [IPI - CN](E tgt )IPI - CN]
where P K + 1, 9t is the 1 x (P + 1) vector 9t (iat V 0Ft, iv log (ytI'V)), 't
Yt -Lt(O, v) Yt A- v + XxjB] i/A, =a i Vo?IV&t, and t =tIV-ot. The
estimator (5.7) is also robust to variance misspecification (heteroskedasticity when
=1 = 1). Note that, in the construction of 9t, v log (yt/Iv) is not weighted by 1I\/&.
Generally speaking, (5.7) differs from the usual robust covariance matrix
estimator
(5.8) $(\sum_{t=1}^{N} \nabla_\theta\tilde\mu_t'\nabla_\theta\tilde\mu_t)^{-1} (\sum_{t=1}^{N} \tilde u_t^2\nabla_\theta\tilde\mu_t'\nabla_\theta\tilde\mu_t) (\sum_{t=1}^{N} \nabla_\theta\tilde\mu_t'\nabla_\theta\tilde\mu_t)^{-1}$.
However, as shown in the Appendix, (5.7) and (5.8) produce numerically identical
estimates for se($\hat\lambda$); this is as it should be since the asymptotic variance of $\hat\lambda$ is
unaffected by the estimation of $\nu$. The scale corrected standard errors of
$\hat\beta_1, \ldots, \hat\beta_K$ from (5.7) are generally different from those obtained from (5.8), reflecting the
variation in the estimator $\hat\nu$.
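The relation between (5.7) and (5.8) can be illustrated numerically. The following is only a sketch for the unweighted case (all weights equal to one), assuming NumPy is available; the function name `scale_corrected_and_robust_cov` and the simulated inputs are illustrative and do not come from the paper. The values of `beta` and `lam` are taken as given rather than fitted (the estimation step is not shown), and the first column of `X` is assumed to be a constant.

```python
import numpy as np

def scale_corrected_and_robust_cov(y, X, beta, lam):
    """Return (V57, V58): the matrices in (5.7) and (5.8), unweighted case.

    y    : (N,) positive regressand (unscaled)
    X    : (N, K) regressors, first column assumed to be ones
    beta : (K,) slope values; lam : nonzero scalar (both taken as given)
    """
    N, K = X.shape
    P = K + 1
    nu = np.exp(np.mean(np.log(y)))            # sample geometric mean, as in (5.4)
    xb = X @ beta
    m = 1.0 + lam * xb                         # must be positive
    base = m ** (1.0 / lam - 1.0)
    grad_beta = nu * base[:, None] * X         # derivative of mu wrt beta
    grad_lam = nu * base * (lam * xb - m * np.log(m)) / lam**2
    G = np.column_stack([grad_beta, grad_lam]) # N x P gradient wrt theta
    grad_nu = m ** (1.0 / lam)                 # derivative of mu wrt nu
    u = y - nu * grad_nu                       # residuals

    A_inv = np.linalg.inv(G.T @ G)
    C = (G.T @ grad_nu) / N                    # the vector C_N
    uG = u[:, None] * G
    g = np.column_stack([uG, nu * np.log(y / nu)])   # N x (P + 1)
    S = np.column_stack([np.eye(P), -C])       # [I_P | -C_N]
    V57 = A_inv @ S @ (g.T @ g) @ S.T @ A_inv
    V58 = A_inv @ (uG.T @ uG) @ A_inv
    return V57, V58

# Simulated inputs for illustration only (not the housing data).
rng = np.random.default_rng(0)
N = 200
X = np.column_stack([np.ones(N), rng.uniform(0.0, 1.0, N)])
beta, lam = np.array([0.5, 1.0]), 0.5
y = (1.0 + lam * (X @ beta)) ** (1.0 / lam) * rng.uniform(0.5, 1.5, N)
V57, V58 = scale_corrected_and_robust_cov(y, X, beta, lam)
se57, se58 = np.sqrt(np.diag(V57)), np.sqrt(np.diag(V58))
```

In this sketch the last entries of `se57` and `se58` (those for lambda) agree up to rounding, while the entries for the slopes generally differ, which is the pattern described in the text.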
Because QMLEs in the LEF have first-order conditions identical to WNLS
estimators, the scale invariance results (including those for LM tests) and formula
(5.7) remain valid. Therefore, scale invariant inference is available for exponential
(gamma), Poisson, and other regression models.
6. AN APPLICATION TO A HEDONIC PRICE EQUATION
Harrison and Rubinfeld (1978) (hereafter, HR) estimate a hedonic price equation
for median housing prices in the Boston metropolitan area. Their primary interest
is in the effects of air pollution on housing prices, controlling for a variety of other
town attributes. HR use log of median housing price as the dependent variable in a
linear regression; the explanatory variables are transformed in a variety of ways
discussed below. The data are published and used extensively in the book on
regression diagnostics by Belsley, Kuh, and Welsch (1980) (hereafter, BKW).
The hedonic price equation is reestimated here using model (3.1) and, for
comparison, by standard Box-Cox procedures. Following the discussion in Section
5, y is housing price divided by the sample geometric mean of housing prices.
Because a constant variance for housing prices (conditional on the explanatory
variables) is unlikely, exponential QMLE is used to estimate the parameters of (3.1)
(see (3.9)). Box-Cox estimates are obtained by maximizing the likelihood function
constructed under (1.3).
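The scaling of the regressand by its sample geometric mean can be sketched in a few lines. This is an illustration only; the function names and the numbers in `prices` are made up and are not the housing data.

```python
import math

def geometric_mean(values):
    """Sample geometric mean exp(N^{-1} * sum(log y_t)); requires all y_t > 0."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def scale_by_geometric_mean(values):
    """Divide each observation by the sample geometric mean."""
    nu_hat = geometric_mean(values)
    return [v / nu_hat for v in values]

prices = [12.0, 18.5, 25.0, 31.2, 40.0]          # hypothetical positive values
scaled = scale_by_geometric_mean(prices)
# Changing the units of measurement leaves the scaled regressand unchanged.
rescaled = scale_by_geometric_mean([1000.0 * p for p in prices])
```

Because `scaled` and `rescaled` coincide up to rounding, any estimator computed from the scaled regressand is unaffected by the units in which price is measured.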
The results of the estimation are given in Table 1. CRIME is the per capita crime
rate, DIS is a weighted distance to five employment centers in the Boston area,
RAD is an index of highway accessibility, PROPTAX is the property tax rate (per
$10,000), STRATIO is the student-teacher ratio by town, LOWSTAT is the
proportion of the population that is lower status, B is the proportion of the
population which is black, RM is the average number of rooms, and NOX is the
nitrogen oxide concentration (parts per hundred million). With the exception of the
inclusion of CRIME^2 and RM in Table 1, the variable transformations are the same
as those used by HR. (I have excluded some of the variables found to be
unimportant by both HR and BKW.) The least natural of the transformations is
(B - .63)^2. HR provide a story for this kind of quadratic relationship between
housing price and proportion of blacks in the population, but it is not compelling.
Unfortunately, the raw data on B are not readily available; only (B - .63)^2 is
reported in BKW.
The QMLE and Box-Cox procedures give qualitatively similar results, although
the magnitudes of some estimates differ. Interestingly, both approaches suggest
that quadratics in CRIME and RM are warranted, nonlinearities apparently
overlooked by HR and BKW. (CRIME^2 and RM are omitted in the HR model.) The
coefficients on the pollution variable, NOX^2, are very similar in the two procedures,
even though the estimates of the transformation parameter LAMBDA (.75
for QMLE, .40 for Box-Cox) are somewhat different. The model with NOX
replacing NOX^2 fits almost as well; including both in the model introduces severe
collinearity, rendering each term insignificant. Therefore, I stick to the
transformation used by HR.
To give some idea of how elasticity estimates differ between the QMLE and
Box-Cox approaches, I computed the elasticity of PRICE with respect to NOX at
the average, minimum, and maximum values of NOX in the sample (all other
variables are evaluated at their sample averages). An estimate of $E(\text{PRICE} \mid x)$ is
needed for the Box-Cox model. Rather than performing numerical integration, I use
the simple estimate $[1 + \hat\lambda x_t\hat\beta]^{1/\hat\lambda}$. The estimated elasticities for the QMLE,
evaluated at the average, minimum, and maximum values of NOX, are -.391,
-.175, and -1.224, respectively. For Box-Cox, the corresponding quantities are
-.407, -.188, and -1.140. These elasticity estimates are closer than one might
expect, suggesting that similar functional forms are obtained for different slope/
lambda combinations.
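As a sketch of how such an elasticity can be computed from the mean function $[1 + \lambda x\beta]^{1/\lambda}$ when NOX enters the index only through a squared term: differentiating the log of the mean with respect to log NOX gives $2 b\,\text{NOX}^2/(1 + \lambda x\beta)$, where $b$ is the coefficient on NOX^2. The function names and all numerical inputs below are hypothetical; in particular, `other_index` (the contribution of the remaining regressors to $x\beta$) is invented, so the result does not reproduce the figures reported above.

```python
import math

def mean_function(index, lam):
    """[1 + lam*index]^(1/lam) for lam != 0, and exp(index) for lam == 0."""
    if lam == 0.0:
        return math.exp(index)
    return (1.0 + lam * index) ** (1.0 / lam)

def elasticity_nox(nox, b_nox2, other_index, lam):
    """Elasticity of the mean w.r.t. NOX when NOX enters x*beta as b_nox2 * NOX^2.

    other_index is the part of x*beta due to all other regressors (held fixed).
    """
    index = other_index + b_nox2 * nox ** 2
    return 2.0 * b_nox2 * nox ** 2 / (1.0 + lam * index)

# Hypothetical inputs, chosen only to exercise the formula.
example = elasticity_nox(nox=5.5, b_nox2=-0.0062, other_index=0.3, lam=0.75)
```

The same function covers the exponential special case, since setting `lam` to zero reduces the denominator to one.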
TABLE 1
HOUSING PRICE EQUATIONS

Variable               QMLE          BOX-COX
ONE                    2.971         1.967
                      (0.456)       (0.329)
                      [0.606]
                      {0.610}       {0.329}
CRIME                 -0.0179       -0.0197
                      (0.0028)      (0.0030)
                      [0.0034]
                      {0.0035}      {0.0030}
CRIME^2                0.00014       0.00014
                      (0.00003)     (0.00004)
                      [0.00004]
                      {0.00004}     {0.00005}
log (DIS)             -0.199        -0.209
                      (0.027)       (0.030)
                      [0.039]
                      {0.039}       {0.030}
log (RAD)              0.118         0.105
                      (0.019)       (0.019)
                      [0.019]
                      {0.019}       {0.019}
PROPTAX               -0.0005       -0.0003
                      (0.0001)      (0.0001)
                      [0.0001]
                      {0.0001}      {0.0001}
STRATIO               -0.032        -0.032
                      (0.005)       (0.006)
                      [0.004]
                      {0.004}       {0.006}
log (LOWSTAT)         -0.366        -0.355
                      (0.023)       (0.018)
                      [0.038]
                      {0.038}       {0.018}
(B - .63)^2            0.150         0.281
                      (0.074)       (0.095)
                      [0.097]
                      {0.097}       {0.095}
RM                    -0.982        -0.667
                      (0.150)       (0.094)
                      [0.194]
                      {0.195}       {0.094}
RM^2                   0.086         0.060
                      (0.013)       (0.007)
                      [0.016]
                      {0.016}       {0.007}
NOX^2                 -0.0062       -0.0065
                      (0.0010)      (0.0012)
                      [0.0012]
                      {0.0012}      {0.0012}
LAMBDA                 0.748         0.402
                      (0.132)       (0.056)
                      [0.160]
                      {0.160}       {0.057}

number of observations:  506         506
log likelihood:          164.04      177.47
sigma:                   0.182       0.173
r-squared:               0.821       0.823

Notes:
i. For QMLE, standard errors in parentheses are computed from the inverse of the Hessian times
$\hat\sigma^2$. The standard errors in brackets are the robust standard errors. In braces are the standard errors
corrected for the estimated geometric mean; these are also robust to variance misspecification.
ii. For Box-Cox, standard errors in parentheses are computed from the outer product of the
score. The quantities in braces are the standard errors corrected for the estimated geometric mean.
iii. The value of the log-likelihood for QMLE is the value of the gamma log-likelihood evaluated
at $\hat\beta$, $\hat\lambda$, and $\hat\sigma^2$. The gamma density is parameterized such that $\text{var}(y \mid x) = \sigma^2[E(y \mid x)]^2$. This
provides some evidence on how well a gamma distribution fits the data, but recall that the
parameter $\sigma^2$ is not jointly estimated with $\beta$ and $\lambda$. Therefore, this is not the maximized value of
the gamma log-likelihood.
iv. The r-squareds are computed as $R^2 \equiv 1 - \sum_{t=1}^{N} (y_t - \hat y_t)^2/\sum_{t=1}^{N} (y_t - \bar y)^2$, where the $\hat y_t$
are the fitted values for $y_t$: $\hat y_t \equiv [1 + \hat\lambda x_t\hat\beta]^{1/\hat\lambda}$. This is the formula used for Box-Cox and QMLE,
even though $[1 + \lambda x_t\beta]^{1/\lambda}$ is not the conditional mean of $y_t$ given $x_t$ for the Box-Cox model.

Each estimate comes with several standard errors. For QMLE, the usual (or
nonrobust) standard errors are in parentheses. These are obtained as the square
roots of the diagonal elements from the inverse of the Hessian of the exponential
log-likelihood, times $\hat\sigma$ (which is reported in the table). Here, $\hat\sigma^2$ is computed with
a degrees-of-freedom adjustment as
(6.1) $\hat\sigma^2 \equiv (N - P)^{-1} \sum_{t=1}^{N} (y_t - \hat y_t)^2/\hat y_t^2 \equiv (N - P)^{-1} \sum_{t=1}^{N} \tilde u_t^2$,
where P = 13 for the current model. The quantities in brackets are the robust
standard errors, obtained from (5.8). The standard errors corrected for the
estimated geometric mean, obtained from (5.7), are in braces; these are also robust
to variance misspecification.
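The variance estimate in (6.1) and the r-squared described in the notes to Table 1 are simple to compute once fitted values are available. The sketch below uses hypothetical function names and a three-observation toy data set chosen only to exercise the formulas.

```python
def sigma2_hat(y, fitted, n_params):
    """(6.1): (N - P)^{-1} times the sum of squared relative residuals."""
    n = len(y)
    return sum(((yt - ft) / ft) ** 2 for yt, ft in zip(y, fitted)) / (n - n_params)

def r_squared(y, fitted):
    """1 - SSR/SST, with fitted values for y itself (not for a transformed y)."""
    ybar = sum(y) / len(y)
    ssr = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    sst = sum((yt - ybar) ** 2 for yt in y)
    return 1.0 - ssr / sst

# Toy inputs for illustration only.
y_toy = [1.0, 2.0, 4.0]
fit_toy = [1.0, 2.5, 4.0]
s2 = sigma2_hat(y_toy, fit_toy, n_params=1)
r2 = r_squared(y_toy, fit_toy)
```

Dividing each residual by its fitted value reflects the variance assumption that the conditional variance is proportional to the squared conditional mean.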
For Box-Cox, the standard error in parentheses is obtained from the outer
product of the score. Because of recent evidence showing that these estimates can
substantially understate the true variability in the coefficient estimators, they
should be interpreted with caution. There is no robust standard error since the
model (1.3) is assumed to be true. Using an argument similar to that for the QMLE,
the Box-Cox standard errors can be corrected for the estimation of the geometric
mean of PRICE; these are given in braces.
Most coefficients are highly statistically significant. Both the QMLE and
Box-Cox estimates reject $H_0$: $\lambda = 0$ quite strongly, so that the HR specification is
rejected along this dimension as well. With the exception of (B - .63)^2, all
coefficients have (robust) t-statistics greater than three. Interestingly, for the
QMLE, the significance and magnitude of (B - .63)^2 have declined substantially; the
estimate is .15 with a robust t-statistic of 1.5. (HR obtain an estimate of .36 with a
t-statistic of 3.5.) Thus, rather than adopt HR's tenuous explanation for a positive
effect for large values of B, one might conclude that B is insignificant. Part of the
reduction in the coefficient on (B - .63)^2 is due to the estimation of lambda, and
part is due to the additional nonlinearities allowed for by adding CRIME^2 and RM
to HR's specification. The estimate and t-statistic are notably higher for the
Box-Cox procedure.
A useful regularity in Table 1 is that the correction to the standard errors for the
presence of the estimated geometric mean has a very small effect. For QMLE, the
robust and scale-corrected standard errors differ in insignificant digits. For Box-
Cox, the usual and scale-corrected standard errors are almost indistinguishable.
Berndt, Showalter, and Wooldridge (1990) also found relatively minor differences
in the two standard errors for a variety of data sets. This suggests that ignoring the
preliminary estimation of scale does not lead one too far astray when performing
inference on the slope coefficients.
Finally, it is important to examine which approach, QMLE or Box-Cox, fits this
data set best. In terms of r-squareds (the amount of variation in $y_t$ explained by $x_t$),
the two approaches provide almost identical fits. (Incidentally, the largest possible
r-squared for the mean function (3.1) and (3.2) is obtained by performing NLS. For
this data set, the NLS r-squared is .827.) Direct comparison of the exponential and
Box-Cox log-likelihoods is uninformative because the exponential distribution is
much too restrictive, requiring the variance to be equal to the mean squared (for
this data set, $\text{var}(y \mid x)/[E(y \mid x)]^2$ is estimated to be .033, well below unity, the value
implied by the exponential distribution). It is more informative to evaluate the
gamma log-likelihood at the estimated $\hat\beta$, $\hat\lambda$, and $\hat\sigma^2$, and to compare this to the
Box-Cox log-likelihood. Still, this is biased against the QMLE because the gamma
log-likelihood is not evaluated at its maximized value; recall that, for robustness
reasons, the exponential log-likelihood is maximized to obtain $\hat\beta$ and $\hat\lambda$, and then $\hat\sigma^2$
is obtained by (6.1). For this data set, the Box-Cox log-likelihood is 177.47 and the
gamma log-likelihood is 164.04 (the exponential log-likelihood is -513.67).
However, it is difficult to know what to make of this difference.
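The gamma log-likelihood referred to above can be evaluated at given fitted means and a given variance parameter. The sketch below assumes the parameterization stated in the notes to Table 1, with conditional variance equal to $\sigma^2$ times the squared conditional mean, which corresponds to shape $1/\sigma^2$ and scale $\sigma^2$ times the mean. The function names and toy numbers are illustrative.

```python
import math

def gamma_loglik(y, mean, sigma2):
    """Sum of gamma log-densities with mean mean_t and variance sigma2 * mean_t^2."""
    shape = 1.0 / sigma2
    total = 0.0
    for yt, mt in zip(y, mean):
        scale = sigma2 * mt                    # shape * scale equals mt
        total += ((shape - 1.0) * math.log(yt) - yt / scale
                  - shape * math.log(scale) - math.lgamma(shape))
    return total

def exponential_loglik(y, mean):
    """Sum of exponential log-densities with mean mean_t (gamma with sigma2 = 1)."""
    return sum(-math.log(mt) - yt / mt for yt, mt in zip(y, mean))

# Toy inputs for illustration only.
y_toy = [0.9, 1.1, 1.0]
mean_toy = [1.0, 1.0, 1.0]
ll_gamma = gamma_loglik(y_toy, mean_toy, sigma2=0.033)
ll_expon = exponential_loglik(y_toy, mean_toy)
```

Setting `sigma2` to one reproduces the exponential log-likelihood, which is why the exponential value is so much lower when the data are tightly concentrated around the fitted means.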
Berndt, Showalter, and Wooldridge (1992) find notable differences between the
two approaches in terms of goodness-of-fit and parameter estimates, at least for
some data sets they examine. Consequently, the finding of similar results for this
particular data set should not be extrapolated to all empirical applications.
7. CONCLUDING REMARKS
The Box-Cox transformation has proven to be a useful tool for generalizing
functional form in statistics and econometrics, but it is not well suited for all
applications where interest centers on $E(y \mid x)$. As an alternative to Box-Cox type
approaches, this paper has proposed a flexible nonlinear model for $E(y \mid x)$. No
second moment or other distributional assumptions are relied upon to obtain
consistent estimates or to perform asymptotically valid inference. Consequently,
all of the robust LM tests of the special cases covered in Section 4 are pure
conditional mean tests; a rejection can be confidently interpreted as a rejection of
the model for $E(y \mid x)$, and not as a rejection of some other distributional assumption.
The model studied here can easily incorporate Box-Cox transformations of the
explanatory variables. For positive variables $x_j$, $j = J + 1, \ldots, K$, let $x_j(\rho_j)$
denote the Box-Cox transformation of $x_j$, as in (1.1) and (1.2). Then (3.1) and (3.2)
are extended by replacing $x$ with $x(\rho) \equiv (x_1, \ldots, x_J, x_{J+1}(\rho_{J+1}), \ldots, x_K(\rho_K))$.
Estimation and inference in this more general model are covered in Wooldridge
(1989).
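The extended mean function can be sketched as follows. This is an illustration under stated assumptions, not the estimation procedure of Wooldridge (1989): the function names are hypothetical, the parameters are taken as given, and the regressors are split into those entered as they are (`x_plain`) and those that are positive and Box-Cox transformed (`x_pos`).

```python
import math

def box_cox(x, rho):
    """Box-Cox transformation: (x^rho - 1)/rho for rho != 0, log(x) for rho == 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    return math.log(x) if rho == 0.0 else (x ** rho - 1.0) / rho

def conditional_mean(x_plain, x_pos, rhos, beta, lam):
    """Mean function with the regressors in x_pos replaced by their transformations."""
    x_full = list(x_plain) + [box_cox(x, r) for x, r in zip(x_pos, rhos)]
    index = sum(b * x for b, x in zip(beta, x_full))
    return math.exp(index) if lam == 0.0 else (1.0 + lam * index) ** (1.0 / lam)

# Hypothetical values: a constant plus one transformed regressor.
value = conditional_mean([1.0], [2.0], [1.0], [0.5, 0.25], 1.0)
```

With every transformation parameter equal to one the regressors enter linearly (up to a shift absorbed by the constant), and with a parameter equal to zero the corresponding regressor enters in logarithms.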
Michigan State University, U.S.A.
APPENDIX
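All of the results in this appendix assume that $x_{t1} \equiv 1$, $t = 1, 2, \ldots$. For
notational simplicity, the results are proven only for the unweighted case.
CLAIM 1. $\hat\lambda^+ = \hat\lambda$; $\hat\beta_1^+ = (c_0^{\hat\lambda} - 1)\hat\lambda^{-1} + c_0^{\hat\lambda}\hat\beta_1$; $\hat\beta_j^+ = c_0^{\hat\lambda}\hat\beta_j$, $j = 2, \ldots, K$.
PROOF. Let $\mu_t(\beta, \lambda) \equiv [1 + \lambda x_t\beta]^{1/\lambda}$, and let $\nabla_\beta\mu_t(\beta, \lambda)$ and $\nabla_\lambda\mu_t(\beta, \lambda)$
denote the derivatives. Then the first order conditions for $(\hat\beta^+, \hat\lambda^+)$ are
(A.1) $\sum_{t=1}^{N} \nabla_\beta\mu_t(\hat\beta^+, \hat\lambda^+)'(c_0 y_t - \mu_t(\hat\beta^+, \hat\lambda^+)) \equiv 0$
(A.2) $\sum_{t=1}^{N} \nabla_\lambda\mu_t(\hat\beta^+, \hat\lambda^+)'(c_0 y_t - \mu_t(\hat\beta^+, \hat\lambda^+)) \equiv 0$.
Because the solutions are assumed to be unique, it suffices to show that $\hat\beta^+$ and $\hat\lambda^+$
given by (5.1) satisfy (A.1) and (A.2). Then (A.1) reduces to showing
(A.3) $\sum_{t=1}^{N} c_0^{1-\hat\lambda}\nabla_\beta\mu_t(\hat\beta, \hat\lambda)'(c_0 y_t - c_0\mu_t(\hat\beta, \hat\lambda)) = c_0^{2-\hat\lambda} \sum_{t=1}^{N} \nabla_\beta\mu_t(\hat\beta, \hat\lambda)'(y_t - \mu_t(\hat\beta, \hat\lambda)) = 0$.
But (A.3) follows from the first order condition for $(\hat\beta, \hat\lambda)$. Next, straightforward
algebra establishes that
$\nabla_\lambda\mu_t(\hat\beta^+, \hat\lambda^+) = c_0\nabla_\lambda\mu_t(\hat\beta, \hat\lambda) + \hat\lambda^{-2}c_0^{1-\hat\lambda}[1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}[c_0^{\hat\lambda} - \hat\lambda\log(c_0)c_0^{\hat\lambda}(1 + \hat\lambda x_t\hat\beta) - 1]$,
from which it follows
(A.4) $\sum_{t=1}^{N} \nabla_\lambda\mu_t(\hat\beta^+, \hat\lambda^+)'(c_0 y_t - \mu_t(\hat\beta^+, \hat\lambda^+)) = c_0^2 \sum_{t=1}^{N} \nabla_\lambda\mu_t(\hat\beta, \hat\lambda)(y_t - \mu_t(\hat\beta, \hat\lambda)) + \sum_{t=1}^{N} \hat\lambda^{-2}c_0^{2-\hat\lambda}[1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}[c_0^{\hat\lambda} - \hat\lambda\log(c_0)c_0^{\hat\lambda}(1 + \hat\lambda x_t\hat\beta) - 1](y_t - \mu_t(\hat\beta, \hat\lambda))$.
The first term on the right-hand side of (A.4) is zero by the first order condition for
$(\hat\beta, \hat\lambda)$. The first order condition for $(\hat\beta, \hat\lambda)$ also implies that the second right-hand
side term of (A.4) is zero. This is because
(A.5) $\sum_{t=1}^{N} [1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}(y_t - \mu_t(\hat\beta, \hat\lambda)) \equiv 0$
is the first element of $\sum_{t=1}^{N} \nabla_\beta\mu_t(\hat\beta, \hat\lambda)'(y_t - \mu_t(\hat\beta, \hat\lambda))$ if $x_{t1} \equiv 1$. Also,
$\sum_{t=1}^{N} [1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}(1 + \hat\lambda x_t\hat\beta)(y_t - \mu_t(\hat\beta, \hat\lambda)) = \sum_{t=1}^{N} [1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}(y_t - \mu_t(\hat\beta, \hat\lambda)) + \hat\lambda \sum_{t=1}^{N} [1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}x_t\hat\beta(y_t - \mu_t(\hat\beta, \hat\lambda)) = 0$
as this is a linear combination of $\sum_{t=1}^{N} \nabla_\beta\mu_t(\hat\beta, \hat\lambda)'(y_t - \mu_t(\hat\beta, \hat\lambda))$. This
establishes (A.1) and (A.2) for $(\hat\beta^+, \hat\lambda^+)$ given by (5.1). $\Box$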
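The algebra behind Claim 1 can be checked numerically for a single observation. The sketch below uses arbitrary illustrative values for the parameters, the regressors, and the scale factor; the function names are hypothetical. It verifies that the transformed parameters scale the mean function by the scale factor, and that the two gradients transform as stated in the proof.

```python
import math

def mu(xb, lam):
    """Mean function [1 + lam * x*beta]^(1/lam), lam != 0."""
    return (1.0 + lam * xb) ** (1.0 / lam)

def dmu_dbeta_factor(xb, lam):
    """Scalar multiplying x_t in the gradient with respect to beta."""
    return (1.0 + lam * xb) ** (1.0 / lam - 1.0)

def dmu_dlam(xb, lam):
    """Derivative of the mean function with respect to lam, holding x*beta fixed."""
    m = 1.0 + lam * xb
    return m ** (1.0 / lam - 1.0) * (lam * xb - m * math.log(m)) / lam ** 2

def scaled_params(beta, lam, c0):
    """Mapping of Claim 1 (first regressor assumed to be the constant)."""
    k = c0 ** lam
    return [(k - 1.0) / lam + k * beta[0]] + [k * b for b in beta[1:]]

beta, lam, c0 = [0.4, -0.2, 0.1], 0.75, 3.0      # arbitrary illustrative values
x = [1.0, 1.5, 2.0]
beta_plus = scaled_params(beta, lam, c0)
xb = sum(b * v for b, v in zip(beta, x))
xb_plus = sum(b * v for b, v in zip(beta_plus, x))
ratio = mu(xb_plus, lam) / mu(xb, lam)           # should be close to c0
```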
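CLAIM 2. se($\hat\lambda$) is scale invariant.
PROOF. It is shown that the asymptotic variance of $\hat\lambda$ is scale invariant. A
standard mean-value expansion yields
(A.6) $\sqrt{N}(\hat\theta - \theta) = (N^{-1} \sum_{t=1}^{N} \nabla_\theta\mu_t(\theta)'\nabla_\theta\mu_t(\theta))^{-1} N^{-1/2} \sum_{t=1}^{N} \nabla_\theta\mu_t(\theta)'\varepsilon_t + o_p(1)$.
Focusing on the last element of (A.6), that corresponding to $\hat\lambda$, we have
(A.7) $\sqrt{N}(\hat\lambda - \lambda) = (N^{-1} \sum_{t=1}^{N} r_t^2)^{-1} N^{-1/2} \sum_{t=1}^{N} r_t\varepsilon_t + o_p(1)$,
where $r_t$ is the residual from the regression
(A.8) $\nabla_\lambda\mu_t(\theta)$ on $\nabla_\beta\mu_t(\theta)$, $t = 1, \ldots, N$.
The true error $\varepsilon_t$ is scaled up by $c_0$ when $y_t$ is scaled up by $c_0$, so it suffices to show
that $r_t$ is also scaled up by $c_0$. Then the expression (A.7) is scale invariant, and then
so must be the asymptotic variance of $\hat\lambda$. Let values superscripted by "o" denote
the scaled values. Then
(A.9) $\nabla_\beta\mu_t(\theta^o) = c_0^{1-\lambda}\nabla_\beta\mu_t(\theta)$
(A.10) $\nabla_\lambda\mu_t(\theta^o) = c_0\nabla_\lambda\mu_t(\theta) + \lambda^{-2}c_0^{1-\lambda}[1 + \lambda x_t\beta]^{(1/\lambda)-1}[c_0^{\lambda} - \lambda\log(c_0)c_0^{\lambda}(1 + \lambda x_t\beta) - 1]$.
From (A.10), $\nabla_\lambda\mu_t(\theta^o)$ can be expressed as $\nabla_\lambda\mu_t(\theta^o) = c_0\nabla_\lambda\mu_t(\theta) + \nabla_\beta\mu_t(\theta)a$
for a $K \times 1$ vector $a$. From (A.9),
$\nabla_\lambda\mu_t(\theta^o) = c_0\nabla_\lambda\mu_t(\theta) + c_0^{\lambda-1}\nabla_\beta\mu_t(\theta^o)a$.
Consequently, the residuals from the regression $\nabla_\lambda\mu_t(\theta^o)$ on $\nabla_\beta\mu_t(\theta^o)$, say $r_t^o$,
satisfy $r_t^o = c_0 r_t$; thus,
$(N^{-1} \sum_{t=1}^{N} (r_t^o)^2)^{-1} N^{-1/2} \sum_{t=1}^{N} r_t^o\varepsilon_t^o = (N^{-1} \sum_{t=1}^{N} r_t^2)^{-1} N^{-1/2} \sum_{t=1}^{N} r_t\varepsilon_t$
and the asymptotic variance of $\hat\lambda$ is invariant. That the
computed standard errors are invariant follows because $\hat r_t$ and $\hat\varepsilon_t$ are also scaled up
by $c_0$. $\Box$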
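The key step of this proof, that the partialled-out residuals are scaled by the same factor as the regressand, can be checked numerically. The following sketch assumes NumPy; the simulated regressors, parameter values, and function names are illustrative, and the scaled parameters are obtained from the mapping in Claim 1.

```python
import numpy as np

def gradients(X, beta, lam):
    """Gradients of [1 + lam * x*beta]^(1/lam) with respect to beta and lam."""
    xb = X @ beta
    m = 1.0 + lam * xb
    base = m ** (1.0 / lam - 1.0)
    grad_beta = base[:, None] * X
    grad_lam = base * (lam * xb - m * np.log(m)) / lam ** 2
    return grad_beta, grad_lam

def partial_residual(grad_beta, grad_lam):
    """Residuals from the least squares regression of grad_lam on grad_beta."""
    coef, *_ = np.linalg.lstsq(grad_beta, grad_lam, rcond=None)
    return grad_lam - grad_beta @ coef

# Simulated regressors (first column is the constant) and arbitrary parameters.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.uniform(0.0, 2.0, 50)])
beta, lam, c0 = np.array([0.3, 0.4]), 0.6, 2.5
k = c0 ** lam
beta_o = np.array([(k - 1.0) / lam + k * beta[0], k * beta[1]])   # scaled parameters
r = partial_residual(*gradients(X, beta, lam))
r_o = partial_residual(*gradients(X, beta_o, lam))
```

In this sketch `r_o` equals `c0 * r` up to rounding, because the extra term in (A.10) lies in the column space of the gradient with respect to the slopes when the regressors include a constant.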
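CLAIM 3. The LM statistic for exclusion restrictions is scale invariant.
PROOF. In the unrestricted model
$\mu(x, z; \beta, \lambda, \delta) = [1 + \lambda x\beta + \lambda z\delta]^{1/\lambda}$
consider testing $H_0$: $\delta = 0$. If the regressand is $c_0 y_t$ for $c_0 > 0$ then the gradients
used for the test on scaled data are related by
$\nabla_\beta\mu_t(\hat\beta^+, \hat\lambda^+, 0) = c_0^{1-\hat\lambda}\nabla_\beta\mu_t(\hat\beta, \hat\lambda, 0)$
$\nabla_\lambda\mu_t(\hat\beta^+, \hat\lambda^+, 0) = c_0\nabla_\lambda\mu_t(\hat\beta, \hat\lambda, 0) + \hat\lambda^{-2}c_0^{1-\hat\lambda}[1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}[c_0^{\hat\lambda} - \hat\lambda\log(c_0)c_0^{\hat\lambda}(1 + \hat\lambda x_t\hat\beta) - 1]$
$\nabla_\delta\mu_t(\hat\beta^+, \hat\lambda^+, 0) = c_0^{1-\hat\lambda}\nabla_\delta\mu_t(\hat\beta, \hat\lambda, 0)$.
Because the gradients of the scaled data are linear combinations of the gradients for
the unscaled data, and because $\hat\varepsilon_t^+ = c_0\hat\varepsilon_t$, the r-squareds from the regressions
$\hat\varepsilon_t^+$ on $\nabla_\beta\mu_t(\hat\beta^+, \hat\lambda^+, 0)$, $\nabla_\lambda\mu_t(\hat\beta^+, \hat\lambda^+, 0)$, $\nabla_\delta\mu_t(\hat\beta^+, \hat\lambda^+, 0)$
$\hat\varepsilon_t$ on $\nabla_\beta\mu_t(\hat\beta, \hat\lambda, 0)$, $\nabla_\lambda\mu_t(\hat\beta, \hat\lambda, 0)$, $\nabla_\delta\mu_t(\hat\beta, \hat\lambda, 0)$
are numerically identical. This shows that the nonrobust LM statistics are
numerically identical. For the robust test, note that the residuals $\hat r_t^+$ from the regression
$\nabla_\delta\mu_t(\hat\beta^+, \hat\lambda^+, 0)$ on $\nabla_\beta\mu_t(\hat\beta^+, \hat\lambda^+, 0)$, $\nabla_\lambda\mu_t(\hat\beta^+, \hat\lambda^+, 0)$
are scalar multiples of the residuals $\hat r_t$ from the regression
$\nabla_\delta\mu_t(\hat\beta, \hat\lambda, 0)$ on $\nabla_\beta\mu_t(\hat\beta, \hat\lambda, 0)$, $\nabla_\lambda\mu_t(\hat\beta, \hat\lambda, 0)$
($\hat r_t^+ = c_0^{1-\hat\lambda}\hat r_t$). Consequently, the sum of squared residuals from the regressions 1
on $\hat\varepsilon_t^+\hat r_t^+$ and 1 on $\hat\varepsilon_t\hat r_t$ are identical. $\Box$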
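CLAIM 4. The scale-corrected standard error of $\hat\lambda$ is identical to the robust
standard error of $\hat\lambda$ ($y_t$ has been scaled by the geometric mean $\hat\nu$).
PROOF. $\hat\theta \equiv (\hat\beta', \hat\lambda)'$ now satisfies
$\sum_{t=1}^{N} \nabla_\theta\mu_t(\hat\theta, \hat\nu)'(y_t - \mu_t(\hat\theta, \hat\nu)) \equiv 0$
where $\mu_t(\theta, \nu) \equiv \nu[1 + \lambda x_t\beta]^{1/\lambda}$ and $\nabla_\beta\mu_t(\theta, \nu)$ and $\nabla_\lambda\mu_t(\theta, \nu)$ are also scaled
up by $\nu$. A mean value expansion and the delta method yield
(A.11) $\sqrt{N}(\hat\theta - \theta) = (N^{-1} \sum_{t=1}^{N} \nabla_\theta\mu_t'\nabla_\theta\mu_t)^{-1} N^{-1/2} \sum_{t=1}^{N} \nabla_\theta\mu_t'\varepsilon_t - (N^{-1} \sum_{t=1}^{N} \nabla_\theta\mu_t'\nabla_\theta\mu_t)^{-1}(N^{-1} \sum_{t=1}^{N} \nabla_\theta\mu_t'\nabla_\nu\mu_t) N^{-1/2} \sum_{t=1}^{N} \nu\log(y_t/\nu) + o_p(1)$
where all elements are evaluated at $(\theta, \nu)$. The second term in (A.11) is the
contribution due to the estimation of $\nu$; the first term is as before. Thus, it suffices
to show that the last element of
(A.12) $(N^{-1} \sum_{t=1}^{N} \nabla_\theta\mu_t'\nabla_\theta\mu_t)^{-1}(N^{-1} \sum_{t=1}^{N} \nabla_\theta\mu_t'\nabla_\nu\mu_t)$
(the element corresponding to $\lambda$) is identically zero. But (A.12) is the vector of
coefficients from the regression
$\nabla_\nu\mu_t$ on $\nabla_\beta\mu_t$, $\nabla_\lambda\mu_t$, $t = 1, \ldots, N$.
The coefficient on $\nabla_\lambda\mu_t$ is also obtained by first obtaining the residuals $r_t$ from the
regression $\nabla_\lambda\mu_t$ on $\nabla_\beta\mu_t$ and then computing the coefficient from the regression
$\nabla_\nu\mu_t$ on $r_t$. Thus, it suffices to show that $\nabla_\nu\mu_t$ and $r_t$ are orthogonal. But the
residuals $r_t$ are orthogonal to any linear combination of $\nabla_\beta\mu_t$, i.e. $\sum_{t=1}^{N} \nabla_\beta\mu_t' r_t \equiv 0$,
and $\nabla_\nu\mu_t = [1 + \lambda x_t\beta]^{1/\lambda} = [1 + \lambda x_t\beta]^{(1/\lambda)-1} + [1 + \lambda x_t\beta]^{(1/\lambda)-1}\lambda x_t\beta$, which
is a linear combination of $\nabla_\beta\mu_t$ whenever $x_t$ contains a constant. Thus, for $\hat\lambda$,
(A.11) is the same as (A.6). This completes the proof. $\Box$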
REFERENCES
AMEMIYA, T. AND J. L. POWELL, "A Comparison of the Box-Cox Maximum Likelihood Estimator and the
Non-Linear Two Stage Least Squares Estimator," Journal of Econometrics 17 (1981), 351-381.
ANDREWS, D. F., "A Note on the Selection of Data Transformations," Biometrika 58 (1971), 249-254.
BELSLEY, D. A., E. KUH, AND R. E. WELSCH, Regression Diagnostics: Identifying Influential Data and
Sources of Collinearity (New York: Wiley, 1980).
BERNDT, E. R. AND M. S. KHALED, "Parametric Productivity Measurement and Choice among Flexible
Functional Forms," Journal of Political Economy 87 (1979), 1220-1245.
BERNDT, E. R., M. H. SHOWALTER, AND J. M. WOOLDRIDGE, "An Empirical Investigation of the Box-Cox Model
and a Nonlinear Least Squares Alternative," Econometric Reviews (forthcoming 1992).
BICKEL, P. J. AND K. A. DOKSUM, "An Analysis of Transformations Revisited," Journal of the American
Statistical Association 76 (1981), 296-311.
BOX, G. E. P. AND D. R. COX, "An Analysis of Transformations," Journal of the Royal Statistical Society
Series B 26 (1964), 211-252.
DAVIDSON, R. AND J. G. MACKINNON, "Testing Linear and Loglinear Regressions Against Box-Cox
Alternatives," Canadian Journal of Economics 25 (1985), 499-517.
DAGENAIS, M. G. AND J.-M. DUFOUR, "Invariance, Nonlinear Models, and Asymptotic Tests,"
Econometrica 59 (1991), 1601-1615.
DUAN, N., "Smearing Estimate: A Nonparametric Retransformation Method," Journal of the American
Statistical Association 78 (1983), 605-610.
ENGLE, R. F., "Wald, Likelihood Ratio, and Lagrange Multiplier Statistics in Econometrics," in Z.
Griliches and M. D. Intriligator, eds., Handbook of Econometrics, Volume 2 (Amsterdam: North
Holland, 1984), 775-826.
FARBER, H. AND R. GIBBONS, "Learning and Wage Determination," mimeo, MIT Department of
Economics, 1990.
GOLDBERGER, A. S., "The Interpretation and Estimation of Cobb-Douglas Production Functions,"
Econometrica 35 (1968), 464-472.
GOURIEROUX, C., A. MONFORT, AND A. TROGNON, "Pseudo-Maximum Likelihood Methods: Theory,"
Econometrica 52 (1984), 681-700.
HARRISON, D. AND D. L. RUBINFELD, "Hedonic Housing Prices and the Demand for Clean Air," Journal
of Environmental Economics and Management 5 (1978), 81-102.
HUANG, C. J. AND J. A. KELINGOS, "Conditional Mean Function and a General Specification of the
Disturbance in Regression," Southern Economic Journal 45 (1979), 710-717.
MCCULLAGH, P. AND J. A. NELDER, Generalized Linear Models, 2nd ed. (New York: Chapman and Hall,
1989).
MIZON, G. E., "Inferential Procedures in Nonlinear Models: An Application in a UK Industrial Cross
Section Study of Factor Substitution and Returns to Scale," Econometrica 45 (1977), 1221-1242.
MUKERJI, V., "A Generalized SMAC Function with Constant Ratios of Elasticity of Substitution,"
Review of Economic Studies 30 (1963), 233-236.
POIRIER, D. J., "The Use of the Box-Cox Transformation in Limited Dependent Variable Models,"
Journal of the American Statistical Association 73 (1978), 284-287.
POIRIER, D. J. AND A. MELINO, "A Note on the Interpretation of Regression Coefficients within a Class of
Truncated Distributions," Econometrica 46 (1978), 1207-1209.
SPITZER, J. J., "Variance Estimates in Models with Box-Cox Transformations: Implications for
Estimation and Hypothesis Testing," Review of Economics and Statistics 66 (1984), 645-652.
WHITE, H., "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for
Heteroskedasticity," Econometrica 48 (1980), 817-838.
WHITE, H. AND I. DOMOWITZ, "Nonlinear Regression with Dependent Observations," Econometrica 52
(1984), 143-162.
WOOLDRIDGE, J. M., "Some Alternatives to the Box-Cox Regression Model," Working Paper No. 534,
MIT Department of Economics, 1989.
WOOLDRIDGE, J. M., "On the Application of Robust, Regression-Based Diagnostics to Models of Conditional Means
and Conditional Variances," Journal of Econometrics 47 (1991a), 5-46.
WOOLDRIDGE, J. M., "Specification Testing and Quasi-Maximum Likelihood Estimation," Journal of Econometrics 48
(1991b), 29-55.
