
Some Alternatives to the Box-Cox Regression Model
Author: Jeffrey M. Wooldridge
Source: International Economic Review, Vol. 33, No. 4 (Nov., 1992), pp. 935-955
Published by: Wiley for the Economics Department of the University of Pennsylvania and the Institute of Social and Economic Research, Osaka University
Stable URL: http://www.jstor.org/stable/2527151

INTERNATIONAL ECONOMIC REVIEW

Vol. 33, No. 4, November 1992

SOME ALTERNATIVES TO THE BOX-COX REGRESSION MODEL*

BY JEFFREY M. WOOLDRIDGE1

A nonlinear regression model is proposed as an alternative to the Box-Cox

regression model for nonnegative variables. The functional form contains

linear, exponential, and reciprocal models as special cases. Unlike Box-Cox

type approaches, the proposed estimators of the conditional mean function are

robust to conditional variance and other distributional misspecifications.

Computationally simple, robust Lagrange multiplier statistics for restricted

versions of the model are derived. Scale invariant t-statistics are proposed,

and the Lagrange multiplier statistic for exclusion restrictions is shown to be

scale invariant.

1. INTRODUCTION

Economists and other social scientists are often interested in explaining a nonnegative variable y in terms of some explanatory variables x ≡ (x1, x2, ..., xK). Formalizing this notion requires one to decide which aspect of the conditional distribution of y given x is of interest. The two leading candidates are the conditional expectation of y given x, E(y|x), and the conditional median of y given x, M(y|x). Especially for nonnegative variables the conditional mean and conditional median can be very different. Because of its simple algebra (linearity, the law of iterated expectations), the conditional mean has been used more extensively in

developing economic theory, particularly in the context of rational expectations; a

recent example is the learning model for wage determination studied by Farber and

Gibbons (1990). While there are situations where the conditional median is of

interest, the conditional expectation continues to receive the most attention among

theorists and empirical economists. This paper is about estimating models for

E(ylx), and the subsequent discussion assumes that E(ylx) is the function of

interest.

The first model studied in econometrics courses, which postulates that E(y|x) is a linear function of x, or a linear function of some vector of transformations of x, often provides an inadequate description of E(y|x). In addition, y is frequently

heteroskedastic, resulting in the usual inference procedures being inappropriate.

Finally, although of less importance for asymptotic inference, the assumption that

y conditional on x is normally distributed is untenable because y is nonnegative.

In econometrics the most common alternative to the linear model for E(y Ix) is a

*Manuscript received May 1989; revised February 1990 and July 1991.

1 This is a condensed and revised version of MIT Department of Economics Working Paper No. 534.

I am grateful to Ernie Berndt, Marcel Dagenais, Jean-Marie Dufour, Zvi Griliches, Dale Poirier, Mark

Showalter, Jeff Zabel, three anonymous referees, and the participants of the Boston College and

University of Montreal econometrics workshop for providing helpful comments. Special thanks to Ernie

Berndt for directing me to the issue of scale invariance, and to Diego Rodriguez for supplying and

documenting the data set.


linear model for E(log ylx), provided that P(y > 0) = 1. Normality of log y cannot

be ruled out a priori and heteroskedasticity is often less of a problem in linear

models with log y as the dependent variable. However, the important issue of

whether the linear model for log y implicitly provides the best description of E( ylx)

depends on the particular application. One cannot even investigate this issue unless

the estimates of E(log ylx) can be transformed into estimates of E(ylx).

Noting that the identity and logarithmic transformations in linear models are too

restrictive, Box and Cox (1964) suggested a transformation of y that contains the

identity and logarithmic transformations as special cases. For nonnegative y the

Box-Cox transformation is defined as

(1.1) y(λ) = (y^λ − 1)/λ,  λ ≠ 0

(1.2) y(λ) = log y,  λ = 0.

The case λ ≤ 0 is allowed only if P(y > 0) = 1.
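As a concrete illustration of (1.1) and (1.2), here is a minimal sketch in Python (NumPy assumed; the function name is illustrative, not from the paper):

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform y(lambda) as in (1.1)-(1.2); y must be positive when lam <= 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0.0:
        return np.log(y)            # (1.2): the limiting case
    return (y**lam - 1.0) / lam     # (1.1)
```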

In the Box-Cox regression model there is a value λ ∈ R such that for some K × 1 vector β and some η² > 0,

(1.3) y(λ)|x ~ N(xβ, η²).

It is well known that (1.3) cannot strictly be true unless λ = 0. The inconsistency of the Box-Cox estimators of λ, β, and η² due to the inherent nonnormality is well documented (see, e.g., Amemiya and Powell 1981). Also, the practice of estimating λ and then performing inference on β as if λ were known has been criticized by various econometricians and statisticians (see, e.g., Amemiya and Powell 1981 and Bickel and Doksum 1981).

From a social scientist's point of view there is the more important problem of interpreting the parameters β and λ. The vector β measures the marginal effects of the explanatory variables on E[y(λ)|x]. But if y is the variable important to economic agents or policy makers then interest typically lies in the expected value of y given x. The parameters β, λ, and η² in a Box-Cox model are of interest primarily because they also index E(y|x). Poirier and Melino (1978) show that βj and ∂E(y|x)/∂xj have the same sign when y(λ) is assumed to have a plausible truncated normal distribution, but their expression for E(y|x) relies on the assumed distribution for y(λ).

Distribution-free methods are available for estimating β and λ, e.g. Amemiya and Powell's (1981) nonlinear two stage least squares estimator. All that is required is that the error ε ≡ y(λ) − xβ has zero expectation conditional on x. However, estimation of E(y|x) requires something stronger, such as ε being independent of x. This is less restrictive than the standard Box-Cox model but requires, at a minimum, bounding the supports of x and ε in ways that depend on the true parameters (to ensure the inequality ε ≥ −(1/λ + xβ) for λ ≠ 0). Requiring that a power transformation of y results in a linear conditional expectation with errors independent of x (which rules out heteroskedasticity conditional on x) is still asking a lot of economic data. But, if one is willing to make such an assumption, Duan's (1983) smearing estimator can be used to estimate E(y|x).


The motivation underlying this paper combines my belief that E(ylx) is the object

of primary interest with the observation that the Box-Cox transformation is

typically used as a device for generalizing functional form. Rather than searching

for a (possibly nonexistent) transformation of the explained variable that simulta-

neously induces linearity of the conditional expectation, homoskedasticity, and

normality (or independence of the implicit errors and explanatory variables), this

paper attempts the more modest task of specifying a functional form for E( ylx) that

contains the linear, exponential, constant elasticity, and a variety of other regres-

sion models as special cases. The parameters of the conditional mean specification

are easy to interpret and weighted nonlinear least squares (WNLS) estimators or

quasi-maximum likelihood (QML) estimators in the linear exponential family (LEF)

are likely to be sufficiently precise for many applications. In some cases the WNLS

estimator (or QMLE) is fully efficient. This notwithstanding, it seems that obtaining

robust, possibly inefficient estimates of economically interesting parameters is

often preferred to obtaining efficient (under correct specification of the distribution)

but nonrobust estimates of parameters that might not be of interest.

The functional form studied here extends those used by others in the literature on

nonlinear estimation (e.g., Mukerji 1963, Mizon 1977, Berndt and Khaled 1979),

and provides a unified framework for analyzing and testing several of the regression

functions used in applied economics. It is as flexible as the Box-Cox transformation

for modelling E(ylx) but, in contrast to Box-Cox type approaches, tests about

E(ylx) can be carried out without imposing auxiliary distributional assumptions.

Section 2 of the paper presents a case for defining all economic quantities in

terms of E(ylx), where y is the variable to be explained. Section 3 proposes the

basic model for E( ylx) and derives the asymptotic covariance matrix of the WNLS

(QML) estimator. Standard and robust Lagrange multiplier (LM) tests for restricted

versions of the model are derived in Section 4. Section 5 discusses the issue of scale

invariance of test statistics. An empirical application to Harrison and Rubinfeld's

(1978) housing price data is provided in Section 6, and some remarks about

extending the basic model are contained in Section 7.

2. SOME CONSIDERATIONS WHEN CHOOSING FUNCTIONAL FORM

Transformations of the explained and explanatory variables are used quite liberally in the social sciences, often without regard for the implications for interpreting parameter estimates. For example, for y and xj positive random scalars, it is easy to construct examples such that

(2.1) ∂E(log y|x)/∂ log xj ≠ ∂ log E(y|x)/∂ log xj.

This raises the issue of which of these is the appropriate definition of an elasticity.

When y and x are deterministically related there is no problem. If y = f(x) for a differentiable f(·) then the marginal effect of xj on y is simply ∂f(x)/∂xj, and the elasticity of y with respect to xj is

(2.2) [∂f(x)/∂xj] · xj/f(x),  f(x) ≠ 0.


When y and x are random the natural definition of the marginal effect of xj on y is the marginal effect of xj on the expected value of y given x, namely ∂E(y|x)/∂xj. (Here it is useful to remind the reader that all quantities could instead be defined in terms of the conditional median.) The elasticity of y with respect to xj, keeping x1, ..., xj−1, xj+1, ..., xK fixed, is

(2.3) [∂E(y|x)/∂xj] · xj/E(y|x),  E(y|x) ≠ 0,

which is the same as the right-hand side of (2.1) when y and xj are positive. It is

easily seen that using E(ylx) as the basis for all definitions preserves the relation-

ships that hold among economic quantities in the deterministic case. Further,

agreeing that measures should be based on E(ylx) results in straightforward

comparisons across different parametric models. Of course the marginal effect of xj

on E( y Ix) generally changes as the list of conditioning variables changes, but this is

always the case in regression-type analyses-even in fully nonparametric settings.

Another argument for defining economic quantities in terms of E(ylx) is that it

circumvents the issue of whether the "disturbances" in a model are additive or

multiplicative. When y ≥ 0 and interest centers on μ(x) ≡ E(y|x), the disturbances can be multiplicative or additive: The models

(2.4) y = μ(x) + ε,  E(ε|x) = 0

(2.5) y = μ(x)η,  η ≥ 0,  E(η|x) = 1

are both correct and observationally equivalent without further restrictions on ε and η; just define ε ≡ y − μ(x) and η ≡ y/μ(x) (assuming that P[μ(x) > 0] = 1). If ε and η are assumed to be independent of x then the models imply different var(y|x), so they cannot both be correct. But without such additional assumptions (2.4) and (2.5) are simply different ways of stating that E(y|x) = μ(x).

In summary, in most empirical studies estimates of E[φ(y)|x] for some nonlinear transformation φ(y) are useful only if enough structure has been imposed to recover E(y|x) (unless the conditional median is of interest). Goldberger (1968),

Poirier (1978), Poirier and Melino (1978), and Huang and Kelingos (1979) make

essentially the same point in the context of specific models. However, the current

paper is motivated by the fact that the only way to estimate E(ylx) without

imposing additional distributional assumptions is to specify a model for E(ylx)

directly.

3. SPECIFICATION AND ESTIMATION OF THE MODEL

Let y be a nonnegative random variable and let x ≡ (x1, x2, ..., xK) be a 1 × K vector of explanatory variables. In most cases the first element x1 is unity. Without any assumptions on the conditional distribution of y given x consider the following model for E(y|x):

(3.1) E(y|x) = [1 + λxβ]^(1/λ),  λ ≠ 0

(3.2)         = exp(xβ),  λ = 0.


Technically, it would be more precise to replace (β, λ) in (3.1) and (3.2) by something such as (β0, λ0) to distinguish the "true" parameters from the generic parameter vector (β, λ). As this results in a notational nightmare, (β, λ) is used to denote the true values as well as generic values. As usual, the vector x can contain nonlinear transformations of an underlying set of explanatory variables, whereas y should be the quantity of interest.

It is straightforward (e.g. Wooldridge 1989) to derive the functional form (3.1) and (3.2) from the Box-Cox conditional mean model E[y(λ)|x] = xβ under the plausible assumption that log y conditional on x is normally distributed with constant variance. However, in what follows only (3.1) and (3.2) are assumed to hold. In particular, y need not be continuously distributed, so that (3.1) and (3.2) can be applied to count data.

For (3.1) to be well defined the inequality 1 + λxβ ≥ 0, with strict inequality for λ < 0, must hold for all relevant values of x. This is analogous to requiring nonnegativity of the regression function in a Box-Cox model when λ ≠ 0. Equation (3.2) is the natural definition of the regression function at λ = 0 as it is the limiting case of (3.1) as λ → 0. The exponential model is attractive in that it ensures that predicted values are well defined and positive for all x and β.

To estimate β and λ by WNLS or QMLE, the derivatives of the regression function are needed. Define the (K + 1) × 1 parameter vector θ ≡ (β′, λ)′ and express the parameterized regression function for E(y|x) as

(3.3) μ(x; θ) ≡ [1 + λxβ]^(1/λ),  λ ≠ 0

(3.4)         ≡ exp(xβ),  λ = 0.

The gradient of μ(x; θ) with respect to β is the 1 × K vector

(3.5) ∇βμ(x; θ) = [1 + λxβ]^((1/λ)−1) x,  λ ≠ 0

(3.6) ∇βμ(x; β, 0) = exp(xβ)x,  λ = 0.

For λ ≠ 0, straightforward but tedious algebra yields

(3.7) ∇λμ(x; θ) = [1 + λxβ]^(1/λ) × [λxβ − (1 + λxβ) log(1 + λxβ)]/[λ²(1 + λxβ)];

for future reference, note that ∇λμ(x; θ) is a linear combination of ∇βμ(x; θ) and μ(x; θ) log[μ(x; θ)]. The derivative of μ(x; θ) with respect to λ at λ = 0 can be obtained by L'Hôpital's rule; the details are omitted for brevity:

(3.8) ∇λμ(x; β, 0) = −exp(xβ)(xβ)²/2.
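For readers who want to compute these quantities directly, a minimal NumPy sketch of μ(x; θ) and the gradients (3.5)-(3.8) follows; the function names and argument layout are illustrative assumptions, not part of the paper:

```python
import numpy as np

def mu(x, beta, lam):
    """Conditional mean (3.3)-(3.4): [1 + lam*x@beta]**(1/lam), or exp(x@beta) at lam = 0."""
    xb = x @ beta
    if lam == 0.0:
        return np.exp(xb)
    return (1.0 + lam * xb) ** (1.0 / lam)

def grad_mu(x, beta, lam):
    """Gradient of mu with respect to theta = (beta', lam)', stacked column-wise."""
    xb = x @ beta
    if lam == 0.0:
        m = np.exp(xb)
        g_beta = m[:, None] * x                                 # (3.6)
        g_lam = -m * xb**2 / 2.0                                # (3.8)
    else:
        a = 1.0 + lam * xb
        m = a ** (1.0 / lam)
        g_beta = a[:, None] ** (1.0 / lam - 1.0) * x            # (3.5)
        g_lam = m * (lam * xb - a * np.log(a)) / (lam**2 * a)   # (3.7)
    return np.column_stack([g_beta, g_lam])
```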

Under (3.1) and (3.2) and standard regularity conditions (e.g. White and Domowitz 1984), θ can be consistently estimated by NLS. In some cases the NLS estimator would have a large asymptotic variance matrix due to the substantial heteroskedasticity in many nonnegative variables. Gains in efficiency might be realized by using a weighted NLS procedure or a quasi-likelihood procedure. To


this end, let ω(x; γ) > 0 be a weighting function that can depend on an M × 1 vector of parameters γ. Setting ω(x; γ) ≡ 1 yields NLS. Other possibilities for ω are

(3.9) ω(x; γ) = [μ(x; θ)]²  (γ = θ)

(3.10) ω(x; γ) = μ(x; θ)  (γ = θ)

(3.11) ω(x; γ) = exp(xγ)  (γ ≠ β necessarily)

(3.12) ω(x; γ) = [1 + ρxδ]^(1/ρ),  ρ ≠ 0  (γ = (δ′, ρ)′).

Equation (3.9) is appropriate if var(y|x) is proportional to [E(y|x)]², while (3.10) is appropriate if var(y|x) is proportional to E(y|x). The weighting functions (3.11) and (3.12) allow for additional flexibility, and encompass (3.9) and (3.10). It is important to remember that, except where stated, the weighting function is not assumed to be correctly specified for var(y|x). In other words it is not assumed that

(3.13) var(y|x) = σ²ω(x; γ) for some γ ∈ R^M and some σ² > 0,

but only that such considerations motivate the choice of ω. The idea is that nonconstant weighting functions can result in efficiency gains even if (3.13) does not hold. Because the goal is to test hypotheses about the conditional mean parameters θ, inference should be robust to violations of (3.13).

The robust asymptotic variance-covariance matrix of the WNLS estimator of θ can be obtained using the approach of White (1980). Let {(xt, yt): t = 1, 2, ...} be a sequence of random vectors following the regression model (3.1) and (3.2), and suppose there are N observations available. Assume either that these observations are independent or that they constitute an ergodic time series such that

(3.14) E(yt|xt) = E(yt|xt, yt−1, xt−1, yt−2, xt−2, ...),  t = 1, 2, ...

(xt can contain lagged dependent variables as well as other explanatory variables, but ergodicity rules out integrated processes). Equation (3.14) ensures that the errors {εt ≡ yt − E(yt|xt): t = 1, 2, ...} are appropriately serially uncorrelated. Time series applications for which (3.14) does not hold can be accommodated, but the formulas for the asymptotic covariance matrix derived below (in particular, the formula for BN) would have to be modified along the lines of White and Domowitz (1984).

To compute the WNLS estimator of θ, estimates of the weighting functions are needed. Let γ̂ denote an estimator that would be √N-consistent for γ if (3.13) held. Even if (3.13) does not hold, γ̂ generally has a probability limit, denoted here by γ* ≡ plim γ̂ and sometimes called the pseudo-true value of γ. If ω is chosen as in (3.9) or (3.10) then γ̂ could be set to initial estimators of (β, λ), but it is easier in this case to compute the QMLE using the appropriate linear exponential family. The weighting function (3.9) corresponds to exponential regression, and (3.10) corresponds to Poisson regression. If ω is given by (3.11) or (3.12), then γ̂ can be obtained by nonlinear least squares estimation using the squared NLS residual ε̂t² as the regressand. A better procedure is to use the exponential QMLE with ε̂t² as the regressand, as in McCullagh and Nelder (1989, Section 10.4).


Let ω̂t ≡ ω(xt; γ̂) and let θ̂ denote the WNLS estimator of θ, which solves min_θ Σ_{t=1}^N (yt − μ(xt; θ))²/ω̂t. Let μt(θ) ≡ μ(xt; θ), ωt(γ) ≡ ω(xt; γ), εt ≡ yt − μt(θ), ω*t ≡ ωt(γ*), ε*t ≡ εt/√ω*t, and ∇θμ*t ≡ ∇θμt(θ)/√ω*t. Define the (K + 1) × (K + 1) matrices

(3.15) AN ≡ N⁻¹ Σ_{t=1}^N E[∇θμ*t′ ∇θμ*t],  BN ≡ N⁻¹ Σ_{t=1}^N E[ε*t² ∇θμ*t′ ∇θμ*t],

and assume for simplicity that these have limits A and B, respectively. Then, under general heteroskedasticity of unknown form, √N(θ̂ − θ) ~a N(0, A⁻¹BA⁻¹). If (3.13) holds then γ* = γ and the asymptotic variance of √N(θ̂ − θ) takes the more familiar form σ²A⁻¹. In the general case a consistent estimator of V ≡ A⁻¹BA⁻¹ is given by the White (1980) variance-covariance matrix estimator V̂N ≡ ÂN⁻¹B̂NÂN⁻¹, where

(3.16) ÂN ≡ N⁻¹ Σ_{t=1}^N ∇θμ̃t′ ∇θμ̃t,  B̂N ≡ N⁻¹ Σ_{t=1}^N ε̃t² ∇θμ̃t′ ∇θμ̃t,

ε̂t ≡ yt − μt(θ̂), ∇θμ̂t ≡ ∇θμt(θ̂), ε̃t ≡ ε̂t/√ω̂t, and ∇θμ̃t ≡ ∇θμ̂t/√ω̂t. The asymptotic standard error of θ̂j is the square root of the jth diagonal element of V̂N/N, and one performs inference on θ as if θ̂ ~ N(θ, V̂N/N). These same formulas are valid in the context of QMLE in the LEF when ω̂t is the estimated variance from the LEF distribution. For exponential regression, ω̂t = μ̂t², and for Poisson regression, ω̂t = μ̂t.
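A minimal sketch of the WNLS step and of the robust covariance estimator in (3.16), reusing the hypothetical mu and grad_mu helpers sketched above (SciPy's least-squares routine is assumed for the minimization; all names are illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

def wnls(y, x, w, theta0):
    """Weighted NLS: minimize sum_t (y_t - mu_t)^2 / w_t over theta = (beta, lam)."""
    def resid(theta):
        beta, lam = theta[:-1], theta[-1]
        return (y - mu(x, beta, lam)) / np.sqrt(w)
    return least_squares(resid, theta0).x

def robust_cov(y, x, w, theta):
    """White-type estimate of Var(theta_hat), i.e. V_N/N built from (3.16)."""
    beta, lam = theta[:-1], theta[-1]
    e = (y - mu(x, beta, lam)) / np.sqrt(w)            # weighted residuals
    G = grad_mu(x, beta, lam) / np.sqrt(w)[:, None]    # weighted gradients
    A = G.T @ G
    B = (G * e[:, None] ** 2).T @ G
    Ainv = np.linalg.inv(A)
    return Ainv @ B @ Ainv
```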

The WNLS estimator using weights 1/ω̂t is the efficient WNLS estimator if var(yt|xt) = σ²ω(xt; γ) and γ̂ is √N-consistent for γ. In addition, there are some important cases where the WNLS estimator of θ achieves the asymptotic Cramér-Rao lower bound. If the conditional distribution of y given x is exponential with conditional mean μ(x; θ) then the weighting function in (3.9) produces a WNLS estimator that is asymptotically equivalent to the maximum likelihood estimator. If y conditional on x has a Poisson distribution with mean μ(x; θ) then ω(x; γ) given by (3.10) produces the WNLS estimator that is asymptotically efficient. As mentioned above, in these cases it is easier to compute the QMLE directly; the formulas in (3.16) are still valid. For more on QMLE in the LEF, see Gourieroux, Monfort, and Trognon (1984) and Wooldridge (1991b).

Before turning to Lagrange multiplier tests a brief discussion of the efficiency of WNLS or QMLE relative to the Box-Cox approach is warranted. First, as discussed in Sections 2 and 3, the Box-Cox model (1.3) and the conditional mean specification (3.1) and (3.2) are usually nonnested. Only when λ = 1 or λ = 0 do they produce the same model for E(y|x). When λ is known to equal unity both procedures lead to OLS estimation of β. When λ = 0 in (1.3) the inefficiency of the WNLS estimators of the slope parameters relative to the Box-Cox MLE can be tabulated as a function of η² = var(log y|x). Let β and η² be defined as in (1.3). Under (1.3) with λ = 0 it is well known that


(3.17) E(y|x) = exp[(β1 + η²/2) + β2x2 + ... + βKxK]

(3.18) var(y|x) = σ²{exp[(β1 + η²/2) + β2x2 + ... + βKxK]}²,

where σ² = exp(η²) − 1. Assume that β is estimated with the restriction λ = 0 imposed. The MLE of β, denoted β̃, is simply the OLS estimator from the regression log yt on 1, xt2, ..., xtK. Let β̂ denote the exponential QMLE or the WNLS estimator with weighting function given by (3.9). As can be seen from (3.17), the plim of β̂1 is β1 + η²/2, so the comparison is based on the slope estimators only. The asymptotic variance of √N(β̃ − β) is

η²[N⁻¹ Σ_{t=1}^N E(xt′xt)]⁻¹.

A standard calculation shows that the asymptotic variance of √N(β̂ − β) is

σ²[N⁻¹ Σ_{t=1}^N E(xt′xt)]⁻¹.

Thus, the asymptotic standard deviation of β̂j relative to that of β̃j, j = 2, ..., K, is

(3.19) ASD(β̂j)/ASD(β̃j) = [exp(η²) − 1]^(1/2)/η.

This ratio is, of course, always bigger than unity, and increases without bound as η → ∞. However, (3.19) is surprisingly close to unity for reasonable values of η. For η = .50 (empirically a very large value for the conditional standard deviation of log y), the ratio of the asymptotic standard deviations is 1.066; that is, the WNLS standard deviations are 6.6 percent larger than the MLE's. For more moderate values of η, say η = .30, this ratio is 1.023, so that the WNLS estimator standard deviations are 2.3 percent larger than the MLE's. Estimates of η on the order of .10 are not uncommon in applied work, in which case the inefficiency of WNLS is well under one percent.
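These figures are easy to reproduce; a one-off numerical check of (3.19) (illustrative only):

```python
import numpy as np

for eta in (0.50, 0.30, 0.10):
    ratio = np.sqrt(np.exp(eta**2) - 1.0) / eta
    print(eta, round(ratio, 3))   # prints 1.066, 1.023, and 1.003 respectively
```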

When λ is estimated along with β there is no analytical formula like (3.19). A Monte Carlo study could shed light on the efficiency loss of WNLS when λ = 0 and λ is estimated along with β, but this is beyond the scope of the current paper.

4. LAGRANGE MULTIPLIER TESTS OF RESTRICTED MODELS

Joint estimation of β and λ can be difficult if the restrictions 1 + λxtβ ≥ 0, t = 1, ..., N have to be imposed. Before estimating the full model it makes sense to test whether some easily estimated restricted version is sufficient. The two cases of primary interest are λ = 1, which leads to a linear model, and λ = 0, which results in an exponential model. However, it is no more difficult to consider the general case H0: λ = λ0. Throughout this section it is assumed that xt1 ≡ 1, t = 1, 2, ....

Consider first testing the null hypothesis H0: λ = λ0 against H1: λ ≠ λ0 for λ0 ≠ 0. Expressions (3.5) and (3.7) provide the gradient of the regression function evaluated at the null hypothesis; denote these by ∇βμ̂t and ∇λμ̂t, where β̂ is the restricted WNLS estimator (or QMLE) obtained with λ = λ0. Let ŷt and ε̂t be the (unweighted) fitted values and residuals from the WNLS regression. Assuming that ŷt > 0, t = 1, ..., N, the usual LM statistic can be derived from Engle (1984) or Wooldridge (1991a). Because of the relationship between ∇λμ̂t and ∇βμ̂t, the LM statistic is easily computed as NR² from the regression

(4.1) ε̃t on ∇βμ̃t, ỹt log(ŷt),  t = 1, 2, ..., N,

where ε̃t ≡ ε̂t/√ω̂t, ∇βμ̃t ≡ ∇βμ̂t/√ω̂t, ỹt ≡ ŷt/√ω̂t, and R² is the uncentered r-squared. Under H0 and var(yt|xt) = σ²ω(xt; γ), NR² ~a χ²(1). The test of a linear model occurs when λ0 = 1, in which case ∇βμ̃t can be taken to be x̃t ≡ xt/√ω̂t. Further, if ω̂t ≡ 1 this reduces to the LM form of the t-test for the null of a linear model against the Box-Cox alternative derived by Andrews (1971). Andrews suggests using a standard t-test in the Box-Cox setup because, under H0: λ = 1, y|x is normally distributed and homoskedastic.

The failure of normality has no effect on the asymptotic size of the test, but misspecification of the conditional variance function can bias the inference toward or away from the null model. Because the primary goal is to test hypotheses about E(y|x), it is prudent to use the LM test that is robust to misspecification of var(y|x). Wooldridge (1991a, Procedure 3.1) covers regression-based LM tests that are robust to variance misspecification. Here, the robust LM statistic can be computed as follows: First, obtain the residuals, say r̃t, from the regression

(4.2) ỹt log(ŷt) on ∇βμ̃t,  t = 1, 2, ..., N.

Then obtain the sum of squared residuals, SSR, from the OLS regression

(4.3) 1 on ε̃t r̃t,  t = 1, 2, ..., N;

under H0, N − SSR is asymptotically χ²(1) whether or not (3.13) holds.
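A sketch of this two-regression computation, under the same illustrative naming conventions as the code above (the restricted fit is assumed to have produced yhat, grad_beta, and the weights w):

```python
import numpy as np

def robust_lm_lambda(y, yhat, grad_beta, w):
    """Robust LM statistic for H0: lam = lam0, following (4.2)-(4.3).
    Returns N - SSR, asymptotically chi^2(1) under H0."""
    sw = np.sqrt(w)
    e_t = (y - yhat) / sw                        # weighted residuals
    gb_t = grad_beta / sw[:, None]               # weighted gradient wrt beta
    extra = (yhat / sw) * np.log(yhat)           # weighted omitted direction y~_t log(y^_t)
    # (4.2): residuals from regressing the extra direction on the included gradient
    r_t = extra - gb_t @ np.linalg.lstsq(gb_t, extra, rcond=None)[0]
    # (4.3): N - SSR from regressing 1 on e_t * r_t
    z = e_t * r_t
    ones = np.ones_like(z)
    ssr = np.sum((ones - z * np.linalg.lstsq(z[:, None], ones, rcond=None)[0]) ** 2)
    return len(y) - ssr
```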

The LM test for λ = 0 requires only a slightly different argument. Let β̂ be the WNLS estimator using conditional mean function exp(xtβ) and weights 1/ω̂t (or let β̂ be the exponential, Poisson, or some other QMLE in the LEF). Let ŷt ≡ exp(xtβ̂) be the fitted values and let ε̂t ≡ yt − ŷt be the residuals from the WNLS estimation. Then, referring to (3.8), the nonrobust LM statistic for testing H0: λ = 0 is obtained as NR² from the regression

(4.4) ε̃t on ỹtxt, ỹt(log ŷt)²,  t = 1, ..., N.

The robust form of the test can be computed by first saving the residuals r̃t from the regression

(4.5) ỹt(log ŷt)² on ỹtxt,  t = 1, ..., N

and then using N − SSR from the regression (4.3) as asymptotically χ²(1).

The LM test for exclusion restrictions is also easy to derive. Let zt be a 1 × Q vector of additional variables, and consider testing H0: δ = 0 in

(4.6) E(yt|xt, zt) = [1 + λxtβ + λztδ]^(1/λ).

Let β̂ and λ̂ be the estimates computed under δ = 0, so that the fitted values and residuals are computed under H0 as ŷt ≡ [1 + λ̂xtβ̂]^(1/λ̂) and ε̂t ≡ yt − ŷt. Let ∇βμ̂t and ∇λμ̂t be the gradients defined by (3.5) and (3.7). The gradient with respect to δ evaluated under the null is

(4.7) ∇δμ̂t = [1 + λ̂xtβ̂]^((1/λ̂)−1) zt.

If WNLS or QMLE is used, each quantity is weighted by 1/√ω̂t and the standard LM test is obtained as NR² from the regression

(4.8) ε̃t on ∇βμ̃t, ∇λμ̃t, ∇δμ̃t.

The robust LM test requires the 1 × Q residuals r̃t from the regression

(4.9) ∇δμ̃t on ∇βμ̃t, ∇λμ̃t;

then N − SSR from the regression

(4.10) 1 on ε̃t r̃t,  t = 1, ..., N

is asymptotically χ²(Q) under H0 (note that ε̃t r̃t ≡ (ε̃t r̃t1, ..., ε̃t r̃tQ) is a 1 × Q vector). Exclusion restriction tests when λ is fixed at some λ0 are obtained by omitting ∇λμ̃t in (4.8) or (4.9).
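The exclusion-restriction version, (4.9)-(4.10), differs from the previous sketch only in that the omitted direction is the 1 × Q block ∇δμ̃t; a compact sketch under the same naming assumptions (all inputs already weighted by 1/√ω̂t):

```python
import numpy as np

def robust_lm_exclusion(e_t, grad_incl, grad_excl):
    """Robust LM for H0: delta = 0.  e_t is (N,), grad_incl is (N, K+1) stacking
    (grad_beta, grad_lambda), grad_excl is (N, Q) = grad_delta.
    Returns N - SSR, asymptotically chi^2(Q) under H0."""
    # (4.9): residuals from regressing grad_excl on the included gradients
    coef = np.linalg.lstsq(grad_incl, grad_excl, rcond=None)[0]
    r_t = grad_excl - grad_incl @ coef                 # (N, Q)
    # (4.10): N - SSR from the regression of 1 on e_t * r_t
    z = e_t[:, None] * r_t                             # (N, Q)
    ones = np.ones(len(e_t))
    b = np.linalg.lstsq(z, ones, rcond=None)[0]
    ssr = np.sum((ones - z @ b) ** 2)
    return len(e_t) - ssr
```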

It is important to stress that tests about β and λ in the current setup are purely tests about E(y|x): provided the robust forms of the tests are used, the null hypothesis imposes no assumptions on var(y|x) or any other aspect of the distribution of y given x. In contrast, tests about β and λ in the Box-Cox model, e.g. those developed by Davidson and MacKinnon (1985), are generally tests about the entire distribution of y given x. One could reject the null hypothesis because the distribution is misspecified even if E(y|x) is correctly specified.

5. ON THE ISSUE OF SCALE INVARIANCE

One uncomfortable feature of the model (3.1) and (3.2) is that the t-statistics for the slope coefficients β2, ..., βK are not invariant to the scaling of yt whenever λ is estimated along with β. Spitzer (1984) noted the analogous feature for the Box-Cox model (1.3).

Suppose that (β̂, λ̂) are the WNLS estimates using yt as the regressand, and let (β̂+, λ̂+) be the corresponding estimates when the regressand is c0yt for some c0 > 0. Throughout this section it is assumed that xt1 ≡ 1. As shown in the Appendix, the estimates satisfy

(5.1) λ̂+ = λ̂;  β̂1+ = (c0^λ̂ − 1)/λ̂ + c0^λ̂ β̂1;  β̂j+ = c0^λ̂ β̂j,  j = 2, ..., K,

and therefore

(5.2) [1 + λ̂+xβ̂+]^(1/λ̂+) = c0[1 + λ̂xβ̂]^(1/λ̂).

The estimate of λ is invariant to the scaling of yt, and the estimates of the other

coefficients change so that the fitted values and residuals in the scaled regression


are simply scaled up versions of the fitted values and residuals in the unscaled

regression. Consequently, estimates of elasticities and semi-elasticities are scale

invariant, but tests of whether the population values are zero based on the

t-statistics of I62, * P 1K are not. It turns out that both the usual and robust

standard errors of A are invariant to the scale of y (see the Appendix).

Interestingly, as shown in the Appendix, the Lagrange multiplier statistic for

exlusion of any 1 x Q vector zt (see model (4.6)) is invariant to the scaling of Yt.

Therefore, both the usual and robust LM tests for the null Ho: /3j = 0 (j = 2, ... .

K) are scale invariant, whereas the Wald test (based on the t-statistic) is not.

Dagenais and Dufour (1991) provide an interesting general discussion of the scale

invariance issue in the context of maximum likelihood estimation.

One solution to the lack of scale invariance for the t-statistics is to introduce an additional scaling parameter to the conditional mean model. In place of (3.1) consider the model

(5.3) E(y|x) = ν[1 + λxβ]^(1/λ),

where ν is an additional scale parameter. Note that the elasticity and semi-elasticity of y with respect to xj are independent of the value of ν. Scaling y up or down is absorbed into the scaling parameter ν, leaving β and λ unchanged. However, if x contains unity the parameters β, λ, and ν are not separately identifiable from the WNLS objective function. Instead, for the case that P(y > 0) = 1, take ν to be the population geometric mean of y:

(5.4) ν ≡ exp[E(log y)].

Equations (5.3) and (5.4) represent the modified model. With this choice of ν, (5.3) is easily operationalized by replacing ν by its sample counterpart ν̂ ≡ exp(N⁻¹ Σ_{t=1}^N log(yt)), and β and λ can be estimated by solving

(5.5) min_{β,λ} Σ_{t=1}^N (yt/ν̂ − [1 + λxtβ]^(1/λ))².

Each observation on yt is simply divided by the sample geometric mean of {yt: t = 1, ..., N}, and then model (3.1) and (3.2) is estimated. The estimates of β and λ are trivially scale invariant because yt/ν̂ is invariant to scaling.
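In code, the rescaling step in (5.4)-(5.5) is a one-liner before the WNLS call; a sketch reusing the hypothetical wnls helper from Section 3 (y > 0 assumed):

```python
import numpy as np

def scale_invariant_fit(y, x, w, theta0):
    """Divide y by its sample geometric mean, (5.4), then solve (5.5) by WNLS."""
    nu_hat = np.exp(np.mean(np.log(y)))   # sample geometric mean of y
    theta_hat = wnls(y / nu_hat, x, w, theta0)
    return nu_hat, theta_hat
```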

Spitzer (1984) recommends the same strategy for the Box-Cox model. However, in implementing this procedure one should recognize that the solutions β̂ and λ̂ to (5.5) depend on the estimator ν̂, and therefore the asymptotic variance of θ̂ ≡ (β̂′, λ̂)′ should reflect this additional source of uncertainty (as different samples of {yt} are obtained, the estimator ν̂ usually changes). In the general WNLS case, the simplest approach to this problem is to view θ̂ as a "two-step" estimator that solves

(5.6) min_θ Σ_{t=1}^N (yt − μ(xt; θ, ν̂))²/ω̂t,


where μ(x; θ, ν) ≡ ν[1 + λxβ]^(1/λ). Under the additional assumption that {(xt, yt)} is an independent sequence, the Appendix derives the asymptotic variance of the solution θ̂ of (5.6), which accounts for the variability of ν̂. Define the (K + 1) × 1 vector

CN ≡ N⁻¹ Σ_{t=1}^N (∇θμ̃t)′(∇νμ̂t/√ω̂t),

where ∇θμ̃t is the same as derived in Section 3 except that it is now multiplied by ν̂; also, note that ∇νμ̂t = [1 + λ̂xtβ̂]^(1/λ̂) is simply the fitted value of the scaled regressand yt/ν̂. A consistent estimate of the asymptotic variance of θ̂ is

(5.7) [Σ_{t=1}^N ∇θμ̃t′∇θμ̃t]⁻¹ [IP ⋮ −CN] (Σ_{t=1}^N g̃t′g̃t) [IP ⋮ −CN]′ [Σ_{t=1}^N ∇θμ̃t′∇θμ̃t]⁻¹,

where P ≡ K + 1, g̃t is the 1 × (P + 1) vector g̃t ≡ (ε̃t∇θμ̃t, ν̂ log(yt/ν̂)), ε̂t ≡ yt − μt(θ̂, ν̂) ≡ yt − ν̂[1 + λ̂xtβ̂]^(1/λ̂), ∇θμ̃t ≡ ∇θμ̂t/√ω̂t, and ε̃t ≡ ε̂t/√ω̂t. The estimator (5.7) is also robust to variance misspecification (heteroskedasticity when ω̂t ≡ 1). Note that, in the construction of g̃t, ν̂ log(yt/ν̂) is not weighted by 1/√ω̂t.

Generally speaking, (5.7) differs from the usual robust covariance matrix estimator

(5.8) [Σ_{t=1}^N ∇θμ̃t′∇θμ̃t]⁻¹ (Σ_{t=1}^N ε̃t²∇θμ̃t′∇θμ̃t) [Σ_{t=1}^N ∇θμ̃t′∇θμ̃t]⁻¹.

However, as shown in the Appendix, (5.7) and (5.8) produce numerically identical estimates for se(λ̂); this is as it should be since the asymptotic variance of λ̂ is unaffected by the estimation of ν. The scale-corrected standard errors of β̂1, ..., β̂K from (5.7) are generally different from those obtained from (5.8), reflecting the variation in the estimator ν̂.

Because QMLEs in the LEF have first-order conditions identical to WNLS

estimators, the scale invariance results (including those for LM tests) and formula

(5.7) remain valid. Therefore, scale invariant inference is available for exponential

(gamma), Poisson, and other regression models.

6. AN APPLICATION TO A HEDONIC PRICE EQUATION

Harrison and Rubinfeld (1978) (hereafter, HR) estimate a hedonic price equation

for median housing prices in the Boston metropolitan area. Their primary interest

is in the effects of air pollution on housing prices, controlling for a variety of other

town attributes. HR use log of median housing price as the dependent variable in a


linear regression; the explanatory variables are transformed in a variety of ways

discussed below. The data are published and used extensively in the book on

regression diagnostics by Belsley, Kuh, and Welsch (1980) (hereafter, BKW).

The hedonic price equation is reestimated here using model (3.1) and, for

comparison, by standard Box-Cox procedures. Following the discussion in Section

5, y is housing price divided by the sample geometric mean of housing prices.

Because a constant variance for housing prices (conditional on the explanatory

variables) is unlikely, exponential QMLE is used to estimate the parameters of (3.1)

(see (3.9)). Box-Cox estimates are obtained by maximizing the likelihood function

constructed under (1.3).

The results of the estimation are given in Table 1. CRIME is the per capita crime

rate, DIS is a weighted distance to five employment centers in the Boston area,

RAD is an index of highway accessibility, PROPTAX is the property tax rate (per

$10,000), STRATIO is the student-teacher ratio by town, LOWSTAT is the

proportion of the population that is lower status, B is the proportion of the

population which is black, RM is the average number of rooms, and NOX is the

nitrogen oxide concentration (parts per hundred million). With the exception of the

inclusion of CRIME2 and RM in Table 1, the variable transformations are the same

as those used by HR. (I have excluded some of the variables found to be

unimportant by both HR and BKW.) The least natural of the transformations is

(B - .63)2. HR provide a story for this kind of quadratic relationship between

housing price and proportion of blacks in the population, but it is not compelling.

Unfortunately, the raw data on B are not readily available; only (B - .63)2 is

reported in BKW.

The QMLE and Box-Cox procedures give qualitatively similar results, although

the magnitudes of some estimates differ. Interestingly, both approaches suggest

that quadratics in CRIME and RM are warranted, nonlinearities apparently

overlooked by HR and BKW. (CRIME2 and RM are omitted in the HR model.) The

coefficients on the pollution variable, NOX2, are very similar in the two proce-

dures, even though the estimates of the transformation parameter LAMBDA (.75

for QMLE, .40 for Box-Cox) are somewhat different. The model with NOX

replacing NOX2 fits almost as well; including both in the model introduces severe

collinearity, rendering each term insignificant. Therefore, I stick to the transfor-

mation used by HR.

To give some idea of how elasticity estimates differ between the QMLE and

Box-Cox approaches, I computed the elasticity of PRICE with respect to NOX at

the average, minimum, and maximum values of NOX in the sample (all other

variables are evaluated at their sample averages). An estimate of E(PRICE|x) is needed for the Box-Cox model. Rather than performing numerical integration, I use the simple estimate [1 + λ̂xtβ̂]^(1/λ̂). The estimated elasticities for the QMLE,

evaluated at the average, minimum, and maximum values of NOX, are -.391,

-.175, and -1.224, respectively. For Box-Cox, the corresponding quantities are

-.407, -.188, and -1.140. These elasticity estimates are closer than one might

expect, suggesting that similar functional forms are obtained for different slope/

lambda combinations.
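For reference, the elasticity implied by (3.1) when a variable enters the index through its square (as NOX does here via NOX2) works out to 2βjNOX²/(1 + λxβ). The short sketch below is a derivation-based illustration of that formula only; the names and inputs are hypothetical and the paper's data are not reproduced:

```python
def nox_elasticity(nox, beta_nox2, lam, xb):
    """Elasticity of E(y|x) = [1 + lam*xb]**(1/lam) with respect to NOX, where the
    linear index xb = x@beta contains the term beta_nox2 * NOX**2."""
    # dE/dNOX = (1 + lam*xb)**(1/lam - 1) * 2 * beta_nox2 * NOX, so
    # elasticity = (dE/dNOX) * NOX / E = 2 * beta_nox2 * NOX**2 / (1 + lam*xb)
    return 2.0 * beta_nox2 * nox**2 / (1.0 + lam * xb)
```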

Each estimate comes with several standard errors. For QMLE, the usual (or


TABLE 1
HOUSING PRICE EQUATIONS

Variable          QMLE          BOX-COX
ONE               2.971         1.967
                  (0.456)       (0.329)
                  [0.606]
                  {0.610}       {0.329}
CRIME            -0.0179       -0.0197
                  (0.0028)      (0.0030)
                  [0.0034]
                  {0.0035}      {0.0030}
CRIME2            0.00014       0.00014
                  (0.00003)     (0.00004)
                  [0.00004]
                  {0.00004}     {0.00005}
log (DIS)        -0.199        -0.209
                  (0.027)       (0.030)
                  [0.039]
                  {0.039}       {0.030}
log (RAD)         0.118         0.105
                  (0.019)       (0.019)
                  [0.019]
                  {0.019}       {0.019}
PROPTAX          -0.0005       -0.0003
                  (0.0001)      (0.0001)
                  [0.0001]
                  {0.0001}      {0.0001}
STRATIO          -0.032        -0.032
                  (0.005)       (0.006)
                  [0.004]
                  {0.004}       {0.006}
log (LOWSTAT)    -0.366        -0.355
                  (0.023)       (0.018)
                  [0.038]
                  {0.038}       {0.018}
(B - .63)2        0.150         0.281
                  (0.074)       (0.095)
                  [0.097]
                  {0.097}       {0.095}
RM               -0.982        -0.667
                  (0.150)       (0.094)
                  [0.194]
                  {0.195}       {0.094}
RM2               0.086         0.060
                  (0.013)       (0.007)
                  [0.016]
                  {0.016}       {0.007}
NOX2             -0.0062       -0.0065
                  (0.0010)      (0.0012)
                  [0.0012]
                  {0.0012}      {0.0012}

nonrobust) standard errors are in parentheses. These are obtained as the square roots of the diagonal elements from the inverse of the Hessian of the exponential log-likelihood, times σ̂ (which is reported in the table). Here, σ̂² is computed with a degrees-of-freedom adjustment as


TABLE 1
(CONTINUED)

Variable                  QMLE          BOX-COX
LAMBDA                    0.748         0.402
                          (0.132)       (0.056)
                          [0.160]
                          {0.160}       {0.057}
number of observations    506           506
log likelihood            164.04        177.47
sigma                     0.182         0.173
r-squared                 0.821         0.823

Notes:

i. For QMLE, standard errors in parentheses are computed from the inverse of the Hessian times σ̂². The standard errors in brackets are the robust standard errors. In braces are the standard errors corrected for the estimated geometric mean; these are also robust to variance misspecification.
ii. For Box-Cox, standard errors in parentheses are computed from the outer product of the score. The quantities in braces are the standard errors corrected for the estimated geometric mean.
iii. The value of the log-likelihood for QMLE is the value of the gamma log-likelihood evaluated at β̂, λ̂, and σ̂². The gamma density is parameterized such that var(y|x) = σ²[E(y|x)]². This provides some evidence on how well a gamma distribution fits the data, but recall that the parameter σ² is not jointly estimated with β and λ. Therefore, this is not the maximized value of the gamma log-likelihood.
iv. The r-squareds are computed as R² ≡ 1 − Σ_{t=1}^N (yt − ŷt)²/Σ_{t=1}^N (yt − ȳ)², where the ŷt are the fitted values for yt: ŷt ≡ [1 + λ̂xtβ̂]^(1/λ̂). This is the formula used for both Box-Cox and QMLE, even though [1 + λ̂xtβ̂]^(1/λ̂) is not the conditional mean of yt given xt for the Box-Cox model.

(6.1) σ̂² ≡ (N − P)⁻¹ Σ_{t=1}^N (yt − ŷt)²/ŷt²,

where P = 13 for the current model. The quantities in brackets are the robust

standard errors, obtained from (5.8). The standard errors corrected for the

estimated geometric mean, obtained from (5.7), are in braces; these are also robust

to variance misspecification.

For Box-Cox, the standard error in parentheses is obtained from the outer

product of the score. Because of recent evidence showing that these estimates can

substantially understate the true variability in the coefficient estimators, they

should be interpreted with caution. There is no robust standard error since the

model (1.3) is assumed to be true. Using an argument similar to that for the QMLE,

the Box-Cox standard errors can be corrected for the estimation of the geometric

mean of PRICE; these are given in braces.

Most coefficients are highly statistically significant. Both the QMLE and Box-

Cox estimates reject H0: λ = 0 quite strongly, so that the HR specification is

rejected along this dimension as well. With the exception of (B - .63)2, all

coefficients have (robust) t-statistics greater than three. Interestingly, for the

QMLE, the significance and magnitude of (B - .63)2 have declined substantially; the

estimate is .15 with a robust t-statistic of 1.5. (HR obtain an estimate of .36 with a

t-statistic of 3.5.) Thus, rather than adopt HR's tenuous explanation for a positive

effect for large values of B, one might conclude that B is insignificant. Part of the


reduction in the coefficient on (B - .63)2 is due to the estimation of lambda, and

part is due to the additional nonlinearities allowed for by adding CRIME2 and RM

to HR's specification. The estimate and t-statistic are notably higher for the

Box-Cox procedure.

A useful regularity in Table 1 is that the correction to the standard errors for the

presence of the estimated geometric mean has a very small effect. For QMLE, the

robust and scale-corrected standard errors differ in insignificant digits. For Box-

Cox, the usual and scale-corrected standard errors are almost indistinguishable.

Berndt, Showalter, and Wooldridge (1990) also found relatively minor differences

in the two standard errors for a variety of data sets. This suggests that ignoring the

preliminary estimation of scale does not lead one too far astray when performing

inference on the slope coefficients.

Finally, it is important to examine which approach, QMLE or Box-Cox, fits this

data set best. In terms of r-squareds (the amount of variation in yt explained by xt),

the two approaches provide almost identical fits. (Incidentally, the largest possible

r-squared for the mean function (3.1) and (3.2) is obtained by performing NLS. For

this data set, the NLS r-squared is .827.) Direct comparison of the exponential and

Box-Cox log-likelihoods is uninformative because the exponential distribution is

much too restrictive, requiring the variance to be equal to the mean squared (for

this data set, var (ylx)/[E(ylx)]2 is estimated to be .033, well below unity, the value

implied by the exponential distribution). It is more informative to evaluate the

gamma log-likelihood at the estimated /3, A, and aQ2, and to compare this to the

Box-Cox log-likelihood. Still, this is biased against the QMLE because the gamma

log-likelihood is not evaluated at its maximized value; recall that, for robustness

reasons, the exponential log-likelihood is maximized to obtain /3 and A, and then ar2

is obtained by (6.1). For this data set, the Box-Cox log-likelihood is 177.47 and the

gamma log-likelihood is 164.04 (the exponential log-likelihood is -513.67). How-

ever, it is difficult to know what to make of this difference.

Berndt, Showalter, and Wooldridge (1992) find notable differences between the

two approaches in terms of goodness-of-fit and parameter estimates, at least for

some data sets they examine. Consequently, the finding of similar results for this

particular data set should not be extrapolated to all empirical applications.

7. CONCLUDING REMARKS

The Box-Cox transformation has proven to be a useful tool for generalizing

functional form in statistics and econometrics, but it is not well suited for all

applications where interest centers on E(ylx). As an alternative to Box-Cox type

approaches, this paper has proposed a flexible nonlinear model for E(ylx). No

second moment or other distributional assumptions are relied upon to obtain

consistent estimates or to perform asymptotically valid inference. Consequently,

all of the robust LM tests of the special cases covered in Section 4 are pure

conditional mean tests; a rejection can be confidently interpreted as a rejection of

the model for E(ylx), and not as a rejection of some other distributional assump-

tion.

The model studied here can easily incorporate Box-Cox transformations of the


explanatory variables. For positive variables xj, j = J + 1, ..., K, let xj(ρj) denote the Box-Cox transformation of xj, as in (1.1) and (1.2). Then (3.1) and (3.2) are extended by replacing x with x(ρ) ≡ (x1, ..., xJ, xJ+1(ρJ+1), ..., xK(ρK)).

Estimation and inference in this more general model are covered in Wooldridge

(1989).

Michigan State University, U.S.A.

APPENDIX

All of the results in this appendix assume that xt1 ≡ 1, t = 1, 2, .... For notational simplicity, the results are proven only for the unweighted case.

CLAIM 1. λ̂+ = λ̂;  β̂1+ = (c0^λ̂ − 1)λ̂⁻¹ + c0^λ̂ β̂1;  β̂j+ = c0^λ̂ β̂j,  j = 2, ..., K.

PROOF. Let μt(β, λ) ≡ [1 + λxtβ]^(1/λ), and let ∇βμt(β, λ) and ∇λμt(β, λ) denote the derivatives. Then the first order conditions for (β̂+, λ̂+) are

(A.1) Σ_{t=1}^N ∇βμt(β̂+, λ̂+)′(c0yt − μt(β̂+, λ̂+)) = 0,

(A.2) Σ_{t=1}^N ∇λμt(β̂+, λ̂+)′(c0yt − μt(β̂+, λ̂+)) = 0.

Because the solutions are assumed to be unique, it suffices to show that β̂+ and λ̂+ given by (5.1) satisfy (A.1) and (A.2). Then (A.1) reduces to showing

(A.3) Σ_{t=1}^N c0^(1−λ̂) ∇βμt(β̂, λ̂)′(c0yt − c0μt(β̂, λ̂)) = c0^(2−λ̂) Σ_{t=1}^N ∇βμt(β̂, λ̂)′(yt − μt(β̂, λ̂)) = 0.

But (A.3) follows from the first order condition for (β̂, λ̂). Next, straightforward algebra establishes that

∇λμt(β̂+, λ̂+) = c0∇λμt(β̂, λ̂) + λ̂⁻²c0[1 + λ̂xtβ̂]^((1/λ̂)−1)[1 − c0^(−λ̂) − λ̂ log(c0)(1 + λ̂xtβ̂)],

from which it follows that

(A.4) Σ_{t=1}^N ∇λμt(β̂+, λ̂+)′(c0yt − μt(β̂+, λ̂+))

= c0² Σ_{t=1}^N ∇λμt(β̂, λ̂)′(yt − μt(β̂, λ̂)) + Σ_{t=1}^N λ̂⁻²c0²[1 + λ̂xtβ̂]^((1/λ̂)−1)

× [1 − c0^(−λ̂) − λ̂ log(c0)(1 + λ̂xtβ̂)](yt − μt(β̂, λ̂)).

The first term on the right-hand side of (A.4) is zero by the first order condition for (β̂, λ̂). The first order condition for (β̂, λ̂) also implies that the second right-hand side term of (A.4) is zero. This is because

(A.5) Σ_{t=1}^N [1 + λ̂xtβ̂]^((1/λ̂)−1)(yt − μt(β̂, λ̂)) = 0

is the first element of Σ_{t=1}^N ∇βμt(β̂, λ̂)′(yt − μt(β̂, λ̂)) if xt1 ≡ 1. Also,

Σ_{t=1}^N [1 + λ̂xtβ̂]^((1/λ̂)−1)(1 + λ̂xtβ̂)(yt − μt(β̂, λ̂))

= Σ_{t=1}^N [1 + λ̂xtβ̂]^((1/λ̂)−1)(yt − μt(β̂, λ̂)) + λ̂ Σ_{t=1}^N [1 + λ̂xtβ̂]^((1/λ̂)−1)xtβ̂(yt − μt(β̂, λ̂)) = 0

as this is a linear combination of Σ_{t=1}^N ∇βμt(β̂, λ̂)′(yt − μt(β̂, λ̂)). This establishes (A.1) and (A.2) for (β̂+, λ̂+) given by (5.1).

CLAIM 2. se(λ̂) is scale invariant.

PROOF. It is shown that the asymptotic variance of λ̂ is scale invariant. A standard mean-value expansion yields

(A.6) √N(θ̂ − θ) = [N⁻¹ Σ_{t=1}^N ∇θμt(θ)′∇θμt(θ)]⁻¹ N^(−1/2) Σ_{t=1}^N ∇θμt(θ)′εt + op(1).

Focusing on the last element of (A.6), that corresponding to λ, we have

(A.7) √N(λ̂ − λ) = [N⁻¹ Σ_{t=1}^N rt²]⁻¹ N^(−1/2) Σ_{t=1}^N rtεt + op(1),

where rt is the residual from the regression

(A.8) ∇λμt(θ) on ∇βμt(θ),  t = 1, ..., N.

The true error εt is scaled up by c0 when yt is scaled up by c0, so it suffices to show that rt is also scaled up by c0. Then the expression (A.7) is scale invariant, and then so must be the asymptotic variance of λ̂. Let values superscripted by "o" denote the scaled values. Then

(A.9) ∇βμt(θᵒ) = c0^(1−λ)∇βμt(θ)

(A.10) ∇λμt(θᵒ) = c0∇λμt(θ) + λ⁻²c0[1 + λxtβ]^((1/λ)−1)[1 − c0^(−λ) − λ log(c0)(1 + λxtβ)].

From (A.10), ∇λμt(θᵒ) can be expressed as ∇λμt(θᵒ) = c0∇λμt(θ) + ∇βμt(θ)a for a K × 1 vector a. From (A.9), ∇λμt(θᵒ) = c0∇λμt(θ) + c0^(λ−1)∇βμt(θᵒ)a. Consequently, the residuals from the regression ∇λμt(θᵒ) on ∇βμt(θᵒ), say rtᵒ, satisfy rtᵒ = c0rt; thus,

[N⁻¹ Σ_{t=1}^N (rtᵒ)²]⁻¹ N^(−1/2) Σ_{t=1}^N rtᵒεtᵒ = [N⁻¹ Σ_{t=1}^N rt²]⁻¹ N^(−1/2) Σ_{t=1}^N rtεt,

and the asymptotic variance of λ̂ is invariant. That the computed standard errors are invariant follows because r̂t and ε̂t are also scaled up by c0.

CLAIM 3. The LM statistic for exclusion restrictions is scale invariant.

PROOF. In the unrestricted model

μ(x, z; β, λ, δ) = [1 + λxβ + λzδ]^(1/λ)

consider testing H0: δ = 0. If the regressand is c0yt for c0 > 0 then the gradients used for the test on the scaled data are related to those for the unscaled data by

∇βμt(β̂+, λ̂+, 0) = c0^(1−λ̂)∇βμt(β̂, λ̂, 0)

∇λμt(β̂+, λ̂+, 0) = c0∇λμt(β̂, λ̂, 0) + λ̂⁻²c0[1 + λ̂xtβ̂]^((1/λ̂)−1)[1 − c0^(−λ̂) − λ̂ log(c0)(1 + λ̂xtβ̂)]

∇δμt(β̂+, λ̂+, 0) = c0^(1−λ̂)∇δμt(β̂, λ̂, 0).

Because the gradients for the scaled data are linear combinations of the gradients for the unscaled data, and because ε̂t+ = c0ε̂t, the r-squareds from the regressions

ε̂t+ on ∇βμt(β̂+, λ̂+, 0), ∇λμt(β̂+, λ̂+, 0), ∇δμt(β̂+, λ̂+, 0)

ε̂t on ∇βμt(β̂, λ̂, 0), ∇λμt(β̂, λ̂, 0), ∇δμt(β̂, λ̂, 0)

are numerically identical. This shows that the nonrobust LM statistics are numerically identical. For the robust test, note that the residuals r̂t+ from the regression

∇δμt(β̂+, λ̂+, 0) on ∇βμt(β̂+, λ̂+, 0), ∇λμt(β̂+, λ̂+, 0)

are scalar multiples of the residuals r̂t from the regression

∇δμt(β̂, λ̂, 0) on ∇βμt(β̂, λ̂, 0), ∇λμt(β̂, λ̂, 0)

(r̂t+ = c0^(1−λ̂)r̂t). Consequently, the sums of squared residuals from the regressions 1 on ε̂t+r̂t+ and 1 on ε̂tr̂t are identical.


CLAIM 4. The scale-corrected standard error of λ̂ is identical to the robust standard error of λ̂ (yt has been scaled by the geometric mean ν̂).

PROOF. θ̂ ≡ (β̂′, λ̂)′ now satisfies

Σ_{t=1}^N ∇θμt(θ̂, ν̂)′(yt − μt(θ̂, ν̂)) = 0,

where μt(θ, ν̂) ≡ ν̂[1 + λxtβ]^(1/λ), and ∇βμt(θ, ν̂) and ∇λμt(θ, ν̂) are also scaled up by ν̂. A mean value expansion and the delta method yield

(A.11) √N(θ̂ − θ) = [N⁻¹ Σ_{t=1}^N ∇θμt′∇θμt]⁻¹ N^(−1/2) Σ_{t=1}^N ∇θμt′εt

− [N⁻¹ Σ_{t=1}^N ∇θμt′∇θμt]⁻¹ [N⁻¹ Σ_{t=1}^N ∇θμt′∇νμt] N^(−1/2) Σ_{t=1}^N ν log(yt/ν) + op(1),

where all elements are evaluated at (θ, ν). The second term in (A.11) is the contribution due to the estimation of ν; the first term is as before. Thus, it suffices to show that the last element of

(A.12) [N⁻¹ Σ_{t=1}^N ∇θμt′∇θμt]⁻¹ [N⁻¹ Σ_{t=1}^N ∇θμt′∇νμt]

(the element corresponding to λ) is identically zero. But (A.12) is the vector of coefficients from the regression

∇νμt on ∇βμt, ∇λμt,  t = 1, ..., N.

The coefficient on ∇λμt is also obtained by first obtaining the residuals rt from the regression ∇λμt on ∇βμt and then computing the coefficient from the regression ∇νμt on rt. Thus, it suffices to show that ∇νμt and rt are orthogonal. But the residuals rt are orthogonal to any linear combination of ∇βμt, i.e. Σ_{t=1}^N ∇βμt′rt = 0, and

∇νμt = [1 + λxtβ]^(1/λ) = [1 + λxtβ]^((1/λ)−1) + λ[1 + λxtβ]^((1/λ)−1)xtβ,

which is a linear combination of ∇βμt whenever xt contains a constant. Thus, for λ̂, (A.11) is the same as (A.6). This completes the proof.

REFERENCES

AMEMIYA, T. AND J. L. POWELL, "A Comparison of the Box-Cox Maximum Likelihood Estimator and the Non-Linear Two Stage Least Squares Estimator," Journal of Econometrics 17 (1981), 351-381.

ANDREWS, D. F., "A Note on the Selection of Data Transformations," Biometrika 58 (1971), 249-254.

BELSLEY, D. A., E. KUH, AND R. E. WELSCH, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity (New York: Wiley, 1980).


BERNDT, E. R. AND M. S. KHALED, "Parametric Productivity Measurement and Choice among Flexible Functional Forms," Journal of Political Economy 87 (1979), 1220-1245.

———, M. H. SHOWALTER, AND J. M. WOOLDRIDGE, "An Empirical Investigation of the Box-Cox Model and a Nonlinear Least Squares Alternative," Econometric Reviews (forthcoming 1992).

BICKEL, P. J. AND K. A. DOKSUM, "An Analysis of Transformations Revisited," Journal of the American Statistical Association 76 (1981), 296-311.

BOX, G. E. P. AND D. R. COX, "An Analysis of Transformations," Journal of the Royal Statistical Society Series B 26 (1964), 211-252.

DAGENAIS, M. G. AND J.-M. DUFOUR, "Invariance, Nonlinear Models, and Asymptotic Tests," Econometrica 59 (1991), 1601-1615.

DAVIDSON, R. AND J. G. MACKINNON, "Testing Linear and Loglinear Regressions Against Box-Cox Alternatives," Canadian Journal of Economics 25 (1985), 499-517.

DUAN, N., "Smearing Estimate: A Nonparametric Retransformation Method," Journal of the American Statistical Association 78 (1983), 605-610.

ENGLE, R. F., "Wald, Likelihood Ratio, and Lagrange Multiplier Statistics in Econometrics," in Z. Griliches and M. D. Intriligator, eds., Handbook of Econometrics, Volume 2 (Amsterdam: North Holland, 1984), 775-826.

FARBER, H. AND R. GIBBONS, "Learning and Wage Determination," mimeo, MIT Department of Economics, 1990.

GOLDBERGER, A. S., "The Interpretation and Estimation of Cobb-Douglas Production Functions," Econometrica 35 (1968), 464-472.

GOURIEROUX, C., A. MONFORT, AND A. TROGNON, "Pseudo-Maximum Likelihood Methods: Theory," Econometrica 52 (1984), 681-700.

HARRISON, D. AND D. L. RUBINFELD, "Hedonic Housing Prices and the Demand for Clean Air," Journal of Environmental Economics and Management 5 (1978), 81-102.

HUANG, C. J. AND J. A. KELINGOS, "Conditional Mean Function and a General Specification of the Disturbance in Regression," Southern Economic Journal 45 (1979), 710-717.

MCCULLAGH, P. AND J. A. NELDER, Generalized Linear Models, 2nd ed. (New York: Chapman and Hall, 1989).

MIZON, G. E., "Inferential Procedures in Nonlinear Models: An Application in a UK Industrial Cross Section Study of Factor Substitution and Returns to Scale," Econometrica 45 (1977), 1221-1242.

MUKERJI, V., "A Generalized SMAC Function with Constant Ratios of Elasticity of Substitution," Review of Economic Studies 30 (1963), 233-236.

POIRIER, D. J., "The Use of the Box-Cox Transformation in Limited Dependent Variable Models," Journal of the American Statistical Association 73 (1978), 284-287.

——— AND A. MELINO, "A Note on the Interpretation of Regression Coefficients within a Class of Truncated Distributions," Econometrica 46 (1978), 1207-1209.

SPITZER, J. J., "Variance Estimates in Models with Box-Cox Transformations: Implications for Estimation and Hypothesis Testing," Review of Economics and Statistics 66 (1984), 645-652.

WHITE, H., "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica 48 (1980), 817-838.

——— AND I. DOMOWITZ, "Nonlinear Regression with Dependent Observations," Econometrica 52 (1984), 143-162.

WOOLDRIDGE, J. M., "Some Alternatives to the Box-Cox Regression Model," Working Paper No. 534, MIT Department of Economics, 1989.

———, "On the Application of Robust, Regression-Based Diagnostics to Models of Conditional Means and Conditional Variances," Journal of Econometrics 47 (1991a), 5-46.

———, "Specification Testing and Quasi-Maximum Likelihood Estimation," Journal of Econometrics 48 (1991b), 29-55.

