
Evaluation of linear asset pricing models by implied portfolio performance

Ronald J. Balvers a,*, Dayong Huang b
a Division of Economics and Finance, West Virginia University, Morgantown, WV 26506-6025, USA
b Department of Accounting and Finance, University of North Carolina-Greensboro, Greensboro, NC 27402-6170, USA
Article info
Article history:
Received 6 October 2008
Accepted 9 March 2009
Available online 20 March 2009
JEL classification:
G12
C52
G11
Keywords:
Linear asset pricing models
Model evaluation
Portfolio performance
Abstract
We present a theoretical perspective that motivates the use of the Generalized Least Squares R-Square, prominently advocated by Lewellen et al. [Lewellen, J., Nagel, S., Shanken, J., forthcoming. A skeptical appraisal of asset-pricing tests. Journal of Financial Economics], as an evaluation measure for multivariate linear asset pricing models. Adapting results from Shanken [Shanken, J., 1985. Multivariate tests of the zero-beta CAPM. Journal of Financial Economics 14, 327–348] and Kandel and Stambaugh [Kandel, S., Stambaugh, R.F., 1995. Portfolio inefficiency and the cross-section of expected returns. Journal of Finance 50, 157–184], we provide various interpretations and a graphical account in mean-variance space of this measure, facilitating a better understanding of its properties. We furthermore relate it to another leading evaluation metric, the HJ-distance of Hansen and Jagannathan [Hansen, L.P., Jagannathan, R., 1997. Assessing specification errors in stochastic discount factor models. Journal of Finance 52, 557–590]. Additionally, we present a comparison between these evaluation measures using mean-variance mathematics in risk-return space, and we provide a simple formula for calculating both model evaluation measures that involves only the parameters of the mean-variance asset and factor frontiers.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
An expanding selection of (linear) asset pricing models, generally with solid theoretical motivations, is available for application. Apart from the Sharpe (1964), Lintner (1965) and Black (1972) CAPM and the Fama and French (1992) and Fama and French (1996) three-factor model, examples include the Cochrane (1996) investment-based model, the Jagannathan and Wang (1996) conditional CAPM, the Campbell and Cochrane (1999) habit persistence model, the Lettau and Ludvigson (2001a,b) cay model, and the Balvers and Huang (2007) productivity-based model. From a practical perspective, these models provide cost-of-capital measures and risk-adjusted returns that may be used to decide such issues as whether a particular investment project is profitable given the appropriate opportunity cost, or whether a hedge fund is a good investment once returns are adjusted for risk. Because the advice often varies dramatically by asset pricing model, the choice of a proper measure for model selection is vital.
Lewellen et al. (forthcoming) point out prominently, however, that the goodness-of-fit measure most typically used for model evaluation, the Ordinary Least Squares R-square (OLS RSQ), is in fact quite uninformative of how well a model explains the cross-sectional variation in mean returns: when test assets have an approximate factor structure, even models with randomly selected factors often generate high R-squares. They instead advocate the cross-sectional regression (CSR) Generalized Least Squares R-square (GLS RSQ) as a more reliable evaluation measure.
The objective of the paper is to present a theoretical perspective on the GLS RSQ that clearly motivates its use as a model evaluation measure. Adapting results from Kandel and Stambaugh (1995) we provide various interpretations and a graphical account in mean-variance space of the CSR GLS RSQ, facilitating a better understanding of this metric and its properties. We furthermore relate the CSR GLS RSQ or KS-ratio to another leading evaluation metric, developed and motivated by Hansen and Jagannathan (1997): the HJ-distance. In effect we put the KS-ratio on the same solid theoretical footing as the HJ-distance. Additionally, we present a comparison between these evaluation measures using mean-variance mathematics in the familiar risk-return space, and we provide a simple formula for calculating both measures that involves only the parameters of the mean-variance asset and factor frontiers.
Section 2 presents a short review of the literature and the properties of the three popular aforementioned model evaluation measures. Section 3 develops propositions establishing the equivalence of various evaluation criteria to a KS-ratio, as well as providing a simple measure for this ratio and a comparison to the HJ-distance criterion. In Section 4 we provide further intuition and geometrical illustration for these measures and a discussion of issues related to inefficient factor frontiers, econometric testing, and the presence of nontradable macro factors. Simulations illustrating the performance of the evaluation measures are in Section 5, and Section 6 concludes.
* Corresponding author. Tel.: +1 304 293 7880; fax: +1 304 293 2233.
E-mail addresses: rbalvers@wvu.edu (R.J. Balvers), d_huang@uncg.edu (D. Huang).
Journal of Banking & Finance 33 (2009) 1586–1596
0378-4266/$ - see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.jbankfin.2009.03.007
2. Review of standard evaluation measures for linear asset
pricing models
Three model selection measures are popularly employed: the Ordinary Least Squares R-square (OLS RSQ) from a cross-sectional regression of the mean returns of a set of test assets on the model factors; the Generalized Least Squares R-square (GLS RSQ) from such a regression; and the normalized maximum pricing error in pricing a set of test assets based on a model's stochastic discount factor. We now discuss some of the origins, properties, and relationships involving these measures.
Model performance measurement is loosely tied to the test statistics evaluating a model's adherence to a null hypothesis in isolation. For the traditional two-pass Fama-MacBeth cross-sectional regression (CSR) approach, Shanken (1985) develops a test statistic that is a transformation of the mean-squared return errors weighted by their (inverse) covariance matrix, in which risk premia are estimated by GLS. Kandel and Stambaugh (1995) argue the benefits of the GLS estimation procedure by showing that the approach is optimal for estimating risk premia in a one-factor context, in the sense that it finds the risk premium estimates that minimize a distance measure between the factor portfolio and the asset frontier. They show that the resulting distance measure, which we call the KS-ratio, is equivalent to the (square root of the) CSR GLS RSQ, and to the mean return for a mean-variance optimizer using the (false) model as a fraction of the return expected under the true model.
Kandel and Stambaugh (1995) also show that the R-square from a cross-sectional OLS regression changes with the portfolio of the test assets considered. Jagannathan and Wang (1996, p. 42) provide a simple example of this fact, illustrating that the OLS RSQ can vary from zero to close to one by altering the composition of the portfolio of test assets. They argue, however, as does Cochrane (2001), that the OLS RSQ remains an appropriate measure for the relative ability of different models to explain a given portfolio of test assets. Lewellen et al. (forthcoming) add further doubts to the usefulness of OLS RSQ as an evaluation measure by pointing out that essentially any set of factors will generate a high OLS RSQ if the test assets have an approximate factor structure.¹
Hansen and Jagannathan (1997) provide a third evaluation measure, named the HJ-distance by Jagannathan and Wang (1996): the maximum possible pricing error based on the model out of all feasible portfolios with a normalized payoff second moment. It is obtained by generalized method of moments (GMM) estimation to find each model's stochastic discount factor parameters minimizing this measure. Hansen and Jagannathan show that this distance measure also equals the (square root of the) total squared pricing errors weighted by the second moment of the gross returns, and further can be interpreted as the distance of the estimated stochastic discount factor to a true stochastic discount factor. Hansen and Jagannathan (1997) and Jagannathan and Wang (1996) emphasize the advantage of the HJ-distance measure that the second-moment matrix of gross returns, used to weight pricing errors, is model-independent (since mean return estimates are not needed), allowing a fair comparison across models.
A basic comparison of the properties of the three measures involves: estimation efficiency, invariance of weighting to the choice of model, and invariance to the portfolio composition for given test assets. The OLS RSQ measure is based on OLS estimation that is not optimally efficient (but generally robust to misspecification) and changes with the choice of portfolio for a given set of test assets, but the (equal) weighting of the errors does not vary by model. The KS-ratio (or GLS RSQ) is based on efficient (but not always robust) GLS estimation and, for a particular group of test assets, is invariant to the choice of portfolio. Its weighting matrix, however, depends on the mean return estimates that vary by model. It is possible that a poor model for mean returns in fact improves the evaluation measure because mean return errors are weighted less heavily. The HJ-distance neither varies with the choice of portfolio nor uses model-dependent weights, as Hansen and Jagannathan (1997) and Jagannathan and Wang (1996) point out; however, to permit its distance interpretation and model-independent weights, it needs estimation with second moments of gross returns as weights, which are not efficiency-optimal weights in this context.
Several recent papers have examined the properties of the three model evaluation measures.² Kan and Zhou (2004) discuss the correspondence between Shanken's (1985) CSR test statistic (which can be shown to be similar to the KS-ratio) and the HJ-distance. They recognize that the HJ-distance measure focuses on a model's ability to explain asset prices, whereas the KS-ratio (Shanken's CSR test statistic) focuses on a model's ability to explain asset returns. Hence, a factor which, for instance, implies constant betas for all test assets implies a perfect zero HJ-distance because all gross returns are equally priced at one (so the constant can be chosen to fit perfectly), but has no explanatory power for return variations, implying a poor KS-ratio. Kan and Zhou (2004) also provide a geometric interpretation of the HJ-distance and Shanken's CSR test statistic in mean-standard deviation space.
Skoulakis (2005) combines advantages of the KS-ratio and the HJ-distance measures by modifying Shanken's CSR test statistic to use the model-free second moment matrix as the weighting matrix. He shows that the estimation based on the second moment weights is asymptotically efficient and equivalent to estimation with the sample covariance matrix under the assumption of i.i.d. normality of factor and asset returns, but generally different when this restrictive assumption is dropped. Thus, the Skoulakis distance measure is efficiently estimated, model-independent, and portfolio independent. It can be interpreted as the maximum absolute value of the alpha (risk-adjusted return) of any portfolio chosen from the test assets, normalized by a measure of payoff size.
Building on the work of Shanken (1985), Kandel and Stambaugh (1995), and Hansen and Jagannathan (1997), and given the benefits of GLS RSQ over OLS RSQ stressed by Lewellen et al. (forthcoming), our intent is to provide a further theoretical perspective on the GLS RSQ by introducing the KS-measure as an evaluation criterion, and bringing it onto the same theoretical footing as the HJ-measure. We relate these measures to each other and to several alternative measures and provide a relatively simple geometric interpretation in mean-variance space.
We begin by defining the implied portfolio mean, which measures a model's usefulness by the mean portfolio return that a mean-variance decision maker can maximally attain for any chosen variance by employing the model for portfolio decisions. Kandel and Stambaugh (1995) employ this criterion to motivate a GLS estimation procedure for linear one-factor models as the best method for estimating the parameters in the second pass of the beta approach. We adapt the Kandel and Stambaugh analysis to apply to the choice of model instead of the choice of estimation procedure and extend the analysis to allow multiple factors.
¹ Grauer and Janmaat (2009) also point out the limitations of the OLS RSQ measure, as well as slope and intercept parameters, as indicators of whether the CAPM holds when it is in fact false.
² These evaluation measures are natural in an environment in which risk is captured by second moments. We do not discuss the less commonly applied measures that arise in an asymmetric, non-normal environment as examined in the recent papers by Eling and Schuhmacher (2007), Farinelli et al. (2008) and Zakamouline and Koekebakker (in press).
We confirm the Kandel-Stambaugh results for our context in the sense that ranking models by the implied-portfolio-mean criterion is equivalent to ranking models by the cross-sectional GLS RSQ and equivalent to ranking models by the (square of the) KS-ratio (a standardized measure of how close the factor portfolio is to the asset frontier), as well as related criteria such as Shanken's (1985) CSR test statistic and a generalized Sharpe ratio. A sufficient ranking statistic representing each of these criteria is obtained straightforwardly from the parameters of the factor and asset frontiers for any particular set of test assets. No further estimation is necessary: betas and risk premia need not be estimated to calculate this statistic and rank linear asset pricing models. We later conduct simulations to assess the performance of the KS-ratio, the HJ-distance, and the OLS RSQ evaluation criteria in discriminating between models with varying degrees of correlation with the underlying set of factors generating returns.
3. Linear asset pricing models and evaluation measures
3.1. The linear model specification
There are $n$ risky assets with a stochastic returns vector $r$, means $\mu$, and covariance matrix $V$. Any model $k$ consists of $n_k$ tradable factors characterized by an $n \times n_k$ matrix $S_k$ of portfolio weights with:
$$\mathbf{1}_k = S_k' \mathbf{1}, \quad \mu_k = S_k' \mu, \quad \text{and} \quad V_k = S_k' V S_k, \tag{1}$$
where $\mathbf{1}_k$ is a unit vector of length $n_k$ and $\mathbf{1}$ is a unit vector of length $n$, $\mu_k$ represents the $n_k \times 1$ vector of factor means for model $k$, and $V_k$ is the $n_k \times n_k$ factor covariance matrix of model $k$.
For a linear multi-factor beta specification, model $k$ provides the model-implied mean returns
$$\hat{\mu}(k) = \mathbf{1} a_k + \beta_k b_k, \quad \beta_k = V S_k (S_k' V S_k)^{-1}, \tag{2}$$
with $\beta_k$ representing the $n \times n_k$ matrix of betas (factor loadings). The model specific constants, the scalar $a_k$ and the $n_k \times 1$ vector of risk premia $b_k$, are determined to make the model-implied means $\hat{\mu}(k)$ as close as possible to the true means $\mu$.
3.2. The maximal KS-ratio
Kandel and Stambaugh define for any portfolio mean a model's efficiency as the ratio $(\sigma_p^2 - \sigma_g^2)/(\sigma_{p_k}^2 - \sigma_g^2)$, displayed in Fig. 1, where $\mu$, $\sigma^2$ represent portfolio return mean and variance, respectively, and $g$ represents the global minimum variance portfolio, $p_k$ represents a single-factor portfolio, and $p$ the frontier portfolio with the same mean. They show that this measure is also equal to $[(\mu_{p_k} - \mu_g)/(\mu_q - \mu_g)]^2$, where $q$ (not shown in Fig. 1) is the efficient portfolio with the same variance as $p_k$. Both are equivalent measures of a factor's distance to the asset frontier.
Generalizing this concept to our multi-factor case and allowing for model comparison, we define the maximal KS-ratio for model $k$ as³:
$$KS(k) = \max_{\mu} \left( \frac{\sigma^2(\mu) - \sigma_g^2}{\sigma_k^2(\mu) - \sigma_g^2} \right), \tag{3}$$
$$\text{subject to: } \sigma^2(\mu) = 1/C + (C/D)(\mu - B/C)^2, \tag{4}$$
$$\text{and: } \sigma_k^2(\mu) = 1/C_k + (C_k/D_k)(\mu - B_k/C_k)^2, \tag{5}$$
and with $\sigma_g^2 = 1/C$ and $\mu_g = B/C$. Eq. (4) represents the overall mean-variance asset frontier and Eq. (5) the frontier constructed from model $k$'s factors. The constants for the asset frontier and the factor frontier have the standard definitions:
$$\begin{aligned}
A &= \mu' V^{-1} \mu, & B &= \mu' V^{-1} \mathbf{1}, & C &= \mathbf{1}' V^{-1} \mathbf{1}, \\
A_k &= \mu_k' V_k^{-1} \mu_k, & B_k &= \mu_k' V_k^{-1} \mathbf{1}_k, & C_k &= \mathbf{1}_k' V_k^{-1} \mathbf{1}_k, \\
D &= AC - B^2, & D_k &= A_k C_k - B_k^2, & V_k &= S_k' V S_k.
\end{aligned} \tag{6}$$
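As a rough illustration of Eqs. (1) and (6), the sketch below computes the frontier constants for a set of test assets and for the factors of one model. It is not taken from the paper: the helper names (`solve`, `matmul`, `transpose`, `frontier_constants`, `factor_moments`) and all numbers are assumptions made for this example, with four assets, an identity covariance matrix, and gross mean returns chosen only so that the arithmetic can be checked by hand.

```python
# Illustrative sketch only; helper names and numbers are hypothetical, not from the paper.

def solve(M, rhs):
    """Solve M X = rhs by Gauss-Jordan elimination with partial pivoting.
    M is a square list of rows; rhs is a list of rows with one or more columns."""
    n = len(M)
    aug = [list(M[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frontier_constants(mu, V):
    """Return (A, B, C, D) as in Eq. (6) for mean vector mu and covariance matrix V."""
    M = [[m, 1.0] for m in mu]              # n x 2 matrix with columns mu and 1
    G = matmul(transpose(M), solve(V, M))   # [mu, 1]' V^{-1} [mu, 1]
    A, B, C = G[0][0], G[0][1], G[1][1]
    return A, B, C, A * C - B * B


def factor_moments(mu, V, S):
    """Eq. (1): factor means S' mu and factor covariance S' V S."""
    St = transpose(S)
    mu_k = [row[0] for row in matmul(St, [[m] for m in mu])]
    V_k = matmul(St, matmul(V, S))
    return mu_k, V_k


mu = [1.02, 1.06, 1.10, 1.14]                         # hypothetical gross mean returns
V = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
S = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # factors are assets 3 and 4

A, B, C, D = frontier_constants(mu, V)
mu_k, V_k = factor_moments(mu, V, S)
A_k, B_k, C_k, D_k = frontier_constants(mu_k, V_k)
print(A, B, C, D)
print(A_k, B_k, C_k, D_k)
```

The hand-written `solve` keeps the sketch free of dependencies; a library routine such as `numpy.linalg.solve` could be substituted.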
The maximal KS-ratio provides a measure of how close to the efficient frontier a factor portfolio of model $k$ can be chosen given that the mean returns for the factor and efficient portfolios are equal. In Fig. 1 the maximal KS-ratio is found as the distance $p_g p$ divided by distance $p_g p_k$.
We show first that the maximal KS-ratio (hereafter for brevity called the KS-ratio) can be calculated conveniently only from the parameters of the asset frontier and the factor frontier of a model to be evaluated.
Proposition 1. Evaluation measure $KS(k)$ depends exclusively on the parameters of the asset frontier and the factor frontier of model $k$:
$$KS(k) = (C/D) \left( \frac{(B_k/C_k - B/C)^2}{1/C_k - 1/C} + D_k/C_k \right). \tag{7}$$
Proof. See Appendix A for a simple algebraic proof. Below we provide an intuitive geometric argument. □
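Proposition 1 can be checked numerically by comparing the closed form in Eq. (7) with a brute-force maximization of Eq. (3) over a grid of portfolio means. The following is a minimal sketch under assumed inputs: the function names and the six frontier constants are hypothetical (they correspond to four uncorrelated unit-variance assets with gross mean returns 1.02, 1.06, 1.10, 1.14 and a two-factor model made of the last two assets).

```python
# Illustrative sketch; the frontier constants below are hypothetical numbers.

def ks_closed_form(A, B, C, A_k, B_k, C_k):
    """Eq. (7): maximal KS-ratio from asset and factor frontier constants."""
    D = A * C - B * B
    D_k = A_k * C_k - B_k * B_k
    return (C / D) * ((B_k / C_k - B / C) ** 2 / (1 / C_k - 1 / C) + D_k / C_k)


def ks_objective(m, A, B, C, A_k, B_k, C_k):
    """Objective of Eq. (3) at portfolio mean m, using Eqs. (4) and (5).
    Assumes at least two factors, so that D_k > 0."""
    D = A * C - B * B
    D_k = A_k * C_k - B_k * B_k
    var_assets = 1 / C + (C / D) * (m - B / C) ** 2              # Eq. (4)
    var_factors = 1 / C_k + (C_k / D_k) * (m - B_k / C_k) ** 2   # Eq. (5)
    return (var_assets - 1 / C) / (var_factors - 1 / C)


params = (4.6736, 4.32, 4.0, 2.5096, 2.24, 2.0)   # A, B, C, A_k, B_k, C_k
grid = [0.5 + i / 2000 for i in range(2001)]      # candidate means from 0.5 to 1.5
ks_grid = max(ks_objective(m, *params) for m in grid)
ks_formula = ks_closed_form(*params)
print(ks_formula, ks_grid)
```

With these inputs both numbers should be close to 0.9; the grid search can only approach the closed-form value from below.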
3.3. Graphical exposition of the KS-ratio as an evaluation measure
The main complication resulting from the multi-factor extension to Kandel and Stambaugh (1995) is that the factor portfolio $p_k$ that maximizes the KS-ratio must be identified. Fig. 1 indicates that portfolio $p_k$ which produces the maximal KS-ratio is found graphically using mean-variance frontiers by connecting the global minimum portfolio point on the asset frontier to the global minimum portfolio point on the factor frontier and extending the connecting line until it intersects the factor frontier again; the latter intersection produces point $p_k$. The factor frontier must lie completely within the asset frontier because of the assumption that the factors (or the mimicking factors) are tradable.
To see why this works graphically, consider first the result known from Roll (1977) and Roll (1985) that one may find the zero-beta rate for some portfolio $p$ on the frontier as the vertical intercept of a line extending from $p$ through the global minimum portfolio point on the frontier. This holds for any frontier, asset or factor. Additionally, Figure 3 in Kandel and Stambaugh (1995, p. 166) as well as the dashed curve in our Fig. 1 illustrate parabolas with constant KS-ratios (except for the global minimum portfolio for which the KS-ratio is not defined, and given that we consider the square of the ratio used in Kandel and Stambaugh, 1995). Finding the optimal KS-ratio amounts to selecting the widest parabola that still makes contact with the factor opportunity set. This determines portfolio $p_k$ at the point where the constant-KS-ratio parabola (the dashed curve in Fig. 1) is tangent to the factor frontier. Since at that point the slopes of the optimal KS parabola and the factor frontier must be equal, the zero-beta rate for portfolio $p_k$ on the optimal KS parabola is identical to that for the factor frontier.⁴ The zero-beta rate is, correspondingly, also found by extending a line from $p_k$ through the global minimum portfolio point on the optimal KS parabola to the vertical axis, and the global minimum on the optimal KS parabola is identical to that on the asset frontier. The line connecting the global minima on the asset frontier and the factor frontier, therefore, identifies the relevant zero-beta rate as well as the factor portfolio $p_k$ generating the $KS(k)$ ratio.
Fig. 1. Efficient frontiers for assets and factors, and the KS-ratio.
³ The case of one-factor in Kandel and Stambaugh (1995) is a special case of this optimal KS ratio where the choice set is one point: $\mu_{p_k} = B_k/C_k$.
Given the asset and factor frontiers of Eqs. (4) and (5), or considering Fig. 1, it is clear that the slope of the connecting line equals $(B_k/C_k - B/C)/(1/C_k - 1/C)$ so that the zero-beta rate $\mu_0$ is obtained graphically as
$$\mu_0 = \frac{B}{C} - \frac{1}{C} \left( \frac{B_k/C_k - B/C}{1/C_k - 1/C} \right) = \frac{B - B_k}{C - C_k}. \tag{8}$$
The portfolio $p_k$ can be found from
$$\mu_{p_k} = \mu_0 + \sigma^2 \left( \frac{B_k/C_k - B/C}{1/C_k - 1/C} \right), \tag{9}$$
and using Eq. (5) for the factor frontier to eliminate $\sigma^2$. This gives:
$$\mu_{p_k} = B_k/C_k + (D_k/C_k) \left( \frac{1/C_k - 1/C}{B_k/C_k - B/C} \right). \tag{10}$$
Lastly, substituting Eq. (10) into Eq. (3) using Eqs. (4) and (5) implies after some manipulation Eq. (7) in Proposition 1.⁵
Thus a ranking of different asset pricing models can be obtained by comparing the asset frontier parameters for the $n$ assets with the factor frontier parameters for each model $k$. No first or second-pass regressions are necessary for the model evaluation. Only Eq. (7) need be applied to provide a ranking and this requires obtaining the asset frontier once for each set of $n$ (test) assets and the factor frontier once for each model $k$.
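The graphical construction can be traced step by step in code. The sketch below is only an illustration under assumed inputs: it evaluates the slope of the connecting line, the zero-beta rate of Eq. (8), the mean of $p_k$ from Eq. (10), and the variance of $p_k$ from Eq. (5), and then forms the ratio in Eq. (3) at that mean. The function name and the six frontier constants are hypothetical.

```python
# Illustrative sketch; function name and frontier constants are hypothetical.

def tangency_construction(A, B, C, A_k, B_k, C_k):
    """Follow Eqs. (8)-(10) and evaluate the KS-ratio of Eq. (3) at p_k."""
    D = A * C - B * B
    D_k = A_k * C_k - B_k * B_k
    # Slope (mean per unit of variance) of the line joining the two frontier minima.
    slope = (B_k / C_k - B / C) / (1 / C_k - 1 / C)
    mu_0 = B / C - slope / C                                     # Eq. (8)
    mu_pk = B_k / C_k + (D_k / C_k) / slope                      # Eq. (10)
    var_pk = 1 / C_k + (C_k / D_k) * (mu_pk - B_k / C_k) ** 2    # Eq. (5) at mu_pk
    var_p = 1 / C + (C / D) * (mu_pk - B / C) ** 2               # Eq. (4) at mu_pk
    ks = (var_p - 1 / C) / (var_pk - 1 / C)                      # ratio in Eq. (3)
    return mu_0, mu_pk, var_pk, slope, ks


A, B, C = 4.6736, 4.32, 4.0        # hypothetical asset frontier constants
A_k, B_k, C_k = 2.5096, 2.24, 2.0  # hypothetical two-factor frontier constants
mu_0, mu_pk, var_pk, slope, ks = tangency_construction(A, B, C, A_k, B_k, C_k)
print(mu_0, mu_pk, var_pk, ks)
```

For these inputs the zero-beta rate is about 1.04, the factor portfolio has mean about 1.125 and variance about 0.53125, and the resulting ratio agrees with the value Eq. (7) gives (about 0.9).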
3.4. Alternative evaluation criteria
It is implicit in Kandel and Stambaugh (1995), for one-factor models, that a series of appealing model evaluation criteria lead in fact to equivalent model rankings: (a) the KS-ratio (the distance between asset frontier and factor frontier in mean-variance space), (b) the square of a generalized Sharpe ratio, (c) the cross-sectional GLS R-squared, (d) weighted mean-squared return errors, and (e) the model-implied mean portfolio return. The generalization of the first criterion into the maximal KS(k) ratio of Proposition 1 allows us to show that the equivalence applies for multi-factor models as well.
At $\mu = \mu_{p_k}$ the KS-ratio equals $(\sigma_p^2 - \sigma_g^2)/(\sigma_{p_k}^2 - \sigma_g^2)$ and can be related to the square of a generalized Sharpe ratio:
$$\frac{\sigma_p^2 - \sigma_g^2}{\sigma_{p_k}^2 - \sigma_g^2} = (C/D) \left( \frac{(\mu_{p_k} - \mu_g)^2}{\sigma_{p_k}^2 - \sigma_g^2} \right) = (C/D)\, SR(k), \tag{11}$$
where the first equality follows from Eq. (4). Since $C/D$ depends on the (test) assets only and not the choice of model, it does not affect model evaluation. Thus ranking models by the KS-ratio is equivalent to ranking by the generalized Sharpe ratio $SR(k) = (\mu_{p_k} - \mu_g)^2/(\sigma_{p_k}^2 - \sigma_g^2)$, which is actually the square of a Sharpe ratio analog: if a riskless asset exists with return $r_f$ then the minimum portfolio variance is zero and the expression reduces to the square of a standard Sharpe ratio, $(\mu_{p_k} - r_f)^2/\sigma_{p_k}^2$.
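A short numerical illustration of Eq. (11) follows. It is a sketch under assumed inputs only: the function name, the frontier constants, the mean and variance of the factor portfolio $p_k$, and the riskless rate are all hypothetical numbers chosen for the example.

```python
# Illustrative sketch; all names and numbers are hypothetical.

def generalized_sharpe_sq(mu_pk, var_pk, mu_ref, var_ref):
    """SR(k) of Eq. (11): squared Sharpe-ratio analog measured from a reference point."""
    return (mu_pk - mu_ref) ** 2 / (var_pk - var_ref)


A, B, C = 4.6736, 4.32, 4.0          # hypothetical asset frontier constants
D = A * C - B * B
mu_g, var_g = B / C, 1 / C           # global minimum variance portfolio
mu_pk, var_pk = 1.125, 0.53125       # hypothetical factor portfolio p_k

sr_k = generalized_sharpe_sq(mu_pk, var_pk, mu_g, var_g)
ks_k = (C / D) * sr_k                # Eq. (11): KS(k) = (C/D) SR(k)

# With a riskless asset the reference point becomes (r_f, 0) and the
# expression is the square of a standard Sharpe ratio.
r_f = 1.03                           # hypothetical gross riskless return
standard_sq = generalized_sharpe_sq(mu_pk, var_pk, r_f, 0.0)
print(sr_k, ks_k, standard_sq)
```

Because $C/D$ is the same for every model evaluated on the same test assets, sorting models by `sr_k` or by `ks_k` gives the same order.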
The weighted mean-squared return errors of model $k$ are defined as
$$MSE(k) = \min_{a_k, b_k} (\mu - \hat{\mu}(k))' V^{-1} (\mu - \hat{\mu}(k)), \tag{12}$$
where the mean returns are from Eq. (2): $\hat{\mu}(k) = \mathbf{1} a_k + V S_k (S_k' V S_k)^{-1} b_k$. This criterion was first suggested by Shanken (1985) for evaluating a linear zero-beta model and used more recently by Kan and Zhou (2004), who compare it to the HJ-distance criterion.
The cross-sectional GLS R-squared for model $k$, $RSQ(k)$, is related to the mean-squared errors. Using the same definition as Kandel and Stambaugh (1995) we have
$$RSQ(k) = 1 - \{ MSE(k) / [(\mu - \mathbf{1}\bar{\mu})' V^{-1} (\mu - \mathbf{1}\bar{\mu})] \},$$
with $\bar{\mu}$ obtained as the GLS estimate of the constant and equal to $B/C$. So, from (6)
$$RSQ(k) = 1 - (C/D)\, MSE(k), \tag{13}$$
so that ranking models by the inverse of their mean-squared errors is equivalent to ranking by their GLS RSQ. (Again, $C/D$ depends on the assets only and not the choice of model.)
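To make the regression behind Eqs. (12) and (13) concrete, the sketch below runs the cross-sectional GLS regression of mean returns on a constant and the betas of Eq. (2) and then forms the GLS R-squared. It is an illustration under assumptions, not the authors' code: the helper names are hypothetical, and the data (four uncorrelated unit-variance assets with the last two serving as factors) are invented so that the result can be verified by hand.

```python
# Illustrative sketch only; helper names and numbers are hypothetical.

def solve(M, rhs):
    """Solve M X = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def gls(y, X, W):
    """Minimize (y - X t)' W^{-1} (y - X t) over t; return (t, minimized value)."""
    Xt = transpose(X)
    t = solve(matmul(Xt, solve(W, X)), matmul(Xt, solve(W, [[v] for v in y])))
    resid = [[v - sum(x * c[0] for x, c in zip(row, t))] for v, row in zip(y, X)]
    value = matmul(transpose(resid), solve(W, resid))[0][0]
    return [c[0] for c in t], value


mu = [1.02, 1.06, 1.10, 1.14]                         # hypothetical gross mean returns
V = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
S = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # factors are assets 3 and 4

VS = matmul(V, S)
V_k = matmul(transpose(S), VS)
beta = transpose(solve(V_k, transpose(VS)))           # Eq. (2): V S (S'VS)^{-1}
X = [[1.0] + row for row in beta]                     # constant plus betas

theta, mse = gls(mu, X, V)                            # Eq. (12)
a_k, b_k = theta[0], theta[1:]
_, total = gls(mu, [[1.0] for _ in mu], V)            # constant-only fit, equals D/C
rsq = 1 - mse / total                                 # Eq. (13)
print(a_k, b_k, mse, rsq)
```

In this example the fitted constant is about 1.04, the weighted squared error is about 0.0008, and the R-squared is about 0.9, which is the value Eq. (7) gives for the same frontier constants.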
A final criterion relates to the usefulness of a model to guide mean-variance portfolio choice. It rates a model based on the maximum mean return, MIM(k) (model-implied mean), that a mean-variance investor can attain by using the model to select the portfolio at any desired level of variance. Define for arbitrary portfolio variance $\sigma^2$ the vector $s(\cdot)$ as the efficient portfolio shares predicated on the set of implied means for model $k$ given by Eq. (2): $\hat{\mu}(k) = \mathbf{1} a_k + V S_k (S_k' V S_k)^{-1} b_k$. Then
$$MIM(k) = \max_{a_k, b_k} \mu' s(\hat{\mu}(k)), \tag{14}$$
where $\mu$ represents the true vector of means generating the actual mean portfolio return. In summary:
Proposition 2. Consider a set of $n$ risky assets and a linear asset pricing model $k$ defined by Eqs. (1) and (2), and evaluation measures $KS(k)$, $SR(k)$, $RSQ(k)$, $MSE(k)$, and $MIM(k)$ provided in Eq. (3) and (11)–(14). Then:
$$KS(k) = (C/D)\, SR(k) = RSQ(k) = 1 - (C/D)\, MSE(k) = (C/D) \left( \frac{(MIM(k) - B/C)^2}{\sigma^2 - 1/C} \right). \tag{15}$$
Each of the evaluation measures leads to equivalent ranking of any model $k$.
⁴ This follows from another well-known graphical result that in mean-standard deviation space a frontier portfolio's zero-beta rate is found as the horizontal intercept of the frontier portfolio's tangency line, plus the fact that for the tangency point equality of slope in mean-standard deviation space implies equality in mean-variance space.
⁵ There are two more ways of identifying the KS-ratio graphically. Most elegantly, extend the line $\mu_0 p_k$ until its second intersection with the asset frontier and call this intersection $p_l$ (see Fig. 1). Then the maximal KS-ratio equals $(\mu_f - \mu_g)/(\mu_l - \mu_g) = g p_k / g p_l$ (the equality follows from basic geometry). To prove this, consider that at $p_l$ we have $\mu_l - \mu_g = (D/C)(1/s)$, where $s$ is the slope in parentheses in Eqs. (8) or (9). In addition, we know that $\mu_f - \mu_g = B_k/C_k + (D_k/C_k)(1/s) - B/C$ from Eq. (10). Then we can check that the ratio equals the KS-ratio in Eq. (7). Alternatively, draw a line from $p$ through $g$ and call the intersection with the vertical axis $\mu_0^{*}$. Then the maximal KS-ratio equals $(\mu_0 - \mu_g)/(\mu_0^{*} - \mu_g)$. This follows from the similarity of the triangles $g \mu_0 \mu_g$; $g \mu_0^{*} \mu_g$ with $g p p_g$; $g p_k p_g$.
Proof. See Appendix A. Note that the scalars $B$, $C$, and $D$ defined in Eq. (6) do not vary by model, only by the set of assets considered, and that the MIM(k) criterion applies for any chosen level of portfolio variance, $\sigma^2$.⁶ □
3.5. Comparison to the HJ criterion
Hansen and Jagannathan (1997) develop the distance measure HJ(k) that allows assessment of different asset pricing models. The (square of the) HJ-distance is
$$HJ(k) = \min_{c_k, d_k} g(k)' (V + \mu\mu')^{-1} g(k), \tag{16}$$
$$\text{with } g(k) = \mu c_k - V S_k d_k - \mathbf{1}. \tag{17}$$
Kan and Zhou (2004) consider this measure as a means of evaluating linear stochastic discount factor models. They observe that the HJ-distance based on the GMM/SDF approach measures pricing errors (deviations of observed price averages for each asset from those implied by theory), whereas evaluation based on the beta approach measures return errors (deviations of mean returns for each asset from those implied by theory). In addition the errors are weighted by non-central second moments in the GMM/SDF approach but by covariances in the beta approach. Accordingly, the HJ-distance measure should differ from the KS distance measure. We find that
Proposition 3. The evaluation measures KS(k) defined in Eqs. (3)–(6) and HJ(k) defined in Eqs. (16) and (17) are related as:
$$KS(k) = 1 - (C/D) \left( \frac{A - A_k}{C - C_k} \right) HJ(k). \tag{18}$$
Thus HJ(k) and KS(k) lead to generically different evaluations for any model $k$.
Proof. See Appendix A. Unless pricing errors are zero, giving $HJ(k) = 0$ and $KS(k) = 1$, ranking models $k$ by the $KS(k)$ measure typically yields a different result than ranking based on the (negative of the) $HJ(k)$ measure because $(A - A_k)/(C - C_k)$ varies with $k$. Kan and Zhou (2004) obtain a similar result when they compare the HJ-distance to Shanken's CSR test statistic. □
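The relation in Proposition 3 can be illustrated numerically. The sketch below minimizes the quadratic form of Eq. (16) directly, as a weighted least-squares fit of the vector of ones on $\mu$ and $V S_k$ with weighting matrix $V + \mu\mu'$, and then compares the KS-ratio implied by Eq. (18) with the closed form of Eq. (7). The helper names and the data are assumptions made for this example (four uncorrelated unit-variance assets, the last two serving as factors); nothing here is taken from the paper's appendix.

```python
# Illustrative sketch only; helper names and numbers are hypothetical.

def solve(M, rhs):
    """Solve M X = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def gls(y, X, W):
    """Minimize (y - X t)' W^{-1} (y - X t) over t; return (t, minimized value)."""
    Xt = transpose(X)
    t = solve(matmul(Xt, solve(W, X)), matmul(Xt, solve(W, [[v] for v in y])))
    resid = [[v - sum(x * c[0] for x, c in zip(row, t))] for v, row in zip(y, X)]
    return [c[0] for c in t], matmul(transpose(resid), solve(W, resid))[0][0]


def abc(mu, V):
    """Frontier constants A, B, C of Eq. (6)."""
    M = [[m, 1.0] for m in mu]
    G = matmul(transpose(M), solve(V, M))
    return G[0][0], G[0][1], G[1][1]


mu = [1.02, 1.06, 1.10, 1.14]                         # hypothetical gross mean returns
n = len(mu)
V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
S = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # factors are assets 3 and 4

VS = matmul(V, S)
U = [[V[i][j] + mu[i] * mu[j] for j in range(n)] for i in range(n)]  # V + mu mu'
coef, hj = gls([1.0] * n, [[mu[i]] + VS[i] for i in range(n)], U)    # Eq. (16)
c_k = coef[0]                                         # coefficient on mu

A, B, C = abc(mu, V)
mu_k = [row[0] for row in matmul(transpose(S), [[m] for m in mu])]
A_k, B_k, C_k = abc(mu_k, matmul(transpose(S), VS))
D, D_k = A * C - B * B, A_k * C_k - B_k * B_k

ks_from_hj = 1 - (C / D) * (A - A_k) / (C - C_k) * hj                # Eq. (18)
ks_formula = (C / D) * ((B_k / C_k - B / C) ** 2 / (1 / C_k - 1 / C) + D_k / C_k)
print(hj, 1 / c_k, ks_from_hj, ks_formula)
```

Both KS values should be about 0.9 here, while the squared HJ-distance is a much smaller number on a different scale, which is one way to see why the two criteria need not rank models identically.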
4. Discussion and graphical exposition
4.1. Discussion of the cross-sectional fit and model-implied mean return criteria
For the $RSQ(k)$, or from Eq. (13) the $MSE(k)$ criterion, the GLS parameter conditions ((A2) and (A3) in the Appendix A) require that $\hat{\mu}_g = \mu_g$ and that $\hat{\mu}_k = \mu_k$. Thus, the model errors for the global minimum variance portfolio and for the factor portfolios are required to be zero. From Roll (1977), Grinblatt and Titman (1987) and Huberman et al. (1987) we know that a parabola in mean-variance space exactly describes all mean returns in a linear beta model if and only if a unique portfolio of the factors is on the parabola. Since the estimates $\hat{\mu}(k)$ in Eq. (2) constitute an exact linear beta model, they can be described by a model-implied frontier $\hat{\mu}(k)' s(\hat{\mu}(k))$. This frontier must include the global minimum variance portfolio as well as exactly one factor portfolio. This factor portfolio must be $p_k$ and the model-implied frontier must be the dashed parabola in Fig. 1.
According to the $MIM(k)$ criterion, the usefulness of a model may be judged by how well it serves a mean-variance investor. By using a model to summarize the mean returns for all assets rather than the actual means one necessarily obtains lower mean returns for any given variance. However, the better the model the higher the mean return. For any model $k$ the global minimum variance portfolio is an efficient portfolio choice given as $s_g(\hat{\mu}(k)) = V^{-1}\mathbf{1}/(\mathbf{1}'V^{-1}\mathbf{1})$, which is independent of the model. It yields a mean return of $\mu' s_g(\hat{\mu}(k)) = \mu' V^{-1}\mathbf{1}/(\mathbf{1}'V^{-1}\mathbf{1}) = B/C$. Given that $\hat{\mu}_g = \mu_g$ and $\hat{\mu}_k = \mu_k$ we find that $\mu' s(\hat{\mu}(k))$ coincides with $\hat{\mu}(k)' s(\hat{\mu}(k))$ at the global minimum variance portfolio as well as at the $p_k$ portfolio and so the two must be identical.
4.2. Graphical comparison of KS-ratio and HJ-distance
Fig. 2 illustrates the HJ-distance as well as the KS-ratio. The zero-beta estimate in the HJ case is $1/c_k = (A - A_k)/(B - B_k)$ from Eqs. (A28) and (A29). From this point extend two lines: a line through point $g$ with horizontal axis intersection $\sigma_0^2$; and a line through point $g_k$ with horizontal axis intersection $\sigma_{0k}^2$. It is straightforward to show that $HJ(k) = 1/\sigma_0^2 - 1/\sigma_{0k}^2$. The second intersection with the factor frontier of the line from the HJ zero-beta estimate through the minimum of the factor frontier is point $p_h$, which is the mutual fund (factor portfolio) that minimizes the HJ-distance. This graphical illustration follows Kan and Zhou (2004), Fig. 2, except that in our mean-variance graph the lines intersect the minima of the asset and factor frontiers rather than being tangent to these frontiers, and the horizontal axis intercepts do not need to be squared. The geometry does not directly pin down the zero-beta estimate for the HJ case. Finding the HJ zero-beta estimate graphically is possible but a bit convoluted.⁷
Fig. 2. The KS-ratio and the HJ-distance.
⁶ The Appendix A also considers a version of the Fama and MacBeth (1973) approach in which a cross-sectional regression is run for each period, here in GLS form, and the average R-squared is used for evaluation. The (inverse-covariance-weighted) mean-squared errors represent the average taken over all periods and are given as:
$$MSE^{FM}(k) = \min_{a_k, b_k} E\{ (r - \hat{\mu}(k))' V^{-1} (r - \hat{\mu}(k)) \},$$
and the associated average R-squared as derived below Eq. (A12) in the Appendix A is
$$RSQ^{FM}(k) = 1 - \{ MSE^{FM}(k) / [E(r' V^{-1} r) - B^2/C] \}.$$
Since $E(r' V^{-1} r) - B^2/C$ is not model specific, $MSE^{FM}(k)$ and $RSQ^{FM}(k)$ rank models equivalently. The Appendix A also shows that the difference between $MSE(k)$ and $MSE^{FM}(k)$ is not model specific. Although $MSE(k) < MSE^{FM}(k)$ and $RSQ^{FM}(k) < RSQ(k)$, the four criteria: (a) $MSE(k)$, (b) $RSQ(k)$, (c) $MSE^{FM}(k)$ and (d) $RSQ^{FM}(k)$ rank all models equivalently for any given set of test assets. Shanken (1985) observes that the evaluation is also unchanged if we replace the inverse of the covariance matrix $V$ of the returns by the inverse of the covariance matrix $R$ of the return errors. The mean-squared error criterion in equation (*) is proportional to Shanken's (1985) test statistic for testing the zero-beta CAPM. For the one-factor CAPM case Roll (1985) shows that Shanken's test statistic is related to the KS-ratio.
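The intercept construction described for Fig. 2 can be traced with a few lines of arithmetic. The sketch below is an illustration only: it assumes variance on the horizontal axis and mean on the vertical axis, uses plain line geometry for the two intercepts, and relies on hypothetical frontier constants (four uncorrelated unit-variance assets with gross means 1.02, 1.06, 1.10, 1.14, with the first two assets serving as factors). The function name is likewise an assumption.

```python
# Illustrative sketch; the numbers are hypothetical and the intercept formula is
# plain line geometry assumed from the description of Fig. 2.

def hj_from_intercepts(A, B, C, A_k, B_k, C_k):
    z = (A - A_k) / (B - B_k)        # HJ zero-beta estimate 1/c_k on the mean axis
    # A line from (variance 0, mean z) through a frontier minimum (1/C, B/C)
    # reaches mean 0 at variance z / (z*C - B).
    var_0 = z / (z * C - B)          # intercept of the line through g
    var_0k = z / (z * C_k - B_k)     # intercept of the line through g_k
    return z, var_0, var_0k, 1 / var_0 - 1 / var_0k


z, var_0, var_0k, hj = hj_from_intercepts(4.6736, 4.32, 4.0, 2.164, 2.08, 2.0)
print(z, var_0, var_0k, hj)
```

In this example the two intercepts are close to each other (roughly 6.94 and 6.97), so the difference of their reciprocals, the squared HJ-distance, is small.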
4.3. Inefficient factor frontiers
The reason for working with the square of the original ratio defined by Kandel and Stambaugh (1995) is to deal with factor frontiers that are inefficient in the sense of their horizontal center lying below the horizontal center of the asset frontier; i.e., $B_k/C_k < B/C$. In this case the line connecting the zero-beta rate with portfolio $p_k$ has a negative slope so that portfolio $p_k$ is inefficient within the factor frontier (the dashed frontier in Fig. 1 illustrates the normal case with an efficient factor portfolio $p_k$). In this case, portfolio $p_k$ in fact minimizes the original KS-ratio. However, while the factor portfolio $p_k$ is now inefficient by itself, together with the global minimum variance portfolio it traces out a frontier that dominates any frontier generated by combining a portfolio on the efficient half of the factor frontier with the global minimum variance portfolio.
The approach of maximizing the original (i.e., not squared) KS-ratio picks an efficient factor portfolio when $B_k/C_k < B/C$, but this implies a model implementation that leads to a worse outcome according to the RSQ(k), MSE(k), and MIM(k) criteria, namely a lower cross-sectional GLS RSQ (a higher mean-squared error) and a lower implied portfolio mean return, compared to that resulting from using the inefficient frontier portfolio $p_k$. Hence, the original KS definition, even in the one-factor case, provides an inappropriate evaluation measure when $B_k/C_k < B/C$, while in this case the (squared) definition of the KS-ratio provides the appropriate measure.
4.4. Econometric testing issues
Shanken (1985) developed a CSR test statistic for the zero-beta CAPM based on the weighted squared errors that is akin to the KS-ratio and applies to the single-factor case. The MSE(k) criterion in Eq. (7) analogously generates the same test statistic in the multi-factor case. Roll (1985) provides a graphical depiction of Shanken's test statistic, relating it to the market proxy's excess variance above the minimum variance for the same mean in ratio to the market proxy's excess variance above the global minimum variance. Kandel and Stambaugh (1995) in turn relate this to the KS-ratio.
8
While we are not interested here in classical testing, it should
be clear that all results apply equally well whether we use sample
moments or population moments: given the moments in a partic-
ular sample the evaluation measure tells us how well a model ex-
plains these sample moments. Clearly, for a given set of sample
moments the evaluation results are less reliable as a guide for what
model to use in the future if the sample moments are a poor proxy
for the population moments.
4.5. Non-tradable factors
Many asset pricing models include one or more risk factors that are macro factors and cannot directly be related to asset returns. The analysis so far does not apply to these non-tradable factors on account of Eq. (1) which allows only factors that are portfolios of the test assets. To enable evaluation of models containing non-tradable factors, consider a factor model $k^{*}$ consisting of a vector of any stochastic variables with finite first and second moments and $n_k < n$ elements. Without loss of generality we can write the $n_k$ vector of factor realizations for this model, $f_k$, as

$$f_k = v_k + W_k' r + x_k, \tag{19}$$

with $v_k$ set such that $E(x_k) = 0$ and a further imposed condition $E(x_k r') = 0$ determining $W_k = V^{-1} V_{fr}$ ($V_{fr}$ is the $n \times n_k$ covariance matrix of the non-tradable factors with the asset returns).
Based on Huberman et al. (1987) we can choose the mimicking factor portfolios as

$$r_k = L_k W_k' r, \tag{20}$$

where $L_k$ can be any invertible $n_k \times n_k$ matrix (that we may pick such that the weights on the test assets sum to one for each factor: $S_k' = L_k W_k'$, with $S_k' \mathbf{1} = \mathbf{1}_k$). Model $k^{*}$ directly provides the model-implied mean returns

$$\hat{\mu}(k^{*}) = \mathbf{1} a_{k^{*}} + \beta_k b_{k^{*}}, \qquad \beta_k = V W_k (W_k' V W_k)^{-1}. \tag{21}$$

It is easy to confirm that a model $k$ with mimicking factor portfolios $r_k$ as given in Eq. (20) provides the same model-implied means: $\hat{\mu}(k) = \hat{\mu}(k^{*})$.$^{9}$ As the mimicking model is equivalent in terms of the implied means, it must provide the same evaluation measure as the original model. A formal statement and proof of the equivalence of the evaluation measures for the mimicking model and the original model is available from the authors.
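The construction in Eqs. (19) and (20) can be illustrated numerically. The following is a minimal sketch for a single non-tradable factor, so that $L_k$ reduces to a scalar; the covariance numbers, the helper `solve`, and the function name `mimicking_weights` are hypothetical and chosen only for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [x / pivot for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def mimicking_weights(V, cov_rf):
    """Return (W, S) for one factor: W = V^{-1} V_fr as below Eq. (19),
    and S = L * W with the scalar L = 1 / sum(W) so that S sums to one."""
    W = solve(V, cov_rf)
    scale = 1.0 / sum(W)          # the scalar counterpart of L_k in Eq. (20)
    S = [scale * w for w in W]
    return W, S


# Hypothetical inputs: covariance matrix of three test assets and the
# covariances of one macro factor with those assets (illustrative numbers only).
V = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
cov_rf = [0.002, 0.006, 0.004]

W, S = mimicking_weights(V, cov_rf)
print("projection weights W:", W)
print("mimicking portfolio weights S:", S)
```

Because $V W_k = V_{fr}$ by construction, the residual $x_k$ in Eq. (19) is uncorrelated with every test asset return, which is the condition $E(x_k r') = 0$.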
5. Illustrative simulations
If we define a "better" model as one whose factors are more highly correlated with the true factors generating the data, then its application is likely to produce better capital-budgeting or portfolio management results. It stands to reason also that each of the three evaluation criteria typically improves as a model gets "better", but it is not clear which of the criteria would improve most, that is, discriminate best. One issue is that the foundation for each criterion is not directly linked to how closely correlated model factors are to the true factors. A second issue is that each evaluation measure has different empirical requirements (requiring either covariance estimation, non-central second moments estimation, or no second moment estimation at all). To address these issues, we simulate realistic returns and factors, construct models whose factors have different correlations with the true factors, and check how well each evaluation measure performs in discriminating between different models. The returns are taken to be the returns on the 25 asset portfolios sorted by Fama and French based on size and book-to-market value (FF25) and, as suggested by Lewellen et al. (forthcoming), these same 25 portfolios plus 30 industry portfolios (FF55). Both sets of test assets are considered for the 1964Q1-2004Q4 period.
7
Draw two lines emanating from the origin: line 1 through the minimum of the asset frontier; line 2 through the minimum of the factor frontier. The second intersections with the respective frontiers give the $(\sigma^2, \mu)$ points $(A/B^2, A/B)$ and $(A_k/B_k^2, A_k/B_k)$. Connect these two points by line 3. In addition draw line 4 that is parallel to line 1, but shifted so that it goes through the point $(A/B^2, A_k/B_k)$. Where line 4 intersects line 2, draw a line straight up to line 3. At that intersection find the HJ zero-beta rate $(A - A_k)/(B - B_k)$.
8
Gibbons et al. (1989) and Roll and Ross (1994) discuss issues that are similar to those in Shanken (1985) and Kandel and Stambaugh (1995) but apply only to the case where a risk-free asset exists. We avoid this case here in large measure because we want to apply our evaluation approach, and in practice the unconditional variance of typical risk-free proxies is clearly positive so that no convenient empirical proxy can be used for the standard unconditional asset pricing tests.
9
Because Eq. (1) does not hold for model $k^{*}$, using Eqs. (6) and (7) to evaluate the model generally gives $KS(k) \neq KS(k^{*})$ since, from Eq. (19), the original factors determine a very different factor frontier, having typically both a different mean and additional idiosyncratic risk compared to the mimicking factors.
We simulate return data that: (a) are generated from a known model with random noise added; and (b) closely mimic the characteristics of the standard test assets (FF25 and FF55). To this end we obtain the first three principal components of a group of test assets and the associated weights (normalized to one) on the test assets. We then regress the test asset returns on the three principal components to generate each asset's sensitivities. The resulting sensitivities and the principal components themselves are, respectively, treated as the factor loadings and the factors determining the time series of the systematic component of the simulated test asset returns. Random noise is added to match the covariances of the simulated returns with those of the original test assets. Then we consider for two misspecified models which one explains the simulated returns better. The two models are different combinations of the "true" factors that determined the systematic part of the simulated returns, and "false" factors generated as random weights. The "better" model has more weight on the true factors than the "worse" model and, accordingly, the better model has less weight than the worse model on the false factors (the latter are separately drawn for each model). We then calculate the various evaluation measures for each model. We repeat this 5000 times and for each evaluation measure record the fraction of times that the worse model beats the better model.
Specifically, we perform the following steps: (1) Take the first $n_k$ principal components from the time series of $n$ test assets as the factors $f$ which have weights $w$ (an $n \times n_k$ matrix with the columns normalized to add to one) on the test assets that generate the principal components. (2) Obtain the factor loadings $\beta$, an $n_k \times n$ matrix; and the error covariance matrix $\Sigma$, an $n \times n$ matrix, as the return variance not explained by the first three principal components. (3) Perform the Cholesky decomposition (for a singular matrix) to generate $P$, an $n \times n$ matrix, from $\Sigma = P'P$. (4) Simulate the returns $r$, a $T \times n$ matrix, as $r = f\beta + eP$, in which $e$ is a $T \times n$ matrix of independently drawn N(0, 1) random errors ($T$ is the length of the time series). (5) Use $rw$ as the "true" model factors and $rv$ as the "false" model factors, where $v$ is an $n \times n_k$ matrix of random weights; each weight is a draw from a N(0, 1) distribution, with a normalization so that the weights sum to one. (6) Construct a "better" model as $k_b = \lambda_b rw + (1 - \lambda_b) rv$ and a "worse" model as $k_w = \lambda_w rw + (1 - \lambda_w) rv$, with $\lambda_b > \lambda_w$. (7) Given returns $r$ calculate the KS, HJ, and OLS RSQ measures for both models. (8) Repeat 5000 times and record the instances in which the worse model outperforms the better model for each evaluation measure.
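Steps (5) and (6) can be sketched in a few lines. Since $\lambda rw + (1 - \lambda) rv = r[\lambda w + (1 - \lambda)v]$, blending the factors is the same as blending the weight matrices, and the blended columns still sum to one. The sketch below assumes that the normalization in step (5) is applied column by column; the matrix `w` is a hypothetical stand-in for the principal-component weights, and the function names are illustrative only.

```python
import random


def normalized_random_weights(n, n_k, rng):
    """Draw an n x n_k matrix of N(0, 1) weights and rescale each column to sum to one
    (assumed reading of the normalization in step (5))."""
    columns = []
    for _ in range(n_k):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total = sum(draws)
        columns.append([d / total for d in draws])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n_k)] for i in range(n)]


def blend_weights(w, v, lam):
    """Weights of the model lam * (r w) + (1 - lam) * (r v), i.e. r (lam w + (1 - lam) v)."""
    return [[lam * w_ij + (1.0 - lam) * v_ij for w_ij, v_ij in zip(w_row, v_row)]
            for w_row, v_row in zip(w, v)]


rng = random.Random(0)

# Hypothetical "true" weights for n = 4 assets and n_k = 2 factors (columns sum to one).
w = [[0.25, 0.70],
     [0.25, 0.10],
     [0.25, 0.10],
     [0.25, 0.10]]
v = normalized_random_weights(4, 2, rng)   # "false" factor weights

better = blend_weights(w, v, 2.0 / 3.0)    # lambda_b = 2/3
worse = blend_weights(w, v, 1.0 / 3.0)     # lambda_w = 1/3

column_sums = [sum(row[j] for row in better) for j in range(2)]
print("column sums of the better model's weights:", column_sums)
```

In the paper's design the false weights are drawn separately for each model; the sketch reuses one draw of `v` for both models only to keep the example short.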
Panel A of Table 1 presents the "size" for each evaluation measure, that is, the fraction of trials in which the worse model outdoes the better model. We show this for all possible cases with weights $(\lambda_b, \lambda_w)$ on the true factors taken from {0, 1/3, 2/3, 1} such that $\lambda_b > \lambda_w$. Note that even for the true model, $\lambda_b = 1$, the factors contain measurement error. This is reasonable because idiosyncratic noise to the test assets must add noise to the factors which are portfolios of the assets. Panel A displays the size for the KS, HJ, and OLS RSQ measures individually (in bold) as well as for all permutations in which combinations of these measures are taken as the criterion. For combinations of the measures, the worse model is taken to outperform the better model only if it does so for each evaluation measure considered. The results are provided for $n_k = 3$, for both sets of test assets (FF25 and FF55), and given a time series length $T = 163$. The latter coincides with a quarterly frequency sample size used typically with asset pricing models that include macro factors. For the same cases, Panel B of Table 1 provides the "power", that is, the fraction of trials for which the better model outperforms the worse model. Here the better model is taken to outperform the worse model only if it does so for each evaluation measure considered. Note that for the single-measure criteria power is simply one minus size.
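The size and power definitions above can be made concrete with a small sketch. It assumes each trial records, per measure, whether the worse model beat the better model, and it ignores ties; the five-trial data set and the function name `size_and_power` are hypothetical.

```python
def size_and_power(worse_wins, measures):
    """Size: share of trials in which the worse model wins on every listed measure.
    Power: share of trials in which the better model wins on every listed measure.
    worse_wins maps a measure name to a list of booleans, one per trial."""
    trials = len(worse_wins[measures[0]])
    size = sum(all(worse_wins[m][t] for m in measures) for t in range(trials)) / trials
    power = sum(all(not worse_wins[m][t] for m in measures) for t in range(trials)) / trials
    return size, power


# Hypothetical outcomes of five trials (True means the worse model won on that measure).
outcomes = {
    "KS": [False, False, True, False, False],
    "HJ": [True, False, True, False, False],
}

ks_size, ks_power = size_and_power(outcomes, ["KS"])
joint_size, joint_power = size_and_power(outcomes, ["KS", "HJ"])
print("KS alone:", ks_size, ks_power)
print("KS and HJ jointly:", joint_size, joint_power)
```

For a single measure the two shares add to one. For a joint criterion they generally add to less than one, because trials in which the measures disagree count toward neither size nor power, which is why Panel B is not simply one minus Panel A for the combined criteria.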
The key results in Table 1 can be summarized as follows:
(1) Generally KS discriminates substantially better than OLS RSQ which in turn discriminates substantially better than HJ. In the case $\lambda_b = 1, \lambda_w = 0$ for the FF25, for instance, we have a size for KS, OLS RSQ, and HJ, respectively, of 9.2%, 13.1%, and 21.9%. While Kandel and Stambaugh (1995) argue that the KS measure is superior to the OLS RSQ, this argument relies on the use of the correct covariance matrix. In our simulations, as in applications with real-world data, the covariance matrix must be estimated. Moreover, we do not use as our criterion for a "better" model one that motivates KS, that is, yielding a higher mean return for mean-variance investors (or any of the equivalent interpretations), but simply the criterion that a "better" model has higher correlation with the actual data-generating mechanism. This criterion does not (obviously) favor any of the three evaluation measures. The lesser performance of the HJ-distance measure as an evaluation metric may be related to the results of Ahn and Gadarowski (2004), Kan and Zhou (2004) and Wang and Zhang (2003) who argue that HJ has size problems in finite samples.
10
(2) In one case, OLS RSQ outdoes KS ($\lambda_b = 1, \lambda_w = 2/3$ for the FF25); in another case HJ outdoes OLS RSQ ($\lambda_b = 1/3, \lambda_w = 0$ for the FF25). These are cases where the size is relatively large for each evaluation measure. They suggest, though, that the estimation of the covariance matrix and the discrepancy in the foundation for the evaluation measures relative to what constitutes a better model affect the evaluation measures in complex ways.
(3) The higher the discrepancy in the "degree of truth" between the models, the more reliable each measure. For instance, for the FF25 in the case $\lambda_b = 1, \lambda_w = 0$, size for KS, OLS RSQ, and HJ, respectively, is 9.2%, 13.1%, and 21.9% while for the closer case $\lambda_b = 2/3, \lambda_w = 1/3$ we have a much larger size for KS, OLS RSQ, and HJ, respectively, of 29.4%, 31.9%, and 35.1%. This result is not surprising and merely confirms that each of the evaluation measures is reasonable.
(4) The evaluation measures tend to be somewhat more reliable for the FF55 test assets than for the FF25 test assets, especially for the KS measure. In the case $\lambda_b = 1, \lambda_w = 0$, size for KS, OLS RSQ, and HJ, respectively, is 9.2%, 13.1%, and 21.9% for the FF25 and changes to 2.2%, 18.7%, and 35.3% for the FF55. Here the size improves for KS but worsens for the two other measures. From Lewellen et al. (forthcoming) we know that the FF55 test asset returns, having substantially less factor structure, are far more difficult to explain with random factors than the FF25 test asset returns. What we find here is that less factor structure does not mean that any reasonable evaluation measure will discriminate better between models based on the FF55. The tradeoff is that FF55 has more assets than FF25 which means there is more to explain, but less factor structure and more idiosyncratic noise, which means there is less to explain. Apparently the KS measure deals better with this tradeoff than the two other measures.
(5) For the FF55 test assets, a given discrepancy between the two models is picked up better for each measure as the models are closer to the actual data-generating process. For the FF25 test assets no clear pattern is found on this issue. Setting the model discrepancy equal with $\lambda_b - \lambda_w = 1/3$, the size increases in each case for each measure as $\lambda_b$ goes from 1 to 2/3 to 1/3.
(6) For combinations of the measures (a model beats another only if it does so based on two or all three of the evaluation measures), the result depends on whether the "better" model is true ($\lambda_b = 1$) or not. If $\lambda_b = 1$, considering KS and HJ jointly does not improve the size substantially but considering KS and OLS RSQ jointly does improve the size substantially, compared to KS as the single evaluation measure. In the empirically more likely case of $\lambda_b < 1$, however, considering KS and HJ jointly does improve the size more substantially than considering KS and OLS RSQ jointly.
10
The performance of the HJ measure, however, is definitely not related to the estimation method. We can obtain the HJ measure in two ways: conventionally, apply the SDF approach with GMM estimation, or from Eq. (18), Proposition 4, infer HJ from the KS ratio obtained via the OLS regression beta method. Both give numerically identical results. The differences between the OLS regression beta method and the SDF-GMM approach, discussed in Jagannathan and Wang (2002), are irrelevant here.
These simulations suggest tentatively that the KS measure out-
performs the HJ and OLS RSQ measures as a model selection crite-
rion. For various reasons this result must be interpreted cautiously.
First, our simulations involve simplifying assumptions such as nor-
mally distributed errors. Second, the results likely will vary
depending on the test assets and the sample period. Third, our cri-
terion is mean return errors rather than pricing errors which biases
against the HJ measure as Kan and Zhou (2004) point out. Fourth,
theoretical attributes of the measures are ignored. For instance,
the OLS RSQ measure depends on which portfolio of the test assets
is considered. But our simulations essentially consider only one
portfolio for each of the two groups of test assets and thus cannot
capture this drawback to the OLS RSQ. Similarly, the simulations cannot capture the advantage of the HJ measure stemming from the use of model-independent weights.$^{11,12}$

Table 1
Comparing models with different degrees of mis-specification.

Panel A: Size

| Test assets | $\lambda_b$ | $\lambda_w$ | KS | KS and OLSR2 | KS and HJ | ALL | OLSR2 | OLSR2 and HJ | HJ |
|---|---|---|---|---|---|---|---|---|---|
| FF25 | 1 | 2/3 | **0.339** | 0.204 | 0.322 | 0.198 | **0.304** | 0.226 | **0.411** |
| FF25 | 1 | 1/3 | **0.174** | 0.109 | 0.164 | 0.104 | **0.177** | 0.134 | **0.293** |
| FF25 | 1 | 0 | **0.092** | 0.054 | 0.089 | 0.052 | **0.131** | 0.083 | **0.219** |
| FF25 | 2/3 | 1/3 | **0.294** | 0.208 | 0.253 | 0.183 | **0.319** | 0.211 | **0.351** |
| FF25 | 2/3 | 0 | **0.185** | 0.132 | 0.164 | 0.122 | **0.285** | 0.17 | **0.293** |
| FF25 | 1/3 | 0 | **0.381** | 0.309 | 0.307 | 0.255 | **0.485** | 0.309 | **0.429** |
| FF55 | 1 | 2/3 | **0.129** | 0.073 | 0.112 | 0.065 | **0.248** | 0.135 | **0.449** |
| FF55 | 1 | 1/3 | **0.055** | 0.037 | 0.048 | 0.033 | **0.197** | 0.107 | **0.399** |
| FF55 | 1 | 0 | **0.022** | 0.017 | 0.019 | 0.016 | **0.187** | 0.094 | **0.353** |
| FF55 | 2/3 | 1/3 | **0.312** | 0.228 | 0.217 | 0.163 | **0.437** | 0.247 | **0.468** |
| FF55 | 2/3 | 0 | **0.212** | 0.155 | 0.131 | 0.098 | **0.422** | 0.188 | **0.415** |
| FF55 | 1/3 | 0 | **0.396** | 0.278 | 0.242 | 0.17 | **0.471** | 0.254 | **0.478** |

Panel B: Power

| Test assets | $\lambda_b$ | $\lambda_w$ | KS | KS and OLSR2 | KS and HJ | ALL | OLSR2 | OLSR2 and HJ | HJ |
|---|---|---|---|---|---|---|---|---|---|
| FF25 | 1 | 2/3 | **0.661** | 0.561 | 0.572 | 0.500 | **0.696** | 0.511 | **0.589** |
| FF25 | 1 | 1/3 | **0.826** | 0.758 | 0.697 | 0.659 | **0.823** | 0.664 | **0.707** |
| FF25 | 1 | 0 | **0.908** | 0.831 | 0.778 | 0.732 | **0.869** | 0.733 | **0.781** |
| FF25 | 2/3 | 1/3 | **0.706** | 0.595 | 0.608 | 0.525 | **0.681** | 0.541 | **0.649** |
| FF25 | 2/3 | 0 | **0.815** | 0.662 | 0.686 | 0.581 | **0.715** | 0.592 | **0.707** |
| FF25 | 1/3 | 0 | **0.619** | 0.443 | 0.497 | 0.375 | **0.515** | 0.395 | **0.571** |
| FF55 | 1 | 2/3 | **0.871** | 0.696 | 0.534 | 0.429 | **0.752** | 0.438 | **0.551** |
| FF55 | 1 | 1/3 | **0.945** | 0.785 | 0.594 | 0.508 | **0.803** | 0.511 | **0.601** |
| FF55 | 1 | 0 | **0.978** | 0.808 | 0.644 | 0.552 | **0.813** | 0.554 | **0.647** |
| FF55 | 2/3 | 1/3 | **0.688** | 0.479 | 0.437 | 0.312 | **0.563** | 0.342 | **0.532** |
| FF55 | 2/3 | 0 | **0.788** | 0.521 | 0.504 | 0.327 | **0.578** | 0.351 | **0.585** |
| FF55 | 1/3 | 0 | **0.604** | 0.411 | 0.368 | 0.259 | **0.529** | 0.305 | **0.522** |

The test assets are 25 assets sorted by size and book-to-market value (FF25) and these 25 assets plus 30 industry portfolios (FF55). Data are from 1964Q1 to 2004Q4. For both sets of test assets, we first obtain their first three principal components and the associated weights on the assets. We regress the test assets on the three principal components to generate each asset's sensitivities. The resulting sensitivities and three components are treated as the sensitivities and factors determining the 25 and 55 time series of asset returns. Random noise is added to match the covariances of the simulated returns with those of the test assets. We take the true model to be the three factors obtained by multiplying the portfolio weights on the original returns that yielded the three principal components by the simulated returns. We then compare models with varying weights on the "true" factors and the "false" factors (the latter being three factors generated from random draws of weight on the simulated returns, as in Table 1). We conduct 5000 trials, re-sampling for each trial the random factors and asset returns. The weights on the true factors $(\lambda_b, \lambda_w)$ are 1, 2/3, 1/3, 0 and the one-on-one model evaluation results for all combinations of these, such that $\lambda_b > \lambda_w$, are presented. The numbers in the table are the fraction of times that the better model (higher weight on the true factors) provides a worse statistic than the worse model (lower weight on the true factors). The fractions are provided for the KS, adjusted OLSR2, and HJ measures. Numbers in boldface (the single-measure columns) indicate the size for each criterion, that is, the fraction of times the worse model performs better than the better model. The remaining columns indicate an alternative size measure, the percentage of times the worse model outperforms the better model unanimously based on two criteria, and based on all three criteria (the latter listed under the column titled ALL). Panel A reports Size and Panel B reports Power (the fraction of times the better model outperforms the worse model based on each of the considered criteria).
6. Conclusion
For financial decisions, such as those regarding the appropriate cost-of-capital for capital-budgeting purposes or the choice of asset weights for portfolio management purposes, it is important to rely on the best available asset pricing model. A variety of plausible models exists, but what model is better is not at all clear and may depend on the purpose at hand. Several reasonable evaluation criteria can be employed to rank models. The most common are the Hansen and Jagannathan (1997) distance measure (HJ-distance) and the standard cross-sectional goodness-of-fit measure (OLS RSQ). Lewellen et al. (forthcoming) advocate the use of the GLS RSQ which we show to be similar to a criterion introduced by Kandel and Stambaugh (1995) (and which we refer to as the KS-ratio). The three criteria are equivalent when a model provides an exact description of the data. However, for realistic financial decision problems, it is reasonable to think of different models each having imperfect correlation with the true data-generating mechanism. From this decision-making perspective, we ask: What evaluation measure would be the most desirable?
We show in Propositions 1-3 and Figs. 1 and 2 that the KS criterion: can be amended to apply to any multi-factor asset pricing model; is easy to compute from the parameters of the asset and factor frontiers; can easily be understood visually; and, extending Kandel and Stambaugh (1995), has various equivalent interpretations related to: (1) Shanken's (1985) weighted squared mean return errors, (2) the distance of the model's closest factor portfolio to the asset frontier, and (3) the mean portfolio return obtainable by applying the model, for any chosen variance. The third interpretation suggests that the KS criterion is, in principle, well suited for selecting the best model for portfolio management purposes. The first interpretation suggests that KS may also be appropriate for selecting a model for capital-budgeting purposes since it provides the best required return estimate based on putting more weight on more reliable observations.
The (adjusted) OLS RSQ and the HJ-distance may be less fitting evaluation measures for these purposes. OLS RSQ can be interpreted as providing a model's unweighted squared mean return errors and HJ focuses on pricing errors rather than mean return errors. Pricing errors appear to be less relevant for purposes of portfolio management and capital-budgeting, where returns are the relevant units of account, but they may be more relevant for judging possible arbitrage opportunities, for which pricing differences should be a better gauge. These two measures are further generally different from the KS-ratio when the model is not exactly true, as was emphasized for OLS RSQ by Roll and Ross (1994) and Kandel and Stambaugh (1995), and as follows for the HJ-distance from our Proposition 3 and as argued by Kan and Zhou (2004).
The KS criterion may be reasonable for the theoretical reasons discussed, but it has the drawback of requiring weights that are model-dependent. Thus the question of which evaluation measure to adopt may boil down to empirical performance. The performance depends, among other factors, on the nature of the model mis-specifications, and on how well the weighting matrix is estimated. We generate returns from the principal components of the test assets (for both FF25 and FF55) and add noise that preserves the covariances between these test asset returns. The weights of the principal components on the test assets are now the known true factor weights and we consider models that have different correlations with these true factor weights. We find that KS generally outperforms the other two measures in terms of selecting the better model, that is, the model with the highest weight on the true factors. The performance is better for the FF55 test assets case compared to the FF25 test assets and when the difference between the models is more pronounced. Using the consensus of the KS measure together with OLS RSQ substantially reduces the chance of selecting the worse model when the better model is true, whereas using the consensus of the KS measure together with HJ works better when the better model is not true.
Acknowledgements
The authors thank Strat Douglas, Alexei Egorov, Raymond Kan,
Ravi Shukla, Raja Velu, Zhi-Gan Wang, and especially the anony-
mous referee, as well as seminar participants at Syracuse Univer-
sity, Fordham University, UNC-Greensboro, California State
University at Fullerton, the 2005 Financial Management Associa-
tion annual meetings, and the 2006 Midwest Finance Association
annual meetings, for valuable comments. The usual disclaimer
applies.
Appendix A
Proof of Proposition 1. The first equality in (10) follows from (9). Substitute (4) and (5) into (3), use $\sigma_g^2 = 1/C$, and take the first-order condition to obtain

$$\mu = \frac{B_k}{C_k} + \frac{(D_k/C_k)(1/C_k - 1/C)}{B_k/C_k - B/C}. \tag{A1}$$

Substitute the minimizing $\mu$ from (A1) into (3) to produce Eq. (7). □
Proof of Proposition 2. For the MSE(k) criterion in (12) subject to (2), the first-order conditions for $a_k, b_k$ are:

$$\mathbf{1}' V^{-1} [\mu - \hat{\mu}(k)] = 0, \tag{A2}$$

$$S_k' [\mu - \hat{\mu}(k)] = 0. \tag{A3}$$

Hence, we find

$$b_k = \mu_k - \mathbf{1}_k a_k, \tag{A4}$$

$$a_k = (B - B_k)/(C - C_k), \tag{A5}$$

where (A4) follows from (A3) and (2) given that $S_k' \mu = \mu_k$ and (A5) uses (A2) and (2) plus the definitions in Eq. (6). Eqs. (2), (A4) and (A5) together imply

$$\hat{\mu}(k) = \mu_0 \mathbf{1} + V S_k (S_k' V S_k)^{-1} (\mu_k - \mu_0 \mathbf{1}_k), \tag{A6}$$

where $\mu_0 = \mu_0(k) = a_k$. It is straightforward to verify using (2), (A4) and (A5) that

$$\hat{\mu}(k)' V^{-1} [\mu - \hat{\mu}(k)] = 0. \tag{A7}$$

Thus, expanding (12) and using (A7) implies:

$$\mathrm{MSE}(k) = \mu' V^{-1} \mu - \hat{\mu}(k)' V^{-1} \hat{\mu}(k), \tag{A8}$$
11
To check the effect of the variability of the factors, we conducted the simulation
with the false factors set to be twice as noisy. While the absolute performance of the
evaluation measures was sometimes greatly affected, there was little effect on the
relative performance of the three measures. The results are available from the
authors.
12
We applied the measures to evaluate a set of asset pricing models as in Hodrick
and Zhang (2001) but using all three evaluation measures. The models are those of
Chen et al. (1986), Fama and French (1992), Fama and French (1996), Balvers and
Huang (2007), Campbell and Cochrane (1999), Cochrane (1996), Lettau and Ludvigson
(2001a), Lettau and Ludvigson (2001b), Sharpe (1964), Lintner (1965), Black (1972)
and Breeden (1979). The model rankings by each of the evaluation measures, with
and without corrections for the number of factors, are available from the authors.
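where

$$\hat{\mu}(k)' V^{-1} \hat{\mu}(k) = \left( \frac{(B_k/C_k - B/C)^2}{1/C_k - 1/C} + \frac{D_k}{C_k} \right) + \frac{B^2}{C}. \tag{A9}$$

(A9) follows from (A6) plus

$$\frac{(B - B_k)^2}{C - C_k} = \frac{(B_k/C_k - B/C)^2}{1/C_k - 1/C} - \frac{B_k^2}{C_k} + \frac{B^2}{C}.$$

The third equality of Eq. (15) follows by relating (A7) and (A8) to (7) using (6).
The standard cross-sectional GLS RSQ is defined following Kandel and Stambaugh (1995) as $\mathrm{RSQ}(k) = 1 - \{\mathrm{MSE}(k) / [\mu - \mathbf{1}\bar{\mu}(k)]' V^{-1} [\mu - \mathbf{1}\bar{\mu}(k)]\}$, where $\bar{\mu}(k)$ is the GLS estimate of the constant which becomes $\bar{\mu}(k) = B/C$ here. Thus, it follows from (A8), (A9) and (6) that:

$$\mathrm{RSQ}(k) = \frac{C}{D} \left( \frac{(B_k/C_k - B/C)^2}{1/C_k - 1/C} + \frac{D_k}{C_k} \right). \tag{A10}$$

Comparison to (7) shows the second equality in (15) that $\mathrm{RSQ}(k) = KS(k)$.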
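As a numerical illustration of (A10), the sketch below compares the closed form with a GLS R-squared computed directly from the fitted means. It assumes a single factor portfolio with weights `s` summing to one, takes Eq. (2) to have the form $\hat{\mu}(k) = \mathbf{1} a_k + \beta_k b_k$ with $\beta_k = V S_k (S_k' V S_k)^{-1}$ as implied by (A6), and uses hypothetical moments; all function names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [x / pivot for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def frontier_parameters(V, mu):
    """Return A, B, C, D of the asset frontier, with D = A*C - B^2."""
    ones = [1.0] * len(mu)
    vinv_mu = solve(V, mu)
    A, B = dot(mu, vinv_mu), dot(ones, vinv_mu)
    C = dot(ones, solve(V, ones))
    return A, B, C, A * C - B * B


def ks_ratio(A, B, C, A_k, B_k, C_k):
    """Closed form (A10) from the asset and factor frontier parameters."""
    D = A * C - B * B
    D_k = A_k * C_k - B_k * B_k
    gap = B_k / C_k - B / C
    return (C / D) * (gap ** 2 / (1.0 / C_k - 1.0 / C) + D_k / C_k)


def gls_rsq_both_ways(V, mu, s):
    """Return (direct GLS R-squared, closed-form value) for one factor portfolio s."""
    A, B, C, D = frontier_parameters(V, mu)
    Vs = [dot(row, s) for row in V]
    V_k, mu_k = dot(s, Vs), dot(s, mu)
    A_k, B_k, C_k = mu_k ** 2 / V_k, mu_k / V_k, 1.0 / V_k
    a_k = (B - B_k) / (C - C_k)                      # (A5)
    b_k = mu_k - a_k                                 # (A4)
    mu_hat = [a_k + (x / V_k) * b_k for x in Vs]     # fitted means, beta = V s / V_k
    err = [m - mh for m, mh in zip(mu, mu_hat)]
    mse = dot(err, solve(V, err))                    # inverse-covariance-weighted MSE(k)
    return 1.0 - mse / (D / C), ks_ratio(A, B, C, A_k, B_k, C_k)


# Hypothetical moments for three test assets and one factor portfolio.
V = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
mu = [0.02, 0.03, 0.05]
s = [0.5, 0.3, 0.2]

rsq_direct, rsq_closed_form = gls_rsq_both_ways(V, mu, s)
print("direct GLS RSQ:", rsq_direct)
print("closed form (A10):", rsq_closed_form)
```

With a single factor portfolio $D_k = 0$, so only the first term inside the parentheses of (A10) contributes.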
To confirm footnote 2, consider the Fama-MacBeth evaluation approach which prescribes calculating the cross-sectional R-square for each period and averaging. This implies for inverse-covariance-matrix-weighted errors:

$$\mathrm{MSE}^{FM}(k) = \min_{a_k, b_k} E\{[r - \hat{\mu}(k)]' V^{-1} [r - \hat{\mu}(k)]\}. \tag{A11}$$

The resulting first-order conditions and parameter equations are the same as for MSE(k). Plugging these parameters back into (A11) implies

$$\mathrm{MSE}^{FM}(k) = E(r' V^{-1} r) - \hat{\mu}(k)' V^{-1} \hat{\mu}(k). \tag{A12}$$

Since $\mathrm{RSQ}^{FM}(k) = 1 - \{\mathrm{MSE}^{FM}(k) / E[(r - \mathbf{1}\bar{\mu})' V^{-1} (r - \mathbf{1}\bar{\mu})]\}$ with $\bar{\mu}$ the average return of the assets, it is straightforward that $\mathrm{RSQ}^{FM}(k) = 1 - \{\mathrm{MSE}^{FM}(k) / [E(r' V^{-1} r) - \bar{\mu}^2 C]\}$, which is given in footnote 2 (note that $\bar{\mu} = B/C$ if estimated as the constant in a GLS regression). As $E(r' V^{-1} r) > \mu' V^{-1} \mu$ we have $\mathrm{MSE}(k) < \mathrm{MSE}^{FM}(k)$ and $\mathrm{RSQ}^{FM}(k) < \mathrm{RSQ}(k)$.
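The size of the gap between the two mean-squared errors can be made explicit. The short derivation below is a sketch that uses only (A8), (A12), and the definition $V = E[(r - \mu)(r - \mu)']$ for $n$ test assets, so that $E(rr') = V + \mu\mu'$:

```latex
\begin{aligned}
E(r' V^{-1} r) &= E\left[\operatorname{tr}\left(V^{-1} r r'\right)\right]
  = \operatorname{tr}\left[V^{-1}\left(V + \mu\mu'\right)\right]
  = n + \mu' V^{-1} \mu, \\
\mathrm{MSE}^{FM}(k) - \mathrm{MSE}(k)
  &= E(r' V^{-1} r) - \mu' V^{-1} \mu = n .
\end{aligned}
```

The difference equals the number of test assets and does not depend on the model $k$, which is why the two mean-squared error criteria rank models identically.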
For MIM(k) defined in (15) subject to (2), consider the portfolio choices implied by model $k$ for a given choice of variance. The efficient portfolio decisions $s[\hat{\mu}(k; a_k, b_k)]$ based on the mean returns implied by model $k$ for the standard mean-variance case are derived from:

$$\max_{s} \; s' \hat{\mu}(k; a_k, b_k), \quad \text{subject to: } s' V s = \sigma^2, \; s' \mathbf{1} = 1, \tag{A13}$$

which yields the first-order conditions

$$\hat{\mu}(k; a_k, b_k) = \lambda V s + \kappa \mathbf{1}, \tag{A14}$$

with $\lambda, \kappa$ the Lagrangian multipliers for the first and second constraints, respectively, in (A13). Note that $\lambda > 0$. To solve for the multipliers substitute the efficient portfolio choices $s$ from (A14) into the two constraints in (A13). This yields:

$$\lambda^2 \sigma^2 = \hat{\mu}' V^{-1} \hat{\mu} - 2\kappa \mathbf{1}' V^{-1} \hat{\mu} + \kappa^2 C, \tag{A15}$$

$$\lambda = \mathbf{1}' V^{-1} \hat{\mu} - \kappa C. \tag{A16}$$

Substituting out $\kappa$ leaves:

$$\lambda^2 = \frac{\hat{\mu}' V^{-1} \hat{\mu} - (\mathbf{1}' V^{-1} \hat{\mu})^2 / C}{\sigma^2 - 1/C}. \tag{A17}$$
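From (A14) and (A17):

$$s[\hat{\mu}(k; a_k, b_k)] = \frac{V^{-1} \hat{\mu} - V^{-1} \mathbf{1} \mathbf{1}' V^{-1} \hat{\mu} / C}{\lambda} + \frac{V^{-1} \mathbf{1}}{C}. \tag{A18}$$

(A18) and (A17) give:

$$\frac{\mu' s[\hat{\mu}(k)] - B/C}{(\sigma^2 - 1/C)^{1/2}} = \frac{\hat{\mu}' V^{-1} \mu - (\mathbf{1}' V^{-1} \hat{\mu}) B/C}{[\hat{\mu}' V^{-1} \hat{\mu} - (\mathbf{1}' V^{-1} \hat{\mu})^2 (1/C)]^{1/2}}. \tag{A19}$$

Use (2) to find the first-order conditions for $a_k, b_k$. The constant $a_k$ is irrelevant and

$$b_k = \mu_k - \frac{B - B_k}{C - C_k} \mathbf{1}_k, \tag{A20}$$

assures that the numerator of (A19) is equal to the square of its denominator and that the first-order condition holds. Substituting (A20) into (A19) given (2) produces

$$\left( \frac{\mu' s[\hat{\mu}(k)] - B/C}{(\sigma^2 - 1/C)^{1/2}} \right)^2 = \left[ (C_k/C) B - B_k \right] \frac{B - B_k}{C - C_k} + A_k - B_k B/C. \tag{A21}$$

The right-hand side, with some straightforward algebra, is equal to $KS(k)/(C/D)$. This proves the final equality in (15). □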
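Proof of Proposition 3. The (square of the) HJ-distance given in (16) relates to the pricing errors based on a linear SDF:

$$g(k) = E[r(\theta_k - r' S_k \delta_k)] - \mathbf{1} = \mu \gamma_k - V S_k \delta_k - \mathbf{1} \quad \text{and} \quad \gamma_k = \theta_k - \mu_k' \delta_k, \tag{A22}$$

which yields Eq. (17). Use a standard matrix inversion expression:

$$(V + \mu\mu')^{-1} = V^{-1} - V^{-1} \mu (1 + A)^{-1} \mu' V^{-1}. \tag{A23}$$

Substitute (A22) and (A23) into (16) and obtain the first-order conditions for $\gamma_k, \delta_k$:

$$\mu_k' \delta_k + B = \gamma_k A, \tag{A24}$$

$$\mu_k \gamma_k - V_k \delta_k - \mathbf{1}_k = \mu_k \frac{\gamma_k A - \mu_k' \delta_k - B}{1 + A},$$

with, from (A24), the right-hand-side term equal to zero:

$$\mu_k \gamma_k - V_k \delta_k - \mathbf{1}_k = 0. \tag{A25}$$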
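The mean returns for model $k$ follow from (A22) and $\hat{g}(k) = 0$:

$$\hat{\mu}(k) = \mathbf{1}/\gamma_k + V S_k \delta_k / \gamma_k. \tag{A26}$$

Note that, by (A24) and (A26), (A25) is equivalent to

$$\mu' V^{-1} \mu = \mu' V^{-1} \hat{\mu}(k). \tag{A27}$$

We now obtain from (A25) that

$$\hat{\mu}(k) = \mathbf{1}/\gamma_k - V S_k V_k^{-1} (\mathbf{1}_k/\gamma_k - \mu_k), \qquad V_k = S_k' V S_k. \tag{A28}$$

Combining (A24) and (A25) and using the definitions of $B_k, A_k$ yields

$$\gamma_k = \frac{B - B_k}{A - A_k}. \tag{A29}$$

Rewrite Eq. (16) as

$$HJ(k) = \gamma_k^2 [\mu - \hat{\mu}(k)]' [V^{-1} - V^{-1} \mu\mu' V^{-1} / (1 + A)] [\mu - \hat{\mu}(k)], \tag{A30}$$

and employ (A27),

$$HJ(k) = -\gamma_k^2 \hat{\mu}(k)' V^{-1} [\mu - \hat{\mu}(k)]. \tag{A31}$$

Using (A26), (A25) and (A24) we have

$$\hat{\mu}(k)' V^{-1} \hat{\mu}(k) = (C - C_k)/\gamma_k^2 + A_k. \tag{A32}$$

Now (A31) becomes, using (A32), (A27) and (A29):

$$HJ(k) = (C - C_k) - (B - B_k)^2 / (A - A_k). \tag{A33}$$

Straightforward comparison with Eq. (7) implies Eq. (18) in the text. □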
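The closed form (A33) can be checked against the quadratic form it is derived from. The sketch below assumes that the squared HJ-distance in Eq. (16) is $g(k)' [E(rr')]^{-1} g(k)$ with $E(rr') = V + \mu\mu'$, which is the matrix inverted in (A23), and it treats a single factor portfolio with hypothetical moments; the function names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [x / pivot for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def hj_closed_form(A, B, C, A_k, B_k, C_k):
    """Squared HJ-distance from (A33)."""
    return (C - C_k) - (B - B_k) ** 2 / (A - A_k)


def hj_both_ways(V, mu, s):
    """Return (quadratic-form value, closed-form value, HJ zero-beta rate 1/gamma_k)
    for a single factor portfolio with weights s."""
    n = len(mu)
    ones = [1.0] * n
    vinv_mu = solve(V, mu)
    A, B = dot(mu, vinv_mu), dot(ones, vinv_mu)
    C = dot(ones, solve(V, ones))
    Vs = [dot(row, s) for row in V]
    V_k, mu_k = dot(s, Vs), dot(s, mu)
    A_k, B_k, C_k = mu_k ** 2 / V_k, mu_k / V_k, 1.0 / V_k
    gamma = (B - B_k) / (A - A_k)                    # (A29)
    delta = (mu_k * gamma - 1.0) / V_k               # (A25) in the scalar case
    theta = gamma + mu_k * delta                     # from (A22)
    second_moment = [[V[i][j] + mu[i] * mu[j] for j in range(n)] for i in range(n)]
    m2s = [dot(row, s) for row in second_moment]
    # Pricing errors g = E[r (theta - r's delta)] - 1, element by element.
    g = [mu[i] * theta - m2s[i] * delta - 1.0 for i in range(n)]
    direct = dot(g, solve(second_moment, g))
    return direct, hj_closed_form(A, B, C, A_k, B_k, C_k), 1.0 / gamma


# Hypothetical moments for three test assets and one factor portfolio.
V = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
mu = [0.02, 0.03, 0.05]
s = [0.5, 0.3, 0.2]

hj_direct, hj_formula, zero_beta = hj_both_ways(V, mu, s)
print("quadratic form:", hj_direct)
print("closed form (A33):", hj_formula)
print("HJ zero-beta rate:", zero_beta)
```

The zero-beta rate returned is the constant $1/\gamma_k = (A - A_k)/(B - B_k)$ in (A26), the same quantity located graphically in footnote 7.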
References
Ahn, S.C., Gadarowski, C., 2004. Small sample properties of the GMM specication
test based on the HansenJagannathan distance. Journal of Empirical Finance
11, 109132.
Balvers, R.J., Huang, D., 2007. Productivity-based asset-pricing: Theory and
evidence. Journal of Financial Economics 86, 405445.
Black, F., 1972. Capital market equilibrium with restricted borrowing. Journal of
Business 45, 444454.
Breeden, D.T., 1979. An intertemporal asset-pricing model with stochastic
consumption and investment opportunities. Journal of Financial Economics 7,
265296.
Campbell, J.Y., Cochrane, J.H., 1999. By force of habit: A consumption-based
explanation of aggregate stock market behavior. Journal of Political Economy
107, 205251.
Chen, N.F., Roll, R., Ross, S.A., 1986. Economic forces and the stock market. Journal of
Business 59, 383403.
Cochrane, J.H., 1996. A cross-sectional test of an investment-based asset-pricing
model. Journal of Political Economy 104, 572621.
Cochrane, J.H., 2001. Asset-Pricing. Princeton University Press, New Jersey.
Eling, M., Schuhmacher, F., 2007. Does the choice of performance measure
influence the evaluation of hedge funds? Journal of Banking and Finance 31,
2632–2647.
Fama, E.F., MacBeth, J.D., 1973. Risk, return, and equilibrium: Empirical tests. Journal
of Political Economy 81, 607–636.
Fama, E.F., French, K.R., 1992. The cross-section of expected stock returns. Journal of
Finance 47, 427–465.
Fama, E.F., French, K.R., 1996. Multifactor explanations of asset-pricing anomalies.
Journal of Finance 51, 55–84.
Farinelli, S., Ferreira, M., Rossello, D., Thoeny, M., Tibiletti, L., 2008. Beyond Sharpe
ratio: Optimal asset allocation using different performance ratios. Journal of
Banking and Finance 32, 2057–2063.
Gibbons, M.R., Ross, S.A., Shanken, J., 1989. A test of the efficiency of a given
portfolio. Econometrica 57, 1121–1152.
Grauer, R.R., Janmaat, J.A., 2009. On the power of cross-sectional and multivariate
tests of the CAPM. Journal of Banking and Finance 33, 775–787.
Grinblatt, M., Titman, S., 1987. The relation between mean-variance efficiency and
arbitrage pricing. Journal of Business 60, 97–112.
Hansen, L.P., Jagannathan, R., 1997. Assessing specification errors in stochastic
discount factor models. Journal of Finance 52, 557–590.
Hodrick, R., Zhang, X., 2001. Evaluating the specification errors of asset-pricing
models. Journal of Financial Economics 62, 327–376.
Huberman, G., Kandel, S., Stambaugh, R.F., 1987. Mimicking portfolios and exact
arbitrage pricing. Journal of Finance 42, 1–9.
Jagannathan, R., Wang, Z., 1996. The conditional CAPM and the cross-section of
expected returns. Journal of Finance 51, 3–53.
Jagannathan, R., Wang, Z., 2002. Empirical evaluation of asset-pricing models: A
comparison of the SDF and beta methods. Journal of Finance 57, 2337–2367.
Kan, R., Zhou, G., 2004. Hansen–Jagannathan distance: Geometry and exact
distribution. Working Paper, University of Toronto.
Kandel, S., Stambaugh, R.F., 1995. Portfolio inefficiency and the cross-section of
expected returns. Journal of Finance 50, 157–184.
Lettau, M., Ludvigson, S.C., 2001a. Consumption, aggregate wealth, and expected
stock returns. Journal of Finance 56, 815–849.
Lettau, M., Ludvigson, S.C., 2001b. Resurrecting the (C)CAPM: A cross-sectional test
when risk premia are time-varying. Journal of Political Economy 109, 1238–1287.
Lewellen, J., Nagel, S., Shanken, J., forthcoming. A skeptical appraisal of asset-pricing
tests. Journal of Financial Economics.
Lintner, J., 1965. The valuation of risk assets and the selection of risky investments
in stock portfolios and capital budgets. Review of Economics and Statistics 47,
13–37.
Roll, R., 1977. A critique of the asset-pricing theory tests part I. Journal of Financial
Economics 4, 129–176.
Roll, R., 1985. A note on the geometry of Shanken's CSR $T^2$ test for mean variance
efficiency. Journal of Financial Economics 14, 349–358.
Roll, R., Ross, S.A., 1994. On the cross-sectional relation between expected returns
and betas. Journal of Finance 49, 101–121.
Shanken, J., 1985. Multivariate tests of the zero-beta CAPM. Journal of Financial
Economics 14, 327–348.
Sharpe, W.F., 1964. Capital asset prices: A theory of market equilibrium under
conditions of risk. Journal of Finance 19, 425–442.
Skoulakis, G., 2005. Assessment of asset-pricing models using cross-sectional
regressions. Working Paper, Northwestern University.
Wang, Z., Zhang, X., 2003. Arbitrage and the empirical evaluation of asset-pricing
models. Working Paper, Cornell University.
Zakamouline, V., Koekebakker, S., in press. Portfolio performance evaluation with
generalized Sharpe ratios: Beyond the mean and variance. Journal of Banking
and Finance.