CMER Working Paper Series

Working Paper No. 97-16

The Mirage of Portfolio Performance Evaluation

Naim Sipra
Opposite Sector ‘U’, DHA, Lahore Cantt. 54792, Lahore, Pakistan. E-mail: Sipra@lums.edu.pk

November, 1997

CENTRE FOR MANAGEMENT AND ECONOMIC RESEARCH Lahore University of Management Sciences
Opp. Sector ‘U’, DHA, Lahore Cantt. 54792, Lahore, Pakistan Tel.: 92-42-5722670-79, x4222, 4201 Fax: 92-42-5722591 Website: www.lums.edu.pk/cmer

The Mirage of Portfolio Performance Evaluation
You can’t tell which way the train went by looking at the tracks

ABSTRACT
Risk-adjusted performance measures are essential for evaluating portfolio performance. They are needed not only to reward or punish portfolio managers but also to conduct financial research, where it is necessary to differentiate between strong and weak performance. This paper investigates the practical limitations of finding appropriate performance measures by looking at the inability of three popular portfolio measures (Sharpe, Jensen, and Treynor) to evaluate portfolio performance in a meaningful way. The paper makes the point that these problems are not restricted to the three measures examined but are likely to exist with any other measure of performance, since they stem from the random walk properties of stock prices.


INTRODUCTION

Portfolio performance evaluation is one of the most important areas in investment analysis. In this paper we will show that all the performance measures developed so far fail to perform the task of evaluating a portfolio's performance in any meaningful way.

The problem with portfolio performance evaluation is as follows. The observed return from a portfolio is a function of the risk-free rate, risk premium, managerial skill, and chance. However, the risk premium is for ex-ante risk, a non-observable quantity, regarding which there is no consensus on how to even define it and which, thus, cannot be estimated in an unambiguous way by observing the outcome. Given this difficulty with the definition and measurement of risk, it is impossible to separate managerial skill from chance.

The already near impossible problem of evaluating portfolio performance is compounded by our penchant for ranking. We like to know who is number one. So we devise performance measures that can provide us with this ranking. The trouble is that in order to do so we have to make heroic assumptions, like unlimited short sales and unlimited borrowing at the risk-free rate, without which it is not possible to find a portfolio that is best for everyone. These assumptions may be all right for modeling, but certainly not for determining fiscal rewards and punishments. These difficulties notwithstanding, we feel that the hope of finding an effective performance measure is unlikely to be fulfilled; the available performance measures are probably the best example of "The Emperor has no clothes" that can be found in the finance literature.

Despite these theoretical difficulties, one may justify using a particular performance measure if it can be shown that, when this measure is used as a decision-making tool for future investments, one gets better results than with any other investment strategy. Unfortunately, what we mean by better results is problematic.[1] Nonetheless, at a minimum, we could consider a performance measure as useful if we used it as an investment selector and it gave superior performance according to its own criterion, as that is what we are after: for example, the Sharpe measure should perform well at least according to the Sharpe measure. However, the hope of finding a measure that, by evaluating past performance, can guide us to superior performance in the future is doomed to failure, given that it flies in the face of the weak form of market efficiency.

Since the problems with using returns alone for portfolio performance are well known, we will concentrate on risk-adjusted performance measures. As representatives of this class of measures we will look at the Sharpe (1966), Jensen (1968), and Treynor (1965) measures of portfolio performance.[2] These are the classic risk-adjusted portfolio performance measures that every textbook on investments discusses in its performance evaluation chapters. We hope it will be readily obvious that our criticism is not restricted to these three measures but applies to the entire class of risk-adjusted performance measures.

TEST OF PERFORMANCE MEASURES AS PERFORMANCE MEASURES

To make the first point, that it is virtually impossible to separate superior performance from the other elements of a portfolio's return, consider the simulated returns earned by twenty portfolios over twenty periods given in Table 1. For portfolios one through ten, the return for each period was determined by generating a random number between zero and one and assigning a return of +20% if the number was larger than 0.5 and -20% if the number was less than or equal to 0.5, so that the expected return was zero. For portfolios 11 through 15 the returns were +22% and -18%, so in either case they performed 2% better than the simply random portfolios and had an expected return of +2 percent. Similarly, for portfolios 16 through 20 the returns were +18% and -22%; that is, they performed 2% worse than the random portfolios and had an expected return of -2 percent. The average of these portfolios was taken as the market portfolio, and a risk-free rate of 0.5 percent was used to calculate the Sharpe, Jensen, and Treynor measures.

Column two of Table 1 records the average returns obtained over the twenty periods for each portfolio, and column three records the expected returns. Columns four, five, and six record the rankings according to the Sharpe, Jensen, and Treynor indices, respectively. For an interpretation of these results, let us consider the rankings given by the Sharpe measure. The top ten funds ranked by the Sharpe measure, given in Table 2 below, include five funds (#1, #2, #4, #5, #7) that had an expected value of zero and two funds (#16, #18) that had an expected value of -2 percent, whereas from amongst the superior performers only funds #12, #13, and #14 show up in the top ten. The rankings given by the other measures similarly illustrate a complete lack of any differentiation between the superior, inferior, and random performers.[3]
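The simulation just described is easy to reproduce. The following sketch is not the paper's original code (the seed, function names, and output format are our own), but it generates the twenty coin-flip portfolios as specified above and ranks them by the Sharpe measure:

```python
import random
import statistics

random.seed(1)  # arbitrary seed; the point is that rankings are driven by chance

def simulate(up, down, periods=20):
    """One coin-flip portfolio: return `up` on heads, `down` on tails."""
    return [up if random.random() > 0.5 else down for _ in range(periods)]

# Portfolios 1-10: +/-20% (E = 0); 11-15: +22%/-18% (E = +2%); 16-20: +18%/-22% (E = -2%).
portfolios = ([simulate(0.20, -0.20) for _ in range(10)] +
              [simulate(0.22, -0.18) for _ in range(5)] +
              [simulate(0.18, -0.22) for _ in range(5)])

rf = 0.005  # per-period risk-free rate of 0.5 percent, as in the text

def sharpe(returns):
    return (statistics.mean(returns) - rf) / statistics.stdev(returns)

# Rank 1 = highest Sharpe ratio; the superior funds (11-15) rarely dominate the top ten.
order = sorted(range(20), key=lambda i: sharpe(portfolios[i]), reverse=True)
print("Sharpe ranking (best to worst):", [i + 1 for i in order])
```

Running this with different seeds shows the paper's point directly: which portfolios land in the top ten varies from run to run, driven mostly by chance.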

Table 1
Performance rankings for randomly generated portfolios

Portfolio   Expected Return   Sharpe Ranking   Jensen Ranking   Treynor Ranking
 1           0.00              10               10               16
 2           0.00               4                4               17
 3           0.00              18               18               10
 4           0.00               5                5               20
 5           0.00               1                6                4
 6           0.00              15               15               18
 7           0.00               6                1                5
 8           0.00              12               20               15
 9           0.00              13               12                1
10           0.00              20               13               12
11           0.02              14                9               13
12           0.02               3               14                3
13           0.02               8                8                8
14           0.02               9                3               14
15           0.02              11                2               19
16          -0.02               2               11                6
17          -0.02              19                7               11
18          -0.02               7               19                2
19          -0.02              16               16                9
20          -0.02              17               17                7

[The geometric mean return column of the original table is illegible in this copy and is omitted.]

Table 2
Performance of Performance Measures

Performance of Sharpe Measure
Sharpe ranking   Fund   E(return) %
 1                 5      0
 2                16     -2
 3                12      2
 4                 2      0
 5                 4      0
 6                 7      0
 7                18     -2
 8                13      2
 9                14      2
10                 1      0

Performance of Jensen Measure
Jensen ranking   Fund   E(return) %
 1                 7      0
 2                15      2
 3                14      2
 4                 2      0
 5                 4      0
 6                 5      0
 7                17     -2
 8                13      2
 9                11      2
10                 1      0

Performance of Treynor Measure
Treynor ranking  Fund   E(return) %
 1                 9      0
 2                18     -2
 3                12      2
 4                 5      0
 5                 7      0
 6                16     -2
 7                20     -2
 8                13      2
 9                19     -2
10                 3      0

TEST OF PERFORMANCE MEASURES AS INVESTMENT SELECTORS

Data sets

To determine whether the Sharpe, Jensen, and Treynor measures made good investment selectors, we looked at three sets of data. The first data set consisted of monthly returns for 231 mutual funds, the S&P 500, and treasury bills for the period January 1967 to December 1992. One may argue that within mutual funds there is a lot of changing of securities over time, so the calculation of betas and standard deviations will not meet the stationarity criteria necessary for the calculation of the Sharpe, Jensen, and Treynor measures, and consequently, if one shows that these measures do not make good investment selectors, the results may be biased.[4] To counter this possible criticism, we also looked at monthly returns for 69 securities, as well as 69 portfolios created out of these securities by randomly selecting 10 securities and weighting them equally, together with the S&P 500 and treasury bills for the period January 1972 to December 1989.

Analysis

For each data set, monthly returns over a five-year period were used to calculate the Sharpe, Jensen, and Treynor indices for each portfolio. Each portfolio was then ranked according to the three performance criteria. For each rank of the three performance measures, the holding period returns for the next year were recorded. Next, the first twelve values were dropped and monthly returns for the following sixty months were used to determine each portfolio's rank according to the Sharpe, Jensen, and Treynor measures. The following year's holding period returns were then recorded as before. By repeating this process, a series of returns corresponding to each performance rank was obtained for each of the three performance measures.

This is illustrated in Table 3 for just the top five funds ranked according to the Sharpe measure for the data set of 69 individual securities. Columns two through six represent the holding period returns corresponding to investing in the funds ranked #1 through #5 each year. The last row shows the rankings of these holding period returns according to the Sharpe measure. As can be seen, the Sharpe measure does not fare very well as a predictor of future performance even by its own criterion.
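The rank-then-hold procedure described above can be sketched as follows. The return series here is a synthetic Gaussian stand-in (the actual mutual fund and security returns are not reproduced), so only the mechanics, not the numbers, correspond to the paper; only the Sharpe ranking is shown:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sec, n_months = 69, 216          # 69 securities, Jan 1972 - Dec 1989
returns = rng.normal(0.01, 0.05, size=(n_sec, n_months))  # synthetic stand-in data
rf = 0.005                          # assumed monthly risk-free rate

def sharpe_ranks(window):
    """Rank securities (1 = best) by Sharpe ratio over a window of monthly returns."""
    ratios = (window.mean(axis=1) - rf) / window.std(axis=1, ddof=1)
    order = np.argsort(-ratios)                 # indices sorted by descending Sharpe
    ranks = np.empty(n_sec, dtype=int)
    ranks[order] = np.arange(1, n_sec + 1)
    return ranks

# Rank on a trailing 60-month window, record the next 12 months' holding
# return for each rank, then roll the window forward a year and repeat.
records = []
for start in range(0, n_months - 72 + 1, 12):
    ranks = sharpe_ranks(returns[:, start:start + 60])
    hold = (1 + returns[:, start + 60:start + 72]).prod(axis=1) - 1
    by_rank = np.empty(n_sec)
    by_rank[ranks - 1] = hold                   # column r-1 holds the rank-r return
    records.append(by_rank)

table = np.array(records)                       # rows = holding periods, cols = ranks
print(table.shape)                              # (13, 69) with these dimensions
```

The resulting `table` is the analogue of Table 3: each column is the return series earned by always investing at a given prior rank.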

Table 3
Holding Period Returns for Investing According to Top Five Sharpe Ranks

[Original table: 21 holding periods; the five columns give the holding period returns from investing each year in the funds originally ranked #1 through #5 by the Sharpe measure, and the last row gives the Sharpe ranking of each column's return series over the holding periods. The numeric entries are illegible in this copy.]

The last row for the entire 69 securities was regressed against the first row to determine whether there was any correlation between the original rankings and the rankings of the holding period returns. A positive beta from this regression would suggest that, at least according to the Sharpe measure, investing in past high performers leads to future high performance. That is, if one invested each year in portfolios that had high Sharpe rankings, then the outcome of such an investment strategy would also have a high Sharpe ranking. This result would validate to some extent the use of the Sharpe measure as a performance measure even if it failed to distinguish between chance and managerial performance on a yearly basis. The results of regressing the original rankings against the holding period rankings for the three performance measures and for the three data sets are given in Table 4.
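A rank-on-rank regression of this kind is simple to set up. The sketch below uses synthetic ranks and our own OLS helper (none of it is the paper's code): under the null of no persistence the holding-period ranks are just a random permutation of the original ranks, so the slope should be near zero:

```python
import numpy as np

rng = np.random.default_rng(42)
original = np.arange(1, 70, dtype=float)   # original ranks 1..69
holding = rng.permutation(original)        # hypothetical holding-period ranks

def ols_beta_t(x, y):
    """Slope and t-ratio of a simple OLS regression of y on x."""
    x_c, y_c = x - x.mean(), y - y.mean()
    beta = (x_c @ y_c) / (x_c @ x_c)
    resid = y_c - beta * x_c
    se = np.sqrt((resid @ resid) / (len(x) - 2) / (x_c @ x_c))
    return beta, beta / se

beta, t = ols_beta_t(original, holding)
print(f"beta = {beta:.3f}, t = {t:.2f}")   # near zero: past rank tells us nothing
```

An insignificant slope here is exactly the pattern Table 4 reports for the actual data.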

Table 4
Results of Regressing Original Rankings with Rankings of Holding Period Portfolios

[Original table: for each data set (mutual funds, portfolios from the 69 securities, and the 69 individual securities), the beta and t-ratio of the rank-on-rank regression, for rankings based on the Sharpe, Jensen, and Treynor measures. The numeric entries are illegible in this copy.]

In all cases (mutual funds, individual securities, and the random portfolios) the beta coefficient is not significantly different from zero. This suggests that none of the performance measures did well as investment selectors, even according to their own criteria. Next, we compared investing in the #1 fund, according to each criterion, with a buy and hold strategy. The results are given in Table 5. As can be seen, in virtually all cases there was either no significant difference between the buy and hold strategy and the #1 strategy, or the buy and hold strategy performed better according to the performance measure's own criterion.

Table 5
Comparison of Returns from a Buy and Hold Strategy with Returns from Portfolios Selected Based on #1 Ranking

[Original table: for each data set (mutual funds, portfolios, individual securities) and each measure (Sharpe, Jensen, Treynor), the performance of the buy and hold strategy and of the #1 strategy under that measure's own criterion. The numeric entries are illegible in this copy.]

Lastly, the strategy of investing in the #1 ranked fund was compared with the strategy of randomly selecting a fund each year. One thousand trials were run for each data set. Table 6 shows the number of times the random selection strategy beat the #1 strategy according to its own measure of performance. The results indicate the failure of these measures to outperform even completely random selection.
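The random-selection benchmark can be sketched in a few lines. Everything here is illustrative (a hypothetical return table filled with Gaussian stand-in data); it shows the mechanics of the 1000-trial comparison for the Sharpe criterion, not the paper's numbers:

```python
import random
import statistics

random.seed(0)
rf = 0.005
n_years, n_funds = 21, 69   # dimensions as in the text; the returns are made up

# by_rank[t][r] is the hypothetical return earned in year t by the fund that
# was ranked r+1 at the start of that year.
by_rank = [[random.gauss(0.10, 0.20) for _ in range(n_funds)]
           for _ in range(n_years)]

def sharpe(series):
    return (statistics.mean(series) - rf) / statistics.stdev(series)

# Strategy 1: always hold the fund ranked #1 the year before.
top_sharpe = sharpe([year[0] for year in by_rank])

# Strategy 2: pick a fund at random each year; repeat the trial 1000 times.
wins = sum(
    sharpe([random.choice(year) for year in by_rank]) > top_sharpe
    for _ in range(1000)
)
print(wins, "of 1000 random-selection trials beat the #1 strategy")
```

With exchangeable stand-in data the #1 strategy has no edge, which is the benchmark against which Table 6's counts of roughly 500 wins per 1000 trials should be read.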

Table 6
Comparison of Completely Random Selection with Selections Based on #1 Ranking

                        Sharpe   Jensen   Treynor
Mutual Funds              419      502      577
Portfolios                497      770      482
Individual Securities     797*     614      473

* Number of times, out of 1000 trials, the random selection strategy beat selection based on the #1 strategy according to its own criterion.

FUTURE OF PERFORMANCE MEASURES

While in this paper we have demonstrated the uselessness of only the Sharpe, Jensen, and Treynor indices as performance measures, it is our contention that the likelihood of finding any meaningful portfolio performance measure in the future is also not very good. The reason for this pessimism is that a performance measure that can be believed to separate managerial skill from mere chance has to be one that can also be relied upon to make future investment selections. And this ability for superior future selectivity has to rely exclusively on past data, a requirement that defies all the available evidence in favor of the weak form of market efficiency.

RELATED LITERATURE

There are two sets of papers available in the finance literature that draw conclusions similar to the one made in this paper, but neither lays out in the stark terms that we have the failure of all performance measures. As a representative of the first set one may start with Roll's famous critique of CAPM tests (1977), where he showed the impossibility of testing the CAPM; in a later paper he discussed the ambiguity in performance evaluation if the CAPM is used as a benchmark (1978). Our point, of course, is that this ambiguity will be there no matter what performance measure is used. The second set of papers are those that talk about the persistence of winners; see, for example, Goetzmann and Ibbotson (1994) and Kahn and Rudd (1995). The evidence presented in these papers suggests that winners do not repeat. Our response to this research is that the question of whether winners repeat themselves cannot be answered until we have a meaningful definition of a winner, and given the arguments presented in this paper the chances of finding such a definition appear to be remote. In terms of spirit, the paper that comes closest to ours, though it follows different reasoning, is perhaps Robert Ferguson's (1986) 'The Trouble with Performance Measurement: You cannot do it, you never will, and who would want to?'

CONCLUSION

How much a portfolio earns as returns is, intuitively, the most appealing measure of performance. However, it has been demonstrated conclusively that simply looking at returns may be misleading due to the element of chance present in the outcome of any portfolio. The risk-adjusted performance measures that are touted as improvements over naive returns have even more problems, because risk cannot be defined in a universally acceptable way, not to mention the problem of needing ex-ante risk while observing only ex-post risk. Surprisingly, this common-sensical argument has not been applied to risk-adjusted performance measures to show their inefficacy. So where does all this lead us? Unfortunately, despite our fervent desire to be able to distinguish winners from losers, it leaves us where we always were: nowhere![5] The problem with the performance measures lies in the way returns are generated. It has been more than a century since we have known about the randomness of stock prices, but perhaps the problem is even older and deeper, as the following verses from Ecclesiastes so eloquently state:

"I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favor to men of skill; but time and chance happeneth to them all."

REFERENCES

Ferguson, Robert (1986). 'The Trouble with Performance Measurement'. Journal of Portfolio Management, Spring 1986, pp. 4-9.

Goetzmann, William N. and Roger G. Ibbotson (1994). 'Do Winners Repeat?'. Journal of Portfolio Management, Winter 1994, pp. 9-18.

Jensen, Michael (1968). 'The Performance of Mutual Funds in the Period 1945-1964'. Journal of Finance, 23(2), May 1968, pp. 389-416.

Kahn, Ronald N. and Andrew Rudd (1995). 'Does Historical Performance Predict Future Performance?'. Financial Analysts Journal, November-December 1995, pp. 43-52.

Roll, Richard (1977). 'A Critique of the Asset Pricing Theory's Tests, Part I: On Past and Potential Testability of the Theory'. Journal of Financial Economics, 4, March 1977, pp. 129-176.

Roll, Richard (1978). 'Ambiguity When Performance is Measured by the Securities Market Line'. Journal of Finance, 33(4), September 1978, pp. 1051-1069.

Sharpe, William F. (1966). 'Mutual Fund Performance'. Journal of Business, January 1966, pp. 119-138.

Treynor, Jack L. (1965). 'How to Rate Management of Investment Funds'. Harvard Business Review, 43, January-February 1965, pp. 63-75.

NOTES

[1] Of course, the notions of past performance and future performance are obviously linked. No one would want to reward a past performance if it was merely due to chance, that is, if one did not expect it to be a predictor of good future performance.

[2] Sharpe measure: (Ri - Rf)/σi; Jensen measure: Ri - [Rf + βi(Rm - Rf)]; Treynor measure: (Ri - Rf)/βi, where Ri is the return on security i, Rf is the risk-free rate, Rm is the return on the market portfolio, σi is the standard deviation of security i's returns, and βi is security i's systematic risk.

[3] Of course, a longer period of observation would allow us to separate the superior performers from the inferior ones, even in this relatively simple case where the element of chance is equivalent to the tossing of a coin with the same fixed outcomes each year. However, if it takes more than twenty periods to evaluate a portfolio's performance, then one can hardly view it as a practicable way of performance evaluation in more realistic situations where the risk-free rate, risk premia, chance, and skill may all be varying in size over time.

[4] Though this can hardly be considered a valid criticism, because it is exactly in these circumstances that these measures are actually used.

[5] If we are willing to forego our desire to rank, then we can at least divide the funds into groups that may be preferred by a certain class of investors on the basis of stochastic dominance rules.
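The three measures defined in footnote 2 translate directly into code. A minimal sketch, with made-up illustrative inputs rather than any figures from the paper:

```python
def sharpe_measure(r_i, r_f, sigma_i):
    """Excess return per unit of total risk: (R_i - R_f) / sigma_i."""
    return (r_i - r_f) / sigma_i

def jensen_measure(r_i, r_f, r_m, beta_i):
    """Return in excess of the CAPM prediction: R_i - [R_f + beta_i (R_m - R_f)]."""
    return r_i - (r_f + beta_i * (r_m - r_f))

def treynor_measure(r_i, r_f, beta_i):
    """Excess return per unit of systematic risk: (R_i - R_f) / beta_i."""
    return (r_i - r_f) / beta_i

# Hypothetical inputs: R_i = 12%, R_f = 5%, R_m = 10%, sigma_i = 0.20, beta_i = 1.1
print(sharpe_measure(0.12, 0.05, 0.20))       # about 0.35
print(jensen_measure(0.12, 0.05, 0.10, 1.1))  # about 0.015
print(treynor_measure(0.12, 0.05, 1.1))       # about 0.064
```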
