Asset Management: 4. Benchmarking and Performance Evaluation

Felix Wilke
Nova School of Business and Economics
Spring 2024
Topics
• Benchmarking answers two important questions:

• To what factors are assets, portfolios, mutual funds, and hedge funds exposed?
• Do these investments obtain alpha, a return in excess of the benchmark?
• Active management tries to beat the benchmark.

• Market timing and selection.
• When to invest (how much in stocks? E.g., volatility timing or countercyclical rebalancing
when returns are predictable) vs. where to invest (which stocks? E.g., value vs. growth or
winners vs. losers).
• In the middle between “when” and “where”: characteristics-timing (recall: value-timing).
• Conclusions on the historical performance of mutual fund managers.

• Does alpha measure manager skill?
• Value added.
Performance measures
Risk-reward ratios
• The reward for investments is

• expected return, or
• expected abnormal return, that is, alpha.
• Positive reward is good and negative reward is bad.
• But is higher expected return always better?
• What about risk?
• Returns/alphas depend on how a strategy is scaled.
• A twice-leveraged strategy has twice the return/alpha of an unleveraged version of the
same strategy.
• Solution: We need to consider reward-to-risk ratios.
• Sharpe ratio: $SR = \frac{E(r - r_f)}{\sigma(r - r_f)}$
• SR measures the “reward” for taking risk, per unit of risk that you take.
• Reward = expected return over the risk-free rate.
• Risk = standard deviation of return.
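The scaling argument above can be checked numerically. The following is a minimal sketch, assuming a hypothetical return series and a constant risk-free rate (both invented for illustration); `sharpe_ratio` is an illustrative helper, not a library function.

```python
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free):
    """Sample Sharpe ratio: mean excess return divided by its standard deviation."""
    excess = [r - rf for r, rf in zip(returns, risk_free)]
    return mean(excess) / stdev(excess)

# Hypothetical per-period returns and risk-free rate, for illustration only.
returns = [0.04, -0.01, 0.03, 0.02, -0.02, 0.05]
risk_free = [0.01] * len(returns)

# A twice-leveraged version that borrows at the risk-free rate:
# r_lev = rf + 2 * (r - rf), so the excess return doubles in every period.
levered = [rf + 2 * (r - rf) for r, rf in zip(returns, risk_free)]

sr = sharpe_ratio(returns, risk_free)
sr_levered = sharpe_ratio(levered, risk_free)
mean_excess = mean(r - rf for r, rf in zip(returns, risk_free))
mean_excess_levered = mean(r - rf for r, rf in zip(levered, risk_free))
```

Leverage doubles both the mean excess return and its standard deviation, so `sr` and `sr_levered` coincide up to floating-point error while `mean_excess_levered` is twice `mean_excess`.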
Information ratio, appraisal ratio
• The information ratio is defined as
$$IR = \frac{E(r - r^b)}{\sigma(r - r^b)},$$
where $r^b$ is the return on a benchmark $b$; the IR is usually annualized.
• The denominator is also called the “tracking error.”
• The benchmark exposure may be estimated using a regression:
$$r^e_t = \alpha + \beta (r^b_t - r_f) + \varepsilon_t$$
• Note that the first definition implicitly sets $\beta = 1$. In the general case, the IR is
defined as
$$IR = \frac{\alpha}{\sigma(\varepsilon)},$$
which is then also called the appraisal ratio.
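The regression-based definition can be sketched with a plain single-regressor OLS. This is an illustrative implementation, assuming made-up data constructed so that the true alpha is 1% and the true beta is 0.5; the function name `appraisal_ratio` and the degrees-of-freedom choice (n − 2) are assumptions, not something the slides specify.

```python
from statistics import mean

def appraisal_ratio(excess_returns, benchmark_excess):
    """OLS of fund excess returns on benchmark excess returns.

    Returns (alpha, beta, residual_vol, alpha / residual_vol).
    """
    n = len(excess_returns)
    mean_y = mean(excess_returns)
    mean_x = mean(benchmark_excess)
    s_xy = sum((x - mean_x) * (y - mean_y)
               for x, y in zip(benchmark_excess, excess_returns))
    s_xx = sum((x - mean_x) ** 2 for x in benchmark_excess)
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x
    residuals = [y - alpha - beta * x
                 for x, y in zip(benchmark_excess, excess_returns)]
    # Two estimated parameters (alpha, beta), hence n - 2 degrees of freedom.
    residual_vol = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5
    return alpha, beta, residual_vol, alpha / residual_vol

# Hypothetical data: the noise sums to zero and is orthogonal to the benchmark,
# so OLS recovers alpha = 0.01 and beta = 0.5 up to floating-point error.
bench = [-0.01, 0.0, 0.01, 0.02, 0.03]
noise = [0.01, -0.02, 0.02, -0.02, 0.01]
fund = [0.01 + 0.5 * x + e for x, e in zip(bench, noise)]

alpha, beta, resid_vol, ar = appraisal_ratio(fund, bench)
```

With $\beta$ estimated rather than fixed at 1, the denominator is the residual volatility, not the volatility of the simple return difference.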
High water mark
• Consider a hedge fund’s price of shares or its cumulative return $P_t$, where:
$$P_t = P_{t-1} \times (1 + r_t)$$
• The high water mark (HWM) is the highest price $P_t$ (or highest cumulative return) it
has achieved in the past:
$$HWM_t = \max_{s \le t} P_s$$
• Often, hedge funds only charge performance fees when their returns are above their
HWM.
• Hence, if they have experienced losses, they must first make these back and only
charge performance fees on the profits above their HWM.
Drawdown
• An important risk measure for a hedge fund strategy is its drawdown (DD). The
drawdown is the cumulative loss since losses started:
$$DD_t = \frac{HWM_t - P_t}{HWM_t}$$
• Experiencing large drawdowns is costly and risky:
• Can lead to redemptions from investors.
• Concerns from counterparties, e.g., prime brokers increasing margin requirements or
completely pulling the financing of the hedge fund’s positions.
• When evaluating a strategy, people sometimes consider its maximum drawdown
(MDD) over some past time period:
$$MDD_T = \max_{t \le T} DD_t$$
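The high water mark, drawdown, and maximum drawdown definitions can be computed in a single pass over the return series. A minimal sketch, assuming a hypothetical return path and that the starting price counts toward the high water mark; `drawdown_stats` is an illustrative name.

```python
def drawdown_stats(returns, start_price=1.0):
    """Return price path, high water marks, drawdowns, and maximum drawdown."""
    prices, hwm, drawdowns = [], [], []
    price, peak = start_price, start_price
    for r in returns:
        price *= 1 + r                 # P_t = P_{t-1} * (1 + r_t)
        peak = max(peak, price)        # HWM_t = max over s <= t of P_s
        prices.append(price)
        hwm.append(peak)
        drawdowns.append((peak - price) / peak)  # DD_t = (HWM_t - P_t) / HWM_t
    return prices, hwm, drawdowns, max(drawdowns)

# Hypothetical returns: a gain, a 20% loss, a partial recovery, a new high.
returns = [0.10, -0.20, 0.05, 0.30]
prices, hwm, drawdowns, mdd = drawdown_stats(returns)
```

In this path the drawdown is zero whenever the price sits at a new high and peaks at about 20% after the loss; a fund charging performance fees above the HWM would charge nothing in the third period, because the price is still below the earlier peak.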
Benchmarking
Benchmarking: The basics (1/2)
• Alpha is the average return in excess of a benchmark.

• Often interpreted as a measure of skill, but it is first and foremost a statement about a
benchmark! (Also, net alpha is determined by competition in equilibrium. More on that
later.)
• Which benchmark: S&P 500, Russell 2000, CAPM, FF3M, …?
• Define the active return of an asset or portfolio strategy in excess of a benchmark as
$$r^e_t = r_t - r^{bmk}_t$$
• Performance metrics:
• Alpha (outperformance over benchmark): $\alpha = \frac{1}{T} \sum_{t=1}^{T} r^e_t$
• Tracking error (deviations from benchmark): $\sigma_{TE} = \sigma_{r^e} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( r_t - r^{bmk}_t \right)^2}$
• Information ratio (outperformance per unit of tracking error): $IR = \frac{\alpha}{\sigma_{TE}}$
• A “benchmarked” Sharpe ratio: If the benchmark is the risk-free rate, IR = Sharpe ratio.
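The three metrics can be sketched as follows, assuming hypothetical monthly returns and the usual annualization conventions (mean times 12, volatility times the square root of 12). The sketch uses the sample standard deviation of active returns, which subtracts the mean; that is a common convention and an assumption here. `active_metrics` is an illustrative name.

```python
from statistics import mean, stdev

def active_metrics(returns, benchmark, periods_per_year=12):
    """Annualized alpha, tracking error, and information ratio versus a benchmark."""
    active = [r - b for r, b in zip(returns, benchmark)]
    alpha = mean(active) * periods_per_year
    tracking_error = stdev(active) * periods_per_year ** 0.5
    return alpha, tracking_error, alpha / tracking_error

# Hypothetical monthly returns, for illustration only.
fund = [0.020, -0.010, 0.015, 0.030, -0.005, 0.010]
index = [0.015, -0.012, 0.010, 0.020, -0.002, 0.008]
cash = [0.002] * len(fund)

alpha, te, ir = active_metrics(fund, index)
# With the risk-free asset as the benchmark, the same function returns the Sharpe ratio.
_, _, sharpe = active_metrics(fund, cash)
```

Passing `cash` as the benchmark illustrates the last bullet: the information ratio against the risk-free rate is the (annualized) Sharpe ratio.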
Benchmarking: The basics (2/2)
• These three metrics (α, TE, IR) are used to evaluate performance of mutual funds,
hedge funds, asset managers etc.
• Often a fund manager receives a bonus if the fund “beats the benchmark,” i.e., α > 0.
• But the manager may not depart too much from the given benchmark:
$\sigma_{TE}$ < tracking error limit, typically around 6% per year.
• This precludes the manager from simply increasing risk.
• $\sigma_{TE}$ is a measure of idiosyncratic volatility when the benchmark is risk-adjusted (i.e.,
benchmark exposure β estimated and not assumed to be 1).
• IR measures average active return per unit of risk.
• Although many active funds claim to achieve IR > 1 consistently, this is extremely rare.
Benchmarks matter: An example
• Martingale Asset Management proposes to its sponsor, the GM Pension Fund, a betting
against beta (BAB) strategy based on stocks in the Russell 1000 index.
• Long (short) low (high) beta stocks.
• The sponsors calculate historical performance relative to the Russell 1000:
• $r^{BAB,e}_t = r^{BAB}_t - r^{R1000}_t$
• $\alpha_{BAB} = 1.5\%$, $\sigma_{BAB,TE} = 6.16\%$, $IR_{BAB} = 0.24$
• This is hardly impressive. Why would GM pay Martingale?
• Wrong benchmark: although the stocks are drawn from the Russell 1000, the risk profile is different
(beta with respect to the Russell 1000 is assumed to be 1!)
• Consider the CAPM regression:
• $r^{BAB}_t - r_f = \alpha_{BAB} + \beta_{BAB} (r^{R1000}_t - r_f) + \varepsilon_{BAB,t}$
• $\alpha_{BAB} = 3.44\%$, $\beta_{BAB} = 0.73$, $\sigma_{\varepsilon_{BAB}} = 4.41\%$
• The benchmark beta for the strategy is only 0.73!
• Large outperformance of the risk-adjusted benchmark with $IR_{BAB} = 0.78$!
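The two information ratios follow directly from the reported figures. The sketch below reproduces that arithmetic and adds one back-of-the-envelope step that is not in the slides: if both alphas come from the same sample, the gap between them equals $(\beta_{BAB} - 1)$ times the average market premium, so the implied premium can be backed out. That last number is an inference under that assumption, not a reported result.

```python
def information_ratio(alpha, active_risk):
    """Alpha per unit of active risk; inputs in percent per year."""
    return alpha / active_risk

# Figures from the example, in percent per year.
alpha_raw, te_raw = 1.5, 6.16          # benchmark = Russell 1000, beta fixed at 1
alpha_capm, resid_vol = 3.44, 4.41     # CAPM regression, beta estimated
beta = 0.73

ir_raw = information_ratio(alpha_raw, te_raw)
ir_capm = information_ratio(alpha_capm, resid_vol)

# Assumption (not from the slides): same sample for both estimates, so
# alpha_raw = alpha_capm + (beta - 1) * market_premium.
implied_market_premium = (alpha_raw - alpha_capm) / (beta - 1)
```

Rounded to two decimals, `ir_raw` and `ir_capm` match the 0.24 and 0.78 quoted above. Under the stated assumption, the low-beta strategy gives up roughly $(1 - 0.73)$ of the market premium relative to the index, which is why the raw alpha understates the risk-adjusted one.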
Factor benchmarks
• Estimating the risk-adjusted benchmark using factor regressions directly identifies
the replicating portfolio that captures alpha!
• For the case of BAB we have:
• CAPM → $r^{BAB}_t - r_f = \alpha_{BAB} + \beta_{BAB} (r^{R1000}_t - r_f) + \varepsilon_{BAB,t} = 3.44\% + 0.73 (r^{R1000}_t - r_f) + \varepsilon_{BAB,t}$
• $r^{BAB}_t - 0.73\, r^{R1000}_t - (1 - 0.73)\, r_f = 3.44\% + \varepsilon_{BAB,t}$
• $\alpha_{BAB} = 3.44\%$ is the expected return of investing \$1 in BAB while shorting \$0.73 of the
market and \$0.27 of T-bills.
• The standard deviation of this combined portfolio is the tracking error
($\sigma_{\varepsilon_{BAB}} = 4.41\%$), which can be calculated using:
• $\sigma_{\varepsilon_{BAB}} = \sqrt{\sigma^2_{r^{BAB}_t} - \beta^2_{BAB}\, \sigma^2_{r^{R1000}_t}}$
• $\sigma_{\varepsilon_{BAB}}$ is a measure of idiosyncratic volatility when the benchmark is risk-adjusted.
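The tracking error formula follows from the CAPM regression once the residual is uncorrelated with the benchmark return, which OLS guarantees in sample. A short derivation, assuming the risk-free rate is approximately constant so that raw and excess returns have the same variance:

```latex
\begin{aligned}
r^{BAB}_t - r_f &= \alpha_{BAB} + \beta_{BAB}\,(r^{R1000}_t - r_f) + \varepsilon_{BAB,t},
  \qquad \operatorname{Cov}(r^{R1000}_t, \varepsilon_{BAB,t}) = 0 \\
\sigma^2_{r^{BAB}_t} &= \beta^2_{BAB}\,\sigma^2_{r^{R1000}_t} + \sigma^2_{\varepsilon_{BAB}} \\
\sigma_{\varepsilon_{BAB}} &= \sqrt{\sigma^2_{r^{BAB}_t} - \beta^2_{BAB}\,\sigma^2_{r^{R1000}_t}}
\end{aligned}
```

The second line splits total variance into a systematic part and an idiosyncratic part; the tracking error of the replicating position is the square root of the idiosyncratic part.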
Multi-factor benchmarks
• Multi-factor models used in performance evaluation define additional
zero-investment benchmark factors.
• For the FF3M model, we have the replicating portfolio:
• $\beta_M$ in the market portfolio and $1 - \beta_M$ in the risk-free asset,
• $\beta_{SMB}$ in small stocks, $-\beta_{SMB}$ in big stocks,
• $\beta_{HML}$ in value stocks, $-\beta_{HML}$ in growth stocks.
• Analogous interpretation for other multi-factor models (e.g., including momentum).
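The replicating portfolio can be written down mechanically from the estimated loadings. A minimal sketch with hypothetical betas; the function name and dictionary keys are illustrative labels, not part of any model specification.

```python
def ff3_replicating_portfolio(beta_mkt, beta_smb, beta_hml):
    """Dollar positions per $1 invested that replicate the FF3M benchmark exposure."""
    return {
        "market": beta_mkt,
        "risk_free": 1.0 - beta_mkt,
        "small": beta_smb,    # long leg of SMB
        "big": -beta_smb,     # short leg of SMB
        "value": beta_hml,    # long leg of HML
        "growth": -beta_hml,  # short leg of HML
    }

# Hypothetical factor loadings, for illustration only.
positions = ff3_replicating_portfolio(beta_mkt=0.9, beta_smb=0.3, beta_hml=0.4)
net_investment = sum(positions.values())
```

The size and value legs net to zero, which is what "zero-investment" means here, so the whole benchmark still costs \$1: the market and risk-free positions sum to one.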
Why separate alpha and beta?
• Recall: Excess return $r^e_t = r_t - r_f$ separated into alpha and beta:
$$r^e_t = \alpha + \beta (r^M_t - r_f) + \varepsilon_t$$
• β = market risk, tendency to follow the market.
• α = excess return, adjusted for market risk, due to trading skills (or luck).
• $\varepsilon_t$ = idiosyncratic risk.
• Market neutral hedge fund: β = 0
• Market neutral excess return: $r^e_t - \beta (r^M_t - r_f) = \alpha + \varepsilon_t$
• E(Market neutral excess return) = α
• $\beta (r^M_t - r_f)$: can be achieved using an index fund.
• No need to pay large fees for this (e.g., Vanguard etc. will do it for around 0.1%).
• Beta is necessary for risk management:
• If you mix a hedge fund into a portfolio with other market risk, beta risk will not go away.
• How large a fee should investors pay at most?
Characteristics of ideal benchmarks
1. Well defined, i.e. verifiable and free of ambiguity about its contents.
• “US equities” or “Value” is too vague.
2. Tradable and replicable. Alphas should be implementable returns on investment
strategies, with low-cost benchmarks.
• Fama-French size and value factors aren’t really tradable, so we should use actual size
and value funds or ETFs as benchmarks (Cremers, Petajisto, and Zitzewitz, 2012; Berk and
van Binsbergen, 2015).
3. Adjusted for risk.

• Ang (2014): “Sadly, most benchmarks used in the money management business are not
risk-adjusted.”
• Ideally: not only market risk!
Finding alpha
Does α even exist?
• The purpose of benchmarking is to identify whether an investment obtains an excess
return after accounting for risk.
• Then, choosing the right set of benchmark factors is the most relevant issue.
• For each investment strategy, one can always find a benchmark ex post such that no α
exists, but this is of little help ex ante.
• Not every alpha should be rationalized by arguing that we have the wrong benchmark.
• Case study: Buffett’s alpha (Frazzini, Kabiller and Pedersen, 2018).

• Sharpe ratio of Berkshire Hathaway 1976-2017 in the top 3% among all mutual funds
and the top 7% among all stocks.
• Stocks and funds with higher SRs have short histories.
• FFCM alpha of 11% per year!
• Alpha becomes insignificant when controlling for exposures to Betting-Against-Beta and
Quality-Minus-Junk factors.
Buffett’s alpha
• Portfolio loads on stocks that are “safe” (low beta and volatility), “cheap” (i.e., value
stocks), and “high-quality” (meaning stocks that are profitable, stable, growing, and
with high payout ratios).
• It is easy to construct factor portfolios related to the characteristics that Buffett selects
on that would explain Berkshire’s return well and capture its alpha (i.e., adjust the
benchmark).
• These characteristics are included in factor models nowadays!
• Does this mean Buffett doesn’t deserve his bonus? No!

• Some of these characteristics are not obviously relatable to known risk factors, thus the
alphas with respect to the usual factor models (CAPM, FF3M, FFCM) are to be interpreted as
market inefficiency.
• Moreover, Buffett came up with the strategy >50 years ago!
• Going forward, returns from this strategy will be deemed less impressive by investors.
Alpha: luck or talent?
What determines the return of equity portfolio managers?

• Luck: Given a large enough sample, luck shouldn’t play a large role in benchmark
factor regressions. Beating the market one year may be luck, beating the market five
years in a row is much less likely to be just luck.
• Talent: Consider the CAPM benchmark regression:
$$r^e_{p,t+1} = \alpha_p + \beta_{p,M}\, r^e_{M,t+1} + \varepsilon_{p,t+1}$$
Suppose αp > 0.
Does this capture timing or selection ability of the manager of portfolio p?
• Timing: Does the manager increase exposure to the market when market returns are high?
• Selection: Does the manager select stocks that outperform the market on average?
• Since managers tend to load on well-known CAPM-anomalies, such strategies do not deserve a
bonus.
• Hence, the most popular benchmark is FFCM (which few managers beat consistently).
Time-varying market exposure
• Market exposures can vary over time in many ways. Investors will want to know how:
e.g., do exposures increase in turmoil?
• A general model to analyze time-varying exposures:
• Conditional exposures: $\beta_{p,M,t} = \beta_{p,M,0} + \beta_{p,M,1} Z_t$, where $Z_t$ is an observable (e.g., dividend
yield or business cycle indicator).
• If $Z_t$ is standardized (mean = 0, variance = 1), $\beta_{p,M,0} \approx \beta_{p,M}$, the unconditional beta.
• Substituting in the CAPM benchmark:
$$r^e_{p,t+1} = \alpha^*_p + \beta_{p,M}\, r^e_{M,t+1} + \beta_{p,M,1}\, (r^e_{M,t+1} Z_t) + \varepsilon_{p,t+1}$$
with
$$E(r^e_{p,t+1}) = \alpha^*_p + \beta_{p,M}\, E(r^e_{M,t+1}) + \beta_{p,M,1}\, \operatorname{Cov}(r^e_{M,t+1}, Z_t)$$
• What defines positive timing ability in this equation?
• In case $Z_t$ predicts returns with a positive (negative) sign, i.e., $\operatorname{Cov}(r^e_{M,t+1}, Z_t) > 0$ ($< 0$):
$\beta_{p,M,1} > 0$ ($< 0$) implies market timing.
• $\alpha_p \approx \alpha^*_p + \beta_{p,M,1} \operatorname{Cov}(r^e_{M,t+1}, Z_t)$ = selection + timing
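The step from the conditional beta to the expected-return expression is short but easy to lose. A sketch of the derivation, using only the definitions above, the standardization $E(Z_t) = 0$, and the approximation $\beta_{p,M,0} \approx \beta_{p,M}$:

```latex
\begin{aligned}
r^e_{p,t+1} &= \alpha^*_p + \beta_{p,M,t}\, r^e_{M,t+1} + \varepsilon_{p,t+1},
  \qquad \beta_{p,M,t} = \beta_{p,M,0} + \beta_{p,M,1} Z_t \\
&= \alpha^*_p + \beta_{p,M,0}\, r^e_{M,t+1} + \beta_{p,M,1}\,(r^e_{M,t+1} Z_t) + \varepsilon_{p,t+1} \\
E(r^e_{M,t+1} Z_t) &= \operatorname{Cov}(r^e_{M,t+1}, Z_t) + E(r^e_{M,t+1})\,E(Z_t)
  = \operatorname{Cov}(r^e_{M,t+1}, Z_t) \\
E(r^e_{p,t+1}) &= \alpha^*_p + \beta_{p,M,0}\, E(r^e_{M,t+1}) + \beta_{p,M,1}\, \operatorname{Cov}(r^e_{M,t+1}, Z_t)
\end{aligned}
```

Comparing the last line with the unconditional regression, whose expectation is $\alpha_p + \beta_{p,M} E(r^e_{M,t+1})$, gives $\alpha_p \approx \alpha^*_p + \beta_{p,M,1} \operatorname{Cov}(r^e_{M,t+1}, Z_t)$: the unconditional alpha bundles selection and timing.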
Example: time-varying market exposure
• Consider a portfolio with unconditional expected return $E(r^e_{p,t+1}) = 8.75\%$ and an
unconditional beta $\beta_{p,M} = 1$.
• In the simple regression:
$$r^e_{p,t+1} = \alpha_p + \beta_{p,M}\, r^e_{M,t+1} + \varepsilon_{p,t+1} \Rightarrow \alpha_p = 3.75\%$$
• This alpha could be due to the manager increasing exposure to the market when the
dividend yield is high, as a high dividend yield correlates positively with future
market returns.
• This timing ability implies $\beta_{p,M,1} > 0$ in the regression:
$$r^e_{p,t+1} = \alpha^*_p + \beta_{p,M}\, r^e_{M,t+1} + \beta_{p,M,1}\, (r^e_{M,t+1} DY_t) + \varepsilon_{p,t+1}$$
• If $\alpha^*_p = 0$, we conclude the manager has no selection ability
($\beta_{p,M,1} \operatorname{Cov}(r^e_{M,t+1}, DY_t) = 3.75\%$).
• If $\alpha^*_p > 0$, the manager has selection ability (net of timing).
• If $DY_t$ doesn’t fully capture how the manager times the market, then some timing will be
left in the intercept $\alpha^*_p$!
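The numbers in this example can be traced with two small helpers. With $\beta_{p,M} = 1$, an 8.75% expected excess return and a 3.75% alpha imply a market premium of 5%. The timing coefficient and covariance used below are hypothetical values chosen so that timing accounts for the whole alpha; they are not given in the slides, and both function names are illustrative.

```python
def implied_market_premium(expected_excess_return, alpha, beta):
    """Back out E(r_M^e) from E(r_p^e) = alpha + beta * E(r_M^e). Inputs in percent."""
    return (expected_excess_return - alpha) / beta

def split_alpha(alpha_p, beta_timing, cov_market_signal):
    """Split the unconditional alpha into (selection, timing). Inputs in percent."""
    timing = beta_timing * cov_market_signal
    selection = alpha_p - timing
    return selection, timing

premium = implied_market_premium(8.75, 3.75, 1.0)

# Hypothetical: beta_{p,M,1} = 1.5 and Cov(r_M^e, DY) = 2.5 (percent units).
selection, timing = split_alpha(3.75, beta_timing=1.5, cov_market_signal=2.5)
```

In this hypothetical case `selection` is zero, the first scenario in the list above; a smaller timing coefficient would leave a positive remainder, which would be read as selection ability net of timing.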
Timing and selection in regression-based factor benchmarks
• Model easily extended to include interactions with additional factors: timing of size,
value, momentum effects, and so on…
• For instance, a conditional FF3M using the dividend yield:
$$\begin{aligned}
r^e_{p,t+1} = {} & \alpha^*_p + \beta_{p,M}\, r^e_{M,t+1} + \beta_{p,M,1}\, (r^e_{M,t+1} DY_t) \\
& + \beta_{p,SMB}\, r_{SMB,t+1} + \beta_{p,SMB,1}\, (r_{SMB,t+1} DY_t) \\
& + \beta_{p,HML}\, r_{HML,t+1} + \beta_{p,HML,1}\, (r_{HML,t+1} DY_t) + \varepsilon_{p,t+1}
\end{aligned}$$
• Thus, time-varying regression-based factor benchmarks are a useful tool to
understand the conditional performance of portfolios and the (origin of) manager’s
alpha: timing and selection talent.
• Market-timing ($\beta_{p,M,1}$) and characteristics-timing ($\beta_{p,SMB,1}$, $\beta_{p,HML,1}$).
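In practice the main pitfall in estimating the conditional model is timing alignment: the signal must be observed at $t$ and the factor returns realized over $t+1$. The sketch below only builds the regressor rows with that alignment, assuming all series share one time index; the function name and the input values are hypothetical, and the rows could then be passed to any OLS routine.

```python
def conditional_ff3_rows(mkt, smb, hml, signal):
    """Regressor rows [1, M, M*Z, SMB, SMB*Z, HML, HML*Z] with Z lagged one period."""
    rows = []
    for t in range(len(signal) - 1):
        z = signal[t]                                  # known at time t
        m, s, h = mkt[t + 1], smb[t + 1], hml[t + 1]   # realized over t+1
        rows.append([1.0, m, m * z, s, s * z, h, h * z])
    return rows

# Hypothetical factor returns and a standardized signal, for illustration only.
mkt = [0.01, 0.02, -0.01]
smb = [0.00, 0.01, 0.02]
hml = [0.03, -0.02, 0.01]
signal = [0.5, -1.0, 2.0]

rows = conditional_ff3_rows(mkt, smb, hml, signal)
```

Each row has seven entries, matching the intercept and six slope coefficients of the conditional FF3M, and the last signal observation is dropped because its matching return is not yet available.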
Active mutual fund performance
The reality of active mutual fund performance
• Mutual funds advertise high past performance (in general).

• Marketing and survivorship bias.
• The reality: three stylized facts.
1. With or without adjusting for risk and with or without allowing for time-variation in factor
exposures, the typical active mutual fund delivers a negative alpha after costs, and, at
best, a slightly positive alpha before costs.
• Large funds perform worst (in alpha terms): alpha opportunities are hard to scale, as
concentrated in small, illiquid, distressed stocks.
2. In addition, positive alpha is not persistent (in contrast to negative alpha).
3. Yet, investors chase returns!
• What can explain these stylized facts?
• In other words: why are we paying fund managers so much through fees?
• Interview with Jonathan Berk (YouTube)
Active mutual fund managers underperform the market after costs
• Wermers (2000)
• S&P500 return = 15.4%
• Average gross return = 16.9%, average net return = 14.6%
• Average gross alpha = 0.79%, average net alpha = -1.16%
• Fama and French (2010)

• Examine all active mutual funds from 1984 to 2006 and use CAPM, FF3, and FFCM
benchmarks.
• The aggregate active fund essentially holds the market.

• Before fees, it underperforms the market by 0.18%.
• After fees, underperformance increases to 1.13%.
Fama and French (2010)
Mutual fund alphas show little persistence over time
Summary: underperformance and lack of performance persistence
• Some positive CAPM alphas persist, but no fund group has a positive four-factor
(MKT, HML, SMB, MOM) alpha.
• Reason: winning funds hold high momentum stocks.
• After adjusting for the high returns on the momentum strategy, winning funds have
negative alphas.
• The average fund alpha is NEGATIVE.

• Interpretation: Funds underperform after costs.
• This doesn’t imply there are no skilled managers, but these are hard to find and
performance is not, on average, persistent.
• The top 10% perform worse than the benchmark.
Mutual fund investors chase performance
• Chevalier and Ellison (1997): Flows into and out of mutual funds are driven by past
performance (which is not indicative of future performance).
Survivorship bias
• Investors often look only at past performance.
• Fund inflows are very sensitive to past one-year fund returns.
• This is usually the measure picked by marketers to sell funds.
• Investors are misled by survivorship bias.
• Fund companies tend to keep top performers.
• A common claim: “75% of our funds beat the market.”
• They kill poorly performing funds.
Incubation bias
• Funds use “incubation” periods to test investment strategies and fund managers.
• Incubation period = period before the fund is widely distributed.
• Incubation allows funds to benefit from chance.

• Idea: start many funds with different strategies.
• Then, select the best ones (selection bias).
• Backfill their performance history and market them.
• Fund families can “juice” incubation returns.

• Incubating funds get preferential access to good deals.
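The "start many funds, keep the best" mechanism can be illustrated with a small simulation. This is a sketch under strong assumptions: every fund has zero true alpha, monthly alphas are independent normal draws, and all parameter values (number of funds, volatility, share kept) are invented for illustration.

```python
import random
from statistics import mean

def incubation_bias(n_funds=200, n_months=36, monthly_vol=0.04,
                    keep_share=0.1, seed=0):
    """Average alpha of all incubated funds versus the funds kept for marketing."""
    rng = random.Random(seed)
    # Every fund has zero true alpha: realized average alphas are pure noise.
    avg_alphas = [
        mean(rng.gauss(0.0, monthly_vol) for _ in range(n_months))
        for _ in range(n_funds)
    ]
    n_keep = max(1, int(n_funds * keep_share))
    survivors = sorted(avg_alphas, reverse=True)[:n_keep]
    return mean(avg_alphas), mean(survivors)

all_funds_alpha, marketed_alpha = incubation_bias()
```

The marketed funds show a higher average alpha than the full incubated set even though no fund has any skill by construction; the same selection logic underlies survivorship bias when poorly performing funds are closed.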
Equilibrium model with rational fund investors learning about managerial skill
• Berk and Green (2004) equilibrium model that explains:

1. Managers add value on a gross basis but none of this filters to investors,
2. investors chase returns, and
3. there is no persistence in outperformance.
• Model Overview:
• Investors supply capital to mutual fund managers in a competitive market – the marginal
return on the last dollar invested is zero, so all investors in an open-ended mutual fund
earn the same zero excess return!
• Managers have differential ability to generate high returns, but face decreasing returns to
scale (DRS). As a fund grows, the manager’s alpha decreases.
• Investors learn about managerial ability from past returns. This generates flows. New
money flows in until the point at which expected excess returns are zero.
• Fund net abnormal returns (alphas with respect to the benchmark) are a function of
unobservable managerial skill $a_{i,t}$, assets under management (through a cost function, e.g.,
$c(AUM_{i,t}) = \ln(AUM_{i,t})$), and fees ($f_{i,t+1}$):
$$\alpha^{net}_{i,t+1} + f_{i,t+1} = a_{i,t} - c(AUM_{i,t}) + \varepsilon_{i,t+1}$$
Berk and Green (2004) implications
• In equilibrium, $E_t(\alpha_{t+1} - f_{t+1}) = 0$, i.e., expected gross alpha net of fees is zero. All funds earn zero expected excess return after
fees. Investors are indifferent between all actively managed mutual funds of various
skill levels and passive investments.
• Funds with the highest skill have positive excess returns more frequently, so investors
revise their estimate of the manager’s skill upward and allocate more AUM to
those funds. This process continues until the high-skill funds face such strongly
decreasing returns (price impact) that they can no longer profitably invest the last
dollar after fees.
• The funds with the highest skill have the highest AUM. This is an equilibrium theory
of the size distribution of mutual funds.
• Net alpha is not a good measure of manager skill. The competitive allocation of
assets to funds plus DRS at the fund level results in skill coexisting with zero net α.
No alpha ≠ no skill!
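The equilibrium logic can be made concrete with the example cost function $c(AUM) = \ln(AUM)$ from the model overview. The sketch below is stylized: skill and fee are expressed in the same arbitrary units as $\ln(AUM)$, the numerical values are hypothetical, and the function names are illustrative.

```python
import math

def net_alpha(skill, aum, fee):
    """Expected net alpha: skill minus the scale cost ln(AUM) minus the fee."""
    return skill - math.log(aum) - fee

def equilibrium_aum(skill, fee):
    """Fund size at which expected net alpha is zero: ln(AUM*) = skill - fee."""
    return math.exp(skill - fee)

# Hypothetical skill levels with the same fee.
fee = 1.0
aum_low_skill = equilibrium_aum(skill=3.0, fee=fee)
aum_high_skill = equilibrium_aum(skill=5.0, fee=fee)

net_low = net_alpha(3.0, aum_low_skill, fee)
net_high = net_alpha(5.0, aum_high_skill, fee)
```

Both funds end up with zero expected net alpha, yet the more skilled manager runs the larger fund. Skill shows up in size, not in the return investors earn.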
Debate: are investors rational?
• Performance chasing and lack of performance persistence: data supports two sets of
polarized conclusions.
1. Households are Bayesian agents, continually assessing fund managers’ skill and
reallocating money accordingly (e.g., Berk and Green, 2004).
2. Households are simple decision makers, who invest using easily obtainable information,
do not engage in sophisticated learning about managers’ alpha, and do not adjust fund
performance using asset pricing models. Managers are not skilled.
• Ben-David, Li, Rossi, and Song (2022): “Households are homo sapiens with limited
financial sophistication rather than hyperrational alpha-maximizing agents.”
• Morningstar changed from constructing performance-based ratings within asset classes (e.g.,
equity) to constructing them within categories (e.g., large-growth).
• No new signal about fund performance or managerial skill, but investors chase new
ratings.
Are managers skilled?
• Value added (Berk and van Binsbergen, 2015).

• Alpha does not measure skill, but value-added (α × AUM) does.
• Net alpha determined by competition between investors and not by skill of managers
(assumption: skill in short supply).
• Gross alpha is a return measure not a value measure (1% on $10bn adds more value than
10% on $1m).
• Value added reflects value that a mutual fund extracts from capital markets.
• Average mutual fund has generated about $3.2m/year.
• Majority of funds destroy value (57%), but they manage little capital.
• Skill and scale are positively related (Barras, Gagliardini, and Scaillet, 2022).
• Sample: U.S. equity funds over the period 1975 to 2019.
• Skill coefficient is positive for 83.1% of the funds and equal to 3.0% per year on average.
• Funds are highly sensitive to diseconomies of scale; on average, a
one-standard-deviation increase in size reduces the gross alpha by 1.3% per year.
• Great investment ideas are difficult to scale up: corr(skill, scale) = 0.78.
• 60% of funds produce a positive value added, $1.9m per year on average.
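The difference between a return measure and a value measure is simple arithmetic. A minimal sketch using the two cases quoted above (1% on \$10bn versus 10% on \$1m); `value_added` is an illustrative name for the product of gross alpha and assets under management.

```python
def value_added(gross_alpha, aum):
    """Dollar value extracted from markets: gross alpha times assets under management."""
    return gross_alpha * aum

large_fund = value_added(0.01, 10e9)   # 1% gross alpha on $10bn
small_fund = value_added(0.10, 1e6)    # 10% gross alpha on $1m
ratio = large_fund / small_fund
```

The large fund adds about \$100m per year against about \$100,000 for the small one, a factor of roughly 1,000, even though its gross alpha is ten times lower.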
References (1/2)
• Ang, Asset Management: A Systematic Approach to Factor Investing, Ch. 10, 16
• Pedersen, Efficiently Inefficient, Ch. 2-3
• Articles
• Barras, Gagliardini, and Scaillet, 2022, Skill, Scale, and Value Creation in the Mutual Fund Industry,
Journal of Finance
• Ben-David, Li, Rossi, and Song, 2022, What Do Mutual Fund Investors Really Care About?, Review of
Financial Studies
• Berk and Green, 2004, Mutual Fund Flows and Performance in Rational Markets, Journal of Political
Economy
• Berk and van Binsbergen, 2015, Measuring Skill in the Mutual Fund Industry, Journal of Financial
Economics
• Chevalier and Ellison, 1997, Risk Taking by Mutual Funds as a Response to Incentives, Journal of
Political Economy
References (2/2)
• Articles (cont’d)
• Cremers, Petajisto, and Zitzewitz, 2012, Should Benchmark Indices Have Alpha? Revisiting
Performance Evaluation, Critical Finance Review
• Fama and French, 2010, Luck versus Skill in the Cross-Section of Mutual Fund Returns, Journal of
Finance
• Frazzini, Kabiller, and Pedersen, 2018, Buffett’s Alpha, Financial Analysts Journal
• Wermers, 2000, Mutual Fund Performance: An Empirical Decomposition into Stock-Picking Talent,
Style, Transactions Costs, and Expenses, Journal of Finance
• The Rational Reminder Podcast
• The Arithmetic of Active Management, with Jonathan Berk and Jules van Binsbergen
• ETFs, Investor Behavior, and Hedge Fund Fees, with Itzhak Ben-David