
Electronic copy available at: http://ssrn.


Not fooled by randomness: using random
portfolios to analyze investment funds

Roberto Stein
Faculty of Economics and Business, University of Chile

August 2012

The biggest challenge in testing mutual funds for manager skill is the lack of a
probability distribution of returns under the null hypothesis of no skill. A
methodology based on randomly trading portfolios and non-parametric statistical
tests is explored, and a test of skill is proposed. Simulation is used to perform an
in-depth study of the properties of this test, and to compare its power against that
of other tests of skill based on factor model alphas. Empirical tests performed on a
sample of US equity mutual funds find evidence of skill in a small number of
managers, but show that the value added by this skill is charged away from investors
in the form of fund fees and expenses. Overall, random portfolio based measures
are found to be more powerful and easier to interpret than tests based on traditional
and bootstrapped factor model alphas.

1. Introduction

Fund performance measures, while theoretically good indicators of past overperformance,
are notoriously unreliable predictors of future performance. This makes them a poor
choice for investors who wish to allocate their capital to funds that, at least in expectation,
will overperform in the future.
Perhaps this is why in the last few years the discussion has shifted from measuring
performance to a more elusive factor: testing fund manager skill. The implicit argument is
that, while skill in no way guarantees persisting overperformance, a fund manager that has
obtained a high level of performance in the past through skill is much more likely to repeat
such performance in the future than one who was merely lucky.
While the argument is certainly sound, performance is an observable variable and skill is
not; measuring or testing for skill therefore presents important empirical problems.
Past attempts at measuring skill are based solely on factor model alphas, first as indicators
of overperformance and lately interpreted as signals of manager skill. However, as
explained in Kosowski, Timmermann, Wermers and White (2006) (henceforth KTWW),
the cross-sectional distribution of the resulting regression alphas exhibits strong deviations
from normality, which invalidates standard statistical significance tests and, much more
importantly, it is not clear what the distribution of alphas should be under the null
hypothesis of no skill. KTWW claim to solve both these problems with a bootstrap

See Kothari & Warner (2001), who critique regularly used performance measures as lacking enough power
to detect economically large magnitudes of abnormal fund performance.
See Carhart (1997) and most of the literature regarding persistence in mutual fund performance, as well as
Goetzmann (2007) on how, in any case, most of these measures are susceptible to fund manager

methodology, which they apply to a large sample of U.S. mutual funds, as do Cuthbertson,
Nitzsche and O'Sullivan (2008) in the U.K.
In the present paper I test these methodologies and find that both of them, standard and
bootstrap alpha, are prone to be fooled by randomness: that is, they falsely detect skill
in simulated samples of portfolio returns where overperformance is a result of luck. This is
not surprising, since regression alphas are also performance metrics; as such they are
highly correlated with fund performance, and thus supply little new information beyond
regularly used measures.
I develop a new methodology to test for skill, based on randomly trading portfolios as
proposed in Burns (2007), which are used to derive the empirical distribution of fund
returns under the null of no skill. This measure is superior to the alpha-based
methodologies in that it is powerful enough to distinguish skill from luck in all except the
most extreme cases of luck, is more intuitive in its interpretation since it relies on simple
fund returns, and is designed to be applied to the identification of skill in individual funds,
as opposed to the KTWW method and that of Barras, Scaillet and Wermers (2008), which
are designed to find only the proportion of skilled funds in a given market.

This paper is organized as follows. The next section explores previous measures of fund
manager skill and the results obtained with these. Section 3 details the methodologies
behind the alpha and random portfolio measures of skill. Section 4 presents the results of
tests of the power of the measures, using simulated samples of portfolios that are constructed
to obtain a performance above a benchmark through skill or luck. Section 5 presents results
of the application of both types of measures to a sample of U.S. equity funds, and Section 6
contains concluding remarks.

2. Measuring fund manager skill: current state of the art and new proposed measure
Factor model alphas have recently been pushed beyond the original Jensen performance
measure and considered evidence of fund manager skill. In general, this line of research
involves controlling portfolio returns for known risk factors, such as exposure to the
market, firm size, market-to-book ratio and momentum. If these models yield positive and
significant values of alpha, then this is considered evidence that the manager of the fund is
skillful (see Silli (2006) for a review).
However, these traditional regression alphas are of dubious value as tools to evaluate skill.
Apart from the various criticisms inherent in regression models (see Silli (2006), Ferson
and Schadt (1996), Christopherson (1998), Spiegel, Mamaysky and Zhang (2003, 2006),
and others), two critical shortcomings can be identified in these models that make any
inference gleaned from them unreliable. First, hypothesis testing of regression alphas relies
on the assumption that alphas are normally distributed. KTWW find extreme deviations
from normality in the distribution of alphas in the U.S. market, as do Cuthbertson, Nitzsche
and O'Sullivan (2008) in the U.K. Second, in order to conduct a statistical test, we need to
have the probability distribution of the variable of interest under the null hypothesis.
However, we do not know what a distribution of alphas looks like if funds are managed
with no skill. Nevertheless, a positive and significant alpha is usually taken as evidence of
skill, even though Kosowski et al. use basic statistical testing criteria to show that, in a
relatively large sample, we can expect to observe a certain number of alphas that are
positive and significant through pure random fluctuation (i.e., luck).
KTWW propose a bootstrap approach to improve the regression alpha test of skill. First
they select a sample of funds that operate in a certain market. For each fund in the sample,
they then regress its returns on a factor model, and obtain factor loadings and residuals.

Then, they repeatedly resample the residuals and, together with the factor loadings,
construct a new set of returns in which the value of alpha is set to zero. Finally, they regress
the resulting returns on the same factor model and obtain the resulting alpha. Repeating this
process a number of times, KTWW construct a null distribution of no skill for the alphas
of the full market under study (the distribution the alphas would have if their true value were
zero). They then compare the number of funds in the market for which regression alphas
are positive and significant with the number of alphas that happen to be positive and
significant in this null distribution, that is, by luck. Their conclusion is that since the
number of real positive alphas exceeds that of the lucky alphas, then some of the real
alphas must be the result of fund manager skill, as random fluctuation cannot explain them
all away.
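To make the resampling steps concrete, the following Python/NumPy sketch bootstraps the alpha of a single fund under the null of zero alpha. It is a simplified illustration, not KTWW's code. It assumes OLS estimation and resamples residuals independently with replacement, ignoring any serial dependence. The function names `ols_alpha` and `bootstrap_null_alphas` are hypothetical.

```python
import numpy as np

def ols_alpha(excess_ret, factors):
    """OLS of excess returns on factors plus an intercept; returns (alpha, betas, residuals)."""
    X = np.column_stack([np.ones(len(excess_ret)), factors])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    resid = excess_ret - X @ coef
    return coef[0], coef[1:], resid

def bootstrap_null_alphas(excess_ret, factors, n_boot=1000, seed=0):
    """Residual bootstrap of a fund's alpha with the true alpha forced to zero (sketch)."""
    rng = np.random.default_rng(seed)
    _, betas, resid = ols_alpha(excess_ret, factors)
    fitted_no_alpha = factors @ betas              # factor-driven returns, intercept set to zero
    T = len(excess_ret)
    null_alphas = np.empty(n_boot)
    for rep in range(n_boot):
        idx = rng.integers(0, T, size=T)           # resample residuals with replacement
        pseudo = fitted_no_alpha + resid[idx]      # pseudo-returns whose true alpha is zero
        null_alphas[rep], _, _ = ols_alpha(pseudo, factors)
    return null_alphas
```

A bootstrap p-value for the fund can then be read as the share of `null_alphas` at least as large as its estimated alpha. KTWW work with the full cross-section of funds; this sketch covers only the single-fund building block.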
The bootstrap methodology addresses the problem of normality, as the testing is done on
the basis of empirical distributions and there is no assumption of a parametric one. Also, by
forcing alpha to be zero in the iteration process, this bootstrap process is one way to obtain
a distribution under the null of no skill.
However, the methodology still suffers from serious drawbacks. First, the characterization
of alpha as a measure of skill, even if correctly estimated, is still a matter of interpretation
as opposed to simple returns, which are unambiguous in their origin and interpretation.
Second, this measure is of an absolute nature and, as is also the case with the standard
alpha analysis described above, there will inevitably be a high correlation between
performance and alphas. That is, funds that overperform will tend to have positive and
significant alphas, irrespective of manager skill. This is because factor models regress fund
returns on a number of factors, and these factors are equal for all funds tested. In essence,
the factors become a type of benchmark that is equally applied to each fund to correct

returns for exposure to certain risks. The consequence is that inference is unreliable: funds
that perform well need not be managed by skillful managers since luck plays a big role in
performance. In fact, as is shown in Section 4, traditional and bootstrapped measures of
alpha are likely to erroneously reject the null of no skill in the presence of lucky
portfolios, which obtain a high level of performance due to luck. These measures are
therefore easily fooled by randomness. On the other hand, the possibility of finding a
skillful manager whose track record shows poor results is almost zero, and so the
possibility of using the alpha measure as a diagnostic tool to help identify shortcomings and
improve performance is minimal.
Moreover, as a matter of application and interpretation of results, in order to estimate the
bootstrap model of KTWW the alphas of all funds in the market must be estimated and
bootstrapped, and inference is obtained by comparing real fund alphas to a ranked matrix of
bootstrapped alphas (that is, a matrix where the resulting bootstrap alphas have been
ordered from the highest to smallest). Thus, the KTWW methodology requires data from all
funds in the market, even if the study is focused on a single fund. While Cuthbertson et al.
claim that they are able to study individual funds using this methodology, they still require
the full market dataset, and their inference is based on comparing each real fund's alpha
against a distribution of bootstrapped alphas that corresponds to the real fund's performance
rank. That is, the best-performing fund's alpha is compared to the distribution of the highest
bootstrapped alphas, the second-best fund to the second-best distribution, etc. The
underlying assumption of this test is that the best performing fund will always obtain the
best possible bootstrapped alphas, the second best fund will obtain the second best set of
alphas, and so on for the rest of the funds as ranked by their performance.
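The rank-matching step can be expressed compactly. In this hypothetical sketch, `null_alpha_matrix` holds one row per bootstrap replication and one column per fund. Each row is sorted, and the k-th best real alpha is compared with the distribution of the k-th best bootstrapped alphas across replications.

```python
import numpy as np

def rank_matched_pvalues(real_alphas, null_alpha_matrix):
    """
    real_alphas: shape (N,), one estimated alpha per fund.
    null_alpha_matrix: shape (B, N), bootstrapped alphas under alpha = 0,
        one row per bootstrap replication, one column per fund.
    Returns fund indices ordered from best to worst real alpha and, for the
    k-th ranked fund, the share of replications whose k-th highest null alpha
    is at least as large as that fund's alpha.
    """
    real_alphas = np.asarray(real_alphas, float)
    order = np.argsort(real_alphas)[::-1]                 # best fund first
    ranked_real = real_alphas[order]
    null = np.asarray(null_alpha_matrix, float)
    ranked_null = -np.sort(-null, axis=1)                 # each replication sorted high to low
    pvals = (ranked_null >= ranked_real).mean(axis=0)     # column k: k-th best vs k-th best
    return order, pvals
```

Note that the sketch makes the assumption criticized in the text explicit. The k-th ranked real fund is only ever compared with the k-th ranked bootstrap alphas.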

The lack of a known distribution under a certain null hypothesis is addressed in a general
framework for investment fund analysis in Dawson and Young (2003). They argue that our
inability to carry out experiments with control groups makes obtaining these distributions
of the null hypothesis a complicated task, and advocate the use of samples of random
portfolios in a Monte Carlo experiment setting to generate them. Burns (2007) notes that
constrained random portfolios, that is, portfolios that are allowed to trade randomly but
within the same bounds faced by real fund managers, constitute a control group for a
measure of skill since by construction there is no skill in their trading decisions. The fund
manager's constraints or restrictions can be imposed by the firm that offers the funds, for
example in terms of the prospectus and investment goals, or self-imposed trading behavior
that the manager maintains over his career. These restrictions may be in the form of a
subset of the universe of assets in which the manager is allowed to invest (cash, fixed
income, equity and derivatives, value vs. growth stocks, small vs. large firms, etc.),
acceptable levels of risk (minimum and maximum; expressed as standard deviation, VaR,
benchmark risk, etc.), turnover ratio, number of assets in the portfolio, etc.
I will henceforth refer to the general use of constrained random portfolios as an analysis
tool as Constrained Random Portfolio Analysis, or CRPA.
The portfolio returns resulting from randomly trading portfolios are obtained purely by
chance, with no value-adding (or subtracting!) intervention, and thus represent a subset of
the state-space of feasible portfolios that could be attained by the fund manager. A large
enough sample of CRPA portfolios will therefore generate the probability distribution of
every level of performance potentially attainable by the fund manager, within the
constraints she faces. Real fund returns are then compared to the distribution obtained from
the random portfolios. Rejection of the null of no skill depends on a chosen significance

level: for a manager to be considered skillful, her fund's return should exceed a certain
percentile of the random fund distribution, where that percentile corresponds
to the desired level of significance. In other words, a manager is considered skillful if she
is able to do better than a certain number of random portfolios.

Correctly applied, CRPA addresses all the arguments against previous measures of fund
manager skill. Being a non-parametric approach, it sidesteps all the theoretical and
econometric problems associated with factor models. The analysis is strictly individual: one
fund can be analyzed with no need for data on other funds (or on the market, macroeconomic
variables, etc.). In this sense, the amount of data required for the analysis is lower than
for other measures of skill, and there is no peer group or relevant benchmark decision to
make. This means as well that the measure is relative and specific to the fund being
tested. Finally, CRPA introduces a flexible and powerful framework that can be used in
many other applications beyond testing for manager skill.


3. Empirical Methodology
Factor model alphas, when positive and significant, are considered signs of fund manager
skill or, at least, abnormal performance not attributable to known sources of risk. The three
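most widely used specifications to estimate these alphas are Jensen's alpha (Jensen 1968),

$$r_{p,t} - r_{f,t} = \alpha_p + \beta_{p,M}\,(r_{M,t} - r_{f,t}) + \varepsilon_{p,t} \qquad (1)$$

the Fama and French three-factor model (Fama & French 1993),

$$r_{p,t} - r_{f,t} = \alpha_p + \beta_{p,M}\,(r_{M,t} - r_{f,t}) + \beta_{p,HML}\,HML_t + \beta_{p,SMB}\,SMB_t + \varepsilon_{p,t} \qquad (2)$$

and Carhart's four-factor model (Carhart 1997),

$$r_{p,t} - r_{f,t} = \alpha_p + \beta_{p,M}\,(r_{M,t} - r_{f,t}) + \beta_{p,HML}\,HML_t + \beta_{p,SMB}\,SMB_t + \beta_{p,UMD}\,UMD_t + \varepsilon_{p,t} \qquad (3)$$

where $r_{p,t}$ is the return of portfolio $p$ at time $t$, $r_{f,t}$ is the risk-free rate,
$\alpha_p$ is the regression alpha, $\beta_{p,M}$ is the fund's beta with respect to the market,
$r_{M,t}$ is the market return at time $t$, $\beta_{p,HML}$ is the fund's beta with respect to
Fama & French's High-Minus-Low factor $HML_t$, $\beta_{p,SMB}$ is the fund's beta with respect
to Fama & French's Small-Minus-Big factor $SMB_t$, $\beta_{p,UMD}$ is the fund's beta with
respect to Carhart's momentum factor $UMD_t$, and $\varepsilon_{p,t}$ is an error term.
The null hypothesis of no skill is rejected if a fund's regression alpha is found to be positive
and significant.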
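The following is a minimal sketch of the standard test. It assumes monthly return arrays that are aligned in time and uses classical (homoskedastic) OLS standard errors. Under those assumptions, the Carhart alpha and its t-statistic can be computed with NumPy as follows. `carhart_alpha` is a hypothetical helper, not part of any library.

```python
import numpy as np

def carhart_alpha(ret, rf, mkt, smb, hml, umd):
    """Alpha and its t-statistic from the Carhart four-factor regression, by OLS."""
    ret, rf, mkt = (np.asarray(v, float) for v in (ret, rf, mkt))
    y = ret - rf                                              # fund excess return
    X = np.column_stack([np.ones(len(y)), mkt - rf, hml, smb, umd])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]                                 # T minus 5 estimated coefficients
    s2 = resid @ resid / dof                                  # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)                         # classical OLS covariance matrix
    alpha, se_alpha = coef[0], np.sqrt(cov[0, 0])
    return alpha, alpha / se_alpha
```

The null of no skill would then be rejected at the 5% level when the t-statistic exceeds the one-sided critical value of a t distribution with T - 5 degrees of freedom (about 1.67 for T = 72).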
A criticism of these models is that the betas are unconditional and static. Conditional
models with time-varying betas have been developed and estimated (see Silli (2006)) in the
hope of obtaining more precise estimates of the regression coefficients. However, as far
as skill testing is concerned, KTWW find that inference obtained from conditional and
unconditional models is virtually the same. Therefore, only unconditional models are used
in the tests that follow.
Standard factor model analysis generally identifies the alpha (or regression intercept) as
evidence of abnormal return or fund manager skill (if positive and significant). However,
standard models are incapable of differentiating between positive alphas obtained by
skillful managers, and those that result from sheer luck (good or bad), as unlikely events
that can nevertheless be observed at the tails of the distribution.
The Bootstrap Alpha technique improves upon the standard analysis. Applied first in
Kosowski, Timmermann, Wermers and White (2006), then replicated in Cuthbertson,
Nitzsche and O'Sullivan (2007) and (with small variations) in Fama and French (2010), this
technique seeks to obtain a distribution of factor model alphas from a bootstrap process
where the true alpha has been set to zero. Thus, this distribution will show the probabilities
or expected frequencies of observed positive and negative alphas under the null that there
exist no managers with skill. This distribution is then compared to the distribution of real
investment fund alphas and, in its simplest form, the number of funds that fall in the
extreme quantiles in one distribution can be compared to those of the other. For example,
one statement of KTWW's analysis reads: "Panel A indicates that nine funds should have an
alpha estimate higher than 10% per year by chance, whereas in reality, 29 funds achieve this
alpha." This is taken as evidence that the market must contain at least some
funds that obtain positive alphas by dint of their managers' skill.
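The comparison quoted above reduces to counting. Assume a matrix of bootstrapped null alphas with one row per replication and one column per fund. One natural reading of "should have an alpha estimate higher than X by chance" is then the average count per replication, as in this hypothetical helper. The threshold and the averaging rule are assumptions of the sketch.

```python
import numpy as np

def tail_counts(real_alphas, null_alpha_matrix, threshold):
    """Observed number of funds with alpha above `threshold`, and the number expected by luck."""
    null = np.asarray(null_alpha_matrix, float)
    expected_by_luck = float((null > threshold).sum(axis=1).mean())   # average count per replication
    observed = int((np.asarray(real_alphas, float) > threshold).sum())
    return observed, expected_by_luck
```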

CRPA: A non-parametric alternative to factor model alphas

Using the software package PortfolioProbe, samples of randomly trading funds can be
constructed which, while devoid of skill, may still be bound by user-defined constraints.
A sufficiently large number of these random funds constitute the sample which is then used
as the control group or distribution under the null to test fund manager skill, and can be
used to perform other types of analyses.
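PortfolioProbe's own interface is not reproduced here. The following toy generator in Python only illustrates what a "randomly trading but constrained" fund means. It holds a fixed number of equally weighted assets and swaps a fixed number of them at random each period, a crude stand-in for asset-count and turnover constraints. The function names, the equal weighting and the swap rule are all assumptions of this sketch, not features of PortfolioProbe.

```python
import numpy as np

def random_constrained_portfolio(asset_returns, n_assets, n_swaps, rng):
    """
    Toy random portfolio: always holds exactly `n_assets` equally weighted names
    and replaces `n_swaps` randomly chosen holdings after each period.
    asset_returns: array (T, N) of simple returns for the eligible universe.
    Returns the portfolio's period returns, shape (T,).
    """
    T, N = asset_returns.shape
    held = rng.choice(N, size=n_assets, replace=False)
    port = np.empty(T)
    for t in range(T):
        port[t] = asset_returns[t, held].mean()                    # equal weights
        if n_swaps > 0:
            not_held = np.setdiff1d(np.arange(N), held)
            out = rng.choice(n_assets, size=n_swaps, replace=False)   # positions to sell
            held[out] = rng.choice(not_held, size=n_swaps, replace=False)  # random buys
    return port

def random_portfolio_sample(asset_returns, n_portfolios, n_assets, n_swaps, seed=0):
    """Stack `n_portfolios` independent toy random portfolios into an (M, T) array."""
    rng = np.random.default_rng(seed)
    return np.array([random_constrained_portfolio(asset_returns, n_assets, n_swaps, rng)
                     for _ in range(n_portfolios)])
```

Because every decision is a random draw, any return these portfolios earn is, by construction, free of skill.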
To obtain the relevant distribution under the null of no skill, Burns (2007) considers using
the holding period return for each portfolio in a large sample of randomly trading funds.
That is, if 1,000 portfolios are generated, the holding period return of each portfolio is
calculated, and the distribution is based on the cross-section of the 1,000 holding period
returns thus obtained. The skill test then consists of comparing the real fund's return with the
ranked random portfolio returns. If the real return attains a certain percentile, for example if
it is better than 95% of the random returns, then we reject the null of no skill.
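The cross-sectional test just described can be sketched as follows. The sketch assumes simple monthly returns compounded into holding period returns and a 95% threshold. The helper names are hypothetical.

```python
import numpy as np

def holding_period_return(period_returns):
    """Compound simple period returns into one holding period return (time on the last axis)."""
    return np.prod(1.0 + np.asarray(period_returns, float), axis=-1) - 1.0

def burns_test(fund_returns, random_returns, level=0.95):
    """
    fund_returns: shape (T,); random_returns: shape (M, T), one row per random portfolio.
    Returns the share of random portfolios the fund beats and whether that share
    reaches `level`, i.e. whether the null of no skill is rejected.
    """
    fund_hpr = holding_period_return(fund_returns)
    random_hpr = holding_period_return(random_returns)
    beaten = float(np.mean(random_hpr < fund_hpr))
    return beaten, bool(beaten >= level)
```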
Figure 1 shows the probability density of a sample of 1,000 random funds holding period
returns, with a (dashed) line depicting the 95th percentile threshold.
[Figure 1 about here]

The plot was made with a sample of 1,000 random funds trading S&P 500 listed securities
for 6 years, from 2005 to 2010. The constraints imposed on these portfolios were on
turnover and portfolio asset count (the number of different assets that could be contained in
the portfolio). The values used for these constraints are consistent with the mean of these
values (turnover and asset count) for real funds currently operating in the market.
Therefore, a fund manager at the helm of a fund trading these securities and operating
under constraints similar to those simulated would have to obtain a return of roughly 80%
or better over the 6-year period for the null of no skill to be rejected.

PortfolioProbe is, strictly speaking, a library of functions written for the R language.
The data were obtained from the CRSP Mutual Fund database. The sample employed corresponds to actively
managed funds at least 90% of whose net asset value is composed of stocks listed in the S&P 500 index.
I propose an improvement to this methodology, in which the null distribution used for
testing is that of the time series of returns of a single random portfolio, as opposed to the
cross-sectional approach used in Burns (2007). Using a sample of random portfolios
ordered by a certain criterion (which could be, for example, mean return), I first set the
critical value for a percentile, then choose the random portfolio which occupies the position
of that percentile in the ordered sample.
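Selecting the percentile portfolio is a simple sort. This sketch ranks random portfolios by mean monthly return, the example criterion mentioned above. It maps the 95th percentile of 1,000 portfolios to the 50th best one, as in the example given in Section 4. The function name is hypothetical.

```python
import numpy as np

def percentile_portfolio(random_returns, pct=95):
    """
    random_returns: shape (M, T). Rank random portfolios by mean period return
    (ascending) and return the time series of the one at the `pct` percentile.
    """
    means = np.asarray(random_returns, float).mean(axis=1)
    order = np.argsort(means)                               # worst to best
    M = len(means)
    pos = min(int(round(pct / 100 * M)), M - 1)             # 95th of 1,000 -> index 950, the 50th best
    return random_returns[order[pos]]
```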
Hypothesis testing is now based on the concept of stochastic order. We are interested in
testing whether the distribution of returns of the managed fund is stochastically greater than
that of the chosen percentile random fund.
A random variable A can be said to be stochastically greater than another random variable
B if

$$\Pr(A > x) > \Pr(B > x) \quad \forall x \in (-\infty, +\infty) \qquad (4)$$
By this definition we could say that A is 'bigger' than B, but the financial interpretation is
far more interesting: the probability that fund A obtains a return higher than x is higher than
that of fund B attaining a similar performance. While far less powerful than the concept of
stochastic dominance, stochastic order can guide decision making in the sense that it could
point towards a fund manager who has a higher probability of obtaining a certain level of
return in future realizations. This is consistent with the argument in favor of skill over luck
described in the first section of this article.
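The definition above can be inspected descriptively by comparing empirical survival functions on the pooled sample values. Both empirical survival functions equal zero at the largest observed value. The strict inequality can therefore never hold everywhere in a finite sample. This hypothetical helper checks the usual weak form instead: at least as large everywhere and strictly larger somewhere. It is a descriptive check, not a significance test; formal inference relies on the MWW test.

```python
import numpy as np

def empirical_survival(sample, x):
    """Share of observations in `sample` strictly greater than each value in x."""
    sample = np.sort(np.asarray(sample, float))
    return 1.0 - np.searchsorted(sample, x, side="right") / len(sample)

def looks_stochastically_greater(a, b):
    """Weak empirical stochastic order: S_a(x) >= S_b(x) on the pooled grid, strictly somewhere."""
    grid = np.union1d(a, b)
    sa, sb = empirical_survival(a, grid), empirical_survival(b, grid)
    return bool(np.all(sa >= sb) and np.any(sa > sb))
```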
In order to test stochastic order between two distributions, the null distribution and the real
fund's time series of returns, the non-parametric Mann-Whitney U test (also called the
Mann-Whitney-Wilcoxon test, henceforth referred to as MWW) is used. The MWW test is used to assess

See, for example, Shaked and Shanthikumar (1994)

whether one of two samples of independent observations tends to have larger values than
the other. The test involves estimation of the U statistic, which is calculated by first
ranking the pooled values of both samples, then summing the ranks of one sample and
subtracting $n_1(n_1+1)/2$, where $n_1$ is that sample's size. The one-tailed test's
alternative hypothesis can be stated as the probability that an observation from sample X
is higher than one from sample Y being greater than 0.5, or

$$\Pr(X > Y) + \tfrac{1}{2}\Pr(X = Y) > 0.5 \qquad (5)$$
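For concreteness, a one-sided MWW test can be written in a few lines. This sketch assumes the large-sample normal approximation with average ranks for ties. It applies no tie or continuity correction to the variance. Library routines such as SciPy's `scipy.stats.mannwhitneyu` with `alternative='greater'` handle these corrections. `mww_greater` is a hypothetical name.

```python
import math
import numpy as np

def mww_greater(x, y):
    """
    One-sided MWW test of H1: P(X > Y) + 0.5 P(X = Y) > 0.5.
    Returns (U, U / (n1 * n2), p-value from the normal approximation).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    pooled = np.concatenate([x, y])
    ranks = np.empty(len(pooled))
    ranks[pooled.argsort()] = np.arange(1, len(pooled) + 1)
    for v in np.unique(pooled):                      # average ranks over tied values
        tied = pooled == v
        ranks[tied] = ranks[tied].mean()
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2         # U statistic for sample x
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))      # upper-tail standard normal probability
    return u, u / (n1 * n2), p_value
```

The ratio `U / (n1 * n2)` is the sample estimate of the probability in the alternative hypothesis, which makes the output easy to read alongside the p-value.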
For robustness purposes, tests of location can also be used. A standard parametric t-test of the
difference in means of both distributions is employed, as well as a non-parametric alternative based
on permutation. This last test consists of calculating a certain statistic, for example, the difference
between the sample means. Then, the elements of both samples are mixed together and repeatedly
resampled. In each iteration, two vectors are obtained with the same number of observations as each
original sample, but with elements drawn from the mixed dataset, i.e., each can contain observations from
either sample. The relevant statistic is calculated, and the process is repeated. If the statistic of
interest is the difference between the means, then after each iteration a difference of means is
calculated between the resulting vectors. A large number of iterations will generate an empirical
distribution of the difference between the means of the vectors, under the null hypothesis that both
original samples were drawn from the same distribution. If the real difference in means is large
enough (higher than a critical value), then the null is rejected and both samples are assumed to come
from different distributions.
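The permutation procedure described above can be sketched as follows for a one-sided alternative, where the fund mean exceeds that of the null portfolio. The number of permutations, the seed and the +1 adjustment in the p-value are choices of this sketch rather than details taken from the paper.

```python
import numpy as np

def permutation_test_mean_diff(x, y, n_perm=10_000, seed=0):
    """
    One-sided permutation test of H1: mean(x) > mean(y).
    Pools both samples, reshuffles them n_perm times, splits each shuffle into
    pseudo-samples of the original sizes and records their difference in means.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    n1 = len(x)
    diffs = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(pooled)
        diffs[i] = shuffled[:n1].mean() - shuffled[n1:].mean()
    # share of permuted differences at least as large as the observed one
    p_value = (1 + np.sum(diffs >= observed)) / (n_perm + 1)
    return observed, p_value
```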
This methodology improves the quality of the test, as it analyzes the managed fund's full
distribution of returns and not just the overall result achieved over a period of time. The
most important consequence of this analysis is that funds that achieve impressive results by
luck are much more likely to fail the test. Indeed, as is shown in the simulation tests, this
version of the CRPA skill test is far less likely to be fooled by randomness than any of the
previously described measures.

4. Power of the Skill Tests

In the previous sections we encountered three measures or tests of fund manager skill,
standard regression alphas, bootstrap alphas and CRPA, detailed their methodologies, and
listed some of their potential shortcomings. In this section I directly test each measure in
terms of its power to detect skill. Moreover, and more importantly, I test the measures'
potential to differentiate between skill and luck.
Since the test statistics, other than that of the standard alpha test, do not have parametric
distributions, analytical expressions of the tests' power are not obtainable. Hence, I proceed to estimate
the power of each test via simulation.
Power curves for each skill test are built by applying the test of skill to simulated samples
of portfolios which are constructed to exhibit skill or to be simply lucky. This is
accomplished by adding an extra rate of monthly return to the time series of returns of a
baseline vector of returns. For the tests reported, the baseline vector is obtained from a
sample of CRPA random portfolios. These portfolios trade S&P500 stocks, and do so
constrained to the maximum and minimum levels of turnover and number of assets in each
portfolio observed in a sample of real U.S. mutual funds that invest primarily in S&P500
stocks. The average monthly returns for these random portfolios are calculated, and then
the portfolios are ranked by this variable from smallest to largest. Then, the baseline for the
power test samples is chosen as a percentile from these ranked portfolios. For example, if
the 95th percentile is chosen and 1,000 portfolios were generated, then the baseline is a
vector consisting of the time series of returns of the 50th best-performing portfolio.
Let the baseline portfolio be referred to as b; its returns are $r_{b,i}$, where i is the time
period to which this return corresponds ($i = 1, \dots, n$, where n is the total number of time
periods under study), and let the full vector of returns be $\mathbf{r}_b$. This vector will have a
mean monthly return, $\bar{r}_b = \frac{1}{n}\sum_{i=1}^{n} r_{b,i}$, and a standard deviation, $\sigma_b$.
The samples used to construct the power curves are composed of 1,000 portfolios simulated
for each case, skill and luck. To illustrate how these samples are generated, let $\delta$ be a
predetermined rate of added return, and $e$ a noise term, where $e$ is a normally distributed
random variable with zero mean. Finally, define $lp$ as the number of lucky periods to be
simulated in the luck portfolio sample.
Thus, the skill sample is generated by drawing vectors of $e$ and producing portfolio time
series of returns of the form

$$\mathbf{r}_s = \mathbf{r}_b + \boldsymbol{\delta}_n + \mathbf{e} \qquad (8)$$

where $\boldsymbol{\delta}_n$ is a vector of length n, and each element of the vector is equal to
$\delta/n$. Thus, the resulting vectors of returns represent a single (smooth) monthly increase in
return with respect to the baseline, plus a noise or randomizing term with zero mean.
On the other hand, the sample of lucky portfolios is constructed as

$$\mathbf{r}_l = \mathbf{r}_b + \boldsymbol{\delta}_{lp} + \mathbf{e} \qquad (9)$$

where $\boldsymbol{\delta}_{lp}$ is a vector of length n whose elements contain lp instances of
$\delta/lp$, the rest being zero, with the position of the non-zero elements chosen randomly for each
portfolio in the sample. So, for example, if lp is equal to 1, we are simulating a lucky fund
manager that is able to match the performance of a skillful manager with a single lucky
break, that is, a large added return in a single month, while during the rest of the time
period the returns of his fund are, in expectation, no different from the baseline. The
resulting samples have properties that make them ideal for the power tests (see Appendix I).

The power test is carried out for various given levels of $\delta$, ranging from zero (no skill) to
4% per month, a large added return that ensures that at that end of the range the power
curve converges to a probability of 1. Also, the number of lucky periods is allowed to
vary, and can take values of 1 (the full extra return added to a single month's return), 3, 5
and 10.
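The construction of the skill and luck samples described above can be sketched in Python as follows. The noise standard deviation is an assumption. The paper's exact choice is not recoverable from this text, so the sketch defaults to the baseline's sample standard deviation and lets the caller override it. The function name is hypothetical.

```python
import numpy as np

def simulate_skill_and_luck(r_b, delta, lp, n_portfolios=1000, noise_sd=None, seed=0):
    """
    r_b: baseline monthly return vector (length n).
    delta: total added return, spread evenly (skill) or over lp random months (luck).
    noise_sd: standard deviation of the zero-mean normal noise term e. ASSUMPTION:
        defaults to the baseline's sample standard deviation.
    Returns two arrays of shape (n_portfolios, n): skilled and lucky return paths.
    """
    rng = np.random.default_rng(seed)
    r_b = np.asarray(r_b, float)
    n = len(r_b)
    if noise_sd is None:
        noise_sd = r_b.std(ddof=1)
    skill_add = np.full(n, delta / n)                         # delta/n added every month
    skill = r_b + skill_add + rng.normal(0.0, noise_sd, size=(n_portfolios, n))
    luck_add = np.zeros((n_portfolios, n))
    for k in range(n_portfolios):
        months = rng.choice(n, size=lp, replace=False)        # lp randomly placed lucky months
        luck_add[k, months] = delta / lp
    luck = r_b + luck_add + rng.normal(0.0, noise_sd, size=(n_portfolios, n))
    return skill, luck
```

Both samples receive the same total added return. They differ only in how that return is spread over time, which is exactly the distinction between skill and luck the power tests rely on.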
It should be noted that these samples are consistent with the previously given definition of
skill vs. luck in investment funds: the skillful manager may have good and bad periods,
but overall she should be able to obtain a consistent performance that is better than the
market average. On the other hand, a lucky manager may be able to match (or surpass) the
performance of a skillful manager, but does so because of a relatively small number of
lucky breaks, or periods of exceptionally good returns, which have a small probability of
being repeated going forward.
The plots that follow show the resulting power curves for each test: standard alpha,
bootstrap alpha, and CRPA in its two versions, the Burns cross-sectional measure and the
percentile time-series measure.
For both regression alpha tests (standard and bootstrap) only the results based on the
Carhart four-factor model are shown. This is done to keep the figures legible, as results
stemming from other models (one and three factor) are qualitatively equal and are available
upon request. For the same reason, percentile distribution (time series) based CRPA testing
is done using only the MWW test, as t-test and permutation test results are very similar.
Finally, in order to simplify the figures, the number of power curves plotted is further reduced
by introducing the concept of net power. Since detecting skill where none exists is, in fact,
a failure of the test employed, the power associated with this type of outcome is deducted
from the estimated power of detecting skill in samples that do have it. Thus, net power is
defined as the power to detect skill in the skillful sample minus the rate at which the test
detects "skill" in the lucky sample. This measure of net power is
also consistent with the aim of skill tests, which is to separate skill from luck.
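Consider any skill test that returns reject or do not reject for a simulated portfolio. For such a test, the net power at one level of added return is a difference of two rejection rates. In this sketch, `test` is a hypothetical callable standing in for any of the tests compared here (standard alpha, bootstrap alpha, Burns or Percentile CRPA).

```python
import numpy as np

def rejection_rate(test, samples, baseline):
    """Share of simulated portfolios for which `test(portfolio, baseline)` rejects no skill."""
    return float(np.mean([bool(test(p, baseline)) for p in samples]))

def net_power(test, skill_samples, luck_samples, baseline):
    """Power to detect skill minus the spurious rate of 'detecting skill' in lucky portfolios."""
    return (rejection_rate(test, skill_samples, baseline)
            - rejection_rate(test, luck_samples, baseline))
```

Repeating this over a grid of added returns traces out one net power curve per test.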
Figure 3 shows the power curves where the luck-derived returns are constructed with a
single lucky month in a six-year period. This is the most extreme case of luck, and
should be the easiest for the tests to identify as such. As can be observed, the Burns measure
lacks power when applied to skillful and lucky samples with similar levels of return. In fact, its net
power is close to zero for any level of added return. This result is due to a disproportionately large
power component estimated for the luck sample, which in net terms eliminates the equally large power
for detecting skill in the skillful sample that would otherwise trump the other measures. The other
measures fare better, exhibiting similar levels of net power, with the standard alpha and Percentile
CRPA tests showing very low tendencies to be fooled by these
lucky funds.
[Figure 3 about here]
As the number of lucky periods increases, we can see that the power curves based on
simulated skillful samples remain virtually the same, but the likelihood that the measures of
skill will mistakenly take a lucky fund to be skillful increases. This affects all measures by
severely reducing their net power. However, the effect is least noticeable for the CRPA /
MWW test, which becomes the most powerful test once luck is distributed over 3 lucky
periods, and remains so for all power evaluations that follow.
Figure 4 shows a generalized deterioration of power for all tests, with the extra return factor
for lucky portfolios now spread over 5 periods. As mentioned above, the MWW test is
still relatively powerful, and remains the best alternative.
[Figure 4 about here]
Once the number of lucky periods reaches 10, out of 72 total trading periods (6 years' worth
of data for each sample), all net power curves show marked deterioration, with the standard
alpha, bootstrap alpha and Burns test net power essentially zero, as depicted in Figure 5.
While the Percentile CRPA test is still the best, its net power never rises above
approximately 30%, making its use in these situations questionable.
[Figure 5 about here]
This can be explained again from the point of view of our definitions of skill and luck. As
the number of extra return time periods increases, the boundary between luck and skill
starts to blur. A fund with a relatively large number of good returns in a time series of
fixed length cannot be easily dismissed as lucky, as this might be evidence of a skillful
manager at the helm.
Finally, a point could be made that the testing framework is flawed, since by construction
the samples of lucky funds have the same expected return, but higher volatility than those
of skillful portfolios. Thus, one sample stochastically dominates the other, and the
identification of skillful portfolios could be easily made by applying most measures of risk-
adjusted returns (for example, the Sharpe ratio). However, the argument made here is that
in detecting skill it is not the global rate of return that matters, but how that return is
attained over a period of time. To test the robustness of the Percentile-CRPA test to the
stochastic dominance point, I next perform the power tests using samples with no stochastic
dominance: while the added volatilities remain the same, the return factor added to the
lucky portfolios is larger than that added to the skillful portfolios. Tests are performed
where the added return factor for the lucky sample is increased with respect to the
endowment of the skillful sample by factors of 20%, 40% and 60%.
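To make the two kinds of endowment concrete, here is a minimal sketch under assumed details: a skillful fund receives a total extra return spread evenly over all 72 periods, while a lucky fund receives the same total, optionally scaled up by 20%, 40% or 60%, concentrated in k randomly chosen periods. The function `extra_return_vectors`, its parameters and the 0.3 total are hypothetical illustrations, not the simulation code used in the paper.

```python
import numpy as np


def extra_return_vectors(total_extra, n_periods=72, k_lucky=5,
                         lucky_scale=1.0, seed=0):
    """Hypothetical (skillful, lucky) added-return vectors.

    Skillful: total_extra spread evenly over every period (zero variance).
    Lucky: lucky_scale * total_extra concentrated in k_lucky random periods.
    """
    rng = np.random.default_rng(seed)
    skillful = np.full(n_periods, total_extra / n_periods)
    lucky = np.zeros(n_periods)
    lucky_periods = rng.choice(n_periods, size=k_lucky, replace=False)
    lucky[lucky_periods] = lucky_scale * total_extra / k_lucky
    return skillful, lucky


for scale in (1.0, 1.2, 1.4, 1.6):
    s, l = extra_return_vectors(0.3, k_lucky=5, lucky_scale=scale)
    print(f"scale={scale:.1f}  mean ratio={l.mean() / s.mean():.2f}  "
          f"sd skillful={s.std():.4f}  sd lucky={l.std():.4f}")
```

With `lucky_scale=1.0` the two endowments have the same mean but only the lucky one has positive dispersion, which is the stochastic-dominance setting discussed above; scales above 1.0 also give the lucky fund a higher mean, which corresponds to the no-dominance variant.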
Figure 6 shows the most extreme case simulated: the net power curves for all tests when the sample of lucky portfolios has been endowed with a return factor 60% higher than that of the skillful portfolios, spread over 5 periods of time. As can be observed, the Percentile-CRPA measure remains unaffected and able to separate skill from highly performing lucky funds, while the other tests have net power measures that fall below zero, indicating that these tests are swayed by the extra return of the lucky funds and attribute skill to these portfolios more often than to truly skillful ones.
[Figure 6 about here]


5. Empirical Tests of Measures of Skill
5.1 Sample of Investment Funds and Required Data
While most performance measures require only portfolio returns, CRPA needs a wider range of data for its implementation. The goal is to obtain as complete a picture as possible of the constraints faced by the fund manager in her decision-making process, in order to integrate as many of these constraints as possible into the CRPA portfolio formation process.
One of the first, and most important, explicit constraints placed on any fund manager is the universe of securities eligible to be part of the fund, which is a subset of the securities available in the market. This constraint is clearly defined in the fund's prospectus, and is an integral part of the manager's mandate and investment strategy.
While CRPA can be applied to virtually any kind of investment fund, to generate the
random portfolios we require a dataset containing the time series of returns of all assets
eligible to be part of the portfolio. Thus, for example, if we wished to analyze a corporate
bond portfolio, any and all bonds that the manager might conceivably invest in must be
included in this dataset, so that random portfolios could eventually contain these assets as
well. While firms tend to have a single stock listed on one exchange, they can (and do) have
various issues of bonds trading in the markets, which invariably makes the amount of data
required far larger. The same can be said for funds which are allowed to trade derivatives
and other assets (and even simple equity funds, which can trade stocks listed in various
markets, worldwide). Again, while conceptually the process is the same, the practical
aspects become more complicated. In order to simplify the data gathering and random
portfolio generation process, I choose to analyze a sample of funds that invest primarily in
stocks of firms listed in the S&P500 index.

Funds are selected that consistently keep at least 90% of their assets invested in S&P500 stocks (i.e., are mostly invested in these stocks) throughout the period under study, which spans 6 years, from 2005 to 2010.
The data collected includes the monthly returns of S&P500 stocks, each fund's monthly returns, and yearly measures of turnover and asset count (number of assets in the portfolio). Table I contains the sample fund names and Nasdaq tickers, as well as the average values observed for turnover and asset count (which are used as random portfolio generation constraints, in conjunction with the sample of S&P500 stocks) for the 2005-2010 period. Investment fund quarterly holdings are collected as well. All data are obtained from CRSP.
[Table I about here]
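As a rough illustration of how asset-count and diversification constraints can enter random portfolio generation, the sketch below draws long-only portfolios with a random number of assets in a permitted range and Dirichlet weights capped at a maximum weight. It is a simplified stand-in under stated assumptions: the turnover constraint is not modelled, the 20-35 asset range and 10% cap are hypothetical values, stock returns are simulated, and the helpers `cap_weights` and `random_portfolio` are illustrative names. The reference list cites the PortfolioProbe R package; this sketch does not reproduce that package's algorithm.

```python
import numpy as np


def cap_weights(w, cap, tol=1e-12):
    """Cap every weight at `cap` and redistribute the excess proportionally
    among positions still below the cap (requires len(w) * cap >= 1)."""
    w = np.asarray(w, dtype=float).copy()
    if len(w) * cap < 1.0 - tol:
        raise ValueError("too few assets for this maximum weight")
    while True:
        over = w > cap + tol
        if not over.any():
            return w
        excess = (w[over] - cap).sum()
        w[over] = cap
        below = w < cap - tol
        w[below] += excess * w[below] / w[below].sum()


def random_portfolio(n_universe, min_assets, max_assets, max_weight, rng):
    """Long-only random portfolio: asset count drawn uniformly in
    [min_assets, max_assets], Dirichlet(1, ..., 1) weights, then capped."""
    k = int(rng.integers(min_assets, max_assets + 1))
    chosen = rng.choice(n_universe, size=k, replace=False)
    weights = np.zeros(n_universe)
    weights[chosen] = cap_weights(rng.dirichlet(np.ones(k)), max_weight)
    return weights


# Hypothetical inputs: 500 eligible stocks, 72 months of simulated returns,
# 20-35 assets per portfolio and a 10% maximum weight.
rng = np.random.default_rng(42)
stock_returns = rng.normal(0.005, 0.08, size=(72, 500))
portfolios = np.array([random_portfolio(500, 20, 35, 0.10, rng) for _ in range(1000)])
monthly = stock_returns @ portfolios.T        # (72, 1000): fixed weights, rebalanced monthly
hpr = np.prod(1.0 + monthly, axis=0) - 1.0   # holding period return of each random portfolio
print(np.percentile(hpr, [5, 50, 95]))
```

Iterative capping is used instead of rejecting draws that break the cap because, with few assets and a tight cap, most unconstrained Dirichlet draws would be rejected.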
While the overall sample average turnover rate and asset count are presented for each fund, the algorithm that produces the samples of random portfolios required to implement CRPA works better with bounds expressed as ranges of permissible values, as opposed to the fixed values shown in Table I. Thus, the average minimum and maximum turnover and asset count for each real portfolio are calculated, and these are then used as random
portfolio formation restrictions, in conjunction with the eligible stocks themselves and a
diversification restriction, expressed as a maximum capital allocation to any one stock of
Although these funds compete in the same market segment, and therefore have very similar
mandates, we can already see that the restrictions faced (or imposed) by each manager can
have large variations. While the average turnover rate for the sample is 1.66, the minimum reported is 0.13 (Jensen) and the largest is above 10 (Rydex Growth). For asset count, the average number of assets under management is 106, with a minimum of 26 (Jensen) and a maximum in excess of 500 (Vanguard). This last fund could conceivably be hard to simulate with random portfolios, given that it will inevitably contain stocks not listed in the S&P500 index. However, funds were chosen by imposing the condition that at least 90% of their assets be invested in S&P500 stocks. Thus, random funds that only contain these stocks will still be a close approximation of the assets eligible to the fund manager, while the other stocks that appear in its reported holdings must represent very minor positions.

6 The Name column has the complete registered name of each fund, while the Name (short) column contains an abbreviated designation, which will be used throughout the analysis.
Data available upon request.
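The sample figures quoted above can be checked directly against Table I. This short sketch recomputes the mean, median and range of the turnover ratio and asset count from the 20 rows of the table; the median is not reported in the text and is added here only as a complement.

```python
from statistics import mean, median

# Average yearly turnover ratio and asset count per fund, in Table I order.
turnover = [0.46, 0.52, 0.13, 0.45, 1.57, 0.23, 1.69, 0.14, 0.36, 0.22,
            0.20, 0.52, 0.28, 0.53, 0.52, 0.55, 0.46, 6.64, 10.08, 7.67]
asset_count = [108, 65, 26, 48, 64, 35, 30, 525, 74, 132,
               37, 48, 34, 48, 77, 90, 49, 337, 144, 152]

for label, values in (("turnover", turnover), ("asset count", asset_count)):
    print(f"{label}: mean={mean(values):.2f} median={median(values):.2f} "
          f"min={min(values)} max={max(values)}")
# turnover: mean=1.66 median=0.49 min=0.13 max=10.08
# asset count: mean=106.15 median=64.50 min=26 max=525
```

The medians sit well below the means: the average turnover is pulled up by ProFunds and the two Rydex funds, and the average asset count by Vanguard and ProFunds.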
Table II shows descriptive statistics of each fund's time series of returns. The market portfolio is included as a benchmark.
[Table II about here]
For the six-year period between 2005 and 2010, the average Holding Period Return (HPR) for the sample is 18%, while the mean monthly return is close to 0.35%. As with management restrictions, there is much variability in the sample, with the minimum return being 0.14% per month (ProFunds) and the maximum 0.58% per month (SunAmerica). It should be noted that the market portfolio shows a monthly performance close to that of the best performing fund, with most other actively managed funds lagging the market. The median return is invariably higher than the mean, evincing skewed distributions, a fact confirmed by a relatively high level of negative skewness. Excess kurtosis is also detected in all funds, which explains why normality of returns is rejected for every fund, in most cases at the 1% level and in a few at the 5% level (see the last column of the table, where the Jarque-Bera test statistic is shown). The non-normality of returns immediately casts doubt on the interpretation and accuracy of later performance measures that rely on normality.

Market portfolio returns are obtained from the Fama & French dataset, which also contains their SMB and HML factors, all of which are used later to obtain factor model regression alphas.
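The statistic in the last column of Table II can be recomputed from the reported skewness S and kurtosis K with the standard Jarque-Bera formula, JB = n/6 * (S^2 + (K - 3)^2 / 4), which under normality is asymptotically chi-square with 2 degrees of freedom. A minimal sketch under two assumptions: the kurtosis column is raw (not excess) kurtosis, and the sample size is n = 71 monthly observations. The sample size is not stated in this section; 71 is chosen because, with it, the formula reproduces the reported values for the rows shown.

```python
import math


def jarque_bera(skew, kurtosis, n):
    """Jarque-Bera statistic from sample skewness and raw (non-excess) kurtosis."""
    jb = n / 6.0 * (skew ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 df,
    # whose survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


# Skewness, kurtosis and reported JB from Table II; n = 71 is an assumption.
table_ii = {
    "Market": (-0.8903, 4.7215, 18.15),
    "Calamos": (-0.8878, 4.8158, 19.08),
    "Jamestown": (-1.1266, 5.4389, 32.62),
    "Jensen": (-0.8768, 5.215, 23.61),
    "Northern": (-0.7402, 4.5334, 13.44),
    "Seligman Growth": (-0.9104, 4.7767, 19.15),
}
for name, (skew, kurt, reported) in table_ii.items():
    jb, p = jarque_bera(skew, kurt, n=71)
    print(f"{name:16s} JB={jb:6.2f} (reported {reported:5.2f})  p={p:.1e}")
```

All six statistics exceed 9.21, the 1% critical value of a chi-square with 2 degrees of freedom, consistent with the *** marks in the table.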
As with average monthly return, risk taking is also highly idiosyncratic in these funds, as shown by the standard deviation of the funds' returns. While the average is 5.1%, the values range from a minimum of 4.22% (Jensen) to a maximum of 7.95% (Rydex Value). The level of risk taking will, of course, affect some performance measures, such as the Sharpe index. It is therefore premature to draw any conclusions about the funds' qualities, be it performance or management skill, from these statistics alone.
Finally, in the next section most tests are carried out using gross returns, as the measure of skill should, in an absolute sense, be related to the overall performance that a manager can obtain. However, the investor does not receive the full benefit of these returns, as they are reduced by the fund's fees and other expenses. Thus, some tests are also performed using net returns, to analyze how the initial results are affected by expenses. Each fund's expenses are shown in Table III, both the total expenses as self-reported data and the expense ratio obtained from CRSP.
[Table III about here]
5.2 Fund Performance and Tests of Skill
In this section performance and skill tests are applied to the sample of mutual funds, and
the results from each are analyzed and contrasted.
When contemplating an investment in a mutual fund, most investors, even those with some level of financial education, would consider past measures of return sufficient information on which to base their decisions. Thus, fund salespeople will seldom present information beyond holding period return and/or mean monthly return, data which was presented in the previous section and is repeated in Table IV. Also included are the standard deviation, as some investors would also consider measures of risk, and the Sharpe index as a simple risk-adjusted measure of return.

Self-reported data is obtained from each fund's publicly available information, such as prospectuses, brochures and web pages. These documents and web addresses are available upon request.
[Table IV about here]
While these measures give no clue as to the managers' skills, they are by far the most employed by management firms in fund marketing and sales, and by investors to choose between investment options. In order to contrast the decisions based on these statistics with those of more advanced methodologies, the last column presents a ranking of funds by their Sharpe indexes. Although in the previous sections we saw that there is appreciable variability in fund risk and returns, ranking by the Sharpe ratio is similar to ranking by raw returns. This is perhaps because these funds operate under similar mandates and in the same market niche, prompting sufficiently similar risk-taking behavior that correcting returns for risk has little impact. Regarding performance itself, as can be seen in the Sharpe and Ranking columns, the best fund is SunAmerica, while the market portfolio is the second best performer in the sample. While this has been reported previously, it is bad news for fund management, as passive management is consistently cheaper (in terms of transaction costs and fees) than active management; if the passive market portfolio performs better, there would seem to be very little evidence in favor of active management.
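The claim that ranking by the Sharpe ratio resembles ranking by raw returns can be illustrated on the six rows of Table II shown in this section. This is a sketch only: it assumes a zero risk-free rate and uses monthly figures, so it is not expected to reproduce the Sharpe values or ranks of Table IV. The helper `spearman` is a plain rank-correlation formula for rankings without ties.

```python
# Mean and standard deviation of monthly returns, first six rows of Table II.
stats = {
    "Market": (0.0052, 0.0508),
    "Calamos": (0.0038, 0.0472),
    "Jamestown": (0.0029, 0.0444),
    "Jensen": (0.0041, 0.0422),
    "Northern": (0.0032, 0.0526),
    "Seligman Growth": (0.0042, 0.0534),
}


def sharpe(mean_ret, sd, rf=0.0):
    """Monthly Sharpe ratio; rf = 0 is an assumption, not the paper's choice."""
    return (mean_ret - rf) / sd


def spearman(rank_a, rank_b):
    """Spearman correlation between two rankings of the same items (no ties)."""
    n = len(rank_a)
    pos_b = {name: i for i, name in enumerate(rank_b)}
    d2 = sum((i - pos_b[name]) ** 2 for i, name in enumerate(rank_a))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


by_mean = sorted(stats, key=lambda f: stats[f][0], reverse=True)
by_sharpe = sorted(stats, key=lambda f: sharpe(*stats[f]), reverse=True)
print(by_mean)     # ['Market', 'Seligman Growth', 'Jensen', 'Calamos', 'Northern', 'Jamestown']
print(by_sharpe)   # ['Market', 'Jensen', 'Calamos', 'Seligman Growth', 'Jamestown', 'Northern']
print(round(spearman(by_mean, by_sharpe), 2))  # 0.77
```

Even on this small subset the two orderings agree closely, with the risk adjustment mainly lifting the low-volatility Jensen fund above the more volatile Seligman Growth.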
Previous studies make extensive use of factor models to estimate regression alphas. These alphas have been interpreted as performance measures (Jensen's alpha, for example), but increasingly they have come to represent fund manager skill in the prevalent literature. Table V shows the alphas obtained for each fund under study, using unconditional versions of the single factor model (as used to obtain Jensen's alpha), Fama & French's three factor model, and Carhart's four factor model.
[Table V about here]
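For reference, the unconditional Carhart regression is r_it - r_ft = alpha_i + b_i MKT_t + s_i SMB_t + h_i HML_t + m_i MOM_t + e_it, and the single factor and three factor models drop the last three or the last one of these terms. The sketch below estimates alpha and its t-statistic by ordinary least squares on simulated data; the factor series, loadings and the helper `factor_alpha` are hypothetical and only illustrate the estimation step.

```python
import numpy as np


def factor_alpha(excess_ret, factors):
    """Unconditional OLS factor-model regression.

    excess_ret: (T,) fund returns minus the risk-free rate.
    factors: (T, K) factor returns, e.g. MKT, SMB, HML, MOM for Carhart.
    Returns (alpha, t-statistic of alpha, factor loadings).
    """
    T = len(excess_ret)
    X = np.column_stack([np.ones(T), factors])
    coef, _, _, _ = np.linalg.lstsq(X, excess_ret, rcond=None)
    resid = excess_ret - X @ coef
    sigma2 = resid @ resid / (T - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef[0], coef[0] / se[0], coef[1:]


# Hypothetical data: 72 months of simulated factors and one fund.
rng = np.random.default_rng(7)
T = 72
factors = rng.normal([0.004, 0.001, 0.002, 0.005], [0.045, 0.03, 0.03, 0.05], size=(T, 4))
true_betas = np.array([1.0, 0.2, -0.1, 0.05])
fund = 0.002 + factors @ true_betas + rng.normal(0.0, 0.005, T)

alpha4, t4, betas4 = factor_alpha(fund, factors)      # four-factor (Carhart)
alpha1, t1, _ = factor_alpha(fund, factors[:, :1])     # single factor (Jensen)
print(f"four-factor alpha={alpha4:.4f} (t={t4:.2f}); single-factor alpha={alpha1:.4f}")
```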
Also as reported previously, most alphas turn out to be insignificant. The two exceptions are WaMu and ProFunds, which exhibit alphas that are negative and significant. If the standard skill interpretation of regression alphas were employed, we could say that the managers of these funds actively subtract value through their actions, as opposed to adding value (which would be the interpretation of a positive and significant alpha).
It should also be noted that, while these two funds with negative and significant alphas have very low Sharpe indexes compared to the rest of the sample, the correlation between negative alpha and poor performance is not perfect: WaMu ranks 18th, but, for example, Rochdale Value and Ameristock rank 19th and 20th respectively, and their alphas are negative but insignificant, as are those of most other funds.
Notwithstanding the popularity and extensively documented applications of factor models, as reported in the first section, factor model alphas have been criticized, and new methodologies have been proposed to obtain better measures of fund manager skill, the main contender being the Bootstrap Alpha of Kosowski et al. Though originally employed to evaluate the full cross-section of funds, this methodology has since been applied by Cuthbertson et al. to test individual funds for manager skill. Following their methodology, I test the 20 funds in the sample for fund manager skill using the bootstrap alpha methodology. As in Cuthbertson et al., I use two separate (though complementary) hypotheses to test for significance on both tails of the resulting empirical distributions:

The market portfolio is not included in the table as, by definition, its alpha should be zero.

Hypothesis A: fund manager has skill or adds value,

$H_A:\ \alpha_i > 0$ (15)

Hypothesis B: fund manager has negative skill or actively destroys value,

$H_B:\ \alpha_i < 0$ (16)

Table VI shows each fund's actual Carhart four-factor alpha, as well as the empirical p-values obtained from each fund's bootstrapped distribution for both hypotheses.
[Table VI about here]
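A minimal sketch of a single-fund residual bootstrap in the spirit of Kosowski et al. and Cuthbertson et al., under simplifying assumptions: only the regression residuals are resampled (factor returns are held fixed), the bootstrapped statistic is the t-statistic of alpha, and pseudo-returns are built with alpha set to zero so that they satisfy the null of no skill. The data and the helper names `ols_t_alpha` and `bootstrap_alpha_pvalues` are hypothetical; the published procedures include variants not shown here.

```python
import numpy as np


def ols_t_alpha(y, X):
    """OLS fit; returns coefficients, t-statistic of the intercept, residuals."""
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se_alpha = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
    return coef, coef[0] / se_alpha, resid


def bootstrap_alpha_pvalues(excess_ret, factors, n_boot=1000, seed=0):
    """Residual bootstrap of the alpha t-statistic under a zero-alpha null.

    Returns (t_hat, p_A, p_B): p_A for H_A: alpha > 0, p_B for H_B: alpha < 0.
    """
    rng = np.random.default_rng(seed)
    T = len(excess_ret)
    X = np.column_stack([np.ones(T), factors])
    coef, t_hat, resid = ols_t_alpha(excess_ret, X)
    t_null = np.empty(n_boot)
    for b in range(n_boot):
        # Pseudo-returns with alpha set to zero: fitted factor part plus
        # residuals resampled with replacement.
        y_star = X[:, 1:] @ coef[1:] + rng.choice(resid, size=T, replace=True)
        t_null[b] = ols_t_alpha(y_star, X)[1]
    p_a = np.mean(t_null >= t_hat)
    p_b = np.mean(t_null <= t_hat)
    return t_hat, p_a, p_b


# Hypothetical fund with a negative alpha, four simulated factors, 72 months.
rng = np.random.default_rng(3)
factors = rng.normal(0.003, 0.04, size=(72, 4))
fund = -0.004 + factors @ np.array([1.0, 0.1, 0.1, 0.0]) + rng.normal(0.0, 0.005, 72)
t_hat, p_a, p_b = bootstrap_alpha_pvalues(fund, factors, n_boot=500)
print(f"t(alpha)={t_hat:.2f}  p_A={p_a:.3f}  p_B={p_b:.3f}")
```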
While it is not surprising that the null of no skill is not rejected for any of the funds (see HA pval), the last column shows that a startling number of fund alphas (19 out of 20) are negative and significant at the 1% level, indicating value-destroying management. Shocking though these results may seem, they are coherent with the previously studied statistics. Specifically, if we assume the market to have zero alpha, and most of these funds lag the market in terms of performance (both raw and risk-adjusted), it is not surprising that their alphas should be negative. As to the statistical significance of these alphas, regular tests are at odds with the bootstrap analysis, but the general trend is clear and consistent.
The real question here is whether we are actually measuring skill, or whether these are still measures of performance, so influenced by extraneous factors that the existence of the fund managers' skill cannot be ascertained. That is, these are all measurements obtained from factors related to the performance of the market and other portfolios, and as such are more akin to benchmarks than to true measures of individual skill, which, while related to observable performance, would not be determined by it.

Next, I apply CRPA measures to the sample of mutual funds. Table VII contains the resulting empirical p-values obtained from both the Burns CRPA measure and the three tests used to determine stochastic order in the Percentile CRPA test.
[Table VII about here]
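Of the stochastic order tests, the Mann-Whitney-Wilcoxon (MWW) test is the one singled out as most sensitive. The sketch below is a generic one-sided MWW test with average ranks for ties and a normal approximation (no tie or continuity correction). How the paper builds its two samples for the Percentile CRPA test is defined elsewhere in the paper; here `x` and `y` are simply two samples, for instance the fund's monthly returns and a reference series derived from the random portfolios, which is an assumption for illustration. In practice `scipy.stats.mannwhitneyu` with `alternative='greater'` offers a tested implementation.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def mww_greater(x, y):
    """One-sided MWW test of H1: x tends to be larger than y.

    Returns (U statistic for x, normal-approximation p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    return u1, 0.5 * math.erfc(z / math.sqrt(2.0))


fund = [0.012, 0.008, 0.015, 0.011, 0.009, 0.013, 0.010, 0.014]       # hypothetical
reference = [0.004, 0.006, 0.002, 0.007, 0.005, 0.003, 0.009, 0.001]  # hypothetical
print(mww_greater(fund, reference))
```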
As can be observed, in this sample of 20 mutual funds the null of no skill is not rejected in most cases. However, for the Jensen fund all variants of the CRPA measure reject the null at the 5% level or better, while for 6 other funds only the Burns measure rejects the null. The power tests in the first chapter of this dissertation show that the Burns measure can reject the null for a fund managed with no skill but with a sufficiently high overall (holding period) return, since this measure analyzes only such returns, as opposed to the way in which the return is composed, that is, the distribution of partial returns (in the case of the referred tests, the time series of monthly returns). Thus, the recommendation gleaned from the power tests is to reject the null only when both CRPA measures do so.
Looking at the results in Table VII, the obvious inference is that only the Jensen fund is truly managed with skill, while the funds where only the Burns measure rejects the null managed to obtain a holding period return large enough to put them at the extreme of the random portfolio distribution.
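The limitation described above can be made concrete. A minimal sketch, assuming the Burns measure is an empirical p-value of the fund's holding period return within the holding period returns of the random portfolios (with the usual +1 convention); the random portfolio returns are simulated and the helper names are hypothetical. A steady fund and a lumpy fund with the same holding period return receive exactly the same p-value.

```python
import numpy as np


def holding_period_return(monthly_returns):
    """Compounded return over the whole sample (axis 0 is time)."""
    return np.prod(1.0 + np.asarray(monthly_returns), axis=0) - 1.0


def burns_pvalue(fund_returns, random_returns):
    """Empirical p-value of the fund's HPR within the random-portfolio HPRs.

    fund_returns: (T,) monthly returns; random_returns: (T, N) for N portfolios.
    """
    fund_hpr = holding_period_return(fund_returns)
    random_hpr = holding_period_return(random_returns)
    # Share of random portfolios doing at least as well (+1 counts the fund).
    return (1 + np.sum(random_hpr >= fund_hpr)) / (1 + random_hpr.size)


rng = np.random.default_rng(11)
random_returns = rng.normal(0.003, 0.045, size=(72, 1000))  # hypothetical
steady = np.full(72, 0.004)                                  # 0.4% every month
lumpy = np.zeros(72)                                         # same HPR from 5 good months
lumpy[:5] = (1.0 + holding_period_return(steady)) ** (1 / 5) - 1.0
print(burns_pvalue(steady, random_returns), burns_pvalue(lumpy, random_returns))
```

Because the two p-values coincide, a measure based only on the holding period return cannot tell these funds apart, which is why the percentile-based comparison of monthly returns is also needed.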
Comparisons of the time series distributions of these funds' returns prove to be

Skill in this case refers to the ability to add value for investors. CRPA analysis is not used here to test the other tail of the distributions, that is, to ascertain whether there is value-destroying behavior such as that previously reported with bootstrap alphas.
The Vanguard fund might also be a candidate for a skillful manager, since the null is also rejected by the MWW test, the most sensitive test used to discriminate between the random portfolio percentile and the real fund's distribution.

In Table II we can see that while the Jensen fund has a high holding period return (the statistic used in the Burns measure) compared to the rest of the sample, it is not the highest (which belongs to Seligman Value). However, the Jensen fund reaches the second highest overall return while maintaining the lowest volatility of returns, as seen in its standard deviation (4.22% per month, versus 5.72% for Seligman Value, the fund with the highest holding period return). This is evidence that the Jensen fund achieves its performance through steady returns, which are more likely attributable to superior skill, as opposed to distributions with a few periods of high return and more periods of low return, which lead to higher volatility and can be interpreted as luck. This interpretation is bolstered by the fact that these funds operate in the same market and have very similar mandates (which would not be the case if, for example, we were comparing equity and bond funds).
To illustrate the above conjecture, Figure 7 displays the probability density of the time series of fund returns, comparing the density of the Jensen fund (full line), ranked 4th in the sample, with that of the best ranked fund, SunAmerica (dashed line).
[Figure 7 about here]
While the centers of these distributions seem to be close (mean monthly return for Jensen is 0.4%, compared to SunAmerica's 0.6%), the higher volatility of the SunAmerica fund is clearly seen as a lower peak in the probability mass and fatter tails, depicting a fund that may have attained single high returns in certain periods, but is less likely to obtain similar future performance (i.e., a lower probability of obtaining a result close to its historical mean).
A final note concerns the use of the evidence presented in a hypothetical decision-making process. If the investor is presented with the usual performance statistics, the decision would inevitably be to invest in the highest ranked fund, that is, in SunAmerica. If the decision is based on factor model alphas, then no fund appears to be superior to the market portfolio, whereas if bootstrapped alphas are contemplated, all of these funds would definitely be discarded as potential investment alternatives. Only CRPA tests show any glimmer of hope for these actively managed funds. Using these results, an investor would consider investing in the Jensen fund, pending further analysis of costs and fees, to compare it with a passive strategy.
While fund expenses should not be contemplated in a pure analysis of manager skill, they do impinge on an investor's decision-making process. Thus, the question is: how sensitive are the previously derived measures to the inclusion of expenses? In Table VIII the CRPA-based statistics presented in Table VII are reproduced, but recalculated using fund returns net of expenses.
[Table VIII about here]
This analysis, made from the point of view of the investor, shows that whereas the Jensen fund was previously identified as skillfully managed by most tests, it now only registers significance in the Burns cross-sectional test, at first glance making inference about manager skill unreliable. The correct interpretation of this result is that, though the manager appears to be skillful, expenses lower the expected benefit to the investor to the point where she is equally well off investing randomly by herself (and thus avoiding the charges). In other words, and as has been concluded in previous articles, any overperformance of the fund seems to be charged away from the investor, so that the benefits of the manager's skill are enjoyed only by the brokerage firm and/or the manager himself.
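The section does not spell out how gross returns are converted to net returns. One simple approach, assumed here purely for illustration, deducts one twelfth of the annual expense ratio (as in Table III) from each monthly gross return before rerunning the CRPA tests. The fund, its returns and the 1.2% expense ratio below are hypothetical.

```python
def net_returns(gross_monthly, annual_expense_ratio):
    """Deduct 1/12 of the annual expense ratio from each monthly return
    (a simplifying assumption; actual fee accrual may differ)."""
    monthly_cost = annual_expense_ratio / 12.0
    return [r - monthly_cost for r in gross_monthly]


def holding_period_return(monthly):
    """Compounded return over the whole series."""
    total = 1.0
    for r in monthly:
        total *= 1.0 + r
    return total - 1.0


gross = [0.004] * 72                 # hypothetical fund: 0.4% every month
net = net_returns(gross, 0.012)      # hypothetical 1.2% annual expense ratio
print(f"gross HPR={holding_period_return(gross):.3f}  net HPR={holding_period_return(net):.3f}")
# gross HPR=0.333  net HPR=0.241
```

Over six years a fee of this size removes close to a third of the hypothetical fund's cumulative gain, which gives a sense of how an apparent skill premium can disappear once expenses are deducted.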

Unless expenses and fees are also considered, in which case perhaps a passive, market index fund would
beat all alternative strategies.

Finally, consider the analysis that could be made of this data by the fund managers themselves. The Jensen fund seems to be the only skillfully managed portfolio in the sample, but it does not obtain the best returns. While luck inevitably plays a part in all financial results, further analysis can be made contrasting the management constraints faced by the Jensen manager with those imposed on other managers of similar funds. As was seen in the previous section, the Jensen fund has the lowest average turnover ratio of the sample, as well as the lowest asset count. In the first chapter of this dissertation it is shown that, for samples of random funds (where all other variables are controlled for), differences in management constraints can have an impact on the resulting return distributions. As an example, for relatively low levels of turnover, both low and high numbers of assets under management are dominated as strategies by a mid-range level. This points to potential areas of improvement worth investigating. Perhaps all the Jensen fund needs to do to improve performance is use its already detected skill to manage a larger number of assets than it does at present.
One further point is raised in Lisi (2011), who implements a measure of skill based on random portfolios but, instead of the CRPA methodology, generates simple equal-weighted portfolios from randomly selected stocks traded in the Italian market. Lisi then applies risk adjustment and other measures to the sample of random funds before using them to make statistical tests. However, the CRPA methodology makes these adjustments unnecessary. Consider the fact that portfolio risk is a function of manager decisions, coupled with management restrictions. Thus, adding fund manager constraints to the random portfolio generation algorithm eliminates the need for further processing of the resulting sample, be it the use of risk adjustment or factor models. That is, risk is already restricted to that of the potentially available portfolios, and applying risk adjustment measures should not alter the inference obtained from a CRPA analysis. I confirm this by applying the CRPA tests to the portfolios' Sharpe measures and regression alphas. The results, withheld for brevity, do not change the conclusions described above. That is, there is no added value obtained from applying further measures, and the correct application of CRPA to simple portfolio returns suffices.

Available upon request.

6. Conclusions

A new general framework for investment fund analysis using randomly trading portfolios is
outlined and one application, a test of fund manager skill, is developed and fully studied.
The skill test based on CRPA is found to be a powerful and appealing alternative to
traditional methods. On one hand, the statistical properties of the resulting distributions are
free from various assumption problems and biases long recognized in other families of
tests, in particular, the problems of parametric regression-based measures. On the other
hand, while fund manager skill is the focus of this paper, the implications and potential
applications of this methodology are extremely varied.
As an empirical application of the CRPA-based test of skill, a sample of U.S. large cap
mutual funds is analyzed from the point of view of a prospective investor. Standard
performance measures usually employed to make investment decisions are estimated, and
tests of skill are applied, where the null hypothesis for all tests is that managers have no
skill. The results obtained from standard and bootstrap regression alpha methodologies are, at best, inconclusive, and in the worst cases show alphas that are negative and significant, signifying negative skill or a reduction in portfolio value attributable to the managers' actions. CRPA skill tests of the sample of funds reveal that, while the null is not rejected for most of them, skill can be tentatively identified in a few. The results are of further interest because, unlike regression alphas, CRPA test results are not necessarily correlated with performance measures. In fact, the one fund where all CRPA tests reject the null is not the best performing fund in the sample (in terms of returns and Sharpe ratio), but the fourth best. While the CRPA skill tests detect skill in some funds, once fund fees are deducted from their returns the test fails to reject the null of no skill, which is consistent with previous literature showing that, while there may be some value added by a few money managers, this value is charged away from the investors.
The application of the CRPA skill test described above would serve as a guide for investor
decision making. However, the same test can be used in other applications.
As a diagnostic tool for fund management firms, we can consider the case in which the null
is rejected for a fund manager, and therefore we can consider this manager as possessing
skill, but nevertheless the funds performance lags behind a benchmark or peer group.
Further analysis of the trading constraints could, in theory, pinpoint areas where the
manager is over or under restricted, with respect to the competition. If these bounds can be
changed (for example, allowing the manager to take on more risk, or to trade more frequently), then performance could potentially be improved. More generally, if a group of
managers with certain mandates lags in performance with respect to others with a different
set of goals, then perhaps what is being uncovered is a systematic market anomaly. An
example of a known anomaly, size, might manifest as investors in small firms obtaining
better returns than those who invest in large firms, after controlling for manager skill.
Perhaps CRPA could help discover other, hitherto unreported, anomalies.
There are also implications for the fund manager job market: would hiring still be based on track record and other second-hand sources of data if skill could be measured
reliably? Also, fund charges to investors could potentially be analyzed and based on
manager skill, which is a more direct link to potential performance than other measures.
CRPA provides a robust new framework for various types of investment fund analysis,
including testing for fund manager skill. The results shown here confirm that CRPA tests
are more sensitive in detecting skill than factor model based tests, and that their results are easier to interpret. Unlike other measures, CRPA requires no risk adjustment, and potential econometric problems such as non-normality are not an issue, due to the use of non-parametric statistics.
Further applications of CRPA include, for example, testing for sources of manager skill
once the null of no skill is rejected. Simple and intuitive statistics can be employed in
conjunction with random portfolio methodology to test for stock picking and asset
allocation abilities. Beyond fund manager skill, CRPA can be employed as a non-
parametric alternative to traditional measures which can be severely weakened in their
applicability by specification problems. One example is provided by measures of market herding, which
suffer from the lack of a distribution under the null hypothesis of no herding. CRPA-
generated markets can serve to obtain an empirical distribution in which there is no
herding, against which to benchmark the resulting herding statistic for a more precise
measure of statistical significance.
In all, CRPA is an important addition to the finance analysis toolkit.


List of References

Burns, P., 2007, Random Portfolios for Performance Measurement, in Erricos John
Kontoghiorghes & Cristian Gatu eds.: Optimization, Econometric and Financial
Analysis (Springer).
Burns Statistics, 2011, PortfolioProbe: Portfolio Probe, R package version
Carhart, M., 1997, On Persistence in Mutual Fund Performance, Journal of Finance,
Vol. 52, No. 1, 57-82.
Christopherson, J., Ferson, W., Glassman, D., 1998, Conditioning Manager Alphas on
Economic Information: Another Look at the Persistence of Performance, Review of
Financial Studies, Vol. 11, No. 1, 111-142.
Cuthbertson, K., Nitzsche, D., O'Sullivan, N., 2008, UK mutual fund performance: skill
or luck?, Journal of Empirical Finance, Vol. 15, 613-634.
Dawson, R., R. Young, 2003, Near-uniformly distributed, stochastically generated
portfolios, in Stephen Satchell & Alan Scowcroft eds.: Advances in Portfolio
Construction and Implementation (Butterworth-Heinemann Finance).
Fama, E., K. French, 2010, Luck versus skill in the cross-section of mutual fund
returns, Journal of Finance, Vol. 65, No. 5, 1915-1947.
Ferson, W., R. Schadt, 1996, Measuring fund strategy and performance in changing
economic conditions, Journal of Finance, Vol. 51, No. 2, 425-461.
Kosowski, R., Timmermann, A., Wermers, R., White, H., 2006, Can Mutual Fund Stars
Really Pick Stocks? New Evidence from a Bootstrap Analysis, Journal of Finance,
Vol. 61, No. 6, 2551-2595.
Lisi, F., 2011, Dicing with the market: randomized procedures for evaluation of mutual
funds, Quantitative Finance, Vol. 11, No. 2, 163-172.
R Development Core Team, 2011, R: A language and environment for statistical
computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0,
URL
Shaked, M., J. G. Shanthikumar, 1994, Stochastic Orders and their Applications,
Academic Press.
Sharpe, W., 1992, Asset Allocation: Management Style and Performance Measurement,
Journal of Portfolio Management, Vol. 18, No. 2, 7-19.
Silli, B., 2006, Modern Approaches in the Evaluation of Management Skill in the
Mutual Fund Industry, working paper.

Appendix I: Properties of Random Portfolio Samples for Power Tests
Property 1: The expected return for all funds is the same, whether skillful or lucky.
We can see this by taking expectations in (8) and (9). For both equations we have that

[Equation about here]

Property 2: Lucky portfolios have higher variance than skillful ones.
I calculate the variance of each type of fund. For the skillful funds we have

[Equation (11) about here]

The added return vector of the skillful funds is a vector of equal numbers (zero variance), and by construction,

[Equation (11), continued, about here]
On the other hand, for the lucky funds,

[Equation (12) about here]

[Equation (13) about here]
The difference between these two variances is

[Equation (14) about here]
Now, looking at the terms on the right-hand side of this equation, we can see that the first term is always non-negative, since it is a variance, and the second term is also positive, as [...] plus a vector of zero or positive constants. The last term, the covariance between the added return vector and the error term, can be positive or negative. However, simulation shows that the probability that the complete expression (the sum of the three terms) is negative is very low. Thus, the difference in variances tends to be positive, and therefore lucky portfolios tend to have a higher variance than skillful portfolios.
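A minimal Monte Carlo sketch in the spirit of the check described in the accompanying footnote. The model is an assumption made for illustration, since equations (8) and (9) are not reproduced here: each fund's base monthly return is i.i.d. normal, the skillful fund adds the total extra return evenly every month, and the lucky fund adds the same total in k randomly chosen months. The function `variance_gaps` and all parameter values are hypothetical.

```python
import numpy as np


def variance_gaps(total_extra, n_periods=72, k_lucky=5, n_iter=1000,
                  mu=0.004, sigma=0.05, seed=0):
    """Var(lucky) - Var(skillful) across Monte Carlo iterations.

    Hypothetical model: base monthly returns are i.i.d. normal(mu, sigma),
    drawn independently for each fund; the skillful fund adds
    total_extra / n_periods every month, the lucky fund adds
    total_extra / k_lucky in k_lucky random months.
    """
    rng = np.random.default_rng(seed)
    gaps = np.empty(n_iter)
    for i in range(n_iter):
        skillful = rng.normal(mu, sigma, n_periods) + total_extra / n_periods
        lucky = rng.normal(mu, sigma, n_periods)
        lucky_months = rng.choice(n_periods, size=k_lucky, replace=False)
        lucky[lucky_months] += total_extra / k_lucky
        gaps[i] = lucky.var(ddof=1) - skillful.var(ddof=1)
    return gaps


for extra in (0.0, 0.3):
    gaps = variance_gaps(extra)
    print(f"extra={extra:.1f}  mean gap={gaps.mean():.2e}  "
          f"share positive={np.mean(gaps > 0):.2f}")
```

With a positive added return the average gap is positive; with the factor set to zero both funds come from the same distribution and the gap is positive in roughly half of the iterations, matching the behavior described in the footnote.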

Full results of this Monte Carlo test are not reported in the interest of brevity, but are available upon
request. In short, the average difference of the variances obtained from 1,000 Monte Carlo iterations is
almost always positive. The only negative values appear in some samples when the added return factor is
set to zero. In this case both samples are drawn from the same distribution, and thus the difference of
variances has a 50/50 chance of being positive or negative.

Table I
Identification data, average yearly turnover rate and asset count for funds in sample.
The yearly turnover ratio is the dollar value of all trades occurring in each year (buy and
sell) divided by the total value of assets at the beginning of the year. The figure shown is
the average turnover ratio for the 6 year period studied. Similarly, asset count is the 6 year
average of the yearly number of assets in each portfolio.

Nasdaq Ticker  Name (short)     Turnover Ratio  Asset Count  Fund Name
               Calamos          0.46            108          Calamos Investment Trust: CALAMOS Blue Chip Fund
               Jamestown        0.52            65           Williamsburg Investment Trust: Jamestown Equity Fund
JENSX          Jensen           0.13            26           Jensen Portfolio, Inc
               Northern         0.45            48           Northern Funds: Large Cap Value
SGRCX          Seligman Growth  1.57            64           Seligman Growth Fund, Inc.
               Seligman Value   0.23            35           Seligman Value Fund Series, Inc: Seligman Large-Cap Value Fund
               SunAmerica       1.69            30           SunAmerica Focused Series, Inc: Focused Dividend Strategy Portfolio
               Vanguard         0.14            525          Vanguard Tax-Managed Funds: Vanguard Tax-Managed Growth & Income Fund
               Van Kampen       0.36            74           Van Kampen Growth & Income Fund: Growth & Income Fund
               WaMu             0.22            132          Washington Mutual Investors Fund
AMSTX          Ameristock       0.20            37           Ameristock Mutual Fund, Inc
               Lomax            0.52            48           Advisors Series Trust: Edgar Lomax Value Fund
               Delaware         0.28            34           Delaware Group Equity Funds II: Delaware Value Fund
               HGK              0.53            48           Advisors' Inner Circle Fund: HGK Equity Value Fund
               Rochdale Growth  0.52            77           Rochdale Investment Trust: Rochdale Large Growth Portfolio
               Rochdale Value   0.55            90           Rochdale Investment Trust: Rochdale Large Value Portfolio
               Baird            0.46            49           Baird Funds, Inc: Baird LargeCap
               ProFunds         6.64            337          ProFunds: Large-Cap Value
               Rydex Growth     10.08           144          Rydex Series Funds: Large-Cap Growth Fund
               Rydex Value      7.67            152          Rydex Series Funds: Large-Cap Value Fund
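The turnover definition in the Table I caption translates directly into code. Note that it counts buys and sells together, unlike the usual prospectus convention, which uses the lesser of purchases or sales. The sketch below follows the caption's definition. The function names and the per-year input layout are hypothetical.

```python
from statistics import mean


def yearly_turnover(buys, sells, assets_at_start):
    """Dollar value of all trades in the year (buys and sells together)
    divided by total portfolio value at the beginning of the year."""
    return (sum(buys) + sum(sells)) / assets_at_start


def average_turnover(years):
    """Average of the yearly turnover ratios over the sample period.

    `years` holds one (buys, sells, assets_at_start) tuple per year.
    """
    return mean(yearly_turnover(b, s, a) for b, s, a in years)
```

For example, a fund starting the year with 100 in assets that buys 20 and sells 25 has a yearly turnover of 0.45. The asset count column is the analogous 6-year mean of yearly holdings counts.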


Table II
Summary statistics of fund sample returns
HPR is the holding period return for the 6-year period under study. Mean, Median, Standard
Deviation (St. Dev.), Skewness and Kurtosis are calculated for each fund based on its monthly
returns. The last column shows the Jarque-Bera test of normality statistic for each fund, with
significance being denoted with *, ** and *** for a 10%, 5% and 1% level, respectively.

Portfolio HPR Mean Median St. Dev. Skew Kurtosis Jarque-Bera
Market 0.32 0.0052 0.0117 0.0508 -0.8903 4.7215 18.15***
Calamos 0.21 0.0038 0.0119 0.0472 -0.8878 4.8158 19.08***
Jamestown 0.15 0.0029 0.0095 0.0444 -1.1266 5.4389 32.62***
Jensen 0.25 0.0041 0.0092 0.0422 -0.8768 5.215 23.61***
Northern 0.14 0.0032 0.0119 0.0526 -0.7402 4.5334 13.44***
Seligman Growth 0.22 0.0042 0.0066 0.0534 -0.9104 4.7767 19.15***
Seligman Value 0.26 0.0049 0.0096 0.0572 -0.737 5.0373 18.71***
SunAmerica 0.35 0.0058 0.01 0.0559 -0.0419 5.6083 20.15***
Vanguard 0.20 0.0037 0.0125 0.048 -0.8513 4.4199 14.54***
Van Kampen 0.22 0.0039 0.0082 0.0474 -0.7083 3.7443 7.58**
WaMu 0.11 0.0025 0.0106 0.0443 -1.0576 5.0582 25.77***
Ameristock 0.09 0.0022 0.0102 0.0451 -0.6409 4.1218 8.58**
Lomax 0.17 0.0034 0.0135 0.0489 -0.9321 4.4941 16.89***
Delaware 0.18 0.0034 0.011 0.0443 -1.0189 4.2503 16.91***
HGK 0.20 0.0038 0.0134 0.049 -1.1034 5.2559 29.46***
Rochdale Growth 0.15 0.0034 0.0092 0.0535 -0.632 4.589 12.2***
Rochdale Value 0.08 0.0026 0.0119 0.0545 -0.9462 5.4183 27.89***
Baird 0.14 0.0032 0.0044 0.0515 -0.7186 5.5828 25.85***
ProFunds 0.00 0.0014 0.0115 0.0521 -0.8977 4.404 15.37***
Rydex Growth 0.23 0.0044 0.0036 0.0547 -0.5716 4.9627 15.26***
Rydex Value 0.02 0.0034 0.0096 0.0795 0.1027 6.6613 39.78***
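The Jarque-Bera column can be reproduced from the skewness and kurtosis columns with the standard formula JB = n/6 (S² + (K − 3)²/4). Under normality, JB is asymptotically chi-square with two degrees of freedom, whose survival function is exp(−x/2). The sample size is not stated in the caption. Taking n = 71 monthly observations appears to reproduce the tabulated values, but that value is inferred from the table rather than given in the text.

```python
import math


def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic from sample skewness and (non-excess) kurtosis."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def jb_pvalue(jb):
    """Asymptotic p-value: JB ~ chi-square(2) under normality,
    and the chi-square(2) survival function is exp(-x / 2)."""
    return math.exp(-jb / 2.0)
```

With the market row, `jarque_bera(71, -0.8903, 4.7215)` gives about 18.15. For Van Kampen (7.58), `jb_pvalue` falls between 0.01 and 0.05, which is consistent with the ** marker.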


Table III
Fund expenses
Self-reported total expenses are obtained from fund publications (prospectuses, websites, etc.);
expense ratio data are obtained from CRSP. Both measures are yearly costs, expressed as a fraction
of assets (e.g. 0.0123 = 1.23%).

Portfolio Self-Reported Expense-Ratio(CRSP)
Calamos 0.0235 0.0123
Jamestown 0.0113 0.0110
Jensen 0.0125 0.0107
Northern 0.0110 0.0114
Seligman Growth 0.0197 0.0163
Seligman Value 0.0215 0.0189
SunAmerica 0.0095 0.0104
Vanguard 0.0155 0.0128
Van Kampen 0.0150 0.0112
WaMu 0.0149 0.0104
Ameristock 0.0091 0.0057
Lomax 0.0099 0.0108
Delaware 0.0185 0.0137
HGK 0.0099 0.0086
Rochdale Growth 0.0150 0.0218
Rochdale Value 0.0152 0.0207
Baird 0.0100 0.0179
ProFunds 0.0273 0.0127
Rydex Growth 0.0218 0.0239
Rydex Value 0.0190 0.0167
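Table III reports annual costs, but the CRPA tests work on monthly returns. An expense adjustment such as the one behind Table VIII therefore needs a monthly deduction. The text here states neither the convention nor which of the two expense columns is used. The sketch below simply assumes the yearly cost is charged evenly, at one twelfth per month. The function name is hypothetical.

```python
def net_of_expenses(gross_returns, annual_expense_ratio):
    """Deduct an annual expense ratio from monthly returns.

    Assumption (not taken from the paper): the yearly cost is charged
    evenly, i.e. annual_expense_ratio / 12 per month.
    """
    monthly_cost = annual_expense_ratio / 12.0
    return [r - monthly_cost for r in gross_returns]
```

For Calamos, the CRSP figure of 0.0123 would mean a deduction of 0.0123 / 12 = 0.001025 per month under this assumption.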


Table IV
Standard portfolio performance measures
HPR are holding period returns obtained from portfolio data over a period of 6 years.
Mean is the portfolio's average monthly return, while St. Dev. is the standard deviation of
those returns. Sharpe is the fund's Sharpe ratio for the period under study. Rank
corresponds to the rank each fund holds in the sample, ordered by Sharpe ratio.

Name HPR Mean St. Dev. Sharpe Rank
Market 0.32 0.005 0.051 0.064 2
Calamos 0.21 0.004 0.047 0.040 8
Jamestown 0.15 0.003 0.044 0.022 16
Jensen 0.25 0.004 0.042 0.050 4
Northern 0.14 0.003 0.053 0.024 14
Seligman Growth 0.22 0.004 0.053 0.043 6
Seligman Value 0.26 0.005 0.057 0.052 3
SunAmerica 0.35 0.006 0.056 0.068 1
Vanguard 0.20 0.004 0.048 0.037 10
Van Kampen 0.22 0.004 0.047 0.041 7
WaMu 0.11 0.003 0.044 0.013 18
Ameristock 0.09 0.002 0.045 0.006 20
Lomax 0.17 0.003 0.049 0.031 12
Delaware 0.18 0.003 0.044 0.032 11
HGK 0.20 0.004 0.049 0.038 9
Rochdale Growth 0.15 0.003 0.054 0.027 13
Rochdale Value 0.08 0.003 0.055 0.012 19
Baird 0.14 0.003 0.052 0.024 15
ProFunds 0.00 0.001 0.052 -0.010 21
Rydex Growth 0.23 0.004 0.055 0.046 5
Rydex Value 0.02 0.003 0.080 0.018 17
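The Sharpe column and its ranking can be sketched as follows. The caption does not give the risk-free rate, so `rf` below is an assumed average monthly rate, and the denominator is the standard deviation of raw returns, as reported in the table. Back-solving from the table suggests a rate of roughly 0.2% per month: for the market, (0.0052 − 0.002) / 0.0508 ≈ 0.063, against 0.064 reported. This is an inference, not a figure from the text. The function names are hypothetical.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, rf):
    """Mean monthly excess return divided by the standard deviation of returns.

    `rf` is an assumed average monthly risk-free rate (not given in the paper).
    """
    return (mean(returns) - rf) / stdev(returns)


def rank_by_sharpe(funds, rf):
    """Map each fund name to its rank, 1 being the highest Sharpe ratio."""
    ratios = {name: sharpe_ratio(r, rf) for name, r in funds.items()}
    ordered = sorted(ratios, key=ratios.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}
```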


Table V
Factor model alphas
Regression alphas obtained from a one-factor model (Jensen's alpha), Fama & French's
three-factor model and Carhart's four-factor model. Significance is denoted with *, ** and
*** for a 10%, 5% and 1% level, respectively.

Portfolios Jensen Fama & French Carhart
Calamos -0.0011 -0.0007 -0.0007
Jamestown -0.0018 -0.0016 -0.0016
Jensen -0.0004 -0.0002 -0.0002
Northern -0.002 -0.0019 -0.002
Seligman Growth -0.001 -0.0006 -0.0006
Seligman Value -0.0005 -0.0002 -0.0003
SunAmerica 0.0005 0.0003 0.0002
Vanguard -0.0013* -0.001* -0.001*
Van Kampen -0.001 -0.0005 -0.0005
WaMu -0.0022* -0.0017 -0.0017
Ameristock -0.0025 -0.0022 -0.0022
Lomax -0.0015 -0.001 -0.001
Delaware -0.0012 -0.0008 -0.0008
HGK -0.0012 -0.0008 -0.0008
Rochdale Growth -0.0017 -0.0015 -0.0014
Rochdale Value -0.0027 -0.0025 -0.0023
Baird -0.002 -0.0022 -0.0023
ProFunds -0.0038** -0.0035*** -0.0035***
Rydex Growth -0.0009 -0.0012 -0.0012
Rydex Value -0.0031 -0.0034 -0.0037*
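Each alpha in Table V is the intercept of a time-series regression of fund excess returns on factor returns. The regressors are the market alone for Jensen's alpha, the market plus SMB and HML for Fama and French, and those three plus momentum for Carhart. A dependency-free sketch of that intercept, solving the normal equations by Gauss-Jordan elimination, is shown below. The function name is hypothetical. A statistics package would also report the standard errors behind the significance stars, which this sketch omits.

```python
def ols(y, factors):
    """OLS coefficients [alpha, beta_1, ..., beta_k] via the normal equations.

    `y` is a list of fund excess returns; `factors` is a list of k factor
    series (e.g. market, SMB, HML, momentum), each the same length as y.
    """
    rows = [[1.0] + [f[t] for f in factors] for t in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yt for r, yt in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    # Each row now has a single non-zero coefficient on the diagonal.
    return [a[i][k] / a[i][i] for i in range(k)]
```

Passing one factor series gives Jensen's alpha, three give the Fama-French alpha, and four give the Carhart alpha. In each case the first element of the result is the intercept.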


Table VI
Bootstrap alphas
The first column contains the value of the regression alpha obtained from Carhart's four-factor
model. The following two columns show the bootstrap p-values obtained for this alpha when
testing two alternative hypotheses against the null that alpha is zero. HA tests the right tail,
or a positive alpha, while HB tests the left tail, or a negative alpha. Significance is
denoted with *, ** and *** for a 10%, 5% and 1% level, respectively.

Portfolio Carhart Alpha HA pval HB pval
Calamos -7.0E-04 1.00 0.00***
Jamestown -1.6E-03 1.00 0.00***
Jensen -2.0E-04 1.00 0.00***
Northern -2.0E-03 1.00 0.00***
Seligman Growth -6.0E-04 1.00 0.00***
Seligman Value -3.0E-04 1.00 0.00***
SunAmerica 2.0E-04 1.00 0.00***
Vanguard -1.0E-03 1.00 0.00***
Van Kampen -5.0E-04 1.00 0.00***
WaMu -1.7E-03 1.00 0.00***
Ameristock -2.2E-03 1.00 0.00***
Lomax -1.0E-03 1.00 0.00***
Delaware -8.0E-04 1.00 0.00***
HGK -8.0E-04 1.00 0.00***
Rochdale Growth -1.4E-03 1.00 0.00***
Rochdale Value -2.3E-03 0.98 0.02**
Baird -2.3E-03 0.99 0.01***
ProFunds -3.5E-03 0.97 0.03**
Rydex Growth -1.2E-03 1.00 0.00***
Rydex Value -3.7E-03 0.87 0.13
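The HA and HB columns are one-tailed bootstrap p-values for the same alpha, which is why they sum to about one in each row. The caption does not describe the resampling scheme. The sketch below shows one common variant, residual resampling with the null of zero alpha imposed, and should not be read as the exact procedure behind Table VI. To keep it short it uses a single-factor regression rather than Carhart's four factors. The function names are hypothetical.

```python
import random
from statistics import mean


def one_factor_alpha(y, x):
    """Intercept and slope of y on x by ordinary least squares."""
    xm, ym = mean(x), mean(y)
    beta = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)
    return ym - beta * xm, beta


def bootstrap_alpha_pvalues(y, x, n_boot=1000, seed=0):
    """Return (alpha, p_right, p_left) under the null that alpha is zero."""
    rng = random.Random(seed)
    alpha, beta = one_factor_alpha(y, x)
    resid = [b - alpha - beta * a for a, b in zip(x, y)]
    boot = []
    for _ in range(n_boot):
        # Impose the null: rebuild returns with alpha = 0 and resampled residuals.
        e = rng.choices(resid, k=len(y))
        y_star = [beta * a + ei for a, ei in zip(x, e)]
        boot.append(one_factor_alpha(y_star, x)[0])
    p_right = sum(a >= alpha for a in boot) / n_boot  # HA: alpha > 0
    p_left = sum(a <= alpha for a in boot) / n_boot   # HB: alpha < 0
    return alpha, p_right, p_left
```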


Table VII
CRPA tests of fund manager skill
All values shown are empirically obtained p-values. Burns is the p-value from the Burns
(2007) cross-sectional measure of skill. The other three columns contain p-values derived
from the percentile or time-series approach to CRPA skill testing in which the distribution
of a fund's time series of returns is compared with that of a percentile distribution obtained
from a sample of random funds. The T- and Permutation tests measure significance of the
difference in the distribution means. MWW is a test of stochastic order, where the null
hypothesis is that both samples (fund and random portfolio returns) are drawn from the
same distribution, and the one-tailed alternative is that fund returns are stochastically
greater than random portfolio returns. Significance is denoted with *, ** and *** for a
10%, 5% and 1% level, respectively.

Portfolio Burns T-test MWW Test Permutation Test
Calamos 0.221 0.767 0.2607 0.747
Jamestown 0.134 0.7811 0.3631 0.752
Jensen 0.003*** 0.0387** 0.0343** 0.038**
Northern 0.129 0.6985 0.1845 0.647
Seligman Growth 0.426 0.8439 0.3253 0.825
Seligman Value 0.032** 0.4366 0.2854 0.449
SunAmerica 0.238 0.8455 0.411 0.816
Vanguard 0.092* 0.6431 0.039** 0.556
Van Kampen 0.076* 0.675 0.5352 0.678
WaMu 0.144 0.7566 0.4898 0.749
Ameristock 0.003*** 0.1854 0.3212 0.201
Lomax 0.057* 0.6124 0.6305 0.609
Delaware 0.132 0.7398 0.4671 0.682
HGK 0.11 0.7729 0.2758 0.777
Rochdale Growth 0.183 0.6612 0.1298 0.566
Rochdale Value 0.263 0.7606 0.1423 0.713
Baird 0.046** 0.5192 0.107 0.523
ProFunds 0.998 0.9243 0.9558 0.987
Rydex Growth 0.819 0.8666 0.5397 0.881
Rydex Value 0.991 0.8385 0.728 0.767
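The permutation test in Tables VII and VIII asks how often a random relabelling of the pooled returns produces a mean difference at least as large as the observed one. A minimal sketch follows. It assumes a two-sample, unpaired comparison between the fund's monthly returns and a benchmark series built from the random funds; the text here does not say whether observations are paired by month. The function name and defaults are hypothetical.

```python
import random


def permutation_pvalue(fund, benchmark, n_perm=5000, seed=0):
    """One-sided permutation test of H1: mean(fund) > mean(benchmark).

    Labels are reshuffled across the pooled sample; the p-value is the share
    of reshuffles whose mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    n = len(fund)
    observed = sum(fund) / n - sum(benchmark) / len(benchmark)
    pooled = list(fund) + list(benchmark)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```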


Table VIII
CRPA tests of fund manager skill, adjusting returns for fund expenses
All values shown are empirically obtained p-values. Burns is the p-value from the Burns
(2007) cross-sectional measure of skill. The other three columns contain p-values derived
from the percentile or time-series approach to CRPA skill testing in which the distribution
of a fund's time series of returns is compared with that of a percentile distribution obtained
from a sample of random funds. The T- and Permutation tests measure significance of the
difference in the distribution means. MWW is a test of stochastic order, where the null
hypothesis is that both samples (fund and random portfolio returns) are drawn from the
same distribution, and the one-tailed alternative is that fund returns are stochastically
greater than random portfolio returns. Significance is denoted with *, ** and *** for a
10%, 5% and 1% level, respectively.
Portfolio Burns T-test MWW Test Permutation Test
Calamos 0.794 0.8939 0.6412 0.902
Jamestown 0.256 0.8555 0.6022 0.826
Jensen 0.012** 0.1343 0.1423 0.135
Northern 0.256 0.7792 0.2815 0.743
Seligman Growth 0.718 0.8922 0.5622 0.883
Seligman Value 0.139 0.7171 0.6726 0.709
SunAmerica 0.311 0.8685 0.5216 0.843
Vanguard 0.347 0.7845 0.2777 0.642
Van Kampen 0.24 0.8336 0.7715 0.843
WaMu 0.592 0.9073 0.7966 0.907
Ameristock 0.004*** 0.3158 0.5261 0.324
Lomax 0.103 0.7168 0.7801 0.702
Delaware 0.79 0.8988 0.8215 0.888
HGK 0.178 0.834 0.4762 0.812
Rochdale Growth 0.483 0.7418 0.2893 0.65
Rochdale Value 0.605 0.8423 0.3031 0.811
Baird 0.087* 0.6127 0.1876 0.584
ProFunds 1 0.9521 0.9973 1
Rydex Growth 0.968 0.8938 0.7998 0.956
Rydex Value 1 0.8656 0.8785 0.854
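The MWW column in Tables VII and VIII is a one-sided test that fund returns are stochastically greater than random-portfolio returns. A dependency-free sketch using the large-sample normal approximation, without tie or continuity corrections, is given below. The function name is hypothetical. `scipy.stats.mannwhitneyu(x, y, alternative="greater")` provides a library version with those corrections.

```python
import math


def mww_greater(x, y):
    """One-sided Mann-Whitney-Wilcoxon test of H1: x is stochastically greater than y.

    Returns (U, p) using the normal approximation, without tie or continuity
    corrections.
    """
    # U counts pairs (xi, yj) with xi > yj; ties count one half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail standard normal probability
    return u, p
```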


Figure 1
Random fund sample holding period return probability density
Probability density plot for a sample of 1,000 random funds trading securities listed in the
S&P 500 for 6 years, from 2005 to 2010, with constraints on turnover and portfolio asset
count consistent with the mean of these values for real funds currently operating in the
market. Density obtained from the holding period returns of each of 1,000 funds. Six-year
holding period return plotted on the x-axis, probability density on the y-axis. 95th percentile
holding period return denoted by dotted line.
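The dotted 95th-percentile line and the cross-sectional comparison behind the Burns column can both be read off the sample of random-fund holding period returns. The sketch below treats the cross-sectional p-value as the share of random funds whose holding period return matches or beats the fund's. That is a reading of the Burns (2007) measure, not its exact definition. The percentile uses linear interpolation between order statistics, which is the same convention as NumPy's default. The function names are hypothetical.

```python
def empirical_quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def cross_sectional_pvalue(fund_hpr, random_hprs):
    """Share of random funds whose holding period return is at least the fund's."""
    return sum(h >= fund_hpr for h in random_hprs) / len(random_hprs)
```

A fund whose holding period return sits exactly at the 95th percentile of the random sample would have a cross-sectional p-value of about 0.05.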


Figure 2
Full random portfolio time series of returns probability densities
Time series of returns probability densities plotted for each of a set of 1,000 random funds,
from the same sample used in Figure 1 and ordered by average monthly return. Separate portfolio
return densities are arrayed along the y-axis, with returns on the x-axis and the probability
densities of these returns visible on the z-axis.
[Plot title: CRPA 1,000 random fund sample probability density surface]


Figure 3
Net Power of Skill Test, 1 Lucky Period
Net power curves of tests of skill. Extra Return Factor is the additional monthly return added to a
baseline random portfolio's returns to simulate overperformance through skill or luck, ranging
from 0% (no skill) to 4%. Net power of the test denotes the probability that the test rejects the
null of no skill when the sample has skill minus the probability that the test rejects the null of no
skill when the sample is just lucky (the average number of test rejections out of 1,000 trials, for
each level of added return).
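The net power plotted in Figures 3 to 6 is a difference of two rejection rates. Each is estimated from 1,000 simulated samples at a given extra return factor. The sketch below estimates it for a hypothetical stand-in test, a one-sided t-test of a positive mean excess return. It assumes zero-mean normal baseline returns in excess of the random-portfolio benchmark, and it gives the lucky sample the same total extra return spread over `lucky_periods` months. None of these choices, nor the function names, are taken from the paper.

```python
import random


def rejects(sample, crit=1.645):
    """Hypothetical stand-in for a skill test: one-sided t-test of a positive
    mean, compared with the 5% one-sided normal critical value."""
    n = len(sample)
    m = sum(sample) / n
    sd = (sum((v - m) ** 2 for v in sample) / (n - 1)) ** 0.5
    return m / (sd / n ** 0.5) > crit


def net_power(extra, lucky_periods=1, n_months=72, sigma=0.05, trials=1000, seed=0):
    """Rejection rate on skillful samples minus rejection rate on lucky samples."""
    rng = random.Random(seed)
    skill_hits = luck_hits = 0
    for _ in range(trials):
        # Assumed baseline: zero-mean normal returns in excess of the benchmark.
        skillful = [rng.gauss(0.0, sigma) + extra for _ in range(n_months)]
        lucky_months = set(rng.sample(range(n_months), lucky_periods))
        bonus = extra * n_months / lucky_periods  # same total extra return
        lucky = [rng.gauss(0.0, sigma) + (bonus if t in lucky_months else 0.0)
                 for t in range(n_months)]
        skill_hits += rejects(skillful)
        luck_hits += rejects(lucky)
    return (skill_hits - luck_hits) / trials
```

With a single lucky period, the one large return inflates the sample standard deviation roughly as much as it raises the mean. The t-statistic of the lucky sample therefore stays near one, while the skillful sample's statistic grows with the extra return.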


Figure 4
Net Power of Skill Test, 5 Lucky Periods
Net power curves of tests of skill. Extra Return Factor is the additional monthly return added to a
baseline random portfolio's returns to simulate overperformance through skill or luck, ranging
from 0% (no skill) to 4%. Net power of the test denotes the probability that the test rejects the
null of no skill when the sample has skill minus the probability that the test rejects the null of no
skill when the sample is just lucky (the average number of test rejections out of 1,000 trials, for
each level of added return).


Figure 5
Net Power of Skill Test, 10 Lucky Periods
Net power curves of tests of skill. Extra Return Factor is the additional monthly return added to a
baseline random portfolio's returns to simulate overperformance through skill or luck, ranging
from 0% (no skill) to 4%. Net power of the test denotes the probability that the test rejects the
null of no skill when the sample has skill minus the probability that the test rejects the null of no
skill when the sample is just lucky (the average number of test rejections out of 1,000 trials, for
each level of added return).


Figure 6
Net Power of Skill Test, 5 Lucky Periods, Samples with No Stochastic Dominance: 60%
Extra Return for Lucky Portfolios
Net power curves of tests of skill. Extra Return Factor is the additional monthly return added to a
baseline random portfolio's returns to simulate overperformance through skill or luck, ranging
from 0% (no skill) to 4%. The extra return factor for the luck sample is higher than for the skill
sample. Net power of the test denotes the probability that the test rejects the null of no skill when
the sample has skill minus the probability that the test rejects the null of no skill when the sample
is just lucky (the average number of test rejections out of 1,000 trials, for each level of added return).


Figure 7
Portfolio Return Probability Density
Probability densities of the time series of returns of the Jensen fund (full line) and
SunAmerica (dashed line). The x-axis denotes monthly portfolio return, while the y-axis
shows probability density. Plot obtained using kernel density estimation, with a Gaussian
kernel and automated bandwidth selection.
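The caption specifies a Gaussian kernel with automated bandwidth selection but does not name the selection rule. The sketch below assumes Silverman's rule of thumb, h = 0.9 min(sd, IQR/1.34) n^(−1/5), which is one common automatic choice. It evaluates the density of a return series on a grid of points. The function names are hypothetical.

```python
import math
from statistics import stdev


def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    s = sorted(data)
    n = len(s)

    def quantile(q):
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    iqr = quantile(0.75) - quantile(0.25)
    spread = min(stdev(s), iqr / 1.34) if iqr > 0 else stdev(s)
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(data, grid, bandwidth=None):
    """Kernel density estimate with a Gaussian kernel, evaluated at each grid point."""
    h = bandwidth or silverman_bandwidth(data)
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) for x in grid]
```

Applied to the monthly returns of the Jensen and SunAmerica funds over a grid of return values, this would produce two curves comparable to the full and dashed lines in the figure.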