

Selecting Mutual Funds

Using Machine Learning Classifiers

Cyril Vanderhaeghen

May 26, 2019

Master of Science in Financial Markets

Under the supervision of Professor Christophe CROUX

EDHEC Business School does not express approval or disapproval concerning the opinions given in this paper, which are the sole responsibility of the author.


Abstract

This paper uses machine-learning-computed probabilities as fund selection signals and tests this signal in a fund-of-funds portfolio. Using time series data and alternative data, we trained several classification methods (Support Vector Machines, logistic regression, random forests and an artificial neural network) to be used as decision processes when rebalancing our portfolio of US mutual funds.

We found that the signal was relevant for accurately selecting funds; however, the models were mainly able to capture momentum information within mutual funds.

Table of Contents

1. Introduction
2. Related Work
3. Models Description
3.1 Logistic Regression
3.2 Support Vector Machines
3.3 Random Forests
3.4 Voting/Ensemble classifier
3.5 Artificial Neural Network
4. Data and Features
4.1 Return based features
4.2 Non-Net Asset Value based features
5. Calibrating the models and training results
5.1 Tuning the models' hyperparameters
5.2 Bayesian Optimization
5.3 Random Search
5.4 Results and model comparison
5.5 Relationship between the models' probabilities and the returns
6. Applying the models in a strategy
6.1 Back testing results
6.2 Predictions' accuracy
6.3 Looking at the momentum effect
7. Conclusion
References
List of Abbreviations

ANN Artificial Neural Network
AUC Area under the Curve
CV Cross-validation
IR Information Ratio
NAV Net Asset Value
ReLU Rectified Linear Unit
ROC Receiver Operating Characteristic
SVM Support Vector Machines

Table of Figures

Figure 1 Sigmoid function
Figure 2 SVM methodology
Figure 3 Random Forest methodology
Figure 4 Artificial Neural Network methodology
Figure 5 Conditional distributions of 3 months returns
Figure 6 Conditional distributions of 6 months returns
Figure 7 Conditional distributions of 12 months returns
Figure 8 Conditional distributions of the volatility
Figure 9 Distribution of the consistency feature
Figure 10 Conditional distributions of the number of days of existence
Figure 11 Average number of positive returns per state
Figure 12 Average number of positive returns per investment style
Figure 13 5-fold ROC curves
Figure 14 Regression of returns to logistic regression probabilities
Figure 15 Regression of returns to SVM probabilities
Figure 16 Regression of returns to random forest probabilities
Figure 17 Regression of returns to ensemble classifier probabilities
Figure 18 Regression of returns to ANN probabilities
Figure 19 Equal weight portfolio of all the funds over time
Figure 20 Excess returns for a quantile of 10%
Figure 21 Excess returns for a quantile of 20%
Figure 22 Excess returns for a quantile of 30%
Figure 23 Excess returns for a quantile of 40%
Figure 24 Excess returns for a quantile of 50%
Figure 25 Predictions' accuracy
Figure 26 Overall accuracy for models trained without momentum component
Figure 27 Excess returns for models trained without momentum component
Figure 28 Returns when choosing the top 10% funds
Figure 29 Strategies' value when choosing the top 10% funds
Figure 30 Returns when choosing the top 20% funds
Figure 31 Strategies' value when choosing the top 20% funds
Figure 32 Returns when choosing the top 30% funds
Figure 33 Strategies' value when choosing the top 30% funds
Figure 34 Returns when choosing the top 40% funds
Figure 35 Strategies' value when choosing the top 40% funds
Figure 36 Returns when choosing the top 50% funds
Figure 37 Strategies' value when choosing the top 50% funds

Table of Tables

Table 1 Hyperparameter tuning results and cross-validation scores
Table 2 Regressions' results
Table 3 Testing mean excess returns' significance for q = 10%
Table 4 Testing mean excess returns' significance for q = 20%
Table 5 Testing mean excess returns' significance for q = 30%
Table 6 Testing mean excess returns' significance for q = 40%
Table 7 Testing mean excess returns' significance for q = 50%
Table 8 Information ratios
Table 9 Mean accuracy of only the selected funds for different quantiles
Table 10 Testing results for higher than 50% accuracy

1. Introduction

This paper delivers an analysis of a mutual fund selection signal based on the probabilities output by machine learning classifiers. The weighting scheme of each component within the portfolio is based on the models' calculated probability that the fund yields a positive return over the investment horizon. Using common risk-adjusted measures and prediction precision measures, we will see how the strategies compare to one another when different models compute the investment signals. We will also analyze their performance with respect to a naïve momentum fund selection process.

A fund manager's expertise is mostly captured by their funds' track records. Similarly, a fund's commercial documents usually display past performance and present this information as a selling argument. However, when it comes to fund selection, studies show that, on average, actively managed funds struggle to outperform their benchmarks and other index funds (Fortin & Michelson, 2002). Yet numerous investors are willing to dedicate capital to mutual funds, and we can observe some successful funds within the industry, BlackRock or Vanguard to name a few. This lends support to the idea that some funds can consistently add value for their clients, and suggests that, using data on the funds' characteristics, one might be able to identify those performing funds.

Even the most popular fund classification and rating methods, like Morningstar's methodology, are based on historical performance and use portfolio characteristics such as asset allocation, market capitalization and value-growth score, as well as the beta and alpha of the funds. These ranking methods are widely used by investors as tools to choose among a universe of funds. Within this work, when choosing our explanatory variables, we will stray away from these classical measures and analyze the predictive power of combining usual return-based features with some non-financial features.


Because most studies use linear models such as the CAPM or factor models as the primary modelling tools in finance, more complex relationships might not be captured by these classical methods. Moreover, these financial models are regression algorithms, which aim at providing an estimate of a given target variable over a time period.

Many now-popular machine learning algorithms were developed in the late 20th century. It is only recently, given the exponential growth of computing power as well as the growing amount of data, that industries have taken an interest and have been able to efficiently leverage the power of these algorithms. In this paper, we will apply classification models that are not designed to give a continuous numerical prediction but rather a class and a prediction confidence for the target variable: in this study, a 1 for a positive return and a 0 for a negative one. We will be using both linear models like logistic regression and nonlinear models such as the multilayer perceptron, a common artificial neural network architecture.

2. Related Work

Whether fund-specific characteristics can yield predictive information is still not settled among researchers. Carhart (1997) showed that there is momentum information within mutual funds. However, results are mixed: for instance, Lakonishok, Shleifer, Vishny, Hart and Perry (1992) found no relationship between the performance of one year and the following one. Other research found, on average, no positive abnormal returns across mutual funds (Titman & Grinblatt, 1989).

Machine learning algorithms have found successful applications in various fields, image recognition (Krizhevsky, Sutskever & Hinton, 2012) or the medical sector (Deo, 2015) to name a few. Understandably, these algorithms have been a subject of great interest within the financial industry too. They are applied to a wide range of classical problems such as stock price prediction (Tarsauliya, Kant, Kala, Tiwari & Shukla, 2010), as well as more original problems like consumer credit risk modelling (Khandani, Kim & Lo, 2010), news sentiment analysis (Ho & Wang, 2016) and even pattern recognition on chart images using Convolutional Neural Networks (Gudelek, Boluk & Ozbayoglu, 2017).

When it comes to funds, Indro, Jiang, Patuwo and Zhang (1999) used an artificial neural network to predict mutual funds' performance. They found that the neural network performed better than classical linear models for growth and blend funds.

Ludwig and Piovoso (2005) applied decision trees, neural networks and naïve Bayes to compare money managers, using input features such as 1-, 2- and 5-year excess returns, percentage of outperforming quarters, tracking error and various ratios. The resulting accuracy in predicting subsequent managers' performance was above 65% and exceeded the performance of a simple scoring model.


This motivates the use, in this study, of a set of features consisting of both variables computed from past returns and non-financial indicators. Trained on this feature space, the models produce, at each given date, the predictions that drive the investment strategy in funds over the following time period.

3. Models Description

This section reviews the algorithms applied as well as the data used, and further describes how the algorithms are trained and validated.

Throughout this study, we will use the widespread open-source Scikit-Learn Python package for training, testing and validating our logistic regression, Support Vector Machine, random forest and voting ensemble models. To set up and calibrate our Artificial Neural Network (ANN), we will use both Keras and Scikit-Learn.

3.1 Logistic Regression

Logistic regression is a simple machine learning classification model that maps the result of a regression model to a sigmoid function:

\sigma(z) = \frac{1}{1 + e^{-z}}

with z being the result of a linear regression of our target variable on our explanatory variables.

Figure 1 Sigmoid function

The sigmoid function outputs a value in [0, 1], as displayed in Figure 1. Therefore, one can give a probabilistic interpretation of an element belonging to a certain class: in our case, the fund yielding positive returns in the next forecasting period. If the function outputs a value higher than 0.5, the datapoint is classified as 1, and as 0 otherwise.

3.2 Support Vector Machines

This algorithm classifies the data by constructing a hyperplane separating the training data's different classes (in our case, positive or negative returns) while maximizing the distance between this hyperplane and the training data. Thus, the hyperplane separates the class of funds yielding positive returns in the next forecasting period from those that do not. When predicting the funds' returns, the trained algorithm determines where the new fund's data points fall with respect to the hyperplane and infers a positive-return or negative-return class for the fund.

Figure 2 SVM methodology

Support Vector Machines do not originally give any probabilistic interpretation to their classifications; however, using Platt scaling (Platt, 1999), SVMs can be applied in a probabilistic setting.
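As a sketch of this, scikit-learn's SVC applies Platt scaling when instantiated with probability=True (placeholder data again; the hyperparameter values are illustrative, not the tuned ones from section 5):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X, y = rng.normal(size=(100, 14)), rng.integers(0, 2, size=100)  # placeholders

# probability=True fits a Platt-scaling sigmoid on top of the SVM decision scores
svm = SVC(kernel="poly", degree=3, C=1.0, probability=True, random_state=42)
svm.fit(X, y)
p = svm.predict_proba(X)[:, 1]   # Platt-scaled P(positive return)
```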

3.3 Random Forests

Random forest algorithms construct several decision trees using the training data; each decision tree finds several simple binary rules to output a class. The random forest algorithm then takes the mode of the classes predicted by all the individual trees as its prediction.

Figure 3 Random Forest Methodology

Using multiple trees has the benefit of mitigating the tendency of individual trees to overfit the training dataset.

Probabilities can be inferred for the predictions by looking at the proportion of votes for each class across all the individual trees.
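The vote-proportion idea can be sketched directly from a fitted scikit-learn forest; note that scikit-learn's own predict_proba averages each tree's probabilistic estimate rather than counting hard votes, a closely related but not identical quantity (X and y are placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X, y = rng.normal(size=(100, 14)), rng.integers(0, 2, size=100)  # placeholders

forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Fraction of individual trees voting for the positive class, per sample
tree_votes = np.stack([tree.predict(X) for tree in forest.estimators_])
vote_proportion = tree_votes.mean(axis=0)
```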

3.4 Voting/Ensemble classifier

This method aggregates the predictions of the previously mentioned algorithms by finding the class that maximizes the sum of the predicted probabilities from all the classifiers. On average, the ensemble model works better than single classifiers, since combining several classifiers reduces the prediction variance. The ensemble classifier will be composed of the logistic regression, SVM and random forest models.

The probabilities are simply computed as the average of the probabilities from each classifier.
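In scikit-learn this corresponds to a soft-voting VotingClassifier; the sketch below composes the three base models with placeholder hyperparameters:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X, y = rng.normal(size=(100, 14)), rng.integers(0, 2, size=100)  # placeholders

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(C=1.0)),
        ("svm", SVC(kernel="poly", probability=True)),   # Platt scaling enabled
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    voting="soft",  # average the classifiers' predicted probabilities
)
ensemble.fit(X, y)
p_ensemble = ensemble.predict_proba(X)[:, 1]
```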

3.5 Artificial Neural Network

At its roots, an Artificial Neural Network (or multilayer perceptron) aims at replicating the scheme of biological brains: neurons connected to each other by synapses. The structure can incorporate any number of hidden layers and neurons per layer, each neuron from one layer being connected to all the neurons of the next layer. An ANN can in theory approximate any continuous real function.

Figure 4 Artificial Neural Network methodology

The value y_k of a neuron is the weighted sum of the values of the previous layer's neurons, defined as:

y_k = \varphi\left(\sum_{i=1}^{n} w_{k,i} x_i + b\right)

The x_i are the previous layer's neurons' values, the w_{k,i} the weights associated with each neuron, b is a bias and \varphi an activation function, typically the Rectified Linear Unit (ReLU), the hyperbolic tangent or the sigmoid function.

When fitting the model, the algorithm essentially finds all the appropriate weights between neurons. It does so using backpropagation, a method that performs gradient descent to adjust the weights between nodes toward their optimal values in order to minimise the loss function, the mean squared error for instance.

By having one neuron as the output of our ANN and choosing a sigmoid activation function for it, we obtain a probabilistic output given by the value of that neuron.
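A minimal Keras sketch of such a network follows; the layer sizes are placeholders (the tuned architecture comes from the Bayesian optimization of section 5.2), and binary cross-entropy is used as the loss, a common choice for a sigmoid output (the text mentions mean squared error as another example):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

rng = np.random.default_rng(42)
X, y = rng.normal(size=(100, 14)), rng.integers(0, 2, size=100)  # placeholders

model = Sequential([
    Dense(15, activation="relu", input_dim=14),  # hidden layer 1 (placeholder size)
    Dense(10, activation="relu"),                # hidden layer 2 (placeholder size)
    Dense(1, activation="sigmoid"),              # single neuron: P(positive return)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train on 66% of the data, validating on the remaining 33%, as in section 5.2
model.fit(X, y, epochs=60, validation_split=0.33, verbose=0)
```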

4. Data and Features

The analysed dataset is provided by the Wharton Research Data Services database; it stems from the "Survivor-Bias-Free US Mutual Fund" series. It contains historical information such as Net Asset Value (NAV) per share, cash percentage and 52-day low/high, as well as more diverse information such as the parent company's city/state and phone number. The data ranges from 1962 to today, with information on both active and liquidated funds.

Throughout this study we will use monthly NAV per share data, fund inception dates, as well as geographical and investment style data on the funds to conduct feature engineering.

In the next sections we provide a description of the features used as explanatory variables, which we split into two categories: features computed from the NAV per share, and alternative features not based on NAV per share. All the features used for model training are computed from 04/2000 to 04/2001 to train the models to predict the next quarter's returns, realized on 07/2001. This prediction date was chosen with the objective of having a balanced number of positive and negative returns, so that the models train as efficiently as possible. At the prediction date, there are 2247 funds with positive returns and 1543 with negative returns, so a slight imbalance with more positive returns.

Throughout this study we define the returns as follows:

r_t = \frac{NAV_t - NAV_{t-1}}{NAV_{t-1}}
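In pandas, this return definition amounts to a one-liner; the NAV values below are hypothetical:

```python
import pandas as pd

# Hypothetical monthly NAV-per-share series for two funds
nav = pd.DataFrame(
    {"fund_a": [10.0, 10.4, 10.1, 10.7], "fund_b": [25.0, 24.5, 24.9, 25.3]},
    index=pd.date_range("2000-04-30", periods=4, freq="M"),
)

# r_t = (NAV_t - NAV_{t-1}) / NAV_{t-1}, as defined above
returns = nav.pct_change().dropna()
```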

4.1 Return based features

Using the monthly NAV, we computed returns from which we defined the following 5 features (a sketch of their computation follows below):

- Consistency of returns within the last 12 months:

\text{Consistency} = \frac{\text{number of positive returns within the last 12 months}}{12}

- Annualized volatility of the monthly returns within the last 12 months, defined as the standard deviation of returns times \sqrt{12}

- Return over the last 3 months

- Return over the last 6 months

- Return over the last 12 months

The consistency and the 3-, 6- and 12-month returns were defined with the idea of capturing momentum effects.
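Here is a sketch of how these five features could be computed from a monthly returns table; compounding the 3-, 6- and 12-month returns is our assumption, as the exact computation is not spelled out above:

```python
import numpy as np
import pandas as pd

def return_features(monthly_returns: pd.DataFrame) -> pd.DataFrame:
    """monthly_returns: the last 12 monthly returns, one column per fund."""
    return pd.DataFrame({
        "consistency": (monthly_returns > 0).sum() / 12,
        "volatility": monthly_returns.std() * np.sqrt(12),       # annualized
        "ret_3m": (1 + monthly_returns.iloc[-3:]).prod() - 1,    # compounded
        "ret_6m": (1 + monthly_returns.iloc[-6:]).prod() - 1,
        "ret_12m": (1 + monthly_returns).prod() - 1,
    })
```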

The features defined above display the following distributions, conditional on the sign of the forward 3-month returns:

Figure 5 Conditional distributions of 3 months returns

Figure 6 Conditional distributions of 6 months returns

Figure 7 Conditional distributions of 12 months returns

As we can see in Figures 5, 6 and 7, the conditional distributions show a positive relationship with the next returns: more funds with positive 3-, 6- and 12-month returns yield positive returns over the next 3 months, and the distributions have observably different means.

Figure 8 displays the distribution of volatility conditional on the sign of the forward 3-month returns. We can confirm visually that the funds that will yield positive returns and those that will yield negative returns have different distributions.

Figure 8 Conditional distributions of the volatility

Like the previously plotted features, the consistency of returns, shown in Figure 9, appears to be negatively skewed and to capture momentum effects, as funds with higher past consistency tend to be more likely to yield positive returns over the next period.

Figure 9 Distribution of the consistency feature

4.2 Non-Net Asset Value based features

To capture information that might not be present in the returns, we defined one continuous

variable and eight dummy variables based on alternative data:

Length of existence Feature

The number of days of existence between the fund's inception date and the calculation date.

The distributions of forward returns displayed in Figure 10 do not differ much across the sign of the forward returns, besides a smaller variance for positive forward returns.

Figure 10 Conditional distributions of the number of days of existence

We defined four dummy variables based on the US state in which the fund's parent company is located; the rationale is that, depending on the location, a fund could have advantages such as better infrastructure, contacts or access to talent.

Location Feature

For the 47 different states within the database, we averaged the number of positive returns of all the funds located in each state, and defined the four categories using the 25%, 50% and 75% quantiles. Figure 11 displays the states' average number of positive returns, ranked from highest to lowest.

We assume that the state location features are stable and do not change over time.

Figure 11 Average number of positive returns per state

CRSP Style Feature

We follow the same methodology to define four categorical variables based on the Wiesenberger, Strategic Insight and Lipper objective codes. The CRSP Style Code consists of up to four characters, with each position defined. Reading left to right, the four codes represent an increasing level of granularity¹. For example, a code for a mutual fund is EDYG, where: E = Equity, D = Domestic, Y = Style, G = Growth.

Figure 12 shows the mean number of positive monthly returns for all funds in each CRSP Style Code. We proceeded as for the state features by creating four dummy variables based on the 25%, 50% and 75% quantiles of the mean number of positive returns.

Again, we assume that a given fund's investment methodology does not change over time.

¹ The complete descriptions of the codes are available at http://www.crsp.com/products/documentation/crsp-style-code
Figure 12 Average number of positive returns per investment style

In total there are 14 features and 3790 different funds to train our models on.
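The quantile-based dummy construction used for both the state and style features can be sketched with pandas; the per-state averages below are hypothetical:

```python
import pandas as pd

# Hypothetical per-state average counts of positive monthly returns
avg_pos = pd.Series({"NY": 7.1, "MA": 6.8, "CA": 6.5, "TX": 6.0, "OH": 5.4})

# Bucket the states at the 25%, 50% and 75% quantiles ...
buckets = pd.qcut(avg_pos, q=4, labels=["q1", "q2", "q3", "q4"])

# ... and turn the buckets into the four dummy variables
state_dummies = pd.get_dummies(buckets)
```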

5. Calibrating the models and training results

5.1 Tuning the models’ hyperparameters

Hyperparameter tuning is a very important task, as hyperparameters are the models' parameters that cannot be learned during the training process from the training data and must therefore be set by the user. Each model has its own hyperparameters; we decided to tune the following ones:

- For logistic regression: the inverse regularization strength, which accounts for overfitting.

- For SVM: the C parameter, which governs the hyperplane's margin to the classes, and the degree of the polynomial kernel.

- For random forests: the total number of trees in the forest.

- For the artificial neural network: the number of hidden layers and the number of neurons per layer.

We will conduct hyperparameter tuning with random search for logistic regression, SVM and

random forest, and use Bayesian optimization for our ANN.

To conduct the random search optimization and to validate our models, we will use cross-validation (CV). CV is a powerful method to evaluate the algorithms' predictive power while controlling for overfitting. For instance, a 5-fold cross-validation splits the training data into 5 equally sized sets; the model makes predictions on each set after training on the 4 remaining ones.

The random seed was set to 42 whenever possible.
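A 5-fold CV score as reported in Table 1 can be obtained in one call with scikit-learn (placeholder data and model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X, y = rng.normal(size=(100, 14)), rng.integers(0, 2, size=100)  # placeholders

# Mean accuracy over the 5 held-out folds: the "mean 5-fold CV score"
cv_scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="accuracy")
mean_cv_score = cv_scores.mean()
```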

5.2 Bayesian Optimization

We decided to use a Bayesian optimization² approach (Snoek, Larochelle & Adams, 2012) for the ANN model, as it is less computationally expensive than a grid search and more efficient than a random search.

The idea behind Bayesian optimization is to find the parameters that maximize an unknown function by evaluating it at different points while taking the previously tried values into account through a Gaussian process. Every new evaluation point is chosen as the set of parameters with the highest Expected Improvement, defined as follows:

EI(x) = \mathbb{E}[\max(f(x) - f(x^*), 0)]

with f the function to maximize and x^* the set of parameters giving the current maximum of the function.

Within the deep learning framework, the number of epochs is the number of times the data is fed into the ANN with the weights updated using gradient descent. In our case, at each epoch, the ANN trains on 66% of the data, then makes predictions and computes the accuracy on the remaining 33% of the data. With this definition in mind, we define the function to optimize as the average accuracy on the 33% hold-out set over 60 epochs. We perform the optimization with 20 evaluations on the following search space: one to five layers of 5 to 20 neurons per layer.

² To implement this, we used the Python package available at https://github.com/fmfn/BayesianOptimization
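A sketch of this search using the package referenced in the footnote follows; the objective function body is elided, as it wraps the Keras training loop described above:

```python
from bayes_opt import BayesianOptimization

def mean_validation_accuracy(n_layers, n_neurons):
    """Hypothetical objective: build an ANN with round(n_layers) hidden layers
    of round(n_neurons) neurons each, train it for 60 epochs on 66% of the
    data, and return the average accuracy on the remaining 33%."""
    n_layers, n_neurons = int(round(n_layers)), int(round(n_neurons))
    # ... build, train and evaluate the Keras model here ...
    return 0.0  # placeholder

optimizer = BayesianOptimization(
    f=mean_validation_accuracy,
    pbounds={"n_layers": (1, 5), "n_neurons": (5, 20)},  # search space above
    random_state=42,
)
optimizer.maximize(init_points=5, n_iter=15)  # 20 evaluations in total
print(optimizer.max)  # best parameters found
```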

5.3 Random Search

This method works by trying values randomly within a given search space and performing cross-validation to find the best set of tried parameters. The search spaces are as follows, with a sketch after the list:

- For logistic regression's inverse regularization strength: from 0.01 to 10.

- For SVM's C parameter and polynomial kernel degree: from 0.01 to 10 and from 1 to 10, respectively.

- For random forests' total number of trees: from 100 to 300.
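In scikit-learn, this corresponds to RandomizedSearchCV; the sketch below shows the SVM case with the stated ranges (the number of random draws and the data are placeholders):

```python
import numpy as np
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X, y = rng.normal(size=(100, 14)), rng.integers(0, 2, size=100)  # placeholders

search = RandomizedSearchCV(
    SVC(kernel="poly", probability=True),
    param_distributions={
        "C": uniform(0.01, 9.99),   # draws uniformly from [0.01, 10]
        "degree": randint(1, 11),   # integers 1 to 10
    },
    n_iter=20,  # number of random draws (placeholder value)
    cv=5, scoring="accuracy", random_state=42,
)
search.fit(X, y)
best_svm = search.best_estimator_
```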

5.4 Results and model comparison

Table 1 shows the tuned hyperparameter values as well as the mean 5-fold cross-validation score, defined as the average accuracy over the testing sets. The accuracy is computed as follows:

acc = \frac{TP + TN}{TP + TN + FP + FN}

TP are the True Positives, the correctly classified positive instances
TN are the True Negatives, the correctly classified negative instances
FP are the False Positives, the instances incorrectly classified as positive
FN are the False Negatives, the instances incorrectly classified as negative

The cross-validation scores are very high for a finance exercise; we believe this is because the models capture very well the strong relationship between past and future returns displayed in the feature description section. Indeed, if we were to choose funds solely based on the sign of the past 3-month returns, we would already achieve an accuracy of around 64%.

Judging by the mean 5-fold CV score, the best model appears to be the random forest algorithm and the worst the ANN, which overfits a bit more than the other models.

Table 1 Hyperparameter tuning results and cross validation scores

To further compare our models, we look at their 5-fold Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC). The ROC curve shows the relationship between the false positive and true positive rates as the classification threshold changes. An n-fold ROC curve is a concept similar to cross-validation: we train the model several times, each time holding out a different part of the data, compute the predictions on the held-out part, and then compute the mean ROC curve over all the iterations.

The AUC can be interpreted as the probability of the model ranking a randomly chosen positive instance higher than a randomly chosen negative one. It gives a single metric to compare our models.
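For a single train/test split, the ROC curve and AUC can be computed as follows; the n-fold version repeats this per fold and averages the curves (placeholder data and model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X, y = rng.normal(size=(100, 14)), rng.integers(0, 2, size=100)  # placeholders

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
probs = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, probs)  # one point per threshold
roc_auc = auc(fpr, tpr)
```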

Figure 13 5-fold ROC curves

The ROC curves are plotted in Figure 13. The models have the following AUCs, the best in terms of AUC being the ANN:

- Logistic Regression: 0.85

- SVM: 0.86

- Random Forest: 0.88

- Ensemble Classifier: 0.88

- ANN: 0.91

5.5 Relationship between the models’ probabilities and the returns

The models output probabilities through the methods described in section 3. To get an idea of the relationship between the returns and the predicted probabilities, we conduct a linear regression. The target variable is the training data's target variable, i.e. the 3-month returns from 04/2001 to 07/2001; the explanatory variables are the probabilities associated with the models' predictions.

To conduct this analysis, we split the dataset into a 33% training sample and a 66% testing sample, fit the models on the training sample and compute the probabilities on the testing sample; the results are plotted below.

Figure 14 Regression of returns to logistic regression probabilities

Figure 15 Regression of returns to SVM probabilities

Figure 16 Regression of returns to random forest probabilities

Figure 17 Regression of returns to ensemble classifier probabilities

Figure 18 Regression of returns to ANN probabilities

As we can see in Figures 14 to 18, the funds' returns seem to be well captured by the models' probabilities in a linear regression. Table 2 shows the regressions' coefficients and R-squared values; the coefficients are all highly statistically significant.

The coefficients and R-squared values are all within a similar range, and these results support our idea of using the models' probabilities as fund selection signals.
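The diagnostic regression itself is a simple OLS fit; here is a sketch with statsmodels and placeholder inputs (hypothetical probabilities and returns):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
probs = rng.uniform(size=200)                      # a model's predicted probabilities
future_returns = rng.normal(0.01, 0.05, size=200)  # placeholder 3-month returns

ols = sm.OLS(future_returns, sm.add_constant(probs)).fit()
print(ols.params, ols.rsquared, ols.pvalues)  # coefficients, R-squared, significance
```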

Table 2 Regressions' results (returns against probabilities)

6. Applying the models in a strategy

We apply the models in a long-only strategy run on a universe of 10,415 funds. The strategy runs from 01/2002 to 12/2017, which corresponds to 64 quarters. The portfolio is rebalanced quarterly; at each rebalancing date, we filter the relevant funds (those with at least 1 year of data), recompute the NAV-based features using up to 1 year of data, and add the non-time-dependent features. Then, using the models trained as described previously, we make probabilistic predictions on the universe of funds.

The investment signal used is the probability output by the models that the fund yields a positive return in 3 months' time. The probabilities are ranked from highest to lowest, and the portfolio for the next quarter is composed of a chosen top quantile of the funds. We define a weighting scheme so that funds with higher probabilities have a higher weight within the portfolio, computed with the following formula:

w_i = \frac{p_i}{\sum_k p_k}

where w_i is the weight of fund i, p_i the probability given to fund i, and \sum_k p_k the sum of the selected funds' probabilities.

We also create a naïve momentum strategy in which we invest in the best performers over the past 3 months, weighted according to the magnitude of the last 3-month returns, similarly to the probability weighting:

w_i = \frac{r_i}{\sum_k r_k}

where w_i is the weight of fund i, r_i the previous quarter's return of fund i, and \sum_k r_k the sum of the selected funds' previous-quarter returns.
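Both weighting schemes reduce to normalizing the signal of the selected funds so that the weights sum to one; here is a sketch with hypothetical fund signals:

```python
import pandas as pd

def signal_weights(signal: pd.Series, quantile: float) -> pd.Series:
    """Keep the top `quantile` of funds by signal and normalize to weights."""
    selected = signal[signal >= signal.quantile(1 - quantile)]
    return selected / selected.sum()

# Works for both signals: predicted probabilities, or past 3-month returns
probs = pd.Series({"f1": 0.9, "f2": 0.7, "f3": 0.6, "f4": 0.4, "f5": 0.2})
weights = signal_weights(probs, quantile=0.4)  # top 40% by probability
```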

We analyze the excess returns of each strategy, computed against an equally weighted market portfolio made of all the mutual funds available in our dataset, whose value over the back-testing period is displayed in Figure 19.

Figure 19 Equal weight portfolio of all the funds over time

6.1 Back testing results

Figures 20 to 24 below show the quarterly excess returns of the strategy for different quantiles of the universe of funds, ranging from the top 10% to the top 50% of probabilities.

The best performing models are logistic regression and the ANN (labeled MLPClassifier); however, the models yield similar return patterns, with SVM being the worst performer. We can clearly see that the machine-learning-based strategies' excess returns are correlated with the Naïve momentum strategy's performance. This might suggest that the models mostly captured momentum effects, which we investigate in more depth in section 6.3.

As we select more and more funds with lower prediction confidence, the figures below show that the strategies' returns diminish and look more and more similar, because the strategies are more likely to pick the same funds. The Naïve strategy, on the other hand, stops changing much at some point because of its weighting scheme based on past returns.

Figure 20 Excess Returns for a quantile of 10%

Figure 21 Excess Returns for a quantile of 20%

Figure 22 Excess Returns for a quantile of 30%

Figure 23 Excess Returns for a quantile of 40%

Figure 24 Excess Returns for a quantile of 50%

Tables 3 to 7 below show the results of testing the significance of the strategies' mean excess returns as less confident predictions are added to the portfolio. Again, the more funds we choose, the lower the expected excess returns and the less significantly different from zero they are. This is in line with the regression results shown in section 5.5: the models' probabilities can be a good investment signal, as the excess returns of strategies built on more highly confident predictions are more significant.

We can also observe that the Naïve momentum strategy yields much higher and more significant mean excess returns than the strategies we implemented.

More figures displaying absolute returns and P&L are available in the appendix.

Table 3 Testing mean excess returns’ significance for q = 10%

Table 4 Testing mean excess returns’ significance for q = 20%

Table 5 Testing mean excess returns’ significance for q = 30%

Table 6 Testing mean excess returns’ significance for q = 40%

Table 7 Testing mean excess returns’ significance for q = 50%

Table 8 displays the information ratio (IR) of each strategy with respect to the market portfolio. This ratio captures the expected excess return gained per unit of risk, computed as follows:

IR = \frac{\mathbb{E}[r - r_m]}{\sqrt{\mathrm{Var}[r - r_m]}}

where r is the strategy return and r_m the market portfolio return.
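A sketch of this computation on aligned quarterly return series:

```python
import pandas as pd

def information_ratio(strategy_returns: pd.Series, market_returns: pd.Series) -> float:
    """IR = E[r - r_m] / sqrt(Var[r - r_m]), on aligned quarterly series."""
    excess = strategy_returns - market_returns
    return excess.mean() / excess.std()
```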

It appears that the ANN's IR drops more significantly than the other methods' when we choose more funds. Logistic regression, on the other hand, consistently outperforms the other algorithms, and again SVM performs the worst; but none of our methods beats the Naïve momentum strategy, which consistently has an IR above 2.

Table 8 Information ratios

6.2 Predictions’ accuracy

To investigate the predictive power of the models, we assess their prediction accuracy. The quarterly prediction accuracies over time are displayed in Figure 25. They vary significantly over time, with the following standard deviations:

- Logistic Regression: 7.72%

- SVM: 7.62%

- Random Forest: 8.09%

- Ensemble Classifier: 7.87%

- ANN: 8.39%

- Naïve: 10.38%

The machine-learning-based methods seem more reliable than the Naïve selection method, which has the highest standard deviation of accuracy.

Figure 25 Predictions' accuracy

The average accuracies are all similar and lower than the results we obtained on the training

data:

- Logistic Regression: 64.54 %

- SVM: 64.75%

- Random Forest: 64.68%

- Ensemble Classifier: 64.92%

- ANN: 65.30%

- Naïve: 65.76%

The nature of the training data is likely the reason for this difference between the average accuracy over the back-testing period and the training accuracies, as the training data had a slight imbalance between positive and negative instances.

Table 9 displays the mean accuracies when considering only the funds in our portfolio. They are much higher than the overall accuracies; however, the Naïve method seems less reliable than our machine learning models. The outperformance of the Naïve method observed previously thus suggests that, although it is less accurate at predicting whether a fund will be profitable in the next period, the funds it selects yield higher returns.

Table 9 Mean accuracy of only the selected funds for different quantiles

6.3 Looking at the momentum effect

The results observed in the previous parts suggest that the machine-learning-based strategies capture much the same excess fund performance as a Naïve momentum selection strategy. In order to examine more closely how much of the momentum effect our algorithms have captured, we retrained our models and back-tested them after removing all the features related to momentum effects. We therefore removed the 3-, 6- and 12-month returns as well as the consistency-of-returns feature, as those features were created in an effort to measure past performance, that is, momentum. Thus, the feature space for this test consists of the volatility, the number of days of existence, and the geographical and investment style features, a total of 10 variables. Volatility remains in the feature space as it is a proxy for the past riskiness and dispersion of fund returns. Nevertheless, Stivers and Sun (2010) find for stock returns that dispersion is negatively related to subsequent momentum premiums, which means for our analysis that, if their result applied to funds too, some momentum information might still be captured by volatility.

Figure 26 shows the overall accuracy of our models; compared to Figure 25, the accuracies are clearly no better than random draws, with average accuracies ranging from 46% to 53%. However, looking closely at the accuracy patterns of the ANN and the logistic regression, they appear to be more correlated with each other than with the other models, possibly because they use the same sigmoid function to infer probabilities.

Figure 26 Overall accuracy for models trained without momentum component

This visual conclusion is consolidated by Table 10 below, which summarizes the results of testing for accuracy higher than 50%.

Table 10 Testing results for higher than 50% accuracy

Figure 27 Excess returns for models trained without momentum component

In line with the previous finding, the excess returns are mostly negative throughout the back-testing period, as Figure 27 above shows.

7. Conclusion

In this paper, we applied logistic regression, random forests, support vector machines, an ensemble classifier and an artificial neural network to a fund selection problem. We defined a fund selection signal based on the probabilities given by the models, which represent the models' confidence in classifying the next period's returns as positive. The explanatory variables we defined include both past-return-based features (volatility, consistency of returns and past returns) and alternative features meant to extract information from geographical and investment style data as well as to capture the funds' length of existence.

We applied the models when back-testing a strategy building a fund-of-funds portfolio on a universe of 10,415 funds from the Survivor-Bias-Free US Mutual Fund database of Wharton Research Data Services. The models were trained to predict the 3-month forward returns from 04/2001 to 07/2001 using features computed prior to 04/2001; the accuracy on this training sample was very high. They were then used over a 16-year period, from 01/2002 to 12/2017, in a quarterly rebalanced strategy.

The probabilistic signal proved to be relevant for selecting funds. However, when testing the models without the momentum-related features, their accuracy could not be statistically distinguished from random guessing. Thus, it can be inferred that the original models we developed and trained were only able to capture the momentum information within the explanatory variables, and that no information came from the non-return-based features. The machine learning algorithms do not statistically outperform a naïve momentum fund selection strategy, but they proved better at correctly selecting funds that yield positive returns over the next time period, irrespective of the magnitude of the returns. The best performing algorithms were logistic regression and the artificial neural network, possibly because they share the same method to infer prediction confidence. On the other hand, the support vector machine performed the worst; this might be because the algorithm is not originally designed to provide a probabilistic output, making it less reliable.

The study could be extended and pushed further by improving the feature engineering and selection. With fewer momentum-capturing features, the accuracy of the models might improve, and classical financial measures such as the fund's alpha or various financial ratios might add information. Factoring the funds' fees and transaction costs into the calculation of the performance or into the models' features would also be an interesting addition to the study.

Although the alternative features we chose were not able to provide additional information, many studies in other areas suggest that information can be contained in non-financial data. For instance, one could create a feature describing the fund manager's experience and qualifications (Chevalier & Ellison, 1999) and make it a time-dependent variable, as a fund's manager can be subject to change.

More state-of-the-art machine learning techniques, such as generative models or reinforcement learning models, should also be looked at, as they stray from classical finance models and have found great success in other fields of application.

References

Brown, S. J., & Goetzmann, W. N. (1995). Performance persistence. The Journal of Finance, 50(2), 679-698.

Carhart, M. M. (1997). On persistence in mutual fund performance. The Journal of Finance, 52(1), 57-82.

Chevalier, J., & Ellison, G. (1999). Are some mutual fund managers better than others? Cross-sectional patterns in behavior and performance. The Journal of Finance, 54(3), 875-899.

Deo, R. C. (2015). Machine learning in medicine. Circulation, 132(20), 1920-1930.

Fortin, R., & Michelson, S. (2002). Indexing versus active mutual fund management. Journal of Financial Planning, 15(9), 82-95.

Gudelek, M. U., Boluk, S. A., & Ozbayoglu, A. M. (2017, November). A deep learning based stock trading model with 2-D CNN trend detection. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-8). IEEE.

Ho, K. Y., & Wang, W. W. (2016). Predicting stock price movements with news sentiment: An artificial neural network approach. In Artificial Neural Network Modelling (pp. 395-403). Springer, Cham.

Indro, D. C., Jiang, C. X., Patuwo, B. E., & Zhang, G. P. (1999). Predicting mutual fund performance using artificial neural networks. Omega, 27(3), 373-380.

Khandani, A. E., Kim, A. J., & Lo, A. W. (2010). Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11), 2767-2787.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).

Lakonishok, J., Shleifer, A., Vishny, R. W., Hart, O., & Perry, G. L. (1992). The structure and performance of the money management industry. Brookings Papers on Economic Activity. Microeconomics, 1992, 339-391.

Ludwig, R. S., & Piovoso, M. J. (2005). A comparison of machine-learning classifiers for selecting money managers. Intelligent Systems in Accounting, Finance & Management, 13(3), 151-164.

Platt, J. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, 10(3), 61-74.

Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (pp. 2951-2959).

Stivers, C., & Sun, L. (2010). Cross-sectional return dispersion and time variation in value and momentum premiums. Journal of Financial and Quantitative Analysis, 45(4), 987-1014.

Tarsauliya, A., Kant, S., Kala, R., Tiwari, R., & Shukla, A. (2010). Analysis of artificial neural network for financial time series forecasting. International Journal of Computer Applications, 9(5), 16-22.

Titman, S., & Grinblatt, M. (1989). Mutual fund performance: An analysis of quarterly portfolio holdings. Journal of Business, 62(3).
