You are on page 1of 7

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence

Doubly Regularized Portfolio with Risk Minimization

Weiwei Shen† and Jun Wang‡ and Shiqian Ma⇤


Applied Physics and Applied Mathematics, Columbia University, New York, NY, USA


Business Analytics and Mathematical Sciences, IBM T. J. Watson Research, Yorktown Heights, NY, USA

Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
ws2215@columbia.edu, wangjun@us.ibm.com, sqma@se.cuhk.edu.hk

Abstract Typically, the mean-variance portfolios aim at maximiz-


ing expected returns for a given level of risk tolerance or
Due to recent empirical success, machine learning algo- minimizing the investment risk while achieving a target re-
rithms have drawn sufficient attention and are becoming turn. The trade-off between these two critical factors, i.e. risk
important analysis tools in financial industry. In particu-
and return, plays a central role in the Markowitz portfolio
lar, as the core engine of many financial services such as
private wealth and pension fund management, portfolio theory. Generally, portfolio allocation with an explicit mea-
management calls for the application of those novel al- sure of risk-adjusted returns rather than gross growth has
gorithms. Most of portfolio allocation strategies do not been highlighted in finance. One of the common standard
account for costs from market frictions such as transac- criteria for measuring the risk-adjusted return is called the
tion costs and capital gain taxes, as the complexity of Sharpe ratio (Sharpe 1966).
sensible cost models often causes the induced problem
intractable. In this paper, we propose a doubly regular- Meanwhile, the rapid growth of globalized markets urges
ized portfolio that provides a modest but effective so- the implementation of advanced data analysis tools in fi-
lution to the above difficulty. Specifically, as all kinds nance. Recently, machine learning algorithms have been
of trading costs primarily root in large transaction vol- identified as important technical analysis tools and have
umes, to reduce volumes we synergistically combine called intensive attention. Different from the perspectives
two penalty terms with classic risk minimization mod- of finance community, machine learning researchers rely
els to ensure: (1) only a small set of assets are selected
to invest in each period; (2) portfolios in consecutive
more on the real time data stream to design optimal port-
trading periods are similar. To assess the new portfo- folio strategies (Blum and Kalai 1999; Borodin, El-Yaniv,
lio, we apply standard evaluation criteria and conduct and Gogan 2004; Agarwal et al. 2006; Györfi, Lugosi, and
extensive experiments on well-known benchmarks and Udina 2006; Li and Hoi 2012; Li et al. 2012). For example,
market datasets. Compared with various state-of-the-art one recent paper (Li and Hoi 2012) estimates on-line mov-
portfolios, the proposed portfolio demonstrates a supe- ing average reversion from sequential market data and then
rior performance of having both higher risk-adjusted re- maximizes expected returns to achieve optimal portfolio se-
turns and dramatically decreased transaction volumes. lection rules. This approach and many other papers mainly
depend upon the effectiveness of the growth optimal port-
folio (GOP) strategy (Thorp 1971). However, the principal
Introduction drawbacks of GOP (Samuelson 1969), such as high return
Academic research in portfolio management and allocation volatility along the investment horizon and extremely long
has had a remarkable impact on many aspects of the financial time realization, inevitably appear in those data-driven GOP
services industry, from mutual fund management to asset strategies. Moreover, due to the complexity of cost mod-
pricing and insurance to corporate risk management. Most els characterizing market frictions such as transaction costs,
applications build upon the pathbreaking work of Markowitz market impacts and capital gain taxes (Cvitanic 2001), many
(Markowitz 1952), which provides optimal rules of allocat- papers either ignore costs or only subtract ad hoc costs af-
ing wealth across assets by estimating means and covari- terward. Therefore, huge rebalancing volumes, substantial
ances of asset returns in a single static period. However, changes of asset allocation and extreme high asset turnover
due to the inaccurate estimation of parameters this mean- rates can often be observed across all the investment periods.
variance framework is notorious for producing extreme port- Admittedly, incorporating a full-fledged cost model is diffi-
folio weights that fluctuate substantially over time and per- cult. It should involve the factors related to financial rules,
form poorly under out-of-sample settings (Michaud 1989). policy and specific trading activities, such as commission
rules for different asset classes and investors, taxation policy
Copyright c 2014, Association for the Advancement of Artificial for different institutions and investment periods, and implicit
Intelligence (www.aaai.org). All rights reserved. costs from the real trading implementation given a strategy.

1286
In order to overcome those limitations, in this paper, we
propose a doubly regularized portfolio allocation strategy Table 1: Summary of key notations
to bridge the gap between return chasing and cost reduc- Symbol Definition
tion. Specifically, we take advantage of the classic mean- N Number of assets
variance portfolio framework to control volatility risk and T Number of investing periods
impose two regularization terms on the portfolio structure to t Index of investment periods
reduce trading volumes thus implicitly mitigating the costs !t Portfolio weight vector
from market frictions. The first regularization term is moti- !t Re-normalized portfolio before rebalancing
vated from structured sparsity (Huang, Zhang, and Metaxas
!(j)t Portfolio weight on the j th asset at time t
2009), where we aim to determine portfolio weights that are
allocated across assets sparsely. In other words, only a small S(j)t Price of the j th asset at time t
number of assets are considered to form a low-risk portfo- R(j)t Gross return of the j th asset at time t
lio. The second regularization term underlines a consistent r!(j)t Trade of the j th asset at time t
allocation, where we ensure portfolio weights between con- ˆt
⌃ Estimated covariance matrix at time t
secutive investment periods are similar. Those two regular-
ization terms help to control trading volumes such that costs
from market frictions can be implicitly reduced. Besides, errors in estimates of means have a larger impact on portfo-
they represent typical concerns of individual investors in lio weights than those in covariances. In addition, portfolio
private wealth management. To validate the proposed strat- performances could be improved when a shrinkage estima-
egy, we utilize a battery of standard finance metrics, includ- tor or a one factor model is applied for covariance matrix es-
ing Sharpe ratios, cumulative wealth, turnovers and volatil- timation (Ledoit and Wolf 2003). The recent papers (Brodie
ity to measure the performance from various perspectives. et al. 2009; DeMiguel et al. 2009; Fan, Zhang, and Yu 2012)
Our extensive empirical studies and comparisons over sev- show superior portfolio performances when various types of
eral well-known financial benchmarks and real-world mar- norm regularities are combined in the MVP framework.
ket data clearly illustrate the superiority of the new strategy.
Methodology
Mean-Variance Portfolio
We first briefly introduce the notations and financial terms
We briefly review one of the most representative port- used in this paper. Investment time is modeled as discrete
folio frameworks, i.e., Mean-Variance Portfolio (MVP), and indexed as t = 0, ..., T , with t = 0 the initial period
which is a cornerstone of Markowitz’s modern portfolio and t = T the terminal period. The investment over a set
theory (Markowitz 1952). The fundamental assumption of of assets in the tth time period is denoted by a column vec-
MVP is that investors are risk averse, which means that they tor representing portfolio weights ! t = {!(j)t }N j=1 , where
tend to choose the portfolio with a lower risk if profits are the N is the number of assets. Since !(j)t specifies the per-
same. Accordingly, the only way to achieve a higher profit
centage of invested wealth over the j th asset, the sum of
is to take more risk. Since MVP characterizes the risk by the
all the allocated portfolio weights always equals one, i.e.
variance of asset returns and models the profits by the ex- PN
pected asset returns, investors will optimize their investment !>t 1 = j=1 !(j)t = 1. In addition, the value !(j)t > 0
by selecting mean-variance efficient portfolios. In particu- indicates that investors take a long position of the j th asset,
lar, assume at time t the covariance matrix of asset returns is and !(j)t < 0 means that they take a short sale position. A
⌃t and the expected net return is denoted by a column vector short sale position means investors borrow an asset to sell
rt . The MVP weight ! t is obtained by solving the following and then invest its liquidation on other assets. A negative
optimization problem: sign of the short sale weight indicates investors will suffer
a loss if the price of this asset starts to mount. We also de-
! ⇤t = arg min ! t > ⌃t ! t ⌫r>
t !t (1) note Rt = {R(j)t }N j=1 as the asset gross returns from t 1
!t
to t. The gross return R(j)t for the j th asset is computed
s.t. ! t > 1 = 1,
as R(j)t = S(j)t /S(j)t 1 , where S(j)t and S(j)t 1 are the
where ! t indicates the proportion of the invested wealth prices of the j th asset at time t and t 1, respectively. Ac-
across all assets, and ⌫ > 0 is a risk tolerance factor. The cordingly, the portfolio net return µt from time t to t + 1 can
above cost function that has two components, the variance be easily calculated as µt = ! > t Rt+1 1. We will use
of portfolio returns ! t > ⌃t ! t and the expected return r>
t !t , the term “return” to represent “gross return” in the sequel.
captures the aforementioned risk-return tradeoff. Table 1 summarizes the key notations in the paper.
Since the ingenious work by Markowitz, many variants
of MVP have been developed (Brandt 2010). Among them, Formulation
without considering estimating means, minimum-variance
portfolios often perform stronger than mean-variance port- We consider investors who operate portfolio management in
folios for out-of-sample tests (Jagannathan and Ma 2003). a market containing N assets with the return Rt at time t.
That is because it is more difficult to accurately estimate According to the change of the portfolio weights between
means than covariances of asset returns (Merton 1980) and consecutive rebalancing periods, the trades at time t are

1287
computed by: Algorithm 1 Doubly Regularized Portfolio (DRP)
⇥ ⇤> Input: Asset returns data points {R ⌧ +1 , · · · , RT },
r! t = ! t ! t = r!(1)t · · · r!(N )t , (2) Initialization:
where r!(j)t = !(j)t !(j)t indicates the trades of Investment period t = 0;
rebalancing the j th asset at the tth time period; ! t de- Initial portfolio weight vector ! 0 ;
notes the re-normalized portfolio weight before the rebal- for t = 1 ! T 1 do
ancing at time t. Specifically, investors buy more asset j if Estimate the covariance matrix of asset returns:
r!(j)t > 0, and sell asset j if r!(j)t < 0. After a trad- ˆt
⌃ {R ⌧ +t+1 , · · · , Rt }
ing period, the portfolio weights have changed due to the
asset price fluctuation. Thus, we need to calculate the re- Renormalize the portfolio weight vector ! t using
normalized portfolio weight ! t as Equation (3);
Compute the optimal portfolio weight vector ! t by
! t 1 Rt solving Equation (6);
!t = , (3)
!>t 1 Rt Perform rebalancing ! t ! t ;
where the symbol denotes the Hadamard product and end for
Output:
!>t 1 Rt represents the portfolio return at time t.
By the principle of risk minimization, we derive the dou- The portfolio weight vectors {! t }Tt=01 and the portfolio
bly regularized portfolio, where the objective is to choose returns {µt }Tt=01 .
a limited set of assets to invest and control the changes of
the asset positions. The changes of the positions during each
rebalancing period are directly related to transaction costs, the core component of minimum-variance and the two struc-
market impacts and taxes, while the number of selected as- ture regularizations, we can derive the doubly regularized
sets is related to management costs. In particular, the reg- portfolio (DRP) with risk minimization.
ularization term on the position changes is denoted by the Note that the above doubly regularized quadratic loss
`2 -norm as in Equation (6) can be linked to existing learning frame-
N h i2 works in both machine learning and statistics. For example,
X
||! t ! t k22 = !(j)t !(j)t . (4) if 1 + 2 = 1, the doubly regularized form becomes the
j=1
so called elastic-net penalty, a convex combination of the
lasso and ridge penalty (Zou and Hastie 2005). A similar
The sparse selection of assets is realized through imposing regularization framework has also been combined with the
the `1 regularization term as hinge loss to derive the doubly regularized support vector
N machine (Wang, Zhu, and Zou 2006).
X
||! t ||1 = |!(j)t |. (5)
Optimization and Analysis
j=1
In the above formulation of DRP, we need to estimate the co-
Then the doubly regularized portfolio allocation is formed variance matrix of asset returns ⌃ˆ t . Instead of using a sam-
as ple covariance matrix, we apply a factor model (Fan, Fan,
min ! > ˆ ! t ||22 (6) and Lv 2008), which has been shown effective for estimat-
t ⌃t ! t + 1 ||! t ||1 + 2 ||! t
!t ing high-dimensional covariance matrices. Besides, differ-
s.t. !> ent from the focus of growth optimal portfolio research (Li
t 1 = 1.
and Hoi 2012), we do not rely on any prediction of portfo-
The core component in the above objective function is the lio returns in the procedure of designing portfolio allocation
minimization of the portfolio variance ! > ˆ ˆ
t ⌃t ! t , where ⌃t 2 rules. It is well-known that the prediction of future returns
R N ⇥N
is the estimated covariance matrix of asset returns is extremely challenging (Merton 1980; Rapach and Zhou
from historical data at time t. Besides the unit constraint of 2012). In addition, we notice that a proposed gross-exposure
the sum of the portfolio weights, we impose two additional constrained (GEC) MVP also achieves a sparse selection of
regularization terms weighted by two coefficients 1 and 2 . assets (Fan, Zhang, and Yu 2012). However, since the con-
The first regularization with an `1 -norm enforces structured sistency of consecutive portfolios has not been considered,
sparsity to portfolio weights (Huang, Zhang, and Metaxas there is no guarantee that the GEC portfolio can reduce as-
2009) so that ! t will only have a few non-zero elements, set trading volumes and the corresponding costs from market
which indicates only a small set of assets are selected. In ad- frictions. Hence, the uniqueness of the proposed DRP port-
dition, the sparsity regularization term prohibits any extreme folio lies in the advantages of minimizing investment risks
holding of long or short-sale positions of a particular asset. and meanwhile reducing trading costs. We discuss the opti-
The second regularization with an `2 -norm ||! t ! t ||22 im- mization strategy of the DRP portfolio below.
poses the consistency of portfolio weights before and after Without considering the `1 -norm regularization term in
each rebalance. Hence, the `2 regularization term implicitly the objective function, the optimization problem defined in
reduces turnover rates and trading costs by moderately con- Equation (6) is a standard convex quadratic problem (QP).
straining the changes of asset positions. In summary, with Sparsity-inducing norms have often been adapted in vari-

1288
Table 2: Summary of the tested datasets Table 3: Portfolio Sharpe ratios (%) with p-values
Dataset Frequency Time Period # of data N Method FF25 FF48 FF100 ETF INDEX EQUITY
FF25 Monthly 07/01/1963 - 12/31/2004 498 25 EW 26.58 24.30 26.97 6.01 5.98 9.03
FF48 Monthly 07/01/1963 - 12/31/2004 498 48 (1.00) (1.00) (1.00) (1.00) (1.00) (1.00)
FF100 Monthly 07/01/1963 - 12/31/2004 498 100 VW 28.53 23.37 29.76 5.85 5.74 8.95
ETF Weekly 01/01/2008 - 10/30/2012 252 139 (0.01) (0.22) (0.00) (0.22) (0.60) (0.86)
INDEX Weekly 02/15/2008 - 10/30/2012 246 26 OLMAR 21.50 24.48 23.60 7.61 15.80 12.15
EQUITY Weekly 01/01/2008 - 10/30/2012 252 181 (0.01) (0.93) (0.22) (0.44) (0.06) (0.27)
FMMV 39.71 27.34 45.94 11.51 22.54 13.86
(0.02) (0.60) (0.00) (0.16) (0.14) (0.07)
GEC 39.85 30.34 46.44 17.87 22.69 13.84
ous machine learning problems if the given a priori knowl- (0.01) (0.13) (0.00) (0.12) (0.15) (0.07)
edge supports such assumptions (Bach et al. 2011), similar DRP 41.67 30.37 48.27 19.64 23.33 13.88
to our formulation for the sparse portfolio allocation. With- (0.00) (0.13) (0.00) (0.15) (0.12) (0.05)
out considering the constraint ! > t 1 = 1, problem (6) can
be viewed as an `1 -norm regularized least squares problem,
which is known as Lasso (Tibshirani 1996) in statistics and a Experiment
sparse recovery problem in compressed sensing (Candès and In this section, we will report the results of our empirical
Tao 2006). Many efficient algorithms have been proposed studies using several real-world benchmarks.
for solving lasso or `1 sparse recovery problems. To name
just a few, these methods include FISTA (Beck and Teboulle Data
2009), FPC (Hale, Yin, and Zhang 2008), TFOCS (Becker, In the experiments, two types of datasets are chosen for
Candès, and Grant 2011), etc. When both the `1 -norm and performance validation and comparison. The first type of
the constraint ! >t 1 = 1 are present, the problem is much datasets is from the well-known academic benchmarks
harder than QP and Lasso. Although it is possible to imple- called Fama and French datasets (Fama and French 1992).
ment an efficient first-order algorithm based on the projected Briefly speaking, based on the data sampled from the U.S.
proximal gradient method (Boyd and Vandenberghe 2004) stock market, the benchmarks were formed for different fi-
or the alternating direction method of multipliers (Boyd et nancial segments. For example, FF48 contains monthly re-
al. 2011), we use a commercial software toolbox TOM- turns of 48 portfolios representing different industrial sec-
LAB/SNOPT developed by the Stanford Systems Optimiza- tors, and FF100 includes 100 portfolios formed on the
tion Laboratory (SOL) for simplicity. basis of size and book-to-market ratio. Fama and French
datasets have been recognized as standard testbeds and heav-
ily adopted in finance research because of its extensive cov-
Sequential Portfolio Allocation erage to asset classes and very long historical data series.
The second type of datasets represents three popular finan-
In order to perform the process of sequential portfolio allo- cial asset classes, exchange-traded funds (ETF), major world
cation, we first need to choose a time window ⌧ over which stock market indices (INDEX), and individual equities sam-
to perform the estimation of ⌃ ˆ t . To avoid overfitting, we split pled from the large-cap segment of the Russell 200 index
the entire dataset into two parts. The first ⌧ data points, de- (EQUITY). The data of the prices are all crawled from Ya-
noted as {R ⌧ +1 , · · · , R0 }, are used as the training data to hoo! Finance in a weekly base from 2008 to 2012. They rep-
estimate the initial covariance matrix. The sequential port- resent the commonly chosen investment opportunity sets by
folio allocation starts from t = 0 and lasts a total number investors. For example, ETFs that have the advantage over
of T time periods. We apply the “rolling-horizon” proce- conventional mutual funds of low costs, tax efficiency, and
dure (DeMiguel et al. 2009) to sequentially perform port- stock-like features have been highlighted in both the indus-
folio allocation and evaluate the performance. Specifically, try and academia. The 26 major stock market indices cover
we shift the estimation window along the time axis by in- all the mature world wide financial markets. The selected
cluding the data point of the next period and dropping the stocks pool of the large-cap segment of the U.S. stock mar-
data point of the earliest period. This “rolling-horizon” pro- ket measure the performance of the 200 largest U.S. compa-
cedure is repeated until the end of the dataset. At the end nies and cover 63% of total market capitalization.1
of this process, we will have T portfolio vectors derived for Table 2 summarizes those benchmark datasets used in
our experiments. Note that these two types of datasets pro-
each strategy, denoted as {! t }Tt=01 . The portfolio allocation
vide different prospectives for performance evaluation. The
at the last point t = T is excluded due to the lack of future
Fama and French datasets essentially emphasize the long-
data for performance evaluation. For calculating the initial
term performance of the proposed strategy, as results of such
portfolio ! 0 , we can either choose a simple heuristics such
long historical datasets introduce limited selection bias and
as an equally-weighted portfolio, or derive the optimal solu-
tion of the problem defined in Equation (6) without the con- 1
After removing those with incomplete historical data, for
sistency regularization. The algorithm chart 1 summarizes which the assets did not exist at the beginning of the chosen trading
the sequential procedure of the proposed DRP strategy. periods, finally we have 181 stocks in the EQUITY dataset.

1289
Table 4: Cumulative wealth Table 5: Portfolio turnovers (%)
Method FF25 FF48 FF100 ETF INDEX EQUITY Method FF25 FF48 FF100 ETF INDEX EQUITY
EW 98.06 54.77 123.92 1.20 1.12 1.30 OLMAR 36.88 26.22 32.23 17.22 14.96 15.55
VW 137.05 48.06 198.32 1.19 1.22 1.29 FMMV 43.11 27.39 55.23 99.72 7.60 10.99
OLMAR 33.56 42.34 57.74 1.21 1.24 1.34 GEC 36.80 6.05 46.49 13.23 7.56 10.77
FMMV 246.35 46.78 523.47 1.05 1.30 1.38 DRP 12.75 4.23 20.42 4.78 5.01 9.62
GEC 225.37 53.64 496.01 1.22 1.30 1.38
DRP 266.32 54.74 646.91 1.27 1.31 1.38
where µ̂ is the mean of portfolio net returns and ˆ represents
the standard deviation of portfolio net returns.
are difficult to be manipulated. The experiments on the real The cumulative wealth of a portfolio measures the total
world short-term datasets underline the robustness of the profit yield from the portfolio strategy across investment pe-
proposed portfolio with respect to the high volatility envi- riods without considering any risks and costs. By starting
ronment after the recent financial crisis. with one dollar, the cumulative wealth (CW) is computed by
TY1
Experimental Settings
CW = !>
t Rt+1 . (8)
We set the length of the estimation window as ⌧ = 120 t=0
(DeMiguel et al. 2009), which means that the previous 120
data points are used to make the current decisions of portfo- The portfolio turnover indicates the frequency of trading
lio allocation. For the parameters in DRP such as 1 and activities and the volumes of rebalancing. It has been recog-
2 , cross validation is applied to determine the optimal
nized as an important performance metric for portfolio man-
values. For the comparison study, we consider the follow- agement, since a high turnover inevitably generates high ex-
ing methods: a) equally-weighted portfolio (EW); b) value- plicit and implicit trading costs thus degenerating the real
weighted portfolio (VW); c) factor model based minimum- return. The portfolio turnover rate is calculated as an aver-
variance portfolio (FMMV) (Fan, Fan, and Lv 2008); d) age absolute value of the rebalancing trades across all the
gross-exposure constrained (GEC) portfolio (Fan, Zhang, assets and over all the trading periods:
and Yu 2012); e) on-line moving average reversion (OL- T
X1
MAR) based portfolio (Li and Hoi 2012); f) the proposed 1
Turnover = ||! t ! t ||1 , (9)
doubly regularized portfolio (DRP). The first four portfo- T 1 t=1
lios typify those in finance. For example, the EW port-
folio is a naive approach yet has been empirically shown where T 1 indicates the total number of the rebalancing
to mostly outperform 14 models across seven empirical periods and ! t is the re-normalized portfolio weight vector
datasets (DeMiguel, Garlappi, and Uppal 2009). VW mim- before rebalance.
ics a market portfolio. The FMMV portfolio adopts a fac- Finally, the volatility represents a quantitative measure-
tor model to estimate high-dimensional covariance matrices ment of investment risk, which is computed as the standard
and has shown a large performance improvement over MVP. deviation of returns. To achieve a fair comparison of the
The GEC portfolio shows better results than MVP by simply portfolios with the different rebalancing
p periods, we com-
imposing a sparsity constraint. The OLMAR portfolio that pute the annualized volatility by M ˆ , where M is the
is based on GOP represents a more data-driven approach frequency of the rebalance each year, where ˆ is computed
developed by machine learning researchers and has been as (7). For the monthly and weekly rebalancing frequency,
shown robust and outperforms 12 different portfolio strate- M = 12 and M = 52, respectively.
gies across five datasets. For all the compared approaches, To measure the statistical significance of the difference
we follow the recommended parameter settings in the corre- between the volatility and the Sharpe ratio for two compar-
sponding studies. ing portfolios, we report the p-values under the correspond-
ing results in Tables 3 and 6. To compute the p-values for the
Performance Metrics case of no i.i.d. returns, we adopt the studentized circular
We compare the out-of-sample performance of the portfo- block bootstrapping methodology (Ledoit and Wolf 2011).
lios using four standard criteria in finance (Brandt 2010): We test the statistical significance by using the EW portfolio
(i) Sharpe ratios; (ii) cumulative wealth; iii) turnovers; (iv) as the benchmark, 1000 bootstrap resamples, a 95% signifi-
volatility. These four evaluation metrics represent different cance level, and a block size equal to 5.
focuses on measuring portfolio performance. The Sharpe
ratio (SR) measures the reward-to-risk ratio of a portfolio
Results
strategy, which is computed as the portfolio return normal- Tables 3, 4, 5 and 6 summarize portfolio performance eval-
ized by its standard deviation: uated by the Sharpe ratios, cumulative wealth, turnovers
v and volatility for all the tested benchmarks, respectively.
T 1 u T 1 Among the comparisons of various methods, the values in
µ̂ 1 X u1 X
SR = , µ̂ = µt , ˆ = t (µt µ̂)2 . (7) bold denote the winners’ performance. The proposed DRP
ˆ T t=0 T t=0 strategy achieves the best performance in most of the cases.

1290
300 60
EW
VW 600 EW
EW 50

Cumulative Wealth
VW OLMAR VW

Cumulative Wealth
Cumulative Wealth

OLMAR FMMV OLMAR


200 40 GEC FMMV
FMMV
GEC DRP 400 GEC
DRP 30 DRP

100 20
200

10

0 0 0
12/1974 12/1984 12/1994 12/2004 12/1974 12/1984 12/1994 12/2004 12/1974 12/1984 12/1994 12/2004
Investment Period Investment Period Investment Period

(a) (b) (c)


1.4 1.35
EW EW
EW 1.4 VW
VW
VW

Cumulative Wealth
OLMAR OLMAR

Cumulative Wealth
Cumulative Wealth

OLMAR 1.25 FMMV


FMMV
FMMV GEC
1.2 GEC
GEC DRP
DRP 1.2
DRP
1.15

1
1
1.05

0.8 0.95 0.8


04/2010 10/2010 04/2011 10/2011 04/2012 10/2012 10/2010 04/2011 10/2011 04/2012 10/2012 04/2010 10/2010 04/2011 10/2011 04/2012 10/2012
Investment Period Investment Period Investment Period
(d) (e) (f)
Figure 1: The curves of cumulative wealth across the investment periods for different portfolios on a) FF25, b) FF48, c) FF100,
d) ETF, e) INDEX, and f) EQUITY datasets.

In Table 3, where we report the annualized Sharpe ratios


in percentage, DRP generates consistent higher Sharpe ra- Table 6: Portfolio volatilities (%) with p-values
tios than the rest, which indicates that it achieves higher Method FF25 FF48 FF100 ETF INDEX EQUITY
risk-adjusted returns. The GEC portfolio archives approx- EW 17.66 16.97 18.33 21.98 13.32 18.34
imately as high Sharpe ratios, as its strategy embodies a (1.00) (1.00) (1.00) (1.00) (1.00) (1.00)
similar idea of risk minimization and sparse portfolio selec- VW 17.32 16.76 18.10 21.79 13.04 18.00
tion. Even for the portfolio returns measured by cumulative (0.23) (0.16) (0.05) (0.02) (0.00) (0.00)
OLMAR 16.97 15.49 17.66 16.12 7.21 14.42
wealth, DRP is among of the top performers in most of the (0.32) (0.01) (0.14) (0.00) (0.00) (0.00)
cases, as shown in Table 4. In addition, Table 5 illustrates FMMV 13.42 13.86 13.42 2.51 6.80 13.62
that DRP has dramatically lower turnover rates, reduced by (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
10.7% ⇠ 66.2% compared with GEC. The reduction of the GEC 13.42 13.05 13.1 6.29 6.79 13.61
turnover rates is attributed to the unique consistency regular- (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
ization in the DRP formulation. Accordingly, DRP not only DRP 12.96 12.82 12.77 6.92 6.85 13.61
improves risk-adjusted returns but also ensures much less (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
trading costs. VW has a zero turnover rate by definition and
EW incurs the equal portfolio allocation thus having negligi-
ble trading volumes. Hence we exclude these two heuristic Conclusion
portfolios and only report the results from the other port-
folios in Table 5. Finally, Table 6 unsurprisingly illustrates We have presented a novel doubly regularized portfolio
that the DRP strategy has the lowest volatility risk in most of strategy through leveraging structure regularization, i.e.
the cases. Although FMMV is designed to create a minimal sparsity and consistency, to the conventional minimum-
variance portfolio, without having constraints or regularities variance portfolio framework. In particular, the sparsity reg-
on the portfolio structure its out-of-sample variance cannot ularization enforces to choose a small set of assets with a
be ensured minimal. minimum risk and the consistency regularization produces
similar asset positions in consecutive investment periods.
Extensive experiments and comparisons on both long-term
and short-term financial benchmarks and real-world market
Furthermore, in order to compare the trend and dynamics datasets are conducted. The experimental results show that,
of the return growth, Figure 1 illustrates the curves of the cu- by virtue of the synergistic combination of the classic risk
mulative wealth over the investment periods for the different minimization framework and the novel structure regulariza-
datasets. Apparently, DRP outperforms the others with the tion, the proposed portfolio constantly achieves higher risk-
visible margins on four of the tested datasets in most of the adjusted returns with much lower implied costs measured by
time periods, as shown in Figures 1(a), 1(c), 1(d) and 1(f). turnover rates. Our future work includes appropriately incor-
Those figures show DRP grows more steadily together with porating market friction models into the studied framework,
a reduced volatility across most of the investment periods. such as capital gain taxes and market price impacts.

1291
References Fan, J.; Zhang, J.; and Yu, K. 2012. Asset allocation and risk
Agarwal, A.; Hazan, E.; Kale, S.; and Schapire, R. E. 2006. assessment with gross exposure constraints for vast portfo-
Algorithms for portfolio management based on the Newton lios. Journal of American Statistical Association 412–428.
method. In Proceedings of the 23th international conference Györfi, L.; Lugosi, G.; and Udina, F. 2006. Nonparametric
on machine learning, 9–16. kernal-based sequential investment strategies. Mathematical
Bach, F.; Jenatton, R.; Mairal, J.; and Obozinski, G. 2011. Finance 16:337–357.
Convex optimization with sparsity-inducing norms. Opti- Hale, E. T.; Yin, W.; and Zhang, Y. 2008. Fixed-point
mization for Machine Learning 19–53. continuation for `1 -minimization: Methodology and conver-
Beck, A., and Teboulle, M. 2009. A fast iterative shrinkage- gence. SIAM Journal on Optimization 19(3):1107–1130.
thresholding algorithm for linear inverse problems. SIAM Huang, J.; Zhang, T.; and Metaxas, D. 2009. Learning with
Journal Imaging Sciences 2(1):183–202. structured sparsity. In Proceedings of the 26th Annual Inter-
Becker, S.; Candès, E. J.; and Grant, M. 2011. Templates for national Conference on Machine Learning, 417–424.
convex cone problems with applications to sparse signal re- Jagannathan, R., and Ma, T. 2003. Risk reduction in large
covery. Mathematical Programming Computation 3(3):165– portfolios: Why imposing the wrong constraints helps. Jour-
218. nal of Finance 58:1651–1684.
Blum, A., and Kalai, A. 1999. Universal portfolios with Ledoit, O., and Wolf, M. 2003. Improved estimation of
and without transaction costs. Machine Learning 35(3):193– the covariance matrix of stock returns with an application to
205. portfolio selection. Journal of Empirical Finance 10:603–
Borodin, A.; El-Yaniv, R.; and Gogan, V. 2004. Can we 621.
learn to beat the best stock? Journal of Artificial Intelligence Ledoit, O., and Wolf, M. 2011. Robust performance hypoth-
21:579–594. esis testing with the variance. Wilmott Magazine 86–89.
Boyd, S., and Vandenberghe, L. 2004. Convex optimization. Li, B., and Hoi, S. C. 2012. On-line portfolio selection
Cambridge University Press. with moving average reversion. In Proceedings of the 29th
Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; and Eckstein, J. international conference on machine learning.
2011. Distributed optimization and statistical learning via Li, B.; Hoi, P.; Hoi, S. C.; and Gopalkrishnan, V. 2012.
the alternating direction method of multipliers. Foundations PAMR:passive aggressive mean reversion strategy for port-
and Trends R in Machine Learning 3(1):1–122. folio selection. Machine Learning 87:221–258.
Brandt, M. W. 2010. Portfolio choice problems. In Ait- Markowitz, H. 1952. Portfolio selection. Journal of Finance
Sahalia, Y., and Hansen, L. P., eds., Handbooks of Financial 7:77–91.
Econometrics. Elsevier. 269–336. Merton, R. C. 1980. On estimating the expected return on
Brodie, J.; Daubechies, I.; Mol, C. D.; Giannoned, D.; and the market: An exploratory investigation. Journal of Finan-
Loris, I. 2009. Sparse and stable Markowitz portfolios. cial Economics 8:323–361.
In Proceedings of the National Academy of Scineces of the Michaud, R. O. 1989. The Markowitz optimization enigma:
United States of America, volume 106, 12267–12272. Is ‘optimized’ optimal? Financial Analysts Journal 31–42.
Candès, E. J., and Tao, T. 2006. Near optimal signal re- Rapach, D. E., and Zhou, G. 2012. Forecasting stock re-
covery from random projections: Universal encoding strate- turns. In Elliott, G., and Timmermann, A., eds., Handbook
gies. IEEE Transactions on Information Theory 52(1):5406– of Forecasting II. North-Holland.
5425. Samuelson, P. A. 1969. Lifetime portfolio selection by dy-
Cvitanic, J. 2001. Theory of portfolio optimization in mar- namic stochastic programming. Review of Economics and
kets with frictions. In Jouini, E.; Cvitanic, J.; and Musiela, Statistics 51:239–246.
M., eds., Handbooks in Mathematical Finance. Cambridge Sharpe, W. F. 1966. Mutual fund performance. Journal of
University Press. Business 39:119–138.
DeMiguel, V.; Garlappi, L.; Nogales, F. J.; and Uppal, R. Thorp, E. O. 1971. Portfolio choice and the Kelly criterion.
2009. A generalized approach to portfolio optimization: Im- Business and Economics Statistics Section of Proceedings of
proving performance by constraining portfolio norms. Man- the American Statistical Association 215–224.
agement Science 55:798–812.
Tibshirani, R. 1996. Regression shrinkage and selection via
DeMiguel, V.; Garlappi, L.; and Uppal, R. 2009. Opti- the lasso. Journal of the Royal Statistical Society. Series B
mal versus naive diversification: How inefficient is the 1/N (Methodological) 58(1):267–288.
portfolio strategy? The Review of Financial Study 22:1915–
Wang, L.; Zhu, J.; and Zou, H. 2006. The doubly regularized
1953.
support vector machine. Statistica Sinica 16(2):589–615.
Fama, E. F., and French, K. R. 1992. The cross-section of
Zou, H., and Hastie, T. 2005. Regularization and variable
expected stock returns. Journal of Finance 47(2):427–465.
selection via the elastic net. Journal of the Royal Statistical
Fan, J.; Fan, Y.; and Lv, J. 2008. High dimensional co- Society: Series B (Statistical Methodology) 67(2):301–320.
variance matrix estimation using a factor model. Journal of
Econometrics 147:186–197.

1292

You might also like