
Cross Validation Based Transfer Learning for Financial Covariance Estimation: A Data-Driven Approach
Torsten Mörstedt
Deka Investment GmbH*, Mainzer Landstr. 16, 60325 Frankfurt, Germany, torsten.moerstedt@deka.de.

Bernhard Lutz, Dirk Neumann


University of Freiburg, Rempartstr. 16, 79085 Freiburg, Germany, bernhard.lutz@is.uni-freiburg.de,
dirk.neumann@is.uni-freiburg.de

Existing studies on covariance estimation generally assume that the future covariance matrix must be estimated based only on the limited history of a given set of portfolio constituents. In this study, we propose a new perspective on how to estimate the covariance matrix. We present a purely data-driven approach that uses cross validation to select the estimation parameters that are historically optimal, according to the given objective, on a disjoint transfer set of assets. The proposed approach additionally uses a second shrinkage target that is determined by the imbalance of the sample eigenvalues, as measured by their Gini coefficient. Our empirical evaluation based on a total of six stock market indices shows that the proposed approach outperforms established estimators in minimizing variance and maximizing risk-adjusted return. The second shrinkage target is particularly relevant for high-dimensional covariance matrices, where the number of assets is greater than the number of historical datapoints. To the best of our knowledge, this study is the first to apply the concept of transfer learning to the problem of covariance estimation.

Key words: Covariance estimation, transfer learning, non-linear shrinkage, second shrinkage target, data-driven approach, portfolio optimization

1. Introduction
Covariance estimation is a well-studied problem in the finance and operations research literature. An accurate estimate of the future covariance is required for standard portfolio optimization procedures like minimizing variance and maximizing risk-adjusted return (Markowitz 1952). The task is to estimate the future covariance of a given portfolio of N assets based on a history of T datapoints. However, the sample covariance is generally a poor estimate of the future covariance, which leads to deteriorated out-of-sample performance (Frost and Savarino 1986, Jorion 1986, Michaud 1989, Olivares-Nadal and DeMiguel 2018). As a remedy, researchers have proposed several approaches to improve the sample covariance estimate. The proposed methods can be grouped into (i) explicit (non-)linear shrinkage (e.g., Ledoit and Wolf 2003, 2017b, Engle et al. 2019), (ii) implicit shrinkage by combining sample and reference weights (e.g., Frahm and Memmel 2010, Bodnar et al. 2018), (iii) thresholding (e.g., Bickel and Levina 2008, Fan et al. 2013) and signal/noise separation (e.g., Laloux et al. 1999, Plerou et al. 2002, Zhao et al. 2019), (iv) factor models (e.g., Fan et al. 2011, 2018, De Nard et al. 2019), and (v) regularization of the inverse covariance matrix (e.g., Friedman et al. 2008, Lam and Fan 2009, Cai et al. 2011, Nguyen et al. 2021). Other studies have proposed reducing estimation error through constrained portfolio optimization rather than through covariance adjustments (e.g., Jagannathan and Ma 2003, DeMiguel et al. 2009a, Fan et al. 2012). Extensive literature reviews and empirical studies of covariance estimation can be found in Fan et al. (2016), Ledoit and Wolf (2017a), and Ledoit and Wolf (2021a).

* Views expressed in this paper are those of the author and do not necessarily reflect those of Deka Investment or its employees.

Electronic copy available at: https://ssrn.com/abstract=3986993

Mörstedt, Lutz, and Neumann: Transfer Learning for Covariance Estimation

However, all of the aforementioned studies considered the problem of covariance estimation from

a rather restrictive perspective, where the only available real-world data for parameter selection is

given by the history of the actual portfolio constituents. To determine the estimation parameters

such as shrinkage intensities, prior studies applied loss functions with statistically derived priors for

an unknown oracle covariance, e.g., by using random matrix theory (most recently Ledoit and Wolf

2021b). In a similar fashion, using sample data only, others calculated in-sample portfolio volatility

to directly determine the shrinkage intensity (e.g., Frahm and Memmel 2010, Bodnar et al. 2018)

or to identify signal/noise separation (Zhao et al. 2019).

1.1. Our Approach

In this study, we propose an entirely different perspective on financial covariance estimation.

Following prior studies, we also assume that the future covariance must be estimated for N assets


based on only T datapoints. However, we assume that we are additionally provided with other

stock market time-series data from a disjoint set of assets to select the estimation parameters.

From a real-world perspective, the history of joint datapoints may be shortened by company mergers and acquisitions, splits, or one of the portfolio constituents having been recently listed on the stock exchange. In this case, the future covariance matrix must be estimated based on the comparatively short history of the portfolio constituents. We then argue that financial investors can rely on additional

time-series data from a disjoint set of assets that are not part of the portfolio. More precisely, we

argue that we can select the estimation parameters using cross validation based on the history of

other assets. A major advantage of applying cross validation for parameter selection is that we can

apply this method to estimate the future covariance according to different optimization objectives,

such as minimum variance or maximum risk-adjusted return. The historically optimal parameters

are subsequently used to estimate the actual covariance based on the given assets and datapoints.

The concept of “transfer learning” is well known in the context of machine learning (e.g., Bastani 2021, Pan and Yang 2009) and in the pricing literature (e.g., Bastani et al. 2021). Yet, we are not aware of any approach that applies the same idea to the selection of covariance estimation parameters such as shrinkage intensities and shrinkage targets.

We present a novel covariance estimator based on non-linear shrinkage, which we call “Cross

Validation based Transfer Learning” (CVTL). The approach can be attributed to the group of

rotation equivariant non-linear shrinkage estimators. This means that the sample eigenvectors are

left unchanged, while the sample eigenvalues are shrunk, in our case, against the mean eigenvalue

and a second non-linear shrinkage target. The second shrinkage target is determined in a two-step process. First, we scale the imbalance of the sample eigenvalues, measured by their Gini coefficient, to a desired value. Second, we search for a Beta distribution whose cumulative distribution function achieves the desired Gini coefficient. All parameters are selected by cross

validation to be historically optimal on a disjoint history with respect to the given objective.

We recommend the disjoint history to contain at least one additional year of trading data. This


allows for performing at least twelve train-evaluate iterations during cross validation with monthly

rebalancing to achieve a stable parameter configuration. The objective can be chosen freely and it

can be completely tailored towards the overarching portfolio optimization goals, e.g., minimizing

variance or maximizing risk-adjusted return. Therefore, our approach is purely data-driven and

agnostic with respect to the desired eigenvalue distribution and optimization objective. Even when maximizing risk-adjusted return, our approach operates on the covariance estimate only, without having to estimate future returns. Instead, the weights are always calculated according to

the Markowitz global minimum variance portfolio (Markowitz 1952) but the eigenvalues are set to

be historically optimal under the desired objective function on the disjoint dataset.

Our numerical evaluation based on several global stock indices as well as aggregated industry

portfolios shows that CVTL is superior to established covariance estimators in minimizing variance

and maximizing risk-adjusted return. The latter holds in particular when additionally accounting

for transaction costs. The second shrinkage target is assigned a non-zero shrinkage intensity when

estimating high-dimensional covariance matrices where the number of assets is greater than the

number of datapoints. However, the intensity of the second shrinkage target decreases as the number of datapoints increases relative to the number of assets. In addition, we empirically compare the resulting eigenspectra of CVTL with the analytically derived spectra from the non-linear shrinkage approaches QuEST and BN. The results indicate similar eigenspectra, except that CVTL tends to increase rather than decrease the largest eigenvalue for high-dimensional covariance matrices.

1.2. Contributions

We provide a novel perspective on covariance estimation by proposing a purely data-driven estimator

CVTL that combines cross validation and transfer learning. We show that transfer learning allows us

to replace the traditional loss functions based on theoretically derived priors of the oracle covariance

with a flexible objective function tailored to the overarching optimization problem. In contrast

to existing approaches, CVTL is agnostic towards the desired eigenspectrum as all estimation

parameters are selected based on a disjoint dataset. We further outline a novel non-linear shrinkage


target based on the cumulative distribution function of a Beta distribution to achieve a desired

Gini coefficient. The second shrinkage target is particularly useful for high-dimensional estimation

problems with N > T. In addition, we demonstrate the flexibility of CVTL by setting the objective function to maximizing risk-adjusted return. Even then, we do not explicitly estimate future returns, as we calculate the portfolio weights solely based on the formula of the GMV portfolio, which is also a novel approach.

1.3. Outline

The remainder of this paper is structured as follows. Section 2 introduces the preliminaries on

financial covariance estimation. In addition, it presents a brief explanation of existing shrinkage

methods. Section 3 presents our covariance estimator that applies cross validation based transfer

learning. In particular, it presents a detailed description of how to select the shrinkage parameters.

Section 4 describes our empirical evaluation including the competing approaches and the datasets

used for parameter estimation and out-of-sample evaluation. Section 5 presents the results. We

first present the main results for the objectives minimum variance and maximum risk-adjusted

return. Second, we present several sensitivity analyses, where we show how varying the number of

cross validations and the parameter search space influences out-of-sample performance. Third, we

show how our approach is linked to established covariance estimators from an empirical perspective.

Finally, Section 6 concludes and provides an outlook on future research.

2. Preliminaries
Portfolio optimization aims at finding the optimal allocation of weights $w = (w_1, \ldots, w_N)$ for a portfolio consisting of $N$ assets. The global minimum variance (GMV) portfolio allocation minimizes the portfolio risk (Markowitz 1952). Let $\Sigma \in \mathbb{R}^{N \times N}$ denote the future covariance matrix and $\mathbf{1}$ the vector consisting of $N$ ones. The weights of the GMV portfolio are given by

$$w = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}. \tag{1}$$
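As a minimal illustration of (1), the following NumPy sketch computes GMV weights for a given covariance matrix. The function name `gmv_weights` is illustrative, and solving the linear system $\Sigma x = \mathbf{1}$ instead of forming $\Sigma^{-1}$ explicitly is a standard numerical choice rather than part of the method described here.

```python
import numpy as np


def gmv_weights(sigma: np.ndarray) -> np.ndarray:
    """GMV weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1), cf. eq. (1)."""
    ones = np.ones(sigma.shape[0])
    # Solve Sigma x = 1 rather than forming the inverse explicitly.
    x = np.linalg.solve(sigma, ones)
    return x / (ones @ x)


# Usage: the lower-variance asset receives the larger weight; weights sum to one.
sigma = np.diag([0.04, 0.01])
w = gmv_weights(sigma)
print(w, w.sum())
```

For this diagonal example, the inverse variances are 25 and 100, so the weights are 25/125 = 0.2 and 100/125 = 0.8.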

However, the future covariance matrix $\Sigma$ is an unknown parameter that needs to be estimated based on historical data. Let $X^{T,N} = x_1, \ldots, x_T$ with $x_t \in \mathbb{R}^N$ denote the sequence of historical returns used for covariance estimation. The average return vector is given as $\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t$. The sample covariance matrix estimate $\hat{\Sigma} \in \mathbb{R}^{N \times N}$ can then be calculated as

$$\hat{\Sigma} = \frac{1}{T-1}\sum_{t=1}^{T}(x_t - \bar{x})(x_t - \bar{x})'. \tag{2}$$

The dimension $D = N/T$ specifies the ratio between the number of assets and the number of datapoints used for covariance estimation. If $D < 1$ (more datapoints than assets), the estimation problem is considered low-dimensional. Here, $\hat{\Sigma}$ is symmetric and positive semi-definite by construction and, for returns in general position, also positive definite and thus invertible. Conversely, if $D > 1$, the problem is considered high-dimensional, which corresponds to an estimated sample covariance that is rank deficient and thus not invertible.
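The following sketch implements (2) and illustrates the rank deficiency in the high-dimensional case, assuming simulated Gaussian returns purely for illustration. Because of the mean-centering in (2), the rank of $\hat{\Sigma}$ is at most $T - 1$, so for $T = 60$ and $N = 100$ it cannot exceed 59. The helper `sample_cov` is hypothetical and is checked against NumPy's `np.cov`.

```python
import numpy as np


def sample_cov(returns: np.ndarray) -> np.ndarray:
    """Sample covariance (2) of a T x N matrix of returns (rows = datapoints)."""
    centered = returns - returns.mean(axis=0)       # subtract the average return vector
    return centered.T @ centered / (returns.shape[0] - 1)


rng = np.random.default_rng(0)
ranks = {}
for t, n in [(250, 50), (60, 100)]:                 # D = 0.2 (low-dim.) and D = 1.67 (high-dim.)
    x = rng.normal(scale=0.01, size=(t, n))         # simulated daily returns, illustration only
    s = sample_cov(x)
    assert np.allclose(s, np.cov(x, rowvar=False))  # agrees with NumPy's unbiased estimator
    ranks[(t, n)] = int(np.linalg.matrix_rank(s))
    print(f"T={t}, N={n}, D={n / t:.2f}, rank={ranks[(t, n)]}")
```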

2.1. Eigendecomposition

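Eigendecomposition can be used to write the sample covariance matrix as the product of an orthogonal matrix $\Lambda \in \mathbb{R}^{N \times N}$, whose column $i$ contains the eigenvector associated with eigenvalue $\lambda_i$, the diagonal matrix $\operatorname{diag}(\lambda_1, \ldots, \lambda_N)$ containing the eigenvalues in descending order ($\lambda_i \geq \lambda_{i+1}$), and the transpose $\Lambda'$:

$$\hat{\Sigma} = \Lambda \operatorname{diag}(\lambda_1, \ldots, \lambda_N)\, \Lambda'. \tag{3}$$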

The eigenvalues of the covariance matrix reflect the risk contribution of the portfolio constituents (Roncalli and Weisang 2016). If eigenvalue $\lambda_i$ is greater than $\lambda_j$, then the eigenvector $\Lambda_i$ captures more variance than the eigenvector $\Lambda_j$. The GMV weights (1) are calculated based on the inverse covariance matrix $\Sigma^{-1}$. The inverse of (3) is given as

$$\hat{\Sigma}^{-1} = \Lambda \operatorname{diag}\left(\frac{1}{\lambda_1}, \ldots, \frac{1}{\lambda_N}\right) \Lambda'. \tag{4}$$

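Given the sample covariance estimate $\hat{\Sigma}$, the GMV portfolio weights can be expressed as a function of the eigenvalues

$$w(\lambda_1, \ldots, \lambda_N) = \frac{\Lambda \operatorname{diag}\left(\frac{1}{\lambda_1}, \ldots, \frac{1}{\lambda_N}\right) \Lambda' \mathbf{1}}{\mathbf{1}' \Lambda \operatorname{diag}\left(\frac{1}{\lambda_1}, \ldots, \frac{1}{\lambda_N}\right) \Lambda' \mathbf{1}}. \tag{5}$$

For instance, it can easily be shown that setting all eigenvalues to the same value $c \in \mathbb{R}_{>0}$ results in the 1/N portfolio.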
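The 1/N property follows directly from (5): if $\lambda_i = c$ for all $i$, then $\Lambda \operatorname{diag}(1/c, \ldots, 1/c)\Lambda' = \frac{1}{c}\Lambda\Lambda' = \frac{1}{c}I$, so $w = \frac{\mathbf{1}/c}{N/c} = \mathbf{1}/N$. The sketch below checks (4) and (5) numerically on an arbitrary positive definite matrix; `weights_from_eigen` is an illustrative helper. Note that `np.linalg.eigh` returns the eigenvalues in ascending rather than descending order, which does not affect the weights.

```python
import numpy as np


def weights_from_eigen(eigvecs: np.ndarray, eigvals: np.ndarray) -> np.ndarray:
    """GMV weights as a function of the eigenvalues, cf. eqs. (4)-(5)."""
    ones = np.ones(eigvecs.shape[0])
    # Lambda diag(1/lambda) Lambda' 1, computed without building the full inverse.
    x = eigvecs @ ((eigvecs.T @ ones) / eigvals)
    return x / (ones @ x)


rng = np.random.default_rng(1)
a = rng.normal(size=(5, 5))
sigma = a @ a.T + 5 * np.eye(5)           # an arbitrary positive definite covariance matrix
eigvals, eigvecs = np.linalg.eigh(sigma)  # ascending eigenvalues, orthonormal eigenvectors

# Agrees with the direct formula (1).
direct = np.linalg.solve(sigma, np.ones(5))
direct /= direct.sum()
assert np.allclose(weights_from_eigen(eigvecs, eigvals), direct)

# Equal eigenvalues c > 0 give the 1/N portfolio.
equal = weights_from_eigen(eigvecs, np.full(5, 0.3))
print(equal)  # approximately [0.2 0.2 0.2 0.2 0.2]
```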


2.2. Shrinkage Methods

Shrinkage estimators present a common approach to improve the sample covariance estimate (e.g.,

Ledoit and Wolf 2003, 2004a, Frahm and Memmel 2010, Bodnar et al. 2018). Shrinkage approaches can be divided into linear and non-linear shrinkage estimators.

Linear Shrinkage. The idea of linear shrinkage (LS) is to combine the sample covariance matrix $\hat{\Sigma}$ with a reference covariance matrix $R$ by applying the shrinkage intensity $\delta^*$

$$\hat{\Sigma}^{LS} = (1 - \delta^*)\, \hat{\Sigma} + \delta^* R. \tag{6}$$

A frequent choice for $R$ is the covariance matrix implied by the 1/N portfolio, which is given by a diagonal matrix that contains the mean sample eigenvalue (Ledoit and Wolf 2004b, Frahm and Memmel 2010). Such approaches are usually seen as an application of “Stein-type” covariance estimators (see a discussion in Ledoit and Wolf 2021a). The fundamental idea of “Stein-type” estimators was originally derived to reduce the mean squared error of multivariate mean estimation by shrinking the estimate towards a target vector (Stein 1956, James and Stein 1961). The shrinkage intensity $\delta^*$ specifies how close the shrunk covariance matrix will be to the reference matrix $R$. Setting $\delta^*$ to 0 yields the sample covariance, while setting $\delta^*$ to 1 yields the reference matrix $R$.
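A minimal sketch of (6), assuming the reference matrix $R$ is the mean sample eigenvalue times the identity, as described above. It also illustrates that linear shrinkage towards this target moves every eigenvalue by the same fraction towards the mean eigenvalue while leaving the eigenvectors unchanged. The function name `linear_shrinkage` and the simulated data are illustrative.

```python
import numpy as np


def linear_shrinkage(sample_cov: np.ndarray, delta: float) -> np.ndarray:
    """Eq. (6) with reference matrix R = (mean sample eigenvalue) * identity."""
    n = sample_cov.shape[0]
    mean_eig = np.trace(sample_cov) / n              # trace equals the sum of the eigenvalues
    reference = mean_eig * np.eye(n)
    return (1.0 - delta) * sample_cov + delta * reference


rng = np.random.default_rng(2)
s = np.cov(rng.normal(size=(40, 10)), rowvar=False)
shrunk = linear_shrinkage(s, 0.3)

# Every eigenvalue moves 30% of the way towards the mean eigenvalue.
lam = np.linalg.eigvalsh(s)
lam_shrunk = np.linalg.eigvalsh(shrunk)
print(np.allclose(lam_shrunk, 0.7 * lam + 0.3 * lam.mean()))
```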

In the related literature, $\delta^*$ is often expressed as a function of $D$ (e.g., Frahm and Memmel 2010, Bodnar et al. 2018). As an alternative, $\delta^*$ can be derived by implicitly optimizing a loss function (see Ledoit and Wolf (2021c) for a review of different loss functions), e.g., the Frobenius loss $L_F(\hat{\Sigma}, \Sigma) = \|\hat{\Sigma} - \Sigma\|_F^2$ between the estimated and the true covariance matrix (e.g., Ledoit and Wolf 2003). Since $L_F$ cannot be explicitly calculated, researchers make assumptions about the characteristics of $\Sigma$ to minimize $L_F$ implicitly. Assumptions about $\Sigma$ are based, among others, on an uninformed identity matrix (e.g., Ledoit and Wolf 2004b, Frahm and Memmel 2010, Bodnar et al. 2018), an identity matrix with the average correlation on its off-diagonals (e.g., Ledoit and Wolf 2004a), or an arbitrary factor structure (e.g., Ledoit and Wolf 2003).


Non-Linear Shrinkage. Non-linear shrinkage (NLS) methods apply a function

$$f^{NLS}(\lambda_1, \ldots, \lambda_N) = (\lambda_1^*, \ldots, \lambda_N^*) \tag{7}$$

to the sample eigenvalues to obtain the shrunk eigenvalues $\lambda_1^*, \ldots, \lambda_N^*$. The NLS covariance estimate is then calculated using the eigendecomposition (3) as

$$\hat{\Sigma}^{NLS} = \Lambda \operatorname{diag}(\lambda_1^*, \ldots, \lambda_N^*)\, \Lambda'. \tag{8}$$

For instance, $f^{NLS}$ can be defined as

$$f^{NLS}(\lambda_1, \ldots, \lambda_N) = (1 - \delta^*)\lambda + \delta^* \lambda^\theta, \tag{9}$$

with non-linear shrinkage target $\lambda^\theta$. The difference from linear shrinkage is that $\lambda^\theta$ is not a vector that simply repeats a given value like the mean eigenvalue. Instead, the sample eigenvalues are shrunk towards different individual target eigenvalues.
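The sketch below applies (8) and (9) with a user-supplied target spectrum; the linearly decaying target is a hypothetical example, not one taken from the literature. Both the sample eigenvalues and the target are sorted in descending order so that each eigenvalue is paired with its own target, and the eigenvectors are left untouched.

```python
import numpy as np


def nonlinear_shrinkage(sample_cov: np.ndarray, target_eigvals: np.ndarray,
                        delta: float) -> np.ndarray:
    """Eqs. (8)-(9): shrink each sample eigenvalue towards its own target, keep eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(sample_cov)      # ascending order
    order = np.argsort(eigvals)[::-1]                  # descending, as in eq. (3)
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    shrunk = (1.0 - delta) * eigvals + delta * np.asarray(target_eigvals)  # eq. (9)
    return eigvecs @ np.diag(shrunk) @ eigvecs.T       # eq. (8)


rng = np.random.default_rng(3)
s = np.cov(rng.normal(size=(30, 8)), rowvar=False)
lam = np.sort(np.linalg.eigvalsh(s))[::-1]
# Hypothetical target: a linearly decaying spectrum with the same sum as the sample eigenvalues.
target = np.linspace(2.0, 1.0, 8)
target *= lam.sum() / target.sum()
est = nonlinear_shrinkage(s, target, delta=0.5)
print(np.sort(np.linalg.eigvalsh(est))[::-1])  # equals 0.5 * lam + 0.5 * target
```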

Recent approaches base $f^{NLS}$ on the Stieltjes transform of the rescaled Marčenko-Pastur (MP) density (e.g., De Nard et al. 2019, Bun et al. 2017, Ledoit and Wolf 2012). Intuitively, this density describes the limiting eigenvalue distribution of the sample covariance matrix of independent random variables as $N, T \to \infty$ with $N/T$ held fixed. Recently, Ledoit and Wolf (2021b) proposed another NLS approach that combines traditional “Stein-type” shrinkage with MP-based estimators. The approach applies quadratic shrinkage with two shrinkage targets, where the respective shrinkage intensities mainly depend on the dimension $D$. The more datapoints are available for covariance estimation, the higher the intensity of “Stein-type” shrinkage relative to MP adjustments.

Another common approach is to divide the sample eigenvalues into signal and noise regions

(Laloux et al. 1999, Plerou et al. 2002, Zhao et al. 2019). The function f N LS then rescales only

the noisy eigenvalues while preserving the signal eigenvalues. Similar approaches are known as

“thresholding” (Fan et al. 2013) where the noise eigenvalues are erased from the covariance estimate.


2.3. Cross Validation

The concept of cross validation is well known in the machine learning literature (e.g., Pan and Yang

2009, Ban et al. 2018). The underlying idea is to evaluate multiple train/test splits of the available

data to select the optimal model parameters. By evaluating multiple train/test splits, the resulting

parameters are more robust than the parameters obtained by using a single train/test split (e.g.,

Bergmeir and Benı́tez 2012).

In covariance estimation, the shrinkage parameters like intensity and target could, in theory, also be selected based on cross validation. This requires a long history of datapoints to perform a sufficient number of train-evaluate iterations. However, if a large amount of data were available for cross validation, the same data could also directly be used to estimate the sample covariance matrix (2). By the law of large numbers, the sample covariance estimate generally improves as the number of datapoints $T$ increases. In particular for high-dimensional estimation problems, the history of datapoints is shorter than the number of assets, so that there is no data left to perform cross validation.

We propose an approach that applies cross validation based transfer learning to identify the

optimal shrinkage parameters. We solve the problem of limited data availability by using a sufficiently

long history of disjoint assets for parameter selection. The optimal parameters identified on the

disjoint dataset are then transferred to the actual covariance estimation problem.

3. Covariance Estimation Through Cross Validation based Transfer Learning
Our approach called “CVTL” (Cross Validation based Transfer Learning) can be attributed to

the class of rotation equivariant non-linear shrinkage estimators. That is, we shrink the sample

eigenvalues, while not changing the eigenvectors. The main idea of our approach is to select the

shrinkage parameters based on a comparatively longer but disjoint history of other assets. The

approach is fully data-driven and agnostic towards achieving a particular shrinkage target or

intensity.


Let $\mathcal{U} = \{1, \ldots, U\}$ denote the index set of all assets in the universe. Given a portfolio of $N$ assets with stock indices $\mathcal{X} \subset \mathcal{U}$, we need to estimate the future covariance matrix based on the joint history $X^{T,N}$. We now assume that there is at least one other index set $\mathcal{V} \subset \mathcal{U}$ of size $N$ with $\mathcal{V} \cap \mathcal{X} = \emptyset$ but a longer joint history. Specifically, we assume that the history of $\mathcal{V}$ contains at least one additional year of trading data so that we can perform at least twelve cross validations for parameter selection given monthly rebalancing (e.g., Ackermann et al. 2017, De Nard et al. 2019, Zhao et al. 2019). The optimal shrinkage parameters are then transferred to the actual estimation problem based on $X^{T,N}$.

3.1. Non-Linear Shrinkage Using Second Shrinkage Target

The main challenge of non-linear shrinkage is to find an optimal parameter setting which specifies

how to adjust the sample eigenvalues. Specifically, this requires one or more shrinkage targets and

the respective shrinkage intensities.

We shrink the sample eigenvalues $\lambda = (\lambda_1, \ldots, \lambda_N)$ against two shrinkage targets ($\bar{\lambda}$ and $\lambda^\theta$)

$$\lambda^* = \delta_1 \lambda + \delta_2 \bar{\lambda} + (1 - \delta_1 - \delta_2)\lambda^\theta. \tag{10}$$

The first shrinkage target $\bar{\lambda}$ is given by a vector of length $N$ that simply repeats the average sample eigenvalue $\frac{1}{N}\sum_{i=1}^{N}\lambda_i$, as suggested by Ledoit and Wolf (2003). The shrinkage parameters $\delta_1$, $\delta_2$, and $\lambda^\theta$ are selected using cross validation based transfer learning on a disjoint history $V^{K,N}$.

The second shrinkage target $\lambda^\theta$ is calculated based on the imbalance of the sample eigenvalues. We measure imbalance according to the Gini coefficient (Dorfman 1979). The Gini coefficient ranges between 0 and 1, where a value of 0 indicates perfect balance ($\lambda_i = c \in \mathbb{R}_{>0}$ for all $i = 1, \ldots, N$), while values close to 1 indicate strong imbalance; for perfect imbalance ($\lambda = (c, 0, \ldots, 0)$), definition (11) yields $(N-1)/N$, which approaches 1 as $N$ grows. We prefer the Gini coefficient over other measures of imbalance such as entropy, as the Gini coefficient is limited to the interval [0, 1]. Since the Gini coefficient requires $\lambda_i \geq 0$ for all eigenvalues, we set all negative eigenvalues (if any) to zero.


Definition 3.1 (Gini Coefficient). Given eigenvalues $\lambda = (\lambda_1, \ldots, \lambda_N)$ with $\lambda_i \geq 0$ for all $i = 1, \ldots, N$, the Gini coefficient is defined as

$$G(\lambda_1, \ldots, \lambda_N) = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} |\lambda_i - \lambda_j|}{2N\sum_{i=1}^{N}\lambda_i}. \tag{11}$$

We define $G(\hat{\Sigma})$ as the Gini coefficient of the eigenvalues of a sample covariance estimate $\hat{\Sigma}$.
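A direct transcription of (11) might look as follows. It clips negative eigenvalues to zero, as described above, and uses an $O(N^2)$ pairwise difference matrix, which is adequate for a few hundred assets; `gini` is an illustrative name, and the function is undefined when all eigenvalues are zero.

```python
import numpy as np


def gini(eigvals) -> float:
    """Gini coefficient of eigenvalues, cf. eq. (11)."""
    lam = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)  # set negative eigenvalues to zero
    n = lam.size
    pairwise = np.abs(lam[:, None] - lam[None, :]).sum()        # sum_i sum_j |lambda_i - lambda_j|
    return float(pairwise / (2 * n * lam.sum()))


print(gini([1.0, 1.0, 1.0, 1.0]))  # 0.0 (perfect balance)
print(gini([1.0, 0.0, 0.0, 0.0]))  # 0.75, i.e., (N - 1) / N for N = 4
```

Since numerator and denominator in (11) scale linearly with the eigenvalues, multiplying all eigenvalues by a positive constant leaves the result unchanged.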
Given $G(\hat{\Sigma})$, we first determine the desired Gini coefficient $G^\theta$ for the second shrinkage target. Subsequently, we search for the cumulative distribution function of a Beta distribution that achieves the desired Gini coefficient. This gives us the flexibility to generate almost any eigenspectrum as the non-linear shrinkage target.

To obtain $G^\theta$, we introduce a parameter $\gamma \in [-1, 1]$ that adjusts the Gini coefficient of $\hat{\Sigma}$ as

$$G^\theta = \begin{cases} \gamma + (1 - \gamma)\, G(\hat{\Sigma}), & \text{if } \gamma \geq 0, \\ (1 - |\gamma|)\, G(\hat{\Sigma}), & \text{otherwise.} \end{cases} \tag{12}$$

That is, we calculate a convex combination of $G(\hat{\Sigma})$ and 1 to increase the imbalance of the sample eigenvalues ($\gamma \geq 0$), or of $G(\hat{\Sigma})$ and 0 to balance the sample eigenvalues ($\gamma < 0$). This ensures that $G^\theta \in [0, 1]$.
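As a sketch, (12) translates into a small helper; `target_gini` is a hypothetical name. For example, with $G(\hat{\Sigma}) = 0.6$, $\gamma = 0.5$ gives $G^\theta = 0.5 + 0.5 \cdot 0.6 = 0.8$, while $\gamma = -0.5$ gives $G^\theta = 0.5 \cdot 0.6 = 0.3$.

```python
def target_gini(gini_sample: float, gamma: float) -> float:
    """Desired Gini coefficient G^theta of the second shrinkage target, cf. eq. (12)."""
    if not -1.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [-1, 1]")
    if gamma >= 0:
        # Convex combination of G(Sigma_hat) and 1: a more imbalanced target.
        return gamma + (1.0 - gamma) * gini_sample
    # Convex combination of G(Sigma_hat) and 0: a more balanced target.
    return (1.0 - abs(gamma)) * gini_sample


for gamma in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(gamma, round(target_gini(0.6, gamma), 4))
```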

Given the desired Gini coefficient $G^\theta$, we generate artificial eigenvalues $\lambda^\theta$ so that $G(\lambda^\theta) \approx G^\theta$. For this purpose, we follow Ledoit and Wolf (2012, 2017a) in using the cumulative distribution function (CDF) of a Beta distribution with parameters $\alpha, \beta$. The main argument for a Beta distribution is its bounded support and easily adjustable shape, for which Ledoit and Wolf (2012) call it the “best suited family” of distributions for this purpose. The eigenvalues are then calculated as the values of $\mathrm{CDF}_{\alpha,\beta}$ on an equally spaced grid over the interval [0, 1]. This yields the following optimization problem

$$\lambda^{\alpha,\beta} = \operatorname*{arg\,min}_{\lambda^{\alpha,\beta}} \left| G(\lambda^{\alpha,\beta}) - G^\theta \right| \tag{13}$$

$$\text{where } \lambda^{\alpha,\beta} = \left(\mathrm{CDF}_{\alpha,\beta}\!\left(\tfrac{N}{N}\right), \mathrm{CDF}_{\alpha,\beta}\!\left(\tfrac{N-1}{N}\right), \ldots, \mathrm{CDF}_{\alpha,\beta}\!\left(\tfrac{1}{N}\right)\right). \tag{14}$$


We optimize $\alpha$ over the set $\{1, 2, 3, 4, 5, 10, 20, \ldots, 100, 150\}$ while setting $\beta = 1$. This allows us to generate a large variety of eigenvalue distributions with different Gini coefficients, from which the target is chosen according to the optimal parameter setting found through cross validation. To provide an intuition, Figure 1 shows the CDFs of three Beta distributions and the corresponding Gini coefficients for $N = 100$.

Figure 1 Cumulative Distribution Functions of Different Beta Distributions and Corresponding Gini Coefficients for N = 100 Eigenvalues.

[Three panels plotting the cumulative probability against x ∈ [0, 1]: (a) $\alpha = 2$, $\beta = 1$, $G(\lambda^{2,1}) = 0.502$; (b) $\alpha = 5$, $\beta = 1$, $G(\lambda^{5,1}) = 0.716$; (c) $\alpha = 10$, $\beta = 1$, $G(\lambda^{10,1}) = 0.834$.]
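The search in (13) and (14) over the $\alpha$ grid with $\beta = 1$ can be sketched without a statistics library, because the CDF of a Beta($\alpha$, 1) distribution on [0, 1] is $x^\alpha$. The sketch evaluates the descending grid $N/N, (N-1)/N, \ldots, 1/N$ from (14); the helper names are illustrative. For this grid, the Gini coefficient is close to the continuous-limit value $\alpha/(\alpha+2)$, e.g., about 0.50 for $\alpha = 2$ and about 0.71 for $\alpha = 5$.

```python
import numpy as np

# Alpha grid from the text: {1, 2, 3, 4, 5, 10, 20, ..., 100, 150}, with beta = 1.
ALPHA_GRID = [1, 2, 3, 4, 5] + list(range(10, 101, 10)) + [150]


def gini(lam: np.ndarray) -> float:
    """Gini coefficient, cf. eq. (11)."""
    return float(np.abs(lam[:, None] - lam[None, :]).sum() / (2 * lam.size * lam.sum()))


def beta_target(n: int, g_theta: float):
    """Return (alpha, eigenvalues) whose Gini coefficient is closest to g_theta, cf. (13)-(14)."""
    grid = np.arange(n, 0, -1) / n          # N/N, (N-1)/N, ..., 1/N (descending)
    best_alpha, best_lam, best_err = None, None, np.inf
    for alpha in ALPHA_GRID:
        lam = grid ** alpha                  # CDF of Beta(alpha, 1) on [0, 1] is x**alpha
        err = abs(gini(lam) - g_theta)
        if err < best_err:
            best_alpha, best_lam, best_err = alpha, lam, err
    return best_alpha, best_lam


alpha, lam = beta_target(100, 0.7)
print(alpha, round(gini(lam), 3))
```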

We normalize the resulting eigenvalues $\lambda^{\alpha,\beta}$ from (13) in order to keep the trace of the sample eigenvalues (e.g., Laloux et al. 2000):

$$\lambda^\theta = \lambda^{\alpha,\beta}\, \frac{\frac{1}{N}\sum_{i=1}^{N}\lambda_i}{\|\lambda^{\alpha,\beta}\|_1}. \tag{15}$$

This step is necessary to ensure that the second shrinkage target has a reasonable influence on the resulting shrunk eigenvalues $\lambda^*$. If the artificial eigenvalues $\lambda^{\alpha,\beta}$ are all much larger than their counterparts among the sample eigenvalues $\lambda$, the shrinkage operation is strongly dominated by $\lambda^{\alpha,\beta}$. Note that normalization does not alter the Gini coefficient (i.e., $G(\lambda^{\alpha,\beta}) = G(\lambda^\theta)$), as the Gini coefficient is invariant to multiplication by a positive scalar.

3.2. Parameter Selection Using Cross Validation

We need to select a total of three parameters, namely the adjustment factor $\gamma$ for the Gini coefficient and the shrinkage intensities $\delta_1, \delta_2$. The parameters are chosen such that the resulting weights $w(\gamma, \delta_1, \delta_2)$ are historically optimal on a disjoint history $V^{K,N}$ according to the given objective $\phi(w)$ (e.g., minimize variance or maximize risk-adjusted return). The disjoint set of assets $\mathcal{V}$ can be selected by randomly sampling indices from $\mathcal{U} \setminus \mathcal{X}$, as done in this study. As an alternative, one could attempt to specifically select similar portfolio constituents, e.g., from a similar stock market index, to further improve the resulting parameter configuration.

Algorithm 1 Covariance Estimation with Given Parameters.
1: input: $X^{T,N} = x_1, \ldots, x_T$ history of returns for covariance estimation, $\gamma$ scaling parameter, $\delta_1, \delta_2$ shrinkage intensities
2: $\hat{\Sigma} = \frac{1}{T-1}\sum_{t=1}^{T}(x_t - \bar{x})(x_t - \bar{x})'$
3: calculate eigendecomposition $\hat{\Sigma} = \Lambda \operatorname{diag}(\lambda)\, \Lambda'$
4: $G^\theta = \gamma + (1 - \gamma)\, G(\hat{\Sigma})$ if $\gamma \geq 0$, and $G^\theta = (1 - |\gamma|)\, G(\hat{\Sigma})$ otherwise
5: $\lambda^{\alpha,\beta} = \operatorname*{arg\,min}_{\lambda^{\alpha,\beta}} |G(\lambda^{\alpha,\beta}) - G^\theta|$
6: where $\lambda^{\alpha,\beta} = \left(\mathrm{CDF}_{\alpha,\beta}(\frac{N}{N}), \mathrm{CDF}_{\alpha,\beta}(\frac{N-1}{N}), \ldots, \mathrm{CDF}_{\alpha,\beta}(\frac{1}{N})\right)$,
7: and $\alpha \in \{1, 2, 3, 4, 5, 10, 20, \ldots, 100, 150\}$, $\beta = 1$.
8: normalization $\lambda^\theta = \lambda^{\alpha,\beta}\, \frac{\frac{1}{N}\sum_{i=1}^{N}\lambda_i}{\|\lambda^{\alpha,\beta}\|_1}$
9: $\lambda^* = \delta_1 \lambda + \delta_2 \bar{\lambda} + (1 - \delta_1 - \delta_2)\lambda^\theta$
10: $\Sigma^* = \Lambda \operatorname{diag}(\lambda^*)\, \Lambda'$
11: output: $\Sigma^*$ covariance estimate
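The steps of Algorithm 1 can be sketched in NumPy as follows. Two points are assumptions of this sketch rather than statements from the paper: the CDF of Beta($\alpha$, 1) is evaluated in closed form as $x^\alpha$, and the normalization in step 8 rescales $\lambda^{\alpha,\beta}$ so that the target eigenvalues sum to the sum of the sample eigenvalues, following the stated goal of keeping the trace. The function name `cvtl_covariance` is hypothetical.

```python
import numpy as np

ALPHA_GRID = [1, 2, 3, 4, 5] + list(range(10, 101, 10)) + [150]


def gini(lam: np.ndarray) -> float:
    """Gini coefficient of non-negative values, cf. eq. (11)."""
    return float(np.abs(lam[:, None] - lam[None, :]).sum() / (2 * lam.size * lam.sum()))


def cvtl_covariance(x: np.ndarray, gamma: float, delta1: float, delta2: float) -> np.ndarray:
    """Sketch of Algorithm 1 for a T x N matrix of returns x (rows = datapoints)."""
    n = x.shape[1]
    sigma_hat = np.cov(x, rowvar=False)                     # step 2: sample covariance (2)
    lam, vecs = np.linalg.eigh(sigma_hat)                   # step 3: eigendecomposition
    order = np.argsort(lam)[::-1]                           # sort eigenvalues in descending order
    lam, vecs = np.clip(lam[order], 0.0, None), vecs[:, order]  # negative eigenvalues -> 0

    g_sample = gini(lam)                                    # step 4: desired Gini coefficient (12)
    g_theta = gamma + (1 - gamma) * g_sample if gamma >= 0 else (1 - abs(gamma)) * g_sample

    grid = np.arange(n, 0, -1) / n                          # steps 5-7: grid from (14)
    lam_ab = min((grid ** a for a in ALPHA_GRID), key=lambda c: abs(gini(c) - g_theta))

    # Step 8 (assumption): rescale the target so that its eigenvalues sum to the trace.
    lam_theta = lam_ab * lam.sum() / np.abs(lam_ab).sum()

    lam_bar = np.full(n, lam.mean())                        # first target: mean eigenvalue
    lam_star = delta1 * lam + delta2 * lam_bar + (1 - delta1 - delta2) * lam_theta  # step 9, (10)
    return vecs @ np.diag(lam_star) @ vecs.T                # step 10


# Usage: a high-dimensional example (N = 100 > T = 60) with simulated returns.
rng = np.random.default_rng(4)
x = rng.normal(scale=0.01, size=(60, 100))
est = cvtl_covariance(x, gamma=0.2, delta1=0.3, delta2=0.3)
print(np.trace(est), np.trace(np.cov(x, rowvar=False)))    # equal traces
print(np.linalg.eigvalsh(est).min() > 0)                    # invertible in this example
```

With this scaling, $\operatorname{tr}(\Sigma^*) = \operatorname{tr}(\hat{\Sigma})$ for any $\delta_1, \delta_2$, and in this example the estimate is positive definite although $N > T$, because the mean-eigenvalue target and the Beta target contribute positive mass to every eigenvalue.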

We assume monthly rebalancing so that each portfolio allocation is held for ρ = 21 trading days

(e.g., Ackermann et al. 2017, De Nard et al. 2019, Zhao et al. 2019). Our empirical results imply

that the parameter setting becomes stable after approximately twelve train-evaluate iterations for

cross validation (see EC.2.2). Hence, we recommend using a disjoint history $V^{K,N}$ containing at least one additional trading year (i.e., 12 · 21 = 252 days) of data, so that $K \geq T + 252$, and we apply the same methodology in our evaluation.

The process of cross validation is described by Algorithm 2. Without loss of generality, we assume that $\phi(w)$ needs to be minimized. For each parameter configuration $\gamma, \delta_1, \delta_2$, we first check whether the resulting covariance estimate $\Sigma_X(\gamma, \delta_1, \delta_2)$ on the actual dataset $X$ is invertible. If $\Sigma_X(\gamma, \delta_1, \delta_2)$ is not invertible, the parameter configuration is discarded. Otherwise, we perform time-series cross validation with monthly rebalancing to calculate the historical performance of $\gamma, \delta_1, \delta_2$ under the given objective. Let the disjoint history $V^{K,N} = v_1, \ldots, v_K$ be sorted from the oldest ($v_1$) to the most recent datapoint ($v_K$). We first estimate a covariance matrix with the given parameters based on $v_1, \ldots, v_T$ (see Algorithm 1). The resulting covariance matrix is then evaluated on the first out-of-sample period $v_{T+1}, \ldots, v_{T+\rho}$. Subsequently, we shift the estimation and out-of-sample periods by $\rho$ days into the future. Accordingly, the second covariance matrix is estimated based on $v_{\rho+1}, \ldots, v_{\rho+T}$, and its performance is measured on the second out-of-sample period $v_{\rho+T+1}, \ldots, v_{2\rho+T}$. This continues until the most recent datapoint $v_K$ becomes part of the last out-of-sample period. We use $n$ to denote the number of train-evaluate iterations that are performed during cross validation (it is ensured that $n \geq 12$). The performance of the current parameter configuration is then aggregated as the mean performance over all evaluated out-of-sample periods.

Algorithm 2 Parameter Selection Using Cross Validation.
1: input: $X^{T,N}$ history of returns for covariance estimation, $V^{K,N}$ disjoint history of returns for parameter selection with $K \geq T + 12\rho$, $\rho$ rebalancing interval (21 trading days), $P$ discrete parameter space, $\phi(w)$ objective function to be minimized, $\psi(\gamma, \delta_1, \delta_2) \to \{\text{true}, \text{false}\}$ additional filter for admissible configurations
2: $r^* = \infty$
3: for each $(\gamma, \delta_1, \delta_2) \in P$ do
4: if $\Sigma_X(\gamma, \delta_1, \delta_2)$ is not invertible then
5: skip configuration $\gamma, \delta_1, \delta_2$ and continue with the next one
6: end if
7: $t = 1$
8: $n = 0$
9: while $t + T + \rho - 1 \leq K$ do
10: calculate $\tilde{\Sigma}$ based on parameters $\gamma, \delta_1, \delta_2$ and datapoints $v_t, \ldots, v_{t+T-1}$ (see Algorithm 1)
11: $w = \frac{\tilde{\Sigma}^{-1}\mathbf{1}}{\mathbf{1}'\tilde{\Sigma}^{-1}\mathbf{1}}$
12: $n = n + 1$
13: $r_n(\gamma, \delta_1, \delta_2) =$ evaluate $\phi(w)$ based on $v_{t+T}, \ldots, v_{t+T+\rho-1}$
14: $t = t + \rho$
15: end while
16: $r = \frac{1}{n}\sum_{i=1}^{n} r_i(\gamma, \delta_1, \delta_2)$
17: if $r < r^* \wedge \psi(\gamma, \delta_1, \delta_2)$ then
18: $\gamma^*, \delta_1^*, \delta_2^* = \gamma, \delta_1, \delta_2$
19: $r^* = r$
20: end if
21: end for
22: output: Historically optimal parameters $\gamma^*, \delta_1^*, \delta_2^*$
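A compact sketch of the time-series cross validation in Algorithm 2 is given below. To keep it self-contained and short, it takes the covariance estimator as a callable and demonstrates it with a hypothetical stand-in, `shrink_to_mean`, which has a single intensity parameter; in CVTL, the callable would be Algorithm 1 with $(\gamma, \delta_1, \delta_2)$. Windows use 0-based indexing, the objective is the minimum variance objective (18) evaluated with the sample covariance of each out-of-sample window, and all function names are illustrative.

```python
import numpy as np


def gmv(sigma: np.ndarray) -> np.ndarray:
    """GMV weights, cf. eq. (1)."""
    x = np.linalg.solve(sigma, np.ones(sigma.shape[0]))
    return x / x.sum()


def min_variance(w: np.ndarray, oos: np.ndarray) -> float:
    """Objective (18): variance of w under the out-of-sample sample covariance."""
    return float(w @ np.cov(oos, rowvar=False) @ w)


def shrink_to_mean(window: np.ndarray, params: tuple) -> np.ndarray:
    """Hypothetical stand-in estimator: linear shrinkage (6) towards the mean eigenvalue."""
    (delta,) = params
    s = np.cov(window, rowvar=False)
    return (1 - delta) * s + delta * np.trace(s) / s.shape[0] * np.eye(s.shape[0])


def select_parameters(x_hist, v_hist, grid, estimator, objective, rho=21,
                      admissible=lambda params: True):
    """Sketch of Algorithm 2 (objective is minimized). x_hist: T x N, v_hist: K x N."""
    t_len, k_len = x_hist.shape[0], v_hist.shape[0]
    if k_len < t_len + 12 * rho:
        raise ValueError("the disjoint history must satisfy K >= T + 12 * rho")
    best_params, best_score = None, np.inf
    for params in grid:
        sigma_x = estimator(x_hist, params)
        if np.linalg.matrix_rank(sigma_x) < sigma_x.shape[0]:
            continue                                    # not invertible on X: discard
        scores, start = [], 0                           # start = 0 corresponds to v_1
        while start + t_len + rho <= k_len:             # a full out-of-sample window is available
            train = v_hist[start:start + t_len]
            oos = v_hist[start + t_len:start + t_len + rho]
            scores.append(objective(gmv(estimator(train, params)), oos))
            start += rho                                # shift both windows by rho days
        score = float(np.mean(scores))                  # mean over the n iterations
        if score < best_score and admissible(params):
            best_params, best_score = params, score
    return best_params, best_score


rng = np.random.default_rng(5)
x_hist = rng.normal(scale=0.01, size=(60, 100))          # N = 100 > T = 60
v_hist = rng.normal(scale=0.01, size=(60 + 12 * 21, 100))
grid = [(d,) for d in (0.0, 0.25, 0.5, 0.75, 1.0)]
best, score = select_parameters(x_hist, v_hist, grid, shrink_to_mean, min_variance)
print(best, score)
```

In this high-dimensional example, the configuration $\delta = 0$ is discarded by the invertibility check because the sample covariance of 60 datapoints has rank at most 59.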

For the purpose of this study, we select the estimation parameters from the following search space $P$. We later provide a sensitivity analysis to assess the influence of a more fine-grained grid.

$$\gamma \in \{-1, -0.8, \ldots, 1\} \tag{16}$$

$$\delta_1, \delta_2 \in \{0, 0.05, \ldots, 1\} \text{ with } \delta_1 + \delta_2 \leq 1 \tag{17}$$

We include an additional filter $\psi(\gamma, \delta_1, \delta_2) \to \{\text{true}, \text{false}\}$, which allows us to limit the search space based on particular criteria, as described in the next section.
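The size of the search space in (16) and (17) can be checked with a few lines of Python: $\gamma$ takes 11 values, and the pairs $(\delta_1, \delta_2)$ on the 0.05 grid with $\delta_1 + \delta_2 \leq 1$ number $21 + 20 + \cdots + 1 = 231$, giving $11 \cdot 231 = 2541$ configurations before any filtering. The sketch enumerates the grid on integer steps to avoid floating-point drift in the constraint check.

```python
from itertools import product

# gamma in {-1, -0.8, ..., 1} and delta1, delta2 in {0, 0.05, ..., 1} with delta1 + delta2 <= 1.
gammas = [round(-1 + 0.2 * k, 1) for k in range(11)]
steps = range(21)  # multiples of 0.05, i.e., k / 20 for k = 0, ..., 20
grid = [
    (g, d1 / 20, d2 / 20)
    for g, d1, d2 in product(gammas, steps, steps)
    if d1 + d2 <= 20  # integer version of delta1 + delta2 <= 1
]
print(len(gammas), len(grid))  # 11 2541
```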


3.3. Objectives

The objective function $\phi(w)$ is evaluated on all out-of-sample periods that occur during cross validation. Let $\Sigma_i$ and $\bar{v}_i$ denote the covariance matrix and mean return vector of the out-of-sample period in the $i$th train-evaluate iteration during cross validation. To minimize variance, the objective function must be set to

$$\phi(w) = w' \Sigma_i w. \tag{18}$$

Alternatively, we can set the objective function to maximize risk-adjusted return as

$$\phi(w) = \frac{\bar{v}_i' w}{w' \Sigma_i w}. \tag{19}$$

Note that the weights are always calculated according to the formula of the GMV portfolio (1), regardless of the objective. The existence of the inverse covariance matrix is guaranteed by filtering the estimation parameters accordingly. In particular, our approach does not require us to estimate future returns. Instead, we select an eigenvalue distribution and shrinkage intensities that explicitly optimize the given objective φ(w) on the disjoint history. The same parameters should then implicitly optimize φ(w) in the future investment period. To the best of our knowledge, this approach has not been attempted before.
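A minimal NumPy sketch of the GMV weights (Algorithm 2, line 11) and of the two objectives in Eqs. (18) and (19) is shown below. Because Algorithm 2 minimizes φ(w), the sketch returns the negative of the ratio in Eq. (19); this sign convention is our assumption. The function names are hypothetical.

```python
import numpy as np

def gmv_weights(cov):
    """GMV weights Sigma^{-1} 1 / (1' Sigma^{-1} 1), via a linear solve instead of an explicit inverse."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return x / x.sum()

def phi_variance(w, cov_i):
    """Eq. (18): out-of-sample variance w' Sigma_i w."""
    return float(w @ cov_i @ w)

def phi_ratio(w, mean_i, cov_i):
    """Eq. (19) as printed, negated so that minimizing it maximizes the ratio (assumed convention)."""
    return -float(mean_i @ w) / float(w @ cov_i @ w)

cov = np.diag([1.0, 2.0, 4.0])
w = gmv_weights(cov)
print(w)                     # proportional to the inverse variances, i.e., 4/7, 2/7, 1/7
print(phi_variance(w, cov))  # equals 1 / (1' Sigma^{-1} 1) = 4/7
```

For a diagonal covariance matrix, the GMV weights are proportional to the inverse variances, which makes the example easy to verify by hand.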

When optimizing for risk-adjusted return (19), we observed overfitting in the form of heavily dispersed portfolio weights. Specifically, the resulting portfolios exhibit either high out-of-sample volatility combined with high out-of-sample returns or, conversely, low out-of-sample returns paired with low out-of-sample volatility, without converging to a stable configuration. Therefore, we limit the search space to improve the stability of the parameter configurations. For this purpose, we use the disjoint transfer history to calculate the optimal η∗ ∈ [0, 1] that maximizes risk-adjusted return for a linear combination of the weights of the sample covariance, wGMV, and the weights of the 1/N portfolio, w1/N. The optimal η∗ is determined by the following optimization problem:


$$\eta^* = \arg\max_{\eta \in [0, 1]} \frac{1}{n} \sum_{i=1}^{n} \frac{\bar{v}_i' w}{w' \Sigma_i w}, \tag{20}$$

$$\text{where } w = \eta \, w_{GMV} + (1 - \eta) \, w_{1/N}. \tag{21}$$
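Eq. (20) is a one-dimensional problem, so a simple grid search over η ∈ [0, 1] approximates η∗. The sketch below assumes lists of per-iteration GMV weights `w_gmvs`, out-of-sample mean vectors `means`, and covariance matrices `covs` (all hypothetical names), and evaluates the ratio exactly as printed in Eq. (20); the grid resolution `num` is an arbitrary choice of ours.

```python
import numpy as np

def optimal_eta(w_gmvs, means, covs, num=101):
    """Grid-search approximation of Eq. (20) over eta in [0, 1]."""
    n_assets = len(means[0])
    w_eq = np.full(n_assets, 1.0 / n_assets)              # 1/N weights
    best_eta, best_val = 0.0, -np.inf
    for eta in np.linspace(0.0, 1.0, num):
        vals = []
        for w_gmv, m, S in zip(w_gmvs, means, covs):
            w = eta * w_gmv + (1.0 - eta) * w_eq           # Eq. (21)
            vals.append(m @ w / (w @ S @ w))               # ratio inside Eq. (20)
        val = float(np.mean(vals))
        if val > best_val:
            best_eta, best_val = float(eta), val
    return best_eta, best_val
```

As a hand-checkable example, with two assets, Σi = I, v̄i = (1, 0)′ and wGMV = (1, 0)′, the ratio reduces to (1 + η)/(1 + η²), whose maximizer is η∗ = √2 − 1 ≈ 0.414; the grid search returns approximately 0.41.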


Subsequently, we center an interval [a, b] at η∗, with a = η∗ − ζ and b = η∗ + ζ for a given parameter ζ. We set ζ = 0.30; however, the performance is not sensitive to small changes in ζ, as we show in our sensitivity analyses (see Table 9). We then calculate the acceptance interval Ψσ in terms of the out-of-sample variances of the portfolios wa and wb obtained by combining the sample weights and the 1/N portfolio weights according to a and b:

$$\Psi_\sigma = \left[ \frac{1}{n} \sum_{i=1}^{n} w_a' \Sigma_i w_a, \; \frac{1}{n} \sum_{i=1}^{n} w_b' \Sigma_i w_b \right]. \tag{22}$$

We define σ(γ, δ1, δ2) as the mean variance of the given parameter configuration over all periods during cross validation:

$$\sigma(\gamma, \delta_1, \delta_2) = \frac{1}{n} \sum_{i=1}^{n} w(\gamma, \delta_1, \delta_2)' \, \Sigma_i \, w(\gamma, \delta_1, \delta_2). \tag{23}$$

Finally, we define the acceptance function that filters for parameters γ, δ1, δ2 whose resulting mean variance during cross validation lies within Ψσ:

$$\psi(\gamma, \delta_1, \delta_2) = \begin{cases} 1, & \text{if } \sigma(\gamma, \delta_1, \delta_2) \in \Psi_\sigma, \\ 0, & \text{otherwise.} \end{cases} \tag{24}$$
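The acceptance interval and the filter in Eqs. (22) to (24) can be sketched as below. The names are hypothetical; we sort the two bounds because the text does not state that the variance at a always lies below the variance at b (this ordering is our assumption), and we do not clip a or b to [0, 1].

```python
import numpy as np

def mean_variance(ws, covs):
    """Eq. (23): mean out-of-sample variance over the n cross-validation periods."""
    return float(np.mean([w @ S @ w for w, S in zip(ws, covs)]))

def acceptance_interval(eta_star, zeta, w_gmvs, w_eq, covs):
    """Eq. (22): mean variances of the mixes w_a and w_b with a = eta* - zeta, b = eta* + zeta."""
    a, b = eta_star - zeta, eta_star + zeta
    var_a = mean_variance([a * g + (1 - a) * w_eq for g in w_gmvs], covs)
    var_b = mean_variance([b * g + (1 - b) * w_eq for g in w_gmvs], covs)
    return min(var_a, var_b), max(var_a, var_b)   # ordered bounds (our assumption)

def psi(ws, covs, interval):
    """Eq. (24): accept a configuration iff its mean CV variance lies in the interval."""
    lo, hi = interval
    return lo <= mean_variance(ws, covs) <= hi
```

For example, with a single period, Σ1 = diag(1, 4), wGMV = (0.8, 0.2)′, η∗ = 0.5 and ζ = 0.3, the interval is [0.818, 1.088]. The GMV portfolio itself (variance 0.8) is rejected, whereas the 50/50 mix (0.65, 0.35)′ (variance 0.9125) is accepted.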

Thereby, we avoid the aforementioned problem of obtaining portfolios with overly low or high out-of-sample variance, which leads to more stable parameter estimates. Note that ψ can be defined with respect to arbitrary objectives. For instance, one could control transaction costs by only accepting portfolios that do not exceed a given turnover during cross validation, or incorporate specific goals and behavioral theories of individual investors directly into the covariance matrix, as suggested in the literature (e.g., Shefrin and Statman 2000, Das et al. 2010, 2018).

Altogether, we consider the following approaches for the remainder of this study:

• CVTL. CVTL with objective minimum variance (18).

• CVTLLS. CVTL with objective minimum variance (18). We require δ1 + δ2 = 1 to exclude the non-linear shrinkage target λθ.

• CVTL µ/σ. CVTL with objective maximum risk-adjusted return (19).

• CVTLLS µ/σ. CVTL with objective maximum risk-adjusted return (19). We require δ1 + δ2 = 1 to exclude the non-linear shrinkage target λθ.


3.4. Conceptual Interpretation and Relation to Other Estimators

CVTL is related to several existing estimators such as linear shrinkage (e.g., Ledoit and Wolf 2003, 2004b,a). When setting δ1 + δ2 = 1, our approach (CVTLLS) has the same shrinkage target as Ledoit and Wolf (2004b). Similarly, setting δ2 = 1 yields the 1/N portfolio. The intuition behind CVTL is to adjust the sample eigenvalues so that they are historically optimal under a given objective φ(w) on the disjoint history. Thereby, CVTL implicitly performs the following steps: (i) estimating the covariance matrix, (ii) inverting the covariance matrix, and (iii) optimizing the portfolio towards the given objective. This can be interpreted as a flexible extension of the minimum variance loss suggested by Engle et al. (2019), where the oracle estimator is derived using cross validation. The difference, however, is that we replace the statistically derived prior for the oracle eigenvalues of Engle et al. (2019) with the objective function evaluated on the disjoint history. Furthermore, CVTL is related to the bounded-noise estimator of Zhao et al. (2019) in the sense that the covariance adjustment is entirely driven by the given objective. However, CVTL differs from the bounded-noise approach in that it does not rely on bootstrapping within the sample data for optimal parameter calibration.
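The eigenvalue combination λ∗ = δ1 λ + δ2 λ̄ + (1 − δ1 − δ2) λθ used by CVTL and the two special cases named above can be checked numerically. In this sketch the non-linear target λθ is simply passed in (the Beta-CDF construction of Algorithm 1 is not reproduced), and `combine_eigenvalues` is a hypothetical name. Since Λ diag(λ̄) Λ′ = λ̄ I, setting δ1 + δ2 = 1 gives δ1 Σ̂ + δ2 λ̄ I, and setting δ2 = 1 gives a scaled identity whose GMV weights are exactly 1/N.

```python
import numpy as np

def combine_eigenvalues(sample_cov, delta1, delta2, lam_theta):
    """Return Lambda diag(lam*) Lambda' with
    lam* = delta1*lam + delta2*mean(lam) + (1 - delta1 - delta2)*lam_theta.

    lam_theta must be ordered like np.linalg.eigh's output (ascending eigenvalues).
    """
    lam, vecs = np.linalg.eigh(sample_cov)
    lam_theta = np.asarray(lam_theta, dtype=float)
    lam_star = delta1 * lam + delta2 * lam.mean() + (1.0 - delta1 - delta2) * lam_theta
    return (vecs * lam_star) @ vecs.T   # same as vecs @ diag(lam_star) @ vecs.T

def gmv_weights(cov):
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return x / x.sum()

rng = np.random.default_rng(1)
returns = rng.standard_normal((40, 8))
S = np.cov(returns, rowvar=False)
dummy_target = np.zeros(8)            # irrelevant whenever delta1 + delta2 = 1

# delta1 + delta2 = 1: linear shrinkage toward the scaled identity
shrunk = combine_eigenvalues(S, 0.6, 0.4, dummy_target)
print(np.allclose(shrunk, 0.6 * S + 0.4 * np.trace(S) / 8 * np.eye(8)))
# delta2 = 1: scaled identity, hence equal (1/N) weights
print(gmv_weights(combine_eigenvalues(S, 0.0, 1.0, dummy_target)))
```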

Figure 2 Conceptual Difference between Established Non-Linear Shrinkage Methods (left, dark gray) and CVTL (right, light gray).

[Diagram. Both approaches start from the history X^{T,N} with sample covariance Σ̂ = Λ diag(λ) Λ′. Left, established non-linear shrinkage: implicit minimization of a loss function L(Σ̂, Σ) yields λ∗ = f_θ^{NLS}(λ). Right, cross validation based transfer learning: cross validation on a transfer set V^{K,N} with K ≫ T under the objective φ(w) yields λ∗ = δ1 λ + δ2 λ̄ + (1 − δ1 − δ2) λθ. Both approaches output Σ∗ = Λ diag(λ∗) Λ′.]

The conceptual difference between established non-linear shrinkage estimators and CVTL is illustrated in Figure 2. While existing non-linear shrinkage methods (e.g., Ledoit and Wolf 2015, Bun et al. 2017, Ledoit and Wolf 2020, 2021b) select the shrinkage parameters to implicitly minimize a loss function L(Σ̂, Σ), CVTL selects the shrinkage parameters by cross validation on a disjoint history according to a given objective φ(w). CVTL uses cumulative Beta distribution functions to generate the shrinkage target. However, setting δ1 + δ2 = 0 does not lead to the same non-linear shrinkage as that of Ledoit and Wolf (e.g., 2015, 2021b). Instead, a comparable estimator could be obtained by replacing the second shrinkage target of CVTL with the non-linear function presented in Ledoit and Wolf (2015). Using the Gini coefficient as a measure of imbalance to generate the second (non-linear) shrinkage target is related to linear shrinkage to constant correlation (Ledoit and Wolf 2004a): the higher the average correlation in the sample covariance, the higher the Gini coefficient of the sample eigenvalues, and vice versa.
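The link between average correlation and the Gini coefficient of the eigenvalues can be illustrated with constant-correlation matrices. The sketch below uses one standard definition of the Gini coefficient; the paper's exact formula is not restated in this section, so this choice is an assumption. For an N × N constant-correlation matrix with correlation ρ ≥ 0, the eigenvalues are 1 + (N − 1)ρ (once) and 1 − ρ (N − 1 times), so this Gini coefficient equals (N − 1)ρ/N and increases with ρ.

```python
import numpy as np

def gini(values):
    """Gini coefficient of nonnegative values: 0 if all are equal, approaching 1 if one dominates."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * x.sum()))

def constant_correlation(n, rho):
    """Correlation matrix with ones on the diagonal and rho elsewhere."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

n = 50
for rho in (0.0, 0.2, 0.5):
    eigenvalues = np.linalg.eigvalsh(constant_correlation(n, rho))
    print(rho, round(gini(eigenvalues), 4), (n - 1) * rho / n)
```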

4. Evaluation Method
We evaluate CVTL against established covariance estimators from the literature under both objectives, minimum variance and maximum risk-adjusted return. An overview of all considered estimators is presented in Table 1.


Table 1 Overview of Covariance Estimators Used in Evaluation.


Approach Description Reference

Estimators with Objective Minimum Variance

CVTL CVTL with second shrinkage target –
CVTLLS CVTL with δ1 + δ2 = 1 to exclude second shrinkage target –
QIS Quadratic-Inverse Shrinkage combining linear and non-linear shrinkage (Ledoit and Wolf 2021b)
QuEST Quantized Eigenvalues Sampling Transform: NLS based on Stieltjes transform of the MP density (Ledoit and Wolf 2015)
LShriCC Linear shrinkage to constant correlation (Ledoit and Wolf 2004a)
LShri Linear shrinkage towards identity matrix (Ledoit and Wolf 2004b)
FMEst Frahm Memmel Estimator: Shrinkage between the sample covariance and 1/N portfolio weights (Frahm and Memmel 2010)
BPSEst Bodnar Parolya Schmid Estimator: Generalized version of FMEst (Bodnar et al. 2018)
POET I Principal Orthogonal complEment Thresholding: constant threshold parameter / dynamic factors count (Bai and Ng 2002, Fan et al. 2013, 2016)
POET II Principal Orthogonal complEment Thresholding: constant threshold parameter / heuristic factors count (Fan et al. 2013, 2016)
BN Bounded Noise estimator: latest signal/noise separation approach with L = 1000 cross validations (Zhao et al. 2019)
Sample Sample covariance estimator –

Estimators with Objective Maximum Risk-Adjusted Return

CVTL µ/σ CVTL with second shrinkage target –
CVTLLS µ/σ CVTL with δ1 + δ2 = 1 to exclude second shrinkage target –
BN VAR BN estimator with objective maximum risk-adjusted return (Zhao et al. 2019)
NC2R Generalized unconstrained partial min-var portfolio trained on return of prior period (DeMiguel et al. 2009a)
CT Combined Talmud estimator: Combination of 50% sample covariance GMV and 50% 1/N portfolio (Tu and Zhou 2011)
1/N 1/N portfolio with equal weights (DeMiguel et al. 2009b)

Dataset. Our evaluation is based on six different datasets. First, we use three stock datasets from Bloomberg: (i) all US stocks listed on the NYSE, AMEX, and NASDAQ (US), (ii) European stocks listed in the Stoxx 600 Europe index (EU), and (iii) a geographic mixture of stocks listed in the MSCI World index (WO). Second, we use three publicly available aggregated industry portfolios with (iv) 10, (v) 30, and (vi) 49 industries from French (2021) (FFI). All datasets span the period from January 3, 2000 to December 31, 2020. We delete all data points that are missing (NaN) for all assets, e.g., weekends, national holidays, and other non-trading days. For each dataset, the respective disjoint history is randomly sampled from the union of the US, EU, and WO datasets.

Rolling Window Evaluation. We perform a rolling window evaluation with one-step-ahead predictions, where the covariance matrix is estimated based on daily returns. We apply monthly portfolio rebalancing to keep turnover low (e.g., De Nard et al. 2019). We analyze low- and high-dimensional covariance estimation problems with the dimensions D = N/T ∈ {2, 3/2 = 1.50, 4/3 ≈ 1.33, 2/5 = 0.40, 1/5 = 0.20, 1/7.5 ≈ 0.13} to increase the robustness of our findings (Rossi and Inoue 2012).

Each result is reported as the mean over 50 evaluations. In each evaluation, we randomly select N ∈ {100, 200, 300} portfolio constituents from the respective dataset. Industry portfolio results are based on the full dataset with N ∈ {10, 30, 49} industry constituents. For both types of datasets, we calculate the following performance metrics as the mean over all out-of-sample periods between monthly portfolio rebalancing dates.
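The estimation window implied by each dimension follows from T = N/D. The sketch below (the helper `window_length` is a hypothetical name) keeps D as an exact fraction and rounds T to the nearest trading day, which is our assumption for non-integer cases such as N = 100, D = 3/2. It reproduces the two windows quoted in Section 5.1: 2250 datapoints for N = 300, D = 1/7.5 and 150 datapoints for N = 300, D = 2.

```python
from fractions import Fraction

# D = N/T values from the evaluation design; 1/7.5 is written exactly as 2/15.
DIMENSIONS = [Fraction(2), Fraction(3, 2), Fraction(4, 3),
              Fraction(2, 5), Fraction(1, 5), Fraction(2, 15)]

def window_length(n_assets, d):
    """Number of daily observations T = N / D, rounded to the nearest integer."""
    return round(n_assets / d)

for n_assets in (100, 200, 300):
    print(n_assets, [window_length(n_assets, d) for d in DIMENSIONS])
```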

Performance Metrics. We consider two objectives, namely, minimizing variance and maximizing risk-adjusted return. Accordingly, we report the annualized out-of-sample volatility and the risk-adjusted return, computed as the annualized return divided by the annualized volatility. In addition, we report the turnover and the risk-adjusted return after transaction costs, assuming 0.25% per trade (Thapa and Poshakwale 2010). Additional performance metrics are provided in EC.4. We perform two-sided t-tests with α = 0.05 to check whether the performance of CVTL differs significantly from that of existing estimators.
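The reported metrics can be sketched as follows. The annualization factor of 252 trading days, the arithmetic annualization of the mean return, and the turnover definition (sum of absolute weight changes at a rebalancing date, measured against the drifted pre-trade weights) are our assumptions, as this section does not define them; only the 0.25% cost per trade is taken from the text. All names are hypothetical.

```python
import numpy as np

TRADING_DAYS = 252        # assumed annualization factor
COST_PER_TRADE = 0.0025   # 0.25% of traded value, as stated in the text

def annualized_metrics(daily_returns):
    """Annualized return, annualized volatility, and their ratio (risk-adjusted return)."""
    r = np.asarray(daily_returns, dtype=float)
    ann_return = r.mean() * TRADING_DAYS
    ann_vol = r.std(ddof=1) * np.sqrt(TRADING_DAYS)
    return ann_return, ann_vol, ann_return / ann_vol

def turnover(new_weights, drifted_weights):
    """Sum of absolute weight changes at one rebalancing date (assumed definition)."""
    return float(np.abs(np.asarray(new_weights) - np.asarray(drifted_weights)).sum())

def returns_after_costs(daily_returns, rebalance_days, turnovers):
    """Subtract proportional trading costs on each rebalancing day."""
    r = np.array(daily_returns, dtype=float)   # copy, so the input is left unchanged
    for day, to in zip(rebalance_days, turnovers):
        r[day] -= COST_PER_TRADE * to
    return r
```

Risk-adjusted return after transaction costs is then obtained by passing the output of `returns_after_costs` to `annualized_metrics`.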

5. Results
5.1. Minimum Variance

We first consider the results for the objective minimum variance, as shown in Table 2. We present the out-of-sample volatility for estimation problems with D = 4/3 ≈ 1.33 and D = 0.40. We also show the results of the estimators with objective maximum risk-adjusted return, as a high risk-adjusted return may also imply low volatility. The volatility of the best approach per setting is highlighted in bold. Underlined values indicate that the respective approach is significantly outperformed by CVTL with p < 0.05. The results show that CVTL and CVTLLS consistently achieve the lowest volatility among competing estimators, independent of the dataset and portfolio size. CVTL outperforms CVTLLS in most settings. The relative advantage of the second shrinkage target in CVTL is stronger in the high-dimensional estimation problems. Furthermore, we find that QuEST consistently achieves the lowest volatility among the competing covariance estimators.


Table 2 Out-of-Sample Annualized Volatility for Different Datasets.


D = N/T High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
N 100 200 300 100 200 300
Data US EU WO US EU WO US EU WO US EU WO US EU WO US EU WO
Estimators with Objective Minimum Variance
CVTL 9.33 9.72 9.21 7.52 8.42 7.97 6.86 7.98 7.45 9.02 9.38 8.83 7.51 8.52 7.88 7.04 8.26 7.72
CVTLLS 9.44 9.80 9.24 7.60 8.46 7.98 6.94 7.98 7.43 9.04 9.41 8.83 7.53 8.52 7.86 7.09 8.23 7.65
QIS 9.42 9.84 9.30 7.60 8.46 8.06 6.92 7.99 7.49 9.08 9.46 8.95 7.58 8.56 7.99 7.10 8.28 7.74
QuEST 9.35 9.77 9.23 7.56 8.43 8.03 6.89 7.97 7.46 9.06 9.46 8.94 7.57 8.57 7.99 7.09 8.28 7.74
LShriCC 10.25 10.59 9.85 8.86 9.51 8.76 8.38 8.89 8.29 9.64 9.94 9.32 8.24 9.12 8.42 7.84 8.79 8.20
LShri 9.76 10.44 9.68 8.22 9.43 8.74 7.66 9.17 8.32 9.39 10.02 9.30 8.03 9.19 8.47 7.63 8.91 8.25
BPSEst 13.55 13.50 12.64 11.69 12.26 11.41 10.72 11.62 10.79 10.07 10.60 9.94 8.51 9.61 8.96 8.02 9.22 8.67
FMEst – – – – – – – – – 10.15 10.73 10.06 8.55 9.69 9.05 8.04 9.29 8.74
POET I – – – – – – – – – 9.86 10.38 10.34 12.63 10.33 14.95 13.82 10.87 26.34
POET II 9.79 10.52 9.93 7.92 8.97 8.50 7.17 8.38 7.84 9.16 9.75 9.19 7.66 8.79 8.15 7.17 8.45 7.87
BN 10.34 10.17 9.64 8.35 8.81 8.38 7.59 8.26 7.83 9.15 9.53 9.01 7.66 8.65 8.00 7.17 8.43 7.77
Sample – – – – – – – – – 10.38 11.11 10.45 8.67 9.93 9.31 8.11 9.47 8.93

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 9.72 9.95 9.43 7.83 8.61 8.14 7.15 8.17 7.66 9.56 9.75 9.22 7.97 8.93 8.10 7.49 8.56 8.05
CVTLLS µ/σ 9.77 10.06 9.69 7.89 8.65 8.18 7.20 8.20 7.66 9.85 9.92 9.36 8.13 8.91 8.20 7.77 8.53 8.06
BN VAR 11.17 11.14 10.39 9.44 9.95 9.34 9.06 9.86 9.16 10.27 10.93 10.19 8.69 9.92 9.05 7.95 9.76 8.73
NC2R 13.03 12.53 11.99 12.63 12.47 11.52 14.18 13.88 12.81 12.97 12.50 11.80 12.96 12.63 11.57 14.63 14.00 12.99
CT – – – – – – – – – 11.63 11.52 10.67 10.71 10.75 9.85 10.70 10.73 9.93
1/N 17.19 16.40 14.97 16.77 16.13 14.52 16.92 16.29 14.66 17.27 16.41 14.90 16.81 15.85 14.19 17.17 16.23 14.55

Note: Results of all evaluated covariance estimators for US, EU and WO data with N ∈ {100, 200, 300}. Each value is
given in percent as the average over 50 evaluations of random portfolio constituents. Out-of-sample volatility greater
than 50 percent due to estimation errors is denoted by “–”. The best estimator per problem setting is highlighted in
bold. Underlined values indicate significant differences from CVTL with p < 0.05.

We further assess the results for out-of-sample volatility with respect to different dimensions

D = N/T ∈ {2, 1.50, 1.33, 0.40, 0.20, 0.13} of the sample data X. The results for the US dataset are

shown in Table 3. The results for all other datasets can be found in EC.4. Again, CVTL generally

outperforms the competing estimators. The difference between CVTL and CVTLLS is significantly

larger in the high-dimensional problems with D > 1. However, for the low-dimensional problems

with D < 1, the benefit of the second shrinkage target diminishes and both CVTL models achieve

similar out-of-sample variance.


Table 3 Out-Of-Sample Annualized Volatility for US Data and Different Dimensions.


N 100 200 300
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13

Estimators with Objective Minimum Variance


CVTL 9.50 9.32 9.33 9.02 9.01 9.39 7.69 7.61 7.52 7.51 8.19 8.92 6.86 6.91 6.86 7.04 8.09 7.60
CVTLLS 9.66 9.44 9.44 9.04 9.01 9.38 7.81 7.70 7.60 7.53 8.19 8.89 6.95 6.99 6.94 7.09 8.07 7.62
QIS 9.59 9.42 9.42 9.08 9.07 9.39 7.76 7.67 7.60 7.58 8.21 8.89 6.90 6.97 6.92 7.10 8.07 7.59
QuEST 9.55 9.37 9.35 9.06 9.06 9.39 7.72 7.63 7.56 7.57 8.20 8.89 6.88 6.94 6.89 7.09 8.07 7.59
LShriCC 10.33 10.18 10.25 9.64 9.32 9.54 8.93 8.87 8.86 8.24 8.54 9.12 8.23 8.41 8.38 7.84 8.46 7.78
LShri 9.85 9.71 9.76 9.39 9.22 9.49 8.15 8.21 8.22 8.03 8.42 9.04 7.41 7.64 7.66 7.63 8.32 7.69
BPSEst 12.77 12.92 13.55 10.07 9.40 9.57 10.61 10.95 11.69 8.51 8.53 9.10 – 10.02 10.72 8.02 8.41 7.73
FMEst – – – 10.15 9.41 9.58 – – – 8.55 8.54 9.10 – – – 8.04 8.41 7.72
POET I – – – 9.86 9.29 9.53 – – – 12.63 8.62 9.13 – – – 13.82 8.58 7.82
POET II 11.54 10.03 9.79 9.16 9.17 9.50 9.02 8.11 7.92 7.66 8.31 9.04 7.87 7.33 7.17 7.17 8.23 7.75
BN 10.81 10.42 10.34 9.15 9.17 9.57 8.83 8.51 8.35 7.66 8.33 9.05 7.82 7.73 7.59 7.17 8.20 7.76
Sample – – – 10.38 9.49 9.61 – – – 8.67 8.56 9.10 – – – 8.11 8.41 7.72

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 9.92 9.66 9.72 9.56 9.76 10.26 7.92 7.78 7.83 7.97 8.77 9.90 7.24 7.16 7.15 7.49 9.07 7.92
CVTLLS µ/σ 10.06 9.66 9.77 9.85 9.97 10.45 7.95 7.86 7.89 8.13 9.01 9.52 7.35 7.32 7.20 7.77 8.76 8.03
BN VAR 11.41 11.13 11.17 10.27 10.40 10.64 9.81 9.50 9.44 8.69 9.36 9.83 9.14 9.12 9.06 7.95 8.88 8.42
NC2R 13.07 13.15 13.03 12.97 13.14 13.73 12.60 12.76 12.63 12.96 13.77 14.22 13.69 13.81 14.18 14.63 15.40 13.03
CT – – – 11.63 11.42 11.83 – – – 10.71 11.39 12.02 – – – 10.70 11.60 10.12
1/N 17.03 17.03 17.19 17.27 16.98 17.40 16.66 16.82 16.77 16.81 17.67 18.37 16.72 16.83 16.92 17.17 18.32 15.18

Note: Results of all evaluated covariance estimators for US data with N ∈ {100, 200, 300} and D ∈
{2, 1.50, 1.33, 0.40, 0.20, 0.13}. Each value is given in percent as the average over 50 evaluations of random portfolio
constituents. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted by “–”. The best
estimator per problem setting is highlighted in bold. Underlined values indicate significant differences from CVTL with
p < 0.05.

Regarding the competing estimators, we observe a general pattern that the achieved volatilities are less dispersed for D < 1. We find that QuEST (Ledoit and Wolf 2015) is even the overall dominant approach in a total of three low-dimensional problems, with N = 200, D = 0.13 and N = 300, D ∈ {0.20, 0.13}. Other estimators that rely on the weights given by the sample covariance matrix, namely FMEst and CT, perform significantly worse for high D, up to the point where the resulting portfolio yields an out-of-sample volatility above 50 percent, which is denoted by “–”.

Interestingly, we find that CVTL, QuEST, and QIS (Ledoit and Wolf 2021b) achieve lower variances for higher dimensions D than for lower ones. This seems surprising, as a greater D implies a more deteriorated sample covariance matrix with greater imbalance in its eigenvalues and, thus, a more challenging estimation problem. However, this difference in volatility can be explained by the fact that the number and timeframe of the individual out-of-sample periods in the rolling window evaluations differ. For instance, the setting D = 0.13 and N = 300 requires 2250 datapoints, or roughly nine years of daily return history, to estimate the first covariance matrix. None of these datapoints are hence used as out-of-sample periods. Conversely, the setting D = 2 and N = 300 requires only 150 datapoints to estimate the first covariance matrix. Accordingly, the high-dimensional evaluation is based on more and older datapoints than the low-dimensional evaluation.

Figure 3 Visualization of Performance.

(a) Annualized Volatility vs. Annualized Return (b) Annualized Volatility vs. Risk-Adjusted Return

Note: Visualization of out-of-sample annualized volatility, annualized return, and risk-adjusted return for all covariance estimators based on US data with N = 100 and D = 0.20. Each value is given in percent as the average over 50 simulations with random portfolio constituents. The area under the frontier created by CVTL provides additional diversification benefits.

Figure 3a illustrates the out-of-sample performance of each estimator along the efficient frontier in a return versus volatility diagram for N = 100 and D = 0.20. In addition to our prior findings, we observe that CVTL leads to comparable or better returns than most competing estimators. As a result, it shifts the efficient frontier, which traditionally starts at the sample portfolio, to the upper left by reducing variance and increasing return.

We also consider the results for out-of-sample volatility based on small aggregated industry portfolios, as done by Zhao et al. (2019). Note that these datasets are publicly available (see French 2021), which facilitates the reproducibility of our results. Table 4 presents the volatility results for N ∈ {10, 30, 49} industry portfolios. CVTL achieves only average results if the portfolio is very small, with N = 10. However, both CVTL approaches become superior to the competing approaches for larger portfolios with N > 10. For N = 10, POET I achieves the lowest volatility. We find similar patterns for the otherwise superior QuEST model, which is outperformed by linear shrinkage for N < 49. This was already suggested by Ledoit and Wolf (2021a) and is mainly due to the complexity of fitting a non-linear function for small samples X^{T×N}.

Table 4 Out-Of-Sample Annualized Volatility for FFI Data and Different Dimensions.
N 10 30 49
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13

Estimators with Objective Minimum Variance


CVTL 14.14 13.76 13.60 12.79 12.14 12.05 11.89 11.85 11.68 11.25 10.82 10.89 11.23 10.99 10.89 10.37 10.32 10.11
CVTLLS 14.00 13.58 13.42 12.27 11.77 11.85 12.05 11.89 11.71 11.21 10.78 10.86 11.16 10.99 10.85 10.34 10.31 10.08
QIS 13.43 13.49 13.63 12.36 11.98 11.96 12.12 12.15 12.14 11.41 11.09 11.08 11.33 11.12 11.14 10.63 10.50 10.24
QuEST 13.25 12.97 12.73 12.26 11.95 11.97 11.99 11.94 11.76 11.41 11.08 11.08 11.23 11.06 11.04 10.60 10.50 10.24
LShriCC 13.32 13.12 12.91 12.15 12.04 12.11 12.14 12.32 12.04 11.67 11.10 11.05 11.46 11.54 11.29 10.81 10.56 10.31
LShri 13.11 12.78 12.56 12.04 11.77 11.82 11.95 12.03 11.93 11.47 11.02 11.03 11.46 11.29 11.28 10.77 10.56 10.28
BPSEst 14.69 14.97 15.33 13.09 12.16 12.05 14.14 14.37 15.26 12.25 11.40 11.21 13.24 13.66 14.35 11.65 10.81 10.40
FMEst – 15.55 – 13.23 12.22 12.09 – – – 12.42 11.44 11.23 – – – 11.81 10.83 10.41
POET I 13.11 12.71 12.43 11.89 11.61 11.70 11.95 12.15 13.37 11.43 10.92 10.94 27.62 12.79 16.88 10.69 10.41 10.16
POET II 15.86 14.52 13.93 12.41 12.11 12.17 13.25 12.68 12.43 11.64 11.33 11.37 12.22 11.39 11.42 10.72 10.67 10.47
BN 13.31 12.83 12.63 12.01 11.85 11.89 12.03 11.94 11.69 11.36 10.97 11.07 11.39 11.21 11.03 10.60 10.47 10.25
Sample – – – 13.91 12.43 12.21 – – – 13.03 11.61 11.33 – – – 12.31 10.98 10.50

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 23.52 13.63 15.78 13.02 12.64 13.58 13.33 13.32 13.11 12.28 12.07 11.79 11.62 11.61 11.80 10.84 10.97 10.95
CVTLLS µ/σ 40.22 13.66 26.38 12.47 12.76 12.90 13.20 12.32 12.47 11.97 11.52 11.70 11.72 11.09 11.04 10.69 11.12 11.19
BN VAR 13.84 13.91 13.69 13.67 13.22 13.55 13.15 13.53 12.59 12.59 12.54 12.63 12.10 12.10 11.98 11.95 12.02 12.37
NC2R 14.26 13.81 13.80 13.55 13.18 13.24 14.00 14.20 13.85 13.58 13.29 12.99 13.44 13.14 13.45 12.96 13.24 12.91
CT – – 46.79 12.82 12.44 12.50 – – – 12.74 12.34 12.31 – – – 12.22 12.04 11.80
1/N 15.54 15.55 15.56 15.51 15.52 15.68 16.80 16.80 16.76 17.00 16.78 16.83 16.69 16.88 16.72 16.66 16.75 16.43

Note: Results of all evaluated covariance estimators for FFI industry portfolios with N ∈ {10, 30, 49} and
D ∈ {2, 1.50, 1.33, 0.40, 0.20, 0.13}. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted
by “–”. The best estimator per problem setting is highlighted in bold. Underlined values indicate significant differences
from CVTL with p < 0.05.


5.2. Maximum Risk-Adjusted Return

Next, we consider all covariance estimators based on the objective maximum risk-adjusted return. Table 5 presents the out-of-sample risk-adjusted return, computed as annual return divided by annual volatility. Again, we consider all estimators, including those specifically designed to minimize volatility, as a high risk-adjusted return may also be achieved through low volatility. In this case, underlined values indicate that the respective approach is significantly outperformed by CVTL µ/σ with p < 0.05.

Table 5 Out-of-Sample Risk-Adjusted Return for Different Datasets.


D = N/T High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
N 100 200 300 100 200 300
Data US EU WO US EU WO US EU WO US EU WO US EU WO US EU WO

Estimators with Objective Minimum Variance


CVTL 0.98 1.12 0.88 1.38 1.52 1.03 1.25 1.48 0.99 0.81 1.10 0.75 1.09 1.43 1.08 1.01 1.42 0.99
CVTLLS 0.99 1.11 0.92 1.38 1.49 1.07 1.24 1.46 1.02 0.82 1.09 0.79 1.09 1.40 1.10 1.00 1.39 1.01
QIS 0.97 1.06 0.88 1.36 1.46 1.02 1.21 1.44 0.96 0.76 1.05 0.74 1.05 1.40 1.02 0.98 1.39 0.99
QuEST 0.97 1.06 0.88 1.38 1.48 1.02 1.23 1.45 0.97 0.77 1.06 0.73 1.05 1.40 1.02 0.98 1.39 0.99
LShriCC 0.58 0.69 0.61 0.79 0.88 0.71 0.44 0.78 0.58 0.50 0.79 0.60 0.64 0.82 0.76 0.46 0.81 0.71
LShri 0.92 1.00 0.87 1.22 1.32 1.01 0.98 1.11 0.81 0.72 0.97 0.74 0.90 1.14 0.92 0.79 1.09 0.90
BPSEst 0.78 0.82 0.82 0.94 0.92 0.80 0.79 0.78 0.73 0.67 0.85 0.74 0.85 1.03 0.87 0.76 0.99 0.85
FMEst – – – – – – – – – 0.63 0.84 0.71 0.82 1.00 0.83 0.75 0.99 0.83
POET I – – – – – – – – – 0.87 1.06 0.86 0.49 1.52 0.48 1.06 1.24 0.18
POET II 0.83 0.93 0.81 1.21 1.34 1.02 1.04 1.25 0.96 0.69 1.00 0.74 0.90 1.21 1.02 0.84 1.17 0.99
BN 0.98 1.08 0.86 1.33 1.45 1.00 1.31 1.51 0.99 0.81 1.12 0.75 1.08 1.55 1.12 1.02 1.59 1.03
Sample – – – – – – – – – 0.58 0.81 0.65 0.78 0.96 0.77 0.72 0.97 0.81

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 1.00 1.12 0.92 1.39 1.49 1.03 1.26 1.51 0.99 0.89 1.11 0.83 1.24 1.49 1.15 1.09 1.59 1.02
CVTLLS µ/σ 0.98 1.09 0.96 1.39 1.46 1.06 1.25 1.43 1.02 0.90 1.05 0.83 1.22 1.45 1.17 1.07 1.43 1.01
BN VAR 0.73 0.86 0.60 0.89 1.24 0.68 0.86 1.36 0.70 0.55 1.05 0.52 0.81 1.45 1.01 0.88 1.70 1.00
NC2R 0.80 0.77 0.81 0.86 0.95 0.82 0.71 0.82 0.80 0.70 0.78 0.74 0.77 0.90 1.01 0.54 0.83 0.72
CT – – – – – – – – – 0.89 0.92 0.93 1.11 1.15 1.17 0.82 1.02 0.97
1/N 0.83 0.77 0.86 0.88 0.78 0.89 0.77 0.82 0.85 0.86 0.75 0.88 1.03 0.97 1.13 0.68 0.78 0.84

Note: Results of all evaluated covariance estimators for US, EU and WO data with N ∈ {100, 200, 300}. Each value is the average over 50 evaluations of random portfolio constituents. Entries for which the out-of-sample volatility exceeds 50 percent due to estimation errors are denoted by “–”. The best estimator per problem setting is highlighted in bold. Underlined values indicate significant differences from CVTL µ/σ with p < 0.05.

The results do not suggest a dominant estimator, which is somewhat expected, as the estimation of returns is usually more unstable than the estimation of risk (Michaud 1989). However, we observe that CVTL µ/σ and CVTLLS µ/σ usually rank among the best models in the considered low- and high-dimensional settings. The highest risk-adjusted return in a few settings (D = 1.33, N = 200, EU and WO datasets) is achieved by CVTL and CVTLLS, albeit these approaches primarily minimize variance. The largest outperformance over CVTL µ/σ is achieved by CT in the low-dimensional problem with N = 100 on the MSCI World (WO) dataset. In addition, we find that CVTL and CVTL µ/σ achieve similar risk-adjusted returns even though they were designed to optimize different objectives. There is only one setting in which neither CVTL nor CVTL µ/σ outperforms the approaches BN VAR and NC2R, which are both designed to maximize risk-adjusted return.

Following our prior analyses, we also assess the results for risk-adjusted return based on the US dataset and dimensions D = N/T ∈ {2, 1.50, 1.33, 0.40, 0.20, 0.13}. The results are shown in Table 6. Here, we observe that CVTL µ/σ and CVTLLS µ/σ achieve the highest risk-adjusted return in most settings.

Table 6 Out-Of-Sample Risk-Adjusted Return for US Data and Different Dimensions.


N 100 200 300
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13

Estimators with Objective Minimum Variance


CVTL 1.06 1.13 0.98 0.81 0.94 0.81 1.25 1.45 1.38 1.09 0.73 0.73 1.53 1.33 1.25 1.01 0.72 1.47
CVTLLS 1.10 1.15 0.99 0.82 0.94 0.82 1.25 1.44 1.38 1.09 0.74 0.72 1.51 1.33 1.24 1.00 0.72 1.43
QIS 1.08 1.14 0.97 0.76 0.90 0.79 1.22 1.45 1.36 1.05 0.71 0.69 1.51 1.30 1.21 0.98 0.68 1.42
QuEST 1.08 1.14 0.97 0.77 0.90 0.79 1.23 1.46 1.38 1.05 0.71 0.69 1.53 1.32 1.23 0.98 0.68 1.42
LShriCC 0.71 0.81 0.58 0.50 0.72 0.66 0.65 0.81 0.79 0.64 0.47 0.52 0.76 0.57 0.44 0.46 0.38 1.15
LShri 1.06 1.12 0.92 0.72 0.85 0.76 1.15 1.32 1.22 0.90 0.63 0.65 1.34 1.08 0.98 0.79 0.58 1.32
BPSEst 0.89 0.91 0.78 0.67 0.83 0.75 1.02 1.07 0.94 0.85 0.62 0.64 0.34 0.99 0.79 0.76 0.57 1.30
FMEst – – – 0.63 0.82 0.75 – – – 0.82 0.62 0.64 – – – 0.75 0.57 1.30
POET I – – – 0.87 0.99 0.84 – – – 0.49 0.80 0.75 – – – 1.06 0.78 1.46
POET II 0.87 1.04 0.83 0.69 0.83 0.75 1.01 1.24 1.21 0.90 0.62 0.65 1.19 1.15 1.04 0.84 0.62 1.31
BN 1.01 1.05 0.98 0.81 0.98 0.83 1.15 1.44 1.33 1.08 0.75 0.75 1.46 1.29 1.31 1.02 0.76 1.53
Sample – – – 0.58 0.79 0.73 – – – 0.78 0.60 0.64 – – – 0.72 0.56 1.30

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 1.08 1.10 1.00 0.89 1.11 0.84 1.27 1.46 1.39 1.24 0.82 0.83 1.51 1.34 1.26 1.09 0.87 1.52
CVTLLS µ/σ 1.11 1.12 0.98 0.90 1.10 0.84 1.24 1.44 1.39 1.22 0.84 0.82 1.51 1.32 1.25 1.07 0.86 1.52
BN VAR 0.80 0.82 0.73 0.55 0.74 0.74 0.79 1.04 0.89 0.81 0.58 0.62 0.80 0.72 0.86 0.88 0.68 1.58
NC2R 0.80 0.72 0.80 0.70 0.72 0.43 0.74 0.75 0.86 0.77 0.71 0.38 0.90 0.84 0.71 0.54 0.51 0.86
CT – – – 0.89 1.08 0.79 – – – 1.11 0.76 0.69 – – – 0.82 0.67 1.05
1/N 0.93 0.85 0.83 0.86 1.02 0.68 0.92 0.95 0.88 1.03 0.69 0.59 0.89 0.80 0.77 0.68 0.59 0.75

Note: Results of all evaluated covariance estimators for US data with N ∈ {100, 200, 300} and D ∈ {2, 1.50, 1.33, 0.40, 0.20, 0.13}. Each value is the average over 50 evaluations of random portfolio constituents. Entries for which the out-of-sample volatility exceeds 50 percent due to estimation errors are denoted by “–”. The best estimator per problem setting is highlighted in bold. Underlined values indicate significant differences from CVTL µ/σ with p < 0.05.

Figure 3b illustrates the out-of-sample performance of each estimator along the efficient frontier in a risk-adjusted return versus volatility diagram for N = 100 and D = 0.20. We observe that, by introducing CVTL µ/σ, we obtain an improved efficient frontier between CVTL, CVTL µ/σ, and the 1/N portfolio that dominates existing estimators. Importantly, as shown in Figure 3a, the increase in out-of-sample risk-adjusted return of CVTL µ/σ is not caused by strategic scaling, e.g., decreasing variance towards zero to improve risk-adjusted return. Instead, CVTL µ/σ exhibits levels of absolute risk and return that are comparable to, but better than, those of the traditional efficient frontier.

The results for maximum risk-adjusted return on the FFI industry portfolios are shown in Table 7. They differ from those of our prior analyses of risk-adjusted return based on single stock data. For small portfolios with N = 10, POET II leads to the highest risk-adjusted return in most settings. For N = 49, linear shrinkage to constant correlation (LShriCC) dominates all other estimators in the high-dimensional settings. The CVTL estimators perform comparatively weakly, which seems to be caused by the small portfolio sizes, in particular for N < 49.

Table 7 Out-Of-Sample Risk-Adjusted Return for FFI Data and Different Dimensions.
N 10 30 49
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13
Estimators with Objective Minimum Variance
CVTL 0.63 0.77 0.62 0.68 0.89 0.96 0.78 0.82 0.83 0.84 0.94 0.92 0.75 0.81 0.88 0.89 0.73 0.95
CVTLLS 0.66 0.78 0.64 0.70 0.94 0.98 0.78 0.85 0.84 0.86 0.93 0.93 0.77 0.86 0.88 0.90 0.72 0.93
QIS 0.85 0.92 0.98 0.88 1.07 1.04 0.59 0.86 0.81 0.80 0.87 0.90 0.82 0.81 0.71 0.73 0.64 0.85
QuEST 0.92 0.94 0.92 0.91 1.11 1.05 0.62 0.82 0.76 0.79 0.86 0.91 0.79 0.78 0.71 0.78 0.65 0.85
LShriCC 0.79 0.84 0.90 0.87 0.95 0.87 0.71 0.65 0.65 0.79 0.90 0.93 0.85 0.99 0.96 0.85 0.71 0.91
LShri 0.68 0.82 0.76 0.82 1.04 1.04 0.73 0.83 0.77 0.83 0.89 0.91 0.76 0.72 0.75 0.81 0.67 0.88
BPSEst 0.62 0.72 0.69 0.81 1.05 1.02 0.50 0.82 0.72 0.78 0.85 0.87 0.79 0.74 0.80 0.59 0.60 0.81
FMEst – 0.68 – 0.84 1.08 1.04 – – – 0.74 0.85 0.88 – – – 0.55 0.60 0.81
POET I 0.67 0.89 0.89 0.85 1.02 1.01 0.78 0.80 0.72 0.60 0.83 0.86 0.36 0.65 0.16 0.72 0.62 0.85
POET II 1.16 1.30 1.27 0.99 1.11 0.96 0.61 0.84 0.78 0.72 0.79 0.81 0.82 0.75 0.79 0.87 0.63 0.74
BN 0.78 0.84 0.71 0.86 1.09 1.06 0.70 0.84 0.84 0.78 0.85 0.89 0.67 0.84 0.80 0.79 0.63 0.91
Sample – – – 0.83 1.11 1.06 – – – 0.68 0.86 0.89 – – – 0.50 0.60 0.79

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 0.74 0.80 0.27 0.55 0.79 0.75 0.65 0.84 0.95 0.85 0.94 0.89 0.75 0.86 0.82 1.00 0.73 1.02
CVTLLS µ/σ 1.07 0.81 – 0.73 0.84 0.87 0.64 0.83 0.76 0.82 1.01 0.91 0.56 0.86 0.86 0.95 0.71 0.98
BN VAR 0.67 0.93 0.95 0.86 1.04 0.97 0.70 0.73 0.92 0.78 0.94 1.02 0.71 0.93 0.88 0.95 0.73 0.86
NC2R 0.54 0.78 0.73 0.61 0.76 0.79 0.46 0.65 0.68 0.71 0.49 0.74 0.68 0.76 0.70 0.74 0.64 0.84
CT – – 0.86 0.75 0.91 0.95 – – – 0.81 0.81 0.85 – – – 0.74 0.63 0.85
1/N 0.63 0.68 0.52 0.49 0.57 0.68 0.54 0.62 0.74 0.69 0.60 0.64 0.51 0.66 0.58 0.72 0.51 0.72

Note: Results of all evaluated covariance estimators for FFI industry portfolios with N ∈ {10, 30, 49} and D ∈ {2, 1.50, 1.33, 0.40, 0.20, 0.13}. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted by “–”. The best estimator per problem setting is highlighted in bold. Underlined values indicate significant differences from CVTL µ/σ with p < 0.05.

We also consider out-of-sample risk-adjusted return after transaction costs of 0.25 percent per trade (Thapa and Poshakwale 2010), for which we briefly summarize the findings. Detailed results are provided in EC.1. We first consider the turnover rates (i.e., the sum of absolute weight changes during the evaluation period), as they directly determine transaction costs. Here, we find that CVTL usually achieves the lowest turnover of all estimators with the objective minimum variance. Among the estimators with the objective maximum risk-adjusted return (excluding the 1/N portfolio), CVTL µ/σ most often achieves the lowest turnover rates. The only estimator that shows lower turnover for N = 300 is NC2R, which is to be expected, as NC2R employs weight constraints. Regarding risk-adjusted return after transaction costs, we can confirm the results of Kourtis (2015), which suggest that it is hard to outperform the 1/N portfolio on a risk-adjusted basis after accounting for transaction costs. However, CVTL µ/σ and CVTLLS µ/σ are the only estimators that outperform the 1/N portfolio in several settings. The results on risk-adjusted return after costs for other datasets are provided in EC.4.
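The turnover and cost adjustment summarized above can be sketched as follows. This is a minimal illustration, not the paper's exact accounting. It assumes that turnover is the sum of absolute weight changes between rebalancing dates, and that the 0.25 percent cost is charged proportionally on every unit of weight traded. Weight drift between rebalancing dates and annualization are ignored, and all function names are hypothetical.

```python
import numpy as np

COST_PER_TRADE = 0.0025  # 0.25 percent of traded weight (Thapa and Poshakwale 2010)


def turnover(weights):
    """Total turnover: sum of absolute weight changes across rebalancing dates.

    weights has shape (num_periods, num_assets); row t holds the portfolio
    weights chosen at rebalancing date t.
    """
    weights = np.asarray(weights, dtype=float)
    return np.abs(np.diff(weights, axis=0)).sum()


def net_returns(weights, asset_returns, cost=COST_PER_TRADE):
    """Per-period portfolio returns after proportional transaction costs.

    Assumes the portfolio is rebalanced to weights[t] at the start of period t,
    ignores weight drift within a period, and charges no cost for the initial
    allocation.
    """
    weights = np.asarray(weights, dtype=float)
    asset_returns = np.asarray(asset_returns, dtype=float)
    gross = (weights * asset_returns).sum(axis=1)
    traded = np.zeros(len(weights))
    traded[1:] = np.abs(np.diff(weights, axis=0)).sum(axis=1)
    return gross - cost * traded


def risk_adjusted_return(returns):
    """Mean return divided by its sample standard deviation (not annualized)."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / returns.std(ddof=1)


w = [[0.5, 0.5], [0.6, 0.4], [0.6, 0.4]]
r = [[0.01, 0.02], [0.03, -0.01], [0.00, 0.01]]
print(turnover(w))        # approximately 0.2
print(net_returns(w, r))  # approximately [0.015, 0.0135, 0.004]
print(risk_adjusted_return(net_returns(w, r)))
```

Because costs are subtracted period by period, an estimator with lower turnover loses less of its gross risk-adjusted return. This is why the turnover rates are examined first.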

5.3. Sensitivity Analyses

We now assess the sensitivity of CVTL with respect to the cross validation parameters. Specifically, we consider a higher number of disjoint histories used in cross validation and different sizes of the search grid for the covariance estimation parameters γ, δ1, δ2. We analyze the sensitivity for D ∈ {0.40, 1.33} and portfolio size N = 100 based on the US dataset. In addition, we analyze the influence of the parameter ζ, which defines the function (24) that reduces the parameter search space when maximizing risk-adjusted return.

In our main analyses, we used only one transfer dataset (i.e., one disjoint history) for cross validation and the parameter grid from (17). In this analysis, we evaluate the difference in performance for 1, ..., 5 transfer sets and two additional parameter grids. We consider a smaller grid P_small

$$\gamma \in \left\{-1, -\tfrac{1}{3}, \dots, 1\right\} \tag{25}$$

$$\delta_1, \delta_2 \in \{0, 0.1, \dots, 1\} \quad \text{with } \delta_1 + \delta_2 \le 1 \tag{26}$$

and a larger grid P_large

$$\gamma \in \{-1, -0.9, \dots, 1\} \tag{27}$$

$$\delta_1, \delta_2 \in \{0, 0.02, \dots, 1\} \quad \text{with } \delta_1 + \delta_2 \le 1 \tag{28}$$
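The following sketch shows how quickly the search space grows. It enumerates the constrained δ grid using integer step indices, which avoids floating-point error in the check δ1 + δ2 ≤ 1. It is an illustration only:

- the function name `build_grid` is hypothetical;
- the medium grid (17) is not reproduced here;
- γ values are passed explicitly rather than inferred from (25).

```python
from itertools import product


def build_grid(gamma_values, delta_steps):
    """Enumerate configurations (gamma, delta1, delta2) with delta1 + delta2 <= 1.

    delta_steps is the number of equal steps on [0, 1]: 10 for a step of 0.1
    (as in (26)) or 50 for a step of 0.02 (as in (28)).
    """
    grid = []
    for gamma in gamma_values:
        for i, j in product(range(delta_steps + 1), repeat=2):
            if i + j <= delta_steps:  # integer check for delta1 + delta2 <= 1
                grid.append((gamma, i / delta_steps, j / delta_steps))
    return grid


# Large grid P_large from (27) and (28): 21 values of gamma, delta step 0.02
gammas_large = [k / 10 for k in range(-10, 11)]
p_large = build_grid(gammas_large, 50)
print(len(p_large))  # 21 * 1326 = 27846 configurations

# Admissible (delta1, delta2) pairs for a step of 0.1, as in (26)
print(len(build_grid([0.0], 10)))  # 66 pairs per value of gamma
```

Since every configuration has to be evaluated in cross validation, the large grid is roughly 20 times as costly per value of γ as the small one (1326 vs. 66 δ pairs).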


Table 8 shows the difference in basis points (bps; 1 basis point = 0.01 percent) in the respective performance metric between the modified parameter setting and the reference setting (i.e., one transfer set and the medium-sized grid (17)). The scores are calculated such that positive values indicate an improvement, while negative values indicate a decline in the respective performance metric. For the objective minimum variance, increasing the number of transfer sets has a small positive effect, as it reduces out-of-sample volatility by 1 bp in the high-dimensional and by 2 bps in the low-dimensional setting. Increasing the grid size from medium to large reduces out-of-sample volatility by 5 bps in the high-dimensional setting and by 8 bps in the low-dimensional setting. Conversely, reducing the grid size from medium to small increases out-of-sample volatility by up to 5 bps in the high-dimensional setting and by up to 8 bps in the low-dimensional setting. Interestingly, we find that increasing the grid size and the number of transfer sets simultaneously leads to higher out-of-sample volatility than using only one transfer set in combination with a large grid.
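Lower volatility and higher risk-adjusted return both count as improvements, so the sign of a Table 8 score has to flip depending on the metric. The following is a minimal sketch of this convention. It assumes both metrics are reported in percent, as in the other tables, so that 0.01 percentage points equal 1 bp. The function name is hypothetical.

```python
def improvement_bps(reference, modified, lower_is_better):
    """Difference between two metric values (in percent), expressed in basis points.

    Positive results indicate an improvement of the modified setting over the
    reference setting: a decrease for volatility (lower_is_better=True) and an
    increase for risk-adjusted return (lower_is_better=False).
    """
    diff_bps = (modified - reference) * 100  # 1 percentage point = 100 bps
    return round(-diff_bps if lower_is_better else diff_bps, 6)


print(improvement_bps(10.05, 10.00, lower_is_better=True))  # 5.0: volatility fell by 5 bps
print(improvement_bps(1.00, 0.97, lower_is_better=False))   # -3.0: return fell by 3 bps
```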

Table 8 Sensitivity of CVTL Performance Towards Different Search Grids and Number of Transfer Sets.
High-Dimensional Setting (D = 1.33) Low-Dimensional Setting (D = 0.40)
Search grid Small Medium Large Small Medium Large
Objective Minimum Variance (CVTL)
Transfer sets
1 −5 – 5 −8 – 8
2 −5 1 0 −6 2 4
3 −5 1 0 −6 2 4
4 −4 1 0 −6 2 4
5 −5 1 0 −6 2 4
Objective Risk-Adjusted Return (CVTL µ/σ)
Transfer sets
1 2 – −3 8 – −8
2 4 1 −2 7 2 −5
3 4 1 0 9 2 −6
4 3 0 0 11 2 −5
5 4 2 0 13 3 −5

Note: Scores are given in basis points. Positive values indicate improvements, while negative values indicate declines in
the respective performance metric. The reference setting is given by a medium-sized grid and one transfer set for cross
validation.

For CVTL µ/σ with the objective maximum risk-adjusted return, increasing the number of transfer sets has a greater positive effect of up to 2 bps in the high-dimensional and 3 bps in the low-dimensional setting. However, increasing the grid size from medium to large decreases the risk-adjusted return by up to 3 bps in the high-dimensional and by up to 8 bps in the low-dimensional setting. Notably, we find that it is best to decrease the grid from medium to small, as this increases out-of-sample risk-adjusted return by up to 4 bps in the high-dimensional and 13 bps in the low-dimensional setting. These findings suggest that using a very fine-grained grid leads to overfitting when optimizing the covariance estimate to maximize risk-adjusted return. Instead, it is beneficial to use a coarser grid to achieve better generalizability of the parameters found via cross validation.

We also analyze the influence of the parameter ζ, which is used to decide whether a given parameter configuration (δ1, δ2, γ) is admissible when optimizing for risk-adjusted return. In our main analysis, we set ζ = 0.30. We now analyze how out-of-sample risk-adjusted return changes for different values ζ ∈ {0.10, 0.20, 0.30, 0.40, 0.50} and when the search space reduction is removed. The results are shown in Table 9. Evidently, it is generally beneficial to filter the parameter configurations so that the resulting portfolios exhibit non-extreme volatility in cross validation. Not filtering the parameter configurations can occasionally lead to impressively good results, e.g., for N = 200 and D = 0.13. However, we believe that such scores appeared only by chance and are unlikely to be reproducible in the future. In addition, we find that risk-adjusted return changes only marginally for different values of ζ. The difference in risk-adjusted return across different values of ζ is at most 6 bps in the high-dimensional problems and at most 10 bps in the low-dimensional problems.

Table 9 Out-Of-Sample Risk-Adjusted Return for US Data and Different Dimensions.


N 100 200 300
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13

CVTL µ/σ with Search Space Reduction
ζ = 0.10 1.08 1.12 1.00 0.86 1.15 0.82 1.28 1.44 1.34 1.26 0.79 0.81 1.50 1.30 1.20 0.97 0.79 1.44
ζ = 0.20 1.08 1.12 1.01 0.88 1.13 0.84 1.27 1.44 1.38 1.25 0.80 0.82 1.51 1.34 1.25 1.05 0.84 1.50
ζ = 0.30 1.08 1.10 1.00 0.89 1.11 0.84 1.27 1.46 1.39 1.24 0.82 0.83 1.51 1.34 1.26 1.09 0.87 1.52
ζ = 0.40 1.09 1.10 0.99 0.89 1.08 0.84 1.30 1.45 1.36 1.24 0.83 0.85 1.51 1.35 1.25 1.11 0.85 1.54
ζ = 0.50 1.06 1.10 0.99 0.88 1.06 0.84 1.29 1.45 1.36 1.22 0.83 0.86 1.51 1.34 1.23 1.11 0.86 1.54
CVTL µ/σ without Search Space Reduction
0.71 1.09 0.73 0.87 1.02 0.84 1.30 1.44 1.32 1.15 0.79 2.74 1.47 1.34 1.28 1.12 0.73 0.77

Note: Results of CVTL µ/σ for different values of ζ and without search space reduction, for US data with N ∈ {100, 200, 300} and D ∈ {2, 1.50, 1.33, 0.40, 0.20, 0.13}. Each value is given in percent as the average over 50 evaluations of random portfolio constituents. The best configuration per problem setting is highlighted in bold.


5.4. Empirical Interpretation and Relation to Other Estimators

Finally, we analyze the resulting estimation parameters and eigenvalue distributions from an empirical point of view. In addition, we compare the resulting eigenvalue distributions with those of the established non-linear shrinkage estimator QuEST (Ledoit and Wolf 2015) and the bounded-noise estimator BN (Zhao et al. 2019).

Estimation Parameters of CVTL We start by considering the parameters γ, δ1, δ2 that are selected to estimate the future covariance based on sample data. Table 10 provides the average parameters of CVTL and CVTL µ/σ for US data with N = 100 and different dimensions. To ease readability, we define δ3 = 1 − δ1 − δ2 as the weight of the non-linear shrinkage target λθ. We observe that δ3 is considerably higher in the high-dimensional settings with D > 1. This is consistent with the recent results by Ledoit and Wolf (2021b), who also found greater usage of the non-linear shrinkage target for D > 1. For the low-dimensional problems, δ3 decreases and δ1 increases, which implies a greater weight of the sample eigenvalues. We further observe that the scaling parameter γ is lower in the low-dimensional settings than in the high-dimensional ones, reaching its minimum for D = 0.13. The parameters of CVTL µ/σ show similar, though less pronounced, patterns: the weight of the non-linear shrinkage target increases for higher D, while the weight of the sample eigenvalues increases for lower D. Additional empirical observations about the most frequent parameters for larger portfolios, parameter values over time, and parameter interdependence are provided in EC.2.
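The role of the weights can be illustrated with a rotation-equivariant sketch. Here the shrunk eigenvalues are a convex combination of the sample eigenvalues and two target eigenvalue vectors, with δ3 = 1 − δ1 − δ2 as defined above. This is an assumption for illustration only. The scaling parameter γ, the first shrinkage target, and the construction of λθ are defined earlier in the paper and are not reproduced here, so the two targets below are hypothetical placeholders. The weights are the average CVTL values for D = 0.40 from Table 10.

```python
import numpy as np


def combine_eigenvalues(sample_eigs, target_1, target_theta, delta1, delta2):
    """Convex combination of three eigenvalue vectors.

    delta1 weights the sample eigenvalues, delta2 a first target, and
    delta3 = 1 - delta1 - delta2 a second target (lambda_theta in the text).
    Both targets are hypothetical placeholders supplied by the caller.
    """
    delta3 = 1.0 - delta1 - delta2
    if min(delta1, delta2, delta3) < -1e-12:
        raise ValueError("weights must be non-negative and sum to one")
    return (delta1 * np.asarray(sample_eigs)
            + delta2 * np.asarray(target_1)
            + delta3 * np.asarray(target_theta))


def rebuild_covariance(sample_cov, shrunk_eigs):
    """Keep the sample eigenvectors and swap in the shrunk eigenvalues.

    shrunk_eigs must be ordered like the sample eigenvalues, largest first.
    """
    eigvals, eigvecs = np.linalg.eigh(sample_cov)    # eigh returns ascending order
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]  # reorder to descending
    return eigvecs @ np.diag(shrunk_eigs) @ eigvecs.T


rng = np.random.default_rng(0)
sample_cov = np.cov(rng.normal(size=(60, 5)), rowvar=False)
sample_eigs = np.sort(np.linalg.eigvalsh(sample_cov))[::-1]

# Two hypothetical targets: the grand mean and a geometric pull toward it
target_1 = np.full_like(sample_eigs, sample_eigs.mean())
target_theta = np.sqrt(sample_eigs * sample_eigs.mean())

# Average CVTL weights for D = 0.40 from Table 10: delta1 = 0.606, delta2 = 0.275
shrunk = combine_eigenvalues(sample_eigs, target_1, target_theta, 0.606, 0.275)
sigma = rebuild_covariance(sample_cov, shrunk)
```

Keeping the sample eigenvectors and changing only the eigenvalues is the defining property of rotation-equivariant shrinkage. The weights δ1, δ2, δ3 then control how far each eigenvalue moves away from its sample value.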

Table 10 Average Parameter Calibration for CVTL Estimators and Different Sample Dimensions.
CVTL CVTL µ/σ
D δ1 δ2 δ3 γ δ1 δ2 δ3 γ
2.00 20.9 26.6 52.4 88.6 23.2 43.0 33.8 40.8
1.50 30.7 32.9 36.4 90.9 24.0 47.5 28.5 59.0
1.33 36.5 36.7 26.8 91.9 28.6 46.4 25.0 42.1
0.40 60.6 27.5 11.9 84.8 33.5 45.8 20.7 30.0
0.20 68.8 19.5 11.7 81.8 33.7 47.0 19.3 11.4
0.13 69.4 17.7 12.9 69.0 29.3 45.3 25.5 35.8

Note: Average parameter calibration for US data with N = 100 and D ∈ {2, 1.50, 1.33, 0.40, 0.20, 0.13}. Each score is
given in percent as the average over 50 evaluations with random portfolio constituents. δ3 = 1 − δ1 − δ2 .


Eigenvalue Distributions Next, we compare the resulting eigenvalue distributions of CVTL, QuEST (Ledoit and Wolf 2015), and BN (Zhao et al. 2019). Figure 4 plots the shrunk eigenvalues against the sample eigenvalues for CVTL (black crosses), QuEST (dark gray circles), and BN (gray triangles). The plotted eigenvalues denote the average over 50 evaluations with N = 100 based on the US data.

Figure 4 Sample and Shrunk Eigenvalues.
[Figure: six log-log panels for D = 2.00, 1.50, 1.33, 0.40, 0.20, and 0.13, each plotting the shrunk eigenvalue (y-axis) against the sample eigenvalue (x-axis) over the range 10−4 to 10−2 for CVTL, QuEST, BN, and Sample.]
Note: The plotted eigenvalues are averaged over 50 simulations with random portfolio constituents based on the US data with N = 100.

We make several observations regarding the shrunk eigenvalues of CVTL. First, CVTL shrinks the eigenvalues in a clearly non-linear way. Second, CVTL shrinks small eigenvalues in a similar fashion to QuEST, especially for lower dimensions. This is surprising given that CVTL does not impose any statistically derived lower limit for small eigenvalues. Accordingly, this supports the use of cross validation for parameter selection, as it leads to results comparable to those of statistically derived methods. Third, CVTL does not reduce the largest eigenvalue in all dimensions. Instead, it increases the first eigenvalue in the high-dimensional problems. As a result, portfolio weights in the direction of the first eigenvector are suppressed more strongly than with other estimators. This indicates that it is not always necessary to decrease the first eigenvalue. Independent of the dimension, the eigenvalues obtained by BN deviate from those of QuEST and CVTL, which can be expected due to its separation of signal and noise. The same analysis for CVTL µ/σ is provided in EC.2.3.
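The third observation follows from the global minimum variance (GMV) weights w = Σ⁻¹1 / (1ᵀΣ⁻¹1). Each eigenvector enters Σ⁻¹ with weight 1/λ, so raising the first eigenvalue lowers the portfolio's exposure to the first eigenvector. The sketch below demonstrates this on a synthetic covariance matrix. The eigenvalues and the random rotation are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np


def gmv_weights(cov):
    """Global minimum variance weights inv(cov) 1 / (1' inv(cov) 1)."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))  # avoids forming the inverse
    return x / x.sum()


def first_eigvec_exposure(eigvecs, eigvals):
    """Absolute GMV exposure to the eigenvector stored in column 0 of eigvecs."""
    cov = eigvecs @ np.diag(eigvals) @ eigvecs.T
    return abs(eigvecs[:, 0] @ gmv_weights(cov))


rng = np.random.default_rng(42)
eigvecs, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthonormal basis
eigvals = np.array([4.0, 1.0, 0.8, 0.5, 0.3])       # column 0 gets the largest eigenvalue

base = first_eigvec_exposure(eigvecs, eigvals)
inflated = first_eigvec_exposure(eigvecs, eigvals * np.array([2.0, 1, 1, 1, 1]))
print(base, inflated)  # exposure shrinks when the first eigenvalue is doubled
```

Let a_k = u_kᵀ1 and c = Σ_{k≥2} a_k²/λ_k. The exposure then equals a_1 / (a_1² + c·λ_1), which strictly decreases in λ_1 whenever a_1 ≠ 0.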

Benefits of Cross Validation and Second Shrinkage Target Lastly, we perform an analysis to estimate the isolated effects of using cross validation for parameter selection and of the proposed second shrinkage target. For this purpose, we construct a new covariance estimator, called “QuESTCV”, that uses the shrinkage target of QuEST (Ledoit and Wolf 2015) but selects the shrinkage intensity via cross validation. This allows us to calculate two individual benefits:

(i) cross validation for the selection of shrinkage intensities, as the difference in bps between QuESTCV and QuEST;
(ii) the proposed second shrinkage target λθ, as the difference in bps between CVTL and QuESTCV.

Table 11 presents the results for the objective minimum variance only, as QuEST specifically minimizes volatility. For instance, the value 7 in the row for cross validation for N = 100 and D = 0.40 indicates the following. Using cross validation to select the shrinkage intensity of QuEST, instead of applying the original QuEST estimator, reduces out-of-sample volatility by 7 bps.
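Both rows of Table 11 are differences of out-of-sample volatilities, so they add up to the total benefit of CVTL over the original QuEST estimator. Writing σ_E for the out-of-sample volatility of estimator E, with positive differences denoting improvements:

$$
\sigma_{\text{QuEST}} - \sigma_{\text{CVTL}}
= \underbrace{\left(\sigma_{\text{QuEST}} - \sigma_{\text{QuESTCV}}\right)}_{\text{cross validation}}
+ \underbrace{\left(\sigma_{\text{QuESTCV}} - \sigma_{\text{CVTL}}\right)}_{\text{shrinkage target } \lambda_\theta}
$$

For example, for N = 100 and D = 0.40, Table 11 reports 7 bps for cross validation and −1 bp for λθ. CVTL therefore reduces out-of-sample volatility relative to QuEST by about 7 + (−1) = 6 bps, up to rounding of the individual entries.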

Table 11 Individual Benefits of Cross Validation and Second Shrinkage Target for US data.
N 100 200 300
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13
Cross validation −1 0 1 7 6 3 −2 −1 2 6 3 1 1 2 3 5 3 0
Shrinkage target λθ 7 5 3 −1 0 −3 4 5 4 0 −1 −3 2 1 1 −1 −3 −1

Note: The row “Cross validation” denotes the benefit of cross validation as the difference in basis points when using cross validation to select the shrinkage intensity of QuEST (Ledoit and Wolf 2015) instead of applying the original QuEST estimator. The row “Shrinkage target λθ” denotes the benefit of the proposed second shrinkage target as the difference in basis points when using CVTL instead of QuEST with cross validation to select the shrinkage intensity. Each score denotes the difference in out-of-sample volatility in basis points, e.g., 1 denotes the difference between 10.00% and 10.01%. Positive values indicate an improvement in the respective performance metric, while negative values indicate a decline.


We find that using cross validation for selecting shrinkage intensities has the greatest benefit for low-dimensional problems, while the benefit for high-dimensional problems is smaller or even negative. Conversely, using the second shrinkage target of CVTL instead of the non-linear shrinkage target of QuEST reduces out-of-sample volatility in particular for the high-dimensional problems, while it often has a negative effect for low-dimensional problems. This supports the recent idea of Ledoit and Wolf (2021b) to shrink the eigenvalues quadratically for low-dimensional problems using a non-linear and a linear shrinkage target. Additional analyses about the benefits of cross validation and the second shrinkage target are provided in EC.3.

6. Concluding Remarks
We proposed a novel approach for covariance estimation, called “CVTL”, that applies cross validation based transfer learning. The proposed approach applies non-linear shrinkage to the eigenvalues of the sample covariance matrix according to a given objective function. In contrast to existing methods, which generally rely on analytically derived values for the estimation parameters, CVTL is purely data-driven and agnostic with respect to the resulting eigenvalue distribution and shrinkage intensities. All estimation parameters are selected using cross validation and the given objective based on a disjoint history of assets. The resulting parameters are subsequently used to estimate the actual covariance matrix of the given portfolio constituents. Thereby, our study presents a novel perspective on the problem of covariance estimation.
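The selection procedure summarized above can be sketched as follows. This is a simplified illustration, not the CVTL implementation, and it rests on three assumptions:

- a single linear shrinkage intensity toward a scaled identity stands in for the parameters γ, δ1, δ2;
- the data are synthetic;
- all function names are hypothetical.

The key point is that the parameter is chosen on a disjoint transfer set of assets and only then applied to the target portfolio.

```python
import numpy as np


def shrink_to_identity(sample_cov, delta):
    """Stand-in estimator: linear shrinkage toward mu * I with intensity delta."""
    n = sample_cov.shape[0]
    mu = np.trace(sample_cov) / n
    return (1 - delta) * sample_cov + delta * mu * np.eye(n)


def gmv_weights(cov):
    """Global minimum variance weights inv(cov) 1 / (1' inv(cov) 1)."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return x / x.sum()


def cv_score(returns, delta, train_len, objective):
    """Estimate on the first train_len rows, evaluate GMV weights on the rest.

    Returns a score where higher is better for either objective.
    """
    train, test = returns[:train_len], returns[train_len:]
    w = gmv_weights(shrink_to_identity(np.cov(train, rowvar=False), delta))
    realized = test @ w
    if objective == "min_variance":
        return -realized.std(ddof=1)
    if objective == "max_risk_adjusted":
        return realized.mean() / realized.std(ddof=1)
    raise ValueError(f"unknown objective: {objective}")


def select_on_transfer_set(transfer_returns, grid, train_len, objective):
    """Pick the grid value that scores best on the disjoint transfer assets."""
    scores = [cv_score(transfer_returns, d, train_len, objective) for d in grid]
    return grid[int(np.argmax(scores))]


rng = np.random.default_rng(1)
transfer = rng.normal(scale=0.01, size=(250, 80))  # synthetic transfer assets
target = rng.normal(scale=0.01, size=(120, 40))    # synthetic portfolio constituents
grid = [k / 10 for k in range(11)]

best = select_on_transfer_set(transfer, grid, train_len=120, objective="min_variance")
weights = gmv_weights(shrink_to_identity(np.cov(target, rowvar=False), best))
```

Swapping in the paper's eigenvalue shrinkage and its (γ, δ1, δ2) grid would only change `shrink_to_identity` and `grid`. The split into a transfer set for selection and a target set for application stays the same.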

Although CVTL is purely data-driven, the resulting shrunk eigenvalues are similar to those of existing non-linear shrinkage models. However, we found differences in the high-dimensional problems with the objective minimum variance. There, CVTL tends to increase the largest eigenvalue, which is the exact opposite of what existing estimators do. This is surprising as, so far, the underlying idea of non-linear shrinkage estimators has been to push small eigenvalues up and large eigenvalues down, as recently summarized by Ledoit and Wolf (2021a). A major advantage of CVTL over existing estimators is its flexibility, as the estimation parameters can be selected based on a predefined objective function. All portfolio weights are directly calculated through the GMV portfolio optimization procedure, which relies only on the covariance matrix. For instance, to maximize risk-adjusted return, CVTL does not need to explicitly estimate future returns. Instead, they are implicitly accounted for by the parameter configuration chosen to shrink the covariance eigenvalues.

This study opens up several avenues for future research. First, one could improve the selection of transfer datasets used during cross validation. In our study, we used random portfolio constituents for the disjoint history. However, it may be possible to obtain more accurate estimation parameters by specifically selecting similar stocks, e.g., from the same sectors but in different countries. Second, the estimation parameters could be selected using more sophisticated search procedures, such as random search (Bergstra and Bengio 2012), where each parameter value is drawn from a predefined distribution. Third, it also seems reasonable to evaluate our approach based on other important criteria like sustainability. Recently, portfolios with a particular focus on ESG (Environmental, Social and Governance) criteria have received growing attention from investors. A possible objective function could be defined as the ESG score divided by volatility, which would present a novel way of generating ESG portfolios with low risk.

References
Ackermann F, Pohl W, Schmedders K (2017) Optimal and naive diversification in currency markets. Management Sci. 63(10):3347–3360.
Bai J, Ng S (2002) Determining the number of factors in approximate factor models. Econometrica 70(1):191–221.
Ban GY, El Karoui N, Lim AEB (2018) Machine learning and portfolio optimization. Management Sci. 64(3):1136–1154.
Bastani H (2021) Predicting with proxies: Transfer learning in high dimension. Management Sci. 67(5):2964–2984.
Bastani H, Simchi-Levi D, Zhu R (2021) Meta dynamic pricing: Transfer learning across experiments. Management Sci. Forthcoming.
Bergmeir C, Benítez JM (2012) On the use of cross-validation for time series predictor evaluation. Inf. Sci. 191:192–213.


Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(2):281–305.
Bickel PJ, Levina E (2008) Covariance regularization by thresholding. Ann. Stat. 36(6):2577–2604.
Bodnar T, Parolya N, Schmid W (2018) Estimation of the global minimum variance portfolio in high dimensions. Eur. J. Oper. Res. 266(1):371–390.
Bun J, Bouchaud JP, Potters M (2017) Cleaning large correlation matrices: Tools from random matrix theory. Phys. Rep. 666:1–109.
Cai T, Liu W, Luo X (2011) A constrained ℓ1 minimization approach to sparse precision matrix estimation. J. Am. Stat. Assoc. 106(494):594–607.
Das SR, Markowitz H, Scheid J, Statman M (2010) Portfolio optimization with mental accounts. J. Financial Quant. Anal. 45(2):311–337.
Das SR, Ostrov D, Radhakrishnan A, Srivastav D (2018) A new approach to goals-based wealth management. J. Invest. Manag. 16(3):1–27.
De Nard G, Ledoit O, Wolf M (2019) Factor models for portfolio selection in large dimensions: The good, the better and the ugly. J. Financ. Econom. Forthcoming.
DeMiguel V, Garlappi L, Nogales F, Uppal R (2009a) A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Management Sci. 55(5):798–812.
DeMiguel V, Garlappi L, Uppal R (2009b) Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Rev. Financ. Stud. 22(5):1915–1953.
Dorfman R (1979) A formula for the Gini coefficient. Rev. Econ. Stat. 61(1):146–149.
Engle RF, Ledoit O, Wolf M (2019) Large dynamic covariance matrices. J. Bus. Econ. Stat. 37(2):363–375.
Fan J, Liao Y, Mincheva M (2011) High-dimensional covariance matrix estimation in approximate factor models. Ann. Stat. 39(6):3320–3356.
Fan J, Liao Y, Liu H (2016) An overview of the estimation of large covariance and precision matrices. Econom. J. 19(1):C1–C32.


Fan J, Liao Y, Mincheva M (2013) Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B 75(4):603–680.
Fan J, Liu H, Wang W (2018) Large covariance estimation through elliptical factor models. Ann. Stat. 46(4):1383–1414.
Fan J, Zhang J, Yu K (2012) Vast portfolio selection with gross-exposure constraints. J. Am. Stat. Assoc. 107(498):592–606.
Frahm G, Memmel C (2010) Dominating estimators for minimum-variance portfolios. J. Econom. 159(2):289–302.
French KR (2021) Current research returns. https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.
Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3):432–441.
Frost PA, Savarino JE (1986) An empirical Bayes approach to efficient portfolio selection. J. Financial Quant. Anal. 21(3):293–305.
Jagannathan R, Ma T (2003) Risk reduction in large portfolios: Why imposing the wrong constraints helps. J. Finance 58(4):1651–1683.
James W, Stein C (1961) Estimation with quadratic loss. Proc. 4th Berkeley Symp. Mathematical Statistics Probability, 361–380.
Jorion P (1986) Bayes-Stein estimation for portfolio analysis. J. Financial Quant. Anal. 21(3):68–74.
Kourtis A (2015) A stability approach to mean-variance optimization. Financial Rev. 50(3):301–330.
Laloux L, Cizeau P, Bouchaud JP, Potters M (1999) Noise dressing of financial correlation matrices. Phys. Rev. Lett. 83(7):1467–1470.
Laloux L, Cizeau P, Potters M (2000) Random matrix theory and financial correlations. Int. J. Theor. Appl. Finance 3(3):391–397.
Lam C, Fan J (2009) Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Stat. 37(6B):4254–4278.


Ledoit O, Wolf M (2003) Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J. Empir. Finance 10(5):603–621.
Ledoit O, Wolf M (2004a) Honey, I shrunk the sample covariance matrix. J. Portf. Manag. 30(4):110–119.
Ledoit O, Wolf M (2004b) A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88:365–411.
Ledoit O, Wolf M (2012) Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Stat. 40(2):1024–1060.
Ledoit O, Wolf M (2015) Spectrum estimation: A unified framework for covariance matrix estimation and PCA in large dimensions. J. Multivar. Anal. 139(2):360–384.
Ledoit O, Wolf M (2017a) Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. Rev. Financ. Stud. 30(12):4349–4388.
Ledoit O, Wolf M (2017b) Numerical implementation of the QuEST function. Comput. Stat. Data Anal. 115:199–223.
Ledoit O, Wolf M (2020) Analytical nonlinear shrinkage of large-dimensional covariance matrices. Ann. Stat. 48(5):3043–3065.
Ledoit O, Wolf M (2021a) The power of (non-)linear shrinking: A review and guide to covariance matrix estimation. J. Financ. Econom. Forthcoming.
Ledoit O, Wolf M (2021b) Quadratic shrinkage for large covariance matrices. Bernoulli. Forthcoming.
Ledoit O, Wolf M (2021c) Shrinkage estimation of large covariance matrices: Keep it simple, statistician? J. Multivar. Anal. 186.
Markowitz HM (1952) Portfolio selection. J. Finance 7(1):77–91.
Michaud RO (1989) The Markowitz optimization enigma: Is ‘optimized’ optimal? Financial Anal. J. 45(1):31–42.
Nguyen VA, Kuhn D, Esfahani PM (2021) Robust inverse covariance estimation: The Wasserstein shrinkage estimator. Oper. Res. Forthcoming.


Olivares-Nadal AV, DeMiguel V (2018) Technical note - A robust perspective on transaction costs in portfolio optimization. Oper. Res. 66(3):733–739.
Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10):1345–1359.
Plerou V, Gopikrishnan P, Rosenow B, Amaral LAN, Guhr T, Stanley HE (2002) A random matrix approach to cross-correlations in financial data. Phys. Rev. E 65(6):066126.
Roncalli T, Weisang G (2016) Risk parity portfolios with risk factors. Quant. Finance 16(3):377–388.
Rossi B, Inoue A (2012) Out-of-sample forecast tests robust to the choice of window size. J. Bus. Econ. Stat. 30(3):432–453.
Shefrin H, Statman M (2000) Behavioral portfolio theory. J. Financial Quant. Anal. 35(2):127–151.
Stein C (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proc. 3rd Berkeley Symp. Mathematical Statistics Probability, 197–206.
Thapa C, Poshakwale SS (2010) International equity portfolio allocations and transaction costs. J. Bank. Financ. 34(11):2627–2638.
Tu J, Zhou G (2011) Markowitz meets Talmud: A combination of sophisticated and naive diversification strategies. J. Financ. Econ. 99(1):204–215.
Zhao L, Chakrabarti D, Muthuraman K (2019) Portfolio construction by mitigating error amplification: The bounded-noise portfolio. Oper. Res. 67(4):965–983.

e-companion to Mörstedt, Lutz, and Neumann: Transfer Learning for Covariance Estimation ec1

E-Companion to “Cross Validation Based Transfer Learning for Financial Covariance Estimation: A Data-Driven Approach”
This e-companion contains a total of four appendices. EC.1 presents additional results for risk-adjusted return after transaction costs. EC.2 provides several insights about the observed parameter configurations of our approach. EC.3 evaluates the possible benefit of our approach over existing methods. EC.4 presents the results for minimum variance and maximum risk-adjusted return for several other datasets.

EC.1. Risk-Adjusted Return After Transaction Costs


In this appendix, we present the results for risk-adjusted return after transaction costs of 0.25 percent per trade (Thapa and Poshakwale 2010). The turnover rates are shown in Table EC.1. The results for risk-adjusted return after transaction costs for different datasets are shown in Table EC.2 for D = 1.33 and Table EC.3 for D = 0.40. The results for all dimensions based on the US dataset are shown in Table EC.4.

Table EC.1 Turnover for Different Portfolio Sizes and Dimensions for US Data.
N 100 200 300
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13
Estimators with Objective Minimum Variance
CVTL 21.25 18.57 17.03 9.36 6.61 5.32 17.69 15.53 14.08 8.03 6.43 5.41 15.29 13.39 12.68 7.61 6.18 5.25
CVTLLS 22.78 20.12 18.33 9.90 6.82 5.40 19.59 17.04 15.40 8.56 6.56 5.52 16.66 14.79 13.95 8.12 6.35 5.43
QIS 22.67 21.95 21.77 13.00 8.45 6.55 19.21 18.14 17.63 11.12 7.71 6.18 16.94 16.60 16.33 10.38 7.36 5.53
QuEST 20.48 19.16 18.44 12.09 8.25 6.48 17.25 15.79 15.04 10.73 7.63 6.14 15.10 14.58 14.20 10.14 7.31 5.50
LShriCC 36.50 33.01 31.53 16.52 9.82 7.28 36.30 34.03 33.05 17.00 9.98 7.30 37.01 35.64 34.46 17.68 9.80 6.78
LShri 27.85 27.20 26.99 16.62 9.90 7.31 30.39 30.53 30.58 17.39 10.04 7.34 31.40 32.95 33.20 17.94 9.83 6.76
BPSEst – – – 24.34 11.49 7.94 – – – 22.76 10.92 7.68 – – – 22.22 10.52 7.03
FMEst – – – 25.91 11.66 8.00 – – – 23.65 11.03 7.71 – – – 22.91 10.60 7.05
POET I – – – 18.63 9.52 6.86 – – – 20.89 9.61 6.91 – – – 24.42 9.75 6.52
POET II – 43.65 37.67 15.09 9.55 7.50 – 39.98 34.15 13.85 9.33 7.32 – 38.53 32.73 13.55 9.13 6.59
BN 17.02 16.49 16.00 11.78 7.45 5.87 17.11 16.24 15.71 10.98 7.20 5.98 17.08 16.09 15.45 10.66 7.31 5.78
Sample – – – 28.41 12.26 8.32 – – – 25.12 11.43 7.89 – – – 24.03 10.85 7.13

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 19.33 15.79 14.51 7.26 4.89 4.31 15.85 13.36 11.71 6.01 4.89 5.37 11.95 11.14 11.13 5.58 5.18 4.78
CVTLLS µ/σ 21.96 18.57 16.91 6.27 4.19 3.11 18.15 15.93 13.77 5.32 3.70 3.73 11.38 11.29 13.52 4.56 3.95 3.31
BN VAR 22.10 21.26 20.69 15.70 10.25 7.95 21.17 19.74 18.81 11.98 7.55 6.14 18.46 16.26 15.30 8.26 5.60 4.18
NC2R 20.08 19.79 19.12 20.03 22.25 23.26 17.06 19.23 18.10 20.58 19.24 22.39 7.01 6.56 6.26 7.82 7.70 10.07
CT – – – 14.25 6.20 4.26 – – – 12.60 5.78 4.05 – – – 12.04 5.49 3.69
1/N 1.35 1.38 1.37 1.35 1.32 1.30 1.36 1.35 1.37 1.31 1.32 1.32 1.36 1.37 1.34 1.30 1.31 1.23

Note: Empirical results based on the All US dataset with N ∈ {100, 200, 300}. Each value is given in percent as the average over 50 evaluations of random portfolio constituents. Turnover greater than 50 percent due to estimation errors is denoted by “–”. The best estimator per problem setting (excluding the 1/N portfolio) is highlighted in bold.

Electronic copy available at: https://ssrn.com/abstract=3986993


ec2 e-companion to Mörstedt, Lutz, and Neumann: Transfer Learning for Covariance Estimation

Table EC.2 Out-Of-Sample Annualized Turnover and Risk-Adjusted Return after Cost for Different Datasets.
Turnover (first nine columns) Risk-Adjusted Return After Cost (last nine columns)
N 100 200 300 100 200 300
Data US EU WO US EU WO US EU WO US EU WO US EU WO US EU WO
Estimators with Objective Minimum Variance
CVTL 17.03 20.64 19.70 14.08 18.50 17.55 12.68 17.93 16.62 0.49 0.55 0.32 0.88 0.91 0.45 0.76 0.87 0.40
CVTLLS 18.33 22.10 20.96 15.40 19.71 18.85 13.95 18.87 17.80 0.47 0.50 0.32 0.83 0.85 0.45 0.70 0.81 0.39
QIS 21.77 23.21 23.36 17.63 20.04 21.04 16.33 19.42 20.72 0.35 0.42 0.22 0.74 0.82 0.33 0.59 0.78 0.24
QuEST 18.44 20.00 20.06 15.04 17.64 18.59 14.20 17.39 18.68 0.45 0.51 0.30 0.84 0.90 0.40 0.68 0.86 0.31
LShriCC 31.53 32.74 30.87 33.05 33.48 32.73 34.46 34.11 34.66 −0.20 −0.11 −0.19 −0.17 −0.03 −0.25 −0.58 −0.21 −0.47
LShri 26.99 36.12 32.16 30.58 43.38 37.90 33.20 48.24 42.25 0.19 0.09 0.01 0.24 0.09 −0.12 −0.14 −0.26 −0.48
BPSEst – – – – – – – – – – – – – – – – – –
FMEst – – – – – – – – – – – – – – – – – –
POET I – – – – – – – – – – – – – – – – – –
POET II 37.67 45.78 44.45 34.15 42.83 42.53 32.73 41.48 41.32 −0.16 −0.20 −0.33 0.08 0.07 −0.27 −0.13 −0.05 −0.38
BN 16.00 17.71 16.86 15.71 18.01 17.53 15.45 18.84 18.04 0.57 0.60 0.40 0.82 0.89 0.45 0.76 0.89 0.39
Sample – – – – – – – – – – – – – – – – – –

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 14.51 16.44 16.58 11.71 15.60 14.27 11.13 15.57 16.03 0.60 0.67 0.45 0.98 0.99 0.57 0.84 0.98 0.44
CVTLLS µ/σ 16.91 19.93 20.27 13.77 18.49 16.52 13.52 17.26 18.00 0.52 0.55 0.40 0.91 0.87 0.52 0.75 0.85 0.40
BN VAR 20.69 23.53 22.30 18.81 21.64 20.84 15.30 18.04 17.24 0.24 0.30 0.04 0.37 0.64 0.10 0.42 0.85 0.21
NC2R 19.12 20.58 17.82 18.10 11.23 11.65 6.26 4.79 4.57 0.41 0.33 0.41 0.47 0.70 0.54 0.59 0.72 0.70
CT – – – – – – – – – – – – – – – – – –
1/N 1.37 1.32 1.23 1.37 1.27 1.20 1.34 1.34 1.24 0.80 0.75 0.83 0.86 0.76 0.87 0.75 0.80 0.82

Note: Turnover and risk-adjusted return after costs (0.25% per trade) of all evaluated covariance estimators for US, EU, and WO data with N ∈ {100, 200, 300} and D = 1.33. Each score is given in percent as the average over 50 evaluations with random portfolio constituents. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted by “–”. The best estimator per problem setting (in terms of turnover, excluding the 1/N portfolio) is highlighted in bold.


Table EC.3 Out-Of-Sample Annualized Turnover and Risk-Adjusted Return after Cost for Different Datasets.
Turnover (first nine columns) Risk-Adjusted Return After Cost (last nine columns)
N 100 200 300 100 200 300
Data US EU WO US EU WO US EU WO US EU WO US EU WO US EU WO
Estimators with Objective Minimum Variance
CVTL 9.36 11.27 10.41 8.03 10.84 10.00 7.61 10.65 10.03 0.54 0.78 0.44 0.80 1.07 0.74 0.73 1.06 0.65
CVTLLS 9.90 11.84 11.05 8.56 11.25 10.45 8.12 10.82 10.45 0.53 0.75 0.46 0.78 1.04 0.74 0.70 1.03 0.65
QIS 13.00 13.15 13.93 11.12 12.59 13.51 10.38 12.25 13.45 0.39 0.68 0.33 0.66 0.99 0.57 0.59 0.99 0.53
QuEST 12.09 12.36 13.04 10.73 12.12 13.19 10.14 11.97 13.46 0.41 0.71 0.35 0.68 1.01 0.58 0.60 0.99 0.53
LShriCC 16.52 16.48 16.15 17.00 17.50 17.59 17.68 18.62 18.81 0.06 0.35 0.15 0.10 0.32 0.22 −0.11 0.25 0.11
LShri 16.62 20.55 19.04 17.39 22.25 20.89 17.94 23.54 22.34 0.26 0.42 0.20 0.33 0.49 0.28 0.18 0.39 0.19
BPSEst 24.34 26.30 26.31 22.76 27.13 27.31 22.22 27.48 28.10 0.04 0.20 0.06 0.16 0.28 0.08 0.05 0.21 0.01
FMEst 25.91 28.53 28.68 23.65 28.72 29.13 22.91 28.61 29.58 −0.02 0.14 −0.03 0.11 0.22 0.00 0.01 0.18 −0.04
POET I 18.63 22.04 21.42 20.89 24.59 25.45 24.42 26.63 28.55 0.37 0.49 0.31 0.06 0.85 0.03 0.64 0.56 −0.10
POET II 15.09 18.16 18.11 13.85 17.34 17.41 13.55 17.22 16.93 0.26 0.50 0.22 0.42 0.68 0.46 0.34 0.62 0.42
BN 11.78 11.83 11.43 10.98 12.12 11.88 10.66 12.72 12.18 0.47 0.78 0.41 0.70 1.16 0.72 0.62 1.17 0.61
Sample 28.41 32.42 32.86 25.12 31.25 32.09 24.03 30.51 32.00 −0.12 0.05 −0.16 0.03 0.13 −0.11 −0.04 0.12 −0.11

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 7.26 7.84 7.57 6.01 7.33 6.48 5.58 9.07 6.31 0.69 0.89 0.61 1.03 1.26 0.93 0.89 1.29 0.81
CVTLLS µ/σ 6.27 6.46 6.10 5.32 6.28 5.33 4.56 7.76 5.27 0.73 0.87 0.65 1.05 1.26 0.99 0.91 1.18 0.83
BN VAR 15.70 15.93 15.56 11.98 12.70 12.38 8.26 9.74 8.81 0.16 0.65 0.13 0.45 1.09 0.65 0.61 1.42 0.73
NC2R 20.03 20.49 17.02 20.58 16.19 14.04 7.82 5.54 5.26 0.28 0.34 0.36 0.34 0.55 0.68 0.40 0.72 0.61
CT 14.25 16.24 16.45 12.60 15.66 16.07 12.04 15.28 16.03 0.56 0.54 0.52 0.79 0.75 0.73 0.52 0.63 0.54
1/N 1.35 1.29 1.19 1.31 1.26 1.17 1.30 1.28 1.17 0.84 0.72 0.86 1.01 0.95 1.11 0.66 0.76 0.81

Note: Turnover and risk-adjusted return after costs (0.25% per trade) of all evaluated covariance estimators for US, EU, and WO data with N ∈ {100, 200, 300} and D = 0.40. Each score is given in percent as the average over 50 evaluations with random portfolio constituents. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted by “–”. The best estimator per problem setting (in terms of turnover, excluding the 1/N portfolio) is highlighted in bold.


Table EC.4 Out-Of-Sample Risk-Adjusted Return after Cost for US Data and Different Sample Dimensions.
N 100 200 300
D = N/T 2.0 1.50 1.33 0.40 0.20 0.13 2.0 1.50 1.33 0.40 0.20 0.13 2.0 1.50 1.33 0.40 0.20 0.13
Estimators with Objective Minimum Variance
CVTL 0.46 0.59 0.49 0.54 0.74 0.66 0.63 0.89 0.88 0.80 0.52 0.57 0.93 0.81 0.76 0.73 0.52 1.28
CVTLLS 0.47 0.57 0.47 0.53 0.74 0.66 0.58 0.84 0.83 0.78 0.53 0.56 0.86 0.76 0.70 0.70 0.51 1.23
QIS 0.45 0.51 0.35 0.39 0.65 0.60 0.56 0.81 0.74 0.66 0.46 0.51 0.85 0.67 0.59 0.59 0.44 1.22
QuEST 0.51 0.59 0.45 0.41 0.66 0.60 0.63 0.89 0.84 0.68 0.47 0.51 0.93 0.76 0.68 0.60 0.45 1.22
LShriCC – – – 0.06 0.44 0.46 – – – 0.10 0.17 0.32 – – – – 0.09 0.91
LShri 0.30 0.37 0.19 0.26 0.57 0.56 0.17 0.32 0.24 0.33 0.32 0.44 0.22 – – 0.18 0.28 1.08
BPSEst – – – 0.04 0.50 0.53 – – – 0.16 0.29 0.42 0.28 – – 0.05 0.25 1.05
FMEst – 0.29 0.29 – 0.49 0.52 – – – 0.11 0.28 0.42 0.28 – – 0.01 0.24 1.05
POET I – 0.58 – 0.37 0.71 0.66 – – – 0.06 0.50 0.55 – – – 0.64 0.49 1.23
POET II – – – 0.26 0.56 0.54 – – 0.08 0.42 0.32 0.44 – – – 0.34 0.33 1.08
BN 0.58 0.62 0.57 0.47 0.76 0.66 0.63 0.91 0.82 0.70 0.53 0.57 0.86 0.73 0.76 0.62 0.53 1.33
Sample – 0.29 0.29 – 0.44 0.50 – – – 0.03 0.25 0.41 0.28 – – – 0.22 1.05

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 0.56 0.66 0.60 0.69 0.98 0.73 0.73 0.99 0.98 1.03 0.67 0.68 1.06 0.92 0.84 0.89 0.71 1.36
CVTLLS µ/σ 0.52 0.60 0.52 0.73 0.98 0.76 0.63 0.89 0.91 1.05 0.73 0.71 1.08 0.91 0.75 0.91 0.74 1.40
BN VAR 0.29 0.31 0.24 0.16 0.48 0.54 0.23 0.48 0.37 0.45 0.37 0.45 0.27 0.26 0.42 0.61 0.51 1.44
NC2R 0.39 0.32 0.41 0.28 0.27 – 0.38 0.35 0.47 0.34 0.34 – 0.76 0.71 0.59 0.40 0.38 0.65
CT – 0.24 0.25 0.56 0.92 0.69 – – 1.00 0.79 0.62 0.60 0.23 – – 0.52 0.54 0.95
1/N 0.90 0.83 0.80 0.84 1.00 0.65 0.90 0.93 0.86 1.01 0.67 0.57 0.86 0.78 0.75 0.66 0.57 0.72

Note: Empirical results of all evaluated covariance estimators for US data with N ∈ {100, 200, 300} and D ∈ {2.0, 1.5, 1.33, 0.4, 0.2, 0.13}. Each score is given in percent as the average over 50 evaluations with random portfolio constituents. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted by “–”. The best estimator per problem setting is highlighted in bold.

Table EC.5 Out-Of-Sample Risk-Adjusted Return after Cost for FFI Data and Different Sample Dimensions.
N 10 30 49
D = N/T 2.0 1.50 1.33 0.40 0.20 0.13 2.0 1.50 1.33 0.40 0.20 0.13 2.0 1.50 1.33 0.40 0.20 0.13
Estimators with Objective Minimum Variance
CVTL 0.37 0.49 0.34 0.34 0.59 0.73 0.05 0.11 0.15 0.45 0.66 0.68 – 0.01 0.15 0.47 0.44 0.69
CVTLLS 0.39 0.48 0.35 0.32 0.62 0.75 0.10 0.17 0.18 0.46 0.64 0.67 – 0.10 0.17 0.45 0.42 0.67
QIS 0.05 – 0.00 – 0.45 0.59 – – – 0.03 0.34 0.50 – – – – 0.13 0.45
QuEST 0.13 0.06 – 0.07 0.51 0.58 – – – 0.08 0.34 0.51 – – – 0.07 0.16 0.46
LShriCC – – – 0.18 0.50 0.54 – – – 0.09 0.41 0.56 – – – 0.02 0.20 0.51
LShri 0.10 0.21 0.14 0.17 0.56 0.67 – – – 0.01 0.34 0.52 – – – – 0.12 0.47
BPSEst 0.14 0.37 0.52 – 0.39 0.55 – – – – 0.19 0.42 – – – – – 0.35
FMEst – 0.67 0.68 – 0.37 0.54 0.44 – – – 0.17 0.41 – – 0.08 – – 0.34
POET I – 0.10 0.10 0.17 0.56 0.67 – – – – 0.36 0.52 – – – – 0.15 0.49
POET II 0.28 0.34 0.28 0.07 0.57 0.58 – – – – 0.32 0.47 – – – 0.12 0.18 0.41
BN 0.33 0.31 0.17 0.20 0.56 0.62 0.08 0.18 0.15 0.24 0.42 0.52 0.01 0.17 0.15 0.28 0.22 0.55
Sample – – 0.68 – 0.29 0.50 0.44 – – – 0.12 0.40 – – 0.08 – – 0.31

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 0.49 0.33 – 0.25 0.59 0.59 0.20 0.39 0.39 0.56 0.77 0.75 0.00 0.31 0.35 0.65 0.55 0.88
CVTLLS µ/σ 0.89 0.45 – 0.38 0.68 0.73 0.15 0.40 0.20 0.55 0.84 0.78 – 0.09 0.16 0.61 0.54 0.88
BN VAR 0.03 0.11 0.13 – 0.29 0.29 – – – 0.09 0.36 0.50 – 0.10 0.09 0.34 0.24 0.35
NC2R – – – – 0.22 0.32 – – – 0.02 – 0.19 – – – 0.20 0.13 0.35
CT – – 0.51 – 0.50 0.67 0.34 – – 0.05 0.45 0.62 – – 0.07 – 0.30 0.63
1/N 0.61 0.67 0.51 0.48 0.56 0.67 0.53 0.60 0.73 0.67 0.58 0.63 0.49 0.65 0.57 0.70 0.50 0.70

Note: Empirical results of all evaluated covariance estimators for FFI data with N ∈ {10, 30, 49} and D ∈ {2.0, 1.5, 1.33, 0.4, 0.2, 0.13}. Each score is given in percent as the average over 50 evaluations with random portfolio constituents. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted by “–”. The best estimator per problem setting is highlighted in bold.


EC.2. Empirical Parameters


This appendix provides additional analyses of the empirically observed parameters.

EC.2.1. Parameters in Analysis on US Data

The mean parameter values on the US dataset for different dimensions are shown in Table EC.6.

Table EC.6 Parameter Values on US Data for Different Sample Dimensions.


D = N/T | CVTL: δ1 δ2 δ3 γ | CVTLLS: δ1 δ2 | CVTL µ/σ: δ1 δ2 δ3 γ | CVTLLS µ/σ: δ1 δ2
N = 100
2.00 20.90 26.60 52.40 88.60 61.90 38.10 23.20 43.00 33.80 40.80 57.60 42.40
1.50 30.70 32.90 36.40 90.90 63.00 37.00 24.00 47.50 28.50 59.00 56.40 43.60
1.33 36.50 36.70 26.80 91.90 62.60 37.40 28.60 46.40 25.00 42.10 55.10 44.90
0.40 60.60 27.50 11.90 84.80 75.80 24.20 33.50 45.80 20.70 30.00 45.40 54.60
0.20 68.80 19.50 11.70 81.80 83.50 16.50 33.70 47.00 19.30 11.40 44.20 55.80
0.13 69.40 17.70 12.90 69.00 85.40 14.60 29.30 45.30 25.50 35.80 43.30 56.70
N = 200
2.00 25.10 24.90 50.00 91.80 63.60 36.40 27.30 42.60 30.10 48.80 56.60 43.40
1.50 33.30 29.20 37.50 94.60 65.00 35.00 31.20 46.10 22.70 50.30 57.00 43.00
1.33 38.50 33.00 28.50 96.20 64.10 35.90 30.80 48.60 20.60 16.30 52.00 48.00
0.40 59.60 26.30 14.00 96.10 75.90 24.10 34.90 49.30 15.80 14.00 43.20 56.80
0.20 63.40 17.60 19.00 77.50 84.60 15.40 30.20 44.80 25.10 49.10 41.90 58.10
0.13 66.40 15.30 18.30 74.90 87.60 12.40 34.30 38.50 27.20 18.20 53.30 46.70
N = 300
2.00 30.00 29.30 40.60 91.00 61.50 38.50 27.90 55.90 16.20 19.30 37.90 62.10
1.50 36.10 33.00 30.90 96.00 62.50 37.50 28.90 52.00 19.10 46.30 43.90 56.10
1.33 39.60 33.80 26.60 97.30 63.60 36.40 29.70 46.80 23.50 41.30 54.40 45.60
0.40 51.10 23.60 25.30 98.50 75.20 24.80 30.20 51.90 17.90 40.80 39.40 60.60
0.20 62.90 18.40 18.70 82.70 83.60 16.40 30.50 45.80 23.70 31.00 46.10 53.90
0.13 63.20 12.60 24.20 69.70 88.90 11.10 40.10 45.20 14.70 −4.50 43.50 56.50

Note: Each parameter value is given in percent as the average over 50 evaluations with random portfolio constituents.


EC.2.2. CVTL Parameters over Time

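Next, we present how the parameters of our approach change over time after model inception. We provide the corresponding plots for N ∈ {100, 200, 300} and D ∈ {2, 1.5, 1.33, 0.40, 0.20, 0.13} for the objectives minimum variance (Figures EC.1, EC.2, and EC.3) and maximum risk-adjusted return (Figures EC.6, EC.7, and EC.8). For N = 100 and D ∈ {1.33, 0.40}, Figures EC.4 and EC.5 (minimum variance) and Figures EC.9 and EC.10 (maximum risk-adjusted return) additionally show the mean and standard deviation of the parameters over the first 50 months after inception.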
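The mean and standard-deviation bands in Figures EC.4, EC.5, EC.9, and EC.10 summarize one parameter path per evaluation. A minimal sketch of that aggregation follows, assuming the paths are stored in a hypothetical array of shape (evaluations, months, 3) with columns δ1, δ2, γ; the array layout and the use of the sample standard deviation (ddof=1) are our assumptions rather than details taken from the paper.

```python
import numpy as np

PARAM_NAMES = ("delta1", "delta2", "gamma")  # column order assumed for illustration

def summarize_parameter_paths(paths):
    """Per-month mean and standard deviation across evaluations.

    paths: array of shape (evaluations, months, 3), one parameter path per
    random draw of portfolio constituents (50 draws in this e-companion).
    Returns two arrays of shape (months, 3).
    """
    paths = np.asarray(paths, dtype=float)
    if paths.ndim != 3 or paths.shape[2] != len(PARAM_NAMES):
        raise ValueError("expected shape (evaluations, months, 3)")
    mean = paths.mean(axis=0)
    std = paths.std(axis=0, ddof=1)  # sample standard deviation across draws
    return mean, std
```

Plotting mean[:, 0] ± std[:, 0] against the month index would then give a band for δ1 of the kind shown in those figures.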

EC.2.2.1. Minimum Variance

Figure EC.1 Parameter Values over Time after Model Inception for Minimum Variance and N=100.
[Figure: six panels for D = 2.00, 1.50, 1.33, 0.40, 0.20, and 0.13, each plotting the parameter values δ1, δ2, and γ (vertical axis, −0.5 to 1) against months after inception (horizontal axis, 0 to 200).]

Note: Values are aggregated over 50 evaluations with random portfolio constituents for US data and N = 100.


Figure EC.2 Parameter Values over Time after Model Inception for Minimum Variance and N=200.
[Figure: six panels for D = 2.00, 1.50, 1.33, 0.40, 0.20, and 0.13, each plotting the parameter values δ1, δ2, and γ (vertical axis, −0.5 to 1) against months after inception (horizontal axis, 0 to 200).]

Note: Values are aggregated over 50 evaluations with random portfolio constituents for US data and N = 200.


Figure EC.3 Parameter Values over Time after Model Inception for Minimum Variance and N=300.
[Figure: six panels for D = 2.00, 1.50, 1.33, 0.40, 0.20, and 0.13, each plotting the parameter values δ1, δ2, and γ (vertical axis, −0.5 to 1) against months after inception (horizontal axis, 0 to 200).]

Note: Values are aggregated over 50 evaluations with random portfolio constituents for US data and N = 300.


Figure EC.4 Parameter Variation Over Time for Minimum Variance and N=100, D=1.33.
[Figure: three panels plotting the mean and standard deviation of δ1, δ2, and γ (vertical axis, parameter value) against months after inception (horizontal axis, 0 to 50).]

Note: Parameter mean values and standard deviations. Values are aggregated over 50 evaluations with random portfolio constituents for US data and N = 100.


Figure EC.5 Parameter Variation Over Time for Minimum Variance and N=100, D=0.40.
[Figure: three panels plotting the mean and standard deviation of δ1, δ2, and γ (vertical axis, parameter value) against months after inception (horizontal axis, 0 to 50).]

Note: Parameter mean values and standard deviations. Values are aggregated over 50 evaluations with random portfolio constituents for US data and N = 100.


EC.2.2.2. Maximum Risk-Adjusted Return

Figure EC.6 Parameter Values over Time after Model Inception for Maximum Risk-Adjusted Return and N=100.

[Figure: six panels for D = 2.00, 1.50, 1.33, 0.40, 0.20, and 0.13, each plotting the parameter values δ1, δ2, and γ (vertical axis, −0.5 to 1) against months after inception (horizontal axis, 0 to 200).]

Note: Values are aggregated over 50 evaluations with random portfolio constituents for US data and N = 100.


Figure EC.7 Parameter Values over Time after Model Inception for Maximum Risk-Adjusted Return and N=200.

[Figure: six panels for D = 2.00, 1.50, 1.33, 0.40, 0.20, and 0.13, each plotting the parameter values δ1, δ2, and γ (vertical axis, −0.5 to 1) against months after inception (horizontal axis, 0 to 200).]

Note: Values are aggregated over 50 evaluations with random portfolio constituents for US data and N = 200.


Figure EC.8 Parameter Values over Time after Model Inception for Maximum Risk-Adjusted Return and N=300.

[Figure: six panels for D = 2.00, 1.50, 1.33, 0.40, 0.20, and 0.13, each plotting the parameter values δ1, δ2, and γ (vertical axis, −0.5 to 1) against months after inception (horizontal axis, 0 to 200).]

Note: Values are aggregated over 50 evaluations with random portfolio constituents for US data and N = 300.


Figure EC.9 Parameter Variation Over Time for Maximum Risk-Adjusted Return and N=100, D=1.33.
[Figure: three panels plotting the mean and standard deviation of δ1, δ2, and γ (vertical axis, parameter value) against months after inception (horizontal axis, 0 to 50).]

Note: Parameter mean values and standard deviations. Values are aggregated over 50 evaluations with random portfolio constituents for US data and N = 100.


Figure EC.10 Parameter Variation Over Time for Maximum Risk-Adjusted Return and N=100, D=0.40.
[Figure: three panels plotting the mean and standard deviation of δ1, δ2, and γ (vertical axis, parameter value) against months after inception (horizontal axis, 0 to 50).]

Note: Parameter mean values and standard deviations. Values are aggregated over 50 evaluations with random portfolio constituents for US data and N = 100.


EC.2.3. Shrunk Eigenvalues of CVTL with Objective Maximum Risk-Adjusted Return

In the main paper, we compared the eigenvalues of CVTL against those of QuEST (Ledoit and Wolf 2015) under the objective minimum variance. The same analysis under the objective maximum risk-adjusted return is shown in Figure EC.11.
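Figure EC.11 compares shrunk and sample eigenvalues of rotation-equivariant estimators, which keep the sample eigenvectors and change only the eigenvalues. A minimal sketch of that mechanism is given below. The eigenvalue shrinkage toward the mean is an illustrative stand-in, not CVTL or QuEST. The Gini coefficient uses the standard formula for the imbalance of the sample eigenvalues on which the main paper bases its second shrinkage target; how CVTL maps this coefficient to a target is not reproduced here.

```python
import numpy as np

def sample_eigen(returns):
    """Sample covariance of a (T, N) return matrix and its eigendecomposition."""
    S = np.cov(returns, rowvar=False)
    eigval, eigvec = np.linalg.eigh(S)  # eigenvalues in ascending order
    return S, eigval, eigvec

def rotation_equivariant(eigvec, new_eigval):
    """Rebuild U diag(new_eigval) U' while keeping the sample eigenvectors U."""
    return (eigvec * new_eigval) @ eigvec.T

def shrink_toward_mean(eigval, intensity):
    """Illustrative linear eigenvalue shrinkage (not CVTL or QuEST)."""
    return (1.0 - intensity) * eigval + intensity * eigval.mean()

def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal).

    Tiny negative eigenvalues from rank-deficient samples (N > T) are
    clipped to zero first.
    """
    x = np.sort(np.clip(np.asarray(values, dtype=float), 0.0, None))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n
```

Plotting shrink_toward_mean(eigval, c) against eigval on log-log axes gives the kind of comparison shown in Figure EC.11; because this particular shrinkage preserves the mean eigenvalue, the trace of the covariance matrix is unchanged.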

Figure EC.11 Sample and shrunk eigenvalues.


[Figure: six log-log panels for D = 2.00, 1.50, 1.33, 0.40, 0.20, and 0.13, plotting shrunk eigenvalues (vertical axis) against sample eigenvalues (horizontal axis), both from 10^−4 to 10^−2, for CVTL µ/σ, QuEST, and the sample estimator.]

Note: The plotted eigenvalues are averaged over 50 simulations with random portfolio constituents based on the US data with N = 100.


EC.2.4. Relations between Parameters

We also consider the relations between the parameters δ1, δ2, and γ. For convenience, we define δ3 = 1 − δ1 − δ2 as the intensity of the second shrinkage target. We examine the relations between our estimation parameters 12 months after model inception. Figures EC.12 and EC.13 show the pairwise relations and histograms for the objective minimum variance with dimensions D = 1.33 and D = 0.40, respectively. Figures EC.14 and EC.15 show the same plots for the objective maximum risk-adjusted return.

We observe that the parameter pairs (δ1, δ2), (δ1, δ3), and (δ1, γ) exhibit opposite relations in low- and high-dimensional problems.
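To compare the direction of these relations between D = 1.33 and D = 0.40 numerically rather than visually, each scatter plot could be summarized by a correlation coefficient. A minimal sketch follows, assuming the month-12 parameters of the 50 evaluations are available as one-dimensional arrays. Pearson correlation is our choice of summary; it is not a statistic reported in the paper.

```python
import numpy as np

def parameter_relations(delta1, delta2, gamma):
    """Pearson correlations of δ1 with δ2, δ3, and γ across evaluations.

    Each input holds one value per evaluation, observed 12 months after
    model inception. δ3 = 1 − δ1 − δ2 is derived here.
    """
    d1 = np.asarray(delta1, dtype=float)
    d2 = np.asarray(delta2, dtype=float)
    g = np.asarray(gamma, dtype=float)
    d3 = 1.0 - d1 - d2  # intensity of the second shrinkage target
    others = {"delta2": d2, "delta3": d3, "gamma": g}
    return {name: float(np.corrcoef(d1, x)[0, 1]) for name, x in others.items()}
```

A sign flip of, say, relations["gamma"] between the D = 1.33 and D = 0.40 samples would correspond to the opposite relations described above.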

EC.2.4.1. Minimum Variance

Figure EC.12 Parameter Relations for High-Dimensional Problem with D = 1.33 under Objective Minimum Variance.

Note: The plotted eigenvalues are averaged over 50 simulations with random portfolio constituents based on the US data with N = 100.


Figure EC.13 Parameter Relations for Low-Dimensional Problem with D = 0.40 under Objective Minimum Variance.

Note: The plotted eigenvalues are averaged over 50 simulations with random portfolio constituents based on the US data with N = 100.


EC.2.4.2. Maximum Risk-Adjusted Return

Figure EC.14 Parameter Relations for High-Dimensional Problem with D = 1.33 under Objective Maximum Risk-Adjusted Return.

Note: The plotted eigenvalues are averaged over 50 simulations with random portfolio constituents based on the US data with N = 100.


Figure EC.15 Parameter Relations for Low-Dimensional Problem with D = 0.40 under Objective Maximum Risk-Adjusted Return.

Note: The plotted eigenvalues are averaged over 50 simulations with random portfolio constituents based on the US data with N = 100.


EC.3. Performance of Grid Entries in Empirical Analysis for US Data


In this appendix, we provide additional analyses of the benefit of using cross validation and the proposed second shrinkage target of CVTL. In particular, we illustrate how different grid entries (δ1, δ2, γ) ∈ P relate to traditional combinations of the sample covariance portfolio and the 1/N portfolio. In addition, we illustrate how grid entries from CVTL relate to possible linear shrinkage combinations (EC.3.1). We further extend the analysis of the individual benefits of cross validation and the second shrinkage target from the main paper (see Section 5.4) by relating results from CVTL to the hypothetical combination of CVTL and QuEST (QuESTCV) (EC.3.2). We also examine the potential benefit of CVTL (over the sample covariance estimate) and of the second shrinkage target by reporting the share of grid entries from P that would outperform the respective baseline model (EC.3.3).

EC.3.1. Non-Linear Grid Entries with Outperformance Potential

Figures EC.16 and EC.17 illustrate the historical results of hypothetical grid entries from P for the US dataset with N = 100 and D ∈ {1.33, 0.40}. Light gray transparent circles represent the grid entries of theoretically reachable portfolios; filled gray circles represent possible grid entries in the class of rotation-equivariant estimators that result in linear shrinkage with λ1 + λ2 = 1. Black crosses represent linear combinations of the sample and 1/N portfolios. Figure EC.17 zooms into non-linear grid entries that have the potential to improve either volatility or risk-adjusted return.
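The two families of reference points in Figures EC.16 and EC.17 differ in where the mixing happens: the black crosses mix portfolio weights, whereas the filled gray circles mix covariance matrices before optimizing. The following minimal sketch contrasts the two for the minimum-variance objective. It assumes global minimum-variance weights and a scaled-identity target F for the linear shrinkage; both are illustrative choices, and the paper's linear target may differ. Because gmv_weights solves a linear system with the covariance matrix, the sample portfolio is only well defined when N < T (D < 1).

```python
import numpy as np

def gmv_weights(cov):
    """Global minimum-variance weights w = Σ⁻¹1 / (1'Σ⁻¹1)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()

def sample_to_equal_weight(cov_sample, lambdas):
    """Portfolio mixes w(λ) = λ·w_sample + (1 − λ)·w_1/N, one row per λ."""
    n = cov_sample.shape[0]
    w_sample = gmv_weights(cov_sample)
    w_equal = np.full(n, 1.0 / n)
    return np.array([lam * w_sample + (1.0 - lam) * w_equal for lam in lambdas])

def linear_shrinkage(cov_sample, lam1):
    """Covariance mix λ1·S + λ2·F with λ2 = 1 − λ1.

    F is the average sample variance times the identity, an assumed target
    used here for illustration only.
    """
    n = cov_sample.shape[0]
    target = np.trace(cov_sample) / n * np.eye(n)
    return lam1 * cov_sample + (1.0 - lam1) * target
```

Evaluating gmv_weights(linear_shrinkage(S, lam1)) over a grid of lam1 yields one linear-shrinkage portfolio per grid value, whereas sample_to_equal_weight interpolates directly between the two end-point portfolios; the two paths generally trace different curves in the risk-return plane.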

Comparing linear shrinkage (filled gray circles) and non-linear shrinkage (light gray transparent circles), Figure EC.16 indicates that the hypothetically optimal grid entries converge at the portfolio that minimizes variance. Along the efficient frontier, however, the two types of shrinkage diverge within the hypothetical grid entries: our non-linear shrinkage target has the potential to increase return for the same level of risk. Yet the non-linear shrinkage target also produces hypothetical grid entries that are clearly dominated by others, especially by the linear shrinkage combinations. Hence, we attribute greater potential but also greater predictive risk to CVTL than to CVTLLS. The right side of Figure EC.16 supports the common finding in the literature (e.g., Ledoit and Wolf 2004a) that it is not beneficial to rely solely on the sample covariance matrix for mean-variance portfolio optimization, as the sample frontier is outperformed by most hypothetical grid entries even without a predictive model.

Figure EC.17 zooms into the entries located near the historically optimal hypothetical grid entries for minimizing variance and for maximizing risk-adjusted return. We observe two opposing findings. In the case of minimum variance, markedly more non-linear than linear grid entries lead to lower variance in the high-dimensional problem than in the low-dimensional problem. This supports our finding from Table 11 of the main paper that the non-linear shrinkage target is more beneficial for high-dimensional problems. In the case of maximum risk-adjusted return, we observe the opposite: markedly more non-linear than linear hypothetical grid entries lead to greater risk-adjusted return in the low-dimensional problem than in the high-dimensional problem.

Figure EC.16 Performance of Grid Entries.

Note: Values are averaged over 50 simulations with random portfolio constituents based on the US data with N = 100.


Figure EC.17 Performance of Grid Entries (Zoomed In).

Note: Values are averaged over 50 simulations with random portfolio constituents based on the US data with N = 100.


EC.3.2. Benefits of Cross Validation and Second Shrinkage Target

Figures EC.18 and EC.19 extend Figures EC.16 and EC.17 by additionally showing the grid entries of the hypothetical estimator QuESTCV described in Section 5.4 of the main paper, represented by small dark filled circles. The traditional QuEST estimator is represented by one large dark filled circle. QuESTCV combines the eigenvalues of QuEST with cross validation to select the shrinkage intensity.
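Both QuESTCV and CVTL select a shrinkage intensity by cross validation. A minimal sketch of such a selection loop for the minimum-variance objective is given below. It assumes folds of consecutive training and test windows drawn from a transfer set of assets that is disjoint from the target portfolio; the fold construction is our assumption. The estimator inside the loop is linear shrinkage toward a scaled identity, swapped in for illustration; it is neither the QuEST eigenvalues nor the CVTL targets.

```python
import numpy as np

def shrink_to_identity(S, intensity):
    """Stand-in estimator: linear shrinkage toward a scaled identity."""
    n = S.shape[0]
    return (1.0 - intensity) * S + intensity * np.trace(S) / n * np.eye(n)

def gmv_weights(cov):
    """Global minimum-variance weights w = Σ⁻¹1 / (1'Σ⁻¹1)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()

def cv_select_intensity(folds, grid):
    """Pick the intensity with the lowest average out-of-sample variance.

    folds: list of (train, test) return matrices of shape (T, N) taken from
    the transfer assets. grid: candidate shrinkage intensities.
    Returns the selected intensity and the score of every grid value.
    """
    scores = []
    for intensity in grid:
        realized = []
        for train, test in folds:
            cov = shrink_to_identity(np.cov(train, rowvar=False), intensity)
            w = gmv_weights(cov)
            realized.append(np.var(test @ w, ddof=1))  # realized portfolio variance
        scores.append(np.mean(realized))
    scores = np.array(scores)
    return grid[int(np.argmin(scores))], scores
```

The selected intensity would then be applied to the covariance estimate of the actual portfolio constituents, which is the transfer step; replacing shrink_to_identity with another estimator family changes the candidates but not the selection logic.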

Figure EC.18 indicates that using the QuEST-implied eigenvalues restricts the improvement potential to minimizing variance. Along the efficient frontier, there is no hypothetical grid entry that leads to higher returns for the same level of variance than the linear shrinkage entries. Although this can be expected from the construction of the QuEST model, it highlights the flexibility of our second shrinkage target. Figure EC.19 confirms this finding: in neither the high- nor the low-dimensional case does any hypothetical grid entry of QuESTCV lead to a higher risk-adjusted return than the respective linear shrinkage grid entries. It further supports the recent idea of Ledoit and Wolf (2021b) that combining QuEST eigenvalues with a linear shrinkage target has the potential to improve the QuEST estimator; this, however, depends on finding the optimal intensity, as several QuESTCV grid entries have the potential to outperform QuEST. Finally, Figure EC.19 supports our second non-linear shrinkage target, as it leads to potentially lower variance than any combination of the QuEST eigenvalues with a linear shrinkage target.


Figure EC.18 Grid Entries of CVTL and QuESTCV.

Note: Values are averaged over 50 simulations with random portfolio constituents based on the US data with N = 100.


Figure EC.19 Grid Entries of CVTL and QuESTCV (Zoomed In).

Note: Values are averaged over 50 simulations with random portfolio constituents based on the US data with N = 100.


EC.3.3. Possible Benefit Analysis

Table EC.7 further underlines the possible benefit of CVTL over the sample portfolio when minimizing variance, and over the 1/N portfolio when maximizing risk-adjusted return. In addition, Table EC.7 shows the possible benefit of the second shrinkage target of CVTL over CVTLLS (linear shrinkage). We measure the possible benefit by counting the grid entries that outperform the respective benchmark, multiplying this count by 100, and dividing the result by the grid size. For example, the possible benefit over the sample portfolio for N = 10 and D = 2 is 87, implying that 87% of the hypothetical CVTL grid entries would outperform the sample portfolio in terms of minimizing variance. The comparison using CVTL µ/σ measures the possible benefit in terms of risk-adjusted return. Hence, Table EC.7 is the numerical representation of Figures EC.16 and EC.17.
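The possible-benefit definition above amounts to a simple share computation. The following minimal sketch assumes that "outperform" means strictly lower volatility (or strictly higher risk-adjusted return), so ties do not count. The function name is hypothetical.

```python
import numpy as np


def possible_benefit(grid_scores, benchmark_score, lower_is_better=True):
    """Share (in percent) of hypothetical grid entries that beat a benchmark.

    Counts the grid entries that outperform the benchmark, multiplies the
    count by 100, and divides by the grid size. Ties are not counted
    (assumption).
    """
    scores = np.asarray(grid_scores, dtype=float).ravel()  # flatten a 2-D grid
    if lower_is_better:  # e.g. out-of-sample volatility
        wins = scores < benchmark_score
    else:  # e.g. risk-adjusted return
        wins = scores > benchmark_score
    return 100.0 * wins.sum() / scores.size


# Volatility grid vs. a sample-portfolio volatility of 9.5: 3 of 4 entries win.
print(possible_benefit([9.1, 9.3, 9.6, 8.9], 9.5))
```

For the US data, where results are averaged over 50 random draws of portfolio constituents, one would compute this share per draw and then take the mean.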

When comparing CVTL to the sample covariance, the possible benefit decreases as D decreases, i.e., as the estimation window T grows relative to N. Under the objective of maximum risk-adjusted return, the possible benefit of CVTL µ/σ over the 1/N portfolio generally increases as D decreases and as N increases. Comparing the volatility achieved by CVTL against CVTLLS yields a similar pattern: the possible benefit is largest in the high-dimensional settings and almost zero once D drops to 0.40 or below. This is consistent with our finding in the main paper that the relative advantage of CVTL over CVTLLS diminishes in low-dimensional settings (small D).

Table EC.7 Possible Benefits of CVTL and CVTL µ/σ.

Estimator: CVTL | CVTL µ/σ
D = N/T 2 1.50 1.33 0.40 0.20 0.13 | 2 1.50 1.33 0.40 0.20 0.13
N Possible benefit over sample portfolio | Possible benefit over 1/N portfolio
10 87.00 87.60 87.50 75.00 39.10 27.40 | 7.00 11.50 27.70 35.20 49.10 48.10
30 83.00 84.10 84.40 48.00 20.40 7.50 | 62.30 68.90 69.40 77.30 83.00 75.80
49 77.00 80.40 79.80 41.80 14.80 7.10 | 71.90 69.80 77.10 81.20 83.90 92.10
100 71.00 75.20 75.80 40.30 17.40 10.70 | 90.50 90.50 88.40 94.40 96.60 89.90
200 66.90 69.20 70.10 36.80 12.80 6.10 | 90.50 90.50 90.50 95.20 95.50 94.50
300 67.70 69.80 68.90 37.70 11.30 3.90 | 91.00 90.50 90.50 92.70 92.40 97.20
N Possible benefit over CVTLLS | Possible benefit over CVTLLS µ/σ
10 0.90 0.40 0.90 0.00 0.00 0.10 | 0.20 4.30 6.30 0.70 3.10 10.90
30 2.40 1.70 1.40 0.00 0.20 0.40 | 3.00 1.30 5.50 9.20 1.10 2.10
49 7.00 5.90 3.50 0.20 0.40 0.00 | 2.60 2.20 3.00 2.00 1.00 4.70
100 5.30 3.50 1.80 0.30 0.30 0.20 | 8.00 3.30 7.50 14.50 11.00 8.90
200 2.90 1.80 1.10 0.30 0.20 0.00 | 3.80 1.40 3.70 1.40 6.30 8.40
300 1.90 1.20 1.30 0.20 0.00 0.00 | 3.70 1.10 1.80 0.20 1.80 0.50

Note: Possible benefit of CVTL over different benchmarks based on FFI data for N ∈ {10, 30, 49} and US data for
N ∈ {100, 200, 300}. All values are given in percent; for US data, each value is the average over 50 evaluations of
random portfolio constituents.


EC.4. Additional Results on Other Datasets


This appendix presents additional results and performance metrics on the remaining datasets. The analyses in the main paper were primarily based on the US dataset and on the metrics volatility and risk-adjusted return. Here, we present the results on the WO and EU datasets. In addition, we present several other performance metrics, namely the annualized return (Ret), the maximum drawdown (maximum relative price depreciation over the time series, averaged per annum, MDD), the downside volatility (annualized volatility of negative returns only, DwVol), the Calmar ratio (defined here as Ret/MDD, Calm), the Sortino ratio (defined here as Ret/DwVol, Sort), and the portfolio turnover (sum of actively forced changes in weights, averaged per annum, Turn). The tables additionally report the value at risk (VaR). Altogether, we present the following analyses:

• US Dataset (EC.4.1)

— Other Performance Metrics (EC.4.1.1)

• WO Dataset (EC.4.2)

— Volatility (EC.4.2.1)

— Risk-Adjusted Return (EC.4.2.2)

— Risk-Adjusted Return After Cost (EC.4.2.3)

— Other Performance Metrics (EC.4.2.4)

• EU Dataset (EC.4.3)

— Volatility (EC.4.3.1)

— Risk-Adjusted Return (EC.4.3.2)

— Risk-Adjusted Return After Cost (EC.4.3.3)

— Other Performance Metrics (EC.4.3.4)

• FFI Industry Datasets (EC.4.4)

— Other Performance Metrics (EC.4.4.1)
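The metric definitions given at the start of this appendix can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code behind the tables. We assume daily returns with 252 periods per year, an arithmetic annualized return, downside volatility as the annualized sample standard deviation of the negative returns, and MDD "averaged per annum" read as the mean of the maximum drawdowns within consecutive one-year blocks. Turnover is taken as the summed absolute difference between drifted and newly estimated weights at each rebalancing date. All function names and the extra `RiskAdj` entry (Ret divided by annualized volatility) are hypothetical. Values are returned as fractions; multiply by 100 for the percent scale used in the tables.

```python
import numpy as np

PERIODS_PER_YEAR = 252  # assumption: daily returns


def annualized_return(r, ppy=PERIODS_PER_YEAR):
    """Ret: arithmetic mean return scaled to one year (assumed convention)."""
    return float(np.mean(r) * ppy)


def annualized_volatility(r, ppy=PERIODS_PER_YEAR):
    return float(np.std(r, ddof=1) * np.sqrt(ppy))


def downside_volatility(r, ppy=PERIODS_PER_YEAR):
    """DwVol: annualized volatility computed from negative returns only."""
    negative = r[r < 0]
    return float(np.std(negative, ddof=1) * np.sqrt(ppy))


def max_drawdown(r):
    """Largest relative fall of cumulative wealth below its running peak."""
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))  # start at wealth 1
    peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / peak))


def mdd_per_annum(r, ppy=PERIODS_PER_YEAR):
    """MDD: maximum drawdown within each one-year block, averaged (our reading)."""
    blocks = [r[i:i + ppy] for i in range(0, len(r), ppy)]
    return float(np.mean([max_drawdown(b) for b in blocks]))


def turnover_per_annum(w_before, w_after, n_years):
    """Turn: summed absolute weight changes at rebalancing dates, per year.

    w_before: drifted weights just before each rebalance, shape (K, N)
    w_after:  newly estimated weights after each rebalance, shape (K, N)
    """
    changes = np.abs(np.asarray(w_after) - np.asarray(w_before)).sum()
    return float(changes / n_years)


def summary(r, ppy=PERIODS_PER_YEAR):
    """Ret, MDD, DwVol and the ratios Calm = Ret/MDD and Sort = Ret/DwVol."""
    r = np.asarray(r, dtype=float)
    ret = annualized_return(r, ppy)
    mdd = mdd_per_annum(r, ppy)
    dwvol = downside_volatility(r, ppy)
    vol = annualized_volatility(r, ppy)
    return {"Ret": ret, "MDD": mdd, "DwVol": dwvol,
            "Calm": ret / mdd, "Sort": ret / dwvol, "RiskAdj": ret / vol}
```

Some authors define turnover as half the summed absolute change; the factor only rescales the Turn column and does not affect rankings.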


EC.4.1. US Data

EC.4.1.1. Other Performance Metrics

Table EC.8 Other Performance Metrics on US Data for N=100.


High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm Sort VaR Turn Ret MDD DwVol Calm Sort VaR Turn
Estimators with Objective Minimum Variance
CVTL 9.10 31.35 6.00 29.03 1.52 1.23 17.03 7.32 29.88 5.83 24.49 125.56 1.20 9.36
CVTLLS 9.31 31.69 6.10 29.37 1.53 1.25 18.33 7.38 29.98 5.85 24.61 126.08 1.21 9.90
QIS 9.09 31.85 6.08 28.56 1.49 1.25 21.77 6.92 30.27 5.88 22.87 117.80 1.21 13.00
QuEST 9.08 31.64 6.05 28.71 1.50 1.24 18.44 6.94 30.19 5.86 22.98 118.44 1.21 12.09
LShriCC 5.97 34.60 6.67 17.24 0.89 1.38 31.53 4.81 32.66 6.22 14.73 77.33 1.29 16.52
LShri 8.98 32.85 6.28 27.33 1.43 1.29 26.99 6.77 31.34 6.04 21.61 112.21 1.25 16.62
BPSEst 10.61 44.59 8.53 23.79 1.24 1.76 50.00 6.74 33.73 6.48 19.99 104.03 1.34 24.34
FMEst – – – – – – – 6.44 34.06 6.51 18.92 98.96 1.35 25.91
POET I – – – – – – – 8.58 32.91 6.37 26.06 134.53 1.32 18.63
POET II 8.13 33.09 6.29 24.57 1.29 1.30 37.67 6.29 30.81 5.94 20.41 105.96 1.23 15.09
BN 10.16 34.49 6.65 29.46 1.53 1.37 16.00 7.38 30.34 5.90 24.32 125.02 1.22 11.78
Sample – – – – – – – 5.99 34.96 6.64 17.14 90.28 1.38 28.41

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 9.71 32.48 6.28 29.89 1.55 1.29 14.51 8.53 31.55 6.21 27.03 137.27 1.28 7.26
CVTLLS µ/σ 9.57 32.69 6.31 29.26 1.52 1.30 16.91 8.85 32.44 6.40 27.28 138.40 1.31 6.27
BN VAR 8.12 37.87 7.33 21.45 1.11 1.51 20.69 5.70 34.72 6.72 16.41 84.74 1.39 15.70
NC2R 10.44 43.57 8.35 23.96 1.25 1.73 19.12 9.01 42.17 8.43 21.38 106.88 1.74 20.03
CT – – – – – – – 10.32 37.73 7.42 27.34 139.13 1.53 14.25
1/N 14.23 54.70 10.63 26.01 1.34 2.21 1.37 14.82 55.16 10.80 26.86 137.19 2.24 1.35

Note: Each value is given in percent as the average over 50 evaluations of N = 100 portfolio constituents with
D ∈ {1.33, 0.40}. The covariance estimator with the best score per metric is highlighted in bold. Underlined values
indicate significant difference from CVTL with p < 0.05.


Table EC.9 Other Performance Metrics on US Data for N=200.


High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 10.41 25.05 4.75 41.56 2.19 0.97 14.08 8.16 25.21 4.71 32.35 173.12 0.96 8.03
CVTLLS 10.47 25.42 4.81 41.17 2.17 0.98 15.40 8.19 25.33 4.73 32.34 173.21 0.96 8.56
QIS 10.36 25.37 4.81 40.84 2.16 0.98 17.63 7.96 25.50 4.75 31.23 167.51 0.97 11.12
QuEST 10.41 25.25 4.79 41.23 2.17 0.97 15.04 7.97 25.47 4.75 31.28 167.81 0.97 10.73
LShriCC 6.96 29.64 5.53 23.49 1.26 1.14 33.05 5.24 28.18 5.16 18.60 101.47 1.06 17.00
LShri 10.05 27.29 5.12 36.84 1.96 1.05 30.58 7.21 27.07 4.99 26.64 144.53 1.03 17.39
BPSEst 10.96 38.61 7.22 28.40 1.52 1.49 50.00 7.26 28.72 5.30 25.27 136.97 1.09 22.76
FMEst – – – – – – – 7.05 28.85 5.32 24.43 132.63 1.10 23.65
POET I – – – – – – – 6.16 35.18 8.21 17.51 75.02 1.69 20.89
POET II 9.56 26.37 4.96 36.25 1.93 1.01 34.15 6.87 26.06 4.80 26.37 143.13 0.98 13.85
BN 11.13 27.76 5.34 40.10 2.09 1.08 15.71 8.28 25.70 4.78 32.21 173.37 0.98 10.98
Sample – – – – – – – 6.74 29.27 5.38 23.02 125.19 1.11 25.12

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 10.85 26.21 4.97 41.39 2.18 1.01 11.71 9.85 26.56 5.05 37.07 194.82 1.02 6.01
CVTLLS µ/σ 10.95 26.41 5.01 41.46 2.19 1.01 13.77 9.95 27.06 5.17 36.77 192.62 1.04 5.32
BN VAR 8.43 32.05 6.31 26.32 1.34 1.28 18.81 7.04 29.20 5.54 24.11 127.02 1.14 11.98
NC2R 10.83 41.35 8.12 26.20 1.33 1.66 18.10 9.96 42.74 8.20 23.31 121.44 1.68 20.58
CT – – – – – – – 11.89 34.65 6.68 34.31 177.85 1.36 12.60
1/N 14.81 53.67 10.23 27.59 1.45 2.10 1.37 17.28 52.93 10.37 32.65 166.68 2.13 1.31

Note: Each value is given in percent as the average over 50 evaluations of N = 200 portfolio constituents with
D ∈ {1.33, 0.40}. The covariance estimator with the best score per metric is highlighted in bold. Underlined values indicate
significant difference from CVTL with p < 0.05.

Table EC.10 Other Performance Metrics on US Data for N=300.


High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 8.60 23.37 4.52 36.81 1.90 0.92 12.68 7.13 23.68 4.42 30.10 161.42 0.90 7.61
CVTLLS 8.61 23.67 4.58 36.39 1.88 0.94 13.95 7.10 23.86 4.45 29.78 159.69 0.91 8.12
QIS 8.40 23.66 4.55 35.50 1.85 0.93 16.33 6.96 24.02 4.44 28.98 156.82 0.91 10.38
QuEST 8.48 23.54 4.54 36.01 1.87 0.93 14.20 6.93 24.00 4.43 28.86 156.27 0.91 10.14
LShriCC 3.66 29.58 5.35 12.37 0.68 1.12 34.46 3.63 27.55 4.88 13.16 74.26 1.01 17.68
LShri 7.53 26.43 4.91 28.47 1.53 1.01 33.20 6.06 26.05 4.73 23.25 128.05 0.98 17.94
BPSEst 8.49 36.23 6.72 23.43 1.26 1.39 50.00 6.13 27.41 4.99 22.36 122.75 1.03 22.22
FMEst – – – – – – – 6.02 27.51 4.99 21.88 120.55 1.03 22.91
POET I – – – – – – – 14.67 37.29 8.69 39.33 168.71 1.81 24.42
POET II 7.48 24.51 4.65 30.52 1.61 0.96 32.73 6.00 24.39 4.44 24.58 134.94 0.92 13.55
BN 9.95 25.13 5.03 39.60 1.98 1.03 15.45 7.29 24.04 4.45 30.33 163.94 0.91 10.66
Sample – – – – – – – 5.86 27.81 5.02 21.06 116.55 1.04 24.03

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 9.00 24.16 4.75 37.26 1.89 0.97 11.13 8.15 24.91 4.76 32.73 171.36 0.97 5.58
CVTLLS µ/σ 9.02 24.42 4.76 36.92 1.89 0.97 13.52 8.29 25.85 4.96 32.06 167.04 1.01 4.56
BN VAR 7.82 30.75 6.21 25.44 1.26 1.26 15.30 7.03 26.49 5.07 26.54 138.66 1.04 8.26
NC2R 10.06 45.79 9.05 21.98 1.11 1.86 6.26 7.94 46.92 9.30 16.91 85.35 1.91 7.82
CT – – – – – – – 8.74 34.54 6.70 25.29 130.35 1.38 12.04
1/N 13.02 53.99 10.48 24.11 1.24 2.17 1.34 11.69 53.82 10.58 21.73 110.56 2.19 1.30

Note: Each value is given in percent as the average over 50 evaluations of N = 300 portfolio constituents with
D ∈ {1.33, 0.40}. The covariance estimator with the best score per metric is highlighted in bold. Underlined values indicate
significant difference from CVTL with p < 0.05.


EC.4.2. WO Data

EC.4.2.1. Volatility

Table EC.11 Out-Of-Sample Annualized Volatility for WO Data and Different Dimensions.
N 100 200 300
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13

Estimators with Objective Minimum Variance


CVTL 9.28 9.20 9.21 8.83 8.75 9.06 8.17 8.04 7.97 7.88 8.59 8.99 7.59 7.61 7.45 7.72 8.44 8.16
CVTLLS 9.30 9.22 9.24 8.83 8.73 9.02 8.14 8.03 7.98 7.86 8.54 8.92 7.57 7.58 7.43 7.65 8.35 8.05
QIS 9.28 9.25 9.30 8.95 8.85 9.12 8.16 8.07 8.06 7.99 8.65 8.97 7.62 7.65 7.49 7.74 8.40 8.11
QuEST 9.23 9.19 9.23 8.94 8.85 9.12 8.13 8.04 8.03 7.99 8.65 8.97 7.60 7.63 7.46 7.74 8.40 8.11
LShriCC 9.84 9.82 9.85 9.32 9.04 9.24 8.87 8.78 8.76 8.42 8.86 9.15 8.33 8.39 8.29 8.20 8.67 8.26
LShri 9.57 9.61 9.68 9.30 9.02 9.22 8.62 8.69 8.74 8.47 8.83 9.11 8.16 8.38 8.32 8.25 8.64 8.23
BPSEst 11.52 11.98 12.64 9.94 9.20 9.31 10.23 10.72 11.41 8.96 8.95 9.17 9.54 10.16 10.79 8.67 8.72 8.26
FMEst – – – 10.06 9.23 9.32 – – – 9.05 8.97 9.17 – – – 8.74 8.73 8.26
POET I – – – 10.34 9.04 9.23 – – – 14.95 9.15 9.11 – – – 26.34 9.21 8.25
POET II 11.55 10.08 9.93 9.19 9.06 9.31 9.50 8.63 8.50 8.15 8.79 9.17 8.53 8.08 7.84 7.87 8.56 8.30
BN 9.79 9.62 9.64 9.01 8.82 9.10 8.61 8.44 8.38 8.00 8.64 8.98 8.06 8.00 7.83 7.77 8.43 8.15
Sample – – – 10.45 9.35 9.39 – – – 9.31 9.06 9.22 – – – 8.93 8.79 8.29

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 9.54 9.39 9.43 9.22 9.17 9.65 8.25 8.14 8.14 8.10 9.17 9.69 7.72 7.77 7.66 8.05 9.25 8.44
CVTLLS µ/σ 10.21 9.77 9.69 9.36 9.20 9.59 8.95 8.17 8.18 8.20 9.01 9.20 7.76 7.87 7.66 8.06 8.68 8.41
BN VAR 10.39 10.32 10.39 10.19 10.00 10.17 9.52 9.33 9.34 9.05 9.67 9.96 9.22 9.24 9.16 8.73 9.44 8.91
NC2R 11.89 11.89 11.99 11.80 11.64 12.23 11.64 11.75 11.52 11.57 12.44 12.84 12.18 12.48 12.81 12.99 13.86 12.02
CT – – – 10.67 10.24 10.58 – – – 9.85 10.48 10.88 – – – 9.93 10.59 9.56
1/N 14.81 14.83 14.97 14.90 14.34 14.71 14.51 14.52 14.52 14.19 15.13 15.70 14.50 14.59 14.66 14.55 15.68 13.43

Note: Results of all evaluated covariance estimators for WO data with N ∈ {100, 200, 300} and D ∈
{2, 1.50, 1.33, 0.40, 0.20, 0.13}. Each value is given in percent as the average over 50 evaluations of random portfolio
constituents. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted by “–”. The best
estimator per problem setting is highlighted in bold. Underlined values indicate significant difference from CVTL with
p < 0.05.


EC.4.2.2. Risk-Adjusted Return

Table EC.12 Out-Of-Sample Risk-Adjusted Return for WO Data and Different Dimensions.
N 100 200 300
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13

Estimators with Objective Minimum Variance


CVTL 0.69 0.98 0.88 0.75 1.03 0.89 0.88 1.14 1.03 1.08 0.81 0.77 1.10 1.05 0.99 0.99 0.85 1.27
CVTLLS 0.76 1.01 0.92 0.79 1.04 0.90 0.96 1.19 1.07 1.10 0.83 0.76 1.14 1.07 1.02 1.01 0.85 1.32
QIS 0.71 0.99 0.88 0.74 0.98 0.86 0.91 1.14 1.02 1.02 0.79 0.76 1.07 1.01 0.96 0.99 0.86 1.32
QuEST 0.72 0.99 0.88 0.73 0.98 0.87 0.90 1.15 1.02 1.02 0.79 0.76 1.08 1.02 0.97 0.99 0.86 1.32
LShriCC 0.53 0.70 0.61 0.60 0.87 0.78 0.44 0.75 0.71 0.76 0.69 0.68 0.72 0.67 0.58 0.71 0.72 1.21
LShri 0.71 0.97 0.87 0.74 0.97 0.87 0.92 1.12 1.01 0.92 0.78 0.76 1.08 0.91 0.81 0.90 0.84 1.29
BPSEst 0.71 0.84 0.82 0.74 0.96 0.86 0.81 0.97 0.80 0.87 0.77 0.75 0.93 0.87 0.73 0.85 0.82 1.29
FMEst – – – 0.71 0.95 0.86 – – – 0.83 0.77 0.75 – – – 0.83 0.82 1.28
POET I – – – 0.86 1.06 0.93 – – – 0.48 0.85 0.82 – – – 0.18 1.03 1.42
POET II 0.57 0.91 0.81 0.74 0.97 0.86 0.81 1.10 1.02 1.02 0.79 0.80 1.02 1.00 0.96 0.99 0.89 1.29
BN 0.77 0.92 0.86 0.75 1.04 0.90 0.85 1.09 1.00 1.12 0.85 0.80 1.03 1.03 0.99 1.03 0.89 1.34
Sample – – – 0.65 0.91 0.85 – – – 0.77 0.75 0.75 – – – 0.81 0.82 1.27

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 0.79 0.99 0.92 0.83 1.15 0.92 0.91 1.13 1.03 1.15 0.83 0.75 1.09 1.05 0.99 1.02 0.80 1.27
CVTLLS µ/σ 0.55 0.91 0.96 0.83 1.13 0.91 0.67 1.18 1.06 1.17 0.84 0.75 1.09 1.05 1.02 1.01 0.79 1.23
BN VAR 0.54 0.68 0.60 0.52 0.94 0.91 0.55 0.82 0.68 1.01 0.80 0.70 0.58 0.66 0.70 1.00 0.70 1.26
NC2R 0.69 0.82 0.81 0.74 0.91 0.66 0.72 0.83 0.82 1.01 0.95 0.54 0.85 0.83 0.80 0.72 0.59 1.07
CT – – – 0.93 1.19 0.95 – – – 1.17 0.91 0.76 – – – 0.97 0.79 1.24
1/N 0.94 0.85 0.86 0.88 1.12 0.83 0.94 0.96 0.89 1.13 0.82 0.62 0.89 0.81 0.85 0.84 0.61 0.98

Note: Results of all evaluated covariance estimators for WO data with N ∈ {100, 200, 300} and D ∈
{2, 1.50, 1.33, 0.40, 0.20, 0.13}. Each value is given in percent as the average over 50 evaluations of random portfolio
constituents. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted by “–”. The best
estimator per problem setting is highlighted in bold. Underlined values indicate significant difference from CVTL µ/σ
with p < 0.05.


EC.4.2.3. Risk-Adjusted Return After Cost

Table EC.13 Out-Of-Sample Risk-Adjusted Return after Cost for WO Data and Different Dimensions.
N 100 200 300
D = N/T 2.0 1.50 1.33 0.40 0.20 0.13 2.0 1.50 1.33 0.40 0.20 0.13 2.0 1.50 1.33 0.40 0.20 0.13
Estimators with Objective Minimum Variance
CVTL 0.01 0.37 0.32 0.44 0.80 0.72 0.18 0.49 0.45 0.74 0.58 0.60 0.38 0.44 0.40 0.65 0.63 1.09
CVTLLS 0.05 0.36 0.32 0.46 0.81 0.72 0.19 0.49 0.45 0.74 0.59 0.58 0.38 0.42 0.39 0.65 0.61 1.12
QIS 0.02 0.31 0.22 0.33 0.70 0.65 0.17 0.43 0.33 0.57 0.50 0.54 0.31 0.30 0.24 0.53 0.56 1.08
QuEST 0.09 0.39 0.30 0.35 0.71 0.66 0.23 0.50 0.40 0.58 0.50 0.54 0.38 0.37 0.31 0.53 0.56 1.08
LShriCC – – – 0.15 0.58 0.57 – – – 0.22 0.36 0.44 – – – 0.11 0.38 0.94
LShri – 0.08 0.01 0.20 0.65 0.63 – – – 0.28 0.42 0.51 – – – 0.19 0.46 1.01
BPSEst – – – 0.06 0.59 0.61 – – – 0.08 0.38 0.49 – – – 0.01 0.42 0.99
FMEst – – – – 0.57 0.60 – – – – 0.37 0.49 – – – – 0.42 0.99
POET I – – – 0.31 0.73 0.70 – – – 0.03 0.49 0.58 – – – – 0.67 1.14
POET II – – – 0.22 0.64 0.62 – – – 0.46 0.46 0.56 – – – 0.42 0.56 1.02
BN 0.29 0.43 0.40 0.41 0.78 0.69 0.28 0.52 0.45 0.72 0.57 0.58 0.40 0.42 0.39 0.61 0.60 1.09
Sample – – – – 0.51 0.58 – – – – 0.34 0.48 – – – – 0.40 0.97

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 0.25 0.51 0.45 0.61 1.00 0.80 0.32 0.63 0.57 0.93 0.68 0.59 0.56 0.60 0.44 0.81 0.62 1.10
CVTLLS µ/σ – 0.36 0.40 0.65 1.02 0.83 0.05 0.59 0.52 0.99 0.72 0.64 0.65 0.65 0.40 0.83 0.65 1.11
BN VAR – 0.10 0.04 0.13 0.63 0.67 – 0.22 0.10 0.65 0.57 0.52 0.03 0.15 0.21 0.73 0.53 1.12
NC2R 0.27 0.41 0.41 0.36 0.50 0.25 0.43 0.54 0.54 0.68 0.64 0.20 0.75 0.73 0.70 0.61 0.47 0.92
CT – – – 0.52 1.00 0.83 – – – 0.73 0.72 0.64 – – – 0.54 0.62 1.10
1/N 0.92 0.83 0.83 0.86 1.09 0.81 0.92 0.93 0.87 1.11 0.80 0.60 0.87 0.79 0.82 0.81 0.59 0.95

Note: Empirical results of all evaluated covariance estimators for WO data with N ∈ {100, 200, 300} and
D ∈ {2.0, 1.5, 1.33, 0.4, 0.2, 0.13}. Each value is given in percent as the average over 50 evaluations
of random investment universes. A dash (–) denotes an annualized out-of-sample volatility above 50% or a negative
out-of-sample risk-adjusted return. The best estimator per setting is highlighted in bold.


EC.4.2.4. Other Performance Metrics

Table EC.14 Other Performance Metrics on WO Data for N=100.


High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 8.12 33.40 6.08 24.31 133.54 1.26 19.70 6.65 31.56 5.80 21.06 114.62 1.21 10.41
CVTLLS 8.52 33.34 6.10 25.56 139.65 1.26 20.96 6.93 31.48 5.78 22.03 119.86 1.20 11.05
QIS 8.17 33.73 6.14 24.21 133.12 1.27 23.36 6.60 31.96 5.85 20.64 112.66 1.22 13.93
QuEST 8.10 33.54 6.10 24.15 132.75 1.26 20.06 6.56 31.96 5.85 20.51 111.99 1.22 13.04
LShriCC 6.01 35.66 6.50 16.84 92.37 1.35 30.87 5.55 33.11 6.12 16.76 90.67 1.27 16.15
LShri 8.45 34.77 6.35 24.31 133.13 1.32 32.16 6.86 32.90 6.04 20.86 113.63 1.26 19.04
BPSEst 10.31 43.51 8.05 23.70 128.10 1.67 50.00 7.39 34.82 6.42 21.24 115.20 1.34 26.31
FMEst – – – – – – – 7.15 35.27 6.49 20.28 110.21 1.35 28.68
POET I – – – – – – – 8.92 34.80 6.61 25.63 134.98 1.37 21.42
POET II 8.08 35.52 6.47 22.75 124.88 1.34 44.45 6.79 32.56 5.98 20.86 113.52 1.25 18.11
BN 8.29 34.85 6.36 23.78 130.39 1.31 16.86 6.72 32.32 5.92 20.79 113.55 1.23 11.43
Sample – – – – – – – 6.79 36.63 6.72 18.53 100.93 1.40 32.86

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 8.64 33.88 6.22 25.50 138.89 1.29 16.58 7.65 32.78 6.05 23.34 126.54 1.25 7.57
CVTLLS µ/σ 9.31 34.34 6.37 27.11 146.25 1.31 20.27 7.75 33.25 6.14 23.30 126.16 1.27 6.10
BN VAR 6.22 37.79 6.98 16.46 89.11 1.44 22.30 5.34 36.98 6.85 14.45 77.96 1.42 15.56
NC2R 9.75 42.30 7.95 23.05 122.64 1.63 17.82 8.76 40.68 7.78 21.54 112.57 1.60 17.02
CT – – – – – – – 9.94 36.94 6.87 26.90 144.59 1.42 16.45
1/N 12.82 50.53 9.57 25.37 133.88 1.96 1.23 13.18 50.22 9.57 26.24 137.75 1.96 1.19

Note: Each value is given in percent as the average over 50 evaluations of N = 100 portfolio constituents with
D ∈ {1.33, 0.40}. The covariance estimator with the best score per metric is highlighted in bold. Underlined values indicate
significant difference from CVTL with p < 0.05.


Table EC.15 Other Performance Metrics on WO Data for N=200.


High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 8.23 28.42 5.12 28.95 160.61 1.05 17.55 8.49 28.11 5.07 30.19 167.52 1.05 10.00
CVTLLS 8.55 28.37 5.13 30.14 166.65 1.05 18.85 8.64 28.02 5.03 30.82 171.71 1.04 10.45
QIS 8.20 28.72 5.16 28.55 158.85 1.06 21.04 8.16 28.45 5.11 28.68 159.60 1.05 13.51
QuEST 8.16 28.64 5.15 28.50 158.55 1.05 18.59 8.18 28.43 5.11 28.77 159.98 1.05 13.19
LShriCC 6.18 30.91 5.53 20.00 111.72 1.15 32.73 6.40 29.91 5.38 21.41 119.09 1.11 17.59
LShri 8.80 30.44 5.51 28.91 159.60 1.13 37.90 7.82 29.77 5.36 26.27 146.03 1.11 20.89
BPSEst 9.13 39.71 7.11 22.98 128.30 1.47 50.00 7.79 31.45 5.63 24.76 138.35 1.16 27.31
FMEst – – – – – – – 7.55 31.75 5.68 23.78 132.86 1.18 29.13
POET I – – – – – – – 7.12 37.20 9.85 19.14 72.30 2.03 25.45
POET II 8.63 29.69 5.35 29.07 161.34 1.10 42.53 8.34 28.62 5.18 29.14 161.08 1.07 17.41
BN 8.41 29.91 5.40 28.13 155.95 1.11 17.53 8.95 28.46 5.14 31.47 174.11 1.06 11.88
Sample – – – – – – – 7.21 32.62 5.85 22.11 123.15 1.21 32.09

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 8.42 29.13 5.26 28.91 160.09 1.08 14.27 9.29 28.84 5.23 32.22 177.52 1.08 6.48
CVTLLS µ/σ 8.69 29.21 5.26 29.75 165.13 1.08 16.52 9.60 29.28 5.29 32.79 181.51 1.09 5.33
BN VAR 6.31 33.70 6.23 18.74 101.31 1.27 20.84 9.18 31.95 5.95 28.73 154.38 1.21 12.38
NC2R 9.41 40.06 7.47 23.49 125.98 1.53 11.65 11.70 40.27 7.46 29.04 156.88 1.52 14.04
CT – – – – – – – 11.55 34.08 6.27 33.90 184.27 1.28 16.07
1/N 12.96 50.22 9.21 25.80 140.63 1.87 1.20 16.07 48.28 9.11 33.30 176.43 1.85 1.17

Note: Each value is given in percent as the average over 50 evaluations of N = 200 portfolio constituents with
D ∈ {1.33, 0.40}. The covariance estimator with the best score per metric is highlighted in bold. Underlined values indicate
significant difference from CVTL with p < 0.05.


Table EC.16 Other Performance Metrics on WO Data for N=300.


High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 7.38 26.98 4.96 27.34 148.70 1.03 16.62 7.65 27.67 4.95 27.66 154.65 1.02 10.03
CVTLLS 7.56 26.90 4.94 28.11 153.10 1.03 17.80 7.72 27.37 4.86 28.21 158.71 1.00 10.45
QIS 7.18 27.10 4.95 26.50 145.02 1.03 20.72 7.63 27.56 4.93 27.67 154.64 1.02 13.45
QuEST 7.22 27.03 4.94 26.71 146.17 1.02 18.68 7.64 27.58 4.94 27.71 154.82 1.02 13.46
LShriCC 4.81 29.58 5.44 16.26 88.49 1.14 34.66 5.79 29.05 5.19 19.95 111.55 1.07 18.81
LShri 6.72 29.54 5.41 22.75 124.15 1.13 42.25 7.41 28.95 5.19 25.60 142.82 1.07 22.34
BPSEst 7.82 37.64 6.89 20.78 113.50 1.44 50.00 7.33 30.22 5.41 24.26 135.43 1.12 28.10
FMEst – – – – – – – 7.29 30.40 5.45 23.97 133.57 1.13 29.58
POET I – – – – – – – 4.67 38.95 18.94 12.00 24.67 3.70 28.55
POET II 7.56 27.56 5.11 27.44 147.87 1.06 41.32 7.77 27.61 5.01 28.13 155.02 1.03 16.93
BN 7.77 28.43 5.19 27.32 149.65 1.07 18.04 8.01 27.86 4.99 28.76 160.46 1.02 12.18
Sample – – – – – – – 7.25 31.01 5.58 23.37 129.93 1.15 32.00

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 7.58 27.85 5.08 27.23 149.22 1.05 16.03 8.18 28.83 5.15 28.38 158.95 1.05 6.31
CVTLLS µ/σ 7.80 27.74 5.09 28.10 153.14 1.05 18.00 8.12 28.90 5.17 28.09 157.12 1.06 5.27
BN VAR 6.38 34.18 6.26 18.67 102.00 1.28 17.24 8.76 31.32 5.68 27.98 154.34 1.16 8.81
NC2R 10.20 44.18 8.36 23.09 121.93 1.71 4.57 9.38 44.78 8.57 20.94 109.36 1.75 5.26
CT – – – – – – – 9.67 34.33 6.37 28.18 151.77 1.29 16.03
1/N 12.43 50.38 9.37 24.67 132.61 1.92 1.24 12.15 49.17 9.39 24.72 129.41 1.92 1.17

Note: Each value is given in percent as the average over 50 evaluations of N = 300 portfolio constituents with
D ∈ {1.33, 0.40}. The covariance estimator with the best score per metric is highlighted in bold. Underlined values indicate
significant difference from CVTL with p < 0.05.


EC.4.3. EU Data

EC.4.3.1. Volatility

Table EC.17 Out-Of-Sample Annualized Volatility for EU Data and Different Dimensions.
N 100 200 300
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13

Estimators with Objective Minimum Variance


CVTL 9.87 9.71 9.72 9.38 9.51 9.85 8.54 8.53 8.42 8.52 9.19 9.61 7.93 7.95 7.98 8.26 8.95 8.87
CVTLLS 9.97 9.80 9.80 9.41 9.52 9.84 8.60 8.58 8.46 8.52 9.15 9.54 7.94 7.95 7.98 8.23 8.90 8.77
QIS 9.90 9.77 9.84 9.46 9.57 9.89 8.54 8.55 8.46 8.56 9.17 9.53 7.91 7.94 7.99 8.28 8.88 8.68
QuEST 9.85 9.72 9.77 9.46 9.57 9.89 8.51 8.52 8.43 8.57 9.18 9.53 7.89 7.91 7.97 8.28 8.88 8.68
LShriCC 10.62 10.50 10.59 9.94 9.85 10.05 9.61 9.56 9.51 9.12 9.45 9.70 8.99 8.88 8.89 8.79 9.13 8.77
LShri 10.36 10.32 10.44 10.02 9.80 10.02 9.21 9.43 9.43 9.19 9.41 9.67 8.58 8.96 9.17 8.91 9.11 8.75
BPSEst 12.50 12.94 13.50 10.60 9.95 10.09 10.90 11.54 12.26 9.61 9.49 9.69 10.09 10.72 11.62 9.22 9.16 8.77
FMEst – – – 10.73 9.97 10.10 – – – 9.69 9.50 9.70 – – – 9.29 9.17 8.77
POET I – – – 10.38 9.85 10.04 – – – 10.33 9.51 9.69 – – – 10.87 9.25 8.83
POET II 14.98 10.66 10.52 9.75 9.84 10.14 10.57 9.18 8.97 8.79 9.34 9.69 9.10 8.40 8.38 8.45 8.97 8.74
BN 10.46 10.24 10.17 9.53 9.59 9.90 9.04 8.94 8.81 8.65 9.23 9.56 8.40 8.30 8.26 8.43 8.97 8.79
Sample – – – 11.11 10.09 10.16 – – – 9.93 9.58 9.73 – – – 9.47 9.21 8.78

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 10.08 9.97 9.95 9.75 9.93 10.55 8.66 8.72 8.61 8.93 9.75 10.49 8.16 8.10 8.17 8.56 9.82 9.75
CVTLLS µ/σ 10.27 10.04 10.06 9.92 10.04 10.53 8.94 8.76 8.65 8.91 9.70 9.91 8.22 8.19 8.20 8.53 9.17 9.63
BN VAR 11.11 11.04 11.14 10.93 10.98 11.24 10.00 10.00 9.95 9.92 10.53 10.94 9.77 9.86 9.86 9.76 10.45 10.13
NC2R 12.72 12.63 12.53 12.50 12.48 13.05 12.36 12.82 12.47 12.63 13.37 13.94 13.20 13.41 13.88 14.00 15.15 13.60
CT – – – 11.52 11.20 11.58 – – – 10.75 11.40 11.95 – – – 10.73 11.58 10.83
1/N 16.38 16.42 16.40 16.41 15.89 16.35 16.17 16.15 16.13 15.85 16.90 17.64 16.06 16.16 16.29 16.23 17.55 15.50

Note: Results of all evaluated covariance estimators for EU data with N ∈ {100, 200, 300} and D ∈
{2, 1.50, 1.33, 0.40, 0.20, 0.13}. Each value is given in percent as the average over 50 evaluations of random portfolio
constituents. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted by “–”. The best
estimator per problem setting is highlighted in bold. Underlined values indicate significant difference from CVTL with
p < 0.05.


EC.4.3.2. Risk-Adjusted Return

Table EC.18 Out-Of-Sample Risk-Adjusted Return for EU Data and Different Dimensions.
N 100 200 300
D = N/T 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13 2 1.50 1.33 0.40 0.20 0.13

Estimators with Objective Minimum Variance


CVTL 1.06 1.18 1.12 1.10 1.28 1.17 1.31 1.49 1.52 1.43 1.23 1.17 1.64 1.62 1.48 1.42 1.30 1.86
CVTLLS 1.07 1.16 1.11 1.09 1.27 1.15 1.32 1.46 1.49 1.40 1.23 1.14 1.59 1.58 1.46 1.39 1.26 1.82
QIS 1.03 1.12 1.06 1.05 1.24 1.14 1.27 1.44 1.46 1.40 1.24 1.17 1.57 1.58 1.44 1.39 1.30 1.85
QuEST 1.03 1.13 1.06 1.06 1.25 1.15 1.28 1.45 1.48 1.40 1.24 1.18 1.59 1.60 1.45 1.39 1.31 1.85
LShriCC 0.70 0.78 0.69 0.79 0.92 0.92 0.66 0.84 0.88 0.82 0.92 0.94 0.87 0.97 0.78 0.81 0.95 1.59
LShri 0.98 1.08 1.00 0.97 1.15 1.08 1.20 1.29 1.32 1.14 1.12 1.10 1.41 1.31 1.11 1.09 1.16 1.78
BPSEst 0.88 0.86 0.82 0.85 1.10 1.04 1.06 1.07 0.92 1.03 1.09 1.08 1.23 1.04 0.78 0.99 1.12 1.76
FMEst – – – 0.84 1.10 1.04 – – – 1.00 1.09 1.08 – – – 0.99 1.12 1.77
POET I – – – 1.06 1.32 1.21 – – – 1.52 1.39 1.30 – – – 1.24 1.50 –
POET II 0.32 1.02 0.93 1.00 1.14 1.06 0.96 1.29 1.34 1.21 1.11 1.13 1.26 1.42 1.25 1.17 1.31 1.70
BN 1.01 1.09 1.08 1.12 1.34 1.20 1.26 1.44 1.45 1.55 1.31 1.22 1.52 1.56 1.51 1.59 1.44 1.89
Sample – – – 0.81 1.07 1.04 – – – 0.96 1.08 1.09 – – – 0.97 1.13 1.77

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 1.08 1.15 1.12 1.11 1.34 1.16 1.36 1.47 1.49 1.49 1.25 1.10 1.68 1.64 1.51 1.59 1.27 1.63
CVTLLS µ/σ 1.07 1.14 1.09 1.05 1.33 1.13 1.27 1.43 1.46 1.45 1.22 1.11 1.54 1.53 1.43 1.43 1.26 1.67
BN VAR 0.84 0.96 0.86 1.05 1.29 1.31 1.07 1.25 1.24 1.45 1.40 1.28 1.23 1.36 1.36 1.70 1.41 1.99
NC2R 0.70 0.90 0.77 0.78 0.97 0.90 0.86 0.96 0.95 0.90 0.87 0.65 0.89 0.94 0.82 0.83 0.53 1.20
CT – – – 0.92 1.16 1.00 – – – 1.15 1.00 0.76 – – – 1.02 0.78 1.34
1/N 0.82 0.75 0.77 0.75 0.97 0.77 0.86 0.82 0.78 0.97 0.73 0.43 0.78 0.76 0.82 0.78 0.43 0.87

Note: Results of all evaluated covariance estimators for EU data with N ∈ {100, 200, 300} and D ∈
{2, 1.50, 1.33, 0.40, 0.20, 0.13}. Each value is given in percent as the average over 50 evaluations of random portfolio
constituents. Out-of-sample volatility greater than 50 percent due to estimation errors is denoted by “–”. The best
estimator per problem setting is highlighted in bold. Underlined values indicate significant difference from CVTL µ/σ
with p < 0.05.


EC.4.3.3. Risk-Adjusted Return After Cost

Table EC.19 Out-Of-Sample Risk-Adjusted Return after Cost for EU Data and Different Dimensions.
N 100 200 300
D = N/T 2.0 1.50 1.33 0.40 0.20 0.13 2.0 1.50 1.33 0.40 0.20 0.13 2.0 1.50 1.33 0.40 0.20 0.13
Estimators with Objective Minimum Variance
CVTL 0.36 0.55 0.55 0.78 1.05 1.00 0.59 0.84 0.91 1.07 0.98 0.98 0.90 0.99 0.87 1.06 1.06 1.67
CVTLLS 0.34 0.50 0.50 0.75 1.03 0.98 0.54 0.78 0.85 1.04 0.98 0.95 0.82 0.91 0.81 1.03 1.01 1.62
QIS 0.36 0.46 0.42 0.68 0.99 0.95 0.56 0.77 0.82 0.99 0.96 0.96 0.86 0.92 0.78 0.99 1.02 1.63
QuEST 0.42 0.55 0.51 0.71 1.00 0.95 0.63 0.84 0.90 1.01 0.97 0.97 0.93 1.00 0.86 0.99 1.03 1.63
LShriCC – – – 0.35 0.65 0.72 – – – 0.32 0.60 0.71 – – – 0.25 0.62 1.33
LShri 0.06 0.15 0.09 0.42 0.83 0.85 – 0.08 0.09 0.49 0.76 0.85 0.02 – – 0.39 0.78 1.50
BPSEst – – – 0.20 0.75 0.80 – – – 0.28 0.71 0.82 – – – 0.21 0.73 1.48
FMEst – – – 0.14 0.74 0.80 – – – 0.22 0.70 0.82 – – – 0.18 0.73 1.48
POET I – – – 0.49 1.00 0.99 – – – 0.85 1.03 1.06 – – – 0.56 1.13 1.80
POET II – – – 0.50 0.82 0.82 – – 0.07 0.68 0.78 0.88 – – – 0.62 0.98 1.43
BN 0.53 0.62 0.60 0.78 1.09 0.99 0.68 0.87 0.89 1.16 1.03 1.00 0.88 0.93 0.89 1.17 1.14 1.64
Sample – – – 0.05 0.70 0.78 – – – 0.13 0.68 0.82 – – – 0.12 0.73 1.48

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 0.51 0.68 0.67 0.89 1.19 1.04 0.76 0.95 0.99 1.26 1.08 0.92 1.08 1.13 0.98 1.29 1.04 1.46
CVTLLS µ/σ 0.40 0.58 0.55 0.87 1.21 1.05 0.65 0.86 0.87 1.26 1.08 0.97 1.05 1.04 0.85 1.18 1.08 1.56
BN VAR 0.25 0.38 0.30 0.65 1.00 1.08 0.42 0.64 0.64 1.09 1.16 1.10 0.65 0.82 0.85 1.42 1.24 1.86
NC2R 0.26 0.46 0.33 0.34 0.53 0.48 0.61 0.70 0.70 0.55 0.47 0.48 0.79 0.85 0.72 0.72 0.47 1.15
CT – – – 0.54 0.99 0.89 – – – 0.75 0.82 0.65 – – – 0.63 0.62 1.22
1/N 0.80 0.72 0.75 0.72 0.94 0.75 0.84 0.80 0.76 0.95 0.71 0.41 0.76 0.74 0.80 0.76 0.41 0.84

Note: Empirical results of all evaluated covariance estimators for EU data with N ∈ {100, 200, 300} and
D ∈ {2.0, 1.5, 1.33, 0.4, 0.2, 0.3}. Each score is given in percent and is the average over 50 evaluations
of random investment universes. A dash (–) marks an annualized out-of-sample volatility above 50% or a negative
out-of-sample risk-adjusted return. The best estimator per setting is highlighted in bold. Underlined values indicate
a significant difference from CVTL µ/σ with p < 0.05.
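The masking rule in the note can be written as a small function. This is a sketch under stated assumptions, not the authors' code: it assumes daily out-of-sample returns, 252 trading days per year, a zero risk-free rate, and a risk-adjusted return defined as annualized mean over annualized volatility (µ/σ). The helper names `annualized_stats` and `table_entry` are hypothetical.

```python
import numpy as np

TRADING_DAYS = 252  # assumption: daily returns, 252 trading days per year


def annualized_stats(daily_returns):
    """Annualized mean return and volatility of a daily return series."""
    r = np.asarray(daily_returns, dtype=float)
    mu = r.mean() * TRADING_DAYS
    sigma = r.std(ddof=1) * np.sqrt(TRADING_DAYS)
    return mu, sigma


def table_entry(daily_returns, vol_cap=0.50):
    """Return mu/sigma, or None where the table would print a dash.

    Assumes a non-constant series (sigma > 0) and a zero risk-free rate.
    """
    mu, sigma = annualized_stats(daily_returns)
    ratio = mu / sigma
    if sigma > vol_cap or ratio < 0:
        return None  # rendered as "–" in the table
    return ratio
```

A calm series with positive drift yields a number, while a losing series or one whose annualized volatility exceeds 50% is masked.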

Electronic copy available at: https://ssrn.com/abstract=3986993


ec40 e-companion to Mörstedt, Lutz, and Neumann: Transfer Learning for Covariance Estimation

EC.4.3.4. Other Performance Metrics

Table EC.20 Other Performance Metrics on EU Data for N=100.


D High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 10.90 35.29 6.41 30.89 169.99 1.32 20.64 10.34 33.39 6.15 30.96 168.06 1.27 11.27
CVTLLS 10.87 35.58 6.47 30.55 168.07 1.33 22.10 10.24 33.48 6.16 30.60 166.20 1.27 11.84
QIS 10.39 35.90 6.50 28.93 159.76 1.34 23.21 9.95 33.74 6.19 29.50 160.89 1.28 13.15
QuEST 10.39 35.72 6.46 29.08 160.68 1.33 20.00 10.06 33.72 6.18 29.82 162.63 1.28 12.36
LShriCC 7.27 38.42 6.99 18.91 104.03 1.45 32.74 7.88 35.42 6.48 22.25 121.52 1.34 16.48
LShri 10.43 37.67 6.81 27.70 153.27 1.41 36.12 9.73 35.43 6.51 27.46 149.44 1.35 20.55
BPSEst 11.13 47.62 8.62 23.37 129.07 1.78 50.00 9.06 37.43 6.85 24.22 132.25 1.42 26.30
FMEst – – – – – – – 9.01 37.82 6.92 23.83 130.29 1.43 28.53
POET I – – – – – – – 11.05 36.49 6.73 30.29 164.12 1.39 22.04
POET II 9.75 37.86 6.83 25.76 142.69 1.41 45.78 9.75 34.56 6.34 28.20 153.65 1.31 18.16
BN 10.96 37.02 6.76 29.60 162.07 1.39 17.71 10.68 33.98 6.23 31.44 171.34 1.29 11.83
Sample – – – – – – – 9.00 39.02 7.13 23.07 126.19 1.48 32.42

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 11.16 35.96 6.57 31.04 169.81 1.36 16.44 10.78 34.57 6.41 31.18 168.14 1.32 7.84
CVTLLS µ/σ 10.94 36.54 6.63 29.94 164.86 1.37 19.93 10.43 35.30 6.53 29.56 159.81 1.35 6.46
BN VAR 9.57 40.76 7.61 23.48 125.75 1.57 23.53 11.44 39.22 7.39 29.16 154.73 1.52 15.93
NC2R 9.63 45.70 8.41 21.07 114.51 1.73 20.58 9.71 44.31 8.25 21.91 117.71 1.70 20.49
CT – – – – – – – 10.61 40.60 7.46 26.14 142.25 1.54 16.24
1/N 12.66 57.11 10.32 22.17 122.75 2.12 1.32 12.25 56.71 10.41 21.59 117.69 2.14 1.29

Note: Each value is given in percent as the average over 50 evaluations of N = 100 portfolio constituents with
D ∈ {1.33, 0.40}. The covariance estimator with the best score per KPI is marked in bold, and the lowest turnover
among the optimized portfolios is marked in bold. Underlined values indicate a significant difference from CVTL with p < 0.05.
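The tabulated values are consistent with Calm. = Ret/MDD and Sort. = Ret/DwVol, each scaled by 100 (for CVTL at N = 100 and D = 1.33: 10.90/35.29 ≈ 0.309 and 10.90/6.41 ≈ 1.70). The sketch below computes these metrics for one return series under assumptions not stated in this section: daily returns, 252 trading days, arithmetic annualization of Ret, downside deviation with a target of 0 for DwVol, and a historical one-day VaR whose confidence level (`var_level`) is a free choice here. The function name is hypothetical.

```python
import numpy as np

TRADING_DAYS = 252  # assumption: daily returns, 252 trading days per year


def performance_metrics(daily_returns, var_level=0.95):
    """Sketch of the 'other performance metrics' for one return series.

    Outputs are decimals (multiply by 100 for the tables' percent scale).
    Assumes a zero risk-free rate and at least one loss, so that MDD and
    DwVol are positive.
    """
    r = np.asarray(daily_returns, dtype=float)
    ret = r.mean() * TRADING_DAYS                           # annualized arithmetic return
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))   # value of 1 unit invested
    peak = np.maximum.accumulate(wealth)                    # running high-water mark
    mdd = np.max(1.0 - wealth / peak)                       # maximum drawdown
    downside = np.minimum(r, 0.0)                           # only losses count
    dwvol = np.sqrt(np.mean(downside ** 2) * TRADING_DAYS)  # annualized downside deviation
    var = -np.quantile(r, 1.0 - var_level)                  # historical one-day VaR, loss > 0
    return {
        "Ret": ret,
        "MDD": mdd,
        "DwVol": dwvol,
        "Calm.": ret / mdd,
        "Sort.": ret / dwvol,
        "VaR": var,
    }
```

For the series `[0.01, -0.02, 0.015, -0.005, 0.01]`, the fall from 1.01 to 0.9898 gives an MDD of 2%.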




Table EC.21 Other Performance Metrics on EU Data for N=200.


D High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 12.78 29.52 5.39 43.31 237.25 1.11 18.50 12.15 29.11 5.52 41.74 219.98 1.13 10.84
CVTLLS 12.59 29.82 5.44 42.23 231.29 1.12 19.71 11.96 29.21 5.52 40.96 216.72 1.14 11.25
QIS 12.39 29.74 5.42 41.67 228.68 1.11 20.04 11.95 29.32 5.55 40.75 215.23 1.14 12.59
QuEST 12.47 29.65 5.40 42.05 230.80 1.11 17.64 12.02 29.30 5.55 41.04 216.59 1.14 12.12
LShriCC 8.40 33.72 6.03 24.90 139.26 1.24 33.48 7.51 31.67 5.81 23.72 129.35 1.20 17.50
LShri 12.42 32.43 5.97 38.29 208.11 1.23 43.38 10.44 31.39 5.85 33.25 178.55 1.21 22.25
BPSEst 11.27 42.43 7.64 26.55 147.36 1.58 50.00 9.89 33.05 6.08 29.94 162.64 1.26 27.13
FMEst – – – – – – – 9.72 33.26 6.12 29.24 158.84 1.27 28.72
POET I – – – – – – – 15.66 34.52 6.53 45.36 239.99 1.35 24.59
POET II 12.03 30.99 5.66 38.82 212.43 1.16 42.83 10.65 29.86 5.63 35.67 189.36 1.17 17.34
BN 12.77 31.26 5.70 40.85 224.09 1.17 18.01 13.38 29.44 5.60 45.44 238.74 1.15 12.12
Sample – – – – – – – 9.49 33.92 6.25 27.99 151.86 1.30 31.25

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 12.87 30.48 5.59 42.22 230.17 1.15 15.60 13.30 30.75 5.85 43.26 227.39 1.20 7.33
CVTLLS µ/σ 12.60 30.64 5.58 41.12 225.74 1.15 18.49 12.95 30.78 5.83 42.07 222.19 1.20 6.28
BN VAR 12.30 35.06 6.68 35.07 184.06 1.35 21.64 14.41 33.51 6.60 43.01 218.37 1.34 12.70
NC2R 11.86 44.00 8.16 26.95 145.35 1.67 11.23 11.40 43.96 8.09 25.94 140.97 1.66 16.19
CT – – – – – – – 12.40 37.42 6.88 33.13 180.29 1.42 15.66
1/N 12.58 57.05 10.26 22.04 122.58 2.08 1.27 15.37 54.93 10.04 27.99 153.07 2.05 1.26

Note: Each value is given in percent as the average over 50 evaluations of N = 200 portfolio constituents with
D ∈ {1.33, 0.40}. The covariance estimator with the best score per KPI is marked in bold. Underlined values indicate
a significant difference from CVTL with p < 0.05.
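Each table is split into estimators evaluated under a minimum variance objective and under a maximum risk-adjusted return (µ/σ) objective; in both cases the covariance estimate is the input to a portfolio optimization. The sketch below shows only the textbook unconstrained closed forms, the global minimum variance portfolio w ∝ Σ⁻¹1 and the maximum Sharpe (tangency) portfolio w ∝ Σ⁻¹µ, both normalized to full investment. The paper's optimizer may add constraints (such as long-only weights) and its own estimate of µ, neither of which is modeled here; the function names are hypothetical.

```python
import numpy as np


def min_variance_weights(cov):
    """Unconstrained global minimum variance weights: Sigma^{-1} 1, normalized."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # solve instead of forming the inverse explicitly
    return x / x.sum()


def max_sharpe_weights(cov, mu):
    """Unconstrained tangency weights: Sigma^{-1} mu, normalized to sum to 1.

    Requires 1' Sigma^{-1} mu != 0; weights may be negative (short positions).
    """
    x = np.linalg.solve(cov, mu)
    return x / x.sum()
```

Any covariance estimate from the tables (for instance a shrinkage estimate) can be passed as `cov`; with `cov = diag(1, 4)` the minimum variance weights are (0.8, 0.2), tilting toward the lower-variance asset.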




Table EC.22 Other Performance Metrics on EU Data for N=300.


D High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 11.81 27.94 5.27 42.27 223.94 1.08 17.93 11.69 28.45 5.36 41.10 218.15 1.10 10.65
CVTLLS 11.62 28.18 5.30 41.23 219.25 1.09 18.87 11.42 28.35 5.31 40.28 214.89 1.09 10.82
QIS 11.52 28.02 5.28 41.11 218.08 1.08 19.42 11.53 28.39 5.34 40.60 215.63 1.10 12.25
QuEST 11.59 27.97 5.26 41.44 220.16 1.08 17.39 11.51 28.37 5.35 40.58 215.23 1.10 11.97
LShriCC 6.91 31.74 5.73 21.76 120.63 1.19 34.11 7.09 30.77 5.56 23.05 127.50 1.15 18.62
LShri 10.18 31.69 5.96 32.11 170.88 1.23 48.24 9.70 30.09 5.64 32.24 171.93 1.16 23.54
BPSEst 9.11 40.94 7.34 22.25 124.12 1.52 50.00 9.17 31.54 5.80 29.08 158.20 1.19 27.48
FMEst – – – – – – – 9.17 31.62 5.83 29.00 157.25 1.20 28.61
POET I – – – – – – – 13.43 33.59 6.92 39.97 194.18 1.42 26.63
POET II 10.47 29.21 5.48 35.84 191.08 1.13 41.48 9.89 28.99 5.45 34.10 181.51 1.12 17.22
BN 12.52 29.06 5.35 43.07 233.89 1.12 18.84 13.38 28.94 5.45 46.22 245.51 1.11 12.72
Sample – – – – – – – 9.20 31.99 5.93 28.75 154.99 1.22 30.51

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 12.31 28.47 5.40 43.23 228.01 1.11 15.57 13.59 29.28 5.62 46.43 241.78 1.14 9.07
CVTLLS µ/σ 11.73 28.98 5.44 40.46 215.76 1.12 17.26 12.21 29.58 5.56 41.27 219.64 1.13 7.76
BN VAR 13.39 35.33 6.73 37.89 198.88 1.37 18.04 16.63 34.06 6.52 48.84 254.94 1.32 9.74
NC2R 11.31 48.93 9.02 23.12 125.44 1.87 4.79 11.62 48.68 9.23 23.86 125.82 1.88 5.54
CT – – – – – – – 10.92 37.40 6.93 29.19 157.51 1.41 15.28
1/N 13.40 57.34 10.32 23.37 129.84 2.12 1.34 12.66 55.96 10.48 22.62 120.73 2.13 1.28

Note: Each value is given in percent as the average over 50 evaluations of N = 300 portfolio constituents with
D ∈ {1.33, 0.40}. The covariance estimator with the best score per KPI is marked in bold. Underlined values indicate
a significant difference from CVTL with p < 0.05.
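The notes mark values that differ significantly from CVTL at p < 0.05, but this section does not name the test used. As one illustration only, a paired sign-flip permutation test over the 50 evaluations could be computed as below; the choice of test, the function name and its parameters are assumptions, and the paper may use a different procedure (for example a paired t-test).

```python
import numpy as np


def paired_permutation_pvalue(a, b, n_perm=10_000, seed=0):
    """Two-sided sign-flip permutation test on the mean of paired differences.

    a, b: per-evaluation scores of two estimators on the same universes.
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = abs(d.mean())
    rng = np.random.default_rng(seed)
    # Under the null of no difference, each paired difference is equally
    # likely to have either sign.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((signs * d).mean(axis=1))
    # The add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (n_perm + 1)
```

A value would then be underlined when `paired_permutation_pvalue(cvtl_scores, other_scores) < 0.05`.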




EC.4.4. Industry Data

EC.4.4.1. Other Performance Metrics

Table EC.23 Other Performance Metrics on FFI Data for N=10.


D High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 8.38 44.32 8.73 18.91 95.95 1.78 14.14 8.76 40.71 7.93 21.51 110.48 1.63 16.46
CVTLLS 8.58 43.57 8.60 19.70 99.77 1.75 14.66 8.63 39.09 7.59 22.07 113.58 1.57 17.47
QIS 13.33 45.37 8.66 29.39 153.96 1.78 50.00 10.83 40.13 7.65 27.00 141.56 1.57 43.42
QuEST 11.74 41.80 7.98 28.08 147.08 1.65 45.46 11.22 39.50 7.59 28.41 147.83 1.56 38.93
LShriCC 11.58 42.35 8.09 27.35 143.20 1.65 45.44 10.53 39.32 7.49 26.77 140.45 1.53 31.42
LShri 9.60 40.71 7.99 23.58 120.17 1.63 29.47 9.82 38.30 7.48 25.63 131.32 1.53 29.43
BPSEst 10.57 50.60 9.79 20.89 107.96 1.99 9.61 10.55 42.55 8.16 24.78 129.19 1.66 45.90
FMEst – – – – – – – 11.14 43.29 8.18 25.74 136.16 1.67 50.00
POET I 11.05 39.64 7.93 27.87 139.26 1.62 37.01 10.12 37.75 7.41 26.80 136.58 1.51 30.64
POET II 17.72 44.43 8.63 39.88 205.22 1.77 50.00 12.29 40.19 7.66 30.57 160.47 1.57 42.86
BN 9.01 40.92 8.14 22.02 110.70 1.67 26.18 10.32 38.35 7.44 26.92 138.69 1.53 29.78
Sample – – – – – – – 11.58 46.15 8.59 25.09 134.87 1.75 50.00

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 4.28 44.52 10.28 9.61 41.60 2.15 20.89 7.22 42.10 8.13 17.14 88.76 1.66 15.06
CVTLLS µ/σ – 62.28 17.47 – – 3.68 23.74 9.08 39.72 7.75 22.86 117.23 1.59 16.31
BN VAR 13.02 43.09 8.82 30.21 147.52 1.78 42.16 11.72 43.87 8.73 26.71 134.19 1.79 47.43
NC2R 10.08 44.78 8.83 22.51 114.17 1.78 46.29 8.23 43.10 8.55 19.09 96.18 1.74 40.59
CT – – – – – – – 9.58 41.20 8.02 23.25 119.45 1.64 38.73
1/N 8.07 51.48 9.98 15.68 80.93 2.02 0.76 7.62 50.78 9.77 15.00 77.93 1.99 0.80

Note: Values are given in percent based on N = 10 portfolio constituents (industries) with D ∈ {1.33, 0.40}. The
covariance estimator with the best score per metric is highlighted in bold. Underlined values indicate a significant
difference from CVTL with p < 0.05.
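The 1/N rows report a turnover (Turn.) that is small but not zero, even though the equal-weight target never changes. One common definition, assumed here rather than taken from the paper, measures turnover at each rebalancing date as the sum of absolute trades needed to move from the drifted weights (after the period's returns) to the new target weights. Under that definition an equal-weight portfolio still trades whenever its constituents' returns diverge. The helper names are hypothetical.

```python
import numpy as np


def drifted_weights(w, period_returns):
    """Portfolio weights at the end of a holding period, after prices move."""
    grown = np.asarray(w, dtype=float) * (1.0 + np.asarray(period_returns, dtype=float))
    return grown / grown.sum()


def turnover(w_prev, period_returns, w_new):
    """Sum of absolute trades from the drifted weights to the new target weights."""
    return np.abs(np.asarray(w_new, dtype=float) - drifted_weights(w_prev, period_returns)).sum()
```

With two equally weighted assets returning +10% and -10%, the weights drift to (0.55, 0.45), and rebalancing back to 1/N costs a turnover of 0.1.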




Table EC.24 Other Performance Metrics on FFI Data for N=30.


D High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 9.68 38.50 7.44 25.13 130.06 1.51 29.95 9.47 36.42 7.24 25.99 130.82 1.49 16.35
CVTLLS 9.81 38.47 7.45 25.51 131.68 1.51 28.95 9.59 36.31 7.20 26.42 133.15 1.49 16.56
QIS 9.80 40.36 7.57 24.29 129.52 1.55 50.00 9.10 37.72 7.45 24.14 122.26 1.54 33.57
QuEST 8.94 39.27 7.48 22.76 119.55 1.52 47.92 9.04 38.03 7.49 23.77 120.64 1.54 31.02
LShriCC 7.81 40.17 7.46 19.44 104.63 1.54 50.00 9.20 38.57 7.43 23.85 123.72 1.54 31.12
LShri 9.16 39.36 7.57 23.27 120.92 1.54 50.00 9.52 37.61 7.40 25.31 128.62 1.53 35.72
BPSEst 10.92 50.73 9.45 21.52 115.52 1.93 50.00 9.50 40.37 7.82 23.53 121.46 1.61 50.00
FMEst – – – – – – – 9.20 41.12 7.92 22.38 116.24 1.63 50.00
POET I 9.64 43.49 8.25 22.16 116.81 1.70 50.00 6.81 38.44 7.48 17.71 91.05 1.54 33.14
POET II 9.70 40.90 7.85 23.71 123.54 1.62 50.00 8.34 39.16 7.59 21.29 109.81 1.55 33.11
BN 9.85 38.63 7.48 25.49 131.61 1.51 30.63 8.89 37.46 7.44 23.74 119.59 1.52 23.40
Sample – – – – – – – 8.90 43.44 8.28 20.49 107.44 1.72 50.00

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 12.48 43.04 8.25 29.01 151.32 1.68 27.19 10.38 39.59 7.86 26.23 132.15 1.61 13.02
CVTLLS µ/σ 9.45 40.82 7.89 23.15 119.84 1.61 26.43 9.83 38.73 7.63 25.37 128.80 1.57 11.85
BN VAR 11.61 41.48 8.12 28.00 142.95 1.65 45.46 9.80 41.18 8.27 23.81 118.56 1.70 32.80
NC2R 9.39 46.68 9.02 20.11 104.12 1.81 50.00 9.64 44.59 8.96 21.62 107.61 1.84 35.50
CT – – – – – – – 10.29 40.89 8.25 25.17 124.70 1.68 36.77
1/N 12.47 56.56 10.45 22.04 119.27 2.13 0.95 11.70 56.59 10.94 20.68 106.99 2.24 0.89

Note: Values are given in percent based on N = 30 portfolio constituents (industries) with D ∈ {1.33, 0.40}. The
covariance estimator with the best score per metric is highlighted in bold. Underlined values indicate a significant
difference from CVTL with p < 0.05.

Table EC.25 Other Performance Metrics on FFI Data for N=49.


D High-Dimensional Problem (D=1.33) Low-Dimensional Problem (D=0.40)
Metric Ret MDD DwVol Calm. Sort. VaR Turn. Ret MDD DwVol Calm. Sort. VaR Turn.
Estimators with Objective Minimum Variance
CVTL 9.60 36.07 6.93 26.61 138.55 1.41 30.10 9.28 33.98 6.59 27.30 140.78 1.35 16.48
CVTLLS 9.58 35.78 6.91 26.79 138.60 1.40 29.09 9.33 33.84 6.60 27.57 141.32 1.34 17.31
QIS 7.86 38.57 7.15 20.38 109.99 1.46 50.00 7.74 36.50 6.80 21.20 113.81 1.38 31.52
QuEST 7.87 37.97 7.10 20.72 110.76 1.46 43.77 8.29 36.17 6.77 22.93 122.46 1.38 28.80
LShriCC 10.89 37.10 7.06 29.35 154.28 1.46 50.00 9.15 36.20 6.90 25.29 132.57 1.41 34.39
LShri 8.42 38.61 7.18 21.82 117.34 1.47 50.00 8.68 36.27 6.89 23.93 126.02 1.40 37.11
BPSEst 11.44 46.78 9.25 24.47 123.68 1.88 50.00 6.84 40.03 7.50 17.09 91.16 1.54 50.00
FMEst – – – – – – – 6.54 40.89 7.59 16.00 86.21 1.56 50.00
POET I – – – – – – – 7.66 36.29 6.86 21.12 111.76 1.40 32.53
POET II 8.98 38.88 7.28 23.09 123.30 1.49 50.00 9.35 36.70 6.95 25.49 134.54 1.42 30.60
BN 8.82 36.50 7.07 24.17 124.78 1.45 27.30 8.41 35.17 6.77 23.91 124.14 1.38 20.62
Sample – – – – – – – 6.21 43.16 7.97 14.39 77.96 1.63 50.00

Estimators with Objective Maximum Risk-Adjusted Return


CVTL µ/σ 9.72 38.77 7.62 25.06 127.57 1.53 20.99 10.85 34.79 6.89 31.19 157.51 1.40 14.13
CVTLLS µ/σ 9.53 36.29 7.05 26.26 135.11 1.43 29.31 10.16 34.65 6.85 29.33 148.25 1.39 13.60
BN VAR 10.59 39.40 7.86 26.89 134.82 1.60 36.21 11.36 39.13 7.85 29.02 144.63 1.60 27.22
NC2R 9.36 44.57 8.73 20.99 107.50 1.77 36.27 9.61 42.88 8.46 22.41 113.56 1.71 26.58
CT – – – – – – – 9.06 40.31 7.88 22.48 115.07 1.59 35.22
1/N 9.78 55.47 10.73 17.62 91.08 2.17 0.94 11.99 54.55 10.66 21.99 112.48 2.15 0.89

Note: Values are given in percent based on N = 49 portfolio constituents (industries) with D ∈ {1.33, 0.40}. The
covariance estimator with the best score per metric is highlighted in bold. Underlined values indicate a significant
difference from CVTL with p < 0.05.
