This action might not be possible to undo. Are you sure you want to continue?

BooksAudiobooksComicsSheet Music### Categories

### Categories

### Categories

Editors' Picks Books

Hand-picked favorites from

our editors

our editors

Editors' Picks Audiobooks

Hand-picked favorites from

our editors

our editors

Editors' Picks Comics

Hand-picked favorites from

our editors

our editors

Editors' Picks Sheet Music

Hand-picked favorites from

our editors

our editors

Top Books

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Audiobooks

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Comics

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Sheet Music

What's trending, bestsellers,

award-winners & more

award-winners & more

Welcome to Scribd! Start your free trial and access books, documents and more.Find out more

www.comisef.eu

COMISEFWORKINGPAPERSSERIES

WPS-04521/09/2010

HeuristicStrategiesin

Finance– AnOverview

M.Lyra

- Marie Curie Research and Training Network funded by the EU Commission through MRTN-CT-2006-034270 -

Heuristic Strategies in Finance – An Overview

∗

Marianna Lyra

†

September 21, 2010

Abstract

This paper presents a survey on the application of heuristic opti-

mization techniques in the broad ﬁeld of ﬁnance. Heuristic algorithms

have been extensively used to tackle complex ﬁnancial problems, which

traditional optimization techniques cannot eﬃciently solve. Heuristic

optimization techniques are suitable for non-linear and non-convex

multi-objective optimization problems. Due to their stochastic fea-

tures and their ability to iteratively update candidate solutions, heuris-

tics can explore the entire search space and reliably approximate the

global optimum. This overview reviews the main heuristic strategies

and their application to portfolio selection, model estimation, model

selection and ﬁnancial clustering.

Keywords: ﬁnance, heuristic optimization techniques, portfolio manage-

ment, model selection, model estimation, clustering.

∗

Financial support from the EU Commission through MRTN-CT-2006-034270

COMISEF is gratefully acknowledged.

†

Department of Economics, University of Giessen, Germany.

1

Contents

1 Introduction 4

2 Heuristic Optimization Techniques 5

2.1 Trajectory Methods . . . . . . . . . . . . . . . . . . . . . . . . 6

2.1.1 Simulated Annealing . . . . . . . . . . . . . . . . . . . 6

2.1.2 Threshold Accepting . . . . . . . . . . . . . . . . . . . 6

2.2 Population based Methods . . . . . . . . . . . . . . . . . . . . 9

2.2.1 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . 9

2.2.2 Diﬀerential Evolution . . . . . . . . . . . . . . . . . . . 10

2.3 Hybrid Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.3.1 Hybrid Meta-Heuristics . . . . . . . . . . . . . . . . . . 13

2.3.2 Other Hybrids . . . . . . . . . . . . . . . . . . . . . . . 14

2.4 Technical Details . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.4.1 Constraint Handling Techniques . . . . . . . . . . . . . 15

2.4.2 Calibration Issues . . . . . . . . . . . . . . . . . . . . . 17

3 Response Surface Analysis for Diﬀerential Evolution 18

3.1 Response Surface Analysis . . . . . . . . . . . . . . . . . . . . 18

3.2 Implementation Details for Diﬀerential Evolution . . . . . . . 19

3.3 Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4 Heuristics in Finance 22

4.1 Portfolio Selection . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.1.1 Transaction Costs and integer Constraints . . . . . . . 24

4.1.2 Minimum Transaction Lots . . . . . . . . . . . . . . . 26

4.1.3 Cardinality versus Diversity . . . . . . . . . . . . . . . 27

4.1.4 Index Tracking . . . . . . . . . . . . . . . . . . . . . . 29

4.1.5 Downside Risk Measures . . . . . . . . . . . . . . . . . 30

4.2 Robust Model Estimation . . . . . . . . . . . . . . . . . . . . 31

4.2.1 Nonlinear Model Estimation . . . . . . . . . . . . . . . 31

4.2.2 Robust Estimation of Financial Models . . . . . . . . . 35

4.3 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.3.1 Risk Factor Selection . . . . . . . . . . . . . . . . . . . 37

4.3.2 Selection of Bankruptcy Predictors . . . . . . . . . . . 37

4.4 Financial Clustering . . . . . . . . . . . . . . . . . . . . . . . 39

4.4.1 Credit Risk Assignment . . . . . . . . . . . . . . . . . 40

4.4.2 Portfolio Performance Improvement by Clustering . . . 41

4.4.3 Identiﬁcation of Mutual Funds Style . . . . . . . . . . 42

2

5 Conclusions 43

Bibliography 49

List of Tables

1 Optimal Orthogonal Design to minimize Median of Squares

Residuals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2 Optimal Rotatable Design to minimize Median of Squared

Residuals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Experimental Design Coding. . . . . . . . . . . . . . . . . . . 45

4 Experimental Design (Simulation Setup). . . . . . . . . . . . . 47

List of Figures

1 Decomposition of heuristic optimization techniques . . . . . . 7

2 The crossover mechanism in GA. . . . . . . . . . . . . . . . . 10

3 Crossover representation for the the ﬁrst and the last parameter 12

4 Classiﬁcation of Hybrid Meta-Heuristics . . . . . . . . . . . . 14

5 Calibration of technical parameters. . . . . . . . . . . . . . . . 18

6 Least median of squared residuals as a function of α and β . . 36

7 CCD Graph for three factors. . . . . . . . . . . . . . . . . . . 46

8 CCD with orthogonal design . . . . . . . . . . . . . . . . . . . 48

9 CCD with rotatable design . . . . . . . . . . . . . . . . . . . . 48

3

1 Introduction

The ﬁnancial paradigm entails simple mathematical models but often im-

poses unrealistic assumptions. Basic ﬁnancial models consist of linear and/or

quadratic functions which ideally can be solved by traditional optimization

techniques like quadratic programming. However, the ﬁnance community

soon realizes the complexity of the economic environment and the market

ineﬃciency. To cope with this, the ﬁnancial community develops more re-

alistic mathematical models which include more complex, often non-convex

and non-diﬀerentiable functions with real world constraints. Under these

more challenging conditions quadratic programming routines or other clas-

sical techniques will often not result in the optimal solution (or even in a

near-optimum).

Heuristic optimization methods

1

are ﬂexible enough to tackle many com-

plex optimization problems (without a linear approximation of the objective

function). Heuristics’ ﬂexibility stems either from the random initialization

or from the intermediate stochastic selection and acceptance criterion of the

candidate solutions. However, the optimization results should be carefully

interpreted, due to the random eﬀects introduced during the optimization

process. Besides, repetitions of the algorithm might generate heterogenous

solutions from an unknown distribution. To approximate the optimum solu-

tion without any prior knowledge of the distribution, Gilli and Winker (2009)

suggested repeating an experiment with diﬀerent parameter settings or ap-

plying extreme value theory. Also, Jacobson et al. (2006) developed a model

to approach the unknown distribution of the results.

This paper examines the modern ﬁnancial problems of portfolio selection,

robust model estimation and selection and ﬁnancial clustering. It analyzes

research papers published during the last two decades, where alternative

heuristic techniques were applied independently or complementarily to other

computational techniques. Furthermore, this overview extends the contribu-

tions collected in Gilli et al. (2008), who presented the most popular heuristics

and their usage in ﬁnance. Moreover, this work emphasizes and distinguishes

the implementation characteristics of several applications.

The paper is organized as follows: Section 2 reports on heuristic and

hybrid meta-heuristic methods and describes the basic algorithms. Also, it

discusses some implementation issues that should be considered when ap-

plying heuristics. Section 3 introduces in more detail the technical issue of

parameter setting in heuristic optimization algorithms. In due course, an

1

The expressions ‘heuristic optimization methods’, ‘heuristic strategies’, ‘heuristic tech-

niques’, ‘meta-heuristics’ and ‘heuristics’ will be used interchangeably throughout the pa-

per.

4

established methodology, Response Surface Methodology (RSM) (Box and

Draper (1987)), is used for parameter calibration. To the best of my knowl-

edge, RSM has not been applied so far in the context of parameter tuning for

heuristics. Section 4 addresses contemporary problems in ﬁnance and ana-

lyzes how heuristics can be applied to approach and overcome them. Finally,

Section 5 concludes and suggests further applications of heuristics to open

ﬁnancial questions.

2 Heuristic Optimization Techniques

Heuristic optimization techniques form part of the broad category of com-

putational ‘intelligent’ techniques which have, in principle, solved non-linear

ﬁnancial models. In comparison with other intelligent techniques, heuris-

tics need little parameter tuning and they are less dependant on the choice

of starting solutions. Heuristics provide an alternative or optimize the cur-

rent intelligent techniques, like Neural Networks (NN) (Ahn et al. (2006)).

The interaction of heuristics with other intelligent techniques is discussed in

Section 4.

A diverse range of optimization heuristics have been developed to solve

non-linear constrained ﬁnancial problems. The ﬁrst application of heuristics

in ﬁnance can be found in the work of Dueck and Winker (1992), who used

Threshold Accepting (TA) for portfolio optimization with diﬀerent utility

criteria. Their basic methodology is described in Section 4.1. This overview

explains the basic optimization heuristic algorithms applied in the recent

literature to ﬁnancial problems.

The two main classes of heuristic algorithms are construction heuristics

and improvement (or local search) heuristics. Construction heuristics start

from an empty initial solution and construct, e.g. by the means of nearest

neighborhood, the candidates that form an optimum solution (Hoos and

St¨ utzle (2005), ch.1). Local search heuristics iteratively update a set of

initial candidates that improve the objective function. While constructive

methods are easy and fast to implement and can be applied to problems like

scheduling and network routing, they do not appear in practical ﬁnancial

problems. This might be due to their tendency to local optima and their

problem-speciﬁc performance.

This chapter addresses only local search methods. Figure 1 decomposes

the local search methods to trajectory methods and population based meth-

ods. The mechanisms of neighborhood structure, as well as the number of

solutions that are updated simultaneously is what distinguish local search

methods. Here we present the four most common local search heuristics

5

found in the ﬁnancial literature. For an extensive description of heuristic

optimization techniques see also Winker (2001), ch.5.

2.1 Trajectory Methods

2.1.1 Simulated Annealing

Simulated Annealing (SA) as well as TA (section 2.1.2) are threshold methods

that accept a neighboring solution based on a threshold value. In that sense,

they can accept not only improvements but also controlled impairments of

the objective function value as long as a threshold is satisﬁed. SA is inspired

by the principals of statistical thermodynamics, where the changes (transfor-

mations) in the energy are related to macroscopic variables, i.e. temperature.

SA relates the change in the objective value to a probability threshold. The

probability value is subject to a macroscopic variable, T. T is iteratively

changed and the best objective function value is reported. Algorithm 1 de-

scribes the general outline of a SA implementation.

Initially, the algorithm randomly generates a solution (2:). Next, for

a predeﬁned number of iterations, I, the following steps are repeated. In

every iteration a new neighboring solution is randomly generated (4:). The

algorithm always accepts the new solution if it results in an improvement of

the objective function value. However, an impairment is also accepted with

a diminishing probability (6:), e

−∆/T

> u. u is a uniformly distributed value

lying in the interval [0, 1]. The new solution is carried onto the next iteration

and is used as a benchmark.

The probability value is inﬂuenced by changes in the objective function

value and the ‘temperature’ T. Ceteris paribus, the higher the (negative)

change, ∆, the lower the probability that an impairment will be accepted.

The ‘temperature’ T controls the tolerance of the algorithm in accepting

downhill moves. The lower its value is, the less tolerant it is to negative

changes. This is required, especially at the end of the search, where we believe

we are closer to the optimum solution and impairments are unwelcome.

By allowing moves towards worse solutions, the algorithm has a chance

to escape local minima. The number of iterations or the properties of the

temperature deﬁne the stopping criterion. In the latter case the temperature

reduces until an equilibrium state is found where no further change in the

objective function value can be observed.

2.1.2 Threshold Accepting

Dueck and Scheuer (1990) and Moscato and Fontanari (1990) conceived the

6

Heuristic Techniques

¸..

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

_

_

_

_

_

_

_

_

_

_

_

_

_

_

_

_

_

_

_

_

¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸

¸

¸

¸

¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸

Construction methods Local search methods

Trajectory methods

.

.

.

.

.

.

.

.

.

..

.

.

.

.

.

.

.

Population search

¸

¸

¸

¸

¸

¸

¸

¸

¸

¸

¸

¸

¸

Threshold

accepting

¸

¸

¸

Diﬀerential

evolution

¸

¸

¸

Simulated

Annealing

Genetic

algorithms

Hybrid Meta Heuristics

¸¸

Figure 1: Decomposition of heuristic optimization techniques

7

Algorithm 1 General Simulated Annealing Algorithm.

1: Initialize I, T

2: Generate at random a solution χ

0

3: for r = 1 to I do

4: Generate neighbor at random, χ

1

∈ N(χ

0

)

5: Compute ∆ = f(χ

0

) − f(χ

1

)

6: if ∆ < 0 or e

−∆/T

> u then

7: χ

0

= χ

1

8: end if

9: Reduce T

10: end for

idea of building a simple search tool appropriate for many types of optimiza-

tion problems named TA. Like SA, TA enables the search to escape local

minima by accepting not only improvements, but also impairments of the

objective function value. Nonetheless, the impairment acceptance is not a

probabilistic decision but rather a given threshold sequence. More precisely,

until the stopping criterion is met, the current candidate solution is com-

pared with a neighboring solution (∆ = f(χ

0

) − f(χ

1

)). A threshold value

(τ) determines to which extent not only local improvements, but also local

impairments are accepted (∆ < τ). For a comprehensive overview of TA, its

parameter settings and its convergence properties, see Winker (2001).

Winker and Fang (1997) and Winker and Maringer (2007) suggested a

data driven approach for the construction of the threshold sequence. The

threshold sequence is ex-ante constructed based on the search space. The lo-

cal diﬀerences of the ﬁtness function are sorted in descending orders. Thus,

they represent a diminishing threshold sequence. The reduction of the neigh-

borhood allows a wider search space at the beginning of the search and a

greedy search towards the end. Moreover, Lyra et al. (2010b) proposed a

threshold sequence based on the diﬀerences in the ﬁtness of candidate solu-

tions that are found in a speciﬁc place of the search process. Local diﬀerences

actually calculated during the optimization run are considered. As a result,

the threshold sequence adapts to the region where the current solution be-

longs and to the objective function used. By using a moving average, a

smooth threshold sequence is obtained. In addition, the thresholds decrease

linearly with the number of iterations.

8

2.2 Population based Methods

2.2.1 Genetic Algorithms

Genetic Algorithms (GA) are one of the oldest evolutionary optimization

methods. They were developed by Holland (1975). They have been applied

to numerous ﬁnancial problems including trading systems, stock and portfolio

selection, bankruptcy prediction, credit evaluation and budget allocation.

Their wide applicability springs from generating new candidate solutions

using logical operators and from evaluating the ﬁttest solution. A thorough

explanation of the algorithm follows.

Algorithm 2 depicts the implementation details of GA. First in (2:), a

population n

p

of initial solutions P

(0)

j,i

is randomly initialized. Usually, the

solutions are encoded in binary stings but, other representations can be used.

The objective function value (or the ﬁtness value) is estimated for every

initial solution. Second, the algorithm iteratively generates n

G

new candidate

solutions, ‘chromosomes’, as follows: The two ﬁttest candidate solutions P

(0)

.,r

1

and P

(0)

.,r

2

, ‘parent chromosomes’, are chosen from the initial set to generate a

new candidate solution, ‘reproduce’. The reproduction or update consists of

two mechanisms, the recombination of existing elements, ‘genes’, crossover

and random mutation.

Algorithm 2 Genetic Algorithms.

1: Initialize parameters n

p

, n

G

2: Initialize and evaluate population P

(0)

j,i

, j = 1, · · · , d, i = 1, · · · , n

p

3: for k = 1 to n

G

do

4: for i = 1 to n

p

do

5: Select r

1

,r

2

∈1, · · · ,n

p

, r

1

= r

2

= i

6: Crossover and mutate P

(0)

.,r1

and P

(0)

.,r2

to produce P

(u)

.,i

7: Evaluate f(P

(u)

.,i

)

8: if f(P

(u)

.,i

) < f(P

(0)

.,i

) then

9: P

(1)

.,i

= P

(u)

.,i

10: else

11: P

(1)

.,i

= P

(0)

.,i

12: end if

13: end for

14: end for

There are alternative ways to crossover which are problem-speciﬁc and can

improve the performance of GA. The simplest way is to choose randomly a

crossover point and copy the elements before this point from the ﬁrst parent

and the elements after the crossover point from the second parent. This

way, we construct the ﬁrst child chromosome. The second child chromosome

9

↓

crossover-point

parent

1

=

1 1 0 1 | 0 1 0 ... 1

1×k

parent

2

=

1 0 1 0 | 1 1 0 ... 1

1×k

child

1

=

1 1 0 1 | 1 1 0 ... 1

1×k

child

2

=

1 0 1 0 | 0 1 0 ... 1

1×k

Figure 2: The crossover mechanism in GA.

combines the second part of the ﬁrst parent chromosome and the ﬁrst part

of the second parent chromosome. Figure 2 shows such a simple crossover

procedure. An alternative crossover operators is to select more than one

random crossover points (Back et al. (1996)) or to apply a uniform crossover

(Savin and Winker (2010)).

To complete the update, a random mutation of the elements takes place.

During mutation the new elements change randomly. In the case of binary

encoding a few randomly chosen bits change from 1 to 0 or from 0 to 1.

Hence, for each member of the population a new candidate solution, ‘children

or oﬀspring’, is produced and evaluated (7:). In some implementations of GA

a number of elite solutions, ‘elite chromosomes’, are transferred unmodiﬁed

to the next step. These are solutions with distinguished characteristics like

good ﬁtness value. Finally, only the solutions with the best objective function

value survive the generation evolution.

2.2.2 Diﬀerential Evolution

Similar to GA, Diﬀerential Evolution (DE) is a population based technique,

originated by Storn and Price (1997), but more appropriate for continuous

problems. The attractiveness of this technique is the little parameter tuning

needed.

During the initialization phase, Algorithm 3 (1:), the population size

n

p

and the generation number n

G

should be determined. Two technical

parameters F and CR should also be initialized at this stage. In order to

achieve convergence, n

p

should be increased more than ten times the number

10

Algorithm 3 Diﬀerential Evolution.

1: Initialize parameters n

p

, n

G

, F and CR

2: Initialize population P

(1)

j,i

, j = 1, · · · , d, i = 1, · · · , n

p

3: for k = 1 to n

G

do

4: P

(0)

= P

(1)

5: for i = 1 to n

p

do

6: Generate r

1

,r

2

,r

3

∈1, · · · ,n

p

, r

1

= r

2

= r

3

= i

7: Compute P

(υ)

.,i

= P

(0)

.,r1

+ F × (P

(0)

.,r2

- P

(0)

.,r3

)

8: for j = 1 to d do

9: if u < CR then

10: P

(u)

j,i

= P

(υ)

j,i

11: else

12: P

(u)

j,i

= P

(0)

j,i

13: end if

14: end for

15: if f(P

(u)

.,i

) < f(P

(0)

.,i

) then

16: P

(1)

.,i

= P

(u)

.,i

17: else

18: P

(1)

.,i

= P

(0)

.,i

19: end if

20: end for

21: end for

of parameters to be estimated,

2

while the product of n

p

and n

G

deﬁnes the

computational load. Price et al. (2005) in their book about DE state that,

although the scale factor F has no upper limit and the crossover parameter

CR is a ﬁne tuning element, both are problem speciﬁc. Winker et al. (2010)

in an application of DE propose values for F = 0.8 and CR = 0.9. Their

calibration procedure as well as a more statistically oriented methodology

are explained in Section 3.

As mentioned above, the initial population of real numbers is randomly

chosen and evaluated (2:). Then, for a predeﬁned number of generations the

algorithm performs the following procedure. According to the characteristics

of a population based approach a set of population solutions P

(0)

.,i

is updated.

The algorithm updates a set of parameters through diﬀerential mutation (7:)

and crossover (9:) using arithmetic instead of logical operations.

While both in GA and DE the mutation serves as a mechanism to over-

come local mimina their implementation diﬀers. Diﬀerential mutation gen-

erates a new candidate solution by linearly combining some randomly chosen

elements and not by randomly mutated elements, as GA does, whereas each

element of the current solution crosses with a uniformly designed candidate

2

A practical advice for optimizing objective functions with DE given on

www.icsi.berkeley.edu/ ∼ storn/.

11

and not with an existing one.

Particularly, diﬀerential mutation constructs new parameter vectors P

(υ)

.,i

by adding the scaled diﬀerence of two randomly selected vectors to a third

one. (7:) demonstrates the ‘metamorphosis’ (modiﬁcation), where F is the

scale factor that determines the speed of shrinkage in exploring the search

space.

Further, during crossover, (9:), DE combines the initial elements with the

new candidates. With a probability CR, each component P

(0)

j,i

is replaced by

a mutant one P

(υ)

j,i

resulting in a new trial vector P

(u)

j,i

(Figure 3).

Finally, the value of the objective function f(P

(u)

.,i

) of the trial vector is

compared with that of the initial element f(P

(0)

.,i

). Only if the trial vector

results in a better value of the objective function, it replaces the initial ele-

ment in the population. The above process repeats until all elements of the

population are considered and for a predeﬁned number of generations.

.

.

.

.

.

.

u

1

u

d

CR CR

uniform crossover

P

(u)

1,i

. . .

P

(u)

d,i

.

.

.

.

.

.

u

1

u

d

CR CR

uniform crossover

P

(u)

1,i

. . .

P

(u)

d,i

Figure 3: Crossover representation for the the ﬁrst and the last parameter

12

2.3 Hybrid Methods

2.3.1 Hybrid Meta-Heuristics

Heuristic optimization algorithms, also known as meta-heuristics in their

general setting, are suitable for optimizing many combinatorial problems.

Though, as diﬀerent meta-heuristics exhibit diﬀerent (advantageous) char-

acteristics, a combination of them can be beneﬁcial for ﬁnding the optimal

solution. When two or more meta-heuristics are combined, new hybrid com-

putational intelligent techniques are constructed. The basic advantage of

hybrid meta-heuristics is that they can combine the outstanding character-

istics of speciﬁc heuristics.

Numerous hybrid meta-heuristics can be constructed when combining dif-

ferent heuristics and their characteristics. The taxonomy based on hybrids’

design (algorithm architecture) merges a hierarchical and an independent

scheme (see Talbi (2002)). Figure 4 presents this classiﬁcation. The ﬁrst

two columns represent the hierarchical classiﬁcation, while the last column

the ﬂat scheme. A low-level hybrid crosses components of diﬀerent meta-

heuristics. In a high-level method the meta-heuristics are autonomous (col-

umn 1). Both a high and a low level methods distinguish relay and cooper-

ative hybridization (column 2). In the latter group the heuristics cooperate

and in the former group they are combined in a sequence.

Furthermore, hybrid meta-heuristics can be categorized based on other

characteristics, independent of the hierarchy. These are homogenous or het-

erogenous, global or partial, and general or special. When identical in struc-

ture meta-heuristics are used, a homogenous hybrid is constructed. Two

diﬀerent meta-heuristics construct a heterogenous hybrid. When the whole

search space is explored by heuristics we have a global hybridization. In a

partial hybrid, diﬀerent heuristics explore only a part of the solution space.

General hybrid algorithms are suitable for solving the same type of problems,

whereas special hybrids combine meta-heuristics that solve diﬀerent types of

problems. Besides their advantages, hybrids should be applied with caution

since algorithm performance will be well aﬀected by parameter tuning, which

becomes more intense as compared to single heuristics.

Two characteristics distinguish optimization heuristics; the number of so-

lutions they iteratively update and the mechanism for this update. The ﬁrst

characteristic refers to the distinction between trajectory search and popu-

lation search heuristics and the second to the mechanism of neighborhood

solution construction. Let us take a threshold heuristic as an example, i.e.

TA, and a population based heuristic, i.e. GA. The new hybrid resulted from

a composition of a threshold and a population based method is a high-level or

13

Low

¸

¸

¸

¸

¸

¸

¸

¸

¸

Coop

·

·

·

·

·

·

·

·

·

Homo/Hetero

Global/Partial

High Relay General/Special

Figure 4: Classiﬁcation of Hybrid Meta-Heuristics

heterogenous, global, general hybrid also known as Memetic Algorithm (MA)

(Moscato (1989)). On the one hand, a GA can explore eﬃciently a wider

search space using the cross over and mutation mechanisms. On the other

hand, TA can well escape local minima by its acceptance criterion. It accepts

neighborhood solutions even if they result in a deterioration, as long as they

do not exceed the threshold value. Thus, combining these meta-heuristics

can be beneﬁcial for the algorithm convergence. In the next sections some

applications of hybrid meta-heuristics are described, e.g. Maringer (2005)

used a MA in the context of risk factor and portfolio selection.

2.3.2 Other Hybrids

Often in the literature, additional hybridizations of meta-heuristics with ei-

ther constructive methods or other intelligent techniques can be found. Some

selected applications are presented in the following.

Gilli and Winker (2003) constructed a hybrid optimization technique us-

ing the Nelder-Mead simplex search and the TA algorithm. They model the

behavior of foreign exchange (DM/US-$) market agents while optimizing the

Agent Based Model (ABM) parameters. The Nelder-Mead simplex search

(Lagarias et al. (1999)) has the advantage of ﬁnding an optimum solution

with few computational steps.

3

Starting from d + 1 (in this application 3)

initial solutions, the search moves to a better solution that is a reﬂection of

this initial triangle or makes bigger steps if the reﬂection results in an im-

provement of the objective function value. However, if the simplex method

cannot ﬁnd a reﬂection point that outperforms the initial parameter settings,

the search space shrinks and the algorithm gives a solution which is nothing

more than a local optimum. To overcome the problem, the authors applied

the acceptance criterion of TA (Section 2.1.2). Using that technique they

showed that interactions between market agents could realistically represent

foreign exchange market expectations.

3

The simplex search chooses an eﬃcient step size.

14

In a diﬀerent application, Trabelsi and Esseghir (2005), Ahn et al. (2006)

and Min et al. (2006) used GA together with other intelligent technics to

select an optimal set of bankruptcy predictors. Besides model selection,

these papers applied GA to tune the parameters of bankruptcy prediction

models. Trabelsi and Esseghir (2005) constructed a hybrid algorithm which

evolves an appropriate (optimal) artiﬁcial neural network structure and its

weights. Ahn et al. (2006) and Min et al. (2006) forecast bankruptcy with

Support Vector Machines (SVM). GA selected the best set of predictors and

estimated the parameters of a kernel function to improve SVM performance

using data from Korean companies. The latter paper stressed the potentials

of incorporating GA to SVM in future research.

Varetto (1998) applied GA to train NN. The meta-heuristic was used not

only for model selection, but also for weight estimation.

Despite of the potentials of meta-heuristics in tuning or training other

techniques, e.g. NN and SVM, they should be applied with caution. First,

the more unknown parameters a heuristic algorithm has, the more complex

an algorithm is. Hence, it will be diﬃcult to evaluate the consistency of the

results. So, keeping the implementation simple is more advisable. Second,

diﬀerent heuristics are suitable for diﬀerent types of problems. Selecting the

appropriate heuristic technique will improve the performance of the hybrid,

e.g. the application of DE, instead of GA, in weight estimation might im-

prove both the eﬃciency and the computational time of the hybrid since DE

appears to be more suitable for continuous problems.

2.4 Technical Details

2.4.1 Constraint Handling Techniques

Real world ﬁnancial problems face simultaneously several constraints. Heuris-

tics are able to consider almost any kind of constraints, resulting generally

in good approximations of the optimal solution. However, the presence of

constraints makes the search spaces discontinuous and ﬁnding a solution that

satisﬁes all the constraints might be a demanding task. Several methods to

handle constraints are suggested in the literature so as to ensure at the end

a feasible solution.

When running the optimization heuristics, the problem constraints have

to be taken into account. To this end, alternative methods can be considered:

rewriting the deﬁnition of domination, such that it includes the constraint

handling; imposing a penalty on infeasible solutions; applying a repair func-

tion to ﬁnd the closest feasible solution to an infeasible one.

The ﬁrst possibility has been described for the application of DE in credit

15

risk assignment in Krink et al. (2007). The intuitive idea of this constraint

handling technique is to leave the infeasible area of the search space as quickly

as possible and never return. For minimization problems, the procedure can

be described as follows within Algorithm 3, Section 2.2.2:

1. If the new candidate solution P

(u)

j,i

and the current candidate solution

P

(0)

j,i

satisfy the side constraints, P

(u)

j,i

replaces P

(0)

j,i

if its ﬁtness f(P

(u)

j,i

)

satisﬁes the condition f(P

(u)

j,i

) ≤ f(P

(0)

j,i

).

2. If only one candidate solution is feasible, select the feasible one

3. If both solutions violate constraints, . . .

(a) . . . select the one that violates fewer constraints.

(b) . . . if both solutions violate the same number of constraints, P

(u)

j,i

replaces P

(0)

j,i

if its ﬁtness f(P

(u)

j,i

) satisﬁes the condition f(P

(u)

j,i

) ≤

f(P

(0)

j,i

).

In the second possibility, a penalty technique allows infeasible candidate

solutions while running the algorithm as a stepping stone to get closer to

promising regions of the search space. In this case, a penalty term is added

to the objective function. Solutions should be penalized the more they violate

the constraints.

Dueck and Winker (1992) (Section 4.1) provide an implementation of TA

where infeasible solutions are accepted to pass on to the next step by impos-

ing a punishment function. The penalty function is added to the objective

function (utility function), which ensures at the end a feasible solution. Lyra

et al. (2010b) (Section 4.4) implemented TA for credit risk assignment with

both the constraint-dominated handling technique and a penalty technique.

To guarantee a feasible solution at the end and avoid early convergence to a

local minimum, the penalty was increased over the runtime of the algorithm.

An increasing penalty function was also applied by Gimeno and Nave (2009)

(Section 4.2) in an GA implementation to parameter estimation.

In contrast, Krink et al. (2009) in an application of DE to index tracking

(Section 4.1.4), applied a repair function to ﬁnd the closest feasible solution

to an infeasible one.

Generally, Lyra et al. (2010b) found that the constraint-dominated han-

dling technique performed well while taking comparatively little computation

time. However, depending on the kind of objective function used the penalty

technique may improve the reliability of heuristics, i.e. reduce the variance

of the results obtained. Depending on the type of the objective function,

16

diﬀerent penalty weight functions can be applied, e.g. linear, exponential,

increasing or decreasing with the iteration number. Since calculation of the

penalty weight function might increase the CPU time, a rather simple penalty

function is recommended.

2.4.2 Calibration Issues

The selection of parameter values for a heuristic optimization algorithm is

essential for its eﬃciency. These parameters are the iteration number, the

neighborhood structure and the new solution (candidate) acceptance crite-

rion. A simple, but sometimes time consuming way to select the parameter

values is to iteratively run the algorithm for diﬀerent parameter values and

then extract some descriptive statistics (Maringer (2005)). Nonetheless, a

regression analysis or a Response Surface Analysis (Box and Draper (1987)))

can also be tested. For a small number of parameters, the above methods

are eﬃcient and relatively fast in ﬁnding the optimal parameter settings for

a heuristic algorithm.

4

Alternatively, in higher dimensions (higher number

of parameters) a heuristic can optimize the parameters of another heuristic.

Winker et al. (2010) implemented DE for LMS estimation of the CAPM

and the multifactor model of Fama-French. A detailed description of this

optimization problem is given in Section 4.2.2. While DE, as any optimiza-

tion heuristic technique, should not depend on the speciﬁc choice of starting

values, the optimal combination of the technical parameters, F and CR, de-

termines the speed of convergence to the global solution and the variability

of the results, respectively. To ﬁnd the optimal combination of F and CR,

the authors run the algorithm for diﬀerent combinations of F and CR. The

procedure is illustrated in Algorithm 4 for parameter values ranging from 0.5

to 0.9.

Algorithm 4 Calibration Issues.

1: Initialize parameters n

p

, n

G

2: Initialize population P

(1)

j,i

, j = 1, · · · , d, i = 1, · · · , n

p

3: for F = 0.5, · · · , 0.9 do

4: for CR = 0.5, · · · , 0.9 do

5: Repeat Algorithm 3 from line 3-21

6: end for

7: end for

Figure 5 exhibits the dependence of the best objective value obtained for

4

See Section 3 for an application of Response Surface Analysis to the estimation of DE

parameters in an application to Least Median of Squares (LMS) estimation of the CAPM

and a multifactor model.

17

diﬀerent combinations of F and CR for the LMS objective function (Sec-

tion 4.2.2) using the ﬁrst 200 daily stock returns of the IBM stock starting

on January, 2nd, 1970. The population size n

p

and the number of generations

n

G

are set to 50 and 100, respectively. The left side of Figure 5 presents the

results for a single run of the algorithm, while the right side shows the mean

over 30 restarts. Although the surface is full of local minima for CR below

0.7, it becomes smoother as CR reaches 0.8 independent of the choice of F.

The results clearly indicate that for higher values of CR, results improve,

while the dependency on F appears to be less pronounced. Based on these

results, values of F = 0.8 and CR = 0.9 are proposed for estimating the

parameters of the CAPM and the Fama-French factor model.

0.6

0.8

1

0.6

0.8

1

5

5.5

6

x 10

−5

CR

F

f

0.6

0.8

1

0.6

0.8

1

5

5.1

5.2

x 10

−5

CR

F

f

Figure 5: Calibration of technical parameters.

Instead of repeating a core algorithm numerous times,

5

a more eﬃcient ex-

perimental design methodology can be applied, namely RSM.

3 Response Surface Analysis for Diﬀerential

Evolution

3.1 Response Surface Analysis

RSM helps explore the relationship between the objective function value and

the level of parameters using only few level combinations (Box and Draper

(1987)). The aim of RSM is to ﬁnd the parameters value with little tuning

and without precise prior knowledge of their optimum value.

RSM consists of three main stages. In the ﬁrst stage, the input variables

are initialized. The initialization is based on a prior knowledge about the best

5

In this case 41 × 41 for all combinations of F and CR ranging in 0.5:0.01:0.9.

18

combination of parameters that will optimize the objective function. These

values constitute the reference or the center points. Next, the experimental

domain is speciﬁed. It is based on orthogonal and/or rotatable experimental

designs.

6

Inside this domain the factors are assigned speciﬁc values (levels),

at which the response variable is estimated.

In the ﬁnal stage, a second order polynomial is ﬁtted using the designed

experiments. After diﬀerentiation, this results in the optimal set of param-

eters. The reason for choosing a polynomial relationship is that no prior

information about the existing relationship between the input and the re-

sponse variable is needed.

y

i

= β

0

+

β

i

X

i

+β

ii

X

2

i

+β

ij

X

i

X

j

+ε , (1)

where

y

i

minimum median of squared residuals for i − th setup of input variables

β

′

s parameters of equation

X

′

s input variables for setup i

ε residual .

To ﬁnd the optimal combination of factors’ value that optimize the objec-

tive function of interest one should ﬁrst understand how the objective func-

tion value changes with possible changes in the factors’ level. To discover

the relationship, the most common process is trial and error. By randomly

changing the factor values (either simultaneously or one at a time) one can

investigate the reaction of the objective function value. If the lower and up-

per bounds of the factor levels are known, one can evaluate the objective

function for these factor combinations and then make some inference about

the possible relationship between the parameters and the objective function.

Alternatively, one can model the relationship by considering either a linear

or a non-linear shape. When the shape of the relationship is not known a

priori, for three factors, a second order polynomial relationship can be tested

(see Equation 1).

7

This process approximates either linear or quadratic re-

lationship.

3.2 Implementation Details for Diﬀerential Evolution

RSM explores the relationship between several input variables (factors) and a

response output. In this exercise, the performance of RSM is tested in ﬁnding

6

Orthogonal and rotatable designs are described in detail in what follows.

7

A higher order polynomial can be tested when there are enough data points.

19

the optimal combination of DE parameters value in the speciﬁc application

of obtaining the CAPM parameters to minimize the median of squares error.

The input variables are the (scaling) factor F, the crossover probability

CR and the population size n

p

. Note that for DE the product of the popu-

lation size n

p

and number of generations n

G

determines the computational

complexity of the algorithm. Hence, the aim is to ﬁnd the optimal combi-

nation of the parameters F, CR and n

p

for a given computational load n

p

× n

G

. For that, the computational load (n

p

× n

G

) is kept constant. Three

diﬀerent levels of computational resources are considered, with n

G

× n

p

=

500, 2500 and 5000 meaning, low, medium and high complexity, respectively.

Since the product of n

p

× n

G

should be constant, an adjustment is made on

n

G

in each alteration.

The response variable is the minimum value obtained for

min

α,β

(med(ε

2

i,t

)) , (2)

where ε

i,t

= y

i,t

− α −βx

i,t

are the residuals of a factor model.

To specify the factor levels that help estimate the parameters of Equation

1, a Central Composite Design (CCD) is applied. CCD is an experimental

method that eﬃciently builds a second order model for the response variable

(Barker (1985)). Eﬃciency means to obtain a representative relationship

between the parameters and the objective function of interest with the least

parameter tuning.

CCD contains ﬁve level-combinations. First, a two level full factorial de-

sign assigns two levels, a low and a high level, to the three factors (input vari-

ables). Thus, eight experiments result from the two level-combinations (2

3

)

in our problem. A two level factorial combination alone can capture a pos-

sible linear relationship between the input variable and the response output.

The two levels range symmetrically around a center point (e.g. F

center

±0.1).

In addition to the two level factorial design, three additional levels are as-

signed to the factors. That is, ﬁrst, a level that represents the best known

value for the factors (F

center

). Also, in order to expand the experimental

domain, two additional levels are considered, both above and below the high

and low values considered above.

By using ﬁve levels for each factor a curvilinear relationship, even up

to a fourth level, or a quartic relationship can be investigated. Besides,

we are able to search the optimal factor levels over a wider experimental

domain. The eﬃciency of the CCD stems from the fact that it uses only 18

level combinations to calibrate the input variables. Alternatively, a ﬁve level

full factorial design uses 5

3

= 125 level-combinations. The Appendix gives a

detailed explanation and a numerical representation of the CCD experiments.

20

For each level-combination of input variables a response value is obtained,

the minimum median of squared residuals, Equation 2. To control for the

randomness in the heuristic’s output, the experiments are repeated 10 times.

The mean response value (mean out of 10 repetitions of Equation 2) for

each combination is reported. Using the response values from all setups

(experiments), a second order polynomial is ﬁtted.

Factors’ optimal values are determined by diﬀerentiating the ﬁtted second

order polynomial with respect to each input variable, Equation 3.

8

∂y

∂X

i

= β

i

+ 2β

ii

X

i

+β

ij

X

j

= 0 (3)

Then, the optimal combination of input variables is used to estimate

Equation 2. The median of squares residuals for CAPM estimation is com-

pared with the best known optimum reported by Winker et al. (2010).

3.3 Output

Tables 1 and 2 report the optimal combination of DE parameters, as sug-

gested by RSM using orthogonal and rotatable designs, respectively. We

report results for diﬀerent computational complexities, low, medium and

high (columns 1-4). The optimal values of n

p

are rounded up to the nearest

integer. Since the values of n

p

are large enough, the rounding does not af-

fect the mathematical results of the design (does not inﬂuence dramatically

the objective function value). The reported optimal parameter combinations

are used to evaluate the performance of DE. For that, Equation 2 is evalu-

ated and the median of squares residuals for CAPM estimation is reported.

The evaluation is repeated 30 times to control for the stochastic eﬀects of

heuristics. Columns 5 to 11 report for each set of parameters the best value,

the median, the worst value, the variance, the 5th percentile, the 90th per-

centile, and the frequency of the best value occurs over 30 repetitions of the

algorithm.

Table 1 presents the results of RSM with orthogonal design. The output

suggests that the level of the scale factor F is irrelevant for the convergence of

DE especially when CR is above 0.7. With n

p

at least ten times the number

8

For this step the build-in Matlab 7.6 function, quadprog is used. This tool uses a

reﬂective Newton method for minimizing a quadratic function. It is eﬀective for quadratic

functions when only bound constraints are considered. In our problem the lower bound

of F, CR, n

p

reﬂect the lower level CCD design, while the upper bounds are 2, 1 and inf,

respectively. The upper bound of F is based on Price et al. (2005), CR is a probability,

whereas n

p

has no real upper limit. The choice of lower bounds prevents the Newton

method from stacking in a local optimum.

21

of parameters of the objective function (parameters of CAPM) and CR above

0.7, the algorithm can converge to the best results reported in Winker et al.

(2010). Although, for medium computational load the algorithm ﬁnds the

global optimum just once, in 13 other cases it approaches the optimum with

accuracy of 4 decimals. For high computational complexity the algorithm is

more stable and results in the best reported optimum (Winker et al. (2010))

in all 30 repetitions.

Table 2 reports the results of RSM for rotatable design. RSM results

in slightly diﬀerent optimal parameter settings. Yet, n

p

must be at least

ten times the number of parameters in order for the algorithm to reach the

global optimum. For medium computational complexity the optimum is

identical with that of orthogonal design with lower variance in 30 repetitions.

Similarly, for high computational complexity the global optimum can be

found, but not repetitively. Generally, n

p

is an important parameter in the

convergence of DE and CR stabilizes the algorithm.

RSM is proven to be a useful starting tool for calibrating the parameters of

DE that best minimize the median of squares residual (Equation 2) without

using high computational resources. Of course, one should bear in mind

that the analysis is sensitive to the starting values (centers) as well as the

dispersion of the high and low design points around the center. Shrinking or

widening the distance between the design points (distance of factorial and

orthogonal designs from center) might result in highly uncertain parameter

estimates.

Possible extensions of the approach might include diﬀerent formulations

(or transformations), other than a polynomial one, to represent the relation-

ship between the parameters and the objective function. Also, in higher

dimensions (higher number of parameters) a heuristic tool can be used to

optimize the parameters’ values.

4 Heuristics in Finance

4.1 Portfolio Selection

In the simple case of the Markowitz mean-variance (MV) framework with two

assets, investors choose their optimal portfolio by maximizing the expected

portfolio return (E = w

1

µ

1

+w

2

µ

2

) given their risk proﬁle, or by minimizing

the portfolio risk (V = w

2

1

σ

2

1

+ w

2

2

σ

2

2

+ 2w

1

w

2

σ

1,2

) for a given desirable level

of return. The variance of a stock σ

2

1

is the covariance σ

1,1

. These models

consist of a linear function and a quadratic function of the weighted asset

covariances which can be solved by traditional optimization techniques like

22

Table 1: Optimal Orthogonal Design to minimize Median of Squares Residuals.

CPU F CR n

p

Best Med Worst Var q5% q90% Freq

H 0.6597 0.9871 25 4.9935 · 10

−5

4.9935 · 10

−5

4.9935 · 10

−5

4.7501 · 10

−41

4.9935 · 10

−5

4.9935 · 10

−5

30

M 0.6616 0.7226 26 4.9935 · 10

−5

4.9935 · 10

−5

5.6897 · 10

−5

1.7661 · 10

−12

4.9935 · 10

−5

5.1175 · 10

−5

1

L 1.3457 0.6587 13 5.0069 · 10

−5

5.2804 · 10

−5

9.9768 · 10

−5

1.5042 · 10

−10

5.0075 · 10

−5

6.9232 · 10

−5

1

Table 2: Optimal Rotatable Design to minimize Median of Squared Residuals.

CPU F CR n

p

Best Med Worst Var q5% q90% Freq

H 0.6318 0.6318 81 4.9935 · 10

−5

4.9935 · 10

−5

4.9935 · 10

−5

4.7501 · 10

−41

4.9935 · 10

−5

4.9935 · 10

−5

1

M 0.6322 0.9910 26 4.9935 · 10

−5

4.9935 · 10

−5

4.9935 · 10

−5

2.8666 · 10

−29

4.9935 · 10

−5

4.9935 · 10

−5

1

L 1.3041 1.0000 11 5.0079 · 10

−5

5.1599 · 10

−5

5.4474 · 10

−5

9.0143 · 10

−13

5.0089 · 10

−5

5.2711 · 10

−5

1

2

3

quadratic programming.

However, investors may well seek to optimize utility criteria other than

MV (like a Cobb-Douglas objective function for portfolio choice) or other risk

criteria (like Value at Risk (VaR), expected shortfall (ES)). Alternative risk

criteria are mainly chosen to relax the normality assumption of returns and

to accommodate the asymmetry of asset returns. A non-convex, complex,

non-diﬀerentiable utility function (with linear or non-linear side conditions)

will not result in the optimum (eﬃcient) portfolio selection (or in any op-

timum near) by quadratic programming routines. The optimum might be

even diﬃcult to approach for a linear utility function (an absolute or semi-

absolute deviation risk function) when constraints are considered (Mansini

and Speranza (1999)). Yet, optimization heuristics are ﬂexible enough to

optimize portfolios based on complex objective functions without the need

for a linear approximation of the objective function.

The seminal work of Dueck and Winker (1992) opened new horizons to

portfolio optimization using heuristics, more precisely using TA. In that ap-

plication, two more complex risk functions other than variance were opti-

mized, being a semi variance function and a weighted mean geometric return

function. Considering a slightly more complex objective function, even with

linear side conditions on the weights, the problem was not tractable with

standard optimization algorithms. TA, with fairly limited computational re-

sources available, could select among 71 bonds of the Federal Republic of

Germany proﬁtable portfolios (minimize risk functions given expected re-

turn) in reasonable time. The portfolios were 5% to 30% more proﬁtable

than the usual practice until that point, meaning human portfolio selection.

No transaction costs were considered though.

In the last two decades, heuristic optimization techniques have been ap-

plied to higher-dimension problems, due to extensive availability of computa-

tional resources. They have been applied to portfolio optimization problems

that also allow for side conditions like transaction cost, taxes, cardinality

constraints, etc. The following subsections describe the most recent research

in this direction.

4.1.1 Transaction Costs and integer Constraints

Apart from the alternative utility criteria that can be pursued by an investor,

several other implications and constraints appear in real life in portfolio se-

lection problems. The constraint considered here is transaction costs. In

the early literature transaction costs were omitted, mainly for simplicity.

However, the selection process with transaction costs can often result in

totally diﬀerent optimal portfolios (Dueck and Winker (1992)). Thus, mod-

24

ern portfolio selection includes transaction costs in the optimization process

(Maringer (2005)).

Transaction costs can appear in the form of minimum amount, propor-

tional amount on trading volume and ﬁxed fees per trade. For an investment

in asset a

m

m = 1, . . . , M, transaction costs (c) can vary between

c

f

ﬁxed costs

c

v

= c

p

· a

m

· S

(0)

m

variable costs where S

(0)

m

is the current stock price

max{c

f

, c

v

} variable costs with lower (minimum) limit

c

f

+c

v

ﬁxed plus variable costs .

The optimization algorithm selects the assets a

m

and their weights by

using (ideally) the total budget amount. Then, transaction costs are part of

the budget. Often, a budget amount is uninvested since only integer numbers

of assets can be bought. In such a case the excess amount can be deposited.

If one of the above conditions is considered the search space is far from

being smooth and convex. Hence, it might be diﬃcult to approach the global

optimum or even something near. In that respect, heuristic techniques can

be applied. SA, TA and GA are often used in portfolio selection problems.

SA and TA are ﬂexible enough to adjust approximately to any restrictions.

Their construction (Section 2) allows for faster and more eﬃcient application

to discrete ﬁnancial problems. GA has the additional advantage of simulta-

neously maintaining a population of candidate portfolios.

9

Maringer (2005), ch. 3, applied SA to portfolio selection with transaction

costs. He used 30 German assets from the DAX index. For the optimal

portfolio composition, budget constraints with transaction costs and inte-

ger constraints are included and compared. Integer constraints and several

types of transaction schemes were used and their relationship to the invest-

ment level was tested. The study concluded that the presence of transaction

costs, when also non-negativity and integer constraints are included, aﬀected

the optimal portfolio structure. Diﬀerent cost schemes could aﬀect the di-

versity of the portfolio which could lead to diﬀerent utilities. The number

of diﬀerent assets invested are reduced with increasing transaction schemes.

One exception are investors facing only a ﬁxed cost scheme with an initial

investment over 1 million. Nevertheless, SA, Section 2.1.1, could help ﬁnd a

portfolio composition with better diversiﬁcation (improvement of the utility

function) and higher Sharpe Ratio (SR) than the simpliﬁed model with no

transaction costs.

9

In high dimensions, the simultaneous maintenance of many candidate solutions can

be a computational burden.

25

In reality not only transaction costs appear, but also other constraints like

transaction limits and cardinality constraints. Together these constraints

can add to the complexity of the problem and harden the construction of

an optimal portfolio. This is especially prominent when we consider small

portfolios. The induction of additional constraints into the portfolio selection

problem is presented in the following sections.

4.1.2 Minimum Transaction Lots

Minimum transaction lots is a type of integer constraint. It constraints the

number of units from a speciﬁc stock or bond that should be bought simul-

taneously. That is, a minimum bundle of units, e.g. 5000 or multiples, from

a speciﬁc asset should be bought. The constraint is formulated in terms of

money as,

l

m

= N

m

· S

(0)

m

, (4)

where

l

m

minimum transaction lots in money

N

m

minimum required units of asset m.

In the late 1990s, Mansini and Speranza (1999) showed that for a linear

utility function (an absolute or semi-absolute deviation risk function) when

linear constraints (minimum transaction lots and proportional transaction

costs) are considered the problem is NP hard. They applied variants of

evolutionary heuristic algorithms to solve the portfolio selection problem with

transaction lots. The algorithm would select the k cheapest assets a

k

to

construct the portfolio. Diﬀerent crossover points indicated the securities in

and out of the portfolio. The half non-selected assets were replaced by the k

low cost assets in the portfolio. Instead of a typical stopping criterion, e.g.

reach ﬁxed iteration number, the algorithm was repeated until a number of

assets was tested in the portfolio construction. This basic algorithm (and

some variations) allowed for a good

10

solution to the portfolio optimization

problem in reasonable time. The distinctive advantage of that heuristic was

the independence of the computational time from the number of assets.

In another application, Gilli and K¨ellezi (2002) optimized their portfolio

choice using as risk measures VaR and ES (4.1.5). In that application the

problem structure allowed for transaction lots constraints. Also, transaction

10

A good solution is the one that has low distance from the theoretical MV framework)

26

costs and cardinality constraints were considered. For the optimization, a tra-

jectory heuristic, TA, was applied instead of a population based one. They

conﬁrmed the robustness of TA when compared with quadratic programming

under MV framework and without any integer constraints. Both approaches

resulted in the same optimum MV portfolio. When other non-convex ob-

jective functions, such as VaR and ES, together with integer constraints,

like transaction lots, were introduced, TA could easily deal with those in

reasonable computational time.

By nature, asset weights are continuous variables. So, Krink and Paterlini

(2009) propose a Diﬀerential Evolution algorithm for Multiobjective Portfolio

Optimization (DEMPO) for portfolio selection under the MV framework with

risk measures other than variance, i.e. VaR and ES. Transaction lots, weight

constraints and limit on the proportion of assets in a particular sector or

industrie are considered. The algorithm adopts DE for weight selection. DE

could approximate closer and faster the MV frontier compared to quadratic

programming (QP) and another GA. Still, the convergence of DE to the

exact front shape can be further improved. To overcome local minima, the

acceptance criterion of a local search method can be used. It would also be

of interest to test its performance in an even more realistic framework with

transaction costs and cardinality constraints (some additional constraints are

included in Fastrich and Winker (2010)).

4.1.3 Cardinality versus Diversity

In reality, private investors choose a rather small-sized portfolio which is not

necessarily diversiﬁed internationally (Cuthbertson and Nitzsche (2005), Ch.

18). Small portfolios, when optimally chosen, can diversify the risk away

and thus increase investor’s utility. Besides, portfolios with fewer stocks are

easier to manage and less expensive in terms of total transaction costs than

larger more diversiﬁed portfolios.

In order to select the optimal size of a portfolio, an additional integer

constrain is needed, namely a cardinality constraint. Cardinality constrains

the number of diﬀerent assets (A) in a portfolio, #{A} = K. When K varies

from 1, . . . , K the constraint is formulated as

M

a

m

≤ k

max

, (5)

where

k

max

maximum number of diﬀerent assets in portfolio

A integer quantities of active positions in portfolio

27

a

m

active position in asset, m = 1,. . . ,M.

The optimization algorithm is slightly altered to ﬁnd not only which as-

set weights are optimal but also which assets should be ideally selected from

the available ones by optimizing a given objective function. So, diﬀerent

k from M assets

M

k

**and their combinations are selected based on the in-
**

vestors risk proﬁle. This additional integer constraint (especially, together

with transaction costs and transaction lots) results in a discrete combinato-

rial optimization. While traditional selection algorithms result in suboptimal

portfolios as the number of assets and constraints increases, heuristics serve

as a panacea. Heuristics can capture the discrete selection nature of the

problem and consider at the same time the additional constraints. The fol-

lowing literature imposes a cardinality constraint in the portfolio selection

problem and improves or optimizes the risk-return SR or the MV portfolio,

respectively.

First Chang et al. (2000) and later on Maringer (2005) used heuristics to

replicate the eﬃcient frontier with integer constraints. Chang et al. (2000)

used three diﬀerent heuristics, GA, Tabu Search (TS)

11

and SA to replicate

all the Pareto-optimal (non dominated) portfolios of the eﬃcient frontier

under cardinality and transaction lots constraints. While a multiple agent

heuristic, GA, resulted in fairly better approximations of the theoretical ef-

ﬁcient frontier, it needed comparatively more time. In contrary, Maringer

(2005) reported the superiority of a single agent heuristic, SA, in selecting a

portfolio that maximizes SR.

12

Alternatively, Maringer (2005) combined a local and a population search

method to improve the selection procedure for Markowitz eﬃcient frontier.

13

The performance of the hybrid meta-heuristic was tested against a pure meta-

heuristics, like SA and a variant of it. While all three methods share local

search characteristics, which help to escape fast inferior solutions, the hy-

brid meta-heuristic was more reliable – closer to the best-known solution.

11

TS belongs to local search heuristics. For a comprehensive representation of the

algorithm see Glover and Laguna (1997).

12

Its distinctive selecting performance lies on the random initial selection of assets and

the update procedure of the candidate solution. Initially, assets are randomly selected

and included in the portfolio. It is ﬂexible to select not only between asset with high

risk premium per unit of risk (SR), but also good asset combinations. In the update

procedure an asset is added not only if it improves the SR but also if it contributes to the

diversiﬁcation of risk. Ideally, these are assets with negative (or low) correlation.

13

In the Sharpe framework the heuristic encoded only the asset selection, whereas in the

Markowitz eﬃcient frontier identiﬁcation, the heuristic encoded both weight and model

selection. The last approach improved the performance of heuristics.

28

The results suggest a rather small-sized portfolio, which when optimized by

heuristics can replicate relatively accurately the market index. For the few

k from M assets and their combinations, the weights are quickly identiﬁed

by the algorithm and more computational eﬀort is then devoted to update

(improving) the candidate selection. Thus, the algorithm results at the end

in a portfolio that can be as diversiﬁed as the market index.

Further research can be devoted in identifying the factors aﬀecting the

optimal K. More precisely, how diﬀerent risk aversion, utility functions or

stock index aﬀect the choice of K. One alternative is to endogenize in the

optimization process the choice of K.

4.1.4 Index Tracking

A straight-forward extension of an active portfolio optimization with cardi-

nality considerations is the passive tracking of an index with a limited number

of assets. In the former case, investors choose the portfolio that best opti-

mizes their utility function (risk-return equilibrium) under constraints. This

decision is called active portfolio management. In the other case, investors

are convinced that in the long run the best thing to do is to mimic the market

index. In passive selection, investors look for the optimal combination of as-

sets to track the index as close as possible. Typically, the objective function

of interest, hereafter called the tracking error, is to minimize the distance

between the selected portfolio’s daily returns and index returns.

This is a twofold optimization where the optimal asset positions and their

weights should be selected. In addition, investors face all the constraints

discussed in the previous sections which result in a complex optimization

problem. Selecting the optimal asset positions is a discrete combinatorial

problem whereas selecting the optimal asset weights is a continuous problem.

In that respect, alternative heuristics (suitable for discrete or continuous

problems) or a combination of them are applied.

Maringer and Oyewumi (2007) applied DE to track a portfolio with con-

strained number of assets that mimics best the DJIA64 index. The objective

function of interest is to minimize the root-mean-squared deviation of the

selected portfolio’s daily returns from those of the index. The resulted track-

ing portfolio outperformed both in- and out-of-sample, in terms of daily asset

returns, in many cases the index. For smaller portfolios (especially for cardi-

nality below 30), the positive diﬀerence increases for the in-sample training

set. Yet, for cardinality below 40 the suggested portfolio is more volatile

than the index. However, the higher volatility is outweighed with a higher

return-to-risk ratio. Whereas heuristics perform reliably also in this complex

problem, there was no clear-cut indication of the optimal number of assets

29

that best minimize the tracking error.

More insides to the trade-oﬀ between the optimal number of asset in

a portfolio and the tracking error is given by Krink et al. (2009). They

also applied DE but with a combinatorial search operator (DECS-IT)

14

to

track the perfectly diversiﬁed market portfolio by minimizing the volatility

between the log-returns of the benchmark (DJIA65 and Nikkei 225 price

indices) portfolio and the tracking portfolio in every time period.

15

DECS-

IT could potentially solve that selection problem with constraints and provide

quality results in reasonable time. The results suggested that by increasing

the number of assets in a portfolio the in-sample annualized tracking error

volatility reduces up to a certain limit. The same is true for the relationship

between cardinality and out-of-sample performance.

There are still however some open questions for constrained index track-

ing, namely the performance of the tracking portfolio in diﬀerent window

set-ups

16

and the factors aﬀecting the volatility in the results. Finally, one

can further investigate the eﬀect of index composition in selecting the maxi-

mum number of assets of the tracking portfolio.

4.1.5 Downside Risk Measures

As mentioned in Section 4.1.2, alternative risk measures can be used in place

of variance, e.g. VaR and ES. These measures relax the normality assumption

of returns imposed by the MV framework. So, instead of minimizing the mean

(positive and negative) squared deviations from the expected return, the risk

of having negative deviations from the expected portfolio value is minimized.

Speciﬁcally, the above risk measures minimize either the probability that

the expected value of the portfolio falls below a given value (VaR) or the

expectancy of such a fall (ES).

Like semi-absolute risk function, VaR and ES cannot always be opti-

mized by traditional linear or quadratic solvers. Especially, in the presence

14

DE is by nature more suitable for continuous problems. Thus, a combinatorial operator

is needed to improve the performance of DE in the discrete combinatorial asset selection.

Using this tool the quality of the results and the runtime of the algorithm improved.

15

Additionally, by building an artiﬁcial index they evaluated the performance of DE

using the asset weights instead of the returns. They evaluated the asset weights’ distance

of the benchmark portfolio (Nikkei 225 price index was considered) from those found in the

tracking portfolio. The benchmark asset weight are calculated from market values. For

this experiment they neglected the cardinality constraint. DECS-IT could very precisely

replicate the 225 asset weights. In a comparison with QP (with no cardinality or weight

constraints), the two methods could obtain similar results.

16

Maringer and Oyewumi (2007) experimented with one and two year windows and

suggested that shorter in-sample windows together with higher cardinalities are more

important for the out-of-sample performance.

30

of integer constraints, i.e., transaction lots and cardinality constraints, the

search space is highly non-smooth and discontinuous. Thus, it is diﬃcult to

choose an optimal portfolio that satisﬁes these constraints in a reasonable

time. Gilli and K¨ellezi (2002), Gilli et al. (2006) and Krink and Paterlini

(2009) approach the portfolio selection problem under discrete scenarios by

optimization heuristics. Apart from one application of DE (Krink and Pater-

lini (2009)), TA is often the heuristic used for discrete optimization. Winker

and Maringer (2004) constructed a hybrid population based algorithm with

local search characteristics for the same problem.

In more recent extensions, Gilli and Schumann (2010c) presented alter-

native measures of reward and risk, like the Omega function, and Gilli and

Schumann (2009) and Gilli and Schumann (2010d) compared the performance

of portfolios selected by various risk measures with MV portfolio. TA was

used as an optimization algorithm for portfolio construction. A unanimous

ﬁnding all along was that minimizing risk, as opposed to maximizing reward,

often resulted in good prediction of the out-of-sample frontier.

4.2 Robust Model Estimation

Apart from using meta- or hybrid-heuristics for the direct optimization of ﬁ-

nancial problems, e.g. for asset allocation, much literature is concentrated on

the application of heuristics for the indirect optimization of models through

parameter estimation. In due course, heuristic algorithms calibrate the pa-

rameters of ﬁnancial and economic nonlinear models that improve the accu-

racy (precision) and reliability of core ﬁnancial models.

4.2.1 Nonlinear Model Estimation

Convergence and Estimation of GARCH Models

In reality, sporadic extreme observations can force the distribution of histor-

ical stock returns to lean away from the mean. For daily returns the prob-

ability of observing this phenomenon is higher than normal, forcing their

distribution to exhibit excess kurtosis. Yet, the excess kurtosis decreases

with increasing time horizons. As such, daily ﬁnancial returns exhibit time

varying volatility. A one period ARCH model depicts this characteristic,

V ar(ε

t

) = α

0

+ α

1

ε

2

t−1

, where ε

t

denotes the volatility of daily returns of

an AR(1) process (Gouri´eroux (1997)). Furthermore, a model conditional

also on the previous risk (error) components is a generalized autoregressive

conditional heteroskedasticity (GARCH) model.

31

The parameters of such non-linear models are most often estimated by

maximizing the (log)-likelihood function. But, ﬁnding the parameters that

give an optimal solution is not guaranteed using standard econometric soft-

wares. Sometimes, diﬀerent softwares result in diﬀerent optimal solutions

(local optima) and the solution is very sensitive to the parameters’ starting

values. Heuristic optimization overcomes this problem and provides a close

to optimum solution for such non-linear models.

Maringer (2005) applied SA to ﬁnd the parameters of a GARCH model

which maximize the (log)-likelihood function. Maringer and Winker (2009)

applied TA for the same non-linear problem and simultaneously provided a

framework for studying the convergence properties of such estimators based

on heuristics. Maringer (2005) used the GARCH(1,1) model to estimate the

daily volatility of the German mark/British pound exchange rate. Therefore,

the author implemented a SA algorithm like the one presented in Section

2.1.1. In every iteration one parameter was chosen for further alteration.

The selected parameter changed by adding a uniform random error. The

random error narrowed as the iteration number increased. With I = 50 000

iterations and a shrinking neighborhood structure, a low deviation was re-

ported between heuristic optimization outcome and true parameters’ value.

Maringer and Winker (2009) inferred the same superior performance of TA

in comparison to standard numerical econometric packages.

A more reﬁned statistical framework is suggested by Maringer and Winker

(2009) for analyzing the parameters of heuristics that will further improve the

converge of estimators. In due course, they estimate the number of iterations,

as a function of sample size, needed by a TA approximation to converge, with

a given probability, to the true maximum likelihood estimator of a GARCH

model. This is possible, due to the good parameter approximation of search

heuristics, reported above. It is evident that the value of the TA maximum

likelihood estimator is much closer to the true one compared to what other

numerical packages report. This is true in particular for small sample sizes.

An interesting extension is the derivation of joint converge properties for

higher order GARCH models (or more complex estimators in general). Be-

sides, heuristic techniques can also be considered for selecting the lag order

for historic volatility estimation approaches.

Indirect Estimation and Agent Based Models

Modeling the behavior of market participants becomes even more challeng-

ing when it cannot be determined by prior information like in a GARCH

model. Especially when agents’ behavior is either less rational or heteroge-

nous. ABMs are ﬂexible to simulate the behavior of market participants,

called agents, and the interaction between them. Simulating the actual be-

32

havior of agents using ABM can result in a realistic representation of the

basic features of daily ﬁnancial returns, namely excess kurtosis and time

varying volatility.

However, the estimation of ABM parameters and the validation of their

performance in a realistic market setting with actual data are complex tasks.

This is mainly due to the large number of parameters and the non-analytical

evaluation of such models. Gilli and Winker (2003) introduced a hybrid TA

algorithm to estimate an ABM of foreign exchange markets. The hybrid

combined simplex search ideas with TA algorithm. Later on, Winker et al.

(2007) proposed an evaluation tool for such a model.

Gilli and Winker (2003) used a hybrid TA approach to optimize the ABM

parameters. ABM were applied to the model of foreign exchange (DM/US-

$) market agents suggested by Kirman (1991) and Kirman (1993). It was

tested whether the interaction between two types of agents (fundamentalists

and chartists) could realistically explain the excess kurtosis and time varying

volatility of returns. For this test, excess kurtosis (k

d

) as well as time varying

volatility (α

1

) of an ARCH(1) model were calculated by simulating the expec-

tations of foreign exchange rate based on agents behavior. The eﬃciency of

agent based modeling was tested with empirical data. A hybrid TA (Section

2.3.2) was applied for optimizing the parameter of ABM. Importantly, in the

particular application simplex did not even result in a local optimum due to

the high Monte Carlo sampling variance of the initial parameter estimates.

To overcome the problem of high Monte Carlo variance on estimating the

parameters with relatively low number of repetitions, the authors applied

the acceptance criterion of TA. Larger values of threshold sequence correct

for the high variance (see also Section 2.1.2). Using these techniques they

showed that interactions between market agents could realistically represent

foreign exchange market expectations.

Future research can introduce alternative models to ABM, such as MS-AR

or SETAR or STAR.

17

In addition, a design approach, like the one discussed

in the Appendix, can serve as a benchmark for the estimation of parameters’

starting values.

Yield Curve Estimation

Apart from the log-likelihood function of GARCH estimation, there are

also other non-linear functions that are used to estimate ﬁnancial and eco-

nomic models. Yet, most of these models depend heavily on the parameter

17

Maringer and Meyer (2007) introduced and compared SA, TA and DE to the estima-

tion of parameters of the STAR model.

33

estimation value, as ABM do. Likewise, the term structure of interest rates,

or as it is widely known the yield curve, is typically estimated (Gimeno and

Nave (2009)) using non-linear models, while the input parameters’ values are

crucial.

The term structure of interest rates refers to market’s expectations of

interest rates in diﬀerent time horizons. For example, it depicts the expected

interest rate for a loan starting in one, two or ten years from now that

lasts for one or several years. Central banks need to calculate the forward-

looking interest rates of diﬀerent maturities on a daily basis. These interest

rates serve as an instrument of monetary policy or they are oﬀered by banks

to loans of diﬀerent maturities or they are used in other sectors for asset

and derivative pricing. For the calculation of forward rates central banks

most frequently use the Nelson-Siegel (Nelson and Siegel (1987)) function

or the Nelson-Siegel-Svensson (NSS) function (Svensson (1994)). The model

estimates the (spot and) forward interest rates that form the yield curve.

For that, the NSS approach models forward interest rates based on short

and long term future maturities and the transition between them. This way

it can capture the shape of the yield curve.

Financial pricing calls for an accurate estimation of forward interest rates.

According to Gimeno and Nave (2009) traditional optimization techniques

failed to converge towards a global optimum solution, when estimating the

parameters of NSS (a non-linear function). Besides, traditional optimization

techniques are sensitive to initial coeﬃcient values, which causes variability

in the estimated parameters. The following papers show how this problem

can be avoided by using optimization heuristic techniques.

Gimeno and Nave (2009) and Fern´andez-Rodr´ıguez (2006) suggested ap-

plication of GA to accurately estimate the coeﬃcients of the Nelson-Siegel

function and its extension, the NSS function, while Gilli and Schumann

(2010b) applied DE. The ﬁtness function corresponds to the weighted sum

of squared deviations of the actual (observed) coupon bond prices of diﬀer-

ent maturities from the ones calculated using the estimated forward interest

rates. The error term is a weighted function of the bond’s duration period.

The authors compared the ﬁtness function value using both traditional opti-

mization techniques, like non-linear least squares optimization, and heuristics

for simulated and real world bond prices of diﬀerent maturities. Overall, GA

and DE resulted in lower and less volatile estimation errors.

An interesting extension would be interesting to compare the computa-

tional time and eﬃciency of both GA and DE. An alternative implementation

is to use an improved hybrid optimization algorithm which combines a heuris-

tic and a numerical optimization method. Gilli and Schumann (2010a) and

Gilli and Schumann (2011) discuss the application of DE together with a

34

direct local search optimizer (Nelder-Mead search) in parameter estimation.

They estimated the parameters of the Heston (more easily) and the Bates

model, both used in pricing options. Good model ﬁt was reported for the

suggested hybrid.

4.2.2 Robust Estimation of Financial Models

Least Median of Squares Estimation

The last core ﬁnancial model discussed in this section is the CAPM. A

substantial amount of research in ﬁnancial market economics has focused

on the robust estimation of the CAPM parameters. Robust estimation tech-

niques have been suggested in the statistical literature as a solution to the low

breakdown point of simple estimation approaches, principally, ordinary least

squares (OLS). In particular, only a single inﬂuential observation can change

the parameters of the OLS coeﬃcients signiﬁcantly (Rousseeuw 1984).

For the linear explanation of the risk premium on individual securities

relative to the risk premium on the market portfolio, Chan and Lakonishok

(1992) had used least absolute deviations arguing that absolute residuals

have smaller inﬂuence on LS than squared residuals. Ronchetti and Genton

(2008) used shrinkage robust estimators for estimating and predicting the

model. Recently, Winker et al. (2010) compare OLS and least median of

squares (LMS) estimation in the CAPM and the multifactor model of Fama-

French factors (Fama and French (1993)). Rousseeuw and Leroy (1987) deﬁne

the LMS estimator as the solution to the following optimization problem:

min

α,β

(med(ε

2

i,t

)) , (6)

where ε

i,t

= y

i,t

−α−βx

i,t

are the residuals of a factor model. While the

LMS estimator has a high breakdown value, its objective function is com-

plex with many local minima. Figure 6 shows the above objective function,

evaluated for the CAPM residuals, using the 200 daily stock returns of the

IBM stock starting on January 2nd, 1970.

The existence of many local minima is evident, which calls for a heuristic

optimization application. Using a rolling window analysis on some stocks

from the DJIA index, DE compared to TA allowed a faster and more reliable

estimation of β in the factor models. However, the estimates obtained by

LMS do not exhibit less variation (from those resulting from OLS) as might

have been expected from the outlier related argument. Furthermore, the

relative performance of both estimators in a simple one-day-ahead conditional

forecasting experiment is mixed. Still, a proof of concept is provided, i.e.,

35

−0.01

−0.005

0

0.005

0.01

1

1.2

1.4

1.6

1.8

2

4

6

8

x 10

−5

α

β

m

e

d

(

ε

2 t

)

Figure 6: Least median of squared residuals as a function of α and β

LMS estimates obtained by means of heuristic optimization can be used for

real life applications.

Some extensions of the paper are straightforward based on the results re-

ported. First, the method should be applied to diﬀerent data sets, e.g., stock

returns from other stock indices or stock markets. Furthermore, it would be

of interest to identify in more detail the situations when the estimation and

forecast based on LMS outperforms OLS and vice versa.

4.3 Model Selection

This section presents the use of heuristics in selecting a set of independent

variables which better explain a given explanatory variable. One can ar-

gue that the more factors we include in a model, the better the model ﬁt

or its predictive performance will be, as we have more chances to include

the important factors. But, the higher the chances are also to include some

unimportant factors which worsen the predictive performance of our model.

Traditionally, statistical methods are applied for model selection, e.g. discrim-

inant analysis, stepwise analysis etc. However, statistical methods ignore the

economic or theoretical relationship between the variables and rely on strong

distributional assumption for the factors.

Heuristic optimization methods can be trained, by the objective function,

to select some economically meaningful factors. Also, for their application

no strong statistical hypotheses are made, e.g. normality assumption for the

distribution of factors.

36

4.3.1 Risk Factor Selection

A diﬀerent perspective to the asset pricing yields the risk factor selection

for the Asset Pricing Theory (APT) model. Like asset selection, risk factor

selection is a complex combinatorial problem which in high dimensions is

computationally very demanding. More importantly, for problems including

more than ten factors, complete enumeration of all possible combinations is

diﬃcult to be tested in reasonable time. One alternative is to use heuristic

optimization methods for model selection (Winker (2001)).

Maringer (2005), Ch. 7 applied a hybrid meta-heuristic, a MA (Section

2.3.1) to select the appropriate ﬁrm and industry speciﬁc indices for asset

return estimation, at a given point in time, based on the APT model. The

hybrid meta-heuristic initially selects a subset of factors and combines the

properties of SA and GA (Section 2.1.1 and 2.2.1) to ﬁnd an optimum subset

of MSCI factors that best explain asset returns based on the APT model.

That is, get the highest possible explanatory power, adjusted R

2

. MA was

able to identify the 5 out of 103 MSCI indices that explain a satisfactory

percentage of S&P 100 asset price variation.

Future work can concentrate on improving the selection performance of

the hybrid meta-heuristic. The use of another trajectory search method, like

TA, instead of SA can also be tested. TA has the advantage to construct the

candidate solution acceptance criterion based on information from problem’s

search area. To enhance the economic credibility of the neighborhood con-

struction (factor sets), more weight can be given to neighborhood solutions

with economically sound interpretation (and less weight on non-economic

sound sets). This will only be possible in the presence of information about

the economic factors aﬀecting the returns of a speciﬁc asset at a given time.

In addition, more emphasis can be given in the model weight selection. A

meta-heuristic technique can be used to assign factors’ weights based on their

economic credibility.

4.3.2 Selection of Bankruptcy Predictors

Factor selection refers also to the selection of predictors so as to evaluate

the credibility (solvency) of a debtor (bank client/portfolio). The Basel II

Accord on Banking Supervision legislates the framework for banks credit

risk assessment (Basel Committee on Banking Supervision 2006). Under

this framework banks are required to detain a minimum capital to cover

portfolio losses from expected defaults. Expected losses equal the product of

the exposure at default (EAD), the loss given default (LGD) and a binary

variable that determines the borrower’s default grade (D). Based on the

37

internal rating approach (IRB) banks can estimate internally these three risk

components. Extensive literature (see Kumar and Ravi (2007) for a review)

is devoted on determining the default grade D. This section addresses the

development of a credit model responsible for this binary assignment.

It is evident from the current ﬁnancial and credit market crisis that credit

institutions should develop a more valid credit risk model for the classiﬁca-

tion of borrowers. Its derivation is subject to a diverse range of risk factors

and their weights. Hamerle et al. (2003) report a generalized factor model

for credit risk composed by ﬁrm speciﬁc ﬁnancial data, economic factors and

their lags, systematic and idiosyncratic factors. For credit model and weight

selection, statistical techniques vary among linear discriminant analysis (Alt-

man (1968), Varetto (1998), etc), conventional stepwise analysis (Butera and

Faﬀ (2006)), logit analysis, neural networks (Angelini et al. (2008) and Di

Tollo and Lyra (2010)) and genetic algorithms (Back et al. (1996)).

Back et al. (1996) applied discriminant analysis, logit analysis and neural

networks for score selection and failure prediction. Each one of this tech-

niques assumes a diﬀerent interaction among factors. Discriminant analy-

sis assumes a linear relationship, logit analysis adopts a logistic cumulative

probability function, whereas neural networks allow a non-linear relationship

between explanatory variables. They used stepwise selection to choose the

risk factors for the discriminant model and the logistic regression model. Ad-

ditionally, GA were used in NN for choosing the links. While each technique

constructed diﬀerent models (one, two and three years prior to failure), their

predictive ability diﬀer. The GA based model resulted in perceptibly smaller

type I and type II errors,

18

at least one and three years prior to failure.

In the same vein Trabelsi and Esseghir (2005), Ahn et al. (2006) and Min

et al. (2006) trained GA that resulted in an optimal selection of bankruptcy

predictors. Besides model selection, these papers applied GA to tune the pa-

rameters of bankruptcy prediction techniques. Trabelsi and Esseghir (2005)

constructed a hybrid algorithm that evolves an appropriate (optimal) arti-

ﬁcial neural network structure and its weights. Ahn et al. (2006) and Min

et al. (2006) forecast bankruptcy with SVM. Genetic algorithms select the

best set of predictors and estimate the parameters of a kernel function to

improve SVM performance using data from Korean companies. The latter

paper stressed the potentials of incorporating GA to SVM in future research.

Varetto (1998) applied GA to train neural networks. The meta-heuristic

was used not only to model selection, but also to weight estimation. Besides,

they used genetic rules for credit clustering. The latter technique is discussed

in the following section. NN were compared with a statistical technique,

18

Smaller type I and type II errors can be interpreted as lower misclassiﬁcation error.

38

discriminant analysis, in terms of insolvency diagnosis (failure prediction) for

4738 small Italian companies. To estimate the insolvency criterion, the ﬁtness

function, a regression model was ﬁtted using a set of weighted indication

variables. GA were used to select the optimal set of ﬁnancial ratios. For

weight estimation, a predeﬁned range of values was assigned to the weights

and split in intervals of equal length. GA identiﬁed the number of intervals

to be added to the lower bound of the range. The above estimation asserts

an economic interpretation of the weights in less time than human intuition.

In the context of risk factor selection for bankruptcy prediction other

meta-heuristics or hybrid meta-heuristics can also be tested. For example,

TA and MA. Alternatively, a diﬀerent heuristic optimization technique can

be applied for weight selection. Given the continuous nature of this problem

DE can be suitable. In that respect, a population algorithm for continuous

problems can result faster and more eﬃciently in the optimum weight values.

Other heuristics can also be used in factor selection in SVM framework. DE

for selecting the parameters of a kernel function and TA for choosing the

important factors of the credit risk model. Such an application may further

improve the performance of SVM.

What is still an open question in selecting bankruptcy predictors is whether

it is necessary (and how) to chose the desired number of risk factors included

in a model. The more factors we include in a model, the better will be

the model ﬁt, as we have more chances to include the important factors.

But, the higher the chances are also to include some unimportant factors

(‘overﬁt’) which aﬀect (worsen) the predictive performance of our model.

Meta-heuristics can be used to endogenously determine the optimal number

of factors that can be included in a model (see Lyra et al. (2010b) and Lyra

et al. (2010a) for an application of TA for determining the optimal number

of credit risk grades).

4.4 Financial Clustering

Risk factor selection, credit risk quantiﬁcation and credit risk assignment

are complementary procedures for qualiﬁed credit risk management. While

Section 4.3 presented the development of a credit model and the quantiﬁca-

tion of credit risk using meta-heuristics, the clustering of credit risk using

heuristic techniques is addressed in this section.

Data clustering has always been on the epicentrum of research work using

either statistical or intelligent techniques, namely discriminant analysis and

logit or neural networks, support vector machines and case-based reasoning,

respectively. Yet, classiﬁcation of data in several groups with constraint

consideration can make the problem NP-hard (Brucker (1978)). Heuristics

39

are proven to be a reliable tool for ﬁnancial data classiﬁcation.

4.4.1 Credit Risk Assignment

Basel II (Basel Committee on Banking Supervision (2006)) requires banks to

detain suﬃcient capital to bare losses that arise not only from expected but

from unexpected economic downturns. Provision can cover expected losses,

however additional capital or regulatory capital (RC) is required to cover

unexpected losses. To calculate RC banks should ﬁrst assign their client

into groups based on their credit risk characteristics, i.e. the probability that

borrowers will default the subsequent year (PD).

Then, each group is assigned a ‘pooled PD’ or a ‘mean PD’ which distin-

guishes it from other groups. The deviation from the ‘mean PD’ determines

the required capital detained and consequently the lending rate assigned to

each risk group. An important aspect for banks is to avoid misstatement (un-

der/over statement) of capital requirements resulting from misclassiﬁcation

of clients. To achieve this, borrowers should be assigned to a number of ho-

mogeneous groups. Therefore, an eﬃcient classiﬁcation tool is required that

minimizes the misclassiﬁcation error (or the regulatory capital) and satisﬁes

the other constraints imposed by Basel II.

The design of a risk rating system for the assignment of borrowers into

homogenous grades is subject to a number of constraints imposed by Basel

II. Apart from having at least seven clusters for non-defaulted borrowers (§

404 of Basel Committee on Banking Supervision (2006)), the EAD in each

bucket shall be no higher than 35% of the total borrowers’ exposure in a

given portfolio (Krink et al. (2007)). Thus, we avoid having high concentra-

tion of exposure in a given grade. Further, the ‘mean PD’ for each grade

should exceed 0.03%, (§ 285 of Basel Committee on Banking Supervision

(2006)). An additional constraint regarding the lower bound on the number

of borrowers in each bucket is necessary to ensure that no bucket is empty

and that the number of borrowers in each bucket is adequate to statistically

evaluate ex-post the precision of the classiﬁcation system.

19

Introducing the

above constraints into the clustering problem increases its complexity. Thus,

heuristics have been suggested in the recent literature to tackle ﬁnancial

classiﬁcation problems.

Krink et al. (2007) and Krink et al. (2008) contribute to this context by

clustering borrowers to a ﬁxed (given) number of buckets and optimizing dif-

ferent objective functions. In either case, DE compared with other heuristics,

19

Previous literature (Krink et al. (2007)) speciﬁes that it should exceed a given per-

centage, e.g. 1%, of the total number of borrowers in a given portfolio. In Lyra et al.

(2010a), the lower bound is set using statistical benchmarks.

40

like GA and Particle Swarm Optimization (PSO), showed consistently supe-

rior performance in the discrete problem of credit risk rating. In addition,

little parameter tuning was required compared with GA and PSO. Lyra et al.

(2010b) extent the previous literature by determining not only the optimal

size but also the optimal number of clusters. By doing so, the precision of

the classiﬁcation approach could be statistically evaluated ex-post. They pro-

posed a TA algorithm to exploit the inherent discrete nature of the clustering

problem. This algorithm is found to outperform alternative methodologies

already proposed in the literature, such as standard k-means and DE. While

DE performed reliably for a set of small and medium size Italian companies,

TA was more eﬃcient (better grouping solutions) and faster.

The superiority of TA in this problem instance stems from the fact that

a fast update of the objective function is feasible and thus computation time

is largely independent of the number of clusters. However, for larger data

sets and for other more realistic linear and non-linear constraints (such as

the modeling of the dependence structure of defaults or the application of

heuristics in a dynamic clustering framework) the problem becomes compu-

tationally more demanding. It is an open challenge to improve the computa-

tional eﬃciency of heuristics in the computationally demanding problem of

ﬁnancial clustering.

4.4.2 Portfolio Performance Improvement by Clustering

So far, we addressed the direct application of heuristics in asset selection (Sec-

tion 4.1) when complex utility functions and constraints are considered. The

discussed literature reported, in many instances, higher portfolio returns, for

a given risk, using heuristics. An indirect application of heuristics in portfo-

lio selection (asset allocation) is found in Zhang and Maringer (2010a) and

Zhang and Maringer (2010b). They suggest pre-clustering assets according

to their performance so as to improve the asset allocation procedure.

The clustering of assets before portfolio selection serves as an alterative to

the equally weighted asset allocation 1/N or the Markowitz allocation based

on risk-return equilibrium. In latter case, high dimensional portfolios or pos-

sible inaccurate estimations of expected returns or covariance matrices,

20

may

result in a sub-optimal allocation of resources. In due course, pre-clustering

may help reduce the dimensionality of the problem and thus facilitate the

optimization procedure. Pre-clustering involves ﬁrst clustering assets into a

given number of clusters so as to improve the overall portfolio SR. Then,

20

The equally weighted asset allocation is data independent. So, its performance does

not depend on a portfolio’s dimensions.

41

with-in class, traditional asset allocation procedures are applied. Finally, a

weighted composition of these clusters constructs the target portfolio.

Zhang and Maringer (2010a) used and compared two diﬀerent heuris-

tics, DE and GA, to clustering ﬁnancial data (FTSE assets). Compared

with traditional non-clustering allocation approaches, pre-clustering of as-

sets improved the SR distribution of clustered portfolios, according to both

in-sample and out-of-sample simulation studies. Heuristics are proven to be

a reliable tool for clustering ﬁnancial data, as indicated by the improved

SR distributions. Nevertheless, GA resulted in better SRs than DE in all

restarts. Besides, GA outperformed DE. GA resulted in higher and more

stable ﬁtness values. As indicated from the results, GA performed well in

the discrete problem of clustering, mainly because of their discrete nature

(construction), Section 2.2.1.

21

In the above application a predeﬁned number of clusters was used to parti-

tion assets. A possible extension in this direction is to determine the optimal

number of clusters based on the sample size and the index composition used.

Further, the potential performance of the suggested clustering design in the

presence of highly volatile asset returns should further be investigated.

4.4.3 Identiﬁcation of Mutual Funds Style

Clustering data, especially data with complex schemes, can help extract all

the available information. Besides, as discussed in Section 4.4.2, clustering

avoids, to a high extent, the ‘curse of dimensionality’ problem often faced in

large portfolio management. The aim of clustering is the partition of data in

order to minimize the within (homogeneity) and maximize he between clus-

ters variance (heterogeneity). Brucker (1978) has shown that for a number of

clusters higher than three the problem is highly complex. So, the recent liter-

ature has developed a number of new classiﬁcation approaches which combine

classical statistical methodologies, e.g. principal component analysis (PCA),

and intelligent techniques, e.g. DE, NN to tackle the problem.

Pattarin et al. (2004) proposed a combination of PCA and GA to group

the historical returns of the Italian mutual funds and identify their style.

First, PCA selected the order of the process (the lag order). Second, the

evolutionary algorithm (GA for medoids evolution - GAME) classiﬁed the

21

To adjust the continuous optimization technique, DE, to a discrete problem, a random

noise (z) term is added to the scaled diﬀerence of two randomly selected vectors, during

the diﬀerential mutation process, P

(υ)

.,i

= P

(0)

.,r1

+ (F + z

.,1

) × (P

(0)

.,r2

− P

(0)

.,r3

+ z

.,2

). The

random noise is added with a given probability and it follows a normal distribution. The

candidate solutions are rounded up to the nearest integer.

42

historical returns in homogenous groups.

22

The new classiﬁcation method

based on GA was successful in identifying the mutual fund style with less in-

formation than the classiﬁcation method of the Italian mutual fund managers

association (‘Assogestioni’).

Future applications can consider the use of other heuristics in ﬁnancial

data clustering. Das and Sil (2010) recommended a revised DE algorithm

in clustering the pixels of an image. They used a kernel induced similarity

function to measure the distance from the cluster center. The advantage of

this measure, in relation to the Euclidean distance, is that it allows non-linear

classiﬁcation of data. That implementation improved the accuracy, the speed

and the robustness of DE compared to GA. A similar implementation could

be used in ﬁnancial clustering problems.

5 Conclusions

This survey gives an overview of some modern ﬁnancial optimization prob-

lems. Often the additional constraints and the high-dimensionality of these

problems, make them complex and thus diﬃcult to solve by standard opti-

mization algorithms. The paper suggests heuristic optimization techniques as

an alternative. In the presented literature heuristics provide reliable approx-

imations and are successfully applied to optimize ﬁnancial problems with

great potentials. The paper shows some promising ﬁelds of application of

heuristic optimization techniques and some possible extensions in that direc-

tion.

Optimization heuristic techniques are ﬂexible to tackle a broad variety of

complex optimization problems. This is due to the stochastic elements intro-

duced during the optimization process. Explicitly, the optimization process

involves stochastic initialization, intermediate stochastic selection and ac-

ceptance of the candidate solution. Nonetheless, the outcomes (optimization

results) should be carefully interpreted, while numerous repetitions of the

algorithm are recommended.

The choice of the appropriate heuristic technique is of particular impor-

tance for its eﬃciency. Heuristic methods diﬀer in their acceptance criteria,

the actual way of creating new solution and the number of solutions they

maintain and generate in every iteration. In the literature discussed in this

survey, DE has shown remarkable performance in continuous numerical prob-

lems, e.g. parameter estimation, when compared with other heuristics, i.e.

22

GAME oﬀers the ﬂexibility of either using heuristics or statistical approaches in choos-

ing the number of lags included in the model. In that application a PCA was used for the

selection.

43

TA. Even if DE is specialized on continuous numerical problems, it has al-

ready shown better performance than GA and PSO in tackling the credit

risk bucketing. Nonetheless, local search heuristics, particularly TA, is faster

and more robust to deal with the discrete problem of credit risk assignment.

It seems that there is no clear-cut recipe when to use which type of

heuristics. It may depend on the nature of the problem (discrete or continu-

ous search space), the computational resources and the application. Future

research might analyze further the criteria for optimal algorithm selection.

An additional aspect aﬀecting the eﬃciency of heuristics is the initial pa-

rameter settings. They are vital, for they provide a trade-oﬀ between com-

putational time and solution quality. An open challenge in the application

of heuristics is to set a more reﬁned statistical framework for estimating the

heuristic parameters, which will further improve the converge of the estima-

tion results. Maringer and Winker (2009) estimated the number of iterations,

as a function of sample size, needed by a TA to precisely converge, with a

given probability, to the true maximum likelihood estimator of a GARCH

model. An interesting extension in this direction is the derivation of joint

convergence properties for more complex estimators.

To sum up, heuristic methods have been applied to many promising

ﬁelds of commercial applications, like portfolio selection with real world con-

straints, robust model estimation and selection and ﬁnancial clustering with

good potential. They are ﬂexible enough to account for diﬀerent constraints

and provide reliable results where traditional optimization methods fail to

work. Still, more research should be devoted to identifying the problem

framework (what is a ‘good problem for heuristics’) in which heuristics clearly

outperform traditional optimization methods or in which some heuristics are

preferred instead of other heuristic methods.

Appendix

Central Composite Design

This section provides a detailed explanation and a numerical representation of

the CCD experiments. First, for the factorial design we assign to the param-

eters two discrete ‘levels’(F = 0.7, 0.9, CR = 0.7, 0.9 and n

p

= 20, 50). This

range constitutes the experimental region and is carefully speciﬁed around

an initial value (center point) for each parameter. The initial levels depend

upon a priori knowledge about the best parameter combination that optimize

44

a given objective function.

23

Three diﬀerent levels of computational resources

are considered, with n

G

× n

p

= 500, 2500 and 5000 meaning, low, medium

and high complexity, respectively. Since the product of n

p

× n

G

should

be constant, an adjustment is made on n

G

in each alteration. The aim is to

ﬁnd the combination of parameters that optimize an objective function value

using these ﬁxed computational times.

Second, we conduct an additional experiment using the center points.

The center points equal the median (center) of the ’levels’ used in the above

factorial design (F = 0.8, CR = 0.8 and n

p

= 35). This experiment is

repeated 4 times to measure the magnitude of random error (σ

2

), resulting

from diﬀerent values of the response variable for identical settings (Fang et

al. (2006)). Table 3 illustrates the experimental region where the three factor

levels, low, center and high, are coded -1, 0 and 1, respectively.

24

Table 3: Experimental Design Coding.

Code F CR n

p

-1 0.7 0.7 20

0 0.8 0.8 35

1 0.9 0.9 50

Finally, two more extreme levels are considered for each input variable

which extent the experimental region by α times the original range, deﬁned

above e.g. ±α·△F = ±α·0.1. A proportion of α determines the distance of

the additional extreme levels from the center. The value α is calculated in

this experiment based on orthogonal and rotatable designs (Box and Draper

(1987) and Barker (1985)). Each design suggests a diﬀerent distance from

the center points. An orthogonal design is more concentrated around the

center point while a rotatable design covers a wider surface area. When little

is known about the search domain a rotatable design might be more helpful

in exploring it, since it allows for a higher dispersion around the origin.

However, one should be careful in choosing the distribution of experimental

points around the origin, since unreasonably low or high values might result

23

Practical advice for optimizing objective functions with DE is given on

www.icsi.berkeley.edu/∼storn/.

24

To code the factor levels the following rule is applied, e.g. for the high level of F:

F

high

− F

center

(F

high

− F

low

)/2

=

0.9 − 0.8

(0.9 − 0.7)/2

= 1 (7)

45

Figure 7: CCD Graph for three factors.

ψω χψω

ψ

CR

χψ

•

−αχ

¸

+αω

−αω

.

+αχ

−α

ψ

+α

ψ

¸¸

ω χψ

r

F

χ

np

¸

**in high uncertainty of the response output.
**

25

These additional levels are coded with α and are sequentially placed at

±α. When α is positive, the equivalent proportion of α is +0.1414, +0.1414

and +7 value for F, CR and n

p

, respectively. The equivalent α is added to

the center point of each input parameter (e.g. for F 0.8 + α). When α is

negative, a proportion of it is deducted from the center point (e.g. for F 0.8

- α).

26

Only one parameter is altered every time, while all other parameters

take the center value. In a similar way as above, the Greek letters indicate

which factor deviates from the center point.

27

Table 4 demonstrates the experimental design with all 18 level combina-

tions considered in the exercise. The graphical representation of these com-

binations is shown in Figure 7. There are only 15 possible factor level com-

binations and the experiment using factors’ center levels (F = 0.8, CR = 0.8

and n

p

= 35) is repeated four times. Also, Figures 8 and 9 illustrate all factor

level-combinations used to form the experimental region for orthogonal and

rotatable designs, respectively.

All 18 set-ups of the experimental design are used to estimate Equation

2. Then, a second order polynomial is ﬁtted and the optimal combination

of DE factors is determined. Finally, Equation 2 for CAMP is re-estimated

25

While experimental methodologies should be applied when there is some knowledge

about the levels of the input variables, Lin et al. (2010) suggest that uniform design might

be even better than orthogonal and rotatable ones in the absence of design knowledge.

26

The decimal positions are omitted for n

p

value since they do not improve or destroy

the integrity (mathematical results) of the design.

27

For the explicit calculation of α’s based on orthogonal and rotatable design see Hill

and Lewicki (2006).

46

Table 4: Experimental Design (Simulation Setup).

Design Combination Setup F CR n

p

Factorial r 1 -1 -1 -1

χ 2 1 -1 -1

ψ 3 -1 1 -1

χψ 4 1 1 -1

ω 5 -1 -1 1

χω 6 1 -1 1

ψω 7 -1 1 1

χψω 8 1 1 1

Center node 9 0 0 0

10 0 0 0

11 0 0 0

12 0 0 0

Orthogonal +α

χ

13 α 0 0

-α

χ

14 −α 0 0

+α

ψ

15 0 α 0

-α

ψ

16 0 −α 0

+α

ω

17 0 0 α

-α

ω

18 0 0 −α

47

based on the optimal factor values and its performance is tested against the

LMS residuals reported in Winker et al. (2010).

Figure 8: CCD with orthogonal design

0.7

0.75

0.8

0.85

0.9

0.7

0.75

0.8

0.85

0.9

15

20

25

30

35

40

45

50

55

F

Cr

N

p

Orthogonal Design

Figure 9: CCD with rotatable design

0.65

0.7

0.75

0.8

0.85

0.9

0.95

0.65

0.7

0.75

0.8

0.85

0.9

0.95

10

20

30

40

50

60

F

Cr

N

p

Rotatable Design

48

References

Ahn, H., K. Lee and K. J. Kim (2006). Global optimization of support vector

machines using genetic algorithms for bankruptcy prediction. Lecture

Notes in Computer Science 4234, 420–429.

Altman, E. (1968). Financial ratios, discriminant analysis and the prediction

of corporate bankruptcy. The Journal of Finance 23, 589–601.

Angelini, E., G. di Tollo and A. Roli (2008). A neural network approach for

credit risk evaluation. The Quarterly Review of Economics and Finance

48, 733–755.

Back, B., T. Laitinen, K. Sere and M. V. Wezel (1996). Choosing bankruptcy

predictors using discriminant analysis, logit analysis, and genetic algo-

rithms. Technical Report 40. Computational Intelligence for Business.

Turku Center for Computer Science.

Barker, T. B. (1985). Quality by Experimental Design. Marcel Dekker, New

York.

Basel Committee on Banking Supervision (2006). International convergence

of capital measurement and capital standards a revised framework com-

prehensive version. Technical report. Bank for International Settlements.

Box, G. E. P. and N. R. Draper (1987). Empirical Model Building and Re-

sponse Surface. Wiley, New York.

Brucker, P. (1978). On the complexity of clustering problems. In: Optimisa-

tion and Operations Research (M. Beckmenn and H.P. Kunzi, Eds.). Vol.

157 of Lecture Notes in Economics and Mathematical Systems. pp. 45–

54. Springer. Berlin.

Butera, G. and R. Faﬀ (2006). An integrated multi-model credit rating sys-

tem for private ﬁrms. Review of Quantitative Finance and Accounting

27, 311–340.

Chan, L. K. C. and J. Lakonishok (1992). Robust measurement of beta risk.

Journal of Financial and Quantitative Analysis 27, 265–282.

Chang, T.-J., N. Meade, J.E. Beasley and Y.M. Sharaiha (2000). Heuris-

tics for cardinality constrained portfolio optimisation. Computers and

Operations Research 27(13), 1271–1302.

49

Cuthbertson, K. and D. Nitzsche (2005). Quantitative Financial Economics.

John Wiley & Sons Ltd. Chichester.

Das, S. and S. Sil (2010). Kernel-induced fuzzy clustering of image pixels

with an improved diﬀerential evolution algorithm. Information Sciences

180, 1237–1256.

Di Tollo, G. and M. Lyra (2010). Elman nets for credit risk assessment. In:

Decision Theory and Choices: a Complexity Approach (M. Faggini and

P. C. Vinci, Eds.). pp. 147–170. Springer. Milan.

Dueck, G. and P. Winker (1992). New concepts and algorithms for portfolio

choice. Applied Stochastic Models and Data Analysis 8, 159–178.

Dueck, G. and T. Scheuer (1990). Threshold accepting: A general purpose

optimization algorithm appearing superior to simulated annealing. J.

Comp. Phys 90, 161–175.

Fama, E. F. and K. R. French (1993). Common risk factors in the returns on

stocks and bonds. Journal of Financial Economics 33, 3–56.

Fang, K. T., R. Lin and A. Sudjianto (2006). Design and Modeling for Com-

puter Experiments. Chapman and Hall/CRC.

Fastrich, B. and P. Winker (2010). Robust portfolio optimization with a

hybrid heuristic algorithm. Technical Report 41. COMISEF Working

Papers Series.

Fern´andez-Rodr´ıguez, F. (2006). Interest rate term structure modeling using

free-knot splines. The Journal of Business 79, 3083–3099.

Gilli, M. and E. K¨ellezi (2002). Portfolio optimization with VaR and ex-

pected shortfall. in: EJ. Kontoghiorghes, B. Rustem, S. Siokos (eds),

Computational Methods in Decision Making, Economics and Finance.

pp. 167–183.

Gilli, M. and E. Schumann (2009). An empirical analysis of alternative port-

folio selection criteria. Technical Report No. 09-06. Swiss Finance Insti-

tute Research Paper.

Gilli, M. and E. Schumann (2010a). Calibrating the Heston model with diﬀer-

ential evolution. In: Lecture Notes in Computer Science (number 6025

EvoApplications 2010, Part II, Ed.). pp. 242–250. Springer.

50

Gilli, M. and E. Schumann (2010b). Calibrating the Nelson Siegel Svensson

model. Technical Report 31. COMISEF Working Papers Series.

Gilli, M. and E. Schumann (2010c). Distributed optimization of a portfolio’s

Omega. Parallel Computing 36(7), 381–389.

Gilli, M. and E. Schumann (2010d). Risk–reward optimization for long-run

investors: an empirical analysis. European Actuarial Journal (forthcom-

ing).

Gilli, M. and E. Schumann (2011). Calibrating option pricing models with

heuristics. In: Natural Computing in Computational Finance (B. An-

thony and M. O’Neill, Eds.). Vol. 4. Springer (forthcoming).

Gilli, M. and P. Winker (2003). A global optimization heuristic for estimat-

ing agent based models. Computational Statistics and Data Analysis

42, 299–312.

Gilli, M. and P. Winker (2009). Heuristic optimization methods in economet-

rics. In: Handbook on Computational Econometrics (E. Kontoghiorghes

et al., Ed.). pp. 81–119. Elsevier. Amsterdam.

Gilli, M., D. Maringer and P. Winker (2008). Applications of heuristics in

ﬁnance. In: Handbook On Information Technology in Finance (D. Seese,

C. Weinhardt and F. Schlottman, Eds.). pp. 635–653. International

Handbooks on Information Systems. Springer, Germany.

Gilli, M., E. K¨ellezi and H. Hysi (2006). A data-driven optimization heuristic

for downside risk minimization. Journal of Risk 8(3), 1–18.

Gimeno, R. and J. M. Nave (2009). A genetic algorithm of the term structure

of interest rates. Computational Statistics and Data Analysis 53, 2236–

2250.

Glover, F. and M. Laguna (1997). Tabu Search. Kluwer Academic Publishers.

Boston, MA.

Gouri´eroux, C. (1997). ARCH Models and Financial Applications. Springer.

New York.

Hamerle, A., T. Liebig and D. R¨osch (2003). Credit risk factor modeling and

the Basel II IRB approach. Technical Report 2. Banking and Financial

Supervision. Deutsche Bundesbank.

51

Hill, T. and P. Lewicki (2006). Statistics: Methods and Applications. A Com-

prehensive Reference for Science, Industry and Data Mining. StatSoft,

Inc.

Holland, J. H. (1975). Adaptation in Natural and Artiﬁcial Systems. Univer-

sity of Michigan Press, Ann Harbor.

Hoos, H. H. and T. St¨ utzle (2005). Stochastic Local Search: Foundations and

Applications. Elsevier Inc.

Jacobson, S. H., S. N. Hall and L. A. McLay (2006). Visiting near-optimal

solutions using local search algorithms. In COMPSTAT 2006, Proceed-

ings in Computational Statistics (Alferdo Rizzi and Maurizio Vichi, Ed.)

pp. 471–481.

Kirman, A. (1991). Epidemics of opinion and speculative bubbles in ﬁnancial

markets. In: Money and Financial Markets (M. Taylor, Ed.). pp. 354–

368. Macmillan. New York.

Kirman, A. (1993). Ants, rationality and recruitment. Quarterly Journal of

Economics 108, 137–156.

Krink, T. and S. Paterlini (2009). Multiobjective optimization using dif-

ferential evolution for real-world portfolio optimization. Computational

Management Science (forthcoming).

Krink, T., S. Mittnik and S. Paterlini (2009). Diﬀerential evolution and com-

binatorial search for constrained index tracking. Annals of Operations

Research 152(1), 153–176.

Krink, T., S. Paterlini and A. Resti (2008). The optimal structure of pd

buckets. Journal of Banking and Finance 32(10), 2275–2286.

Krink, T., S. Paterlini and A. Restic (2007). Using diﬀerential evolution to

improve the accuracy of bank rating systems. Computational Statistics

& Data Analysis 52(1), 68–87.

Kumar, P. R. and V. Ravi (2007). Bankruptcy prediction in banks and ﬁrms

via statistical and intelligent techniques - a review. European Journal of

Operational Research 180, 1–28.

Lagarias, J. C., J. A. Reeds, M. H. Wright and P. E. Wright (1999). Conver-

gence behavior of the Nelder-Mead simplex algorithm in low dimensions.

SIAM J. Optimization 9, 112–147.

52

Lin, D. K.J., C. Sharpe and P. Winker (2010). Optimized U-type designs on

ﬂexible regions. Computational Statistics & Data Analysis 54(6), 1505–

1515.

Lyra, M., A. Onwunta and P. Winker (2010a). Threshold accepting for credit

risk assessment and validation. Technical Report 39. COMISEF Working

Paper Series.

Lyra, M., J. Paha, S. Paterlini and P. Winker (2010b). Optimization heuris-

tics for determining internal rating grading scales. Computational Statis-

tics & Data Analysis 54(11), 2693–2706.

Mansini, R. and M. G. Speranza (1999). Heuristic algorithms for the portfolio

selection problem with minimum transaction lots. European Journal of

Operational Research 114, 219–233.

Maringer, D. (2005). Portfolio Management with Heuristic Optimization.

Springer. Dordrecht.

Maringer, D. and M. Meyer (2007). Smooth transition autoregressive models

- new approaches to the model selection problem. Studies in Nonlinear

Dynamics and Econometrics (forthcoming).

Maringer, D. and O. Oyewumi (2007). Index tracking with constrained

portfolios. Intelligent Systems in Accounting, Finance and Management

15, 57–71.

Maringer, D. and P. Winker (2009). The convergence of estimators based on

heuristics: Theory and application to a GARCH model. Computational

Statistics 24, 533–550.

Min, S. H., J. Lee and I. Han (2006). Hybrid genetic algorithms and sup-

port vector machines for bankruptcy prediction. Expert Systems with

Applications 31, 652–660.

Moscato, P. (1989). On evolution, search, optimization, genetic algorithms

and martial arts – towards memetic algorithms. Technical Report 790.

CalTech California Institute of Technology.

Moscato, P. and J.F. Fontanari (1990). Stochastic versus deterministic up-

date in simulated annealing. Physics Letters A 146(4), 204–208.

Nelson, C. R. and A. F. Siegel (1987). Parsimonious modeling of yield curves.

Journal of Business 60(4), 473–489.

53

Pattarin, F., S. Paterlini and T. Minerva (2004). Clustering ﬁnancial time

series: an application to mutual funds style analysis. Computational

Statistics and Data Analysis 47, 353–372.

Price, K. V., R. M. Storn and J. A. Lampinen (2005). Diﬀerential Evolution:

A Practical Approach to Global Optimization. Springer, Germany.

Ronchetti, E. and M. Genton (2008). Robust prediction of beta. In:

Computational Methods in Financial Engineering (E. Kontoghiorghes,

B. Rustem and P. Winker, Eds.). pp. 147–161. Springer. Berlin.

Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the

American Statistical Association 79, 871–880.

Rousseeuw, P. J. and A. M. Leroy (1987). Robust Regression and Outlier

Detection. John Wiley & Sons, New Jersey.

Savin, I. and P. Winker (2010). Heuristic optimization methods for dynamic

panel data model selection. Application on the Russian innovative per-

formance. Technical Report 27. COMISEF Working Papers Series.

Storn, R. and K. Price (1997). Diﬀerential evolution: a simple and eﬃ-

cient adaptive scheme for global optimization over continuous spaces.

J. Global Optimization 11, 341–359.

Svensson, L. E. O. (1994). Estimating and interpreting forward interest rates:

Sweden 1992-1994. Technical Report 4871. National Bureau of Economic

Research.

Talbi, E. G. (2002). A taxonomy of hybrid metaheuristics. Journal of Heuris-

tics 8, 541–564.

Trabelsi, A. and M. A. Esseghir (2005). New evolutionary bankruptcy fore-

casting model based on genetic algorithms and neural networks. in: Pro-

ceedings of the eleventh IEEE conference on tools with artiﬁcial intelli-

gence (ICTAI-2005).

Varetto, F. (1998). Genetic algorithms applications in the analysis of insol-

vency risk. Journal of Banking and Finance 22, 1421–1439.

Winker, P. (2001). Optimization Heuristics in Econometrics: Applications of

Threshold Accepting. Wiley, New York.

Winker, P. and D. Maringer (2004). The hidden risks of optimizing bond

portfolios under VaR. Deutsche Bank Research Notes 13, Frankfurt a.M.

54

Winker, P. and D. Maringer (2007). The threshold accepting optimisation

algorithm in economics and statistics. In: Optimisation, Econometric

and Financial Analysis (Erricos J. Kontoghiorghes and C. Gatu, Eds.).

Vol. 9 of Advances in Computational Management Science. pp. 107–125.

Springer.

Winker, P. and K.-T. Fang (1997). Application of threshold accepting to

the evaluation of the discrepancy of a set of points. SIAM Journal on

Numerical Analysis 34(5), 2028–2042.

Winker, P., M. Gilli and V. Jeleskovic (2007). An objective function for

simulation based inference on exchange rate data. Journal of Economic

Interaction and Coordination 2, 125–145.

Winker, P., M. Lyra and C. Sharpe (2010). Least median of squares estima-

tion by optimization heuristics with an application to the CAPM and

multi factor models. Computational Management Science (forthcoming).

Zhang, J. and D. Maringer (2010a). Asset allocation under hierarchical clus-

tering. Technical Report 36. COMISEF Working Papers Series.

Zhang, J. and D. Maringer (2010b). A clustering application in portfolio man-

agement. In: Electronic Engineering and Computing Technology (S. Ao

and L. Gelman, Eds.). Vol. 60 of Lecture Notes in Electronical Engineer-

ing. pp. 309–321. Springer.

55

**Heuristic Strategies in Finance – An Overview∗
**

Marianna Lyra† September 21, 2010

Abstract This paper presents a survey on the application of heuristic optimization techniques in the broad ﬁeld of ﬁnance. Heuristic algorithms have been extensively used to tackle complex ﬁnancial problems, which traditional optimization techniques cannot eﬃciently solve. Heuristic optimization techniques are suitable for non-linear and non-convex multi-objective optimization problems. Due to their stochastic features and their ability to iteratively update candidate solutions, heuristics can explore the entire search space and reliably approximate the global optimum. This overview reviews the main heuristic strategies and their application to portfolio selection, model estimation, model selection and ﬁnancial clustering.

Keywords: ﬁnance, heuristic optimization techniques, portfolio management, model selection, model estimation, clustering.

Financial support from the EU Commission through MRTN-CT-2006-034270 COMISEF is gratefully acknowledged. † Department of Economics, University of Giessen, Germany.

∗

1

Contents

1 Introduction 2 Heuristic Optimization Techniques 2.1 Trajectory Methods . . . . . . . . . . . 2.1.1 Simulated Annealing . . . . . . 2.1.2 Threshold Accepting . . . . . . 2.2 Population based Methods . . . . . . . 2.2.1 Genetic Algorithms . . . . . . . 2.2.2 Diﬀerential Evolution . . . . . . 2.3 Hybrid Methods . . . . . . . . . . . . . 2.3.1 Hybrid Meta-Heuristics . . . . . 2.3.2 Other Hybrids . . . . . . . . . . 2.4 Technical Details . . . . . . . . . . . . 2.4.1 Constraint Handling Techniques 2.4.2 Calibration Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 5 6 6 6 9 9 10 13 13 14 15 15 17

3 Response Surface Analysis for Diﬀerential Evolution 18 3.1 Response Surface Analysis . . . . . . . . . . . . . . . . . . . . 18 3.2 Implementation Details for Diﬀerential Evolution . . . . . . . 19 3.3 Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 4 Heuristics in Finance 4.1 Portfolio Selection . . . . . . . . . . . . . . . . . . . . . . 4.1.1 Transaction Costs and integer Constraints . . . . 4.1.2 Minimum Transaction Lots . . . . . . . . . . . . 4.1.3 Cardinality versus Diversity . . . . . . . . . . . . 4.1.4 Index Tracking . . . . . . . . . . . . . . . . . . . 4.1.5 Downside Risk Measures . . . . . . . . . . . . . . 4.2 Robust Model Estimation . . . . . . . . . . . . . . . . . 4.2.1 Nonlinear Model Estimation . . . . . . . . . . . . 4.2.2 Robust Estimation of Financial Models . . . . . . 4.3 Model Selection . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Risk Factor Selection . . . . . . . . . . . . . . . . 4.3.2 Selection of Bankruptcy Predictors . . . . . . . . 4.4 Financial Clustering . . . . . . . . . . . . . . . . . . . . 4.4.1 Credit Risk Assignment . . . . . . . . . . . . . . 4.4.2 Portfolio Performance Improvement by Clustering 4.4.3 Identiﬁcation of Mutual Funds Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 22 24 26 27 29 30 31 31 35 36 37 37 39 40 41 42

2

5 Conclusions Bibliography

43 49

List of Tables

1 2 3 4 Optimal Orthogonal Design to minimize Median of Squares Residuals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Optimal Rotatable Design to minimize Median of Squared Residuals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Experimental Design Coding. . . . . . . . . . . . . . . . . . Experimental Design (Simulation Setup). . . . . . . . . . . . . 23 . 23 . 45 . 47

List of Figures

1 2 3 4 5 6 7 8 9 Decomposition of heuristic optimization techniques . . . . . . The crossover mechanism in GA. . . . . . . . . . . . . . . . . Crossover representation for the the ﬁrst and the last parameter Classiﬁcation of Hybrid Meta-Heuristics . . . . . . . . . . . . Calibration of technical parameters. . . . . . . . . . . . . . . . Least median of squared residuals as a function of α and β . . CCD Graph for three factors. . . . . . . . . . . . . . . . . . . CCD with orthogonal design . . . . . . . . . . . . . . . . . . . CCD with rotatable design . . . . . . . . . . . . . . . . . . . . 7 10 12 14 18 36 46 48 48

3

1

Introduction

The ﬁnancial paradigm entails simple mathematical models but often imposes unrealistic assumptions. Basic ﬁnancial models consist of linear and/or quadratic functions which ideally can be solved by traditional optimization techniques like quadratic programming. However, the ﬁnance community soon realizes the complexity of the economic environment and the market ineﬃciency. To cope with this, the ﬁnancial community develops more realistic mathematical models which include more complex, often non-convex and non-diﬀerentiable functions with real world constraints. Under these more challenging conditions quadratic programming routines or other classical techniques will often not result in the optimal solution (or even in a near-optimum). Heuristic optimization methods1 are ﬂexible enough to tackle many complex optimization problems (without a linear approximation of the objective function). Heuristics’ ﬂexibility stems either from the random initialization or from the intermediate stochastic selection and acceptance criterion of the candidate solutions. However, the optimization results should be carefully interpreted, due to the random eﬀects introduced during the optimization process. Besides, repetitions of the algorithm might generate heterogenous solutions from an unknown distribution. To approximate the optimum solution without any prior knowledge of the distribution, Gilli and Winker (2009) suggested repeating an experiment with diﬀerent parameter settings or applying extreme value theory. Also, Jacobson et al. (2006) developed a model to approach the unknown distribution of the results. This paper examines the modern ﬁnancial problems of portfolio selection, robust model estimation and selection and ﬁnancial clustering. It analyzes research papers published during the last two decades, where alternative heuristic techniques were applied independently or complementarily to other computational techniques. Furthermore, this overview extends the contributions collected in Gilli et al. (2008), who presented the most popular heuristics and their usage in ﬁnance. Moreover, this work emphasizes and distinguishes the implementation characteristics of several applications. The paper is organized as follows: Section 2 reports on heuristic and hybrid meta-heuristic methods and describes the basic algorithms. Also, it discusses some implementation issues that should be considered when applying heuristics. Section 3 introduces in more detail the technical issue of parameter setting in heuristic optimization algorithms. In due course, an

The expressions ‘heuristic optimization methods’, ‘heuristic strategies’, ‘heuristic techniques’, ‘meta-heuristics’ and ‘heuristics’ will be used interchangeably throughout the paper.

1

4

The two main classes of heuristic algorithms are construction heuristics and improvement (or local search) heuristics. e. Heuristics provide an alternative or optimize the current intelligent techniques. This overview explains the basic optimization heuristic algorithms applied in the recent literature to ﬁnancial problems. Finally. in principle.1. Local search heuristics iteratively update a set of u initial candidates that improve the objective function. the candidates that form an optimum solution (Hoos and St¨ tzle (2005). This chapter addresses only local search methods.1). Construction heuristics start from an empty initial solution and construct. Section 4 addresses contemporary problems in ﬁnance and analyzes how heuristics can be applied to approach and overcome them. is used for parameter calibration. 2 Heuristic Optimization Techniques Heuristic optimization techniques form part of the broad category of computational ‘intelligent’ techniques which have. Their basic methodology is described in Section 4. This might be due to their tendency to local optima and their problem-speciﬁc performance. In comparison with other intelligent techniques. by the means of nearest neighborhood. solved non-linear ﬁnancial models. The ﬁrst application of heuristics in ﬁnance can be found in the work of Dueck and Winker (1992). as well as the number of solutions that are updated simultaneously is what distinguish local search methods. ch. Response Surface Methodology (RSM) (Box and Draper (1987)). A diverse range of optimization heuristics have been developed to solve non-linear constrained ﬁnancial problems. (2006)). like Neural Networks (NN) (Ahn et al.established methodology. heuristics need little parameter tuning and they are less dependant on the choice of starting solutions. Section 5 concludes and suggests further applications of heuristics to open ﬁnancial questions.g. The mechanisms of neighborhood structure. they do not appear in practical ﬁnancial problems. Figure 1 decomposes the local search methods to trajectory methods and population based methods. To the best of my knowledge. RSM has not been applied so far in the context of parameter tuning for heuristics. The interaction of heuristics with other intelligent techniques is discussed in Section 4. While constructive methods are easy and fast to implement and can be applied to problems like scheduling and network routing. who used Threshold Accepting (TA) for portfolio optimization with diﬀerent utility criteria. Here we present the four most common local search heuristics 5 .

However. u is a uniformly distributed value lying in the interval [0.e. Ceteris paribus. In that sense. the lower the probability that an impairment will be accepted.1. The new solution is carried onto the next iteration and is used as a benchmark. 2.1.1 2. i.5. In every iteration a new neighboring solution is randomly generated (4:). the following steps are repeated. they can accept not only improvements but also controlled impairments of the objective function value as long as a threshold is satisﬁed. the algorithm randomly generates a solution (2:). the higher the (negative) change. ch.2) are threshold methods that accept a neighboring solution based on a threshold value. The lower its value is.found in the ﬁnancial literature. the less tolerant it is to negative changes. 1]. an impairment is also accepted with a diminishing probability (6:). Algorithm 1 describes the general outline of a SA implementation. The probability value is subject to a macroscopic variable. for a predeﬁned number of iterations. 2. where the changes (transformations) in the energy are related to macroscopic variables. I. ∆. the algorithm has a chance to escape local minima. The probability value is inﬂuenced by changes in the objective function value and the ‘temperature’ T. where we believe we are closer to the optimum solution and impairments are unwelcome. This is required. For an extensive description of heuristic optimization techniques see also Winker (2001). T.1. Initially. The ‘temperature’ T controls the tolerance of the algorithm in accepting downhill moves. temperature. T is iteratively changed and the best objective function value is reported. especially at the end of the search. Next. SA is inspired by the principals of statistical thermodynamics. The algorithm always accepts the new solution if it results in an improvement of the objective function value.1 Trajectory Methods Simulated Annealing Simulated Annealing (SA) as well as TA (section 2. e−∆/T > u. The number of iterations or the properties of the temperature deﬁne the stopping criterion. SA relates the change in the objective value to a probability threshold. By allowing moves towards worse solutions. In the latter case the temperature reduces until an equilibrium state is found where no further change in the objective function value can be observed.2 Threshold Accepting Dueck and Scheuer (1990) and Moscato and Fontanari (1990) conceived the 6 .

hhhhh hhhh hhhh hh sh h 1 h h 1 1 Construction methods 1 Heuristic T echniques + methods Threshold accepting 1 1 1 T rajectory iiii tiiii Local isearch methods iii iiii i P opulation search ) Diﬀerential evolution 1 1 1 7 Simulated Annealing h Hybrid Meta Heuristics Figure 1: Decomposition of heuristic optimization techniques Genetic algorithms 7 .

Local diﬀerences actually calculated during the optimization run are considered. Moreover. Like SA. A threshold value (τ ) determines to which extent not only local improvements. χ1 ∈ N (χ0 ) Compute ∆ = f (χ0 ) − f (χ1 ) if ∆ < 0 or e−∆/T > u then χ0 = χ1 end if Reduce T end for idea of building a simple search tool appropriate for many types of optimization problems named TA. a smooth threshold sequence is obtained. TA enables the search to escape local minima by accepting not only improvements. 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: Initialize I. For a comprehensive overview of TA. (2010b) proposed a threshold sequence based on the diﬀerences in the ﬁtness of candidate solutions that are found in a speciﬁc place of the search process. The reduction of the neighborhood allows a wider search space at the beginning of the search and a greedy search towards the end. but also local impairments are accepted (∆ < τ ). the thresholds decrease linearly with the number of iterations. Lyra et al. Nonetheless. The local diﬀerences of the ﬁtness function are sorted in descending orders.Algorithm 1 General Simulated Annealing Algorithm. the impairment acceptance is not a probabilistic decision but rather a given threshold sequence. In addition. but also impairments of the objective function value. More precisely. Winker and Fang (1997) and Winker and Maringer (2007) suggested a data driven approach for the construction of the threshold sequence. The threshold sequence is ex-ante constructed based on the search space. T Generate at random a solution χ0 for r = 1 to I do Generate neighbor at random. the threshold sequence adapts to the region where the current solution belongs and to the objective function used. Thus. By using a moving average. its parameter settings and its convergence properties. see Winker (2001). 8 . the current candidate solution is compared with a neighboring solution (∆ = f (χ0 ) − f (χ1 )). As a result. they represent a diminishing threshold sequence. until the stopping criterion is met.

a (0) population np of initial solutions Pj.i ) < f (P.i = P.r1 and P. ‘reproduce’. bankruptcy prediction. The objective function value (or the ﬁtness value) is estimated for every initial solution.. credit evaluation and budget allocation. the solutions are encoded in binary stings but. are chosen from the initial set to generate a new candidate solution. First in (2:). d. r1 = r2 = i (0) (0) (u) 6: Crossover and mutate P. nG (0) 2: Initialize and evaluate population Pj. Their wide applicability springs from generating new candidate solutions using logical operators and from evaluating the ﬁttest solution. Second.i = P. we construct the ﬁrst child chromosome. 1: Initialize parameters np . ‘genes’. the algorithm iteratively generates nG new candidate (0) solutions.i ) if f (P. Algorithm 2 depicts the implementation details of GA.i 12: end if 13: end for 14: end for There are alternative ways to crossover which are problem-speciﬁc and can improve the performance of GA..i .r2 ∈1. · · · .r2 to produce P... np 3: for k = 1 to nG do 4: for i = 1 to np do 5: Select r1 .i ) then (1) (u) (u) (0) (u) 9: P.. ‘parent chromosomes’. as follows: The two ﬁttest candidate solutions P. j = 1. They were developed by Holland (1975).2 2. other representations can be used.i 10: else (1) (0) 11: P. the recombination of existing elements.. · · · .2.. ‘chromosomes’.. stock and portfolio selection. The second child chromosome 9 .1 Population based Methods Genetic Algorithms Genetic Algorithms (GA) are one of the oldest evolutionary optimization methods.i 7: 8: Evaluate f (P.i is randomly initialized. · · · . Algorithm 2 Genetic Algorithms.np . i = 1. crossover and random mutation.r2 . The simplest way is to choose randomly a crossover point and copy the elements before this point from the ﬁrst parent and the elements after the crossover point from the second parent.. Usually.2. They have been applied to numerous ﬁnancial problems including trading systems.r1 (0) and P. A thorough explanation of the algorithm follows. This way.. The reproduction or update consists of two mechanisms...

In order to achieve convergence. Diﬀerential Evolution (DE) is a population based technique. the population size np and the generation number nG should be determined. ‘children or oﬀspring’.. 1 1×k 1×k child1 = 1 1 0 1 | 1 1 0 .2.2 Diﬀerential Evolution Similar to GA.. but more appropriate for continuous problems. During the initialization phase. 1 child2 = 1 0 1 0 | 0 1 0 . a random mutation of the elements takes place. np should be increased more than ten times the number 10 .. To complete the update.↓crossover-point parent1 = 1 1 0 1 | 0 1 0 . During mutation the new elements change randomly.. Hence. for each member of the population a new candidate solution. Finally. combines the second part of the ﬁrst parent chromosome and the ﬁrst part of the second parent chromosome.. An alternative crossover operators is to select more than one random crossover points (Back et al.. (1996)) or to apply a uniform crossover (Savin and Winker (2010)). Algorithm 3 (1:). are transferred unmodiﬁed to the next step.. 2. originated by Storn and Price (1997). In some implementations of GA a number of elite solutions. In the case of binary encoding a few randomly chosen bits change from 1 to 0 or from 0 to 1. ‘elite chromosomes’. These are solutions with distinguished characteristics like good ﬁtness value. 1 1×k 1×k Figure 2: The crossover mechanism in GA.. is produced and evaluated (7:). Figure 2 shows such a simple crossover procedure. 1 parent2 = 1 0 1 0 | 1 1 0 . The attractiveness of this technique is the little parameter tuning needed. only the solutions with the best objective function value survive the generation evolution. Two technical parameters F and CR should also be initialized at this stage.

Then.i ) then 16: P..9. both are problem speciﬁc. Price et al.i = P. np 3: for k = 1 to nG do 4: P (0) = P (1) 5: for i = 1 to np do 6: Generate r1 . whereas each element of the current solution crosses with a uniformly designed candidate A practical advice for optimizing objective functions with DE given on www.r2 . According to the characteristics (0) of a population based approach a set of population solutions P.i 19: end if 20: end for 21: end for (1) (u) of parameters to be estimated.r2 .i 13: end if 14: end for (u) (0) 15: if f (P.r3 ∈1. While both in GA and DE the mutation serves as a mechanism to overcome local mimina their implementation diﬀers..i = P.i = Pj.. d.i = P.P. As mentioned above. F and CR (1) 2: Initialize population Pj.np . · · · . 2 11 .8 and CR = 0. (2005) in their book about DE state that..r1 + F × (P.i .2 while the product of np and nG deﬁnes the computational load.. · · · . 1: Initialize parameters np .Algorithm 3 Diﬀerential Evolution.edu/ ∼ storn/.icsi. Their calibration procedure as well as a more statistically oriented methodology are explained in Section 3.. j = 1. although the scale factor F has no upper limit and the crossover parameter CR is a ﬁne tuning element.. i = 1. Winker et al. the initial population of real numbers is randomly chosen and evaluated (2:). as GA does.i is updated. (2010) in an application of DE propose values for F = 0..r3 ) 8: for j = 1 to d do 9: if u < CR then (u) (υ) 10: Pj.i 17: else (1) (0) 18: P.i 11: else (u) (0) 12: Pj. The algorithm updates a set of parameters through diﬀerential mutation (7:) and crossover (9:) using arithmetic instead of logical operations..berkeley. Diﬀerential mutation generates a new candidate solution by linearly combining some randomly chosen elements and not by randomly mutated elements. r1 = r2 = r3 = i (υ) (0) (0) (0) 7: Compute P.i = Pj...i ) < f (P. · · · . nG . for a predeﬁned number of generations the algorithm performs the following procedure.

it replaces the initial element in the population.i Figure 3: Crossover representation for the the ﬁrst and the last parameter 12 .i (Figure 3). the value of the objective function f (P. . where F is the scale factor that determines the speed of shrinkage in exploring the search space. CR ud . (u) Finally. The above process repeats until all elements of the population are considered and for a predeﬁned number of generations. .i resulting in a new trial vector Pj.i is replaced by (υ) (u) a mutant one Pj.and not with an existing one. . CR ud .i (u) . . . . (7:) demonstrates the ‘metamorphosis’ (modiﬁcation). Only if the trial vector results in a better value of the objective function. during crossover. .. diﬀerential mutation constructs new parameter vectors P.i ). P (u) d.i P1. . Further. . . P (u) d..i by adding the scaled diﬀerence of two randomly selected vectors to a third one. DE combines the initial elements with the (0) new candidates. each component Pj.. (υ) Particularly. With a probability CR. .i ) of the trial vector is (0) compared with that of the initial element f (P. . CR u1 . CR uniform crossover uniform crossover P1. (9:). u1 .i (u) .

In a high-level method the meta-heuristics are autonomous (column 1). Two characteristics distinguish optimization heuristics. A low-level hybrid crosses components of diﬀerent metaheuristics. whereas special hybrids combine meta-heuristics that solve diﬀerent types of problems. a combination of them can be beneﬁcial for ﬁnding the optimal solution.3 2. global or partial. These are homogenous or heterogenous. the number of solutions they iteratively update and the mechanism for this update. Let us take a threshold heuristic as an example. The ﬁrst characteristic refers to the distinction between trajectory search and population search heuristics and the second to the mechanism of neighborhood solution construction. are suitable for optimizing many combinatorial problems.2.1 Hybrid Methods Hybrid Meta-Heuristics Heuristic optimization algorithms. i. When two or more meta-heuristics are combined. while the last column the ﬂat scheme. and a population based heuristic. and general or special. Both a high and a low level methods distinguish relay and cooperative hybridization (column 2). Numerous hybrid meta-heuristics can be constructed when combining different heuristics and their characteristics. When the whole search space is explored by heuristics we have a global hybridization. Figure 4 presents this classiﬁcation. also known as meta-heuristics in their general setting. The ﬁrst two columns represent the hierarchical classiﬁcation. Two diﬀerent meta-heuristics construct a heterogenous hybrid. as diﬀerent meta-heuristics exhibit diﬀerent (advantageous) characteristics. diﬀerent heuristics explore only a part of the solution space. The basic advantage of hybrid meta-heuristics is that they can combine the outstanding characteristics of speciﬁc heuristics. GA. which becomes more intense as compared to single heuristics. In a partial hybrid.3. Though. The taxonomy based on hybrids’ design (algorithm architecture) merges a hierarchical and an independent scheme (see Talbi (2002)).e. new hybrid computational intelligent techniques are constructed. i. In the latter group the heuristics cooperate and in the former group they are combined in a sequence. hybrid meta-heuristics can be categorized based on other characteristics. When identical in structure meta-heuristics are used.e. independent of the hierarchy. hybrids should be applied with caution since algorithm performance will be well aﬀected by parameter tuning. Furthermore. Besides their advantages. TA. General hybrid algorithms are suitable for solving the same type of problems. The new hybrid resulted from a composition of a threshold and a population based method is a high-level or 13 . a homogenous hybrid is constructed.

general hybrid also known as Memetic Algorithm (MA) (Moscato (1989)).3. However. Using that technique they showed that interactions between market agents could realistically represent foreign exchange market expectations. TA can well escape local minima by its acceptance criterion. (1999)) has the advantage of ﬁnding an optimum solution with few computational steps. They model the behavior of foreign exchange (DM/US-$) market agents while optimizing the Agent Based Model (ABM) parameters. combining these meta-heuristics can be beneﬁcial for the algorithm convergence.2). To overcome the problem.3 Starting from d + 1 (in this application 3) initial solutions. The Nelder-Mead simplex search (Lagarias et al. as long as they do not exceed the threshold value. the search moves to a better solution that is a reﬂection of this initial triangle or makes bigger steps if the reﬂection results in an improvement of the objective function value. Some selected applications are presented in the following. 14 . In the next sections some applications of hybrid meta-heuristics are described. global.Low S Coop S S ÙÙ S Ù Ù SS Ù Ù Ù Homo/Hetero S Global/Partial S S High Ù Ù Relay General/Special Figure 4: Classiﬁcation of Hybrid Meta-Heuristics heterogenous. 2. if the simplex method cannot ﬁnd a reﬂection point that outperforms the initial parameter settings. 3 The simplex search chooses an eﬃcient step size. e.1. additional hybridizations of meta-heuristics with either constructive methods or other intelligent techniques can be found.g. Thus. the authors applied the acceptance criterion of TA (Section 2. Maringer (2005) used a MA in the context of risk factor and portfolio selection. a GA can explore eﬃciently a wider search space using the cross over and mutation mechanisms. Gilli and Winker (2003) constructed a hybrid optimization technique using the Nelder-Mead simplex search and the TA algorithm. It accepts neighborhood solutions even if they result in a deterioration. On the one hand. On the other hand.2 Other Hybrids Often in the literature. the search space shrinks and the algorithm gives a solution which is nothing more than a local optimum.

Selecting the appropriate heuristic technique will improve the performance of the hybrid. it will be diﬃcult to evaluate the consistency of the results. imposing a penalty on infeasible solutions. NN and SVM. applying a repair function to ﬁnd the closest feasible solution to an infeasible one. The latter paper stressed the potentials of incorporating GA to SVM in future research. keeping the implementation simple is more advisable. So.g. Heuristics are able to consider almost any kind of constraints. GA selected the best set of predictors and estimated the parameters of a kernel function to improve SVM performance using data from Korean companies. alternative methods can be considered: rewriting the deﬁnition of domination. However. resulting generally in good approximations of the optimal solution. (2006) forecast bankruptcy with Support Vector Machines (SVM). such that it includes the constraint handling. Besides model selection. e. the more complex an algorithm is. but also for weight estimation. The meta-heuristic was used not only for model selection.4. First. To this end. they should be applied with caution.4 2. Despite of the potentials of meta-heuristics in tuning or training other techniques.g. the problem constraints have to be taken into account. The ﬁrst possibility has been described for the application of DE in credit 15 . Second. e. diﬀerent heuristics are suitable for diﬀerent types of problems. Trabelsi and Esseghir (2005). Ahn et al. Hence. Several methods to handle constraints are suggested in the literature so as to ensure at the end a feasible solution. Trabelsi and Esseghir (2005) constructed a hybrid algorithm which evolves an appropriate (optimal) artiﬁcial neural network structure and its weights. these papers applied GA to tune the parameters of bankruptcy prediction models. the more unknown parameters a heuristic algorithm has. Varetto (1998) applied GA to train NN. instead of GA. (2006) used GA together with other intelligent technics to select an optimal set of bankruptcy predictors. the application of DE. 2. Ahn et al. When running the optimization heuristics. (2006) and Min et al. (2006) and Min et al. the presence of constraints makes the search spaces discontinuous and ﬁnding a solution that satisﬁes all the constraints might be a demanding task.In a diﬀerent application. in weight estimation might improve both the eﬃciency and the computational time of the hybrid since DE appears to be more suitable for continuous problems.1 Technical Details Constraint Handling Techniques Real world ﬁnancial problems face simultaneously several constraints.

i satisfy the side constraints. . depending on the kind of objective function used the penalty technique may improve the reliability of heuristics. (2010b) (Section 4. Pj.i and the current candidate solution (0) (u) (0) (u) Pj. (2007).2) in an GA implementation to parameter estimation. In contrast. . Krink et al.i ). (2010b) found that the constraint-dominated handling technique performed well while taking comparatively little computation time. Dueck and Winker (1992) (Section 4.i ) satisﬁes the condition f (Pj. (2009) in an application of DE to index tracking (Section 4.i if its ﬁtness f (Pj. 2. If the new candidate solution Pj. In the second possibility. (a) . (b) . which ensures at the end a feasible solution. In this case. Section 2. For minimization problems.i ) ≤ (0) f (Pj.1) provide an implementation of TA where infeasible solutions are accepted to pass on to the next step by imposing a punishment function.risk assignment in Krink et al. Lyra et al. i.4). Lyra et al. 16 (u) (u) . the procedure can be described as follows within Algorithm 3.2. Pj. a penalty technique allows infeasible candidate solutions while running the algorithm as a stepping stone to get closer to promising regions of the search space. If only one candidate solution is feasible.i replaces Pj. To guarantee a feasible solution at the end and avoid early convergence to a local minimum. An increasing penalty function was also applied by Gimeno and Nave (2009) (Section 4. Generally.e. if both solutions violate the same number of constraints. reduce the variance of the results obtained. select the feasible one 3.i if its ﬁtness f (Pj. Depending on the type of the objective function. a penalty term is added to the objective function. .i (0) (u) (u) replaces Pj. Solutions should be penalized the more they violate the constraints. .2: 1. .4) implemented TA for credit risk assignment with both the constraint-dominated handling technique and a penalty technique.i ). applied a repair function to ﬁnd the closest feasible solution to an infeasible one. .1. If both solutions violate constraints.i ) ≤ f (Pj. the penalty was increased over the runtime of the algorithm. select the one that violates fewer constraints. The penalty function is added to the objective function (utility function). . However. The intuitive idea of this constraint handling technique is to leave the infeasible area of the search space as quickly as possible and never return.i ) (u) (0) satisﬁes the condition f (Pj.

· · · . in higher dimensions (higher number of parameters) a heuristic can optimize the parameters of another heuristic. F and CR. but sometimes time consuming way to select the parameter values is to iteratively run the algorithm for diﬀerent parameter values and then extract some descriptive statistics (Maringer (2005)).2 Calibration Issues The selection of parameter values for a heuristic optimization algorithm is essential for its eﬃciency.9 do Repeat Algorithm 3 from line 3-21 end for end for Figure 5 exhibits the dependence of the best objective value obtained for See Section 3 for an application of Response Surface Analysis to the estimation of DE parameters in an application to Least Median of Squares (LMS) estimation of the CAPM and a multifactor model. 4 17 . np for F = 0. linear. These parameters are the iteration number. as any optimization heuristic technique.9. A simple. a regression analysis or a Response Surface Analysis (Box and Draper (1987))) can also be tested. d. 0. A detailed description of this optimization problem is given in Section 4. e.4 Alternatively. Algorithm 4 Calibration Issues. nG (1) Initialize population Pj. i = 1. respectively. exponential. a rather simple penalty function is recommended. · · · . determines the speed of convergence to the global solution and the variability of the results. To ﬁnd the optimal combination of F and CR.diﬀerent penalty weight functions can be applied.4.2.5 to 0.2. 0. the neighborhood structure and the new solution (candidate) acceptance criterion.g. 1: 2: 3: 4: 5: 6: 7: Initialize parameters np . Winker et al. While DE. Since calculation of the penalty weight function might increase the CPU time. · · · . For a small number of parameters. The procedure is illustrated in Algorithm 4 for parameter values ranging from 0. Nonetheless.i . should not depend on the speciﬁc choice of starting values. the above methods are eﬃcient and relatively fast in ﬁnding the optimal parameter settings for a heuristic algorithm. (2010) implemented DE for LMS estimation of the CAPM and the multifactor model of Fama-French. · · · .5. the authors run the algorithm for diﬀerent combinations of F and CR.9 do for CR = 0. increasing or decreasing with the iteration number. j = 1.5. the optimal combination of the technical parameters. 2.

7.2 −5 f 5. while the right side shows the mean over 30 restarts.8 5 0.8 0. 2nd. Instead of repeating a core algorithm numerous times.5 a more eﬃcient experimental design methodology can be applied.5:0. The aim of RSM is to ﬁnd the parameters value with little tuning and without precise prior knowledge of their optimum value.6 0.9. The population size np and the number of generations nG are set to 50 and 100. values of F = 0. it becomes smoother as CR reaches 0.1 5 0.diﬀerent combinations of F and CR for the LMS objective function (Section 4.9 are proposed for estimating the parameters of the CAPM and the Fama-French factor model.1 Response Surface Analysis for Diﬀerential Evolution Response Surface Analysis RSM helps explore the relationship between the objective function value and the level of parameters using only few level combinations (Box and Draper (1987)). while the dependency on F appears to be less pronounced.2) using the ﬁrst 200 daily stock returns of the IBM stock starting on January. 1970. results improve. Based on these results.8 and CR = 0.8 CR F 1 1 0. namely RSM. x 10 6 −5 x 10 5.8 independent of the choice of F . The left side of Figure 5 presents the results for a single run of the algorithm.2.6 F 1 1 0. The initialization is based on a prior knowledge about the best 5 In this case 41 × 41 for all combinations of F and CR ranging in 0. In the ﬁrst stage. the input variables are initialized.6 Figure 5: Calibration of technical parameters.8 CR 0. respectively. Although the surface is full of local minima for CR below 0. 18 . 3 3. The results clearly indicate that for higher values of CR. RSM consists of three main stages.01:0.5 f 5.6 0.

If the lower and upper bounds of the factor levels are known. The reason for choosing a polynomial relationship is that no prior information about the existing relationship between the input and the response variable is needed. one can model the relationship by considering either a linear or a non-linear shape. this results in the optimal set of parameters.combination of parameters that will optimize the objective function.6 Inside this domain the factors are assigned speciﬁc values (levels). 3. yi = β0 + where yi β ′s X ′s ε minimum median of squared residuals for i − th setup of input variables parameters of equation input variables for setup i residual . the experimental domain is speciﬁed. (1) To ﬁnd the optimal combination of factors’ value that optimize the objective function of interest one should ﬁrst understand how the objective function value changes with possible changes in the factors’ level. one can evaluate the objective function for these factor combinations and then make some inference about the possible relationship between the parameters and the objective function. When the shape of the relationship is not known a priori. After diﬀerentiation. In this exercise. In the ﬁnal stage. 19 . at which the response variable is estimated. βi Xi + βii Xi2 + βij Xi Xj + ε . It is based on orthogonal and/or rotatable experimental designs.2 Implementation Details for Diﬀerential Evolution RSM explores the relationship between several input variables (factors) and a response output. Alternatively. the performance of RSM is tested in ﬁnding 6 7 Orthogonal and rotatable designs are described in detail in what follows. By randomly changing the factor values (either simultaneously or one at a time) one can investigate the reaction of the objective function value. A higher order polynomial can be tested when there are enough data points.7 This process approximates either linear or quadratic relationship. a second order polynomial is ﬁtted using the designed experiments. the most common process is trial and error. These values constitute the reference or the center points. a second order polynomial relationship can be tested (see Equation 1). Next. for three factors. To discover the relationship.

t − α − βxi. both above and below the high and low values considered above. Fcenter ±0. a level that represents the best known value for the factors (Fcenter ). The input variables are the (scaling) factor F . a two level full factorial design assigns two levels. the crossover probability CR and the population size np . For that. The eﬃciency of the CCD stems from the fact that it uses only 18 level combinations to calibrate the input variables. a low and a high level.β (2) where εi. the aim is to ﬁnd the optimal combination of the parameters F . a ﬁve level full factorial design uses 53 = 125 level-combinations.the optimal combination of DE parameters value in the speciﬁc application of obtaining the CAPM parameters to minimize the median of squares error.g. Eﬃciency means to obtain a representative relationship between the parameters and the objective function of interest with the least parameter tuning. Also. even up to a fourth level. Since the product of np × nG should be constant. A two level factorial combination alone can capture a possible linear relationship between the input variable and the response output. a Central Composite Design (CCD) is applied.1).t are the residuals of a factor model. Three diﬀerent levels of computational resources are considered. To specify the factor levels that help estimate the parameters of Equation 1. with nG × np = 500. Besides. The response variable is the minimum value obtained for min(med(ε2 )) . Alternatively. Note that for DE the product of the population size np and number of generations nG determines the computational complexity of the algorithm. CCD is an experimental method that eﬃciently builds a second order model for the response variable (Barker (1985)). By using ﬁve levels for each factor a curvilinear relationship. in order to expand the experimental domain. respectively.t α. 20 .t = yi. Hence. low. three additional levels are assigned to the factors. to the three factors (input variables). two additional levels are considered. 2500 and 5000 meaning. The Appendix gives a detailed explanation and a numerical representation of the CCD experiments. medium and high complexity. That is. CCD contains ﬁve level-combinations. The two levels range symmetrically around a center point (e. we are able to search the optimal factor levels over a wider experimental domain. or a quartic relationship can be investigated. In addition to the two level factorial design. First. i. Thus. ﬁrst. CR and np for a given computational load np × nG . eight experiments result from the two level-combinations (23 ) in our problem. an adjustment is made on nG in each alteration. the computational load (np × nG ) is kept constant.

It is eﬀective for quadratic functions when only bound constraints are considered.3 Output Tables 1 and 2 report the optimal combination of DE parameters. To control for the randomness in the heuristic’s output. Factors’ optimal values are determined by diﬀerentiating the ﬁtted second order polynomial with respect to each input variable. low. quadprog is used. Using the response values from all setups (experiments). With np at least ten times the number For this step the build-in Matlab 7. while the upper bounds are 2. the worst value. as suggested by RSM using orthogonal and rotatable designs. The median of squares residuals for CAPM estimation is compared with the best known optimum reported by Winker et al. Since the values of np are large enough.8 ∂y = βi + 2βii Xi + βij Xj = 0 (3) ∂Xi Then. The evaluation is repeated 30 times to control for the stochastic eﬀects of heuristics. the 5th percentile. The reported optimal parameter combinations are used to evaluate the performance of DE. The mean response value (mean out of 10 repetitions of Equation 2) for each combination is reported. The optimal values of np are rounded up to the nearest integer. The upper bound of F is based on Price et al. 1 and inf. and the frequency of the best value occurs over 30 repetitions of the algorithm. respectively. 3. Equation 2. whereas np has no real upper limit. Equation 2 is evaluated and the median of squares residuals for CAPM estimation is reported.6 function. Columns 5 to 11 report for each set of parameters the best value. We report results for diﬀerent computational complexities. the experiments are repeated 10 times. the 90th percentile. The choice of lower bounds prevents the Newton method from stacking in a local optimum.For each level-combination of input variables a response value is obtained. CR. medium and high (columns 1-4). In our problem the lower bound of F . the variance. 8 21 . Equation 3. the minimum median of squared residuals. the rounding does not affect the mathematical results of the design (does not inﬂuence dramatically the objective function value). respectively. the optimal combination of input variables is used to estimate Equation 2. This tool uses a reﬂective Newton method for minimizing a quadratic function. Table 1 presents the results of RSM with orthogonal design. For that. a second order polynomial is ﬁtted. (2005). The output suggests that the level of the scale factor F is irrelevant for the convergence of DE especially when CR is above 0.7. the median. np reﬂect the lower level CCD design. CR is a probability. (2010).

but not repetitively. For high computational complexity the algorithm is more stable and results in the best reported optimum (Winker et al. Similarly. Possible extensions of the approach might include diﬀerent formulations (or transformations). np must be at least ten times the number of parameters in order for the algorithm to reach the global optimum. investors choose their optimal portfolio by maximizing the expected portfolio return (E = w1 µ1 + w2 µ2 ) given their risk proﬁle. other than a polynomial one.7. np is an important parameter in the convergence of DE and CR stabilizes the algorithm. (2010)) in all 30 repetitions. for high computational complexity the global optimum can be found. one should bear in mind that the analysis is sensitive to the starting values (centers) as well as the dispersion of the high and low design points around the center. For medium computational complexity the optimum is identical with that of orthogonal design with lower variance in 30 repetitions.1 Heuristics in Finance Portfolio Selection In the simple case of the Markowitz mean-variance (MV) framework with two assets.1 . 4 4. RSM results in slightly diﬀerent optimal parameter settings. These models consist of a linear function and a quadratic function of the weighted asset covariances which can be solved by traditional optimization techniques like 22 . Table 2 reports the results of RSM for rotatable design.2 ) for a given desirable level 2 of return.of parameters of the objective function (parameters of CAPM) and CR above 0. to represent the relationship between the parameters and the objective function. the algorithm can converge to the best results reported in Winker et al. The variance of a stock σ1 is the covariance σ1. RSM is proven to be a useful starting tool for calibrating the parameters of DE that best minimize the median of squares residual (Equation 2) without using high computational resources. Generally. Of course. Yet. (2010). for medium computational load the algorithm ﬁnds the global optimum just once. in higher dimensions (higher number of parameters) a heuristic tool can be used to optimize the parameters’ values. or by minimizing 2 2 2 2 the portfolio risk (V = w1 σ1 + w2 σ2 + 2w1 w2 σ1. Also. in 13 other cases it approaches the optimum with accuracy of 4 decimals. Although. Shrinking or widening the distance between the design points (distance of factorial and orthogonal designs from center) might result in highly uncertain parameter estimates.

9935 · 10−5 4.9935 · 10−5 4.0069 · 10−5 Best 4.7501 · 10−41 1.6616 1.3041 CR 0.1175 · 10−5 6.3457 F 0.9232 · 10−5 q90% 4.2711 · 10−5 Freq 30 1 1 Freq 1 1 1 23 Table 2: Optimal Rotatable Design to minimize Median of Squared Residuals.0089 · 10−5 q90% 4.9871 0.9935 · 10−5 5.6318 0.9768 · 10−5 Worst 4.9935 · 10−5 5.9935 · 10−5 5.9935 · 10−5 4. CPU H M L CPU H M L F 0.9935 · 10−5 5.6322 1.9935 · 10−5 5.9935 · 10−5 5.5042 · 10−10 Var 4.Table 1: Optimal Orthogonal Design to minimize Median of Squares Residuals.7501 · 10−41 2.0079 · 10−5 Med 4.7226 0.7661 · 10−12 1.9935 · 10−5 4.8666 · 10−29 9.0143 · 10−13 q5% 4.9935 · 10−5 4.9935 · 10−5 5.6318 0.9910 1.9935 · 10−5 5.9935 · 10−5 5.2804 · 10−5 Med 4.9935 · 10−5 4.9935 · 10−5 4.9935 · 10−5 5.4474 · 10−5 Var 4.1599 · 10−5 Worst 4. .6897 · 10−5 9.0075 · 10−5 q5% 4.9935 · 10−5 4.6587 CR 0.0000 np 25 26 13 np 81 26 11 Best 4.6597 0.

The constraint considered here is transaction costs. However.1 Transaction Costs and integer Constraints Apart from the alternative utility criteria that can be pursued by an investor. cardinality constraints. Thus. due to extensive availability of computational resources. even with linear side conditions on the weights. etc. two more complex risk functions other than variance were optimized. with fairly limited computational resources available. mainly for simplicity. Considering a slightly more complex objective function. taxes. being a semi variance function and a weighted mean geometric return function. However. Yet. No transaction costs were considered though. The optimum might be even diﬃcult to approach for a linear utility function (an absolute or semiabsolute deviation risk function) when constraints are considered (Mansini and Speranza (1999)). more precisely using TA. several other implications and constraints appear in real life in portfolio selection problems. expected shortfall (ES)). meaning human portfolio selection. heuristic optimization techniques have been applied to higher-dimension problems. The seminal work of Dueck and Winker (1992) opened new horizons to portfolio optimization using heuristics. mod24 . In the last two decades. the problem was not tractable with standard optimization algorithms. In that application. the selection process with transaction costs can often result in totally diﬀerent optimal portfolios (Dueck and Winker (1992)).quadratic programming. TA. In the early literature transaction costs were omitted. The following subsections describe the most recent research in this direction. complex. non-diﬀerentiable utility function (with linear or non-linear side conditions) will not result in the optimum (eﬃcient) portfolio selection (or in any optimum near) by quadratic programming routines. The portfolios were 5% to 30% more proﬁtable than the usual practice until that point. 4. optimization heuristics are ﬂexible enough to optimize portfolios based on complex objective functions without the need for a linear approximation of the objective function. investors may well seek to optimize utility criteria other than MV (like a Cobb-Douglas objective function for portfolio choice) or other risk criteria (like Value at Risk (VaR). A non-convex. They have been applied to portfolio optimization problems that also allow for side conditions like transaction cost.1. Alternative risk criteria are mainly chosen to relax the normality assumption of returns and to accommodate the asymmetry of asset returns. could select among 71 bonds of the Federal Republic of Germany proﬁtable portfolios (minimize risk functions given expected return) in reasonable time.

The optimization algorithm selects the assets am and their weights by using (ideally) the total budget amount. Section 2. GA has the additional advantage of simultaneously maintaining a population of candidate portfolios. In such a case the excess amount can be deposited. SA. budget constraints with transaction costs and integer constraints are included and compared. Diﬀerent cost schemes could aﬀect the diversity of the portfolio which could lead to diﬀerent utilities. In that respect. 9 25 . ch. If one of the above conditions is considered the search space is far from being smooth and convex. The study concluded that the presence of transaction costs. a budget amount is uninvested since only integer numbers of assets can be bought. cv } cf + cv ﬁxed costs (0) variable costs where Sm is the current stock price variable costs with lower (minimum) limit ﬁxed plus variable costs . the simultaneous maintenance of many candidate solutions can be a computational burden. TA and GA are often used in portfolio selection problems. SA and TA are ﬂexible enough to adjust approximately to any restrictions.9 Maringer (2005). Transaction costs can appear in the form of minimum amount. heuristic techniques can be applied. Integer constraints and several types of transaction schemes were used and their relationship to the investment level was tested.1. One exception are investors facing only a ﬁxed cost scheme with an initial investment over 1 million. applied SA to portfolio selection with transaction costs. M. . 3. .1. transaction costs (c) can vary between cf (0) cv = cp · am · Sm max{cf . could help ﬁnd a portfolio composition with better diversiﬁcation (improvement of the utility function) and higher Sharpe Ratio (SR) than the simpliﬁed model with no transaction costs. In high dimensions. SA. Then. For an investment in asset am m = 1. it might be diﬃcult to approach the global optimum or even something near. aﬀected the optimal portfolio structure. He used 30 German assets from the DAX index. Their construction (Section 2) allows for faster and more eﬃcient application to discrete ﬁnancial problems. when also non-negativity and integer constraints are included. . For the optimal portfolio composition. The number of diﬀerent assets invested are reduced with increasing transaction schemes. transaction costs are part of the budget. Nevertheless. proportional amount on trading volume and ﬁxed fees per trade. Hence. Often.ern portfolio selection includes transaction costs in the optimization process (Maringer (2005)). .

transaction 10 A good solution is the one that has low distance from the theoretical MV framework) 26 . They applied variants of evolutionary heuristic algorithms to solve the portfolio selection problem with transaction lots.2 Minimum Transaction Lots Minimum transaction lots is a type of integer constraint. The half non-selected assets were replaced by the k low cost assets in the portfolio. but also other constraints like transaction limits and cardinality constraints. Gilli and K¨llezi (2002) optimized their portfolio e choice using as risk measures VaR and ES (4. reach ﬁxed iteration number. Mansini and Speranza (1999) showed that for a linear utility function (an absolute or semi-absolute deviation risk function) when linear constraints (minimum transaction lots and proportional transaction costs) are considered the problem is NP hard. Instead of a typical stopping criterion.g. In the late 1990s. the algorithm was repeated until a number of assets was tested in the portfolio construction. That is. a minimum bundle of units. (0) lm = Nm · Sm . In another application. In that application the problem structure allowed for transaction lots constraints. e.5). from a speciﬁc asset should be bought. The distinctive advantage of that heuristic was the independence of the computational time from the number of assets. Together these constraints can add to the complexity of the problem and harden the construction of an optimal portfolio.In reality not only transaction costs appear.1. The algorithm would select the k cheapest assets ak to construct the portfolio. This basic algorithm (and some variations) allowed for a good10 solution to the portfolio optimization problem in reasonable time.1. Also. It constraints the number of units from a speciﬁc stock or bond that should be bought simultaneously. The constraint is formulated in terms of money as. Diﬀerent crossover points indicated the securities in and out of the portfolio. e. This is especially prominent when we consider small portfolios. The induction of additional constraints into the portfolio selection problem is presented in the following sections. (4) where lm Nm minimum transaction lots in money minimum required units of asset m . 4.g. 5000 or multiples.

DE could approximate closer and faster the MV frontier compared to quadratic programming (QP) and another GA. the acceptance criterion of a local search method can be used. So. By nature. The algorithm adopts DE for weight selection. To overcome local minima. . Still. It would also be of interest to test its performance in an even more realistic framework with transaction costs and cardinality constraints (some additional constraints are included in Fastrich and Winker (2010)). were introduced. K the constraint is formulated as am ≤ kmax . Ch. weight constraints and limit on the proportion of assets in a particular sector or industrie are considered. 4. TA. When other non-convex objective functions. . portfolios with fewer stocks are easier to manage and less expensive in terms of total transaction costs than larger more diversiﬁed portfolios. a trajectory heuristic. when optimally chosen. together with integer constraints. an additional integer constrain is needed. TA could easily deal with those in reasonable computational time. Cardinality constrains the number of diﬀerent assets (A) in a portfolio. #{A} = K. M (5) where kmax A maximum number of diﬀerent assets in portfolio integer quantities of active positions in portfolio 27 . When K varies from 1. namely a cardinality constraint. Transaction lots. Krink and Paterlini (2009) propose a Diﬀerential Evolution algorithm for Multiobjective Portfolio Optimization (DEMPO) for portfolio selection under the MV framework with risk measures other than variance.e. Besides. . They conﬁrmed the robustness of TA when compared with quadratic programming under MV framework and without any integer constraints. In order to select the optimal size of a portfolio. VaR and ES. 18).3 Cardinality versus Diversity In reality. private investors choose a rather small-sized portfolio which is not necessarily diversiﬁed internationally (Cuthbertson and Nitzsche (2005). . was applied instead of a population based one. asset weights are continuous variables. the convergence of DE to the exact front shape can be further improved. For the optimization. Small portfolios. can diversify the risk away and thus increase investor’s utility.costs and cardinality constraints were considered. i.1. such as VaR and ES. Both approaches resulted in the same optimum MV portfolio. like transaction lots.

M . GA. GA. in selecting a portfolio that maximizes SR. This additional integer constraint (especially.12 Alternatively.am active position in asset. Initially. m = 1.. While all three methods share local search characteristics. The following literature imposes a cardinality constraint in the portfolio selection problem and improves or optimizes the risk-return SR or the MV portfolio. assets are randomly selected and included in the portfolio. First Chang et al. respectively. 12 Its distinctive selecting performance lies on the random initial selection of assets and the update procedure of the candidate solution. Chang et al. So. TS belongs to local search heuristics. While traditional selection algorithms result in suboptimal portfolios as the number of assets and constraints increases. the heuristic encoded both weight and model selection. Heuristics can capture the discrete selection nature of the problem and consider at the same time the additional constraints. (2000) used three diﬀerent heuristics. In the update procedure an asset is added not only if it improves the SR but also if it contributes to the diversiﬁcation of risk. it needed comparatively more time. It is ﬂexible to select not only between asset with high risk premium per unit of risk (SR). For a comprehensive representation of the algorithm see Glover and Laguna (1997). Maringer (2005) reported the superiority of a single agent heuristic. Maringer (2005) combined a local and a population search method to improve the selection procedure for Markowitz eﬃcient frontier. The optimization algorithm is slightly altered to ﬁnd not only which asset weights are optimal but also which assets should be ideally selected from the available ones by optimizing a given objective function. . but also good asset combinations. like SA and a variant of it. diﬀerent k from M assets M and their combinations are selected based on the ink vestors risk proﬁle. which help to escape fast inferior solutions. resulted in fairly better approximations of the theoretical efﬁcient frontier. (2000) and later on Maringer (2005) used heuristics to replicate the eﬃcient frontier with integer constraints. these are assets with negative (or low) correlation.13 The performance of the hybrid meta-heuristic was tested against a pure metaheuristics. heuristics serve as a panacea. The last approach improved the performance of heuristics. . While a multiple agent heuristic. together with transaction costs and transaction lots) results in a discrete combinatorial optimization. In contrary. Tabu Search (TS)11 and SA to replicate all the Pareto-optimal (non dominated) portfolios of the eﬃcient frontier under cardinality and transaction lots constraints. 13 In the Sharpe framework the heuristic encoded only the asset selection. the hybrid meta-heuristic was more reliable – closer to the best-known solution. Ideally. . 11 28 . whereas in the Markowitz eﬃcient frontier identiﬁcation. SA.

in terms of daily asset returns.The results suggest a rather small-sized portfolio. the objective function of interest. investors face all the constraints discussed in the previous sections which result in a complex optimization problem. the algorithm results at the end in a portfolio that can be as diversiﬁed as the market index. Maringer and Oyewumi (2007) applied DE to track a portfolio with constrained number of assets that mimics best the DJIA64 index. The objective function of interest is to minimize the root-mean-squared deviation of the selected portfolio’s daily returns from those of the index. Further research can be devoted in identifying the factors aﬀecting the optimal K.and out-of-sample. The resulted tracking portfolio outperformed both in. there was no clear-cut indication of the optimal number of assets 29 . Yet. One alternative is to endogenize in the optimization process the choice of K. the higher volatility is outweighed with a higher return-to-risk ratio. Typically. However. Thus. For the few k from M assets and their combinations. This is a twofold optimization where the optimal asset positions and their weights should be selected. hereafter called the tracking error. This decision is called active portfolio management. in many cases the index. the weights are quickly identiﬁed by the algorithm and more computational eﬀort is then devoted to update (improving) the candidate selection. In that respect. investors look for the optimal combination of assets to track the index as close as possible. In the other case. investors choose the portfolio that best optimizes their utility function (risk-return equilibrium) under constraints. alternative heuristics (suitable for discrete or continuous problems) or a combination of them are applied. which when optimized by heuristics can replicate relatively accurately the market index.4 Index Tracking A straight-forward extension of an active portfolio optimization with cardinality considerations is the passive tracking of an index with a limited number of assets. In passive selection. for cardinality below 40 the suggested portfolio is more volatile than the index. how diﬀerent risk aversion. is to minimize the distance between the selected portfolio’s daily returns and index returns. More precisely. investors are convinced that in the long run the best thing to do is to mimic the market index. In addition. the positive diﬀerence increases for the in-sample training set. utility functions or stock index aﬀect the choice of K. Selecting the optimal asset positions is a discrete combinatorial problem whereas selecting the optimal asset weights is a continuous problem. In the former case. For smaller portfolios (especially for cardinality below 30).1. 4. Whereas heuristics perform reliably also in this complex problem.

g. They evaluated the asset weights’ distance of the benchmark portfolio (Nikkei 225 price index was considered) from those found in the tracking portfolio. 15 Additionally. 4. These measures relax the normality assumption of returns imposed by the MV framework. (2009).5 Downside Risk Measures As mentioned in Section 4.1. 16 Maringer and Oyewumi (2007) experimented with one and two year windows and suggested that shorter in-sample windows together with higher cardinalities are more important for the out-of-sample performance.2. the two methods could obtain similar results. 14 30 . Speciﬁcally. There are still however some open questions for constrained index tracking. namely the performance of the tracking portfolio in diﬀerent window set-ups16 and the factors aﬀecting the volatility in the results. Like semi-absolute risk function. Thus. For this experiment they neglected the cardinality constraint.that best minimize the tracking error.1. VaR and ES.15 DECSIT could potentially solve that selection problem with constraints and provide quality results in reasonable time. the above risk measures minimize either the probability that the expected value of the portfolio falls below a given value (VaR) or the expectancy of such a fall (ES). More insides to the trade-oﬀ between the optimal number of asset in a portfolio and the tracking error is given by Krink et al. In a comparison with QP (with no cardinality or weight constraints). instead of minimizing the mean (positive and negative) squared deviations from the expected return. by building an artiﬁcial index they evaluated the performance of DE using the asset weights instead of the returns. So. in the presence DE is by nature more suitable for continuous problems. e. the risk of having negative deviations from the expected portfolio value is minimized. Finally. Using this tool the quality of the results and the runtime of the algorithm improved. The results suggested that by increasing the number of assets in a portfolio the in-sample annualized tracking error volatility reduces up to a certain limit. a combinatorial operator is needed to improve the performance of DE in the discrete combinatorial asset selection. one can further investigate the eﬀect of index composition in selecting the maximum number of assets of the tracking portfolio. They also applied DE but with a combinatorial search operator (DECS-IT)14 to track the perfectly diversiﬁed market portfolio by minimizing the volatility between the log-returns of the benchmark (DJIA65 and Nikkei 225 price indices) portfolio and the tracking portfolio in every time period. Especially. The same is true for the relationship between cardinality and out-of-sample performance. The benchmark asset weight are calculated from market values. VaR and ES cannot always be optimized by traditional linear or quadratic solvers. alternative risk measures can be used in place of variance. DECS-IT could very precisely replicate the 225 asset weights.

the excess kurtosis decreases with increasing time horizons.1 Nonlinear Model Estimation Convergence and Estimation of GARCH Models In reality. A unanimous ﬁnding all along was that minimizing risk.2 Robust Model Estimation Apart from using meta. Apart from one application of DE (Krink and Paterlini (2009)). heuristic algorithms calibrate the parameters of ﬁnancial and economic nonlinear models that improve the accuracy (precision) and reliability of core ﬁnancial models. as opposed to maximizing reward. where εt denotes the volatility of daily returns of an AR(1 ) process (Gouri´roux (1997)). a model conditional e also on the previous risk (error) components is a generalized autoregressive conditional heteroskedasticity (GARCH) model. sporadic extreme observations can force the distribution of historical stock returns to lean away from the mean. In more recent extensions. i. 4. In due course. For daily returns the probability of observing this phenomenon is higher than normal. 2 V ar(εt ) = α0 + α1 εt−1 .2. TA is often the heuristic used for discrete optimization. Furthermore. Yet. TA was used as an optimization algorithm for portfolio construction. much literature is concentrated on the application of heuristics for the indirect optimization of models through parameter estimation. Gilli et al. daily ﬁnancial returns exhibit time varying volatility. Gilli and K¨llezi (2002). Gilli and Schumann (2010c) presented alternative measures of reward and risk. forcing their distribution to exhibit excess kurtosis. Winker and Maringer (2004) constructed a hybrid population based algorithm with local search characteristics for the same problem.or hybrid-heuristics for the direct optimization of ﬁnancial problems. As such. and Gilli and Schumann (2009) and Gilli and Schumann (2010d) compared the performance of portfolios selected by various risk measures with MV portfolio. it is diﬃcult to choose an optimal portfolio that satisﬁes these constraints in a reasonable time. e. (2006) and Krink and Paterlini e (2009) approach the portfolio selection problem under discrete scenarios by optimization heuristics.of integer constraints. transaction lots and cardinality constraints.e. 4. like the Omega function. 31 . often resulted in good prediction of the out-of-sample frontier. Thus. A one period ARCH model depicts this characteristic.. for asset allocation. the search space is highly non-smooth and discontinuous.g.

The selected parameter changed by adding a uniform random error. ﬁnding the parameters that give an optimal solution is not guaranteed using standard econometric softwares. Maringer (2005) used the GARCH(1. Maringer and Winker (2009) inferred the same superior performance of TA in comparison to standard numerical econometric packages. they estimate the number of iterations. as a function of sample size. This is possible. called agents. with a given probability. diﬀerent softwares result in diﬀerent optimal solutions (local optima) and the solution is very sensitive to the parameters’ starting values. Therefore. The random error narrowed as the iteration number increased. due to the good parameter approximation of search heuristics. Especially when agents’ behavior is either less rational or heterogenous. reported above.1.1) model to estimate the daily volatility of the German mark/British pound exchange rate. ABMs are ﬂexible to simulate the behavior of market participants. a low deviation was reported between heuristic optimization outcome and true parameters’ value. This is true in particular for small sample sizes. But. In due course. Indirect Estimation and Agent Based Models Modeling the behavior of market participants becomes even more challenging when it cannot be determined by prior information like in a GARCH model. heuristic techniques can also be considered for selecting the lag order for historic volatility estimation approaches. It is evident that the value of the TA maximum likelihood estimator is much closer to the true one compared to what other numerical packages report. needed by a TA approximation to converge. Maringer (2005) applied SA to ﬁnd the parameters of a GARCH model which maximize the (log)-likelihood function. Maringer and Winker (2009) applied TA for the same non-linear problem and simultaneously provided a framework for studying the convergence properties of such estimators based on heuristics. A more reﬁned statistical framework is suggested by Maringer and Winker (2009) for analyzing the parameters of heuristics that will further improve the converge of estimators.The parameters of such non-linear models are most often estimated by maximizing the (log)-likelihood function. Heuristic optimization overcomes this problem and provides a close to optimum solution for such non-linear models.1. Besides. to the true maximum likelihood estimator of a GARCH model. Simulating the actual be32 . With I = 50 000 iterations and a shrinking neighborhood structure. the author implemented a SA algorithm like the one presented in Section 2. In every iteration one parameter was chosen for further alteration. and the interaction between them. An interesting extension is the derivation of joint converge properties for higher order GARCH models (or more complex estimators in general). Sometimes.

like the one discussed in the Appendix. a design approach. there are also other non-linear functions that are used to estimate ﬁnancial and economic models. Larger values of threshold sequence correct for the high variance (see also Section 2. Gilli and Winker (2003) used a hybrid TA approach to optimize the ABM parameters. the authors applied the acceptance criterion of TA. However. the estimation of ABM parameters and the validation of their performance in a realistic market setting with actual data are complex tasks. The hybrid combined simplex search ideas with TA algorithm. 17 33 .1. such as MS-AR or SETAR or STAR.2) was applied for optimizing the parameter of ABM. The eﬃciency of agent based modeling was tested with empirical data. Using these techniques they showed that interactions between market agents could realistically represent foreign exchange market expectations. A hybrid TA (Section 2. Gilli and Winker (2003) introduced a hybrid TA algorithm to estimate an ABM of foreign exchange markets. Later on.havior of agents using ABM can result in a realistic representation of the basic features of daily ﬁnancial returns. Winker et al. in the particular application simplex did not even result in a local optimum due to the high Monte Carlo sampling variance of the initial parameter estimates. excess kurtosis (kd ) as well as time varying volatility (α1 ) of an ARCH(1) model were calculated by simulating the expectations of foreign exchange rate based on agents behavior. To overcome the problem of high Monte Carlo variance on estimating the parameters with relatively low number of repetitions. Yet. It was tested whether the interaction between two types of agents (fundamentalists and chartists) could realistically explain the excess kurtosis and time varying volatility of returns. (2007) proposed an evaluation tool for such a model. ABM were applied to the model of foreign exchange (DM/US$) market agents suggested by Kirman (1991) and Kirman (1993). namely excess kurtosis and time varying volatility. TA and DE to the estimation of parameters of the STAR model. Importantly. most of these models depend heavily on the parameter Maringer and Meyer (2007) introduced and compared SA. Future research can introduce alternative models to ABM.3. For this test. can serve as a benchmark for the estimation of parameters’ starting values.17 In addition.2). This is mainly due to the large number of parameters and the non-analytical evaluation of such models. Yield Curve Estimation Apart from the log-likelihood function of GARCH estimation.

or as it is widely known the yield curve. the NSS function. The model estimates the (spot and) forward interest rates that form the yield curve. The ﬁtness function corresponds to the weighted sum of squared deviations of the actual (observed) coupon bond prices of diﬀerent maturities from the ones calculated using the estimated forward interest rates. The following papers show how this problem can be avoided by using optimization heuristic techniques. the NSS approach models forward interest rates based on short and long term future maturities and the transition between them. which causes variability in the estimated parameters. it depicts the expected interest rate for a loan starting in one. The error term is a weighted function of the bond’s duration period. like non-linear least squares optimization. An interesting extension would be interesting to compare the computational time and eﬃciency of both GA and DE. This way it can capture the shape of the yield curve. is typically estimated (Gimeno and Nave (2009)) using non-linear models. The term structure of interest rates refers to market’s expectations of interest rates in diﬀerent time horizons. Likewise. when estimating the parameters of NSS (a non-linear function). and heuristics for simulated and real world bond prices of diﬀerent maturities. while Gilli and Schumann (2010b) applied DE. GA and DE resulted in lower and less volatile estimation errors. An alternative implementation is to use an improved hybrid optimization algorithm which combines a heuristic and a numerical optimization method. while the input parameters’ values are crucial. For that. Central banks need to calculate the forwardlooking interest rates of diﬀerent maturities on a daily basis. Financial pricing calls for an accurate estimation of forward interest rates. Gimeno and Nave (2009) and Fern´ndez-Rodr´ a ıguez (2006) suggested application of GA to accurately estimate the coeﬃcients of the Nelson-Siegel function and its extension. For example. Gilli and Schumann (2010a) and Gilli and Schumann (2011) discuss the application of DE together with a 34 . as ABM do. Besides. For the calculation of forward rates central banks most frequently use the Nelson-Siegel (Nelson and Siegel (1987)) function or the Nelson-Siegel-Svensson (NSS) function (Svensson (1994)). traditional optimization techniques are sensitive to initial coeﬃcient values.estimation value. the term structure of interest rates. Overall. two or ten years from now that lasts for one or several years. The authors compared the ﬁtness function value using both traditional optimization techniques. According to Gimeno and Nave (2009) traditional optimization techniques failed to converge towards a global optimum solution. These interest rates serve as an instrument of monetary policy or they are oﬀered by banks to loans of diﬀerent maturities or they are used in other sectors for asset and derivative pricing.

direct local search optimizer (Nelder-Mead search) in parameter estimation. Ronchetti and Genton (2008) used shrinkage robust estimators for estimating and predicting the model. Figure 6 shows the above objective function. However. the relative performance of both estimators in a simple one-day-ahead conditional forecasting experiment is mixed. which calls for a heuristic optimization application. DE compared to TA allowed a faster and more reliable estimation of β in the factor models. Recently. The existence of many local minima is evident. Using a rolling window analysis on some stocks from the DJIA index.e. While the LMS estimator has a high breakdown value.t = yi. a proof of concept is provided. evaluated for the CAPM residuals. A substantial amount of research in ﬁnancial market economics has focused on the robust estimation of the CAPM parameters. the estimates obtained by LMS do not exhibit less variation (from those resulting from OLS) as might have been expected from the outlier related argument.t α. i. Chan and Lakonishok (1992) had used least absolute deviations arguing that absolute residuals have smaller inﬂuence on LS than squared residuals. (2010) compare OLS and least median of squares (LMS) estimation in the CAPM and the multifactor model of FamaFrench factors (Fama and French (1993)). ordinary least squares (OLS). both used in pricing options.t are the residuals of a factor model. For the linear explanation of the risk premium on individual securities relative to the risk premium on the market portfolio. 35 . Winker et al. Still.β (6) where εi. In particular. only a single inﬂuential observation can change the parameters of the OLS coeﬃcients signiﬁcantly (Rousseeuw 1984).t − α − βxi. Good model ﬁt was reported for the suggested hybrid. its objective function is complex with many local minima.2 Robust Estimation of Financial Models Least Median of Squares Estimation The last core ﬁnancial model discussed in this section is the CAPM. i.. using the 200 daily stock returns of the IBM stock starting on January 2nd. Robust estimation techniques have been suggested in the statistical literature as a solution to the low breakdown point of simple estimation approaches. Rousseeuw and Leroy (1987) deﬁne the LMS estimator as the solution to the following optimization problem: min(med(ε2 )) . principally. 4.2. Furthermore. 1970. They estimated the parameters of the Heston (more easily) and the Bates model.

However. discriminant analysis.01 0 −0. it would be of interest to identify in more detail the situations when the estimation and forecast based on LMS outperforms OLS and vice versa.g.005 1. to select some economically meaningful factors. the method should be applied to diﬀerent data sets.3 Model Selection This section presents the use of heuristics in selecting a set of independent variables which better explain a given explanatory variable. the higher the chances are also to include some unimportant factors which worsen the predictive performance of our model.g. Also. e.x 10 8 −5 med(ε2 ) t 6 4 2 1. Heuristic optimization methods can be trained. e.8 1.4 0. 36 . First. But.01 β α Figure 6: Least median of squared residuals as a function of α and β LMS estimates obtained by means of heuristic optimization can be used for real life applications. as we have more chances to include the important factors. One can argue that the more factors we include in a model.6 1.g. Traditionally. for their application no strong statistical hypotheses are made. Some extensions of the paper are straightforward based on the results reported.005 0.. the better the model ﬁt or its predictive performance will be. 4.2 1 −0. stock returns from other stock indices or stock markets. stepwise analysis etc. statistical methods are applied for model selection. statistical methods ignore the economic or theoretical relationship between the variables and rely on strong distributional assumption for the factors. Furthermore. e. by the objective function. normality assumption for the distribution of factors.

at a given point in time. Under this framework banks are required to detain a minimum capital to cover portfolio losses from expected defaults.4. like TA. Future work can concentrate on improving the selection performance of the hybrid meta-heuristic.3. instead of SA can also be tested. get the highest possible explanatory power. Based on the 37 . Maringer (2005). a MA (Section 2. More importantly. complete enumeration of all possible combinations is diﬃcult to be tested in reasonable time.1. This will only be possible in the presence of information about the economic factors aﬀecting the returns of a speciﬁc asset at a given time. more emphasis can be given in the model weight selection. the loss given default (LGD) and a binary variable that determines the borrower’s default grade (D). Like asset selection. for problems including more than ten factors. 7 applied a hybrid meta-heuristic. MA was able to identify the 5 out of 103 MSCI indices that explain a satisfactory percentage of S&P 100 asset price variation.1 Risk Factor Selection A diﬀerent perspective to the asset pricing yields the risk factor selection for the Asset Pricing Theory (APT) model. The Basel II Accord on Banking Supervision legislates the framework for banks credit risk assessment (Basel Committee on Banking Supervision 2006). based on the APT model. Expected losses equal the product of the exposure at default (EAD).1) to select the appropriate ﬁrm and industry speciﬁc indices for asset return estimation.1) to ﬁnd an optimum subset of MSCI factors that best explain asset returns based on the APT model. adjusted R2 . In addition.2. That is. To enhance the economic credibility of the neighborhood construction (factor sets). TA has the advantage to construct the candidate solution acceptance criterion based on information from problem’s search area.1 and 2. more weight can be given to neighborhood solutions with economically sound interpretation (and less weight on non-economic sound sets). The use of another trajectory search method. risk factor selection is a complex combinatorial problem which in high dimensions is computationally very demanding. One alternative is to use heuristic optimization methods for model selection (Winker (2001)). 4. Ch. A meta-heuristic technique can be used to assign factors’ weights based on their economic credibility.3. The hybrid meta-heuristic initially selects a subset of factors and combines the properties of SA and GA (Section 2.3.2 Selection of Bankruptcy Predictors Factor selection refers also to the selection of predictors so as to evaluate the credibility (solvency) of a debtor (bank client/portfolio).

Each one of this techniques assumes a diﬀerent interaction among factors. The latter paper stressed the potentials of incorporating GA to SVM in future research. (1996)). The GA based model resulted in perceptibly smaller type I and type II errors. Varetto (1998). logit analysis. economic factors and their lags. While each technique constructed diﬀerent models (one.18 at least one and three years prior to failure. The meta-heuristic was used not only to model selection. whereas neural networks allow a non-linear relationship between explanatory variables. Ahn et al. Ahn et al. their predictive ability diﬀer. Extensive literature (see Kumar and Ravi (2007) for a review) is devoted on determining the default grade D. They used stepwise selection to choose the risk factors for the discriminant model and the logistic regression model. neural networks (Angelini et al. (1996) applied discriminant analysis. etc). Back et al. conventional stepwise analysis (Butera and Faﬀ (2006)). (2006) trained GA that resulted in an optimal selection of bankruptcy predictors. Its derivation is subject to a diverse range of risk factors and their weights. (2006) and Min et al. two and three years prior to failure). logit analysis and neural networks for score selection and failure prediction. Varetto (1998) applied GA to train neural networks. Hamerle et al. Trabelsi and Esseghir (2005) constructed a hybrid algorithm that evolves an appropriate (optimal) artiﬁcial neural network structure and its weights. 38 . (2006) forecast bankruptcy with SVM. This section addresses the development of a credit model responsible for this binary assignment. statistical techniques vary among linear discriminant analysis (Altman (1968). (2006) and Min et al. It is evident from the current ﬁnancial and credit market crisis that credit institutions should develop a more valid credit risk model for the classiﬁcation of borrowers. For credit model and weight selection. Discriminant analysis assumes a linear relationship. Besides. Genetic algorithms select the best set of predictors and estimate the parameters of a kernel function to improve SVM performance using data from Korean companies. GA were used in NN for choosing the links. NN were compared with a statistical technique.internal rating approach (IRB) banks can estimate internally these three risk components. but also to weight estimation. (2003) report a generalized factor model for credit risk composed by ﬁrm speciﬁc ﬁnancial data. (2008) and Di Tollo and Lyra (2010)) and genetic algorithms (Back et al. Besides model selection. systematic and idiosyncratic factors. In the same vein Trabelsi and Esseghir (2005). they used genetic rules for credit clustering. these papers applied GA to tune the parameters of bankruptcy prediction techniques. The latter technique is discussed in the following section. logit analysis adopts a logistic cumulative probability function. Additionally. 18 Smaller type I and type II errors can be interpreted as lower misclassiﬁcation error.

the better will be the model ﬁt. For example. Data clustering has always been on the epicentrum of research work using either statistical or intelligent techniques. But. TA and MA. Alternatively. namely discriminant analysis and logit or neural networks. Given the continuous nature of this problem DE can be suitable. a regression model was ﬁtted using a set of weighted indication variables. Heuristics 39 . respectively. Other heuristics can also be used in factor selection in SVM framework. GA identiﬁed the number of intervals to be added to the lower bound of the range. In that respect. Yet. classiﬁcation of data in several groups with constraint consideration can make the problem NP-hard (Brucker (1978)). DE for selecting the parameters of a kernel function and TA for choosing the important factors of the credit risk model. the higher the chances are also to include some unimportant factors (‘overﬁt’) which aﬀect (worsen) the predictive performance of our model. as we have more chances to include the important factors. While Section 4. In the context of risk factor selection for bankruptcy prediction other meta-heuristics or hybrid meta-heuristics can also be tested. credit risk quantiﬁcation and credit risk assignment are complementary procedures for qualiﬁed credit risk management. a population algorithm for continuous problems can result faster and more eﬃciently in the optimum weight values. The more factors we include in a model.discriminant analysis. What is still an open question in selecting bankruptcy predictors is whether it is necessary (and how) to chose the desired number of risk factors included in a model. (2010a) for an application of TA for determining the optimal number of credit risk grades). the ﬁtness function. in terms of insolvency diagnosis (failure prediction) for 4738 small Italian companies. support vector machines and case-based reasoning. To estimate the insolvency criterion. a diﬀerent heuristic optimization technique can be applied for weight selection. GA were used to select the optimal set of ﬁnancial ratios.4 Financial Clustering Risk factor selection. Such an application may further improve the performance of SVM. The above estimation asserts an economic interpretation of the weights in less time than human intuition. (2010b) and Lyra et al. For weight estimation.3 presented the development of a credit model and the quantiﬁcation of credit risk using meta-heuristics. a predeﬁned range of values was assigned to the weights and split in intervals of equal length. 4. the clustering of credit risk using heuristic techniques is addressed in this section. Meta-heuristics can be used to endogenously determine the optimal number of factors that can be included in a model (see Lyra et al.

(2007) and Krink et al. Provision can cover expected losses. 19 40 .are proven to be a reliable tool for ﬁnancial data classiﬁcation. of the total number of borrowers in a given portfolio. (2010a). the EAD in each bucket shall be no higher than 35% of the total borrowers’ exposure in a given portfolio (Krink et al. e. i. Apart from having at least seven clusters for non-defaulted borrowers (§ 404 of Basel Committee on Banking Supervision (2006)).e. an eﬃcient classiﬁcation tool is required that minimizes the misclassiﬁcation error (or the regulatory capital) and satisﬁes the other constraints imposed by Basel II. 4. To calculate RC banks should ﬁrst assign their client into groups based on their credit risk characteristics. (§ 285 of Basel Committee on Banking Supervision (2006)). The design of a risk rating system for the assignment of borrowers into homogenous grades is subject to a number of constraints imposed by Basel II. however additional capital or regulatory capital (RC) is required to cover unexpected losses. An additional constraint regarding the lower bound on the number of borrowers in each bucket is necessary to ensure that no bucket is empty and that the number of borrowers in each bucket is adequate to statistically evaluate ex-post the precision of the classiﬁcation system.4. 1%. borrowers should be assigned to a number of homogeneous groups. DE compared with other heuristics. Thus.19 Introducing the above constraints into the clustering problem increases its complexity. the ‘mean PD’ for each grade should exceed 0. the lower bound is set using statistical benchmarks. each group is assigned a ‘pooled PD’ or a ‘mean PD’ which distinguishes it from other groups.03%. Previous literature (Krink et al. (2007)).g. An important aspect for banks is to avoid misstatement (under/over statement) of capital requirements resulting from misclassiﬁcation of clients. To achieve this. Thus. the probability that borrowers will default the subsequent year (PD). heuristics have been suggested in the recent literature to tackle ﬁnancial classiﬁcation problems. Therefore. we avoid having high concentration of exposure in a given grade. Then. The deviation from the ‘mean PD’ determines the required capital detained and consequently the lending rate assigned to each risk group. In Lyra et al. (2007)) speciﬁes that it should exceed a given percentage. Krink et al. (2008) contribute to this context by clustering borrowers to a ﬁxed (given) number of buckets and optimizing different objective functions. In either case. Further.1 Credit Risk Assignment Basel II (Basel Committee on Banking Supervision (2006)) requires banks to detain suﬃcient capital to bare losses that arise not only from expected but from unexpected economic downturns.

we addressed the direct application of heuristics in asset selection (Section 4.like GA and Particle Swarm Optimization (PSO). for larger data sets and for other more realistic linear and non-linear constraints (such as the modeling of the dependence structure of defaults or the application of heuristics in a dynamic clustering framework) the problem becomes computationally more demanding. They suggest pre-clustering assets according to their performance so as to improve the asset allocation procedure. In addition. high dimensional portfolios or possible inaccurate estimations of expected returns or covariance matrices. In due course. However. pre-clustering may help reduce the dimensionality of the problem and thus facilitate the optimization procedure. its performance does not depend on a portfolio’s dimensions. The clustering of assets before portfolio selection serves as an alterative to the equally weighted asset allocation 1/N or the Markowitz allocation based on risk-return equilibrium. The discussed literature reported. TA was more eﬃcient (better grouping solutions) and faster. such as standard k-means and DE.1) when complex utility functions and constraints are considered.2 Portfolio Performance Improvement by Clustering So far. An indirect application of heuristics in portfolio selection (asset allocation) is found in Zhang and Maringer (2010a) and Zhang and Maringer (2010b). the precision of the classiﬁcation approach could be statistically evaluated ex-post.20 may result in a sub-optimal allocation of resources. Then. 4. The superiority of TA in this problem instance stems from the fact that a fast update of the objective function is feasible and thus computation time is largely independent of the number of clusters. This algorithm is found to outperform alternative methodologies already proposed in the literature. By doing so. Pre-clustering involves ﬁrst clustering assets into a given number of clusters so as to improve the overall portfolio SR. for a given risk. 20 41 . While DE performed reliably for a set of small and medium size Italian companies. They proposed a TA algorithm to exploit the inherent discrete nature of the clustering problem.4. (2010b) extent the previous literature by determining not only the optimal size but also the optimal number of clusters. In latter case. little parameter tuning was required compared with GA and PSO. showed consistently superior performance in the discrete problem of credit risk rating. using heuristics. in many instances. So. Lyra et al. higher portfolio returns. It is an open challenge to improve the computational eﬃciency of heuristics in the computationally demanding problem of ﬁnancial clustering. The equally weighted asset allocation is data independent.

The random noise is added with a given probability and it follows a normal distribution. 4. according to both in-sample and out-of-sample simulation studies.1. especially data with complex schemes. a random noise (z) term is added to the scaled diﬀerence of two randomly selected vectors.. DE and GA.2. clustering avoids. The aim of clustering is the partition of data in order to minimize the within (homogeneity) and maximize he between clusters variance (heterogeneity). traditional asset allocation procedures are applied.i = P. Besides. to a high extent.. NN to tackle the problem. A possible extension in this direction is to determine the optimal number of clusters based on the sample size and the index composition used. First.GAME) classiﬁed the To adjust the continuous optimization technique.2 ).4. As indicated from the results. can help extract all the available information.3 Identiﬁcation of Mutual Funds Style Clustering data. the potential performance of the suggested clustering design in the presence of highly volatile asset returns should further be investigated. as discussed in Section 4.2. Section 2. mainly because of their discrete nature (construction). as indicated by the improved SR distributions. Finally. the evolutionary algorithm (GA for medoids evolution .1 ) × (P. to clustering ﬁnancial data (FTSE assets).. pre-clustering of assets improved the SR distribution of clustered portfolios..r2 − P. principal component analysis (PCA). Second. Heuristics are proven to be a reliable tool for clustering ﬁnancial data. a weighted composition of these clusters constructs the target portfolio. P. Compared with traditional non-clustering allocation approaches. during (υ) (0) (0) (0) the diﬀerential mutation process.g.21 In the above application a predeﬁned number of clusters was used to partition assets. e. Besides. DE. the recent literature has developed a number of new classiﬁcation approaches which combine classical statistical methodologies. GA resulted in better SRs than DE in all restarts. GA resulted in higher and more stable ﬁtness values. GA outperformed DE. Brucker (1978) has shown that for a number of clusters higher than three the problem is highly complex. 21 42 . Zhang and Maringer (2010a) used and compared two diﬀerent heuristics. Pattarin et al.r3 + z. Nevertheless. and intelligent techniques. (2004) proposed a combination of PCA and GA to group the historical returns of the Italian mutual funds and identify their style. So.4.r1 + (F + z. the ‘curse of dimensionality’ problem often faced in large portfolio management. GA performed well in the discrete problem of clustering. The candidate solutions are rounded up to the nearest integer.g. DE. to a discrete problem. e... Further. PCA selected the order of the process (the lag order).with-in class.

Optimization heuristic techniques are ﬂexible to tackle a broad variety of complex optimization problems. The paper shows some promising ﬁelds of application of heuristic optimization techniques and some possible extensions in that direction. when compared with other heuristics. Future applications can consider the use of other heuristics in ﬁnancial data clustering.e. 5 Conclusions This survey gives an overview of some modern ﬁnancial optimization problems.22 The new classiﬁcation method based on GA was successful in identifying the mutual fund style with less information than the classiﬁcation method of the Italian mutual fund managers association (‘Assogestioni’). GAME oﬀers the ﬂexibility of either using heuristics or statistical approaches in choosing the number of lags included in the model. This is due to the stochastic elements introduced during the optimization process. intermediate stochastic selection and acceptance of the candidate solution. in relation to the Euclidean distance. DE has shown remarkable performance in continuous numerical problems. Explicitly. 22 43 . parameter estimation. A similar implementation could be used in ﬁnancial clustering problems.historical returns in homogenous groups. That implementation improved the accuracy. The choice of the appropriate heuristic technique is of particular importance for its eﬃciency. is that it allows non-linear classiﬁcation of data. Often the additional constraints and the high-dimensionality of these problems. The paper suggests heuristic optimization techniques as an alternative. In that application a PCA was used for the selection. In the presented literature heuristics provide reliable approximations and are successfully applied to optimize ﬁnancial problems with great potentials. while numerous repetitions of the algorithm are recommended. the speed and the robustness of DE compared to GA. make them complex and thus diﬃcult to solve by standard optimization algorithms. They used a kernel induced similarity function to measure the distance from the cluster center. Das and Sil (2010) recommended a revised DE algorithm in clustering the pixels of an image. i.g. the optimization process involves stochastic initialization. Heuristic methods diﬀer in their acceptance criteria. the outcomes (optimization results) should be carefully interpreted. the actual way of creating new solution and the number of solutions they maintain and generate in every iteration. e. In the literature discussed in this survey. Nonetheless. The advantage of this measure.

Still. Future research might analyze further the criteria for optimal algorithm selection. is faster and more robust to deal with the discrete problem of credit risk assignment.TA. An interesting extension in this direction is the derivation of joint convergence properties for more complex estimators. CR = 0. the computational resources and the application. 0. as a function of sample size.9. it has already shown better performance than GA and PSO in tackling the credit risk bucketing. particularly TA. Appendix Central Composite Design This section provides a detailed explanation and a numerical representation of the CCD experiments.7. First. heuristic methods have been applied to many promising ﬁelds of commercial applications. This range constitutes the experimental region and is carefully speciﬁed around an initial value (center point) for each parameter. for the factorial design we assign to the parameters two discrete ‘levels’(F = 0. local search heuristics. They are ﬂexible enough to account for diﬀerent constraints and provide reliable results where traditional optimization methods fail to work. The initial levels depend upon a priori knowledge about the best parameter combination that optimize 44 . robust model estimation and selection and ﬁnancial clustering with good potential. with a given probability. Nonetheless. They are vital. for they provide a trade-oﬀ between computational time and solution quality. like portfolio selection with real world constraints. which will further improve the converge of the estimation results. more research should be devoted to identifying the problem framework (what is a ‘good problem for heuristics’) in which heuristics clearly outperform traditional optimization methods or in which some heuristics are preferred instead of other heuristic methods. It may depend on the nature of the problem (discrete or continuous search space).9 and np = 20. to the true maximum likelihood estimator of a GARCH model. An open challenge in the application of heuristics is to set a more reﬁned statistical framework for estimating the heuristic parameters. It seems that there is no clear-cut recipe when to use which type of heuristics. needed by a TA to precisely converge. 50). An additional aspect aﬀecting the eﬃciency of heuristics is the initial parameter settings.7. To sum up. 0. Even if DE is specialized on continuous numerical problems. Maringer and Winker (2009) estimated the number of iterations.

g. The aim is to ﬁnd the combination of parameters that optimize an objective function value using these ﬁxed computational times. medium and high complexity. 0 and 1.berkeley. an adjustment is made on nG in each alteration.8 and np = 35).9 − 0. ±α·△F = ±α·0.8. deﬁned above e. The center points equal the median (center) of the ’levels’ used in the above factorial design (F = 0. since it allows for a higher dispersion around the origin. two more extreme levels are considered for each input variable which extent the experimental region by α times the original range. for the high level of F : Fhigh − Fcenter 0. e. low. since unreasonably low or high values might result Practical advice for optimizing objective functions with DE is given on www.24 Table 3: Experimental Design Coding. CR = 0. This experiment is repeated 4 times to measure the magnitude of random error (σ 2 ). resulting from diﬀerent values of the response variable for identical settings (Fang et al. we conduct an additional experiment using the center points. Code -1 0 1 F 0. 2500 and 5000 meaning.8 0. When little is known about the search domain a rotatable design might be more helpful in exploring it. (2006)).g. Table 3 illustrates the experimental region where the three factor levels. Each design suggests a diﬀerent distance from the center points.edu/∼storn/. Second. are coded -1.1.9 np 20 35 50 Finally. one should be careful in choosing the distribution of experimental points around the origin. The value α is calculated in this experiment based on orthogonal and rotatable designs (Box and Draper (1987) and Barker (1985)).23 Three diﬀerent levels of computational resources are considered. center and high.9 − 0. respectively.icsi.8 0.7 0. Since the product of np × nG should be constant. low. An orthogonal design is more concentrated around the center point while a rotatable design covers a wider surface area.7 0.9 CR 0.8 = =1 (Fhigh − Flow )/2 (0. However. 24 To code the factor levels the following rule is applied. A proportion of α determines the distance of the additional extreme levels from the center.a given objective function. respectively.7)/2 (7) 23 45 . with nG × np = 500.

respectively. When α is negative. There are only 15 possible factor level combinations and the experiment using factors’ center levels (F = 0. 25 46 .8 + α). CR = 0.8. The equivalent α is added to the center point of each input parameter (e. Equation 2 for CAMP is re-estimated While experimental methodologies should be applied when there is some knowledge about the levels of the input variables.27 Table 4 demonstrates the experimental design with all 18 level combinations considered in the exercise.Figure 7: CCD Graph for three factors. CR and np . In a similar way as above.g. the equivalent proportion of α is +0. the Greek letters indicate which factor deviates from the center point. Lin et al.1414 and +7 value for F .8 and np = 35) is repeated four times. for F 0. 26 The decimal positions are omitted for np value since they do not improve or destroy the integrity (mathematical results) of the design. a proportion of it is deducted from the center point (e.26 Only one parameter is altered every time. Figures 8 and 9 illustrate all factor level-combinations used to form the experimental region for orthogonal and rotatable designs.25 These additional levels are coded with α and are sequentially placed at ±α. respectively.1414. +0. Finally. a second order polynomial is ﬁtted and the optimal combination of DE factors is determined. O ψω ψ CR χψω +αψ χψ • ω −αψ F 4 / o t −αχ −αω +αω +αχ r / χ ? np χψ in high uncertainty of the response output. When α is positive. (2010) suggest that uniform design might be even better than orthogonal and rotatable ones in the absence of design knowledge. All 18 set-ups of the experimental design are used to estimate Equation 2. for F 0. The graphical representation of these combinations is shown in Figure 7.g. Then.8 .α). 27 For the explicit calculation of α’s based on orthogonal and rotatable design see Hill and Lewicki (2006). Also. while all other parameters take the center value.

Design Factorial Combination r χ ψ χψ ω χω ψω χψω node Setup 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 F -1 1 -1 1 -1 1 -1 1 0 0 0 0 α −α 0 0 0 0 CR -1 -1 1 1 -1 -1 1 1 0 0 0 0 0 0 α −α 0 0 np -1 -1 -1 -1 1 1 1 1 0 0 0 0 0 0 0 0 α −α Center Orthogonal +αχ -αχ +αψ -αψ +αω -αω 47 .Table 4: Experimental Design (Simulation Setup).

8 0.75 0. (2010).85 0.65 0.9 0.7 0.85 0.7 0.65 0.95 Cr F 48 .7 0.75 0.75 0. Figure 8: CCD with orthogonal design Orthogonal Design 55 50 45 40 Np 35 30 25 20 15 0.8 0.based on the optimal factor values and its performance is tested against the LMS residuals reported in Winker et al.9 0.9 0.75 0.7 0.9 Cr F Figure 9: CCD with rotatable design Rotatable Design 60 50 40 Np 30 20 10 0.8 0.8 0.95 0.85 0.85 0.

B. Lecture Notes in Computer Science 4234.). Sharaiha (2000). Beasley and Y. Marcel Dekker. Chan. Altman. Turku Center for Computer Science. G. H. 1271–1302. T. 49 . J. 45– 54. and J. Eds. Springer. Computational Intelligence for Business. di Tollo and A. (1968). 265–282.E. Kunzi. P.. and R. E.. B. 157 of Lecture Notes in Economics and Mathematical Systems. (1978). G. A neural network approach for credit risk evaluation.. Chang. N. 420–429. discriminant analysis and the prediction of corporate bankruptcy. K. Bank for International Settlements. Angelini. Robust measurement of beta risk. Quality by Experimental Design. Beckmenn and H. G. Berlin. Sere and M.M. Global optimization of support vector machines using genetic algorithms for bankruptcy prediction. logit analysis. 733–755. Empirical Model Building and Response Surface. and genetic algorithms. V. Lakonishok (1992). Vol. An integrated multi-model credit rating system for private ﬁrms. E. Lee and K. In: Optimisation and Operations Research (M. T. Review of Quantitative Finance and Accounting 27. C.P. The Quarterly Review of Economics and Finance 48. L. Draper (1987).. J. New York. New York. Barker. Technical Report 40. The Journal of Finance 23. T. Laitinen. Heuristics for cardinality constrained portfolio optimisation. Technical report. K. Roli (2008). P. Back.References Ahn. R. On the complexity of clustering problems. and N. Meade. pp. 311–340. Computers and Operations Research 27(13). Financial ratios. Faﬀ (2006). Wiley.-J. (1985). Journal of Financial and Quantitative Analysis 27. Kim (2006). K. International convergence of capital measurement and capital standards a revised framework comprehensive version. Choosing bankruptcy predictors using discriminant analysis. Butera. Box. Brucker. Wezel (1996). E. Basel Committee on Banking Supervision (2006). 589–601.

159–178. and E. The Journal of Business 79. Gilli. in: EJ. Computational Methods in Decision Making. Sudjianto (2006). pp. Fern´ndez-Rodr´ a ıguez. B. Fastrich. F. Milan. Das. Ed. Sil (2010). Winker (2010). Calibrating the Heston model with diﬀerential evolution. Schumann (2010a). and E. R. Gilli. and K. Threshold accepting: A general purpose optimization algorithm appearing superior to simulated annealing. 3083–3099. E. Part II. pp. G. K. Lin and A. 161–175. Common risk factors in the returns on stocks and bonds. pp. Robust portfolio optimization with a hybrid heuristic algorithm. and P. Di Tollo. 09-06. Dueck. COMISEF Working Papers Series. K¨llezi (2002). Technical Report 41. S. Lyra (2010).. M. T. French (1993). Journal of Financial Economics 33. Vinci. Springer. Design and Modeling for Computer Experiments. K. Chapman and Hall/CRC. 147–170. Dueck. In: Decision Theory and Choices: a Complexity Approach (M. 242–250. Springer. S. Kernel-induced fuzzy clustering of image pixels with an improved diﬀerential evolution algorithm. Comp. Siokos (eds). and M. Information Sciences 180. Fama. Quantitative Financial Economics. and P. Winker (1992).). New concepts and algorithms for portfolio choice. J. Interest rate term structure modeling using free-knot splines. Elman nets for credit risk assessment. Applied Stochastic Models and Data Analysis 8. and E. M. 167–183. and S. Chichester. 50 . John Wiley & Sons Ltd. An empirical analysis of alternative portfolio selection criteria. Schumann (2009). Gilli. 1237–1256. 3–56. Technical Report No. Scheuer (1990). Eds.Cuthbertson. (2006). Economics and Finance. Phys 90. F. Kontoghiorghes. and D. Rustem. Faggini and P. and T. G. M. Swiss Finance Institute Research Paper. Nitzsche (2005). Fang. B. R. Portfolio optimization with VaR and exe pected shortfall. In: Lecture Notes in Computer Science (number 6025 EvoApplications 2010. C.). G.

Schumann (2011). Heuristic optimization methods in econometrics. 2236– 2250.. Kluwer Academic Publishers. M. Weinhardt and F. 1–18. Risk–reward optimization for long-run investors: an empirical analysis. M.Gilli. Schlottman. Nave (2009). Hamerle. Elsevier. International Handbooks on Information Systems. Gilli. M. Applications of heuristics in ﬁnance. Gilli. Hysi (2006). Springer. T. and P. 51 . Technical Report 2. Laguna (1997). D. Gilli. Distributed optimization of a portfolio’s Omega. and E.. Calibrating option pricing models with heuristics. Germany. In: Handbook on Computational Econometrics (E. Liebig and D. Maringer and P. Technical Report 31. Glover. Banking and Financial Supervision. C. Winker (2008). Anthony and M. Winker (2009). Ed. M. Seese. Amsterdam. MA. O’Neill. K¨llezi and H. Tabu Search. and J. 635–653. Computational Statistics and Data Analysis 42. Eds. A genetic algorithm of the term structure of interest rates. In: Natural Computing in Computational Finance (B. Schumann (2010d). Deutsche Bundesbank. A data-driven optimization heuristic e for downside risk minimization. Winker (2003). Calibrating the Nelson Siegel Svensson model. 4. 381–389. and E. COMISEF Working Papers Series.. Vol. and E. and P. Credit risk factor modeling and o the Basel II IRB approach. 299–312. Computational Statistics and Data Analysis 53. F. R. Schumann (2010c). and E. (1997). E. M. Springer (forthcoming). Springer. M. A. R¨sch (2003). A global optimization heuristic for estimating agent based models. In: Handbook On Information Technology in Finance (D. Gilli. C. M. Parallel Computing 36(7).. Kontoghiorghes et al. Schumann (2010b).). Gilli. 81–119. Gouri´roux. ARCH Models and Financial Applications. e New York. M. Gilli. Journal of Risk 8(3). European Actuarial Journal (forthcoming). pp. Gimeno. pp. M. Gilli. Eds. and M.).). Boston.

Kumar. Krink. Journal of Banking and Finance 32(10). Resti (2008). S... rationality and recruitment. Ants. (1991). Computational Management Science (forthcoming). A. Hoos. Proceedings in Computational Statistics (Alferdo Rizzi and Maurizio Vichi. H. H. H. McLay (2006). 471–481. S.). A. P. 52 . J. (1993). 112–147. StatSoft. Computational Statistics & Data Analysis 52(1). S. Inc. Lewicki (2006). New York. Paterlini (2009). Jacobson. Mittnik and S. and S. 137–156. A. Using diﬀerential evolution to improve the accuracy of bank rating systems. A. In: Money and Financial Markets (M. Krink. Visiting near-optimal solutions using local search algorithms. C. Kirman. R. pp. Restic (2007). The optimal structure of pd buckets. (1975). 1–28. Ravi (2007). Macmillan. E. T. Convergence behavior of the Nelder-Mead simplex algorithm in low dimensions. T. Ed.) pp. Quarterly Journal of Economics 108. H.Hill. Ann Harbor. St¨ tzle (2005). Diﬀerential evolution and combinatorial search for constrained index tracking. Hall and L. Wright (1999). University of Michigan Press. SIAM J. Lagarias. T. and P. Paterlini (2009). M. Reeds. Statistics: Methods and Applications.. Bankruptcy prediction in banks and ﬁrms via statistical and intelligent techniques . Taylor. 68–87. J. T. Stochastic Local Search: Foundations and u Applications. In COMPSTAT 2006. Elsevier Inc. Industry and Data Mining... Krink. Kirman. S. 2275–2286. A Comprehensive Reference for Science. H. Annals of Operations Research 152(1). Paterlini and A. and T. Wright and P.a review. and V. 354– 368. N. Ed. 153–176. Multiobjective optimization using differential evolution for real-world portfolio optimization. Epidemics of opinion and speculative bubbles in ﬁnancial markets. Optimization 9. Paterlini and A. Krink. Holland. J. T. S. Adaptation in Natural and Artiﬁcial Systems. European Journal of Operational Research 180.

Maringer. Finance and Management 15. genetic algorithms and martial arts – towards memetic algorithms. (1989). Expert Systems with Applications 31. P. Min. Meyer (2007). Sharpe and P. D. Maringer. Lee and I. Oyewumi (2007). Intelligent Systems in Accounting. Computational Statistics & Data Analysis 54(11).J. On evolution. Paterlini and P. Lyra. Optimized U-type designs on ﬂexible regions. D. Winker (2009).. M. and A. search. Winker (2010a). Mansini. 204–208. and O. Stochastic versus deterministic update in simulated annealing. Parsimonious modeling of yield curves. Dordrecht. K.F. 533–550. 219–233. Hybrid genetic algorithms and support vector machines for bankruptcy prediction. C. R. and J.Lin. Winker (2010b). and M. optimization. 2693–2706. Heuristic algorithms for the portfolio selection problem with minimum transaction lots. J.. Paha. Index tracking with constrained portfolios. COMISEF Working Paper Series. J. D. Computational Statistics & Data Analysis 54(6).. Maringer. Technical Report 39. Computational Statistics 24. Journal of Business 60(4). 57–71. C. Springer. 473–489. 53 . Studies in Nonlinear Dynamics and Econometrics (forthcoming).. Smooth transition autoregressive models . Han (2006). H. S. Optimization heuristics for determining internal rating grading scales. D. and M. Threshold accepting for credit risk assessment and validation. R. F. European Journal of Operational Research 114. G. D. 1505– 1515. Speranza (1999). Physics Letters A 146(4). Lyra. Moscato. A. Maringer. Winker (2010). Onwunta and P. Siegel (1987). CalTech California Institute of Technology. Technical Report 790. Moscato.new approaches to the model selection problem. Nelson. S. M. Fontanari (1990). 652–660. Portfolio Management with Heuristic Optimization. (2005). and P. P. The convergence of estimators based on heuristics: Theory and application to a GARCH model.

I. Least median of squares regression. 541–564. (1998). E. Computational Statistics and Data Analysis 47. Rousseeuw.. (1994). R. V. P. and M. A taxonomy of hybrid metaheuristics. Technical Report 27. and M. Optimization Heuristics in Econometrics: Applications of Threshold Accepting. Journal of Heuristics 8. O. Price. In: Computational Methods in Financial Engineering (E. Paterlini and T. Eds. Winker. Frankfurt a. E. J. J. Rousseeuw. Springer. Technical Report 4871. and A. Storn. 871–880. Genton (2008). Savin. Ronchetti. The hidden risks of optimizing bond portfolios under VaR. Varetto. K. pp. G. Maringer (2004). Minerva (2004). P. Svensson. Deutsche Bank Research Notes 13. Berlin. L. Kontoghiorghes. (2002). J. Storn and J. Estimating and interpreting forward interest rates: Sweden 1992-1994. and P. A. 353–372. Esseghir (2005). Winker. Global Optimization 11. F. Robust Regression and Outlier Detection. in: Proceedings of the eleventh IEEE conference on tools with artiﬁcial intelligence (ICTAI-2005). R.Pattarin. 54 . Price (1997). Heuristic optimization methods for dynamic panel data model selection. John Wiley & Sons. 1421–1439. Wiley. E. (1984). Lampinen (2005). (2001). P. New evolutionary bankruptcy forecasting model based on genetic algorithms and neural networks. F. Genetic algorithms applications in the analysis of insolvency risk. Leroy (1987). Winker (2010). Robust prediction of beta. M. New Jersey. A. Diﬀerential evolution: a simple and eﬃcient adaptive scheme for global optimization over continuous spaces. B. COMISEF Working Papers Series. 147–161. P. Clustering ﬁnancial time series: an application to mutual funds style analysis. M.M. Trabelsi. National Bureau of Economic Research. and K. Application on the Russian innovative performance. Journal of the American Statistical Association 79. Springer. S. Diﬀerential Evolution: A Practical Approach to Global Optimization.). Germany. Journal of Banking and Finance 22. and D. A. Winker. Talbi. Rustem and P.. 341–359. New York.

Winker. and D. Econometric and Financial Analysis (Erricos J. Sharpe (2010). Winker. Lyra and C.-T. Springer. Gatu. Maringer (2010b). P. Winker. Zhang. Least median of squares estimation by optimization heuristics with an application to the CAPM and multi factor models. Zhang. The threshold accepting optimisation algorithm in economics and statistics. 9 of Advances in Computational Management Science. pp. P. 55 . M. and K. In: Optimisation. Application of threshold accepting to the evaluation of the discrepancy of a set of points. In: Electronic Engineering and Computing Technology (S. Eds. Gilli and V. J. Computational Management Science (forthcoming).). 107–125. Journal of Economic Interaction and Coordination 2..). Vol. Fang (1997). M. Kontoghiorghes and C. A clustering application in portfolio management. Springer. 60 of Lecture Notes in Electronical Engineering. pp. Ao and L. Gelman. Jeleskovic (2007). Maringer (2007). Vol.. Eds. J. P. 309–321.Winker. Maringer (2010a). SIAM Journal on Numerical Analysis 34(5). An objective function for simulation based inference on exchange rate data. COMISEF Working Papers Series. Technical Report 36. and D. 125–145. and D. 2028–2042. Asset allocation under hierarchical clustering. P.

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue listening from where you left off, or restart the preview.

scribd