You are on page 1of 23

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION 1

Objective Reduction in Many-objective


Optimization: Linear and Nonlinear Algorithms
Dhish Kumar Saxena, João A. Duro, Ashutosh Tiwari, Kalyanmoy Deb and Qingfu Zhang

Abstract—The difficulties faced by existing Multi-objective it is alarming that the search ability of some of the most
Evolutionary Algorithms (MOEAs) in handling many-objective well known MOEAs severely deteriorates when more than
problems relate to the inefficiency of selection operators, high three objectives are involved [1]–[3], resulting in a poor POF-
computational cost and difficulty in visualization of objective
space. While many approaches aim to counter these difficulties approximation (convergence to and diversity across the POF).
by increasing the fidelity of the standard selection operators, the This makes evolutionary many-objective optimization one
objective reduction approach attempts to eliminate objectives that of the most challenging research areas in the field of evolution-
are not essential to describe the Pareto-optimal Front (POF). ary optimization and explains the growing research emphasis
If the number of essential objectives are found to be two or in this direction. The main difficulties associated with many-
three, the problem could be solved by the existing MOEAs.
It implies that objective reduction could make an otherwise objective problems relate to:
unsolvable (many-objective) problem solvable. Even when the 1) High computational cost: If a continuous multi-objective
essential objectives are four or more, the reduced representation optimization problem (with M objectives) meets the regularity
of the problem will have favorable impact on the search efficiency, property, the dimension of its Pareto-optimal front (POF) can
computational cost and decision-making. Hence, development be M − 1 [4]. Therefore, the number of points needed to
of generic and robust objective reduction approaches becomes
important. This paper presents a Principal Component Analysis approximate the whole POF increase exponentially with M .
and Maximum Variance Unfolding based framework for linear The same phenomenon can be observed in discrete problems.
and nonlinear objective reduction algorithms, respectively. The 2) Poor scalability of most existing MOEAs: With an increase
major contribution of this paper includes: (a) the enhancements in M , almost the entire population acquires the same-rank
in the core components of the framework for higher robustness of non-domination. This makes the Pareto-dominance based
in terms of applicability to a range of problems with disparate
degree of redundancy; mechanisms to handle input data that primary selection ineffective and the role of secondary se-
poorly approximates the true POF; and dependence on fewer lection based on diversity preservation becomes crucial. The
parameters to minimize the variability in performance, (b) density based MOEAs (such as NSGA-II [5], SPEA2 [6]
proposition of an error measure to assess the quality of results, and PESA [7]) worsen the situation by favoring the remote
(c) sensitivity analysis of the proposed algorithms for the critical and boundary solutions, implying that the best diversity gets
parameter involved, and the characteristics of the input data,
and (d) study of the performance of the proposed algorithms associated with poorly converged solutions. This explains the
vis-à-vis dominance relation preservation based algorithms, on a performance deterioration reported in [1]–[3]. The more recent
wide range of test problems (scaled up to 50 objectives) and two MOEA/D [8], [9] and indicator based MOEAs [10], [11] are
real-world problems. found to cope better with an increase in M . For example, in
Index Terms—Evolutionary Multi-objective Optimization, SMS-EMOA [11], the problems faced by the density based
Many-objective Optimization, Principal Component Analysis, MOEAs are countered by the use of S-metric [12] based
Maximum Variance Unfolding and Kernels. secondary selection. However, the utility of such MOEAs is
limited by the fact that their running times increase exponen-
I. I NTRODUCTION tially with M [13]–[15].
PTIMIZATION problems involving four or more con- 3) Difficulty in visualization of a POF for problems with
O flicting objectives are typically referred to as many-
objective problems. With the multiple conflicting demands
M ≥ 4, which in turn makes the task of decision making
more difficult. Although some techniques [16], [17] exist to
faced by the industry today for superior quality, low cost and aid in visualization, they require a large number of solutions.
higher safety etc., competitive edge could only be established In the context of many-objective problems, the above diffi-
by designing the products and processes that account for culties are often referred to as the curse of dimensionality.
as many performance objectives as possible. It implies that Here, it is important to discriminate between the curse of
many objectives need to be simultaneously dealt with, in dimensionality caused by: (i) the difficulties inherent in the
an optimization problem. While this recognition is growing, problems, e.g., problems with a high-dimensional Pareto-set
with complicated shapes [18], and (ii) the poor scalability of
Manuscript received April 19, 2005; revised December 15, 2011 most of the existing MOEAs for M ≥ 4, even when the corre-
D. K. Saxena, J. A. Duro and A. Tiwari are with the Manufacturing
Department, Cranfield University, Cranfield, U.K (email:{d.saxena, j.a.duro, sponding Pareto-set has simple shapes. For instance, a many-
a.tiwari}@cranfield.ac.uk). objective problem whose Pareto-set is 1- or 2-dimensional with
K. Deb is with the Department of Mechanical Engineering, Indian Institute a linear or a quadratic shape, represents a case where the
of Technology, Kanpur, India (email: deb@iitk.ac.in)
Q. Zhang is with the School of Computer Science and Electronic Engineer- problem in itself does not have the curse of dimensionality,
ing, University of Essex, Colchester, U.K (email: qzhang@essex.ac.uk). yet most of the existing MOEAs may suffer from it.
2 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION

The approaches for dealing with many-objective problems ulations1 performed on 24 versions of 6 test problems
can be broadly categorized as: and two real-world problems. In this, the sensitivity of
1) Preference-ordering approaches: These approaches assume the proposed algorithms on: (i) the randomness, (ii) the
that no objective in the given problem is redundant and aim to critical parameter involved, and (iii) the characteristics
counter the low selection pressure for convergence by inducing of the underlying non-dominated sets, is discussed.
a preference ordering over the non-dominated solutions. While 3) Performance comparison: Here, the performance of
a review of these approaches may be found in [19], it may the proposed algorithms vis-à-vis dominance relation
be noted that most of these approaches either aim for only preservation [20] based algorithms is studied, with re-
a part of the POF, report an improvement in convergence ference to their general strengths and limitations, and
with the loss in diversity, or their computational time increases computational complexity.
exponentially with the number of objectives. This paper is organized as follows: The fundamental is-
2) Objective reduction approaches: These approaches [20]– sues in objective reduction and the existing approaches are
[24] assume the existence of redundant objectives in a given discussed in Section II. Section III presents the rationale for
M -objective problem. Operating on the objective vectors of viewing of the objective reduction problem as a machine
the non-dominated solutions obtained from an MOEA, these learning problem, while Section IV presents the proposed
approaches aim to identify a smallest set of m (m ≤ M ) framework. The test suite and the experimental settings are
conflicting objectives which generates the same POF as the defined in Section V. The working of the proposed framework
original problem, by preserving: (i) the correlation structure, is demonstrated in Section VI, while Sections VII and VIII
or (ii) the dominance relations, of the original problem. Such present the results for a wider range of test problems. Sec-
m objectives are termed as essential and the remaining ones tion IX discusses the issue of customization of the framework
as redundant. If m ≤ 3, an otherwise unsolvable problem will and its parameter sensitivity. The real-world problems are
become solvable. Even if 4 ≤ m ≤ M , objective reduction covered in Section X, while a comparative analysis of the
will contribute to higher search efficiency, lower computational framework and the dominance relation preservation based
cost and ease in visualization and decision-making. objective reduction algorithms is done in Section XI. The
paper concludes with Section XII.
This paper builds upon [23], [24] and presents a framework
for both linear and nonlinear objective reduction algorithms, II. O BJECTIVE REDUCTION : D EFINITION , U SEFULNESS ,
namely, L-PCA and NL-MVU-PCA. The distinctive contribu- S ALIENT ISSUES AND EXISTING A PPROACHES
tion of this paper relates to: Objective reduction refers to finding an essential objective
set for a given problem. In that:
• An essential objective set is defined as the smallest
1) The four goals that the framework pursues: set of conflicting objectives (FT , |FT | = m) which can
• Generality: The scope of [23], [24] was limited to generate the same POF as that by the original problem,
highly redundant problems, where m ≪ M . The scope given by F0 = {f1 , f2 , . . . , fM }.
of the proposed framework is broadened to include • The dimensionality of the problem refers to the number
problems with moderate (m < M ) and negligible of essential objectives2 (m, m ≤ M ).
(m ≈ M ) redundancy. This is significant because the • A redundant objective set (Fredn = F0 \ FT ) refers to
core components of the framework need to adapt to the set of objectives, elimination of which does not affect
conflicting goals while handling the above cases with the POF of the given problem. Notably, an objective could
different degree of redundancy. be redundant if it is non-conflicting (or correlated) with
• De-noising of the input data: The proposed framework some other objectives.
accounts for the fact that the input data may be noisy, The benefits of objective reduction could be realized by
i.e., the objectives that may be correlated on the true first recognizing the difficulties that the presence of redundant
POF, may show partial conflict in the non-dominated objectives may cause for most existing MOEAs. This is illus-
solutions obtained from an MOEA. To negate the effect trated through the performance of NSGA-II on a five-objective
of such noise, the framework (unlike [23], [24]) proposes problem with a three-dimensional POF, namely, DTLZ5(3, 5).
an eigenvalue based problem-specific computation of While this problem is an instance of the DTLZ5(I, M ) prob-
a correlation threshold, and pairs of objectives whose lem introduced later, here it is sufficient to note that its POF is
strength of correlation exceeds the threshold, are con- characterized as follows: (i) f1 –f2 –f3 are perfectly correlated
sidered as correlated. (do not lead to incomparable solutions), and (ii) an essential
• Parameter reduction: Compared to [23], [24], the objective set is given by, FT = {f3 , f4 , f5 }.
number of parameters involved are significantly fewer in
1 The proposed L-PCA and NL-MVU-PCA are tested for 26 test cases,
the proposed framework, which makes it more robust.
corresponding to solutions generated–on the POF; randomly; and by NSGA-II
• Proposition of an error measure: An error measure and ǫ-MOEA (20 runs each). Simulations with the latter two solution sets are
has been proposed (not in [23], [24]) that allows an repeated for three different parameter settings of the proposed algorithms, and
assessment of the quality of the obtained results. also for dominance relation preservation based objective reduction algorithms.
2 Under regularity condition [4], for m conflicting objectives, the dimen-
2) Extensive simulations and results: The results presented sionality of the POF is m − 1. However, in the current context, it is meant
in this paper are new and are based on over 9000 sim- to refer to the number of essential objectives.
SAXENA et al.: OBJECTIVE REDUCTION IN MANY-OBJECTIVE OPTIMIZATION: LINEAR AND NONLINEAR ALGORITHMS 3

f5 3.5
3.5 True POF
3
run (online reduction) to simplify the search or it can be
Non−dominated solutions

Objective Values
2.5 B A
2.5 performed post MOEA run (offline reduction) to assist in
2
1.5
1.5
the decision making. Thirdly, that an essential objective set
0.5 1 may be expressed as a subset of the original problem or its
0
1
0.5
elements may be expressed as a linear combination of the
0
f3 2 1.5 2 1 2 3 4 5 original objectives. While the former approach is referred to as
0.5 1
0 f4 Objective Number

(a) NN S versus true POF (b) NN S : Parallel coordinate plot


feature selection, the latter, as feature extraction. Notably, most
of the existing objective reduction approaches pursue feature
Fig. 1. DTLZ5(3, 5): Illustrating that the presence of redundant objectives
can hinder the search efficiency of an MOEA. The NN S corresponds to a
selection towards the ease in subsequent decision making.
population size of 200 and 2000 generations (one run) In the wake of the above background, the existing objective
reduction approaches are presented below.
1) Dominance Relation Preservation (DRP): The objective
NSGA-II is known to perform well for problems with reduction approach proposed in [20] is based on preserv-
two- and three-dimensional POF. However, it can be seen ing the dominance relations in the given non-dominated
in Figure 1a, that in the case of DTLZ5(3, 5), NSGA-II solutions. For a given objective set F , if the dominance
solutions (NN S ) could not converge to the three-dimensional relations among the objective vectors remains unchanged
POF. This can be attributed to the presence of redundant when an objective f ∈ F is removed, then f is consid-
objectives, as explained below. In the FT subspace shown ered to be non-conflicting with the other objectives in F .
in Figure 1a, solution A dominates B. However, B remains Based on this criterion, an exact and a greedy algorithm
a non-dominated solution as it is better than A in one of is proposed to address δ-MOSS (finding the minimum
the redundant objectives, namely, f1 or f2 . This illustrates objective subset corresponding to a given error δ) and
that the presence of redundant objectives hinders the search k-EMOSS (finding an objective subset of size k with
efficiency of NSGA-II and leads to a poor POF-approximation, minimum possible error) problems.
in that, the dominance relations in NN S may be different from 2) Unsupervised Feature Selection: This approach [21]
those characterizing the POF. Figure 1b corroborates this fact, utilizes the correlation between objectives and treats the
where, a significant proportion of NN S (spurious solutions like more distant objectives as more conflicting ones. In this:
B) is a result of the conflict between f1 –f2 –f3 , even though (i) the objective set is divided into neighborhoods of
these objectives are correlated on the POF. If NSGA-II were size q around each objective, and (ii) the most compact
allowed to run for infinitely many generations, the spurious neighborhood is selected, whose center is retained and
solutions would die out and the population would converge the neighbors are eliminated. This process is repeated
to the true POF. However, with a reasonable number of until a pre-specified criterion is achieved. Based on this,
generations, the spurious solutions will prevail. This example two algorithms have been proposed to deal with δ-
highlights the potential benefits of objective reduction. In that: MOSS and k-EMOSS problems, respectively.
• If the problem could be reduced to m ≤ 3: an otherwise 3) Pareto Corner Search: A Pareto corner search evolu-
unsolvable problem could be solved using any of the tionary algorithm (PCSEA) has been proposed in [22],
existing MOEAs. For example, in the above problem, if which instead of aiming for the complete POF, searches
FT were known, NSGA-II would have converged to the for only the corners of the POF. The solutions so
true POF. obtained are assumed to appropriately capture the depen-
• Even if 4 ≤ m ≤ M : the benefits of objective reduction dency of the POF on different objectives. Then objective
could be realized in the form of relatively higher search reduction is based on the premise that omitting a redun-
efficiency, lower computational cost and ease in visual- dant and an essential objective will have negligible and
ization and decision-making. substantial effect, respectively, on the number of non-
• Objective reduction could play a complimentary role for dominated solutions in the population.
the preference-ordering approaches. If the latter were to 4) Machine Learning based objective reduction: This ap-
be applied to the reduced representation of the original proach [23], [24] utilizes the machine learning tech-
problem, higher gains can be expected. niques like Principal Component Analysis (PCA) and
• Besides the above, the benefits of the gain in knowledge Maximum Variance Unfolding (MVU) to remove the
about the problem itself, can not be discounted in real- second-order and higher-order dependencies, respec-
world problems. This is significant because quite often, tively, in the non-dominated solutions. As this approach
there is a tendency among practitioners to account for all lies at the heart of the framework proposed in this paper,
the performance indicators as objectives, while it may not the basic concepts are discussed below.
be intuitive if some of them are correlated.
III. M ACHINE L EARNING BASED OBJECTIVE REDUCTION
It is important to highlight some salient issues around objective
reduction, before introducing the existing approaches. Firstly, This approach is based on the premise that the intrinsic
that all these approaches operate on the objective vectors of the structure of a garbled high-dimensional data can be revealed
non-dominated solutions obtained from an MOEA. Secondly, by transforming it such that the effect of noise and redundancy
that objective reduction can be performed during an MOEA (dependencies) is minimized. PCA [25] achieves this goal by
4 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION

projecting the given data X such that its correlation structure the characteristics of the unnoised and noised signal,
is most faithfully preserved, in that, PCA projects X on the for example, the difference in the dimensionality (m) of
eigenvectors of the correlation matrix of X. unnoised signal and that of the noised signal (which could
Notably, PCA is based on removing the second order de- be greater than m).
pendencies in the given data. Hence, PCA is likely to be inef- • Redundancy refers to the presence of objectives which
fective in capturing the structure of data sets with multi-modal are non-conflicting (or correlated) with some other ob-
Gaussian or non-Gaussian distributions [25], as in Figure 2(a). jectives. Based on discussions in Section II, it can be
Several nonlinear dimensionality reduction methods, such as inferred that redundancy may contribute to garbled data.
Kernel PCA [26] and Graph-based methods [27], exist for The above conformance in the definitions and terminology
removing the higher order dependencies. In that, the former justifies the viewing of objective reduction as a machine
nonlinearly transforms the data by using a standard kernel learning problem.
function and then applies PCA in the transformed/kernel space.
However, its success depends on the a priori chosen kernel.
Framework 1: The proposed framework for linear and
The latter does away with this limitation by deriving “data-
nonlinear objective reduction algorithms
dependent” kernels.
Maximum Variance Unfolding (MVU) [28] is a graph-based Input:
method that computes the low-dimensional representation by t = 0 and Ft = {f1 , f2 , . . . , fM }.
1 begin
explicitly attempting to ‘unfold’ the high-dimensional data
manifold, as in Figure 2. The unfolding is achieved by maxi- 2 Obtain a set of non-dominated solutions by running
mizing the Euclidean distances between the data points while an MOEA corresponding to Ft , for Ng generations
locally preserving the distances and angles between nearby with a population size of N .
points. Mathematically, this can be posed as a semidefinite 3 Compute a positive semi-definite matrix: R
programming (SDP) problem [28], the output of which is the (Equation 1) or K (Equation 2), for L-PCA and
kernel matrix (say, K) representing the kernel space to which NL-MVU-PCA, respectively.
PCA can be applied. 4 Compute the eigenvalues and eigenvectors of R and
K respectively (Section IV-B).
5 Perform the Eigenvalue Analysis (Section IV-C) to
identify the set of important objectives Fe ⊆ Ft .
6 Perform the Reduced Correlation Matrix Analysis
(Section IV-D) to identify the identically correlated
subsets (S) in Fe . If there is no such subset,
Fs = Fe .
7 Apply the selection scheme (Section IV-E) to identify
the most significant objective in each S, to arrive at
Fig. 2. Maximum Variance Unfolding (taken from [28] and edited) Fs , such that Fs ⊆ Fe ⊆ Ft .
8 Compute and store Et (Equation 6).
In the wake of the concepts above, and an MOEA’s search 9 if Fs = Ft then
characteristics highlighted in Section II, the objective reduc- 10 Stop and declare Ft as the essential objective set;
tion problem can be viewed as a machine learning problem. 11 Set T = t and compute the total error ET
In that: (Equation 7).
12 end
• The intrinsic structure of the POF refers to its intrinsic
13 else
dimensionality (m) and the essential components (FT ).
14 set t = t + 1, Ft = Fs , and go to Step 2.
• The garbled high-dimensional data refers to the non-
15 end
dominated solutions obtained from an MOEA, which
16 end
typically provide a poor POF-approximation, and hence,
are characterized by dominance relations that may be
different from those characterizing the POF. In that, the
objectives that are perfectly correlated on the POF, may
IV. P ROPOSED F RAMEWORK FOR L INEAR AND
show partial conflict in the given MOEA solutions.
N ONLINEAR O BJECTIVE R EDUCTION
• Unnoised signal refers to the exact-optimal solutions,
characterized by the same dominance relations as those This section proposes the framework for offline linear and
which define the true POF (for example, the fraction of nonlinear objective reduction algorithms, namely, L-PCA and
NN S in Figure 1a, which conformed with the true POF). NL-MVU-PCA, respectively. Given a set of non-dominated
• Noised signal refers to the non-optimal solutions, char- solutions for an M -objective problem, this framework (Frame-
acterized by dominance relations that may be different work 1) aims to find a smallest set of m (m ≤ M ) conflicting
from those which define the true POF (for example, the objectives which preserves the correlation structure of the
fraction of NN S in Figure 1a which did not conform given solution set. This is achieved by eliminating objectives
with the true POF). Noise refers to the departure in that are–non-conflicting along the significant eigenvectors of
SAXENA et al.: OBJECTIVE REDUCTION IN MANY-OBJECTIVE OPTIMIZATION: LINEAR AND NONLINEAR ALGORITHMS 5

the correlation matrix (or the kernel matrix) or are globally 2) Construction of kernel matrix for nonlinear objective
correlated. Hence, if the correlation structure of the given non- reduction (NL-MVU-PCA): It may be noted that NL-MVU-
dominated solutions conforms with the correlation structure PCA will be based on de-correlation of the kernel matrix K,
of the POF3 , then: (a) such solutions are considered as good i.e., removal of higher-order dependencies in X. This will
representative of the POF, and (b) the smallest set of conflict- allow for nonlinear objective reduction. K can be learnt from
ing objectives determined by the framework will constitute an the SDP formulation4 presented in Equation 2.
essential objective set (Section II) for the problem. The neighborhood relation ηij in Equation 2 is governed by
While the preliminary proof-of-concept of this approach can the parameter q. For each input feature (f˙i ∈ RN ), q represents
be found in [23], [24], this paper is novel in multiple ways, as the number of neighbors with respect to which the local
summarized in Section I. In that, besides the extensive range of isometry5 is to be retained during the unfolding, as in Figure 2.
results and their analysis, the proposed framework distinctively In other words, ηij ∈ {0, 1} indicates whether there is an edge
pursues the four goals of generality, de-noising of input data, between f˙i and f˙j in the graph formed by pairwise connecting
parameter reduction and proposition of an error measure. all q-nearest neighbours. While a high value of q ensures better
These goals are achieved through conceptual enhancements retention of isometry, it delays the unfolding process (say,
which are highlighted in the description of the framework’s incremental unfolding from (a) to (b), in Figure 2). In contrast,
steps below, and also summarized in Section IV-G. a low value of q offers fast unfolding (say, direct unfolding
As can be seen in Framework 1, it is designed to perform from (a) to (g), in Figure 2) but at the risk of distorting the
objective reduction iteratively, until the objective sets deduced isometry. Given this trade-off, proper selection of q is crucial.
as essential–in two successive iterations, remain the same. As While q = 4 is mostly used in literature [28], experiments were
the framework’s steps are identical for every iteration, they are √ in [24] to identify the q as a function of M and
conducted
described below with reference to the first iteration. q = ⌈ M ⌉ was found to offer accurate results.
This paper aims to propose a robust framework for objective
reduction which relies on as few parameters, as possible.
Towards it, the most constrained case of q = M − 1 is
A. Construction of a Positive Semi-definite Matrix
recommended and used. The rationale for this choice of q
This step describes the construction of the correlation (R) is the following: (i) the number of unknown elements in
and the kernel (K) matrix, to be used for linear and nonlin- K equals to M (M + 1)/2 (accounting for symmetry), (ii)
ear objective reduction, respectively. First, the non-dominated Equations 2(b) and (c) jointly represent two constraints, and
solutions are obtained by running an MOEA with the initial (iii) for q = M − 1, Equations 2(a) leads to M (M − 1)/2
objective set F0 = {f1 , . . . , fM }, corresponding to a popula- constraints. Hence, even with q = M − 1, the matrix K is
tion size of N . Let the objective vector in the non-dominated not fully specified by the constraints and sufficient degree
set, corresponding to the ith objective (fi ) be denoted by of freedom (M (M + 1)/2 − M (M − 1)/2 − 2 = M − 2) is
f˙i ∈ RN and its mean and standard deviation by µf˙i and available for the unfolding process, while ensuring that the
σf˙i , respectively. Furthermore, let f¨i = (f˙i − µf˙i )/σf˙i . Then, local isometry is retained.
the input data X, an M × N matrix, can be composed as It may be noted that: (i) this SDP is convex for which
X = [f¨1 f¨2 . . . f¨M ]T . polynomial time off-the-shelf solvers, such as SeDuMi [29]
1) Construction of correlation matrix for linear objective and CSDP [30] toolbox in MATLAB, are available in public
reduction (L-PCA): For a given X, the correlation matrix R domain, and (ii) computational complexity for solving sparse
is defined by Equation 1. It may be noted that L-PCA will SDPs [30] is O(M 3 + c3 ) where c = M q, represents the
be based on de-correlation of R, i.e., removal of second-order number of constraints. In this paper, SeDuMi has been used.
dependencies in X. Hence, the scope of L-PCA will largely
be limited to linear objective reduction. B. Computation of Eigenvalues and Eigenvectors
Once R or K are computed (each, an M × M matrix),
R= 1
M
XX T (1) their eigenvalues and eigenvectors are determined. Let these be
given by: λ1 ≥ λ2 . . . ≥ λM and V1 , V2 , . . . VM , respectively.
P (Kii −2Kij +Kjj )
To lay the background for discussions ahead, it is important
Maximize trace(K) = ij 2M to note the following:
subject to the following constraints :
(a) If ej = λj / M
P PM
j=1 λi , then j=1 ej = 1.
(a)Kii − 2Kij + Kjj = Rii − 2Rij + Rjj , ∀ ηij = 1 th th
P (b) The i component of j principal component, say fij ,
(b) ij Kij = 0
reflects on the contribution of fi towards Vj ∈ RM .
(c)K is positive-semidefinite,
(c) From orthonormality of eigenvectors, it follows that:
where : Rij is the (i, j)th element of the correlation matrix R PM
( |Vj | = i=1 fij2 = 1 ∀j = 1, . . . , M .
1, if f˙j is among the q-nearest neighbour of f˙i
ηij = 4 This
0, otherwise is the novel implementation of the original MVU, as proposed in [24]
(2) 5 Two data sets {xi }M M
i=1 and {yi }i=1 that are in one-to-one correspondence
are said to be q-locally isometric if: for every point xi , there exists a rotation
and translation that maps xi and its q-nearest neighbors {xi1 , . . . xiq }
3 This is discussed in Sections V-A and V-B with respect to each test precisely onto yi and {yi1 , . . . yiq }. In other words, the distances and angles
problem used in this paper, and in Sections VI to VIII on experimental results. between nearby inputs are preserved.
6 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION

(d) The contribution of fi for all Vj ’s can be given by for the fact that the non-dominated set based on which R is
PM PM M
cM
i =
2
j=1 ej fij , where i=1 ci = 1.
computed is garbled (Section III) and hence the correlations
evident from R may not represent the true correlations on the
C. Eigenvalue Analysis POF. In a sense, Equation 3(ii) serves9 to have a de-noising
effect, while interpreting a matrix based on gabled data, to
This step aims to identify the conflicting objectives (in F0 ) minimize the chances of inaccurate identification of S.
along the significant principal components (Vj s), as follows: The following description is on the determination of Tcor .
1) The number of significant Vj s are determined as the There exist a few guidelines on interpreting the strength of
PNv
smallest number (Nv ) such that j=1 ej ≥ θ, where the correlation, one of which is the Cohen scale [31]. It interprets
variance threshold θ ∈ [0, 1] is an algorithm parameter. a correlation strength of 0.1 to 0.3 as weak, 0.3 to 0.5 as
In this study, θ = 0.997 is used6 . moderate and 0.5 to 1.0 as strong. However, in the current
2) Each significant Vj is interpreted as follows7 : context, a rigid interpretation of the correlation strength will
(a) An objective fi with the highest contribution to Vj not be helpful because Tcor needs to adapt to conflicting
by magnitude, along with all the other objectives with goals depending on the problem being solved. In that, a
opposite–sign contributions to Vj , are picked. low and a high Tcor is desired while solving problems with
(b) If all the objectives have the same–sign contributions high and negligible redundancy, respectively. Towards it, this
(all positive or all negative), then the objectives with the paper proposes a dynamic computation of Tcor based on
top two contributions by magnitude, are picked. the spectrum of the eigenvalues for a specific problem. This
Let ME denote the number of objectives deduced as is based on the understanding that with an increase in the
important after the above eigenvalue analysis and let the redundancy of a problem: (i) the first eigenvalue e1 becomes
corresponding set be referred as Fe . Clearly, Fe ⊆ F0 . predominantly higher, and (ii) fewer principal components are
required to account for a certain variance threshold. For a given
D. Reduced Correlation Matrix (RCM) Analysis M -objective problem, let M2σ denote the number of principal
components required to account for 95.4% variance10, then the
This step aims to identify the subsets of identically corre- proposed Tcor is given by Equation 4.
lated objectives within Fe . If such subsets exist, then the most
significant objective in each subset can be retained while the Tcor = 1.0 − e1 (1.0 − M2σ /M ) (4)
rest could be discarded, allowing for further reduction of Fe . It can be realized that for a highly redundant problem where
Towards it, the following steps are proposed: e1 will be very high, and M2σ will be small compared to M ,
1) Construction of a Reduced Correlation Matrix (RCM): Tcor will have a small value. In contrast, for a problem with
RCM is the same as R except that the columns corre- low redundancy, e1 will be low, and M2σ will be comparable
sponding to the objectives in F0 \ Fe , are not included8. to M , Tcor will have a very high value.
2) Identification of identically correlated subset for each
objective in Fe : For each fi ∈ Fe , all fj ∈ Fe which E. Selection Scheme for Final Reduction based on RCM
satisfy Equation 3(i) constitute a potentially identically Analysis
correlated subset Ŝi , and those fj ∈ Fe which in addi-
tion satisfy Equation 3(ii) constitute an identically cor- Once the subsets of identically correlated objectives in Fe
related subset Si . Notably, the Si s for different fi ∈ Fe are identified by the RCM analysis, the goal is to identify
could have the same composition. and retain the most significant objective in each subset (S)
and eliminate the remaining ones. This is achieved by: (i)
(i) sign(Rik ) = sign(Rjk ), k = 1, 2, . . . , M attributing a selection score11 for each objective, as given by
(ii) Rij ≥ the correlation threshold (Tcor )
(3) Equation 5, and (ii) retaining the objective with highest sci in
each S and eliminating the rest.
Equation 3(i) ensures that correlation between objectives is PNv
interpreted as a set-based property. Equation 3(ii) accounts sci = j=1 ej |fij | (5)
6 This is an enhancement over [23], [24], to make the interpretation scheme Doing so, will reduce Fe to Fs which becomes an essential
more robust in terms of its ability to handle problems with moderate or objective set, after one iteration of the proposed framework.
negligible redundancy wherein more principal components may be required
to identify the conflicting objectives. This value is simply chosen in analogy
with Gaussian distributions where it accounts for ±3σ. It does not imply that F. Computation of Error
the scope of the framework is limited to Gaussian distributions. This section proposes a measure for the error incurred in
7 This is an enhancement over [23], [24] wherein a maximum of two
objectives per principal component were picked as conflicting, depending on one iteration of the proposed framework. The error measure
several parameters. The interpretation scheme proposed here is parameterless
9 This is an enhancement over [23], [24], where such a consideration was
and also allows for more objectives per principal component to be picked
as conflicting. These two factors contribute to the goal(i) and (iii) of the not made.
10 This is simply chosen in analogy with Gaussian distributions where it
framework.
8 This is an enhancement over [23], [24], where both the rows and columns accounts for ±2σ but here it is recommended for all purposes.
11 This is based on capturing an objective’s contribution to the significant
corresponding to the objectives F0 \ Fe were eliminated from R. It
principal components, given by sci = N 2
P v
acknowledges that correlation is a set based property, hence any inference j=1 ej fij (Section IV-B). However,
on whether or not two objectives in Fe are correlated, should be based on ej and fij being individually less than one, may lead to indiscriminatingly
the entire set of objectives for which R is computed, namely, F0 . small values for sci . A remedial form is proposed in Equation 5.
SAXENA et al.: OBJECTIVE REDUCTION IN MANY-OBJECTIVE OPTIMIZATION: LINEAR AND NONLINEAR ALGORITHMS 7

TABLE I
S UMMARY OF THE ENHANCEMENTS IN THE PROPOSED FRAMEWORK , OVER [23], [24].
Proposed Status Classification of proposed enhancements Conflicting settings for problems with:
Steps in [23], [24] Adaptation Correction Addition Parameter reduction m≪M m≈M
A). q A parameter ✓
Suitable for
C). θ ✓ Low High
m≪M
Parameter Few objectives per More objectives per
dependent, principal component principal component
Fe ✓ ✓
suitable for so that: so that:
m≪M |Fe | ≪ |F0 | |Fe | ≈ |F0 |
D). RCM ✓
Erroneousa
Eq. 3(i) ✓
Eq. 3(ii) Absent ✓ Low High
E). Eq. 5 Absentb ✓
F). Eq. 6 Absent ✓
a The manner in which the reduced correlation matrix (RCM) was constructed, did not allow for correlation to be treated as a set based property.
b It was pursued on an ad hoc basis.

proposed here aims to compute the variance that is left goals that the framework pursues. In that, the steps in [23],
unaccounted when objectives constituting Fredn = F0 \ Fs [24] have either been adapted to cater to the requirements of
are discarded, as given below: problems with disparate degree of redundancy; corrected for
Et
P
cM
 their anomalies; incorporated with additional features to de-
= max {δi j · Rij })
i (1.0 − j∈F
noise the input data or compute the error; or made more robust
s

i∈Fredn 


where: 

 through parameter reduction which helps in minimizing the
M 2
cM
P
i = e
k=1 k ikf (6) variability in performance.
1, if fi and fj are identically correlated 

Table I also details the facts that the requirements of the

δi j = 

0, otherwise 
proposed framework for handling problems with contrasting


Rij = strength of correlation betweenfi andfj
degree of redundancy are opposite in nature. It could be
The rationale for the proposed measure can be realized from inferred from this table that the algorithms (resulting from
the following: (i) when fi ∈ Fredn is not identically correlated the proposed framework) could be customized to offer higher
with any fj ∈ Fs : δi j = 0 and the variance left unaccounted efficiency for different problems. For example, for problems
by elimination of fi is cM i , and (ii) when fi ∈ Fredn is where m ≪ M (as in [23], [24]): (i) a low θ, (ii) composition
identically correlated with some fj ∈ Fs : the variance left of F by picking only a few objectives as conflicting per
unaccounted by elimination of fi reduces by a factor of Rij principal component, and (iii) a low Tcor may be used. In
as this is already accounted for, by fj . Equation 6 generalizes opposition, for problems where m ≈ M or m = M , the step
this argument. of eigenvalue analysis may be skipped and RCM analysis may
Suppose, for a given problem, the framework terminates directly be performed on F0 with a high value of Tcor . It
after T iterations. While the error Et (Equation 6) represents is important to recognize that while doing so will improve
the error incurred in each iteration, the total error incurred by the efficiency of the algorithms on the problems they are
the framework in T iterations can be given by Equation 7, the customized for, it will come at the loss of generality. In that,
rationale for which is the following. For the first iteration, the an algorithm customized for the case of m ≪ M will perform
error E0 is the variance that will be left unaccounted if the poorly for the case of m ≈ M or m = M , and vice-versa12.
objectives in Fredn were to be eliminated. Hence, the error
For the sake of generality, the proposed framework assumes
E1 in the second iteration will only be with respect to 1 − E0
variance of the original problem. This argument is generalized that no a priori information about the nature of the problem
is available. In that, it adopts a high value of θ and a uniform
in Equation 7.
PT
approach of composing Fe , but dynamically assigns Tcor
ET = E0 + t=1 Et (1 − Et−1 ) (7) based on the problem information revealed by the eigenvalues.
Figure 3 summarizes the above discussions.
G. On the generality and efficiency of the framework
This section summarizes the enhancements over [23], [24] 12 In relative terms, the efficiency of an algorithm customized for m ≪ M ,
and reflects on the broader issues of generality and efficiency when applied to m ≈ M or m = M will be lower than the reverse case.
of the proposed framework. So because, the former case may lead to Fe ⊂ FT , ceasing the scope for
Table I shows, that while the basic steps of the proposed accurate deductions. However, in the reverse case, |Fe | would just be over-
sized than desired for m ≪ M but it could still be reduced to FT based on
framework overlap with those in the previous versions, these RCM analysis. This explains why the right tail of the left curve in Figure 3
steps are implemented differently in the wake of the four is lower than the left tail of the right curve.
8 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION

Efficiency
Algorithms could be Proposed Algorithms could be • An essential objective set, is as given by Equation 9.
customized for framework customized for
highly redundant non-redundant FT = {fk , fM −I+2 , . . . , fM }, where
problems problems
k ∈ {1, . . . , M − I + 1} : in general
High k = M − I + 1 : for L-PCA and NL-MVU-PCA in particular
(9)
In the case of L-PCA and NL-MVU-PCA, k = M −I +1
because among the possible indices for k, the variance of
fk=M−I+1 is the highest and these algorithms are based
Low Scope of
[23, 24] on the premise that larger variance corresponds to signal
Highly Moderately Non-redundant
while smaller variance corresponds to noise.
redundant redundant
problems
problems Also considered in this paper is the WFG3(M ) [32] prob-
problems
Ex: DTLZ5(2,50) Ex: DTLZ5(3,5) Ex: DTLZ2(15) lem, where the POF degenerates into a linear hyperplane such
that ΣMm=1 fm = 1. In that, the first M − 1 objectives are
Scope of Application
perfectly correlated, while the last objective is in conflict with
Fig. 3. Highlighting the scope and efficiency of the proposed framework every other objective in the problem. Hence:
• The correlation structure, is as given by Equation 10.
TABLE II
+ i, j = 1 : M − 1

D ESCRIPTION OF DTLZ5(I, M ). M DENOTES THE NUMBER OF 

OBJECTIVES AND I DENOTES THE DIMENSIONALITY OF THE POF. A LL
)
sign(Rij ) = − j =1:M −1
OBJECTIVES ARE TO BE MINIMIZED . A LL xi ∈ [0, 1]. A S AN EXAMPLE , i=M
+ j=M


THE COMPOSITION OF FT IS DERIVED FOR DTLZ5(4, 5)a .
where: Rij = 1.0 ∀ i, j = 1 : M − 1
M −1
f1 = (1 + g) 0.5 Πi=1 cos(θi ) (10)
M −m
fm=2:M −1 = (1 + g) Πi=1 cos(θi ) sin(θM −m+1 ) • An essential objective set, is as given by Equation 11.
fM = (1 + g) sin(θ1 )
+k−1
g = ΣM i=M (xi − 0.5)2 FT = {fk , fM }, where
π
θi=1:I−1 = 2
x i k ∈ {1, . . . , M − 1} : in general (11)
π
θi=I:M −1 = (1 + 2gxi )
4(1+g) k = M − 1 : for L-PCA and NL-MVU-PCA in particular
subject to M − I + 1 constraints: ΣI−2 2
j=0 fM −j + 2pi fi2 > 1
where i = 1, . . . , M − I + 1, and: Here, k = M − 1 in the case of L-PCA and NL-MVU-
pi=1 = M −I
pi=2:M −I+1 = (M − I + 2) − i PCA, as among all the possible indices for k, the variance
of fk=M−I+1 is the highest.
a Substituting M = 5, I = 4 and g = 0, leads to following Equations:
√ √
i. cos(θ1 ) cos(θ2 ) cos(θ3 ) = 2f1 = 2f2 iii. cos(θ1 ) sin(θ2 ) = f4
ii. cos(θ1 ) cos(θ2 ) sin(θ3 ) = f3 iv. sin(θ1 ) = f5 B. Non-redundant Problems
Squaring and adding Equations i. to iv. helps define the POF as:
2f22 + f32 + f42 + f52 = 1 or 2f12 + f32 + f42 + f52 = 1. The non-redundant problems considered in this paper in-
Hence, FT = {fk , f3 , f4 , f5 }; k ∈ {1, 2}. clude the scalable DTLZ1 to DTLZ4 problems, as the degree
of convergence for these problems can be quantified by a
problem parameter g 13 . Notably, all the objectives in these
V. T EST S UITE AND E XPERIMENTAL S ETTINGS problems are equally conflicting, and hence:
• The correlation structure, is as given by Equation 12.
The test suite used in this paper consists of both redundant
and non-redundant problems, described below.
)
+ i=j
sign(Rij ) =
− i 6= j
i, j = 1 : M (12)

A. Redundant Problems • The essential objective set is as given by Equation 13.


Among the redundant problems, the scalable DTLZ5(I, M ) FT = {f1 , . . . , fM } (13)
problem [23] is used. While its formulation is presented in
Table II, it may be noted that:
C. Variable-settings and Properties for the Test Problems
• For a given M , the dimension of the POF is I ≤ M . In
that, the POF is characterized by parameter g = 0. The authors in [32] categorize the problem variables as
• The first M − I + 1 objectives are perfectly correlated, either the ‘distance’ variables (changing which results in
while the rest are in conflict with every other objective in comparable solutions) or the ‘position’ variables (changing
the problem. Hence, the correlation structure, is as given which results in incomparable solutions). Let κ and ρ denote
by Equation 8. the number of distance and position variables, respectively.
For an M -objective version of the problems considered in this
+ i, j = 1 : M − I + 1


 ) paper, while ρ = M − 1 is used, the values used for κ are
sign(Rij ) = + j=i reported in Table III.
i, j = M − I + 2 : M
− j 6= i


where: Rij = 1.0 ∀ i, j = 1 : M − I + 1 13 This is also true for DTLZ7 but it is precluded here due to the
(8) inconsistency between its formulation and its POF, as presented in [33].
SAXENA et al.: OBJECTIVE REDUCTION IN MANY-OBJECTIVE OPTIMIZATION: LINEAR AND NONLINEAR ALGORITHMS 9

TABLE III f3 Non−dominated set


1
T EST PROBLEMS : VARIABLE - SETTINGS AND Q UALITY I NDICATORS . from NSGA−II
e1 =0.961 e2 =0.038 e3 ≈ 0
V2 0
Quality Indicators for POFa V1 V2 V3 f1
Name Obj κ 0 0.7
g 2
DT 0.583 0.400 0.707 V
3 0.7
0.583 0.400 -0.707 0 f2
DTLZ1(M ) f1:M 5 0 0.25M -0.565 0.825 0.000 V1
DTLZ2(M ) f1:M 10 0 M (b) The non-dominated set and the
DTLZ3(M ) f1:M 10 0 M (a) The principal components
principal components
DTLZ4(M ) f1:M 10 0 M
DTLZ5(I, M ) f1:M 10 0 Noteb 1 1
WFG3(M) f1:M 20 NA NA

N
SG
A
−I
I
a 2

se −I
While g(X) values are picked from [33], the derivations for DT 0.5 0.5

I
A

se
SG

t
t
canbe found in the supplementary file provided with this paper.

N
1 M −I P −I+1 1 M −I+2−i
b + M 0

f3
f2
2 i=2 2
+I−1 0

−0.5 −0.5

V1
1
V
D. Experimental Settings −1
−1 −0.5 0 0.5 1
−1
−1 −0.5 0 0.5 1
f1 f1 or f2
To study the performance of the proposed algorithms, with (c) Positive correlation be- (d) Conflict between f3 and f1
a varying degree of noise, experiments are done with: tween f1 and f2 captured or f2 captured along V1
along V1
• Solutions sampled on the true POF (NP )–symbolizing
unnoised signal: Towards it, the sampling scheme in [8] Fig. 4. Use of PCA for feature selection: Demonstrated on DTLZ5(2, 3)
has been used. In that, the number of generated solutions
H+m−1
is given by N = Cm−1 , where H is a user-defined
parameter, and m represents the dimension of the POF. In stration, to highlight the absence of any bias in the proposed
this paper, for m ≤ 10 : H = 8 and for m > 10 : H = 4 algorithms for a particular class of problems (Section IV-G).
is used.
• Solutions generated by ǫ-MOEA [34] (Nǫ ) and NSGA- A. Feature selection by the proposed algorithms
II (NN S )–symbolizing a combination of the unnoised The proposed framework infers the objectives as conflicting
and noised signal: Towards it, a population size of 200 or non-conflicting, depending on whether their contributions
and 2000 generations are used. The used probability of along the significant principal components have the same or
crossover and mutation is 0.9 and 0.1, respectively, and the opposite sign. This is demonstrated on DTLZ5(2, 3) prob-
the distribution index for crossover and mutation is 5 and lem for which the principal components and the corresponding
20, respectively. ǫ = 0.3 is used for ǫ-MOEA. eigenvalues obtained from R are presented in Figure 4a, and
• Randomly generated solutions (NR )–symbolizing noised
also plotted in Figure 4b vis-à-vis the underlying NN S .
signal: Towards it, the random initial population (sized Notably, V1 accounts for 96.1% variance and while the
200) generated by NSGA-II is used. contribution of both f1 and f2 towards it is positive in
With the exception of NP (deterministic), experiments are sign (f11 = f21 = 0.583), the contribution of f3 is negative
performed with 20 different sets of Nǫ , NN S and NR , each. (f31 = −0.565). These facts are reflected in: (i) Figure 4c
The quality of POF-approximation provided by the different where both f1 and f2 can be seen to simultaneously increase or
solution sets, for the DTLZ1–DTLZ4 and DTLZ5(I, M ) prob- decrease, from any point on V1 to another, and (ii) Figure 4d
lems, will be assessed in terms of (i) convergence: measured where an increase/decrease in f 1 and f 2 corresponds to a
by the problem parameter g, and (ii) diversity: measured by decrease/increase in f3 . These observations imply that either
the normalized maximum spread indicator denoted by Is [35]. f1 or f2 is redundant and also demonstrate the basis for PCA
As the name suggests, Is normalizes the actual dispersal of based feature selection in a many-objective context.
solutions in the obtained non-dominated set (say, DA ) by In the above problem: (i) the NN S fully conformed with the
the dispersal of solutions on the true POF (say, DT ); i.e., true POF, and (ii) visualization of the 3-dimensional objective
Is = DA /DT . The g and DT for the test problems considered space was possible. Hence, the redundancy of either f1 or f2
(wherever applicable) are reported in Table III. In that, g = 0 could be inferred directly from Figure 4b, without requiring
and Is = 1 will imply a good POF-approximation for the any analysis of the principal components. However, this may
DTLZ1–DTLZ4 and DTLZ5(I, M ) problems. not be possible in the case of many-objective problems where
not only the visualization is difficult but the underlying data is
VI. W ORKING OF THE P ROPOSED A LGORITHMS predominantly characterized by noise (as in Figure 1a). This
justifies the need for the proposed algorithms.
This section first demonstrates the basis on which the
proposed algorithms pursue the feature selection approach for
objective reduction. Then, their performance is demonstrated B. DTLZ5(3, 5): L-PCA and NL-MVU-PCA based on NP
for the DTLZ5(3, 5) problem corresponding to NP , Nǫ and The correlation matrix R and the kernel matrix K based on
NN S , and, the salient issues are highlighted. The DTLZ5(3, 5) NP , and the corresponding eigenvalues and eigenvectors are
with moderate degree of redundancy is chosen for demon- presented in Table IV, leading to the following analysis:
10 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION

TABLE IV
DTLZ5(3, 5): T HE R AND THE K MATRIX WITH THEIR CORRESPONDING the signs of correlation between f1 and all the remaining ob-
EIGENVALUES AND EIGENVECTORS , FOR NP ( ONE RUN ). jectives are the same as those of f2 or f3 (Table Va), the RCM
analysis (Table VIb) shows three potentially identically corre-
lated sets, namely, Sˆ1 = Sˆ2 = Sˆ3 = {f1 , f2 , f3 }. Furthermore,
f1 f2 f3 f4 f5 the fact that R12 = R13 = R23 = 1.0 while Tcor = 0.5879,
f1 1.0000 1.0000 1.0000 -0.4749 -0.4340 implies that each of the potentially correlated set is indeed
f2 1.0000 1.0000 1.0000 -0.4749 -0.4340 correlated, i.e., S1 = S2 = S3 = {f1 , f2 , f3 }.
f3 1.0000 1.0000 1.0000 -0.4749 -0.4340
f4 -0.4749 -0.4749 -0.4749 1.0000 -0.4340 Finally, as the selection scores of f1 , f2 and f3 turn
f5 -0.4340 -0.4340 -0.4340 -0.4340 1.0000 out to be equal (Table VIc), the analysis concludes with
(a) Correlation matrix (R) Fs = {fk , f4 , f5 }, where fk could be any one of f1 , f2 or
f3 . Letting Fs = {f3 , f4 , f5 } leads to E0 = cM 1 (1.0 − R13 ) +
f1 f2 f3 f4 f5 cM
2 (1.0 − R23 ) = 0.0. Iteration2 of L-PCA fails to reduce any

f1 2.5862 2.5862 4.0156 -4.7231 -4.4650


objectives further, and hence, is not shown here.
f2 2.5862 2.5862 4.0156 -4.7231 -4.4650 2) NL-MVU-PCA on NP : Based on the eigenvalues and
f3 4.0156 4.0156 6.3176 -7.4018 -6.9470 eigenvectors of the kernel matrix K, this analysis is summa-
f4 -4.7231 -4.7231 -7.4018 23.0615 -6.2134
f5 -4.4650 -4.4650 -6.9470 -6.2134 22.0906
rized in Table VI. In that, the eigenvalue analysis (Table VIIa)
leads to Fe = {f1 , f2 , f3 , f4 , f5 }. As R12 = R13 = R23 = 1.0
(b) Kernel matrix (K) (Table Va) and Tcor = 0.7086, the RCM analysis leads to
S1 = S2 = S3 = {f1 , f2 , f3 } (Table VIIb). Finally, based on
R K
the selection scores of f1 , f2 and f3 (Table VIIc), f3 emerges
e1 e2 e3 e1 e2 e3 as the representative in each of S1 , S2 and S3 , leading to
0.6867 0.2866 0.0266 0.5138 0.4855 0.0006
Fs = {f3 , f4 , f5 } with E0 = cM M
1 (1.0 − R13 )+ c2 (1.0 − R23 ) = 0.0.
V1 V2 V3 V1 V2 V3 Iteration2 of NL-MVU-PCA fails to reduce any objectives
0.538 -0.009 0.209 0.134 0.273 -0.455 further, and hence, is not shown here.
0.538 -0.009 0.209 0.134 0.273 -0.455
0.538 -0.009 0.209 0.211 0.426 0.757 TABLE VI
-0.272 -0.691 0.668 -0.874 -0.173 0.077 DTLZ5(3, 5): I TERATION 1 OF NL-MVU-PCA WITH NP ( ONE RUN ).
-0.239 0.722 0.649 0.394 -0.799 0.075
(c) Significant eigenvalues/vectors of both R and K
PCA Variance Cumulative Objectives Selected
TABLE V (N v ) (%) (%) f1 f2 f3 f4 f5
DTLZ5(3, 5): I TERATION 1 OF L-PCA WITH NP ( ONE RUN ). 1 51.38 51.38 f1 f2 f3 f4 f5
2 48.55 99.93 f1 f2 f3 f5
(a) Eigenvalue Analysis
PCA Variance Cumulative Objectives Selected
(N v ) (%) (%) f1 f2 f3 f4 f5
1) Corr. over whole set Eq 3(i) Sˆ1 = Sˆ2 = Sˆ3 = {f1 , f2 , f3 }
1 68.67 68.67 f3 f4 f5 2) Corr. threshold: Tcor Eq 4 1.0-0.5138(1.0-2/5) = 0.7086
2 28.66 97.33 f1 f2 f3 f4 f5 3) Corr. meeting Tcor Eq 3(ii) S1 = S2 = S3 = {f1 , f2 , f3 }
3 02.66 99.99 f4 f5
(b) RCM Analysis
(a) Eigenvalue Analysis
e1 =0.5138 e2 =0.4855
sci
1) Corr. over whole set Eq 3(i) Sˆ1 = Sˆ2 = Sˆ3 = {f1 , f2 , f3 } V1 V2
2) Corr. threshold: Tcor Eq 4 1.0-0.6867(1.0-2/5) = 0.5879
3) Corr. meeting Tcor Eq 3(ii) S1 = S2 = S3 = {f1 , f2 , f3 } f1 0.134 0.273 0.2017
f2 0.134 0.273 0.2017
(b) RCM Analysis f3 0.211 0.426 0.3154
(c) Selection scheme
e1 =0.6867 e2 =0.2866 e3 =0.0266
sci
V1 V2 V3
f1 0.538 -0.009 0.209 0.3778
1 1.2
f1 0.538 -0.009 0.209 0.3778
f1 0.538 -0.009 0.209 0.3778 0.8 1
Objective Number

Objective Values

0.8
(c) Selection scheme 0.6
0.6
0.4
0.4
0.2 0.2

1) L-PCA on NP : Based on the eigenvalues and eigenvec- 0


1 2 3 4 5
0
1 2 3 4 5
tors of the correlation matrix R, this analysis is summarized Objective Values Objective Number

in Table V. In that, the eigenvalue analysis (Table VIa) shows (a) NP (b) Nǫ
that the top three principal components are needed to account Fig. 5. DTLZ5(3, 5): Parallel coordinate plots for NP and Nǫ
for θ = 0.997, leading to Fe = {f1 , f2 , f3 , f4 , f5 }. Given that
SAXENA et al.: OBJECTIVE REDUCTION IN MANY-OBJECTIVE OPTIMIZATION: LINEAR AND NONLINEAR ALGORITHMS 11

TABLE VII
DTLZ5(3, 5): ANALYSIS BASED ON Nǫ (ONE RUN; ITERATION 1)

(a) Correlation matrix (R)
      f1        f2        f3        f4        f5
f1    1.0000    0.9875    0.9736    −0.4310   −0.3384
f2    0.9875    1.0000    0.9644    −0.4496   −0.3040
f3    0.9736    0.9644    1.0000    −0.4689   −0.3229
f4    −0.4310   −0.4496   −0.4689   1.0000    −0.5136
f5    −0.3384   −0.3040   −0.3229   −0.5136   1.0000

(b) Eigenvalues/vectors of R, and selection scores of objectives
      e1 = 0.6567   e2 = 0.3007   e3 = 0.0333   e4 = 0.0069   e5 = 0.0021
      V1        V2        V3        V4        V5       sci
f1    0.546     −0.056    −0.219    0.225     −0.775   0.386
f2    0.544     −0.030    −0.274    0.505     0.610    0.380
f3    0.545     −0.030    −0.059    −0.820    0.165    0.375
f4    −0.285    −0.660    −0.686    −0.114    0.007    0.409
f5    −0.166    0.748     −0.635    −0.097    −0.020   0.356

(c) Eigenvalues/vectors of K, and selection scores of objectives
      e1 = 0.5322   e2 = 0.4608   e3 = 0.0057   e4 = 0.0016   e5 ≈ 0
      V1        V2        V3        V4        V5       sci
f1    0.059     0.269     −0.376    −0.764    −0.447   0.158
f2    0.076     0.282     −0.553    0.639     −0.447   0.174
f3    0.142     0.490     0.730     0.086     −0.447   0.305
f4    −0.821    −0.337    0.108     0.027     −0.447   0.593
f5    0.545     −0.704    0.091     0.011     −0.447   0.615

(d) Key highlights of L-PCA and NL-MVU-PCA
Features   L-PCA                         NL-MVU-PCA
Nv         4                             3
M2σ        2                             2
Fe         {f1, f2, f3, f4, f5}          {f1, f2, f3, f4, f5}
Ŝi         Ŝ1 = Ŝ2 = Ŝ3 = {f1, f2, f3}, where R12 = 0.9875, R13 = 0.9736 and R23 = 0.9644
Tcor       0.6059                        0.6806
Si         S1 = S2 = S3 = {f1, f2, f3}   S1 = S2 = S3 = {f1, f2, f3}
Fs         {f1, f4, f5}                  {f3, f4, f5}

TABLE VIII
DTLZ5(3, 5): ANALYSIS BASED ON NNS (ONE RUN; ITERATION 1)

(a) Correlation matrix (R)
      f1        f2        f3        f4        f5
f1    1.0000    0.5668    0.6880    −0.2852   −0.4257
f2    0.5668    1.0000    0.8512    −0.2440   −0.3908
f3    0.6880    0.8512    1.0000    −0.2653   −0.2860
f4    −0.2852   −0.2440   −0.2653   1.0000    −0.4751
f5    −0.4257   −0.3908   −0.2860   −0.4751   1.0000

(b) Eigenvalues/vectors of R, and selection scores of objectives
      e1 = 0.5392   e2 = 0.2938   e3 = 0.0983   e4 = 0.0516   e5 = 0.0168
      V1        V2        V3        V4        V5       sci
f1    0.515     −0.018    0.706     −0.335    0.352    0.376
f2    0.548     −0.025    −0.527    0.325     0.563    0.381
f3    0.561     −0.092    −0.332    −0.459    −0.596   0.396
f4    −0.168    0.745     −0.246    −0.531    0.274    0.366
f5    −0.303    −0.660    −0.231    −0.538    0.360    0.414

(c) Eigenvalues/vectors of K, and selection scores of objectives
      e1 = 0.5875   e2 = 0.3440   e3 = 0.0493   e4 = 0.0191   e5 ≈ 0
      V1        V2        V3        V4        V5       sci
f1    0.108     −0.042    −0.732    −0.500    0.447    0.124
f2    0.144     0.009     −0.278    0.838     0.447    0.118
f3    0.548     0.509     0.453     −0.188    0.447    0.523
f4    0.017     −0.798    0.391     −0.103    0.447    0.305
f5    −0.817    0.321     0.166     −0.046    0.447    0.599

(d) Key highlights of L-PCA and NL-MVU-PCA
Features   L-PCA                      NL-MVU-PCA
Nv         5                          4
M2σ        4                          3
Fe         {f1, f2, f3, f4, f5}       {f1, f2, f3, f4, f5}
Ŝi         Ŝ1 = Ŝ2 = Ŝ3 = {f1, f2, f3}, where R12 = 0.5668, R13 = 0.6880 and R23 = 0.8512
Tcor       0.8921                     0.7649
Si         S1 = S2 = S3 = ∅           S1 = ∅; S2 = S3 = {f2, f3}
Fs         {f1, f2, f3, f4, f5}       {f1, f3, f4, f5}
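The RCM entries (Ŝi and Si) in parts (d) of Tables VII and VIII can be approximated with the following sketch. Since Equation 3 is not reproduced in this section, the grouping rule below—positive mutual correlation with sign-consistent correlations against all other objectives, optionally thresholded by Tcor—is an assumed reading, chosen because it is consistent with the tabulated sets.

    import numpy as np

    def correlated_sets(R, Tcor=None):
        """Group objective i with j if R_ij > 0 and rows i, j agree in sign
        against every other objective (assumed Eq 3(i)); when Tcor is given,
        additionally require R_ij >= Tcor (assumed Eq 3(ii)).  A singleton
        set corresponds to the empty entries in Tables VII(d)/VIII(d)."""
        M = R.shape[0]
        sets = []
        for i in range(M):
            S = {i}
            for j in range(M):
                if j == i or R[i, j] <= 0:
                    continue
                if Tcor is not None and R[i, j] < Tcor:
                    continue
                others = [k for k in range(M) if k not in (i, j)]
                if all(np.sign(R[i, k]) == np.sign(R[j, k]) for k in others):
                    S.add(j)
            sets.append(sorted(f + 1 for f in S))   # 1-based objective indices
        return sets

For example, applied to the correlation matrix of Table VIIIa with Tcor = 0.7649, this rule keeps only the pair {f2, f3} (R23 = 0.8512), in line with the NL-MVU-PCA column of Table VIIId.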

3) Key inferences from the analysis based on NP: It can be seen in Figure 5a for NP that: (i) f1–f2–f3 are correlated, and (ii) the variance of f1 and f2 is equal, and less than the variance of f3. Given a set of correlated objectives, both L-PCA and NL-MVU-PCA pick a representative objective in the set. Towards it, these algorithms compute the variance contribution of each correlated objective along the significant principal components, and that is referred to as the selection score (sci, Equation 5). Hence, for NP, these algorithms should determine that sc3 > sc2 = sc1. However, unlike NL-MVU-PCA (Table VIc), L-PCA fails to identify this fact (Table Vc). This happens because L-PCA fails to accurately determine the principal components in the first place. This is evident from Table Vc, where the significant principal components are such that the variance contributions of f1, f2 and f3 are equal. From this example, it can be inferred that the accuracy of L-PCA may be marred by its inability to accurately determine the principal components for a given solution set.

C. DTLZ5(3, 5): L-PCA and NL-MVU-PCA based on Nǫ

The performance of the proposed algorithms based on Nǫ (Figure 5b) has been summarized in Table VII. In that:
• L-PCA based on Nǫ leads to Fs = {f1, f4, f5}, with Et = c^M_2 (1.0 − R21) + c^M_3 (1.0 − R31). Notably, c^M_2 = Σ_{k=1}^{M} e_k f_{2k}^2 = 0.2; R21 = 0.9875; c^M_3 = Σ_{k=1}^{M} e_k f_{3k}^2 = 0.2; and R31 = 0.9736. Hence, E0 = 0.007766.
• NL-MVU-PCA based on Nǫ leads to Fs = {f3, f4, f5}, with Et = c^M_1 (1.0 − R13) + c^M_2 (1.0 − R23). Notably, c^M_1 = Σ_{k=1}^{M} e_k f_{1k}^2 = 0.036581; R13 = 0.9736; c^M_2 = Σ_{k=1}^{M} e_k f_{2k}^2 = 0.041888; and R23 = 0.9644. Hence, E0 = 0.002455.
In each of the above cases, Iteration 2 fails to reduce any objectives further, and hence, is not shown here.
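The first of these error computations can be verified numerically. The sketch below is my own construction, not the authors' code; it recomputes c^M_i from the eigenvalues and eigenvectors quoted in Table VIIb and assembles E0 for the L-PCA case:

    import numpy as np

    e = np.array([0.6567, 0.3007, 0.0333, 0.0069, 0.0021])   # Table VIIb
    V = np.array([[0.546, -0.056, -0.219,  0.225, -0.775],   # f1 components
                  [0.544, -0.030, -0.274,  0.505,  0.610],   # f2 components
                  [0.545, -0.030, -0.059, -0.820,  0.165]])  # f3 components

    cM = (e * V**2).sum(axis=1)     # c^M_i = sum_k e_k f_ik^2  (~0.2 each)
    E0 = cM[1]*(1 - 0.9875) + cM[2]*(1 - 0.9736)   # f2, f3 dropped for f1
    print(cM.round(3), round(E0, 6))  # [0.2 0.2 0.2], ~0.00777 (cf. 0.007766)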
D. DTLZ5(3, 5): L-PCA and NL-MVU-PCA based on NNS

The performance of the proposed algorithms based on NNS (Figure 1b) has been summarized in Table VIII. In that:
• L-PCA based on NNS fails to reduce any objective in Iteration 1, implying that Fs = {f1, . . . , f5} with E0 = 0.0.
• NL-MVU-PCA based on NNS in Iteration 1 leads to Fs = {f1, f3, f4, f5} with E0 = c^M_2 (1.0 − R23) = 0.00439 (since c^M_2 = 0.029517 and R23 = 0.8512).
Given that L-PCA could not reduce any objective in Iteration 1, Iteration 2 is performed only in the case of NL-MVU-PCA, with {f1, f3, f4, f5} as the initial objective set–denoted as {F1, F2, F3, F4} in Table IX, for ease of referencing. It can be seen that Iteration 2 further eliminates F1, with E1 = c^M_1 (1.0 − R12) = 0.00845 (since c^M_1 = 0.033984 and R12 = 0.7513). Hence, finally Fs = {f3, f4, f5} and the total error amounts to ET = 0.01280.
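The two-iteration error quoted above can be checked with plain arithmetic, using only the values quoted in the text (a worked example, not the authors' code):

    E0 = 0.029517 * (1 - 0.8512)   # Iteration 1: f2 dropped      -> ~0.00439
    E1 = 0.033984 * (1 - 0.7513)   # Iteration 2: F1 (= f1) dropped -> ~0.00845
    print(round(E0, 5), round(E1, 5), round(E0 + E1, 4))  # total ET ~ 0.0128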
be inferred that NP provides a perfect POF-representation,
TABLE IX while Nǫ provides a better POF-representation than NN S .
DTLZ5(3, 5): I TERATION 2, NL-MVU-PCA BASED ON NN S ( ONE RUN ).
TABLE X
DTLZ5(3, 5): O N POF- REPRESENTATION BY NP , Nǫ AND NN S .
F 1 ≡ f1 F 2 ≡ f3 F 3 ≡ f4 F 4 ≡ f5
F1 1.0000 0.7513 -0.3201 -0.3855
F2 0.7513 1.0000 -0.2823 -0.2227 Feature: F1 Feature: F2 Feature: F3
F3 -0.3201 -0.2823 1.0000 -0.5626 f4 , f5 f1 –f2 –f3
Data Table
F4 -0.3855 -0.2227 -0.5626 1.0000 conflicting identically R12 R13 R23
with all? correlated?
(a) Correlation matrix (R) NP Va Yes Yes 1.00 1.00 1.00
Nǫ VIIIa Yes Yes 0.98 0.97 0.96
e1 e2 e3 e4 NN S IXa Yes Yes 0.73 0.82 0.87
0.6091 0.3582 0.0326 ≈0
sci (a) Assessment through POF features
V1 V2 V3 V4
F1 0.087 0.126 0.852 0.500 0.126 Algorithms NP Nǫ NN S
F2 0.155 0.742 -0.419 0.500 0.374
F3 0.564 -0.603 -0.262 0.500 0.568 L-PCA 0.5879 0.6059 0.8921
F4 -0.806 -0.265 -0.172 0.500 0.592 NL-MVU-PCA 0.7086 0.6806 0.7649

(b) Eigenvalues/vectors of K, and selection scores of objectives (b) Assessment through Tcor

Features NL-MVU-PCA It is important to note that, Tcor is determined by the


Nv , M2σ and Tcor 3, 2 and 0.6954, respectively.
eigenvalue spectrum of the correlation matrix R and kernel
matrix K (based on R), in the case of L-PCA and NL-
Fe {F1 , F2 , F3 , F4 }
MVU-PCA, respectively. In essence, Tcor is governed by
Sˆi Sˆ1 = Sˆ2 = {F1 , F2 }, where R12 = 0.7513 the correlation structure of the given solution set. Hence,
Si S1 = S2 = {F1 , F2 } Tcor could be used as an indicator of the quality of POF-
Fs {F2 , F3 , F4 } = {f3 , f4 , f5 } representation. In that, the POF-representation provided by
different solution sets could be adjudged as better or worse
(c) Key highlights of NL-MVU-PCA
than the other, depending on the proximity of their Tcor values
with those corresponding to NP . For example, Table XIb
shows that, compared to NN S , the Tcor values corresponding
E. Discussion of Results

The results presented above can be interpreted based on the interplay of two factors. One of these factors relates to the strengths and limitations of the L-PCA and NL-MVU-PCA. With regard to NP, Section VI-B has established that NL-MVU-PCA is more accurate than L-PCA. In that, L-PCA fails to accurately determine the principal components and hence is insensitive to differences in the variance of the objectives. Another factor relates to the degree of conformance between the correlation structure of the POF and that of NP, Nǫ and NNS. For DTLZ5(3, 5), the correlation structure (Equations 8) of the POF can be summarized in terms of three features:
F1  Objectives that are in conflict with each of the remaining objectives: f4 and f5.
F2  Objectives that are identically correlated with each other: f1, f2 and f3.
F3  The strength of correlation among the pairs of identically correlated objectives: R12 = R13 = R23 = 1.0.
Table Xa reveals that NP perfectly captures the correlation structure of the POF, while Nǫ does it better than NNS. This is because, while both Nǫ and NNS accurately capture the conflicting and the correlated objectives (features F1 and F2), the strength of correlations between f1–f2–f3 (feature F3) is more accurately captured by Nǫ. This is corroborated in Figures 1b and 5b, where, compared to Nǫ, a more significant proportion of solutions in NNS are wrongly in conflict with respect to the first three objectives. Hence, in this case, it can be inferred that NP provides a perfect POF-representation, while Nǫ provides a better POF-representation than NNS.

TABLE X
DTLZ5(3, 5): ON POF-REPRESENTATION BY NP, Nǫ AND NNS

(a) Assessment through POF features
                     Feature F1: f4, f5       Feature F2: f1–f2–f3        Feature F3
Data   Table         conflicting with all?    identically correlated?     R12    R13    R23
NP     Va            Yes                      Yes                         1.00   1.00   1.00
Nǫ     VIIa          Yes                      Yes                         0.98   0.97   0.96
NNS    VIIIa         Yes                      Yes                         0.73   0.82   0.87

(b) Assessment through Tcor
Algorithms      NP       Nǫ       NNS
L-PCA           0.5879   0.6059   0.8921
NL-MVU-PCA      0.7086   0.6806   0.7649

It is important to note that Tcor is determined by the eigenvalue spectrum of the correlation matrix R and the kernel matrix K (based on R), in the case of L-PCA and NL-MVU-PCA, respectively. In essence, Tcor is governed by the correlation structure of the given solution set. Hence, Tcor could be used as an indicator of the quality of POF-representation. In that, the POF-representation provided by different solution sets could be adjudged as better or worse than the other, depending on the proximity of their Tcor values to those corresponding to NP. For example, Table Xb shows that, compared to NNS, the Tcor values corresponding to Nǫ are closer to those corresponding to NP, for both the L-PCA and NL-MVU-PCA. Hence, even from the Tcor perspective, the POF-representation provided by Nǫ can be inferred as better than that provided by NNS.

The higher accuracy of NL-MVU-PCA over L-PCA, and a better POF-representation by Nǫ over NNS, explain the results presented in Sections VI-C and VI-D, in that:
• For both Nǫ and NNS, NL-MVU-PCA performs better than L-PCA.
• Both L-PCA and NL-MVU-PCA perform better with Nǫ than with NNS.
• The performance of NL-MVU-PCA corresponding to Nǫ is the best (finds FT in one Iteration).
• The performance of L-PCA corresponding to NNS is the worst (fails to reduce any objective).
• The reliability of the reported error (Table XI) is high for NL-MVU-PCA operating on Nǫ, while it is the lowest for L-PCA operating on NNS–in that, even when an FT is not found, the reported error is zero.

TABLE XI
ON RELIABILITY OF THE REPORTED ERRORS

                     NL-MVU-PCA                L-PCA
Data   Section       ET       Fs               ET       Fs
NP     VI-B          0.0000   {f3, f4, f5}     0.0000   {f3, f4, f5}
Nǫ     VI-C          0.0024   {f3, f4, f5}     0.0077   {f1, f4, f5}
NNS    VI-D          0.0128   {f3, f4, f5}     0.0000   {f1, . . . , f5}

This can be explained by the fact that the proposed error measure (Equation 6) depends on three factors: (i) the contribution of fi across all Vj (c^M_i), (ii) whether or not fi and fj are identically correlated (δij), and (iii) the strength of correlation between fi and fj (Rij). While the first factor depends on the ability of an algorithm to accurately determine the principal components (Vj), the latter two depend on the POF-representativeness of a solution set.
F. POF-representation versus POF-approximation

As discussed above, the success of the proposed algorithms depends on the quality of the POF-representation provided by the underlying solution set. Here, it is important to distinguish between POF-representation (correlation structure) and POF-approximation. For practical purposes, a solution set whose convergence and diversity measures are reasonably close to those of the POF can be considered as a good POF-approximation. It may be noted that for the NP, Nǫ and NNS (Sections VI-B, VI-C and VI-D, respectively), the g (convergence) and Is (diversity) measures are as follows:
• NP: g = 0.00 and Is = 1.00
• Nǫ: g = 0.07 and Is = 1.48
• NNS: g = 0.79 and Is = 3.53

Fig. 6. DTLZ5-(3, 5): POF-Representation versus POF-Approximation. (a) NP vis-à-vis Nǫ and NNS (b) NP vis-à-vis scaled NP (a poor POF-approximation, yet a good POF-representation).

Furthermore, for randomly generated solutions, it is found that: g = 0.82 and Is = 2.91. Given these facts and the discussions in the previous section, it can be inferred that Nǫ provides both a reasonably good POF-representation and POF-approximation, while NNS provides a reasonably good POF-representation but a poor POF-approximation (Figure 6a). Hence, it could be inferred that:
• A good POF-approximation is a sufficient condition for a good POF-representation. This is so because well converged solutions that are distributed across the whole POF will implicitly have a correlation structure that conforms with that of the POF.
• A good POF-approximation is not a necessary condition for a good POF-representation. This can be justified by the fact that the correlation structure is scale-invariant, and hence, it is possible to have a solution set which retains the correlation structure while rating poorly on convergence and diversity measures (Figure 6b).
The above discussions set the basis on which the experimental results presented in the following sections may be interpreted. In that, given two solution sets, the one providing a better POF-representation, i.e., Tcor values closer to those corresponding to NP, will report more accurate results, regardless of the corresponding g and Is values.
approximation. It may be noted that for the NP , Nǫ and
VII. E XPERIMENTAL R ESULTS FOR A WIDER RANGE OF
NN S (Sections VI-B, VI-C and VI-D, respectively), the g
R EDUNDANT P ROBLEMS
(convergence) and Is (diversity) measures are as follows:
• NP : g = 0.00 and Is = 1.00
This section presents the experimental results for a wider
• Nǫ : g = 0.07 and Is = 1.48 range of the DTLZ5(I, M ) and WFG3(M ) problems. In
• NN S : g = 0.79 and Is = 3.53
that: (i) the L-PCA and NL-MVU-PCA results are compared
with those from the Dominance Relation Preservation (DRP)
based greedy and exact algorithms [20], [36], from δ-MOSS
f5 NSGA−II
e−MOEA
f5 Poor POF−approximation, perspective with δ = 0 (0% Error), and (ii) three data sets,
3 3 yet a good
POF POF−representation namely NP , Nǫ and NN S are used.
2 2
POF It may be noted that, corresponding to NP (no noise), both
1 1

2 0
the DRP based greedy and exact algorithms, and both L-
0
0
1 1
f4
1 2
3
PCA and NL-MVU-PCA, could accurately identify an FT ,
2 f3 2 1 f4
f3 30 0 for all the considered problems. However, in most cases, L-
(a) NP vis-à-vis Nǫ and NN S (b) NP vis-à-vis scaled NP PCA failed to pick the particular FT whose objectives account
Fig. 6. DTLZ5-(3, 5): POF-Representation versus POF-Approximation for the largest variance (Equation 9 and Equation 11). This
could be attributed to the limitations of L-PCA in terms of
Furthermore, for randomly generated solutions, it is found accurately determining the principal components (as discussed
that: g = 0.82 and Is = 2.91. Given these facts and the in Section VI-B3).
14 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION

TABLE XII
REDUNDANT PROBLEMS: RESULTS BASED ON NNS AND Nǫ WITH θ = 0.997. THE ENTRIES BELOW I AND FT INDICATE THE NUMBER OF SUCCESSFUL CASES IN 20 RUNS. THE DASHES (-) IMPLY INCONSEQUENTIAL ENTRIES WHEREVER THE PREREQUISITE OF AN ACCURATE I IS NOT MET. THE CROSSES (×) DENOTE UNATTEMPTED CASES DUE TO HIGH COMPUTATIONAL TIME. THE FOOTNOTES REPORT CASES WHICH REQUIRE MULTIPLE ITERATIONS OF THE ALGORITHM FOR ACCURATE RESULTS, WHEREIN P—aR(bI) IMPLIES THAT FOR THE PROBLEM P, a RUNS OUT OF 20 REQUIRED b ITERATIONS EACH

                       Proposed approaches                     DRP [20], [36]; δ-MOSS, 0% Error (a)
Test problems     NL-MVU-PCA             L-PCA                Greedy Approach       Exact Approach
DTLZ5(I, M)       Nǫ         NNS         Nǫ         NNS       Nǫ        NNS         Nǫ        NNS
 I    M           I(b)  FT   I(c)  FT    I(d)  FT   I    FT   I    FT   I    FT     I    FT   I    FT
 2     5          20    20   20    20    20    14   20   1    0    -    0    -      0    -    0    -
 2    20          20    20    7     7    20     2    0   -    0    -    0    -      0    -    0    -
 2    50          20    20   14    14    20     0   10   -    0    -    0    -      ×    ×    ×    ×
 3     5          20    20   18    18    20     9    0   0    0    -    0    -      0    -    0    -
 3    20          20    20    0     -    20     1    0   -    0    -    0    -      0    -    0    -
 5    10          19    19    0     -    19     3    0   -    0    -    0    -      0    -    0    -
 5    20          19    19    0     -    18     3    0   -    0    -    0    -      0    -    0    -
 7    10          19    19    0     -    20     6    0   -    0    -    0    -      0    -    0    -
 7    20          16    16    0     -    13     3    0   -    0    -    0    -      0    -    0    -
WFG3   5          20    20   20    20    20    19   20   4    0    -    0    -      0    -    0    -
WFG3  15          15    15   20    20    10     1   19   0    0    -    0    -      0    -    0    -
WFG3  25           9     9   20    20     6     0   20   0    0    -    0    -      0    -    0    -

(a) The tabulated results are obtained using the source codes at: http://www.tik.ee.ethz.ch/sop/download/supplementary/objectiveReduction/. The same codes are used for the results presented later in Tables XV and XXI.
(b) DTLZ5: (5, 10)—3R(2I); (5, 20)—8R(2I); (7, 10)—7R(2I); (7, 20)—9R(2I) & 1R(4I); and WFG3: (15)—6R(2I) & 3R(3I); (25)—5R(2I), 1R(3I) & 1R(4I).
(c) DTLZ5: (2, 20)—7R(2I), 2R(4I), 2R(5I), 1R(6I) & 1R(7I); (2, 50)—8R(2I) & 2R(3I); (3, 5)—9R(2I).
(d) DTLZ5: (5, 20)—1R(2I); (7, 10)—3R(2I); (7, 20)—1R(2I); and WFG3: (5)—11R(2I); (15)—1R(2I).

TABLE XIII
REDUNDANT PROBLEMS: RESULTS BASED ON NNS AND Nǫ WITH θ = 0.997. THE ENTRIES ARE FORMATTED AS µ ± σ, WHERE µ AND σ REPRESENT THE MEAN AND STANDARD DEVIATION OF THE NUMBER OF ESSENTIAL OBJECTIVES IDENTIFIED, IN 20 RUNS. THE CROSSES (×) DENOTE UNATTEMPTED CASES DUE TO HIGH COMPUTATIONAL TIME.

Test Problems           Proposed                                            DRP [20], [36]; δ-MOSS, δ = 0
DTLZ5(I, M)       NL-MVU-PCA              L-PCA                   Greedy                  Exact
 I    M           Nǫ         NNS          Nǫ         NNS          Nǫ         NNS          Nǫ         NNS
 2     5          02.0±0.0   02.0±0.0     02.0±0.0   02.0±0.0     05.0±0.0   04.7±0.5     05.0±0.0   05.0±0.0
 2    20          02.0±0.0   10.9±7.8     02.0±0.0   18.3±2.2     16.0±1.4   11.8±1.8     15.5±1.4   11.3±1.5
 2    50          02.0±0.0   04.4±3.8     02.0±0.0   05.2±4.2     19.2±2.2   10.9±1.2     ×          ×
 3     5          03.0±0.0   03.2±0.6     03.0±0.0   05.0±0.0     05.0±0.0   04.9±0.3     05.0±0.0   04.9±0.2
 3    20          03.0±0.0   19.7±0.8     03.0±0.0   19.8±0.3     13.1±1.5   10.6±1.2     12.2±1.4   10.3±1.1
 5    10          05.0±0.2   10.0±0.0     05.0±0.2   10.0±0.0     09.1±0.6   09.4±0.5     08.7±0.6   09.3±0.4
 5    20          05.0±0.2   20.0±0.0     05.1±0.3   20.0±0.0     10.7±1.3   10.3±1.0     09.7±1.3   09.8±1.0
 7    10          07.0±0.2   10.0±0.0     07.0±0.0   10.0±0.0     08.8±0.8   09.4±0.5     08.4±0.9   09.3±0.4
 7    20          07.2±0.4   20.0±0.0     07.3±0.4   20.0±0.0     10.3±1.4   10.7±0.9     09.6±1.6   10.5±0.6
WFG3   5          02.0±0.0   02.0±0.0     02.0±0.0   02.0±0.0     05.0±0.0   04.7±0.4     04.5±0.5   04.1±0.4
WFG3  15          02.2±0.4   02.0±0.0     02.7±0.8   02.0±0.2     05.7±0.7   05.6±0.5     04.9±0.9   04.6±0.5
WFG3  25          02.7±0.7   02.0±0.0     03.1±0.9   02.0±0.0     05.4±0.5   05.4±0.6     04.4±0.6   04.6±0.6

TABLE XIV
REDUNDANT PROBLEMS: ON POF-APPROXIMATION AND POF-REPRESENTATION BY Nǫ AND NNS. FOR BREVITY, DTLZ5(I, M) AND WFG3(M) ARE ABBREVIATED AS D5(I, M) AND W3(M), RESPECTIVELY. THE g, Is AND Tcor VALUES ARE AVERAGED OVER 20 RUNS.

(a) g (convergence) and Is (diversity) measures
D5             Convergence (g)            Diversity (Is)
 I    M        Nǫ (µ±σ)     NNS (µ±σ)     Nǫ (µ±σ)     NNS (µ±σ)
 2   05        0.15±0.09    0.48±0.63     1.93±0.01    4.20±0.17
 2   20        0.20±0.07    2.10±0.58     1.91±0.02    8.06±0.17
 2   50        0.23±0.08    2.34±0.34     1.99±0.02    8.18±0.22
 3   05        0.08±0.04    0.70±0.61     1.48±0.00    3.54±0.02
 3   20        0.17±0.07    2.25±0.03     1.59±0.02    6.67±0.01
 5   10        0.14±0.07    2.06±0.34     1.38±0.02    4.29±0.02
 5   20        0.15±0.07    2.25±0.31     1.37±0.00    5.00±0.01
 7   10        0.16±0.07    1.99±0.38     1.27±0.01    3.30±0.01
 7   20        0.16±0.08    2.17±0.38     1.28±0.00    3.94±0.02

(b) Tcor: NL-MVU-PCA (c) Tcor: L-PCA — bar plots of the Tcor values for the POF, e-MOEA (Nǫ) and NSGA-II (NNS) solution sets across the instances D5(2,05) through W3(25).

Next, the experimental results corresponding to Nǫ and NNS are presented in Tables XII and XIII. It can be seen that the DRP based greedy and exact algorithms fail to identify an essential objective set (FT) for all the problems. This could be explained with respect to the DTLZ5(3, 5), considered earlier. In that, both Nǫ and NNS are a combination of exact-optimal (m = 3) and non-optimal (m > 3) solutions. This is evident in Figures 5b and 1b, where some solutions are conflicting in f1–f2–f3, even though these objectives are non-conflicting on the true POF (Figure 5a). Given such characteristics (noise) of Nǫ and NNS, the DRP based algorithms, which aim at preserving the dominance relations, fail to identify an FT.
In terms of the proposed algorithms, Tables XII and XIII show that, in the case of the DTLZ5(I, M) problems:
1) The performance of NL-MVU-PCA is better than that of L-PCA, corresponding to both Nǫ and NNS. This could be attributed to the nonlinear unfolding of the data by NL-MVU-PCA, which allows it to accurately determine the principal components (Section VI-B).
2) The performance of both NL-MVU-PCA and L-PCA based on Nǫ is better than their respective performances based on NNS. This could be explained by the fact that the POF-representation by Nǫ is better than that by NNS. Tables XIVb and XIVc confirm this, wherein, compared to NNS, the Tcor values based on Nǫ are in closer conformance with those based on NP. It may be noted from Table XIVa that Nǫ also provides a reasonably good POF-approximation–which is a sufficient condition for a good POF-representation. This further justifies the reported results.
Furthermore, in the case of the WFG3(M) problems:
1) The performance of NL-MVU-PCA is better than that of L-PCA, for both Nǫ and NNS.
2) Contrary to the case of the DTLZ5(I, M) problems, the performance of both NL-MVU-PCA and L-PCA is better corresponding to NNS than to Nǫ. Though this difference in performance may seem significant in terms of the frequency of success in 20 runs (Table XII), it can be seen from Table XIII that this difference is only marginal in terms of the average dimension of the POF in 20 runs. Nonetheless, these results imply that the POF-representation by NNS should be better than that by Nǫ. This is confirmed by Tables XIVb and XIVc, wherein, compared to Nǫ, the Tcor values of NNS are in closer conformance with those of NP. In the case of Nǫ, the performance of L-PCA deteriorates more severely than that of NL-MVU-PCA. This could be explained by the fact that the departure in the Tcor values based on Nǫ from those based on NP is larger in the case of L-PCA.
Besides the results above, the following issues are noteworthy:
• Given the goal of Tcor (Section IV-D), a low and a high value of Tcor is desired for problems with high and low redundancy, respectively. The fact that the proposed formulation (Equation 4) meets its goal is confirmed by Tables XIVb and XIVc, where, corresponding to NP: (i) for a fixed I in the DTLZ5(I, M) problems, as M increases (higher redundancy) the Tcor decreases, and (ii) for a fixed M, as I increases (lesser redundancy) the Tcor increases. The same trend holds for the WFG3(M) problems.
• The instances where more than one Iteration is required are those in which: (i) the underlying solution set does not reveal all the identically correlated objectives, or (ii) the strength of correlation between the identically correlated objectives happens to be weaker than the computed Tcor (Section VI-D). In such cases, only some of the redundant objectives are eliminated in the first Iteration, while the remaining ones are eliminated in subsequent Iterations.
• The errors corresponding to the L-PCA and NL-MVU-PCA results reported above, and also their performance corresponding to NR, are presented in the supplementary file provided with this paper. Both of these need to be interpreted in the wake of the accuracy of these algorithms and the quality of POF-representation provided by the underlying solution set–as has been done above.

VIII. EXPERIMENTAL RESULTS FOR NON-REDUNDANT PROBLEMS BASED ON NSGA-II AND ǫ-MOEA SOLUTIONS

This section presents the experimental results for the non-redundant problems, corresponding to NP, Nǫ and NNS. It may be noted that in the case of NP (no noise), both the NL-MVU-PCA and L-PCA, and also the DRP based exact and greedy algorithms, could accurately identify the absence of any redundancy, i.e., m = M and FT = {f1, . . . , fM}.
The results corresponding to Nǫ and NNS are presented in Table XV, where it can be seen that both the DRP based greedy and exact algorithms fail to identify the true dimension (m = M) of the POF, except for the five-objective instances of the considered problems.
The performance of the proposed algorithms needs to be interpreted in the wake of the fact that for the DTLZ1(M) to DTLZ4(M) problems, the POF is characterized by M conflicting objectives (Equation 12), i.e., no two objectives are correlated. For two sample problems, this could be realized through Figures 8a and 8b corresponding to NP. Hence, a good POF-representation by Nǫ and NNS calls for all the M objectives to be captured as conflicting. In this background, it may be noted in Table XV that:
1) Both NL-MVU-PCA and L-PCA corresponding to NNS accurately identify m = M for all the problems. This could be explained by Tables XVIb and XVIc, where the Tcor values based on NNS are in strong conformance with those based on NP, implying that NNS provides a very good POF-representation. For two sample problems, this is corroborated in Figures 8c and 8d, where NNS captures the conflict between all the objectives. Notably, NNS has a higher scale than NP, and hence, it represents a poor POF-approximation (Table XVIa). However, as discussed in Section VI-F, a good POF-approximation is not a necessary condition for a good POF-representation, and that explains the high accuracy of the results.
2) Both NL-MVU-PCA and L-PCA corresponding to Nǫ perform well for the DTLZ2(M) and DTLZ4(M) problems.

TABLE XV
NON-REDUNDANT PROBLEMS: RESULTS BASED ON NNS AND Nǫ WITH θ = 0.997. FOR BREVITY, K(M) REPRESENTS AN M-OBJECTIVE DTLZK PROBLEM. THE ENTRIES ARE FORMATTED AS n(µ ± σ), WHERE n INDICATES THE NUMBER OF TIMES THAT FT IS ACCURATELY IDENTIFIED, IN 20 RUNS, WHILE µ AND σ REPRESENT THE MEAN AND STANDARD DEVIATION OF THE NUMBER OF ESSENTIAL OBJECTIVES IDENTIFIED, IN 20 RUNS. THE DASHES (-) DENOTE UNATTEMPTED CASES DUE TO HIGH COMPUTATIONAL TIME

          Proposed Algorithms                                                   DRP [20], [36]; δ-MOSS, δ = 0
K(M)      NL-MVU-PCA                        L-PCA                               Greedy Approach                     Exact Approach
          Nǫ              NNS               Nǫ              NNS                 Nǫ              NNS                 Nǫ              NNS
1(05)     20 (05.0±0.0)   20 (05.0±0.0)     20 (05.0±0.0)   20 (05.0±0.0)       20 (05.0±0.0)   20 (05.0±0.0)       20 (05.0±0.0)   20 (05.0±0.0)
1(15)     06 (14.1±0.7)   20 (15.0±0.0)     14 (14.7±0.4)   20 (15.0±0.0)       08 (13.8±1.0)   00 (11.5±0.9)       07 (13.7±1.0)   00 (10.5±0.5)
1(25)     00 (20.5±2.2)   20 (25.0±0.0)     03 (22.2±2.1)   20 (25.0±0.0)       00 (16.3±1.6)   00 (11.9±1.0)       –               –
2(05)     20 (05.0±0.0)   20 (05.0±0.0)     20 (04.9±0.2)   20 (05.0±0.0)       20 (05.0±0.0)   20 (05.0±0.0)       20 (05.0±0.0)   20 (05.0±0.0)
2(15)     20 (15.0±0.0)   20 (15.0±0.0)     20 (15.0±0.0)   20 (15.0±0.0)       01 (11.4±0.9)   00 (11.0±0.6)       00 (10.8±0.7)   00 (10.7±0.7)
2(25)     16 (24.7±0.5)   20 (25.0±0.0)     17 (24.7±0.5)   20 (25.0±0.0)       00 (12.4±1.0)   00 (11.3±0.8)       –               –
3(05)     20 (05.0±0.0)   20 (05.0±0.0)     20 (05.0±0.0)   20 (05.0±0.0)       20 (05.0±0.0)   20 (05.0±0.0)       20 (05.0±0.0)   20 (05.0±0.0)
3(15)     11 (14.3±0.9)   20 (15.0±0.0)     20 (14.8±0.4)   20 (15.0±0.0)       08 (13.1±0.9)   00 (12.9±1.0)       02 (12.7±0.9)   00 (12.3±0.8)
3(25)     00 (20.9±2.2)   20 (25.0±0.0)     05 (22.0±2.6)   20 (25.0±0.0)       00 (15.5±1.5)   00 (13.8±1.9)       –               –
4(05)     20 (05.0±0.0)   20 (05.0±0.0)     20 (05.0±0.0)   20 (05.0±0.0)       14 (04.7±0.4)   20 (05.0±0.0)       20 (05.0±0.0)   20 (05.0±0.0)
4(15)     20 (15.0±0.0)   20 (15.0±0.0)     20 (15.0±0.0)   20 (15.0±0.0)       00 (11.6±0.8)   00 (11.2±0.7)       00 (11.0±0.6)   00 (11.5±1.0)
4(25)     20 (25.0±0.0)   20 (25.0±0.0)     20 (25.0±0.0)   20 (25.0±0.0)       00 (11.4±0.8)   00 (12.0±0.8)       –               –

TABLE XVI
NON-REDUNDANT PROBLEMS: ON POF-APPROXIMATION AND POF-REPRESENTATION BY Nǫ AND NNS. FOR BREVITY, K(M) REPRESENTS AN M-OBJECTIVE DTLZK PROBLEM. THE g, Is AND Tcor VALUES ARE AVERAGED OVER 20 RUNS.

(a) g (convergence) and Is (diversity) measures
         Convergence (g)               Diversity (Is)
K(M)     Nǫ (µ±σ)      NNS (µ±σ)       Nǫ (µ±σ)     NNS (µ±σ)
1(05)    170±129       410±141         480±0        539±6
1(15)    313±154       948±165         321±4        507±2
1(25)    346±152       944±207         269±1        331±6
2(05)    0.09±0.05     0.11±0.06       1.16±0.00    1.08±0.02
2(15)    0.23±0.11     2.08±0.45       0.99±0.01    2.41±0.00
2(25)    0.22±0.11     2.12±0.49       0.77±0.01    2.10±0.00
3(05)    709±223       0738±220        1168±5       1047±18
3(15)    903±225       1733±356        0880±7       1358±08
3(25)    908±220       1808±400        0669±4       1187±13
4(05)    0.12±0.06     0.13±0.06       1.13±0.05    1.14±0.00
4(15)    0.22±0.09     2.20±0.15       1.21±0.00    2.80±0.01
4(25)    0.29±0.13     2.18±0.15       1.24±0.01    2.63±0.00

(b) Tcor: NL-MVU-PCA (c) Tcor: L-PCA — bar plots of the Tcor values for the POF, e-MOEA (Nǫ) and NSGA-II (NNS) solution sets across the instances D1(05) through D4(25).

In that, all these problems except DTLZ2(25) are accurately solved. This could be explained by Tables XVIb and XVIc, where the Tcor values based on Nǫ are reasonably close to those based on NP, implying that Nǫ provides a reasonably good POF-representation. For a sample case, this is corroborated in Figure 8f, where Nǫ captures the conflict between all the objectives, and hence the same correlation structure as the true POF. Interestingly, the POF-approximation by Nǫ for these problems is reasonably good (Table XVIa), and that being a sufficient condition for a reasonably good POF-representation justifies the results. The only exception, DTLZ2(25), can be explained through its corresponding Is ≪ 1, which suggests approximation of only a part of the POF, and a possibly distorted correlation structure.
3) The performance of NL-MVU-PCA and L-PCA corresponding to Nǫ, for the DTLZ1(M) and DTLZ3(M) problems, is relatively poor. This could be explained by Tables XVIb and XVIc, where the conformance between the Tcor values corresponding to Nǫ and NP is weak, implying a poor POF-representation by Nǫ. Two points are noteworthy here:
(a) The performance in terms of the average dimension (µ) of the POF determined in 20 runs is not as poor as indicated by the frequency (n) of success in 20 runs. For example, in the case of DTLZ1(15), for 20 runs: (i) L-PCA leads to n = 14, but µ = 14.7, and (ii) NL-MVU-PCA leads to n = 6, but µ = 14.1. Notably, these results are still better than those obtained from the DRP based greedy (µ = 13.7) and exact (µ = 10.5) algorithms.
(b) The performance deterioration is proportional to the departure in the Tcor values based on Nǫ from those based on NP. For example, for M = 25, the departure in Tcor values grows in the following sequence (for both NL-MVU-PCA and L-PCA): DTLZ4, DTLZ2, DTLZ3 and DTLZ1. Notably, the corresponding departure of µ (from 25) also grows in the same sequence.
The above discussion has explained the performance of the proposed algorithms for Nǫ vis-à-vis NNS. The following discussion explains why the performance of L-PCA is superior

TABLE XVII
DTLZ1(15): CORRELATION STRUCTURE AND EIGENVALUE SPECTRUM OF L-PCA AND NL-MVU-PCA CORRESPONDING TO NP AND Nǫ

(a) NP: Correlation matrix (R), and eigenvalues of L-PCA and NL-MVU-PCA
R matrix: satisfies Equation 12, in that only the diagonal elements are positive, implying no two columns are identically correlated.
              e1    e2    e3    e4    e5    e6    e7    e8    e9    e10   e11   e12   e13   e14   e15
L-PCA         0.08  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.00
NL-MVU-PCA    0.08  0.08  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.06  0.00

(b) Nǫ: Correlation matrix (R), and eigenvalues of L-PCA and NL-MVU-PCA
       f1     f2     f3     f4     f5     f6     f7     f8     f9     f10    f11    f12    f13    f14    f15
f1     1.00   0.35   0.51   0.18   0.34   0.29   0.32   0.09   0.07   0.20   0.08   -0.01  -0.13  -0.08  -0.01
f2     0.35   1.00   0.70   0.14   0.48   0.32   0.44   0.11   0.05   0.11   0.13   -0.01  -0.07  -0.08  -0.03
f3     0.51   0.70   1.00   0.36   0.58   0.42   0.51   0.24   0.22   0.32   0.13   -0.09  -0.13  -0.10  -0.06
f4     0.18   0.14   0.36   1.00   0.84   0.15   0.28   0.16   0.04   0.22   0.11   -0.06  -0.09  -0.04  -0.01
f5     0.34   0.48   0.58   0.84   1.00   0.24   0.43   0.24   0.06   0.32   0.08   -0.07  -0.13  -0.08  -0.02
f6     0.29   0.32   0.42   0.15   0.24   1.00   0.64   0.22   0.18   0.32   0.13   -0.04  -0.13  0.13   -0.02
f7     0.32   0.44   0.51   0.28   0.43   0.64   1.00   0.34   0.24   0.33   0.09   -0.03  -0.13  0.04   -0.05
f8     0.09   0.11   0.24   0.16   0.24   0.22   0.34   1.00   0.33   0.27   0.00   -0.14  -0.09  -0.06  -0.07
f9     0.07   0.05   0.22   0.04   0.06   0.18   0.24   0.33   1.00   0.20   -0.06  -0.15  -0.06  -0.04  -0.12
f10    0.20   0.11   0.32   0.22   0.32   0.32   0.33   0.27   0.20   1.00   0.08   -0.17  -0.17  -0.02  0.00
f11    0.08   0.13   0.13   0.11   0.08   0.13   0.09   0.00   -0.06  0.08   1.00   -0.12  -0.09  0.01   -0.04
f12    -0.01  -0.01  -0.09  -0.06  -0.07  -0.04  -0.03  -0.14  -0.15  -0.17  -0.12  1.00   -0.26  -0.04  -0.03
f13    -0.13  -0.07  -0.13  -0.09  -0.13  -0.13  -0.13  -0.09  -0.06  -0.17  -0.09  -0.26  1.00   -0.06  -0.11
f14    -0.08  -0.08  -0.10  -0.04  -0.08  0.13   0.04   -0.06  -0.04  -0.02  0.01   -0.04  -0.06  1.00   -0.02
f15    -0.01  -0.03  -0.06  -0.01  -0.02  -0.02  -0.05  -0.07  -0.12  0.00   -0.04  -0.03  -0.11  -0.02  1.00

              e1    e2    e3    e4    e5    e6    e7    e8    e9    e10   e11   e12   e13   e14   e15
L-PCA         0.26  0.10  0.09  0.08  0.08  0.07  0.06  0.05  0.05  0.04  0.04  0.03  0.02  0.02  0.01
NL-MVU-PCA    0.44  0.25  0.13  0.07  0.05  0.03  0.02  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00

to that of NL-MVU-PCA, for the DTLZ1(M) and DTLZ3(M) problems, corresponding to Nǫ. Towards it, DTLZ1(15) has been chosen, and the following points may be noted:

Fig. 8. DTLZ1(15) and DTLZ2(15): Parallel coordinate plots. (a) NP: DTLZ1(15) (b) NP: DTLZ2(15) (c) NNS: DTLZ1(15) (d) NNS: DTLZ2(15) (e) Nǫ: DTLZ1(15) (f) Nǫ: DTLZ2(15).

• The predominance of the first eigenvalue, prominent gaps in the eigenvalues, and fewer significant eigenvalues indicate that the data is mainly confined to a lower dimensional space. Hence, for a problem like DTLZ1 with no redundancy, all the eigenvalues should be equable. It can be seen in Table XVIIa that both L-PCA and NL-MVU-PCA based on NP accurately reveal this fact.
• Unlike NP, the obtained Nǫ poses DTLZ1(15) as a redundant problem (Figure 8e, Tables XVIIb and XVIII), wherein three sets of correlated objectives exist, namely, {f1, . . . , f5}, {f6, f7} and {f8, f9}. It can be seen that, compared to L-PCA, NL-MVU-PCA is more efficient in capturing the redundant characteristics revealed by Nǫ, as (i) its first eigenvalue is more dominant: e1 = 0.44 (e1 = 0.26 for L-PCA), (ii) the gaps in the eigenvalues are more prominent, and (iii) the number of significant eigenvalues is fewer: Nv = 8 (Nv = 15 for L-PCA). Given the above, NL-MVU-PCA computes a lower Tcor than L-PCA (Table XVIII). Eventually, L-PCA identifies m = M = 15, while NL-MVU-PCA finds m = 14.
It is critical to note that, owing to the noise, some of the non-redundant problems (m = M) show the characteristics of the redundant problems (m < M). Hence, the algorithms which are less efficient in identifying the smallest set of conflicting objectives (the DRP based greedy algorithm and L-PCA, in their respective categories) appear to perform better than the

otherwise more efficient algorithms (the DRP based exact algorithm and NL-MVU-PCA, in their respective categories).

TABLE XVIII
DTLZ1(15): HIGHLIGHTS OF L-PCA AND NL-MVU-PCA BASED ON Nǫ

Features   L-PCA                NL-MVU-PCA
Nv         15                   8
M2σ        12                   6
Fe         {f1, . . . , f15}    {f1, . . . , f15}
Ŝi         Ŝ1 = Ŝ2 = Ŝ3 = Ŝ4 = Ŝ5 = {f1, f2, f3, f4, f5};
           Ŝ6 = Ŝ7 = {f6, f7}; Ŝ8 = Ŝ9 = {f8, f9}
Tcor       0.9472               0.7363
Si         ∅                    S4 = S5 = {f4, f5}
Fs         {f1, . . . , f15}    {f1, . . . , f4, f6, . . . , f15}

Finally, the errors reported by L-PCA and NL-MVU-PCA corresponding to the non-redundant problems, and the results corresponding to NR, are presented in the supplementary file provided with this paper.

IX. ON CUSTOMIZATION AND PARAMETER SENSITIVITY OF THE PROPOSED FRAMEWORK

The results reported above correspond to the proposed framework (Framework 1) which, for generality, assumes that no a priori information about the nature of a given problem is available. However, as highlighted in Section IV-G, the framework could be customized (at the loss of generality, though) for higher efficiency, for the highly redundant problems (m ≪ M) on the one hand, and the non-redundant problems (m ≈ M) on the other hand. While this could be done in multiple ways, as suggested in Section IV-G, one specific form for each category of problems is recommended in this paper.
1) For highly redundant problems: With all the other steps in Sections IV-A to IV-F remaining the same, a change in Section IV-C is recommended. In that, along each significant principal component (Vj):
(a) The objectives with the most positive and the most negative contributions to Vj be picked (see the sketch after these recommendations).
(b) If all the objectives have same-sign contributions, then the objectives with the highest and the second highest magnitudes be picked.
2) For non-redundant problems: It is recommended that the step in Section IV-C (eigenvalue analysis) be skipped, while all the remaining steps in Sections IV-A to IV-F be retained as such. It implies that the reduced correlation matrix analysis be directly performed on the initial objective set, namely, F0.
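A minimal sketch (my own rendering, not the authors' implementation) of the customized selection in recommendation 1 is given below. Ties between equal contributions are broken by index here, though they may equally be broken by the variance contribution of the tied objectives:

    import numpy as np

    def customized_selection(vecs, Nv):
        """vecs: M x M eigenvector matrix (columns = principal components).
        Along each of the Nv significant components, keep the most positive
        and most negative contributors; if all contributions share one sign,
        keep the two largest magnitudes instead."""
        essential = set()
        for j in range(Nv):
            v = vecs[:, j]
            if v.min() < 0 < v.max():
                essential.update({int(np.argmax(v)), int(np.argmin(v))})
            else:                                   # same-sign contributions
                essential.update(int(i) for i in np.argsort(np.abs(v))[-2:])
        return sorted(essential)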
As the variants of the proposed framework are not the focus of this paper, only the analysis for DTLZ5(3, 5) based on NP is revisited to highlight the key connotations of the recommendations above. In that:
1) If the problem is handled as a highly redundant problem, then the customized framework will respond as summarized in Table XIX. It can be seen that, unlike in Tables V and VI where the eigenvalue analysis could not reduce any objective, both the L-PCA and NL-MVU-PCA achieve an FT by the eigenvalue analysis alone, making the RCM analysis look redundant.
2) If the problem is handled as a highly non-redundant problem, then again, reference to Tables V and VI reveals that the RCM analysis alone is capable of reducing F0 = {f1, . . . , fM} to FT, making the eigenvalue analysis look redundant.
The above illustrates the potential of each of the eigenvalue analysis and the reduced correlation matrix analysis towards reducing the number of objectives and contributing to the determination of an FT. It is important to note that while each of the eigenvalue analysis and the reduced correlation matrix analysis may independently be capable of solving a specific class of problems, the simultaneous inclusion of both will only extend the applicability of the framework to a wider range of problems–a goal that Framework 1 pursues.

TABLE XIX
CUSTOMIZED FRAMEWORK FOR REDUNDANT PROBLEMS: RE-VISITING DTLZ5-(3, 5) CORRESPONDING TO NP

L-PCA (Refer Table Va)               NL-MVU-PCA (Refer Table VIa)
Eigenvalue Analysis                  Eigenvalue Analysis
e1 = 68.67   V1: {f3, f4}            e1 = 51.38   V1: {f4, f5}
e2 = 28.66   V2: {f4, f5}            e2 = 48.55   V2: {f3, f5}
e3 = 02.66   V3: {f4, f5}
Fe = {f3, f4, f5} ≡ FT               Fe = {f3, f4, f5} ≡ FT

Besides the issue of customization of the proposed framework, it is also important to investigate its robustness in the wake of a variation in the critical parameter θ (variance threshold). In that, a minor variation in θ should not drastically influence the Framework's performance. In this context, experiments are performed with θ = 0.682 and θ = 0.954 (footnote 14). Based on the discussions in Section IV-G, a reduction in θ (smaller Nv) could have the following effects:
(a) For highly redundant problems, only a few of the top principal components may be sufficient to capture the essential objectives. Hence, the variation (possible deterioration) in performance accompanying a reduction in θ should not be too drastic for such problems–more so, corresponding to a solution set which provides a good POF-representation (as Nǫ) and a more effective algorithm (as NL-MVU-PCA).
(b) For problems with low redundancy, where most of the objectives are essential, a reduction in θ implies a higher risk of Fe ⊂ FT, and hence, the scope of poorer performance.
It can be seen that the results presented in Table XX for θ = 0.682 comply (footnote 15) with the expected trend discussed above. The fact that even a significant reduction in θ from 0.997 to 0.682 does not lead to a drastic deterioration in performance highlights the robustness of the proposed framework.

14. This is in analogy with Gaussian distributions, where ±σ and ±2σ account for 68.2% and 95.4% variance, respectively. While the results for θ = 0.682 are presented in Table XX, those with θ = 0.954 are presented in the supplementary file provided with this paper.
15. The same holds for the results corresponding to θ = 0.954, presented in the supplementary file provided with this paper.

TABLE XX
FRAMEWORK'S PERFORMANCE WITH θ = 0.682: THE NUMBERS IN THE TABLE INDICATE THE FREQUENCY OF SUCCESS IN IDENTIFYING THE TRUE I AND/OR FT, OUT OF 20 RUNS. THE FOOTNOTES REPORT THE PROBLEMS THAT REQUIRE MULTIPLE ITERATIONS OF THE ALGORITHM TO OBTAIN ACCURATE RESULTS, AS: P—aR(bI), IMPLYING THAT FOR THE PROBLEM P, a RUNS OUT OF 20 REQUIRED b ITERATIONS EACH

Test problems     NL-MVU-PCA             L-PCA
DTLZ5(I, M)       Nǫ         NNS         Nǫ         NNS
 I    M           I(a)  FT   I(b)  FT    I(c)  FT   I(d)  FT
 2     5          20    20   20    20    20     1   20    18
 2    20          20    20    8     8    20     0    2     1
 2    50          20    20   19    19    20     0   20     2
 3     5          20    20   18    18    20     5    1     1
 3    20          20    20    0     0    20     0    0     0
 5    10          17    17    0     0    18     3    0     0
 5    20          19    19    0     0    10     0    0     0
 7    10          16    15    0     0    18     7    0     0
 7    20          13    13    0     0     9     1    0     0
WFG3   5          20    20   20    20    20    14   20     1
WFG3  15          15    15   20    20     3     1   20     0
WFG3  25           9     9   20    20     0     0   20     0

Test Problems     NL-MVU-PCA        L-PCA
DTLZ(M)
Name    M         Nǫ     NNS        Nǫ     NNS
DTLZ1   5         17     18         15     19
       15          6     18         10     20
       25          0     20          3     20
DTLZ2   5         19     18         17     17
       15         19     19         20     17
       25         14     20         15     20
DTLZ3   5         16     17         19     18
       15          6     20         12     19
       25          0     20          2     20
DTLZ4   5         15     20         19     18
       15         19     20         20     20
       25         20     20         20     20

(a) DTLZ5: (5, 10)—2R(2I); (5, 20)—6R(2I); (7, 10)—4R(2I); (7, 20)—5R(2I) & 1R(4I); and WFG3: (15)—6R(2I) & 3R(3I); (25)—5R(2I), 1R(3I) & 1R(4I).
(b) DTLZ5: (2, 20)—1R(2I), 4R(3I), 1R(4I), 1R(5I) & 1R(7I); (2, 50)—11R(2I), 2R(3I) & 2R(4I); (3, 5)—10R(2I).
(c) DTLZ5: (7, 10)—3R(2I); and WFG3: (5)—9R(2I); (15)—1R(2I).
(d) DTLZ5: (2, 20)—1R(5I); (2, 50)—2R(2I).

TABLE XXI
PERFORMANCE OF OBJECTIVE REDUCTION ALGORITHMS ON TWO REAL-WORLD APPLICATION PROBLEMS. THESE RESULTS CORRESPOND TO 20 NSGA-II RUNS WITH UNIFORMLY DISTRIBUTED SEEDS; EACH RUN CORRESPONDING TO A 200 POPULATION SIZE AND 2000 GENERATIONS

                              Proposed approaches                        DRP [20], [36]: δ-MOSS, 0% Error
Real-world Problems           NL-MVU-PCA         L-PCA                   Greedy Approach     Exact Approach
(a) Multi-speed gearbox (a)   {f1, f2}           {f1, f2}                {f1, f2, f3}        {f1, f2, f3}
(b) Storm drainage system (b) {f2, f3, f4, f5}   {f1, f2, f4, f5} (c)    {f2, f3, f4, f5}    {f2, f3, f4, f5}

(a) The error associated with NL-MVU-PCA and L-PCA is 0.00262 ± 0.00035 and 0.00559 ± 0.00074, respectively.
(b) The error associated with NL-MVU-PCA and L-PCA is 0.00002 ± 0.00001 and 0.00021 ± 0.00012, respectively.
(c) In 18 out of the 20 runs, L-PCA finds Fs = {f1, f2, f4, f5}, while twice it finds Fs = {f2, f3, f4, f5}.

X. REAL-WORLD PROBLEMS

This section considers two real-world problems.

A. Multi-Speed Gearbox Design Problem

This is a multi-speed gearbox design problem, comprising three objectives. It does not belong to the many-objective domain; yet, it is included here for a comparative analysis of the proposed L-PCA and NL-MVU-PCA vis-à-vis the DRP based algorithms. While the problem formulation (footnote 16) can be found in [37], it may be noted that the three objectives relate to (i) minimization of (f1) the overall volume of gear material used (which is directly related to the weight and cost of the gearbox), (ii) maximization of (f2) the power delivered by the gearbox, and (iii) minimization of (f3) the center distance between the input and output shafts.
The NNS corresponding to a population size of 200 and 2000 generations is as shown in Figure 9a. Here, it can be seen that f1 and f3 are non-conflicting among themselves, while each is in conflict with f2. This is also affirmed by Figure 9b. These observations are physically justifiable because, for a fixed number of gears, the lower the center distance between the input and output shafts (f3), the smaller the size of each gear will be, resulting in a lower overall volume of gear material (f1). It can be seen from Table XXI that while L-PCA and NL-MVU-PCA identify the redundancy in the problem, the DRP based exact and greedy algorithms fail to capture it. The failure of the latter can be explained by the fact that the obtained NNS is noisy, in that, a very small fraction of solutions (3 out of the 200 solutions in Figure 9a) show conflict between f1 and f3.

Fig. 9. Multi-speed gearbox design problem. The plots correspond to one run of NSGA-II. (a) Non-dominated front (b) Parallel-coordinate plot (order of f2 and f3 is swapped).

16. The version where the gear thickness, number of teeth, power and module are all kept as variables.

B. Storm Drainage System Problem

Originally described in [38], this is a five-objective, seven-constraint problem which relates to optimal planning for a storm drainage system in an urban area. The results presented in Table XXI show that all the four algorithms identify

either f1 or f3 as redundant. It may be noted that both these sets of results are correct because the NNS obtained with either {f2, f3, f4, f5} (Figure 10) or {f1, f2, f4, f5} (not plotted for brevity) conforms with that obtained for the original five-objective problem.

Fig. 10. Storm drainage problem: Parallel coordinate plots (normalized), corresponding to one run of NSGA-II. (a) NNS with all five objectives (b) f1 reconstructed from NNS obtained with {f2, f3, f4, f5}.

XI. COMPARATIVE ANALYSIS OF THE PROPOSED AND THE DRP BASED ALGORITHMS

The results presented above highlight that both the L-PCA and NL-MVU-PCA, and the DRP based greedy and exact algorithms, could accurately identify an essential objective set (FT) for a range of both redundant and non-redundant problems for solution sets characterized by only exact-optimal solutions, i.e., no noise (NP).
It is known that for many-objective problems, most existing MOEAs obtain solution sets which are a combination of exact-optimal (representing the intrinsic dimensionality, m = |FT|) and non-optimal solutions (with a dimensionality different than m). Hence, from a practical perspective, there is a need for objective reduction algorithms that can handle the typically noisy solution sets obtained by MOEAs, and can determine an essential objective set for a range of problems, with a reasonably high accuracy and computational efficiency.

A. Strength and Limitations of the different Algorithms

The experimental results presented above, corresponding to (noisy) solution sets such as Nǫ and NNS, reveal that:
(a) The DRP based greedy and exact algorithms fail to identify an essential objective set for both the redundant and non-redundant problems. In that, these algorithms identify a higher and a lower dimension than m, for the redundant and the non-redundant problems, respectively. This could be attributed to the fact that the DRP based algorithms aim to preserve the dominance relations in the given solution set, and hence are extremely sensitive to noise. In that, even a single noisy solution in the case of δ = 0 (δ-MOSS) can prevent these algorithms from identifying an FT. Evidence of this can be found in the multi-speed gearbox design problem (Section X-A), where 3 noisy solutions out of the 200 solutions prevented these algorithms from identifying the redundancy of one objective.
(b) The L-PCA and NL-MVU-PCA, based on preservation of the correlation structure (rather than the dominance relations), are adept at de-noising the given solution sets, by way of: (i) accounting for only the top 99.7% of the variance (θ = 0.997), and (ii) accounting for only those correlations between the objectives whose strength is higher than a dynamically computed (problem-specific, eigenvalue based) correlation threshold (Tcor). This allows these algorithms to identify an FT for a range of problems, with a reasonably high accuracy.
While the preservation of the global correlation structure has its advantages in terms of de-noising of data where noise exists, it may lead to inaccurate results in some cases where the unnoised problem features may be interpreted as noise. For example, two objectives on the POF may be globally correlated but locally conflicting, and if the latter is not significant enough, it may be interpreted as noise by the proposed algorithms. To demonstrate such a case, an artificial problem has been created, as presented in Figure 11. In that:
• Figure 11b highlights both the conflicting and the correlated interactions between f1–f3.
• Figure 11c highlights that f1 and f3 are potentially identically correlated (Equation 3(i)) and could be treated by the L-PCA and NL-MVU-PCA as identically correlated or uncorrelated, subject to R13 being higher or lower than their corresponding Tcor (Equation 3(ii)).
• Figure 11d highlights that, compared to the L-PCA, NL-MVU-PCA (where, due to unfolding, the effect of local conflict subsides) reports a predominantly higher gap in the eigenvalues, and hence, a significantly lower Tcor.
• In the case of L-PCA: R13 < Tcor, hence f1 and f3

are treated as uncorrelated, resulting in no objective 2) Given that a typical solution set obtained from most
reduction, i.e. Fs = {f1 , f2 , f3 }. existing MOEAs for many-objective problems is charac-
• In the case of NL-MVU-PCA: R13 > Tcor , hence f1 and terized by noise, the merit of preserving the dominance
f3 are treated as correlated, resulting in elimination of f1 , relations in a noisy solution set is itself questionable.
and Fs = {f2 , f3 } with a corresponding error of 2.89%. 3) The L-PCA and NL-MVU-PCA proposed in this paper,
It may be noted that like the L-PCA, both the DRP based based on the preservation of global correlation structure
greedy and exact algorithms identify Fs = {f1 , f2 , f3 }. It can (as against the dominance relations), along with their
be verified that all the three objectives in this problem are de-noising features, provide an effective way of deal-
essential, in that, the POF corresponding to Fs = {f2 , f3 } is ing with typically noised solution sets obtained from
restricted to only a part of the original POF. This illustrates that MOEAs. In that, their requirement of a good POF-
an algorithm which is more effective in de-noising the data, representation is less stringent that the requirement of
may over-reduce the objectives in the presence of problem a good POF-approximation. Furthermore, if: (a) only
features that may resemble noise. While this limitation could some but not all the correlated objectives are captured by
be countered by clustering the objective space and applying a solution set, still these algorithms may identify an FT
these algorithms to each cluster, it would imply additional through the iterative approach, and (b) some objectives
computational cost. that are conflicting on the POF are found as correlated,
these algorithms through their de-noising feature based
TABLE XXII on Tcor may disregard the misleading correlations and
C OMPUTATIONAL COMPLEXITY OF OBJECTIVE REDUCTION ALGORITHMS .
N IS THE SIZE OF THE NON - DOMINATED SET; Ng THE GENERATIONS FOR lead to reasonably accurate results.
AN MOEA; M THE NUMBER OF OBJECTIVES 4) A pragmatic approach to handle truly many-objective
Approaches Computational complexity (M ≫ 4) problems, could be to: (i) first employ
A. Dom. rel. preservation (δ-MOSS) the proposed L-PCA or NL-MVU-PCA to identify an
(i) Exact Algorithm O(N 2 M 2M ) essential objective set for typically noisy solution sets,
(ii) Greedy Algorithm O(min{N 2 M 3 , N 4 M 2 }) and (ii) once the exact-optimal solutions (or at least
B. Unsupervised feature selectiona O(N M 2 )+ clustering overhead better solutions) for the reduced problem are found, then
C. Removal of data dependencies the DRP based algorithms could be employed to inform
(i) PCA based reductiona O(N M 2 + M 3 ) the decision maker of objective sets corresponding to
(ii) MVU-PCA based reduction O(M 3 q 3 ) where q is the different δs (degree of error in dominance relations).
neighborhood sizeb
In that, the proposed algorithms and the DRP based
a In [21], [39] and [40], the complexity of clustering has not been included. algorithms can complement each other to meaningfully
Furthermore, the complexity of PCA based reduction is incorrectly cited
as O(N M 2 + M 3 + N 2 M Ng ). Notably, each of the above approaches inform the decision maker.
operate on the non-dominated set, hence, their is no rationale for adding
the computational complexity of obtaining this non-dominated set, only XII. C ONCLUSIONS
to the complexity of PCA based reduction (O(N 2 M Ng ) for NSGA-II).
b In the most constrained case, q = O(M ). This paper has proposed a framework for both linear and
nonlinear objective reduction algorithms, namely, L-PCA and
NL-MVU-PCA, respectively. The performance of these algo-
B. Computational Complexity rithms has been studied for a wide range of redundant and
In terms of the computational complexity, it can be seen non-redundant test problems, and two real-world problems.
from Table XXII, that: (i) the DRP based exact algorithm is The sensitivity of these algorithms on the critical parameter in-
rather impractical to use, since it is exponential in M (number volved and on the quality of POF-approximation (convergence
of objectives) and quadratic in N (the size of solution set), and diversity) and POF-representation (correlation structure)
and (ii) given that, in the case of many-objective problems provided by the underlying solution sets has been discussed.
N ≫ M , even the greedy algorithm is likely to be more An error measure has also been proposed to assess the obtained
expensive than NL-MVU-PCA (worst case being O(M 6 ), with results. A comparative analysis of the proposed algorithms vis-
k = M − 1). à-vis the dominance relation preservation based algorithms,
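To put these orders in perspective, a rough numerical comparison (my own illustration, using only the dominant terms of Table XXII) for N = 200 and M = 25:

    N, M = 200, 25
    exact  = N**2 * M * 2**M                 # O(N^2 M 2^M)            ~ 3.4e13
    greedy = min(N**2 * M**3, N**4 * M**2)   # O(min{N^2M^3, N^4M^2})  ~ 6.3e8
    mvu    = M**3 * (M - 1)**3               # O(M^3 q^3), q = M - 1   ~ 2.2e8
    print(f"{exact:.1e}  {greedy:.1e}  {mvu:.1e}")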
C. Key Inferences

In the wake of the presented results and the discussions above, the following inferences can be drawn:
1) The goal of finding the 0-minimum objective set based on dominance relation preservation is practically difficult to realize for large problems. This is because: (i) the DRP based exact algorithm, which guarantees a 0-minimum objective set (δ-MOSS with δ = 0), is rather impractical to use, owing to its computational complexity, and (ii) the greedy algorithm does not guarantee a 0-minimum objective set.
2) Given that a typical solution set obtained from most existing MOEAs for many-objective problems is characterized by noise, the merit of preserving the dominance relations in a noisy solution set is itself questionable.
3) The L-PCA and NL-MVU-PCA proposed in this paper, based on the preservation of the global correlation structure (as against the dominance relations), along with their de-noising features, provide an effective way of dealing with the typically noised solution sets obtained from MOEAs. In that, their requirement of a good POF-representation is less stringent than the requirement of a good POF-approximation. Furthermore, if: (a) only some but not all the correlated objectives are captured by a solution set, these algorithms may still identify an FT through the iterative approach, and (b) some objectives that are conflicting on the POF are found as correlated, these algorithms, through their de-noising feature based on Tcor, may disregard the misleading correlations and lead to reasonably accurate results.
4) A pragmatic approach to handle truly many-objective (M ≫ 4) problems could be to: (i) first employ the proposed L-PCA or NL-MVU-PCA to identify an essential objective set from typically noisy solution sets, and (ii) once the exact-optimal solutions (or at least better solutions) for the reduced problem are found, then employ the DRP based algorithms to inform the decision maker of objective sets corresponding to different δs (degrees of error in dominance relations). In that, the proposed algorithms and the DRP based algorithms can complement each other to meaningfully inform the decision maker.

XII. CONCLUSIONS

This paper has proposed a framework for both linear and nonlinear objective reduction algorithms, namely, L-PCA and NL-MVU-PCA, respectively. The performance of these algorithms has been studied for a wide range of redundant and non-redundant test problems, and two real-world problems. The sensitivity of these algorithms to the critical parameter involved, and to the quality of POF-approximation (convergence and diversity) and POF-representation (correlation structure) provided by the underlying solution sets, has been discussed. An error measure has also been proposed to assess the obtained results. A comparative analysis of the proposed algorithms vis-à-vis the dominance relation preservation based algorithms has been done in terms of their scope of application and computational complexity. For future work, the endeavor of the authors will be to demonstrate the application of the proposed algorithms for online objective reduction, δ-MOSS problems with different δs, and k-EMOSS problems. The future work will also focus on testing the framework on more real-world problems and on problems with high-dimensional and complicated Pareto-set shapes.
minimum objective set (δ-MOSS with δ = 0) is rather R EFERENCES
impractical to use, owing to its computational complex-
[1] V. Khare, X. Yao, and K. Deb, “Performance Scaling of Multi-objective
ity, and (ii) the greedy algorithm does not guarantee a Evolutionary Algorithms,” in Evolutionary Multi-Criterion Optimiza-
0-minimum objective set. tion, ser. Lecture Notes in Computer Science, C. Fonseca, P. Fleming,
22 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION

[1] V. Khare, X. Yao, and K. Deb, "Performance Scaling of Multi-objective Evolutionary Algorithms," in Evolutionary Multi-Criterion Optimization, ser. Lecture Notes in Computer Science, C. Fonseca, P. Fleming, E. Zitzler, L. Thiele, and K. Deb, Eds. Springer Berlin / Heidelberg, 2003, vol. 2632, pp. 72–72.
[2] E. Hughes, "Evolutionary Many-Objective Optimisation: Many Once or One Many?" in IEEE Congress on Evolutionary Computation, Sept. 2005, pp. 222–227.
[3] J. Knowles and D. Corne, "Quantifying the Effects of Objective Space Dimension in Evolutionary Multiobjective Optimization," in Evolutionary Multi-Criterion Optimization, ser. Lecture Notes in Computer Science, S. Obayashi, K. Deb, C. Poloni, T. Hiroyasu, and T. Murata, Eds. Springer Berlin / Heidelberg, 2007, vol. 4403, pp. 757–771.
[4] Q. Zhang, A. Zhou, and Y. Jin, "RM-MEDA: A Regularity Model-Based Multiobjective Estimation of Distribution Algorithm," IEEE Transactions on Evolutionary Computation, vol. 12, no. 1, pp. 41–63, February 2008.
[5] K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, "A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, April 2002.
[6] E. Zitzler, M. Laumanns, and L. Thiele, "SPEA2: Improving the Strength Pareto Evolutionary Algorithm for Multiobjective Optimization," in Evolutionary Methods for Design, Optimisation and Control with Application to Industrial Problems (EUROGEN 2001), K. Giannakoglou et al., Eds. International Center for Numerical Methods in Engineering (CIMNE), 2002, pp. 95–100.
[7] J. D. Knowles and D. W. Corne, "Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy," Evolutionary Computation, vol. 8, no. 2, pp. 149–172, 2000.
[8] Q. Zhang and H. Li, "MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition," IEEE Transactions on Evolutionary Computation, vol. 11, no. 6, pp. 712–731, December 2007.
[9] H. Ishibuchi, Y. Hitotsuyanagi, H. Ohyanagi, and Y. Nojima, "Effects of the Existence of Highly Correlated Objectives on the Behavior of MOEA/D," in Evolutionary Multi-Criterion Optimization, ser. Lecture Notes in Computer Science, R. Takahashi, K. Deb, E. Wanner, and S. Greco, Eds. Springer Berlin / Heidelberg, 2011, vol. 6576, pp. 166–181.
[10] E. Zitzler and S. Künzli, "Indicator-Based Selection in Multiobjective Search," in Parallel Problem Solving from Nature - PPSN VIII, ser. Lecture Notes in Computer Science, X. Yao, E. Burke, J. Lozano, J. Smith, J. Merelo-Guervós, J. Bullinaria, J. Rowe, P. Tino, A. Kabán, and H.-P. Schwefel, Eds. Springer Berlin / Heidelberg, 2004, vol. 3242, pp. 832–842.
[11] T. Wagner, N. Beume, and B. Naujoks, "Pareto-, Aggregation-, and Indicator-Based Methods in Many-Objective Optimization," in Evolutionary Multi-Criterion Optimization, ser. Lecture Notes in Computer Science, S. Obayashi, K. Deb, C. Poloni, T. Hiroyasu, and T. Murata, Eds. Springer Berlin / Heidelberg, 2007, vol. 4403, pp. 742–756.
[12] E. Zitzler, "Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications," Ph.D. dissertation, ETH Zurich, Switzerland, 1999.
[13] L. While, P. Hingston, L. Barone, and S. Huband, "A Faster Algorithm for Calculating Hypervolume," IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 29–38, February 2006.
[14] C. Fonseca, L. Paquete, and M. López-Ibáñez, "An Improved Dimension-Sweep Algorithm for the Hypervolume Indicator," in IEEE Congress on Evolutionary Computation, July 2006, pp. 1157–1163.
[15] N. Beume, "S-Metric Calculation by Considering Dominated Hypervolume as Klee's Measure Problem," Evolutionary Computation, vol. 17, no. 4, pp. 477–492, 2009.
[16] A. V. Lotov, V. A. Bushenkov, and G. K. Kamenev, Interactive Decision Maps. Boston, MA: Kluwer, 2004.
[17] S. Obayashi and D. Sasaki, "Visualization and Data Mining of Pareto Solutions Using Self-Organizing Map," in Evolutionary Multi-Criterion Optimization, ser. Lecture Notes in Computer Science, C. Fonseca, P. Fleming, E. Zitzler, L. Thiele, and K. Deb, Eds. Springer Berlin / Heidelberg, 2003, vol. 2632, pp. 71–71.
[18] H. Li and Q. Zhang, "Multiobjective Optimization Problems With Complicated Pareto Sets, MOEA/D and NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 13, no. 2, pp. 284–302, April 2009.
[19] H. Ishibuchi, N. Tsukamoto, and Y. Nojima, "Evolutionary Many-Objective Optimization: A Short Review," in IEEE Congress on Evolutionary Computation, June 2008, pp. 2424–2431.
[20] D. Brockhoff and E. Zitzler, "Are All Objectives Necessary? On Dimensionality Reduction in Evolutionary Multiobjective Optimization," in Parallel Problem Solving from Nature - PPSN IX, ser. Lecture Notes in Computer Science, T. Runarsson, H.-G. Beyer, E. Burke, J. Merelo-Guervós, L. Whitley, and X. Yao, Eds. Springer Berlin / Heidelberg, 2006, vol. 4193, pp. 533–542.
[21] A. L. Jaimes, C. A. C. Coello, and D. Chakraborty, "Objective Reduction Using a Feature Selection Technique," in Genetic and Evolutionary Computation Conference (GECCO), 2008, pp. 673–680.
[22] H. K. Singh, A. Isaacs, and T. Ray, "A Pareto Corner Search Evolutionary Algorithm and Dimensionality Reduction in Many-Objective Optimization Problems," IEEE Transactions on Evolutionary Computation, vol. 99, no. 4, pp. 1–18, August 2011.
[23] K. Deb and D. K. Saxena, "Searching For Pareto-Optimal Solutions Through Dimensionality Reduction for Certain Large-Dimensional Multi-Objective Optimization Problems," in IEEE Congress on Evolutionary Computation, July 2006, pp. 3353–3360.
[24] D. Saxena and K. Deb, "Non-linear Dimensionality Reduction Procedures for Certain Large-Dimensional Multi-objective Optimization Problems: Employing Correntropy and a Novel Maximum Variance Unfolding," in Evolutionary Multi-Criterion Optimization, ser. Lecture Notes in Computer Science, S. Obayashi, K. Deb, C. Poloni, T. Hiroyasu, and T. Murata, Eds. Springer Berlin / Heidelberg, 2007, vol. 4403, pp. 772–787.
[25] J. Shlens, "A Tutorial on Principal Component Analysis," Center for Neural Science, New York University, Tech. Rep., April 2009, available at: http://www.snl.salk.edu/~shlens/pca.pdf (accessed: May 2011).
[26] B. Scholkopf, A. Smola, and K. R. Muller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
[27] L. K. Saul, K. Q. Weinberger, J. H. Ham, F. Sha, and D. D. Lee, "Spectral Methods for Dimensionality Reduction," in Semisupervised Learning, O. Chapelle, B. Schoelkopf, and A. Zien, Eds. Cambridge, MA: MIT Press, 2006.
[28] K. Q. Weinberger and L. K. Saul, "Unsupervised Learning of Image Manifolds by Semidefinite Programming," International Journal of Computer Vision, vol. 70, no. 1, pp. 77–90, 2006.
[29] J. F. Sturm, "Using SeDuMi 1.02, a Matlab Toolbox for Optimization Over Symmetric Cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625–653, 1999.
[30] B. Borchers, "CSDP, a C Library for Semidefinite Programming," Optimization Methods and Software, vol. 11, no. 1, pp. 613–623, 1999.
[31] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Hillsdale, NJ: Erlbaum, 1988.
[32] S. Huband, P. Hingston, L. Barone, and L. While, "A Review of Multiobjective Test Problems and a Scalable Test Problem Toolkit," IEEE Transactions on Evolutionary Computation, vol. 10, no. 5, pp. 477–506, October 2006.
[33] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, "Scalable Test Problems for Evolutionary Multiobjective Optimization," in Evolutionary Multiobjective Optimization, ser. Advanced Information and Knowledge Processing, A. Abraham, L. Jain, and R. Goldberg, Eds. Springer Berlin Heidelberg, 2005, pp. 105–145.
[34] K. Deb, M. Mohan, and S. Mishra, "Evaluating the ε-Domination Based Multi-Objective Evolutionary Algorithm for a Quick Computation of Pareto-Optimal Solutions," Evolutionary Computation, vol. 13, no. 4, pp. 501–525, 2005.
[35] R. C. Purshouse and P. J. Fleming, "Evolutionary Many-Objective Optimization: An Exploratory Analysis," in IEEE Congress on Evolutionary Computation, June 2003, pp. 2066–2073.
[36] D. Brockhoff and E. Zitzler, "Objective Reduction in Evolutionary Multiobjective Optimization: Theory and Applications," Evolutionary Computation, vol. 17, no. 2, pp. 135–166, 2009.
[37] P. Jain and A. Agogino, "Theory of Design: An Optimization Perspective," Mech. Mach. Theory, vol. 23, no. 3, 1990.
[38] K. Musselman and J. Talavage, "A Tradeoff Cut Approach to Multiple Objective Optimization," Operations Research, vol. 28, no. 6, pp. 1424–1435, 1980.
[39] A. L. Jaimes and C. A. C. Coello, "Some Techniques to Deal with Many-objective Problems," in Genetic and Evolutionary Computation Conference (GECCO), 2009, pp. 2693–2696.
[40] A. López Jaimes, C. Coello, and J. Urías Barrientos, "Online Objective Reduction to Deal with Many-Objective Problems," in Evolutionary Multi-Criterion Optimization, ser. Lecture Notes in Computer Science, M. Ehrgott, C. Fonseca, X. Gandibleux, J.-K. Hao, and M. Sevaux, Eds. Springer Berlin / Heidelberg, 2009, vol. 5467, pp. 423–437.
João A. Duro received the Licentiate degree in engineering of systems and informatics from the University of Algarve, Faro, Portugal, in 2007. He is currently working towards the Ph.D. degree in computer science at Cranfield University, Cranfield, U.K. From 2008 to 2009, he worked as a research assistant at Sheffield Hallam University, U.K. His main research areas are evolutionary computation, multi-objective optimization, multiple criteria decision-making, and machine learning.