
Inverse Problems 14 (1998) 893–901.


An algorithm for quadratic optimization with one quadratic constraint and bounds on the variables

G C Fehmers†, L P J Kamp and F W Sluijter


Eindhoven University of Technology, Department of Applied Physics, PO Box 513, NL-5600
MB Eindhoven, The Netherlands

Received 19 January 1998

Abstract. This paper presents an efficient algorithm to solve a constrained optimization problem with a quadratic object function, one quadratic constraint and (positivity) bounds on the variables. At little computational cost, the algorithm allows for the inclusion of positivity of the solution as prior knowledge. This is very useful for the solution of those (linear) inverse problems where negative solutions are unphysical.
The algorithm rewrites the solution as a function of the Lagrange multipliers, which is
achieved with the help of the generalized eigenvectors, or equivalently, the generalized singular
value decomposition. The next step is to find the Lagrange multipliers. The multiplier
corresponding to the quadratic constraint, which is known to be active, is easy to find. The
Lagrange multipliers corresponding to the positivity constraints are found with an iterative
method that can be likened to the active set methods from quadratic programming.

1. Introduction

More often than not, optimization is used to find a solution to an inverse problem. Take as an
example the archetypal linear and discrete (or linearized and discretized) inverse problem:
Ax = d. Here, x is an n-vector with unknown model parameters and the m-vector d
contains the measured data. The (known) (m × n)-matrix A is an operator that maps the
model to the experimental data. The standard strategy to solve this problem is the following
optimization (generally attributed to Tikhonov 1963):
$$\min_{x\in\mathbb{R}^n} \|Ax - d\|_2^2 + \lambda O(x). \qquad (1)$$

The first term in the object function is the square of the misfit and the second is the
regularization term; λ, λ > 0, is the regularization parameter. Regularization is needed in
the case in which the problem is ill-posed, which means that the minimizer of the misfit
is not unique or is very sensitive to small variations in the data d. The regularizer O(x)
contains a priori information on the model x. A popular choice is the quadratic function
$O(x) = \|Dx\|_2^2$, where D is a discrete approximation to the derivative operator; a small
O(x) thus guarantees smoothness. The definition of O(x) is subjective, and so is the choice
of λ. A popular practice is to vary λ until the solution satisfies some subjective criteria.
The indefiniteness of the regularization parameter λ can be removed by another piece of
prior information: a constraint to the misfit (Phillips 1962) or a constraint to the regularizer
† Now at Shell Research, PO Box 60, NL-2280 AB Rijswijk, The Netherlands. E-mail address:
g.c.fehmers@siep.shell.com


(Ivanov 1962). Both strategies lead to a constrained optimization problem, one being the
dual of the other. A constraint to the misfit leads to the so-called discrepancy or error
principle:

$$\min_{x\in S} O(x), \qquad S = \{x \in \mathbb{R}^n \mid \|Ax - d\|_2^2 \le E^2\}. \qquad (2)$$

Here, E is based on estimates of the magnitudes of measurement, linearization and
discretization errors. Originally, an estimate of the upper bound to the misfit was used.
Because the inequality constraint in (2) is almost always satisfied as an equality, this leads
to overregularization. It is therefore better to set E equal to the expectation of the misfit.
For the same reason, it is irrelevant if the constraint in (2) is an equality or an inequality
constraint. In the case in which the errors have a normal distribution and the data have
been properly weighted, the squared misfit has a $\chi^2$ distribution with expectation m, so $E^2$ is set equal to m.
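For instance (an illustration, not from the paper), if each datum $d_i$ has a known standard deviation $\sigma_i$, "properly weighted" can be read as dividing the ith row of A and the ith element of d by $\sigma_i$; the expected squared misfit is then m, which suggests $E^2 = m$:

```python
import numpy as np

def weight_data(A, d, sigma):
    """Divide each equation by its error sigma_i and pick E^2 = m,
    the expectation of a chi-square variable with m degrees of freedom."""
    w = 1.0 / np.asarray(sigma)
    Aw = A * w[:, None]          # row-wise weighting of the operator
    dw = np.asarray(d) * w       # weighted data
    E2 = float(len(dw))          # expected squared misfit
    return Aw, dw, E2
```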
The principles sketched above are very well established. An early review paper is by
Turchin et al (1971) and a more recent one is by Bertero et al (1988). Numerical methods
to solve the (constrained) optimization problems are equally well established (e.g. Hansen
1992, Oldenburg 1994). As both the object function and the constraint are quadratic, the
derivatives are linear, which means that efficient methods from linear algebra can be used.
One such method is based on the generalized singular value decomposition (GSVD) of A
and D. Strongly related is a method based on the generalized eigenproblem of $A^tA$ and
$D^tD$. An advantage of the former method is that $A^tA$ and $D^tD$ need not be computed.
The latter method may become preferable when memory restrictions become an issue.
In many inverse problems, such as tomography, negative solutions are unphysical. This
gives another piece of a priori information, which could translate into additional constraints
in the optimization problem: $x_i \ge 0$.
The addition of these linear inequality constraints to equation (1) yields a quadratic
programming (QP) problem, which can be solved efficiently by an active set method
(e.g. Fletcher 1993). This approach, however, leaves us with the indefiniteness of the
regularization parameter. Again, an additional quadratic constraint will remove this
indeterminacy. This leads to a convex optimization problem with a quadratic cost function,
one quadratic constraint and a number of linear constraints. This problem can, of course,
be solved with general purpose (iterative) algorithms for convex constrained optimization,
but these fail to make use of the quadratic/linear structure and are therefore not as efficient
as they might be. Examples of this approach are substitution by squares and maximum
entropy. Substitution by squares works as follows: substitute $x_i$ by $p_i^2$, solve for $p_i$, and
the constrained solution is given by $p_i^2$. Unfortunately, the object function is now of
the fourth degree. Maximum entropy offers another possibility: the mere definition of
entropy guarantees positivity of the solution (Gull and Daniell 1978). The object function
is a transcendental function, which is not quadratic either. Therefore the computations are
expensive.
This paper presents an efficient method to solve an optimization problem with a quadratic
object function, one quadratic constraint and a number of bounds on the variables. Its
position in the context of optimization algorithms is as follows. The method can be seen
as a combination of two methods from the family of least squares algorithms: QP and the
GSVD. QP (see again Fletcher 1993) finds the solution to an optimization problem with a
quadratic object function and a set of linear constraints on the variables. The GSVD (Golub
and Van Loan 1989) gives the solution to a problem with a quadratic object function and
one quadratic constraint. From QP the proposed method borrows the active set approach.
Hence, the method gives the speed and the insight that comes with linear algebra.

In its domain of application, the proposed method is an alternative to sequential quadratic
programming (SQP). SQP algorithms are multipurpose methods for constrained optimization
that iterate in solution space (Fletcher 1993, Gill et al 1984). As the method from this paper
exploits the specific structure of the problem, it outperforms the SQP algorithms (see final
section). From another point of view, one could argue that the proposed method is a
specialized implementation of SQP.
The power of the algorithm resides in the fact that the solution is written as a function
of the Lagrange multipliers. This reformulation is based on the solution of a generalized
eigenproblem and it constitutes the main part of the work in terms of computational effort.
The next, and relatively quick, step is to solve for the Lagrange multipliers from the set of
constraints. Here, the important thing is to find which constraints are active and which
are not. Because the number of active constraints is generally smaller than the number of
unknowns in the solution vector, this approach reduces computing time considerably.

2. Formulation of the problem

The inverse problem has been reduced to the following constrained optimization problem:

$$\min_{x\in S} O(x), \qquad S = \{x \in \mathbb{R}^n \mid \|Ax - d\|_2 \le E \ \wedge\ x \ge 0\}.$$

The feasible set S is defined by n bounds on the variables to ensure positivity ($x \ge 0$),
plus one quadratic inequality constraint ($\|Ax - d\|_2 \le E$), which is provided by the
experiment. The solution can be interpreted as follows: of all possible states x that are
positive everywhere and that satisfy the experiment to within the measurement error, it is
the one that is most likely in the a priori sense.
Next we make an important assumption, namely that the experiment is specific enough
to add to the prior knowledge. This means that a point most likely in the a priori sense,
i.e. where O(x) is minimal, should not satisfy the experiment to within the measurement
error:

$$x_0 \text{ solves } \min_{x\in\mathbb{R}^n} O(x) \;\Longrightarrow\; \|Ax_0 - d\|_2 > E. \qquad (3)$$

If this were not true, the experiment would not contribute to the solution. Instead, the
solution would be completely defined by the a priori information, a pathological situation.
From (3), it follows that the minimizer of O(x) does not satisfy $\|Ax - d\|_2 \le E$. This
implies that this constraint is active. (It also means that it is irrelevant whether an inequality
or an equality constraint is imposed, as mentioned in the introduction.)
To solve the optimization problem, this paper uses the method of the Lagrange
multipliers, which introduces a Lagrange multiplier for every constraint and produces a
set of conditions (the Kuhn–Tucker (KT) conditions) that the solution must satisfy. These
KT conditions are a system of equations and inequalities; a solution to this system is a
solution to the constrained optimization problem. In general, the complexity of a constrained
optimization problem increases with the number of constraints. To be more precise, the
complexity increases with the number of active constraints, because inactive constraints
can be ignored. It is clear that knowledge about which constraint is active and which
is not simplifies the constrained optimization. Above, we have seen that the constraint
$\|Ax - d\|_2 \le E$ is active.
We rewrite the constrained optimization problem into the following form. Consider
two continuous quadratic functions, O(x) and g(x), the object function and the constraint

function respectively:
$$O(x): \mathbb{R}^n \to \mathbb{R}, \qquad O(x) = x^t B x + 2 b^t x,$$
$$g(x): \mathbb{R}^n \to \mathbb{R}, \qquad g(x) = x^t C x + 2 c^t x + e.$$
Here, e is a scalar, x, b, c are real n-vectors and B and C are real, symmetric and positive
semidefinite $(n \times n)$ matrices; $^t$ indicates the transpose. In our case: $B = D^t D$, $b = 0$,
$C = A^t A$, $c = -A^t d$, $e = d^t d - E^2$. We define the constrained optimization problem and
the feasible set S by
$$\min_{x\in S} O(x), \qquad S = \{x \in \mathbb{R}^n \mid g(x) \le 0 \ \wedge\ x \ge 0\}. \qquad (4)$$
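In code, these quantities follow directly from A, D, d and E; the identification $c = -A^td$ comes from expanding $\|Ax - d\|_2^2 - E^2$ (a sketch, with the same placeholder names as before):

```python
import numpy as np

def build_quadratics(A, D, d, E):
    """Return B, b, C, c, e with O(x) = x^t B x + 2 b^t x and
    g(x) = x^t C x + 2 c^t x + e = ||A x - d||_2^2 - E^2."""
    B = D.T @ D                     # prior (smoothness) matrix
    b = np.zeros(A.shape[1])        # no linear term in O(x)
    C = A.T @ A                     # data matrix
    c = -A.T @ d                    # from expanding the squared misfit
    e = float(d @ d - E ** 2)       # constant term of g(x)
    return B, b, C, c, e
```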
The Hessian matrices of the functions O and g are 2B and 2C respectively. Because
these matrices are positive semidefinite, the functions O and g are convex. As g is a
convex function, the set $\{x \in \mathbb{R}^n \mid g(x) \le 0\}$ is convex, as is the set $\{x \in \mathbb{R}^n \mid x \ge 0\}$. This
guarantees that their intersection, S, is a convex set as well. S is also a closed set.
We will find a solution to the constrained optimization problem (4) if: (1) the feasible
set S is not empty and (2) the kernels of B and C have only the null vector in common. The
second condition is written as
$$\ker(B) \cap \ker(C) = \{0\}. \qquad (5)$$
This condition is quite logical. It merely states that the subspace of solution space that goes
unnoticed in the experiment, ker(C), does not overlap with the subspace towards which the
prior information is indifferent, ker(B). In other words, there must be no vectors that are
invisible both to the experiment and to the a priori information. Similarly, the solution is
stable if there exists no vector z for which both kBzk and kCzk are very small.

3. The Kuhn–Tucker conditions

The KT conditions to the constrained optimization problem (4) are given by


$$\nabla O(x) + \lambda\, \nabla g(x) + \sum_{i=1}^{n} l_i\, \nabla(-2x_i) = 0, \qquad (6)$$
$$\lambda\, g(x) = 0, \qquad (7)$$
$$\lambda \ge 0,$$
$$g(x) \le 0, \qquad (8)$$
and, for all $i = 1, \dots, n$,
$$l_i x_i = 0,$$
$$l_i \ge 0, \qquad (9)$$
$$x_i \ge 0.$$
Here, the set of Lagrange multipliers $(\lambda, l_i)$ is introduced, one multiplier for every constraint:
$\lambda$ for $g(x) \le 0$ and $l_i$ for $x_i \ge 0$. We are going to find a point that meets the KT conditions.
This point solves the constrained optimization problem. Let l be the n-vector with elements
$l_i$. The first KT condition, equation (6), gives
$$[B + \lambda C]\, x = l - b - \lambda c.$$
Because B and C are both positive semidefinite, the inverse of the matrix [B + λC] exists
if λ > 0 and if condition (5) is satisfied. When we find a general expression for the inverse,
we can rewrite the solution as a function of the Lagrange multipliers:
$$x = x(\lambda, l) = [B + \lambda C]^{-1} (l - b - \lambda c). \qquad (10)$$
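A direct, if naive, way to evaluate (10) is to factor $B + \lambda C$ anew for every pair $(\lambda, l)$; a one-line numpy sketch:

```python
import numpy as np

def x_of_multipliers(B, C, b, c, lam, l):
    """Evaluate eq. (10): x(lambda, l) = [B + lambda C]^(-1) (l - b - lambda c)."""
    return np.linalg.solve(B + lam * C, l - b - lam * c)
```

The point of the next section is precisely to avoid this repeated factorization when λ and l are adjusted iteratively.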

4. The matrix pencil

In the literature, [B + λC] is referred to as a matrix pencil (e.g. Gantmacher 1959, Parlett
1980). Our pencil is symmetric and, by condition (5), regular. For regular pencils, it is
possible to derive a closed expression for the inverse $[B + \lambda C]^{-1}$. One method is based on
the generalized eigenproblem, as we now show. Another method is based on the GSVD,
to which we will return at the end of the section.
Because B and C are symmetric and positive semidefinite and because of (5), matrix
B + C is symmetric and positive definite. Therefore matrix $[B + C]^{-1/2}$ exists and is
symmetric and positive definite. We write
$$[B + \lambda C]^{-1} = [B + C]^{-1/2}\,\bigl[I_n + (\lambda - 1)[B + C]^{-1/2}\, C\, [B + C]^{-1/2}\bigr]^{-1}\,[B + C]^{-1/2},$$
where $I_n$ is the identity matrix of order n. We are going to diagonalize the matrix between
brackets:
$$I_n + (\lambda - 1)[B + C]^{-1/2}\, C\, [B + C]^{-1/2}. \qquad (11)$$
To this end, we consider the eigenproblem
$$[B + C]^{-1/2}\, C\, [B + C]^{-1/2}\, |u_i\rangle = \mu_i\, |u_i\rangle, \qquad (12)$$
where we use Dirac notation. As the matrix in (12) is symmetric and positive semidefinite,
the eigenvalues satisfy
$$\mu_i \ge 0, \qquad i = 1, \dots, n. \qquad (13)$$
As the matrix in (12) is symmetric, the spectral theorem states that the $|u_i\rangle$ form a complete set
of normalized orthogonal eigenvectors:
$$\langle u_i | u_j \rangle = \delta_{ij} \qquad\text{and}\qquad \sum_{i=1}^{n} |u_i\rangle\langle u_i| = I_n.$$
Let
$$|y_i\rangle = [B + C]^{-1/2}\, |u_i\rangle,$$
then (12) becomes the generalized eigenproblem
$$C\, |y_i\rangle = \mu_i\, [B + C]\, |y_i\rangle. \qquad (14)$$
The final solution will be given in terms of the generalized eigenvectors |yi i and eigenvalues
µi . Therefore, the solution of the generalized eigenproblem forms the heart of the algorithm.
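In practice, the pairs $(\mu_i, |y_i\rangle)$ can be obtained from any symmetric-definite generalized eigensolver. A sketch using SciPy (the paper itself reports using a NAG routine, see section 6): scipy.linalg.eigh(C, B + C) returns eigenvectors normalized so that $Y^t(B+C)Y = I_n$, which is exactly the normalization $\langle y_i|B+C|y_j\rangle = \delta_{ij}$ used below.

```python
import numpy as np
from scipy.linalg import eigh

def generalized_eigenpairs(B, C):
    """Solve C y = mu (B + C) y, eq. (14).

    eigh returns mu in ascending order and Y with Y.T @ (B + C) @ Y = I,
    i.e. <y_i|B+C|y_j> = delta_ij.  All mu lie in [0, 1].
    """
    mu, Y = eigh(C, B + C)           # symmetric-definite generalized problem
    mu = np.clip(mu, 0.0, 1.0)       # guard against round-off outside [0, 1]
    return mu, Y                     # columns of Y are the |y_i>
```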
From equation (14) it is not difficult to see that there are no eigenvalues $\mu_i$ larger than 1;
together with (13) this gives
$$0 \le \mu_i \le 1, \qquad i = 1, \dots, n$$
and
$$\mu_i = 0 \;\Longleftrightarrow\; |y_i\rangle \in \ker(C), \qquad\qquad \mu_i = 1 \;\Longleftrightarrow\; |y_i\rangle \in \ker(B).$$
We also know that
$$\langle y_i | C | y_j \rangle = \mu_i \langle y_i | B + C | y_j \rangle = \mu_i \langle u_i | u_j \rangle = \mu_i \delta_{ij}. \qquad (15)$$
The normalized orthogonal eigenvectors and the eigenvalues of the matrix in (11) are $|u_i\rangle$
and $1 + (\lambda - 1)\mu_i$ respectively. Therefore
$$\bigl[I_n + (\lambda - 1)[B + C]^{-1/2}\, C\, [B + C]^{-1/2}\bigr]^{-1} = \sum_{i=1}^{n} \frac{|u_i\rangle\langle u_i|}{1 + (\lambda - 1)\mu_i} \qquad (16)$$

and after left and right multiplication of (16) with $[B + C]^{-1/2}$
$$[B + \lambda C]^{-1} = \sum_{i=1}^{n} \frac{|y_i\rangle\langle y_i|}{1 + (\lambda - 1)\mu_i}. \qquad (17)$$
Substitution of (17) into (10) gives the solution
$$|x\rangle = \sum_{i=1}^{n} \frac{\langle y_i | l - b\rangle - \lambda \langle y_i | c\rangle}{1 + (\lambda - 1)\mu_i}\; |y_i\rangle. \qquad (18)$$
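Once the eigenpairs and the projections $\langle y_i|b\rangle$, $\langle y_i|c\rangle$, $\langle y_i|l\rangle$ have been computed, evaluating (18) for any new λ or l requires only $O(n^2)$ work and no further factorizations. A sketch continuing the previous snippet, with yb, yc, yl denoting those projections:

```python
import numpy as np

def solution_from_eigenpairs(Y, mu, yb, yc, yl, lam):
    """Evaluate eq. (18) given mu, Y from eq. (14) and the projections
    yb = Y.T @ b, yc = Y.T @ c, yl = Y.T @ l (i.e. <y_i|b>, <y_i|c>, <y_i|l>)."""
    coeff = (yl - yb - lam * yc) / (1.0 + (lam - 1.0) * mu)
    return Y @ coeff                 # sum over i of coeff_i |y_i>
```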
A similar result can be obtained with the GSVD (Golub and Van Loan 1989), which
provides a linear and nonsingular transformation X, such that $X^t B X = \Sigma_B$ and $X^t C X = \Sigma_C$
are both diagonal matrices. This gives for the inverse of the pencil $[B + \lambda C]^{-1} =
X[\Sigma_B + \lambda \Sigma_C]^{-1} X^t$, which is easy to compute because the matrix between the brackets
is diagonal. It is easy to show that the columns of matrix X are the same as the generalized
eigenvectors $|y_i\rangle$, which illustrates the similarity between the generalized eigenproblem and
the GSVD. Either method can be used for the remainder of this paper.

5. The Lagrange multipliers

The effort of the preceding section has yielded a closed expression (18) for the solution as a
function of the Lagrange multipliers λ and l. Now we need to know the Lagrange multipliers.
This section shows how to compute the Lagrange multipliers from the constraints. We will
proceed in two steps. In the first step, we only consider the quadratic constraint g(x) 6 0.
In the second step, the positivity bounds $x \ge 0$ are also included. The inclusion of positivity
is really an iterative process that involves both steps. The solution is found in the space of
the Lagrange multipliers. Because there are relatively few active constraints, this process
is much faster than finding the solution directly in solution space.

5.1. The quadratic constraint


As we only consider the constraint $g(x) \le 0$, we have dropped the positivity constraints,
which implies that we do not consider l. This leaves only λ to be determined.
The prior information says that the constraint is active, which implies λ > 0 and
g(x) = 0. In other words, the KT conditions (7) through (8) reduce to
$$\lambda > 0, \qquad g(x) = 0.$$
Substitution of the solution (18) into g(x) = 0, and using (15), gives an equation of which
λ is the root:
$$0 = g(x(\lambda)) = e + \sum_{i=1}^{n} \frac{\bigl[\lambda \langle c | y_i\rangle + \langle b - l | y_i\rangle\bigr]\bigl[(2\mu_i - 2 - \lambda\mu_i)\langle c | y_i\rangle + \mu_i \langle b - l | y_i\rangle\bigr]}{(1 + (\lambda - 1)\mu_i)^2}. \qquad (19)$$
The function g decreases monotonically for λ > 0, because
$$\frac{\partial g}{\partial \lambda} = -2 \sum_{i=1}^{n} \frac{\bigl[(1 - \mu_i)\langle c | y_i\rangle - \mu_i \langle b - l | y_i\rangle\bigr]^2}{(1 + (\lambda - 1)\mu_i)^3} \le 0,$$
if $\lambda > 0$ and $0 \le \mu_i \le 1$.
As a result, equation (19) has a root λ, λ > 0, if
$$\lim_{\lambda \downarrow 0} g(x(\lambda)) > 0 > \lim_{\lambda \to \infty} g(x(\lambda)). \qquad (20)$$

As g is a decreasing function of λ, it is straightforward to solve λ numerically from (19).


It is instructive to consider the solution (18) in the limit λ ↓ 0 and in the limit λ → ∞. It
is not hard to prove that these solutions, if they exist, are the minimizers of O and g:
$$\lim_{\lambda \downarrow 0} O(x(\lambda)) = \min_{x\in\mathbb{R}^n} O(x), \qquad (21)$$
$$\lim_{\lambda \to \infty} g(x(\lambda)) = \min_{x\in\mathbb{R}^n} g(x). \qquad (22)$$

As we see, $\lim_{\lambda \downarrow 0} x(\lambda)$ corresponds to an unconstrained minimum of O and $\lim_{\lambda \to \infty} x(\lambda)$
is determined solely by the constraint and corresponds to the minimum misfit solution.
Expression (18) fails if there is no root λ, λ > 0, to equation (19). This happens when
either of the two inequalities in (20) is violated. When the left-hand inequality is violated,
the solution is an interior point of S and (3) is not satisfied. The right-hand inequality is
violated when the feasible set S is empty. In other words:
$$\lim_{\lambda \downarrow 0} g(x(\lambda)) < 0 \;\Longleftrightarrow\; \lim_{\lambda \downarrow 0} x(\lambda) \text{ is an interior point of } S \text{ and the minimizer of } O, \qquad (23)$$
$$\lim_{\lambda \to \infty} g(x(\lambda)) > 0 \;\Longleftrightarrow\; S = \emptyset. \qquad (24)$$

Statement (23) follows from (21) and (24) follows from (22).
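Since $g(x(\lambda))$ decreases monotonically for $\lambda > 0$, the root of (19) can be bracketed and handed to any scalar root finder. A sketch using Brent's method, where g_of_lambda is assumed to evaluate (19) from the precomputed projections, and the doubling of the upper bracket is a choice of this sketch rather than of the paper:

```python
from scipy.optimize import brentq

def solve_lambda(g_of_lambda, lam_max=1.0, max_doublings=60):
    """Find lambda > 0 with g(x(lambda)) = 0 for a monotonically decreasing g, eq. (19)."""
    lam_min = 1e-12
    if g_of_lambda(lam_min) < 0.0:
        # corresponds to (23): the unconstrained minimizer of O is already feasible
        raise ValueError("constraint inactive: unconstrained minimum of O is feasible")
    # expand the upper bracket until g becomes negative (feasible set assumed non-empty)
    for _ in range(max_doublings):
        if g_of_lambda(lam_max) < 0.0:
            return brentq(g_of_lambda, lam_min, lam_max)
        lam_max *= 2.0
    # corresponds to (24): the misfit cannot be brought below E
    raise ValueError("no root found: feasible set may be empty")
```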

5.2. The positivity constraints


We now have a solution where the positivity constraints are not included. Some of these
positivity constraints are probably violated. In this section, we describe the adaptation of
the solution so that these constraints are no longer violated. The positivity constraints are
directly coupled to an element of the solution vector and vice versa. It is important to
know which positivity constraints are active. The indices of the active constraints constitute
the active set A. The inactive set I contains the indices of the inactive constraints and
A ∪ I = {1, 2, . . . , n}. Let A contain p elements.
Which positivity constraints are active? Surely, an active constraint should violate the
positivity constraint, i.e. the corresponding element of the solution x should be negative. As
vector x is a discretization of a distribution (e.g. a tomographic reconstruction), we guess
that the active constraints correspond to the negative local minima in the distribution x. By
the imposed smoothness of the solution (via O(x)), the other (less) negative parts in the
solution are likely to follow their neighbours, so that these constraints are satisfied as well.
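A possible implementation of this heuristic for a one-dimensional solution vector simply scans for strictly negative entries that are not larger than their neighbours (the boundary handling is an assumption of this sketch; for a solution discretized on a 2-D grid the same idea applies to the grid neighbours):

```python
import numpy as np

def negative_local_minima(x):
    """Indices of strictly negative local minima of a 1-D solution vector."""
    x = np.asarray(x)
    idx = []
    for j in range(x.size):
        left = x[j - 1] if j > 0 else np.inf           # boundary treated as +inf
        right = x[j + 1] if j < x.size - 1 else np.inf
        if x[j] < 0.0 and x[j] <= left and x[j] <= right:
            idx.append(j)
    return np.array(idx, dtype=int)
```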
The active constraints are satisfied by the introduction of the vector $\Delta x$,
$$x_{\mathrm{new}} = x_{\mathrm{old}} + \Delta x$$
of which the ‘active’ elements satisfy
$$\Delta x_j = -x_j, \qquad j \in A, \qquad (25)$$
where the $x_j$ are the negative local minima of the solution. We do not bother about the
‘inactive’ elements ($j \in I$) of $\Delta x$.
We are going to change the Lagrange multipliers in the active set by so much that (25)
is satisfied. The other Lagrange multipliers must remain zero, because they are not in the
active set. In other words, we want to update the vector of multipliers by a vector $\Delta l$,
$$l_{\mathrm{new}} = l_{\mathrm{old}} + \Delta l$$
of which the ‘inactive’ elements satisfy
$$\Delta l_j = 0, \qquad j \in I. \qquad (26)$$

Because equation (10) is linear in l, it follows that
$$\Delta x = [B + \lambda C]^{-1}\, \Delta l. \qquad (27)$$
This is a linear system of n equations with n unknowns: p unknowns $\Delta l_j$, $j \in A$, plus
$(n-p)$ unknowns $\Delta x_j$, $j \in I$. (The other $\Delta l_j$ and $\Delta x_j$ are given by (25) and (26).) Because
we are not interested in the unknowns $\Delta x_j$, $j \in I$, we remove them from the system (27).
This gives
$$\Delta x_j = \sum_{i=1}^{n} \frac{\langle y_i | \Delta l\rangle}{1 + (\lambda - 1)\mu_i}\, (y_i)_j, \qquad j \in A, \qquad (28)$$
where we have used equation (18) and $(y_i)_j$ denotes the jth element of $|y_i\rangle$. This is a linear
system of p equations, with p unknowns ($\Delta l_j$, $j \in A$). The solution of this system gives an
update of the Lagrange multipliers.
new set of Lagrange multipliers must be checked for positivity, condition (9). Those that fail
this test must be removed from the active set A. With the new set of Lagrange multipliers,
we return to section 5.1, where the solution to equation (19) yields a new λ and equation (18)
a new solution. This iterative process is repeated until all constraints are satisfied.
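One way to organize this inner update is to note that, with $\Delta l$ supported on the active set, (28) becomes $\Delta x_j = \sum_{k\in A} M_{jk}\,\Delta l_k$ with $M_{jk} = \sum_i (y_i)_j (y_i)_k / (1 + (\lambda-1)\mu_i)$, a $p \times p$ system. A sketch continuing the earlier snippets, where active holds the indices of the negative local minima:

```python
import numpy as np

def update_active_multipliers(Y, mu, lam, x, l, active):
    """Solve the p x p system (28) for Delta l on the active set and update l.

    Enforces Delta x_j = -x_j for j in the active set, eq. (25); inactive
    multipliers stay unchanged, eq. (26).
    """
    YA = Y[active, :]                                  # rows j in A of the eigenvectors
    w = 1.0 / (1.0 + (lam - 1.0) * mu)                 # weights 1 / (1 + (lam-1) mu_i)
    M = (YA * w) @ YA.T                                # M_jk = sum_i w_i (y_i)_j (y_i)_k
    dx = -x[active]                                    # eq. (25)
    dl_active = np.linalg.solve(M, dx)                 # p unknowns Delta l_j, j in A
    l_new = l.copy()
    l_new[active] += dl_active
    return l_new
```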
The iterative procedure aims to find the correct set of active constraints. With these, the
KT conditions will be satisfied, and consequently, the constrained optimization problem is
solved exactly. We have not proved that the procedure converges, but in practice we have
encountered no problems in this respect.

6. The algorithm in practice

The recipe for the algorithm is the following.


(1) Solve the generalized eigenproblem (14) (this is an $n^3$ process).
(2) Calculate the inner products $\langle b|y_i\rangle$ and $\langle c|y_i\rangle$ ($n^2$ processes).
(3) Solve the Lagrange multiplier λ from (19) ($n^1$ process).
(4) Use the results in (18) to calculate the solution ($n^2$ process).
(5) Check if the solution satisfies the positivity constraints ($n^1$ process). If so: ready. If
not: continue.
(6) Add the negative local minima to the active set ($n^1$ process).
(7) Solve $\Delta l$ from (28) and update l ($p^3$ process).
(8) Check whether the Lagrange multipliers satisfy $l_j \ge 0$ ($p^1$ process). If not, remove the
violators from the active set and go to step (7).
(9) Calculate the inner products $\langle l|y_i\rangle$ ($n \times p$ process). Go to step (3).
For the stability of the algorithm, it is important that the matrices B and C are scaled
so that their eigenvalues are of comparable size. If either B or C is positive definite, the
algorithm can be simplified and scaling is not necessary.
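One simple (and by no means unique) way to achieve comparable eigenvalue magnitudes is to divide each matrix by an estimate of its largest eigenvalue and absorb the two factors into λ afterwards; a sketch:

```python
import numpy as np

def scale_pencil(B, C):
    """Scale B and C so that their spectra are of comparable size."""
    sB = np.linalg.norm(B, 2)          # largest eigenvalue of symmetric PSD B
    sC = np.linalg.norm(C, 2)
    return B / sB, C / sC, sB, sC      # keep the factors to rescale multipliers later
```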
The algorithm has been successfully applied to two different limited angle tomography
experiments: tomography of the ionosphere (Fehmers et al 1998) and of tokamak plasma
emissivity (Ingesson et al 1998). The results were compared with the solutions of the SQP
algorithm from the NAG library (Numerical Algorithms Group, routine E04UCF). Indeed,
the solutions were the same, but the SQP algorithm is slower than ours. Moreover, it fails for
large problems (n = 2000), where our algorithm works smoothly.
The heart and computationally most demanding part of the algorithm is the solution
of the generalized eigenproblem. Routine F02AEF from the NAG library does this job
very well. This routine takes advantage of the symmetry and also performs the correct
normalization. In the ionospheric tomography application, where n = 2000, the routine takes
roughly 25 CPU minutes on a Silicon Graphics Power Challenge with R8000 processors. The

rest of the algorithm takes only a few minutes, because the number of active constraints p is
much smaller than n. Typically, 100 < p < 200. When a series of problems from the same
experiment must be solved, such as in a tomographic imaging series, the computationally
expensive part needs to be done only once.

Acknowledgment

The work presented here was financially supported by the Netherlands Foundation for
Scientific Research (NWO—Nederlandse Organisatie voor Wetenschappelijk Onderzoek).

References

Bertero M, De Mol C and Pike E R 1988 Linear inverse problems with discrete data: II. Stability and regularization
Inverse Problems 4 573–94
Fehmers G C, Kamp L P J and Sluijter F W 1998 A model-independent algorithm for ionospheric tomography: I.
Theory and tests Radio Sci. 33 149–63
Fletcher R 1993 Practical Methods of Optimization 2nd edn (Chichester: Wiley)
Gantmacher F R 1959 The Theory of Matrices (New York: Chelsea)
Gill P E, Murray W and Wright M H 1984 Practical Optimization 4th edn (London: Academic)
Golub G H and Van Loan C F 1989 Matrix Computations 2nd edn (London: Johns Hopkins University Press)
Gull S F and Daniell G J 1978 Image reconstruction from incomplete and noisy data Nature 272 686–90
Hansen P C 1992 Numerical tools for analysis and solution of Fredholm integral equations of the first kind Inverse
Problems 8 849–72
Ingesson L C, Alper B, Chen H, Edwards A W, Fehmers G C, Fuchs J C, Giannella R, Gill R D, Lauro-Taroni
L and Romanelli M 1998 Soft x-ray tomography during ELMs and impurity injection in JET Nucl. Fusion
submitted
Ivanov V K 1962 On linear problems which are not well posed Sov. Math. Dokl. 3 981–3
Oldenburg D W 1994 Practical strategies for the solution of large-scale electromagnetic inverse problems Radio
Sci. 29 1081–99
Parlett B N 1980 The Symmetric Eigenvalue Problem (Englewood Cliffs, NJ: Prentice-Hall)
Phillips D L 1962 A technique for the numerical solution of certain integral equations of the first kind J. Assoc.
Comput. Mach. 9 84–97
Tikhonov A N 1963 Solution of incorrectly formulated problems and the regularization method Sov. Math. Dokl.
4 1035–8
Turchin V F, Kozlov V P and Malkevich M S 1971 The use of mathematical-statistics methods in the solution of
incorrectly posed problems Sov. Phys.–Usp. 13 681–703
