0018-9286 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TAC.2018.2869462, IEEE
Transactions on Automatic Control
formulated and analyzed for the specific case of linear systems.

The paper is organized as follows. Basic definitions for multiobjective optimization and for the multiobjective optimal control problem are described in Section 2. Section 3 shows the basic transformations employed to obtain an iterative suboptimal control sequence. In Section 4, a policy iteration algorithm to solve the multiobjective optimization problem is designed, with considerations to allow its implementation in practical applications. Section 5 studies the linear systems case. Finally, Section 6 concludes with a numerical example.

II. BASIC DEFINITIONS

In this section, various definitions to develop multiobjective optimization algorithms for dynamical systems are reviewed.

A. Pareto optimality

Multiobjective optimization deals with the problem of minimizing two or more objective functions simultaneously [10]. In mathematical terms, this problem is expressed as

    \min_{x \in X} V(x)    (1)

where x \in \mathbb{R}^n is selected inside a feasible set X, and V : \mathbb{R}^n \to \mathbb{R}^M is a vector function with M elements, V(x) = [V_1(x), \ldots, V_M(x)]^T, with V_i(x), i = 1, \ldots, M, the functions to be minimized. In the general case, there does not exist a solution x that achieves the minimization of all functions V_i(x) simultaneously, and the concepts of Pareto domination and Pareto optimality must be introduced.

Definition 1. A vector W \in \mathbb{R}^M is said to Pareto dominate vector V \in \mathbb{R}^M if W_j \le V_j for all j = 1, \ldots, M, and W_j < V_j for at least one j, where V_j and W_j are the j-th entries of vectors V and W, respectively.

The following definition states a specific notation that we use throughout this paper.

Definition 2. The notation W \not\prec V, for vectors W \in \mathbb{R}^M and V \in \mathbb{R}^M, indicates that W is not Pareto dominated by V, i.e., either V = W or there is at least one entry j such that W_j < V_j.

Consider the system dynamics

    \dot{x} = f(x, u)    (2)

where x \in \mathbb{R}^n and u \in \mathbb{R}^m are the state vector and the control input of the system, respectively, and f is a continuously differentiable function.

In the multiobjective optimization problem, the performance of system (2) is evaluated with respect to M different performance indices

    J_j(x(0), u) = \int_0^\infty L_j(x(t), u) \, dt,    (3)

j = 1, \ldots, M, where each L_j is a continuously differentiable function. The feedback control function u(x) is said to be admissible if it is continuous, stabilizes the dynamics (2), and makes J_j(x, u(x)) finite for all j = 1, \ldots, M. The class of functions satisfying these properties is denoted as U_0. Define the vector J as J = [J_1, \ldots, J_M]^T. It is our interest to find a function u(x) \in U_0 such that vector J is minimized in the Pareto sense.

For a fixed control policy u(x), define the value functions

    V_j(x(t)) = \int_t^\infty L_j(x(\tau), u) \, d\tau,  j = 1, \ldots, M.    (4)

Let V = [V_1, \ldots, V_M]^T. A differential equivalent to the value function (4) is given by the Bellman equations

    0 = L_j(x, u) + \nabla V_j^T f(x, u) \triangleq H_j(x, \nabla V_j, u)    (5)

where \nabla V_j is the gradient of V_j, and H_j(x, \nabla V_j, u) is the j-th Hamiltonian function of the system. Note that the orbital derivative of V_j(x) is given by

    \dot{V}_j(x) = \nabla V_j^T \dot{x} = -L_j(x, u).    (6)

Define also

    V^*(x(0)) \triangleq \inf_{u \in U_0} J(x(0), u)    (7)

where, in general, V^* is not unique, and V^* \in \mathcal{J} with \mathcal{J} the Pareto front of vector J.

Define the Pareto optimal vector H^* as

    H^*(x, \nabla V) = \min_{u \in U_0} H(x, \nabla V, u)    (8)

where H = [H_1, \ldots, H_M]^T and \nabla V = [\nabla V_1, \ldots, \nabla V_M]^T. H^* is
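The elementwise comparisons in Definitions 1 and 2 can be checked mechanically. The sketch below is an illustration only; the function names are ours, not the paper's.

```python
def pareto_dominates(w, v):
    """Definition 1: w Pareto dominates v if w_j <= v_j for all j
    and w_j < v_j for at least one j."""
    assert len(w) == len(v)
    return (all(wj <= vj for wj, vj in zip(w, v))
            and any(wj < vj for wj, vj in zip(w, v)))

def not_dominated_by(w, v):
    """Definition 2's relation: w is not Pareto dominated by v,
    i.e., either v == w or w_j < v_j for at least one entry j."""
    return not pareto_dominates(v, w)

# w improves the second objective without worsening the first: w dominates v
print(pareto_dominates([1.0, 2.0], [1.0, 3.0]))   # True
# incomparable vectors: neither dominates the other
print(pareto_dominates([1.0, 4.0], [2.0, 3.0]))   # False
print(not_dominated_by([1.0, 4.0], [2.0, 3.0]))   # True
```

Note that equal vectors do not dominate each other, since domination requires strict improvement in at least one entry.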
III. MULTIOBJECTIVE SUBOPTIMAL CONTROL SEQUENCES

This section defines and analyzes transformations to design suboptimal control policies in an iterative manner. This is an extension for multiobjective optimization of the results in [3].

For this entry j, let V_j^* solve the Bellman equation for u^* and V_j solve the Bellman equation for u. Then, H_j(x, \nabla V_j, u^*) \le H_j(x, \nabla V_j, u) = H_j(x, \nabla V_j^*, u^*) = 0. Now, by
this section to design a suboptimal control sequence.

Theorem 3. Let V \ge 0 and \bar{V} = T(V), and let Assumption 1 hold. Then, H^*(x, \nabla V) \le 0 implies V^* \le \bar{V} \le V, with V^* Pareto optimal.

Proof. Take \bar{u} = u^*(x, \nabla V). By Assumption 1 and Lemma 4, H_j^*(x, \nabla V) \le 0 for every j = 1, \ldots, M. Then,

    \dot{V}_j = H_j^*(x, \nabla V) - L_j(x, \bar{u}) \le -L_j(x, \bar{u}) = \dot{J}_j.

As \bar{V}_j = T(V)_j = J_j implies \dot{\bar{V}}_j = \dot{J}_j, then \dot{V}_j \le \dot{\bar{V}}_j. Integrating the inequality we get \bar{V}_j \le V_j for all entries j.

In the single objective optimization problem, it is clear that an iterative repetition of the operation in Theorem 3 leads the function vector V to the unique optimal value function V^*. In the multiobjective optimization case, Assumption 1 is required to prevent leaping among different Pareto optima at each iteration, as proven in Lemma 5 and Theorem 4.

Lemma 5. Let V^* be Pareto optimal and let Assumption 1 hold. If W^* is any other Pareto optimal value function such that V^* \ne W^*, then W^* \ne T(V^*).

Proof. Assume W = T(V^*). If Assumption 1 holds, by Lemma 4 we have H_j^*(x, \nabla V^*) \le 0 for all entries j. By Theorem 3, we have W_j \le V_j^* for all j. As W_j^* > V_j^* for some j for any other Pareto optimal vector W^*, such a W^* cannot be reached.

Theorem 4. If a Pareto optimal solution V^* exists, then V^* = T(V^*). Conversely, V = T(V) implies V = V^*.

Proof. Consider two Pareto optimal vectors V^* and W^*. By Theorem 3, if \bar{V} = T(V^*), then W^* \not\prec \bar{V} \le V^*; by Lemma 5 and the definition of Pareto optimality, \bar{V} \le V^* implies \bar{V} = V^*.

developed. This section presents the integral reinforcement learning in multiobjective optimization form.

Notice that the j-th value function (4) can be expressed as

    V_j(x(t)) = \int_t^{t+T} L_j(x(\tau), u) \, d\tau + V_j(x(t+T))    (10)

for any time interval T > 0. Given the functions V_j(x) and L_j(x, u), equation (10) does not require knowledge about the system dynamics (2). Lemma 6 shows that the solution V_j(x) of (10) is the value function (4) that solves equation (5).

Lemma 6. Assume the control policy u(x) stabilizes the system dynamics (2). Then, the solution V_j(x) of equation (10) is equivalent to the solution of the Bellman equation (5).

Proof. If equation (5) holds for V_j, then \dot{V}_j = \nabla V_j^T f(x, u) = -L_j(x, u). Integrating both sides of the equation, we get

    \int_t^{t+T} L_j(x, u) \, d\tau = -\int_t^{t+T} \dot{V}_j(x(\tau)) \, d\tau = -V_j(x(t+T)) + V_j(x(t))

which is the same equation as (10).

The following algorithm presents the multiobjective optimal controller by reinforcement learning. The policy evaluation step consists of solving equation (10). This corresponds to the transformation T_2 in Definition 4. The policy improvement step is based on equation (8), and corresponds to the transformation T_1. Convergence of Algorithm 1 is proven in Theorem 6.

Algorithm 1. Integral multiobjective policy iteration.
1. Select an admissible control policy u^0.
2. Solve for V^k from the set of equations

    V_j^k(x(t)) = \int_t^{t+T} L_j(x(\tau), u) \, d\tau + V_j^k(x(t+T)).    (11)

3. Update the control policy as

    u^{k+1} = \arg\min_u H(x, \nabla V^k, u),    (12)
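Algorithm 1 can be illustrated on a toy scalar example. The sketch below is ours, not the paper's implementation: it uses a two-objective scalar linear system, performs the policy evaluation step via equation (11) from simulated trajectory data alone (the drift parameter a is never used in the evaluation, only to generate the data), and replaces the Pareto improvement step (12) with a fixed weighted-sum scalarization of the Hamiltonians, which is one simple way to single out one Pareto point. All numerical values and the forward-Euler integration are our choices.

```python
import numpy as np

# Scalar system dx/dt = a*x + b*u with two quadratic stage costs
# L_j = q_j x^2 + r_j u^2 (j = 1, 2).  The true (a, b) simulate the data;
# the evaluation step (11) itself never uses a.
a, b = 1.0, 1.0
q = np.array([1.0, 5.0])
r = np.array([1.0, 0.5])
alpha = np.array([0.5, 0.5])   # fixed scalarization weights (our assumption)

def rollout(k, x0=1.0, T=0.5, dt=1e-4):
    """Apply u = k*x for T seconds; return x(t+T) and both cost integrals."""
    x, cost = x0, np.zeros(2)
    for _ in range(int(T / dt)):
        u = k * x
        cost += (q * x * x + r * u * u) * dt   # integral in (11), Euler rule
        x += (a * x + b * u) * dt
    return x, cost

k = -3.0                         # initial admissible (stabilizing) gain
for _ in range(20):
    x0 = 1.0
    xT, cost = rollout(k, x0)
    # Policy evaluation (11): with V_j(x) = p_j x^2, one data sample gives
    # p_j (x0^2 - xT^2) = cost_j, solvable for p_j from data alone.
    p = cost / (x0 ** 2 - xT ** 2)
    # Policy improvement on the scalarized Hamiltonian
    # sum_j alpha_j (q_j x^2 + r_j u^2 + 2 p_j x (a x + b u)):
    # setting its derivative in u to zero gives u = k*x with
    k = -b * (alpha @ p) / (alpha @ r)

# k converges to the scalarized LQR gain -1 - sqrt(5) ≈ -3.236
print(k, p)
```

As in the paper's linear-systems setting, the improvement step uses the input gain b but not the drift a, which only enters through the measured trajectory data.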
    x^T(t) P_j x(t) = \int_t^{t+T} (x^T Q_j x + u^T R_j u) \, d\tau + x^T(t+T) P_j x(t+T)    (16)

Solving this equation becomes an easier task if we employ the Kronecker product to express the term x^T P_j x as x^T P_j x = vec(P_j)^T (x \otimes x), where vec(P_j) is the column vector obtained by stacking the columns of P_j. Moreover, as matrix P_j is symmetric and the expression x \otimes x includes all possible products of the entries of x, each of the vectors vec(P_j) and x \otimes x includes repeated terms. Represent these vectors after removing all the redundant terms as \bar{p}_j and \bar{x}, respectively, which consist of n(n+1)/2 components. Now, we can write

    x^T P_j x = \bar{p}_j^T \bar{x}.    (17)

Using the expression (17), we rewrite equation (16) as

    \bar{p}_j^T (\bar{x}(t) - \bar{x}(t+T)) = \int_t^{t+T} (x^T Q_j x + u^T R_j u) \, d\tau    (18)

and the goal is to find the values of \bar{p}_j that satisfy (18) given the measurements x(t) and x(t+T), and the employed control input u. This objective can be achieved using recursive least squares after collecting several samples of equation (18) [5].

The Hamiltonian functions for this system are

    H_j = x^T Q_j x + u^T R_j u + 2 x^T P_j (Ax + Bu).    (19)

The optimal control policy u^* for system (13) is the input u = -Kx that makes the vector H = [H_1, \ldots, H_M]^T Pareto optimal. Several methods can be used to determine u^*. Here, we propose a general procedure that allows this problem to be solved by any multiobjective optimization software package. Substitute the policy u = -Kx in each of the Hamiltonian functions (19) to obtain

    H_j = x^T Q_j x + x^T K^T R_j K x + x^T P_j (A - BK) x + x^T (A - BK)^T P_j x.    (20)

means x^T S_j^* x \le x^T S_j x for S_j in (21) using any matrix K. Now we can write x^T (S_j - S_j^*) x \ge 0 and, therefore, S_j - S_j^* is a positive semidefinite matrix. Note that for the matrices (21) and (22), we have S_j - S_j^* = S_j' - S_j'^*. As all the eigenvalues of S_j' - S_j'^* are nonnegative, and the trace of a matrix is equal to the sum of its eigenvalues, then tr(S_j' - S_j'^*) \ge 0, which implies tr(S_j') \ge tr(S_j'^*). We conclude that matrix K^* generates the matrix S_j'^* with minimal sum of its eigenvalues.

By Lemma 7, minimization of the Hamiltonian vector H can be achieved by finding the gain matrix K^* such that, for given matrices P_j, j = 1, \ldots, M, we have

    K^* = \arg\min_K \left[ \sum_{i=1}^n \lambda_i (K^T R_1 K - P_1 B K - K^T B^T P_1), \; \ldots, \; \sum_{i=1}^n \lambda_i (K^T R_M K - P_M B K - K^T B^T P_M) \right]^T    (23)

Remark 3. Problem (23) is expressed without knowledge of matrix A of the system dynamics (13).

Algorithm 2 expresses the policy iteration procedure presented in Algorithm 1, modified for the linear systems case.

Algorithm 2.
1. Select an admissible control policy u^0 = K^0 x.
2. Solve the set of equations (11) for V^k.
3. Solve the multiobjective optimization problem (23) and update the control policies as u^{k+1} = K^{k+1} x.
Go to step 2. On convergence, stop.

VI. SIMULATION RESULTS

Algorithm 2 is now employed to achieve stabilization of the linearized double inverted pendulum in a cart [19], [20], represented by the dynamic equations (13), where
    A = \begin{bmatrix}
    0 & 0 & 0 & 1 & 0 & 0 \\
    0 & 0 & 0 & 0 & 1 & 0 \\
    0 & 0 & 0 & 0 & 0 & 1 \\
    0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 86.69 & -21.61 & 0 & 0 & 0 \\
    0 & -40.31 & 39.45 & 0 & 0 & 0
    \end{bmatrix}, \quad
    B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 6.64 \\ 0.08 \end{bmatrix},

state x_1 is the position of the cart, x_2 and x_3 are the angles of both pendulums, and the weighting matrices are

    Q_1 = \begin{bmatrix}
    200 & 0 & 0 & 0 & 0 & 0 \\
    0 & 200 & 0 & 0 & 0 & 0 \\
    \vdots
    \end{bmatrix}, \quad
    Q_2 = \begin{bmatrix}
    1 & 0 & 0 & 0 & 0 & 0 \\
    0 & 1 & -1 & 0 & 0 & 0 \\
    \vdots
    \end{bmatrix}

[Figure: state trajectories (Position, Angle 1, Angle 2) versus time.]
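The vectorization identity (17) and the least-squares view of equation (18) can be sketched as follows. This is an illustration under our own choices: the dimension is a toy value, a batch least-squares solve stands in for the recursive least squares the paper cites, and the right-hand side is synthesized from a known matrix instead of a measured cost integral, so the snippet demonstrates the algebra rather than a closed-loop experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # toy state dimension (our choice)

def dedup_p(P):
    """Non-redundant parameter vector p-bar of eq. (17): each diagonal entry
    of the symmetric P once, each off-diagonal entry doubled."""
    iu = np.triu_indices(P.shape[0])
    return np.where(iu[0] == iu[1], 1.0, 2.0) * P[iu]

def dedup_x(x):
    """Non-redundant regressor x-bar: all products x_i x_j with i <= j."""
    iu = np.triu_indices(x.size)
    return x[iu[0]] * x[iu[1]]

# Sanity check of identity (17): x^T P x = p-bar^T x-bar
M = rng.standard_normal((n, n))
P_true = (M + M.T) / 2
x = rng.standard_normal(n)
assert np.isclose(x @ P_true @ x, dedup_p(P_true) @ dedup_x(x))

# Batch least-squares version of (18): each sample contributes one row
# x-bar(t) - x-bar(t+T) and one cost integral on the right-hand side
# (synthesized here from P_true in place of a measurement).
p_true = dedup_p(P_true)
rows, rhs = [], []
for _ in range(40):
    xt, xT = rng.standard_normal(n), rng.standard_normal(n)
    row = dedup_x(xt) - dedup_x(xT)
    rows.append(row)
    rhs.append(p_true @ row)
p_hat, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
assert np.allclose(p_hat, p_true, atol=1e-6)
```

As the text notes, only the n(n+1)/2 non-redundant parameters are estimated, which keeps the regression well-posed despite the repeated terms in the full Kronecker product.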