A Complete Solution to Optimal Control and Stabilization for Mean-field Systems: Part I, Discrete-time Case
Huanshui Zhang∗ and Qingyuan Qi
arXiv:1608.06363v1 [math.OC] 23 Aug 2016
Abstract—Different from most previous works, this paper provides a thorough solution to the fundamental problems of linear-quadratic (LQ) control and stabilization for discrete-time mean-field systems under basic assumptions. Firstly, the sufficient and necessary condition for the solvability of the mean-field LQ control problem is presented in analytic form, based on the maximum principle developed in this paper; this contrasts with results in the literature, where only operator-type solvability conditions were given. The optimal controller is given in terms of a coupled Riccati equation, which is derived from the solution to a forward and backward stochastic difference equation (FBSDE). Secondly, the sufficient and necessary stabilization conditions are explored. It is shown that, under the exact observability assumption, the mean-field system is stabilizable in the mean square sense if and only if a coupled algebraic Riccati equation (ARE) has a unique solution $P$ and $\bar{P}$ satisfying $P > 0$ and $P + \bar{P} > 0$. Furthermore, under the exact detectability assumption, which is weaker than exact observability, we show that the mean-field system is stabilizable in the mean square sense if and only if the coupled ARE has a unique solution $P$ and $\bar{P}$ satisfying $P \ge 0$ and $P + \bar{P} \ge 0$. The key techniques adopted are the maximum principle and the solution to the FBSDE obtained in this paper. The derived results form the basis for solving the mean-field control problem for continuous-time systems [18] and other related problems.

Index Terms—Mean-field LQ control, maximum principle, Riccati equation, optimal controller, stabilizable controller.

This work is supported by the National Science Foundation of China under Grants 61120106011, 61573221, 61633014.
H. Zhang and Q. Qi are with School of Control Science and Engineering, Shandong University, Jinan 250061, P.R. China. H. Zhang is the corresponding author (hszhang@sdu.edu.cn).

I. INTRODUCTION

In this paper, the mean-field linear quadratic optimal control and stabilization problems are considered for the discrete-time case. Different from the classical stochastic control problem, mean-field terms appear in the system dynamics and cost function, which combines mean-field theory with stochastic control problems. The mean-field stochastic control problem has been a hot research topic since the 1950s. The system state is described by a controlled mean-field stochastic differential equation (MF-SDE), which was first proposed in [14], and the initial study of MF-SDEs was given by [16]. Since then, many contributions have been made in studying MF-SDEs and related topics; see, for example, [7]-[11] and the references cited therein. Recent developments for mean-field control problems can be found in [5], [6], [13], [20], [9], [17] and references therein.

Reference [20] dealt with the continuous-time finite horizon mean-field LQ control problem; a sufficient and necessary solvability condition of the problem was presented in terms of operator criteria. By using a decoupling technique, the optimal controller was designed via two Riccati equations. Furthermore, the continuous-time mean-field LQ control and stabilization problem for the infinite horizon was investigated in [13], where the equivalence of several notions of stability for mean-field systems was established. It was shown that the optimal mean-field LQ controller for the infinite horizon case can be presented via AREs.

For the discrete-time mean-field LQ control problem, [9] and [17] studied the finite horizon case and the infinite horizon case, respectively. In [9], a necessary and sufficient solvability condition for the finite horizon discrete-time mean-field LQ control problem was presented in operator type. Furthermore, under stronger conditions, the explicit optimal controller was derived using a matrix dynamical optimization method, which in fact gives only a sufficient solvability condition for the discrete-time mean-field LQ control problem [9]. Besides, for the infinite horizon case, the equivalence of $L^2$ open-loop stabilizability and $L^2$ closed-loop stabilizability was studied, and the stabilizing condition was also investigated in [17].

However, it should be highlighted that the LQ control and stabilization problems for mean-field systems remain to be further investigated, although major progress has been made in the above works [9], [13], [17], [20] and references therein. The basic reasons are twofold. Firstly, the solvability for the LQ control was given in terms of an operator-type condition [9], which is difficult to verify in practice. Secondly, the stabilization control problem of the mean-field system has not been essentially solved, as only sufficient conditions of stabilization were given in the previous works.

In this paper, we aim to provide a complete solution to the problems of optimal LQ control and stabilization for discrete-time mean-field systems. Different from previous works, we derive the maximum principle (MP) for the discrete-time mean-field LQ control problem, which is new to the best of our knowledge. Then, by solving the coupled state equation (forward) and costate equation (backward), the optimal LQ controller is obtained naturally from the equilibrium condition, and accordingly the sufficient and necessary solvability condition is obtained in explicit form. The controller is designed via a coupled Riccati equation, which is derived from the solution to the FBSDE and is similar to the case of standard LQ control. Finally, with convergence
analysis on the coupled Riccati equation, the infinite horizon LQ controller and the stabilization condition (sufficient and necessary) are explored by defining the Lyapunov function with the optimal cost function. Two stabilization results are obtained under two different assumptions. The first is under the standard assumption of exact observability: it is shown that the mean-field system is stabilizable in the mean square sense if and only if a coupled ARE has a unique solution $P$ and $\bar{P}$ satisfying $P > 0$ and $P + \bar{P} > 0$. The second is under the weaker assumption of exact detectability: it is shown that the mean-field system is stabilizable in the mean square sense if and only if the coupled ARE admits a unique solution $P$ and $\bar{P}$ satisfying $P \ge 0$ and $P + \bar{P} \ge 0$.

It should be pointed out that the presented results are parallel to the solution of the standard stochastic LQ problem, with similar results such as controller design and stabilization conditions under the same assumptions on the system and weighting matrices. In particular, the weighting matrices $R_k$ and $R_k + \bar{R}_k$ are only required to be positive semi-definite for the optimal controller designed in this paper. This is more standard than the previous work [9], where the matrices are assumed to be positive definite.

A preliminary version of this paper was submitted as [21], in which the finite horizon optimal control for mean-field systems was considered. In this paper, both the finite horizon control problem and the infinite horizon optimal control and stabilization problems are investigated. The remainder of this paper is organized as follows. Section II presents the maximum principle and the solution to finite horizon mean-field LQ control. In Section III, the infinite horizon optimal control and stabilization problems are investigated. Numerical examples are given in Section IV to illustrate the main results of this paper. Some concluding remarks are given in Section V. Finally, relevant proofs are detailed in the Appendices.

Throughout this paper, the following notations and definitions are used.

Notations and definitions: $I_n$ denotes the identity matrix of order $n$; the superscript $'$ denotes the transpose of a matrix. A real symmetric matrix $A > 0$ (or $\ge 0$) means that $A$ is strictly positive definite (or positive semi-definite). $\mathbb{R}^n$ signifies the $n$-dimensional Euclidean space. $B^{-1}$ indicates the inverse of a real matrix $B$. $\{\Omega, \mathcal{F}, \mathcal{P}, \{\mathcal{F}_k\}_{k \ge 0}\}$ represents a complete probability space, with natural filtration $\{\mathcal{F}_k\}_{k \ge 0}$ generated by $\{x_0, w_0, \cdots, w_k\}$ augmented by all the $\mathcal{P}$-null sets. $E[\cdot \mid \mathcal{F}_k]$ means the conditional expectation with respect to $\mathcal{F}_k$, and $\mathcal{F}_{-1}$ is understood as $\{\emptyset, \Omega\}$.

Definition 1. For a random vector $x$, if $E(x'x) = 0$, we call it a zero random vector, i.e., $x = 0$.

II. FINITE HORIZON MEAN-FIELD LQ CONTROL PROBLEM

A. Problem Formulation and Preliminaries

1) Problem Formulation: Consider the following discrete-time mean-field system
\[
\begin{cases}
x_{k+1} = (A_k x_k + \bar{A}_k E x_k + B_k u_k + \bar{B}_k E u_k) + (C_k x_k + \bar{C}_k E x_k + D_k u_k + \bar{D}_k E u_k) w_k, \\
x_0 = \xi,
\end{cases} \tag{1}
\]
where $A_k, \bar{A}_k, C_k, \bar{C}_k \in \mathbb{R}^{n \times n}$ and $B_k, \bar{B}_k, D_k, \bar{D}_k \in \mathbb{R}^{n \times m}$ are given deterministic coefficient matrices. $x_k \in \mathbb{R}^n$ is the state process and $u_k \in \mathbb{R}^m$ is the control process. The system noise $\{w_k\}_{k=0}^{N}$ is scalar-valued random white noise with zero mean and variance $\sigma^2$. $E$ is the expectation taken over the noise $\{w_k\}_{k=0}^{N}$ and the initial state $\xi$. Denote by $\mathcal{F}_k$ the natural filtration generated by $\{\xi, w_0, \cdots, w_k\}$ augmented by all the $\mathcal{P}$-null sets.

By taking expectations on both sides of (1), we obtain
\[
E x_{k+1} = (A_k + \bar{A}_k) E x_k + (B_k + \bar{B}_k) E u_k. \tag{2}
\]

The cost function associated with system equation (1) is given by
\[
J_N = \sum_{k=0}^{N} E\big[ x_k' Q_k x_k + (E x_k)' \bar{Q}_k E x_k + u_k' R_k u_k + (E u_k)' \bar{R}_k E u_k \big] + E(x_{N+1}' P_{N+1} x_{N+1}) + (E x_{N+1})' \bar{P}_{N+1} E x_{N+1}, \tag{3}
\]
where $Q_k, \bar{Q}_k, R_k, \bar{R}_k, P_{N+1}, \bar{P}_{N+1}$ are deterministic symmetric matrices with compatible dimensions.

The finite horizon mean-field LQ optimal control problem is stated as follows:

Problem 1. For system (1) associated with cost function (3), find an $\mathcal{F}_{k-1}$-measurable controller $u_k$ such that (3) is minimized.

To guarantee the solvability of Problem 1, the following standard assumption is made.

Assumption 1. The weighting matrices in (3) satisfy $Q_k \ge 0$, $Q_k + \bar{Q}_k \ge 0$, $R_k \ge 0$, $R_k + \bar{R}_k \ge 0$ for $0 \le k \le N$, and $P_{N+1} \ge 0$, $P_{N+1} + \bar{P}_{N+1} \ge 0$.

2) Preliminaries: In order to solve the above problem, a basic result is first presented below.

Lemma 1. For any random vector $x \ne 0$, i.e., $E(x'x) \ne 0$ as defined in Definition 1, $E(x'Mx) \ge 0$ if and only if $M \ge 0$, where $M$ is a real symmetric matrix.

Proof. The proof is straightforward and is omitted here.

Remark 1. From Lemma 1, we immediately have:
1) For any $x$ satisfying $x = Ex \ne 0$, i.e., $x$ is deterministic, $x'Mx \ge 0$ if and only if $M \ge 0$.
2) For any random vector $x$ satisfying $Ex = 0$ and $x \ne 0$, $E(x'Mx) \ge 0$ if and only if $M \ge 0$.

Remark 2. Note that Lemma 1 and Remark 1 also hold if "$\ge$" in the conclusion is replaced by "$\le$", "$<$", "$>$" or "$=$".

B. Maximum Principle

In this subsection, we present a general result on the maximum principle for the general mean-field stochastic control problem, which is the basis for solving the problems studied in this paper.

Consider the general discrete-time mean-field stochastic system
\[
x_{k+1} = f^k(x_k, u_k, E x_k, E u_k, w_k). \tag{4}
\]
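As a concrete instance of the general system (4), the linear system (1) with $u_k = 0$ can be simulated directly and compared with the mean recursion (2). The sketch below is illustrative only and rests on assumptions that are not part of the paper: scalar, time-invariant coefficients, a deterministic initial state, and a two-point noise $w_k \in \{-\sigma, +\sigma\}$ (which has zero mean and variance $\sigma^2$). The function name `mean_by_enumeration` is hypothetical. Because the noise takes only two values, all noise paths can be enumerated and the expectation computed exactly rather than sampled.

```python
from itertools import product


def mean_by_enumeration(A, A_bar, C, C_bar, sigma, x0, steps):
    """Return (enumerated E x_steps, E x_steps predicted by recursion (2)).

    Assumptions (not from the paper): scalar coefficients, u_k = 0,
    deterministic x0, and two-point noise w_k in {-sigma, +sigma}.
    """
    # Mean recursion (2) with u_k = 0: E x_{k+1} = (A + A_bar) E x_k.
    means = [x0]
    for _ in range(steps):
        means.append((A + A_bar) * means[-1])

    # Exact expectation: average x_steps over all equally likely noise paths.
    total = 0.0
    for path in product((-sigma, sigma), repeat=steps):
        x = x0
        for k, w in enumerate(path):
            # System (1) with u_k = 0; the mean-field term uses E x_k = means[k].
            x = (A * x + A_bar * means[k]) + (C * x + C_bar * means[k]) * w
        total += x
    return total / 2 ** steps, means[-1]


enumerated, predicted = mean_by_enumeration(1.1, 0.2, 0.9, 0.5, 1.0, 1.0, 6)
print(enumerated, predicted)
```

The two printed numbers should coincide up to rounding, since the noise-dependent term in (1) has zero mean and therefore drops out of (2).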
In (4), $x_k$ and $u_k$ are the system state and control input, respectively, and $E x_k$, $E u_k$ are the expectations of $x_k$ and $u_k$. The scalar-valued $w_k$ is random white noise with zero mean and variance $\sigma^2$. $f^k(x_k, u_k, E x_k, E u_k, w_k)$ is, in general, a nonlinear function. The corresponding scalar performance index is given in the general form
\[
J_N = E\Big\{ \phi(x_{N+1}, E x_{N+1}) + \sum_{k=0}^{N} L^k(x_k, u_k, E x_k, E u_k) \Big\}, \tag{5}
\]
where $\phi(x_{N+1}, E x_{N+1})$ is a function of the final time $N+1$, $x_{N+1}$ and $E x_{N+1}$, and $L^k(x_k, u_k, E x_k, E u_k)$ is a function of $x_k$, $E x_k$, $u_k$, $E u_k$ at each time $k$.

From system (4), we have that
\[
E x_{k+1} = E[f^k(x_k, u_k, E x_k, E u_k, w_k)] = g^k(x_k, u_k, E x_k, E u_k), \tag{6}
\]
where $g^k(x_k, u_k, E x_k, E u_k)$ is a deterministic function.

The general maximum principle (necessary condition) to minimize (5) is given in the following theorem.

Theorem 1. The necessary conditions for minimizing $J_N$ are given as
\[
0 = E\Big\{ (L^k_{u_k})' + E(L^k_{E u_k})' + \begin{bmatrix} f^k_{u_k} \\ g^k_{u_k} \end{bmatrix}' \lambda_k + E\Big\{ \begin{bmatrix} f^k_{E u_k} \\ g^k_{E u_k} \end{bmatrix}' \lambda_k \Big\} \,\Big|\, \mathcal{F}_{k-1} \Big\}, \tag{7}
\]
where the costate $\lambda_k$ obeys
\[
\lambda_{k-1} = E\Big\{ \begin{bmatrix} I_n \\ 0 \end{bmatrix} [L^k_{x_k} + E(L^k_{E x_k})]' + [\tilde{f}^k_{x_k}]' \lambda_k \,\Big|\, \mathcal{F}_{k-1} \Big\}, \tag{8}
\]
with final condition
\[
\lambda_N = \begin{bmatrix} (\phi_{x_{N+1}})' + E(\phi_{E x_{N+1}})' \\ 0 \end{bmatrix}, \tag{9}
\]
where
\[
f^k_{x_k} = \frac{\partial f^k}{\partial x_k}, \quad f^k_{u_k} = \frac{\partial f^k}{\partial u_k}, \quad f^k_{E x_k} = \frac{\partial f^k}{\partial E x_k}, \quad f^k_{E u_k} = \frac{\partial f^k}{\partial E u_k},
\]
\[
g^k_{x_k} = \frac{\partial g^k}{\partial x_k}, \quad g^k_{u_k} = \frac{\partial g^k}{\partial u_k}, \quad g^k_{E x_k} = \frac{\partial g^k}{\partial E x_k}, \quad g^k_{E u_k} = \frac{\partial g^k}{\partial E u_k},
\]
and
\[
\phi_{E x_{N+1}} = \frac{\partial \phi(x_{N+1}, E x_{N+1})}{\partial E x_{N+1}}, \quad \phi_{x_{N+1}} = \frac{\partial \phi(x_{N+1}, E x_{N+1})}{\partial x_{N+1}},
\]
\[
L^k_{x_k} = \frac{\partial L^k}{\partial x_k}, \quad L^k_{u_k} = \frac{\partial L^k}{\partial u_k}, \quad L^k_{E x_k} = \frac{\partial L^k}{\partial E x_k}, \quad L^k_{E u_k} = \frac{\partial L^k}{\partial E u_k},
\]
\[
\tilde{f}^k_{x_k} = \begin{bmatrix} f^k_{x_k} & f^k_{E x_k} \\ g^k_{x_k} & g^k_{E x_k} \end{bmatrix}, \quad k = 0, \cdots, N.
\]

Proof. See Appendix A.

C. Solution to Problem 1

Following Theorem 1, it is easy to obtain the following maximum principle for system (1) associated with the cost function (3).

Lemma 2. The necessary condition for minimizing (3) for system (1) can be stated as
\[
0 = E\Big\{ R_k u_k + \bar{R}_k E u_k + \begin{bmatrix} B_k + w_k D_k \\ 0 \end{bmatrix}' \lambda_k + E\Big\{ \begin{bmatrix} \bar{B}_k + w_k \bar{D}_k \\ B_k + \bar{B}_k \end{bmatrix}' \lambda_k \Big\} \,\Big|\, \mathcal{F}_{k-1} \Big\}, \tag{10}
\]
where the costate $\lambda_k$ satisfies the following iteration
\[
\lambda_{k-1} = E\Big\{ \begin{bmatrix} Q_k x_k + \bar{Q}_k E x_k \\ 0 \end{bmatrix} + \begin{bmatrix} A_k + w_k C_k & \bar{A}_k + w_k \bar{C}_k \\ 0 & A_k + \bar{A}_k \end{bmatrix}' \lambda_k \,\Big|\, \mathcal{F}_{k-1} \Big\}, \tag{11}
\]
with final condition
\[
\lambda_N = \begin{bmatrix} P_{N+1} & \bar{P}^{(1)}_{N+1} \\ \bar{P}^{(2)}_{N+1} & \bar{P}^{(3)}_{N+1} \end{bmatrix} \begin{bmatrix} x_{N+1} \\ E x_{N+1} \end{bmatrix}, \tag{12}
\]
where $\bar{P}^{(1)}_{N+1} = \bar{P}_{N+1}$, $\bar{P}^{(2)}_{N+1} = \bar{P}^{(3)}_{N+1} = 0$, and $P_{N+1}$ and $\bar{P}_{N+1}$ are given by the cost function (3).

In Lemma 2, $\lambda_k$ ($1 \le k \le N$) is the costate and (11) is the costate equation. (11) and the state equation (1) form the FBSDE system. (10) is termed the equilibrium equation (condition).

The main result of this section is stated below.

Theorem 2. Under Assumption 1, Problem 1 has a unique solution if and only if $\Upsilon^{(1)}_k$ and $\Upsilon^{(2)}_k$ for $k = 0, \cdots, N$, as given below, are all positive definite. In this case, the optimal controller $\{u_k\}_{k=0}^{N}$ is given as
\[
u_k = K_k x_k + \bar{K}_k E x_k, \tag{13}
\]
where
\[
K_k = -[\Upsilon^{(1)}_k]^{-1} M^{(1)}_k, \tag{14}
\]
\[
\bar{K}_k = -\big\{ [\Upsilon^{(2)}_k]^{-1} M^{(2)}_k - [\Upsilon^{(1)}_k]^{-1} M^{(1)}_k \big\}, \tag{15}
\]
and $\Upsilon^{(1)}_k$, $M^{(1)}_k$, $\Upsilon^{(2)}_k$, $M^{(2)}_k$ are given as
\[
\Upsilon^{(1)}_k = R_k + B_k' P_{k+1} B_k + \sigma^2 D_k' P_{k+1} D_k, \tag{16}
\]
\[
M^{(1)}_k = B_k' P_{k+1} A_k + \sigma^2 D_k' P_{k+1} C_k, \tag{17}
\]
\[
\Upsilon^{(2)}_k = R_k + \bar{R}_k + (B_k + \bar{B}_k)' (P_{k+1} + \bar{P}_{k+1}) (B_k + \bar{B}_k) + \sigma^2 (D_k + \bar{D}_k)' P_{k+1} (D_k + \bar{D}_k), \tag{18}
\]
\[
M^{(2)}_k = (B_k + \bar{B}_k)' (P_{k+1} + \bar{P}_{k+1}) (A_k + \bar{A}_k) + \sigma^2 (D_k + \bar{D}_k)' P_{k+1} (C_k + \bar{C}_k), \tag{19}
\]
while $P_k$ and $\bar{P}_k$ obey the following coupled Riccati equation for $k = 0, \cdots, N$:
\[
P_k = Q_k + A_k' P_{k+1} A_k + \sigma^2 C_k' P_{k+1} C_k - [M^{(1)}_k]' [\Upsilon^{(1)}_k]^{-1} M^{(1)}_k, \tag{20}
\]
\[
\begin{aligned}
\bar{P}_k = {} & \bar{Q}_k + A_k' P_{k+1} \bar{A}_k + \sigma^2 C_k' P_{k+1} \bar{C}_k + \bar{A}_k' P_{k+1} A_k + \sigma^2 \bar{C}_k' P_{k+1} C_k \\
& + \bar{A}_k' P_{k+1} \bar{A}_k + \sigma^2 \bar{C}_k' P_{k+1} \bar{C}_k + (A_k + \bar{A}_k)' \bar{P}_{k+1} (A_k + \bar{A}_k) \\
& + [M^{(1)}_k]' [\Upsilon^{(1)}_k]^{-1} M^{(1)}_k - [M^{(2)}_k]' [\Upsilon^{(2)}_k]^{-1} M^{(2)}_k,
\end{aligned} \tag{21}
\]
with final condition $P_{N+1}$ and $\bar{P}_{N+1}$ given by (3).

The associated optimal cost function is given by
\[
J_N^* = E(x_0' P_0 x_0) + (E x_0)' \bar{P}_0 (E x_0). \tag{22}
\]
Moreover, the costate $\lambda_{k-1}$ in (11) and the state $x_k$, $E x_k$ admit the following relationship,
\[
\lambda_{k-1} = \begin{bmatrix} P_k & \bar{P}^{(1)}_k \\ \bar{P}^{(2)}_k & \bar{P}^{(3)}_k \end{bmatrix} \begin{bmatrix} x_k \\ E x_k \end{bmatrix}, \tag{23}
\]
where $P_k$ obeys Riccati equation (20), $\bar{P}^{(1)}_k + \bar{P}^{(2)}_k + \bar{P}^{(3)}_k = \bar{P}_k$, and $\bar{P}_k$ satisfies Riccati equation (21).

Proof. See Appendix B.

Remark 3. We show that the necessary and sufficient solvability condition for the mean-field LQ optimal control is that the matrices $\Upsilon^{(1)}_k$, $\Upsilon^{(2)}_k$ are positive definite, which parallels the solvability condition of standard LQ control. It should be noted that the solvability conditions in previous works [20] and [9] for the mean-field LQ optimal control are given in operator type, which is not easy to verify in practice.

Remark 4. It should be noted that the weighting matrices $R_k$ and $R_k + \bar{R}_k$ in cost function (3) are only required to be positive semi-definite in this paper, which is more standard than the assumptions made in most previous works, where the matrices are required to be positive definite [9], [20].

Remark 5. The results presented in Theorem 2 contain the standard stochastic LQ control problem as a special case. Actually, when the coefficient matrices $\bar{A}_k, \bar{B}_k, \bar{C}_k, \bar{D}_k$ in (1) and the weighting matrices $\bar{Q}_k, \bar{R}_k, \bar{P}_{N+1}$ in (3) are zero for $0 \le k \le N$, by (16)-(19) and induction it is easy to see that $\Upsilon^{(1)}_k = \Upsilon^{(2)}_k$, $M^{(1)}_k = M^{(2)}_k$ and thus $\bar{K}_k = 0$. Furthermore, noting (71)-(73) and (21), we have $\bar{P}^{(1)}_k = \bar{P}^{(2)}_k = \bar{P}^{(3)}_k = \bar{P}_k = 0$, and (23) becomes $\lambda_{k-1} = \begin{bmatrix} P_k x_k \\ 0 \end{bmatrix}$. Referring to [3], [4] and [12], we know that (13), (22) and (23) are exactly the solution to the standard stochastic LQ control problem.

III. INFINITE HORIZON MEAN-FIELD LQ CONTROL AND STABILIZATION

A. Problem Formulation

In this section, the infinite horizon mean-field stochastic LQ control problem is solved. Besides, the necessary and sufficient stabilization condition for mean-field systems is investigated.

To study the stabilization problem for the infinite horizon case, we consider the following time-invariant system,
\[
\begin{cases}
x_{k+1} = (A x_k + \bar{A} E x_k + B u_k + \bar{B} E u_k) + (C x_k + \bar{C} E x_k + D u_k + \bar{D} E u_k) w_k, \\
x_0 = \xi,
\end{cases} \tag{24}
\]
where $A, \bar{A}, B, \bar{B}, C, \bar{C}, D, \bar{D}$ are all constant coefficient matrices with compatible dimensions. The system noise $w_k$ is defined as in (1).

The associated cost function is given by
\[
J = \sum_{k=0}^{\infty} E[x_k' Q x_k + (E x_k)' \bar{Q} E x_k + u_k' R u_k + (E u_k)' \bar{R} E u_k], \tag{25}
\]
where $Q, \bar{Q}, R, \bar{R}$ are deterministic symmetric weighting matrices with appropriate dimensions.

Throughout this section, the following assumption is made on the weighting matrices in (25).

Assumption 2. $R > 0$, $R + \bar{R} > 0$, and $Q \ge 0$, $Q + \bar{Q} \ge 0$.

Remark 6. It should be pointed out that Assumption 2 is a basic condition for investigating the stabilization of stochastic systems; see [12], [19], and so forth.

The following notions of stability and stabilization are introduced.

Definition 2. System (24) with $u_k = 0$ is called asymptotically mean square stable if, for any initial value $x_0$, there holds
\[
\lim_{k \to \infty} E(x_k' x_k) = 0.
\]

Definition 3. System (24) is stabilizable in the mean square sense if there exists an $\mathcal{F}_{k-1}$-measurable linear controller $u_k$ in terms of $x_k$ and $E x_k$ such that, for any random vector $x_0$, the closed loop of system (24) is asymptotically mean square stable.

Following [12], [22] and [23], the definitions of exact observability and exact detectability are respectively given below.

Definition 4. Consider the following mean-field system
\[
\begin{cases}
x_{k+1} = (A x_k + \bar{A} E x_k) + (C x_k + \bar{C} E x_k) w_k, \\
Y_k = \mathcal{Q}^{1/2} X_k,
\end{cases} \tag{26}
\]
where $\mathcal{Q} = \begin{bmatrix} Q & 0 \\ 0 & Q + \bar{Q} \end{bmatrix}$ and $X_k = \begin{bmatrix} x_k - E x_k \\ E x_k \end{bmatrix}$. System (26) is said to be exactly observable if, for any $N \ge 0$,
\[
Y_k = 0, \ \forall\, 0 \le k \le N \ \Rightarrow\ x_0 = 0,
\]
where the meaning of $Y_k = 0$ and $x_0 = 0$ is given by Definition 1. For simplicity, we rewrite system (26) as $(A, \bar{A}, C, \bar{C}, \mathcal{Q}^{1/2})$.

Definition 5. System $(A, \bar{A}, C, \bar{C}, \mathcal{Q}^{1/2})$ in (26) is said to be exactly detectable if, for any $N \ge 0$,
\[
Y_k = 0, \ \forall\, 0 \le k \le N \ \Rightarrow\ \lim_{k \to +\infty} E(x_k' x_k) = 0.
\]

Now we make the following two assumptions.

Assumption 3. $(A, \bar{A}, C, \bar{C}, \mathcal{Q}^{1/2})$ is exactly observable.

Assumption 4. $(A, \bar{A}, C, \bar{C}, \mathcal{Q}^{1/2})$ is exactly detectable.

Remark 7. • It is noted that Definition 5 gives a different definition of 'exactly detectability' from the one given in
previous work [17]. In fact, [17] considers the mean-field system with a different observation $y_k$,
\[
\begin{cases}
x_{k+1} = (A x_k + \bar{A} E x_k) + (C x_k + \bar{C} E x_k) w_k, \\
y_k = Q x_k + \bar{Q} E x_k.
\end{cases} \tag{27}
\]
As stated in [17], system (27) is 'exactly detectable' if, for any $N \ge 0$,
\[
y_k = 0, \ \forall\, 0 \le k \le N \ \Rightarrow\ \lim_{k \to +\infty} E(x_k' x_k) = 0.
\]
Obviously, this is different from the definition given in this paper.
• It should be highlighted that the exact detectability assumed in Assumption 4 for system (26) is weaker than the exact detectability assumed in [17]. In fact, if the system is exactly detectable in the sense of Assumption 4, then we have that
\[
Y_k = \mathcal{Q}^{1/2} X_k = 0 \ \Rightarrow\ \lim_{k \to +\infty} E(x_k' x_k) = 0.
\]
Note that $Y_k = \mathcal{Q}^{1/2} X_k = 0$ implies
\[
\begin{bmatrix} Q & 0 \\ 0 & Q + \bar{Q} \end{bmatrix}^{1/2} \begin{bmatrix} x_k - E x_k \\ E x_k \end{bmatrix} = 0. \tag{28}
\]
Equation (28) indicates that
\[
Q(x_k - E x_k) = 0, \quad \text{and} \quad (Q + \bar{Q}) E x_k = 0, \tag{29}
\]
and thus $Q x_k + \bar{Q} E x_k = 0$. Hence, if $(A, \bar{A}, C, \bar{C}, Q, \bar{Q})$ is 'exactly detectable' as defined in [17], then $(A, \bar{A}, C, \bar{C}, \mathcal{Q}^{1/2})$ is exactly detectable as defined in Definition 5.

Remark 8. Definition 4 and Definition 5 reduce to the standard exact observability and exact detectability for standard stochastic systems, respectively. Actually, with $\bar{A} = 0$, $\bar{C} = 0$, $\bar{Q} = 0$ in system (26), Definition 4 becomes $Q^{1/2} x_k = 0 \Rightarrow x_0 = 0$, which is exactly the observability definition for standard stochastic linear systems. Similarly, the exact detectability given in Definition 5 reduces to the standard exact detectability definition for standard stochastic systems. One can refer to [1], [12], [15], and so forth.

The problems of infinite horizon LQ control and stabilization for discrete-time mean-field systems are stated as follows.

Problem 2. Find an $\mathcal{F}_{k-1}$-measurable linear controller $u_k$ in terms of $x_k$ and $E x_k$ to minimize the cost function (25) and stabilize system (24) in the mean square sense.

B. Solution to Problem 2

For convenience of discussion, to make the time horizon $N$ explicit for the finite horizon mean-field LQ control problem, we re-denote $\Upsilon^{(1)}_k$, $\Upsilon^{(2)}_k$, $M^{(1)}_k$, $M^{(2)}_k$ in (16)-(19) as $\Upsilon^{(1)}_k(N)$, $\Upsilon^{(2)}_k(N)$, $M^{(1)}_k(N)$ and $M^{(2)}_k(N)$, respectively. Accordingly, $K_k$, $\bar{K}_k$, $P_k$ and $\bar{P}_k$ in (14), (15), (20) and (21) are respectively rewritten as $K_k(N)$, $\bar{K}_k(N)$, $P_k(N)$ and $\bar{P}_k(N)$. Moreover, the coefficient matrices $A_k, \bar{A}_k, B_k, \bar{B}_k, C_k, \bar{C}_k, D_k, \bar{D}_k$ in (13)-(23) are time invariant as in (24). The terminal weighting matrices $P_{N+1}$ and $\bar{P}_{N+1}$ in (3) are set to zero.

Before presenting the solution to Problem 2, the following lemmas are given first.

Lemma 3. For any $N \ge 0$, $P_k(N)$ and $\bar{P}_k(N)$ in (20)-(21) satisfy $P_k(N) \ge 0$ and $P_k(N) + \bar{P}_k(N) \ge 0$.

Proof. See Appendix C.

Lemma 4. With the assumption $R > 0$ and $R + \bar{R} > 0$, Problem 1 admits a unique solution.

Proof. From Lemma 3, we know that $P_k(N) \ge 0$ and $P_k(N) + \bar{P}_k(N) \ge 0$. Besides, as $R > 0$ and $R + \bar{R} > 0$, from (16) and (18) we know that $\Upsilon^{(1)}_k(N) > 0$ and $\Upsilon^{(2)}_k(N) > 0$ for $0 \le k \le N$. From Theorem 2, we can then conclude that Problem 1 admits a unique solution for any $N > 0$. This completes the proof.

Lemma 5. Under Assumptions 2 and 3, for any $k \ge 0$, there exists a positive integer $N_0 \ge 0$ such that $P_k(N_0) > 0$ and $P_k(N_0) + \bar{P}_k(N_0) > 0$.

Proof. See Appendix D.

Theorem 3. Under Assumptions 2 and 3, if system (24) is stabilizable in the mean square sense, the following assertions hold:
1) For any $k \ge 0$, $P_k(N)$ and $\bar{P}_k(N)$ are convergent, i.e.,
\[
\lim_{N \to +\infty} P_k(N) = P, \quad \lim_{N \to +\infty} \bar{P}_k(N) = \bar{P},
\]
where $P$ and $\bar{P}$ satisfy the following coupled ARE:
\[
P = Q + A' P A + \sigma^2 C' P C - [M^{(1)}]' [\Upsilon^{(1)}]^{-1} M^{(1)}, \tag{30}
\]
\[
\begin{aligned}
\bar{P} = {} & \bar{Q} + A' P \bar{A} + \sigma^2 C' P \bar{C} + \bar{A}' P A + \sigma^2 \bar{C}' P C \\
& + \bar{A}' P \bar{A} + \sigma^2 \bar{C}' P \bar{C} + (A + \bar{A})' \bar{P} (A + \bar{A}) \\
& + [M^{(1)}]' [\Upsilon^{(1)}]^{-1} M^{(1)} - [M^{(2)}]' [\Upsilon^{(2)}]^{-1} M^{(2)},
\end{aligned} \tag{31}
\]
while
\[
\Upsilon^{(1)} = R + B' P B + \sigma^2 D' P D \ge R > 0, \tag{32}
\]
\[
M^{(1)} = B' P A + \sigma^2 D' P C, \tag{33}
\]
\[
\Upsilon^{(2)} = R + \bar{R} + (B + \bar{B})' (P + \bar{P}) (B + \bar{B}) + \sigma^2 (D + \bar{D})' P (D + \bar{D}) \ge R + \bar{R} > 0, \tag{34}
\]
\[
M^{(2)} = (B + \bar{B})' (P + \bar{P}) (A + \bar{A}) + \sigma^2 (D + \bar{D})' P (C + \bar{C}). \tag{35}
\]
2) $P$ and $P + \bar{P}$ are positive definite.

Proof. See Appendix E.

We are now in a position to present the main results of this section. Two results are given: one is based on the assumption of exact observability (Assumption 3), and the other is based on the weaker assumption of exact detectability (Assumption 4).

Theorem 4. Under Assumptions 2 and 3, mean-field system (24) is stabilizable in the mean square sense if and only if there exists a unique solution $P$ and $\bar{P}$ to the coupled ARE (30)-(31) satisfying $P > 0$ and $P + \bar{P} > 0$.
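In this case, the stabilizable controller is given by
\[
u_k = K x_k + \bar{K} E x_k, \tag{36}
\]
where
\[
K = -[\Upsilon^{(1)}]^{-1} M^{(1)}, \tag{37}
\]
\[
\bar{K} = -\big\{ [\Upsilon^{(2)}]^{-1} M^{(2)} - [\Upsilon^{(1)}]^{-1} M^{(1)} \big\}, \tag{38}
\]
and $\Upsilon^{(1)}$, $M^{(1)}$, $\Upsilon^{(2)}$ and $M^{(2)}$ are given by (32)-(35).

Moreover, the stabilizable controller $u_k$ minimizes the cost function (25), and the optimal cost function is given by
\[
J^* = E(x_0' P x_0) + (E x_0)' \bar{P} E x_0. \tag{39}
\]

Proof. See Appendix F.

Theorem 5. Under Assumptions 2 and 4, mean-field system (24) is stabilizable in the mean square sense if and only if there exists a unique solution $P$ and $\bar{P}$ to the coupled ARE (30)-(31) satisfying $P \ge 0$ and $P + \bar{P} \ge 0$.
In this case, the stabilizable controller is given by (36). Moreover, the stabilizable controller $u_k$ minimizes the cost function (25), and the optimal cost function is given by (39).

Proof. See Appendix G.

Remark 9. Theorems 4 and 5 propose a new approach to stochastic control problems based on the maximum principle and the solution to the FBSDE developed in this paper, and thus essentially solve the optimal control and stabilization problems for mean-field stochastic systems under more standard assumptions than those of previous works [9] and [17].

IV. NUMERICAL EXAMPLES

A. The Finite Horizon Case

Consider system (1) and the cost function (3) with $N = 4$ and $\sigma^2 = 1$. We choose the coefficient matrices and weighting matrices in (1) and (3) to be time-invariant for $k = 1, 2, 3$ as:
\[
A_k = \begin{bmatrix} 1.1 & 0.9 & 0.8 \\ 0 & 0.6 & 1.2 \\ 0.4 & 0.9 & 1 \end{bmatrix}, \quad
\bar{A}_k = \begin{bmatrix} 0.5 & 1 & 0.9 \\ 0.8 & 0.7 & 1.2 \\ 1.1 & 2 & 1.9 \end{bmatrix}, \quad
B_k = \begin{bmatrix} 2 & 0.3 \\ 1.1 & 0.6 \\ 0.9 & 1.3 \end{bmatrix},
\]
\[
\bar{B}_k = \begin{bmatrix} 1.2 & 0.6 \\ 0.9 & 1 \\ 0 & 0.8 \end{bmatrix}, \quad
C_k = \begin{bmatrix} 0.8 & 0.9 & 1.5 \\ 1.2 & 1 & 0.8 \\ 0 & 0.6 & 0.4 \end{bmatrix}, \quad
\bar{C}_k = \begin{bmatrix} 1 & 0 & 0.3 \\ 0.5 & 0.6 & 0.9 \\ 0.7 & 1.2 & 0.8 \end{bmatrix},
\]
\[
D_k = \begin{bmatrix} 0.5 & 0.4 \\ 2 & 0.9 \\ 1 & 0 \end{bmatrix}, \quad
\bar{D}_k = \begin{bmatrix} 2 & 1 \\ 0.5 & 0.8 \\ 0 & 0.5 \end{bmatrix},
\]
\[
Q_k = \mathrm{diag}([0, 2, 1]), \quad \bar{Q}_k = \mathrm{diag}([1, -1, 0]),
\]
\[
R_k = \mathrm{diag}([0, 2]), \quad \bar{R}_k = \mathrm{diag}([1, -2]),
\]
\[
P_4 = \mathrm{diag}([1, 2, 0]), \quad \bar{P}_4 = \mathrm{diag}([1, -1, 1]).
\]
It is noted that $R_k$ and $R_k + \bar{R}_k$ are positive semi-definite but not positive definite for $k = 1, 2, 3$.

Based on (13)-(23) of Theorem 2, the solution to the coupled Riccati equation (20)-(21) can be given as:
\[
P_3 = \begin{bmatrix} 0.995 & 0.298 & -0.115 \\ 0.298 & 2.417 & 0.840 \\ -0.115 & 0.840 & 3.360 \end{bmatrix}, \quad
\bar{P}_3 = \begin{bmatrix} 0.667 & 0.074 & -0.006 \\ 0.074 & 1.033 & 0.133 \\ -0.006 & 0.133 & -1.319 \end{bmatrix},
\]
\[
P_2 = \begin{bmatrix} 1.658 & 0.161 & 0.024 \\ 0.161 & 2.547 & 0.839 \\ 0.024 & 0.839 & 3.379 \end{bmatrix}, \quad
\bar{P}_2 = \begin{bmatrix} 0.630 & 0.919 & -0.457 \\ 0.919 & 5.282 & 1.439 \\ -0.457 & 1.439 & -0.520 \end{bmatrix},
\]
\[
P_1 = \begin{bmatrix} 1.907 & 0.315 & 0.268 \\ 0.315 & 2.812 & 1.352 \\ 0.268 & 1.352 & 4.408 \end{bmatrix}, \quad
\bar{P}_1 = \begin{bmatrix} 0.982 & 0.924 & -1.027 \\ 0.924 & 5.306 & 0.711 \\ -1.027 & 0.711 & -1.327 \end{bmatrix},
\]
\[
P_0 = \begin{bmatrix} 2.026 & 0.353 & 0.364 \\ 0.353 & 2.896 & 1.472 \\ 0.364 & 1.472 & 4.641 \end{bmatrix}, \quad
\bar{P}_0 = \begin{bmatrix} 1.217 & 1.294 & -1.198 \\ 1.294 & 6.232 & 0.644 \\ -1.198 & 0.644 & -1.498 \end{bmatrix},
\]
and $\Upsilon^{(1)}_k$, $\Upsilon^{(2)}_k$ for $k = 0, 1, 2, 3$ in (16) and (18) can be calculated as
\[
\Upsilon^{(1)}_3 = \begin{bmatrix} 14.670 & 5.720 \\ 5.720 & 4.590 \end{bmatrix}, \quad
\Upsilon^{(2)}_3 = \begin{bmatrix} 45.040 & 22.850 \\ 22.850 & 16.330 \end{bmatrix},
\]
\[
\Upsilon^{(1)}_2 = \begin{bmatrix} 29.302 & 13.536 \\ 13.536 & 12.297 \end{bmatrix}, \quad
\Upsilon^{(2)}_2 = \begin{bmatrix} 73.069 & 46.750 \\ 46.750 & 38.789 \end{bmatrix},
\]
\[
\Upsilon^{(1)}_1 = \begin{bmatrix} 32.593 & 14.480 \\ 14.480 & 12.607 \end{bmatrix}, \quad
\Upsilon^{(2)}_1 = \begin{bmatrix} 113.585 & 76.217 \\ 76.217 & 64.973 \end{bmatrix},
\]
\[
\Upsilon^{(1)}_0 = \begin{bmatrix} 42.070 & 19.232 \\ 19.232 & 15.875 \end{bmatrix}, \quad
\Upsilon^{(2)}_0 = \begin{bmatrix} 130.398 & 82.580 \\ 82.580 & 68.411 \end{bmatrix},
\]
\[
\det[\Upsilon^{(1)}_3] = 34.617 > 0, \quad \det[\Upsilon^{(2)}_3] = 213.381 > 0,
\]
\[
\det[\Upsilon^{(1)}_2] = 117.112 > 0, \quad \det[\Upsilon^{(2)}_2] = 648.698 > 0,
\]
\[
\det[\Upsilon^{(1)}_1] = 201.228 > 0, \quad \det[\Upsilon^{(2)}_1] = 1570.987 > 0,
\]
\[
\det[\Upsilon^{(1)}_0] = 297.946 > 0, \quad \det[\Upsilon^{(2)}_0] = 2101.236 > 0.
\]
Since $\Upsilon^{(1)}_k > 0$ and $\Upsilon^{(2)}_k > 0$, by Theorem 2 the unique optimal controller is given as
\[
u_k = K_k x_k + \bar{K}_k E x_k, \quad k = 0, 1, 2, 3,
\]
where
\[
K_3 = \begin{bmatrix} -0.517 & -0.483 & -0.471 \\ 0.032 & -0.084 & -0.223 \end{bmatrix}, \quad
\bar{K}_3 = \begin{bmatrix} 0.184 & 0.328 & 0.357 \\ -0.522 & -0.819 & -0.920 \end{bmatrix},
\]
\[
K_2 = \begin{bmatrix} -0.385 & -0.481 & -0.410 \\ -0.030 & -0.247 & -0.474 \end{bmatrix}, \quad
\bar{K}_2 = \begin{bmatrix} 0.036 & 0.327 & 0.336 \\ -0.364 & -0.734 & -0.828 \end{bmatrix},
\]
\[
K_1 = \begin{bmatrix} -0.413 & -0.476 & -0.394 \\ -0.011 & -0.256 & -0.500 \end{bmatrix}, \quad
\bar{K}_1 = \begin{bmatrix} 0.071 & 0.345 & 0.305 \\ -0.334 & -0.699 & -0.820 \end{bmatrix},
\]
\[
K_0 = \begin{bmatrix} -0.411 & -0.487 & -0.398 \\ 0.001 & -0.259 & -0.525 \end{bmatrix}, \quad
\bar{K}_0 = \begin{bmatrix} 0.070 & 0.339 & 0.297 \\ -0.358 & -0.692 & -0.780 \end{bmatrix}.
\]

B. The Infinite Horizon Case

Consider system (24) and the cost function (25) with the following coefficient matrices and weighting matrices:
\[
A = 1.1, \ \bar{A} = 0.2, \ B = 0.4, \ \bar{B} = 0.1, \ C = 0.9, \ \bar{C} = 0.5, \ D = 0.8, \ \bar{D} = 0.2, \ Q = 2, \ \bar{Q} = 1, \ R = 1, \ \bar{R} = 1, \ \sigma^2 = 1.
\]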
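For the first scalar example of Section IV-B, the coupled ARE (30)-(31) can be approximated by iterating the finite horizon recursion (20)-(21) from zero terminal values, in the spirit of the convergence statement of Theorem 3. The sketch below is illustrative only: the function names, the dictionary keys and the iteration count are assumptions rather than part of the paper, and the code covers the scalar case only. It also propagates the closed-loop moments $E x_k$ and $E(x_k^2)$ under the controller (36), using the initial moments $E x_0 = 1$ and $E(x_0^2) = 3$ that correspond to the $N(1, 2)$ initial state of this example.

```python
def riccati_terms(p, P, Pb):
    """Scalar versions of (32)-(35) for given P and P_bar (Pb)."""
    U1 = p["R"] + (p["B"] ** 2 + p["s2"] * p["D"] ** 2) * P
    M1 = (p["B"] * p["A"] + p["s2"] * p["D"] * p["C"]) * P
    Bs, Ds = p["B"] + p["Bb"], p["D"] + p["Db"]
    As, Cs = p["A"] + p["Ab"], p["C"] + p["Cb"]
    U2 = p["R"] + p["Rb"] + Bs ** 2 * (P + Pb) + p["s2"] * Ds ** 2 * P
    M2 = Bs * (P + Pb) * As + p["s2"] * Ds * P * Cs
    return U1, M1, U2, M2


def solve_coupled_are(p, iters=500):
    """Iterate (20)-(21) from P = P_bar = 0; return P, P_bar, K, K_bar."""
    P, Pb = 0.0, 0.0
    for _ in range(iters):
        U1, M1, U2, M2 = riccati_terms(p, P, Pb)
        P_next = p["Q"] + (p["A"] ** 2 + p["s2"] * p["C"] ** 2) * P - M1 ** 2 / U1
        Pb_next = (p["Qb"]
                   + (2 * p["A"] * p["Ab"] + p["Ab"] ** 2) * P
                   + p["s2"] * (2 * p["C"] * p["Cb"] + p["Cb"] ** 2) * P
                   + (p["A"] + p["Ab"]) ** 2 * Pb
                   + M1 ** 2 / U1 - M2 ** 2 / U2)
        P, Pb = P_next, Pb_next
    U1, M1, U2, M2 = riccati_terms(p, P, Pb)
    K = -M1 / U1                      # gain (37)
    Kb = -(M2 / U2 - M1 / U1)         # gain (38)
    return P, Pb, K, Kb


def second_moments(p, K, Kb, mean0, second0, steps):
    """Propagate E(x_k^2) for system (24) under u_k = K x_k + Kb E x_k."""
    a = p["A"] + p["B"] * K                            # multiplies x_k (drift)
    b = p["Ab"] + p["B"] * Kb + p["Bb"] * (K + Kb)     # multiplies E x_k (drift)
    c = p["C"] + p["D"] * K                            # multiplies x_k (noise)
    d = p["Cb"] + p["D"] * Kb + p["Db"] * (K + Kb)     # multiplies E x_k (noise)
    m, s = mean0, second0
    out = [s]
    for _ in range(steps):
        s = ((a * a + p["s2"] * c * c) * s
             + (2 * a * b + b * b + p["s2"] * (2 * c * d + d * d)) * m * m)
        m = (a + b) * m
        out.append(s)
    return out


example = {"A": 1.1, "Ab": 0.2, "B": 0.4, "Bb": 0.1, "C": 0.9, "Cb": 0.5,
           "D": 0.8, "Db": 0.2, "Q": 2.0, "Qb": 1.0, "R": 1.0, "Rb": 1.0,
           "s2": 1.0}
P, Pb, K, Kb = solve_coupled_are(example)
trajectory = second_moments(example, K, Kb, 1.0, 3.0, 60)
print(P, Pb, K, Kb, trajectory[-1])
```

Working with exact moment recursions instead of Monte Carlo sampling keeps the sketch deterministic. The recursion for $E(x_k^2)$ uses only the facts that $w_k$ is independent of $x_k$, has zero mean and has variance $\sigma^2$.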
The initial state is $x_0 \sim N(1, 2)$, i.e., $x_0$ obeys the normal distribution with mean 1 and covariance 2.

Note that $Q = 2$, $Q + \bar{Q} = 3$, $R = 1$, $R + \bar{R} = 2$ are all positive, so Assumption 3 and Assumption 4 are satisfied. By using the coupled ARE (30)-(31), we have $P = 5.6191$ and $\bar{P} = 5.1652$. From (32)-(35), we obtain $\Upsilon^{(1)} = 5.4953$, $M^{(1)} = 6.5182$, $\Upsilon^{(2)} = 10.3152$, and $M^{(2)} = 14.8765$.

Notice that $P > 0$ and $P + \bar{P} > 0$. According to Theorem 4, there exists a unique optimal controller that stabilizes mean-field system (24) and minimizes cost function (25); the controller in (36) is
\[
u_k = K x_k + \bar{K} E x_k = -1.1861 x_k - 0.2561 E x_k, \quad k \ge 0.
\]
Using the designed controller, the simulation of the system state is shown in Fig. 1. With the optimal controller, the regulated system state is stable in the mean square sense, as shown in Fig. 1.

Fig. 1. The mean square stabilization of mean-field system.

To explore the effectiveness of the main results presented in this paper, we consider mean-field system (24) and cost function (25) with
\[
A = 2, \ \bar{A} = 0.8, \ B = 0.5, \ \bar{B} = 1, \ C = 1, \ \bar{C} = 1, \ D = -0.8, \ \bar{D} = 0.6, \ Q = 1, \ \bar{Q} = 1, \ R = 1, \ \bar{R} = 1, \ \sigma^2 = 1.
\]
The initial state is assumed to be the same as that given above.

By solving the coupled ARE (30), it can be found that $P$ has two negative roots, $P = -1.1400$ and $P = -0.2492$. Thus, according to Theorem 4 and Theorem 5, we know that system (24) is not stabilizable in the mean square sense.

Actually, when $P = -1.1400$, it is easily seen that equation (31) has no real roots for $\bar{P}$. In the case of $P = -0.2492$, $\bar{P}$ has two real roots, which can be solved from (31) as $\bar{P} = 7.0597$ and $\bar{P} = -0.6476$, respectively.

In the latter case, with $P = -0.2492$ and $\bar{P} = 7.0597$, we can calculate $K$ and $\bar{K}$ from (37) and (38) as $K = 0.0640$, $\bar{K} = 1.5939$. Similarly, with $P = -0.2492$ and $\bar{P} = -0.6476$, $K$ and $\bar{K}$ can be computed as $K = 0.0640$, $\bar{K} = 131.8389$. Accordingly, the controllers are designed as $u_k = 0.0640 x_k + 1.5939 E x_k$ and $u_k = 0.0640 x_k + 131.8389 E x_k$, respectively.

Simulation results of the corresponding state trajectories with the designed controllers are shown in Fig. 2 and Fig. 3, respectively. As expected, the state trajectories are not convergent.

Fig. 2. Simulation for the state trajectory $E(x_k' x_k)$.

Fig. 3. Simulation for the state trajectory $E(x_k' x_k)$.

V. CONCLUSION

This paper proposes a new approach to stochastic optimal control, with the maximum principle and the solution to the FBSDE explored in this paper as the key tools. With this approach, the optimal control and stabilization problems for discrete-time mean-field systems have been essentially solved. The main results include: 1) The sufficient and necessary solvability condition of the finite horizon optimal control problem has been obtained in analytical form via a coupled Riccati equation; 2) The sufficient and necessary conditions for the stabilization of mean-field systems have been obtained. It is shown that, under the exact observability assumption, the mean-field system is stabilizable in the mean square sense if and only if a coupled ARE has a unique solution $P$ and $\bar{P}$ satisfying $P > 0$ and $P + \bar{P} > 0$. Furthermore, under the exact detectability assumption, which is weaker than exact observability, we show that the mean-field system is stabilizable in the mean
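square sense if and only if the coupled ARE admits a unique solution $P$ and $\bar P$ satisfying $P \ge 0$ and $P + \bar P \ge 0$.

APPENDIX A
PROOF OF THEOREM 1

Proof. For the general stochastic mean-field optimal control problem, the control domain for system (4) to minimize (5) is given by
\[
\mathcal U = \big\{ u_k \in R^m \mid u_k \text{ is } \mathcal F_{k-1}\text{-measurable},\ E|u_k|^2 < \infty \big\}.
\]
We assume that the control domain $\mathcal U$ is convex. Any $u_k \in \mathcal U$ is called an admissible control. Besides, for arbitrary $u_k, \delta u_k \in \mathcal U$ and $\varepsilon \in (0,1)$, we can obtain $u_k^\varepsilon = u_k + \varepsilon\delta u_k \in \mathcal U$.

Let $x_k^\varepsilon$, $J_N^\varepsilon$ be the state and cost function corresponding to $u_k^\varepsilon$, and let $x_k$, $J_N$ be the state and cost function corresponding to $u_k$.

We examine the increment in $J_N$ due to an increment in the controller $u_k$. Assume that the final time $N+1$ is fixed. By using Taylor's expansion and following cost function (5), the increment $\delta J_N = J_N^\varepsilon - J_N$ can be calculated as follows,
\[
\begin{aligned}
\delta J_N &= E\Big\{ \phi_{x_{N+1}}\delta x_{N+1} + \phi_{Ex_{N+1}}\delta Ex_{N+1}
 + \sum_{k=0}^{N}\big[ L^k_{x_k}\delta x_k + L^k_{Ex_k}\delta Ex_k + L^k_{u_k}\varepsilon\delta u_k + L^k_{Eu_k}\varepsilon\delta Eu_k \big] \Big\} + O(\varepsilon^2) \\
&= E\Big\{ [\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})]\delta x_{N+1}
 + \sum_{k=0}^{N} [L^k_{u_k} + E(L^k_{Eu_k})]\varepsilon\delta u_k
 + \sum_{k=0}^{N} [L^k_{x_k} + E(L^k_{Ex_k})]\delta x_k \Big\} + O(\varepsilon^2),
\end{aligned}
\tag{40}
\]
where $O(\varepsilon^2)$ denotes an infinitesimal of the same order as $\varepsilon^2$.

Another thing to note is that the variation of the initial state is $\delta x_0 = \delta Ex_0 = 0$.

By (1) and (6), for $\delta x_k = x_k^\varepsilon - x_k$, the following assertion holds,
\[
\begin{bmatrix} \delta x_{k+1} \\ \delta Ex_{k+1} \end{bmatrix}
= \begin{bmatrix} f^k_{x_k} & f^k_{Ex_k} \\ g^k_{x_k} & g^k_{Ex_k} \end{bmatrix}
\begin{bmatrix} \delta x_k \\ \delta Ex_k \end{bmatrix}
+ \begin{bmatrix} f^k_{u_k} & f^k_{Eu_k} \\ g^k_{u_k} & g^k_{Eu_k} \end{bmatrix}
\begin{bmatrix} \varepsilon\delta u_k \\ \varepsilon\delta Eu_k \end{bmatrix}.
\tag{41}
\]
Thus the variation $\delta x_{k+1}$ can be presented as
\[
\begin{aligned}
\delta x_{k+1} &= f^k_{x_k}\delta x_k + f^k_{u_k}\varepsilon\delta u_k + f^k_{Ex_k}\delta Ex_k + f^k_{Eu_k}\varepsilon\delta Eu_k \\
&= \begin{bmatrix} f^k_{x_k} & f^k_{Ex_k} \end{bmatrix}
\begin{bmatrix} \delta x_k \\ \delta Ex_k \end{bmatrix}
+ f^k_{u_k}\varepsilon\delta u_k + f^k_{Eu_k}\varepsilon\delta Eu_k \\
&= \begin{bmatrix} f^k_{x_k} & f^k_{Ex_k} \end{bmatrix}
\begin{bmatrix} f^{k-1}_{x_{k-1}} & f^{k-1}_{Ex_{k-1}} \\ g^{k-1}_{x_{k-1}} & g^{k-1}_{Ex_{k-1}} \end{bmatrix}
\begin{bmatrix} \delta x_{k-1} \\ \delta Ex_{k-1} \end{bmatrix} \\
&\quad + \begin{bmatrix} f^k_{x_k} & f^k_{Ex_k} \end{bmatrix}
\begin{bmatrix} f^{k-1}_{u_{k-1}} & f^{k-1}_{Eu_{k-1}} \\ g^{k-1}_{u_{k-1}} & g^{k-1}_{Eu_{k-1}} \end{bmatrix}
\begin{bmatrix} \varepsilon\delta u_{k-1} \\ \varepsilon\delta Eu_{k-1} \end{bmatrix}
+ f^k_{u_k}\varepsilon\delta u_k + f^k_{Eu_k}\varepsilon\delta Eu_k \\
&= \tilde F_x(k,0)\begin{bmatrix} \delta x_0 \\ \delta Ex_0 \end{bmatrix}
+ \sum_{l=0}^{k} \tilde F_x(k,l+1)
\begin{bmatrix} f^l_{u_l} & f^l_{Eu_l} \\ g^l_{u_l} & g^l_{Eu_l} \end{bmatrix}
\begin{bmatrix} \varepsilon\delta u_l \\ \varepsilon\delta Eu_l \end{bmatrix} \\
&= \sum_{l=0}^{k} \tilde F_x(k,l+1)\begin{bmatrix} f^l_{u_l} \\ g^l_{u_l} \end{bmatrix}\varepsilon\delta u_l
+ \sum_{l=0}^{k} \tilde F_x(k,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix}\varepsilon\delta Eu_l,
\end{aligned}
\tag{42}
\]
where
\[
\tilde F_x(k,l) = \begin{bmatrix} f^k_{x_k} & f^k_{Ex_k} \end{bmatrix}
\tilde f^{k-1}_{x_{k-1}} \cdots \tilde f^{l}_{x_l}, \quad l = 0, \cdots, k;
\qquad \tilde F_x(k,k+1) = [\,I_n \ \ 0\,],
\qquad \tilde f^{l}_{x_l} = \begin{bmatrix} f^l_{x_l} & f^l_{Ex_l} \\ g^l_{x_l} & g^l_{Ex_l} \end{bmatrix}.
\tag{43}
\]
Substituting (42) into (40) yields
\[
\begin{aligned}
\delta J_N = E\Big\{ & [\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})] \sum_{l=0}^{N} \tilde F_x(N,l+1)\begin{bmatrix} f^l_{u_l} \\ g^l_{u_l} \end{bmatrix}\varepsilon\delta u_l \\
& + [\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})] \sum_{l=0}^{N} \tilde F_x(N,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix}\varepsilon\delta Eu_l \\
& + \sum_{k=0}^{N} [L^k_{x_k} + E(L^k_{Ex_k})] \sum_{l=0}^{k-1} \tilde F_x(k-1,l+1)\begin{bmatrix} f^l_{u_l} \\ g^l_{u_l} \end{bmatrix}\varepsilon\delta u_l \\
& + \sum_{k=0}^{N} [L^k_{x_k} + E(L^k_{Ex_k})] \sum_{l=0}^{k-1} \tilde F_x(k-1,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix}\varepsilon\delta Eu_l \\
& + \sum_{k=0}^{N} [L^k_{u_k} + E(L^k_{Eu_k})]\varepsilon\delta u_k \Big\} + O(\varepsilon^2).
\end{aligned}
\tag{44}
\]
Note the facts that
\[
E\Big\{ [L^k_{x_k} + E(L^k_{Ex_k})] \sum_{l=0}^{k-1} \tilde F_x(k-1,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix}\varepsilon\delta Eu_l \Big\}
= E\Big\{ \sum_{l=0}^{k-1} E\Big\{ [L^k_{x_k} + E(L^k_{Ex_k})] \tilde F_x(k-1,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix} \Big\}\varepsilon\delta u_l \Big\},
\tag{45}
\]
\[
E\Big\{ [\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})] \sum_{l=0}^{N} \tilde F_x(N,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix}\varepsilon\delta Eu_l \Big\}
= E\Big\{ \sum_{l=0}^{N} E\Big\{ [\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})] \tilde F_x(N,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix} \Big\}\varepsilon\delta u_l \Big\}.
\tag{46}
\]
Also, we have
\[
\sum_{k=0}^{N} [L^k_{x_k} + E(L^k_{Ex_k})] \sum_{l=0}^{k-1} \tilde F_x(k-1,l+1)\begin{bmatrix} f^l_{u_l} \\ g^l_{u_l} \end{bmatrix}\varepsilon\delta u_l
= \sum_{l=0}^{N-1} \Big\{ \sum_{k=l+1}^{N} [L^k_{x_k} + E(L^k_{Ex_k})] \tilde F_x(k-1,l+1)\begin{bmatrix} f^l_{u_l} \\ g^l_{u_l} \end{bmatrix} \Big\}\varepsilon\delta u_l,
\tag{47}
\]
\[
\sum_{k=0}^{N}\sum_{l=0}^{k-1} E\Big\{ [L^k_{x_k} + E(L^k_{Ex_k})] \tilde F_x(k-1,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix} \Big\}\varepsilon\delta u_l
= \sum_{l=0}^{N-1} E\Big\{ \sum_{k=l+1}^{N} [L^k_{x_k} + E(L^k_{Ex_k})] \tilde F_x(k-1,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix} \Big\}\varepsilon\delta u_l.
\tag{48}
\]
Therefore, (44) becomes
\[
\delta J_N = E\Big\{ G(N+1,N)\varepsilon\delta u_N + \sum_{l=0}^{N-1} G(l+1,N)\varepsilon\delta u_l \Big\} + O(\varepsilon^2),
\tag{49}
\]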
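where
\[
G(N+1,N) = [\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})] f^N_{u_N}
+ E\big\{ [\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})] f^N_{Eu_N} \big\}
+ [L^N_{u_N} + E(L^N_{Eu_N})],
\tag{50}
\]
\[
\begin{aligned}
G(l+1,N) &= [\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})] \tilde F_x(N,l+1)\begin{bmatrix} f^l_{u_l} \\ g^l_{u_l} \end{bmatrix}
+ E\Big\{ [\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})] \tilde F_x(N,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix} \Big\} \\
&\quad + \sum_{k=l+1}^{N} [L^k_{x_k} + E(L^k_{Ex_k})] \tilde F_x(k-1,l+1)\begin{bmatrix} f^l_{u_l} \\ g^l_{u_l} \end{bmatrix}
+ E\Big\{ \sum_{k=l+1}^{N} [L^k_{x_k} + E(L^k_{Ex_k})] \tilde F_x(k-1,l+1)\begin{bmatrix} f^l_{Eu_l} \\ g^l_{Eu_l} \end{bmatrix} \Big\} \\
&\quad + [L^l_{u_l} + E(L^l_{Eu_l})].
\end{aligned}
\tag{51}
\]
Furthermore, (49) can be rewritten as
\[
\begin{aligned}
\delta J_N &= E\big\{ E[G(N+1,N) \mid \mathcal F_{N-1}]\varepsilon\delta u_N \big\}
+ E\Big\{ \sum_{l=0}^{N-1} E[G(l+1,N) \mid \mathcal F_{l-1}]\varepsilon\delta u_l \Big\} + O(\varepsilon^2) \\
&\quad + E\big\{ \{G(N+1,N) - E[G(N+1,N) \mid \mathcal F_{N-1}]\}\varepsilon\delta u_N \big\}
+ E\Big\{ \sum_{l=0}^{N-1} \{G(l+1,N) - E[G(l+1,N) \mid \mathcal F_{l-1}]\}\varepsilon\delta u_l \Big\} \\
&= E\Big\{ E[G(N+1,N) \mid \mathcal F_{N-1}]\varepsilon\delta u_N
+ \sum_{l=0}^{N-1} E[G(l+1,N) \mid \mathcal F_{l-1}]\varepsilon\delta u_l \Big\} + O(\varepsilon^2),
\end{aligned}
\tag{52}
\]
where the following facts are applied in the last equality,
\[
E\big\{ \{G(N+1,N) - E[G(N+1,N) \mid \mathcal F_{N-1}]\}\varepsilon\delta u_N \big\} = 0,
\qquad
E\Big\{ \sum_{l=0}^{N-1} \{G(l+1,N) - E[G(l+1,N) \mid \mathcal F_{l-1}]\}\varepsilon\delta u_l \Big\} = 0.
\]
Since $\delta u_l$ is arbitrary for $0 \le l \le N$, the necessary condition for the minimum can be given from (52) as
\[
0 = E\{ G(N+1,N) \mid \mathcal F_{N-1} \},
\tag{53}
\]
\[
0 = E\{ G(l+1,N) \mid \mathcal F_{l-1} \}, \quad l = 0, \cdots, N-1.
\tag{54}
\]
Now we will show that equations (7)-(9) are a restatement of the necessary conditions (53)-(54).

In fact, substituting (9) into (7) and letting $k = N$, we have
\[
E\Big\{ (L^N_{u_N})' + E(L^N_{Eu_N})' + (f^N_{u_N})'[\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})]'
+ E\big\{ (f^N_{Eu_N})'[\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})]' \big\} \,\Big|\, \mathcal F_{N-1} \Big\} = 0,
\tag{55}
\]
which means that (55) is exactly (53).

Furthermore, noting (8), we have that
\[
\begin{aligned}
\lambda_{k-1} &= E\Big\{ \begin{bmatrix} I_n \\ 0 \end{bmatrix}[L^k_{x_k} + E(L^k_{Ex_k})]' + (\tilde f^k_{x_k})'\lambda_k \,\Big|\, \mathcal F_{k-1} \Big\} \\
&= E\Big\{ \begin{bmatrix} I_n \\ 0 \end{bmatrix}[L^k_{x_k} + E(L^k_{Ex_k})]'
+ (\tilde f^k_{x_k})'\begin{bmatrix} I_n \\ 0 \end{bmatrix}[L^{k+1}_{x_{k+1}} + E(L^{k+1}_{Ex_{k+1}})]'
+ (\tilde f^k_{x_k})'(\tilde f^{k+1}_{x_{k+1}})'\lambda_{k+1} \,\Big|\, \mathcal F_{k-1} \Big\} \\
&= E\Big\{ \begin{bmatrix} I_n \\ 0 \end{bmatrix}[L^k_{x_k} + E(L^k_{Ex_k})]'
+ (\tilde f^k_{x_k})'\begin{bmatrix} I_n \\ 0 \end{bmatrix}[L^{k+1}_{x_{k+1}} + E(L^{k+1}_{Ex_{k+1}})]' \\
&\qquad + (\tilde f^k_{x_k})'(\tilde f^{k+1}_{x_{k+1}})'\begin{bmatrix} I_n \\ 0 \end{bmatrix}[L^{k+2}_{x_{k+2}} + E(L^{k+2}_{Ex_{k+2}})]' + \cdots \\
&\qquad + (\tilde f^k_{x_k})'(\tilde f^{k+1}_{x_{k+1}})' \cdots (\tilde f^{N-1}_{x_{N-1}})'\begin{bmatrix} I_n \\ 0 \end{bmatrix}[L^{N}_{x_N} + E(L^{N}_{Ex_N})]' \\
&\qquad + (\tilde f^k_{x_k})'(\tilde f^{k+1}_{x_{k+1}})' \cdots (\tilde f^{N}_{x_N})'\lambda_N \,\Big|\, \mathcal F_{k-1} \Big\} \\
&= E\Big\{ \sum_{j=k}^{N} \tilde F_x'(j-1,k)[L^j_{x_j} + E(L^j_{Ex_j})]'
+ \tilde F_x'(N,k)[\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})]' \,\Big|\, \mathcal F_{k-1} \Big\}.
\end{aligned}
\tag{56}
\]
Substituting (56) into (7), one has
\[
\begin{aligned}
0 = E\Big\{ & [L^k_{u_k} + E(L^k_{Eu_k})]'
+ \begin{bmatrix} f^k_{u_k} \\ g^k_{u_k} \end{bmatrix}' \sum_{j=k+1}^{N} \tilde F_x'(j-1,k+1)[L^j_{x_j} + E(L^j_{Ex_j})]' \\
& + \begin{bmatrix} f^k_{u_k} \\ g^k_{u_k} \end{bmatrix}' \tilde F_x'(N,k+1)[\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})]' \\
& + E\Big\{ \begin{bmatrix} f^k_{Eu_k} \\ g^k_{Eu_k} \end{bmatrix}' \sum_{j=k+1}^{N} \tilde F_x'(j-1,k+1)[L^j_{x_j} + E(L^j_{Ex_j})]' \Big\} \\
& + E\Big\{ \begin{bmatrix} f^k_{Eu_k} \\ g^k_{Eu_k} \end{bmatrix}' \tilde F_x'(N,k+1)[\phi_{x_{N+1}} + E(\phi_{Ex_{N+1}})]' \Big\} \,\Big|\, \mathcal F_{k-1} \Big\},
\quad k = 0, \cdots, N,
\end{aligned}
\tag{57}
\]
which is (54). It has been proved that (7)-(9) are exactly the necessary conditions for the minimum of $J_N$. The proof is complete.

APPENDIX B
PROOF OF THEOREM 2

Proof. "Necessity": Under Assumption 1, if Problem 1 has a unique solution, we will show by induction that $\Upsilon_k^{(1)}$, $\Upsilon_k^{(2)}$ are all strictly positive definite and the optimal controller is given by (13).

Firstly, we denote $J(k)$ as below
\[
J(k) \triangleq \sum_{j=k}^{N} E\big[ x_j'Q_jx_j + (Ex_j)'\bar Q_jEx_j + u_j'R_ju_j + (Eu_j)'\bar R_jEu_j \big]
+ E[x_{N+1}'P_{N+1}x_{N+1}] + (Ex_{N+1})'\bar P_{N+1}Ex_{N+1}.
\tag{58}
\]
For $k = N$, equation (58) becomes
\[
J(N) = E\big[ x_N'Q_Nx_N + (Ex_N)'\bar Q_NEx_N + u_N'R_Nu_N + (Eu_N)'\bar R_NEu_N \big]
+ E[x_{N+1}'P_{N+1}x_{N+1}] + (Ex_{N+1})'\bar P_{N+1}Ex_{N+1}.
\tag{59}
\]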
(1) (2) (3)

Using system dynamics (1), J(N ) can be calculated as a Note that P̄N +1 + P̄N +1 + P̄N +1 = P̄N +1 , it follows from
quadratic form of xN , ExN , uN and EuN . By Assumption 1, (61) that
we know that the minimum of (59) must satisfy J ∗ (N ) ≥ 0. (1) (2) (1)
Let xN = 0, since it is assumed Problem 1 admits a unique 0 = ΥN uN + [ΥN − ΥN ]EuN
(1) (2) (1)
solution, thus it is clear that uN = 0 is the optimal controller + MN xN + [MN − MN ]ExN , (62)
and optimal cost function is J ∗ (N ) = 0. (1) (2) (1) (2)
Hence, J(N ) must be strictly positive for any nonzero uN , where ΥN , ΥN , MN , MN are given by (16)-(19) for k =
i.e., for uN 6= 0, we can obtain N.
Therefore, taking expectations on both sides of (62), we
(1) (2)
J(N ) = E[(uN −EuN )′ ΥN (uN −EuN )]+Eu′N ΥN EuN have
(2) (2)
> 0. (60) ΥN EuN + MN ExN = 0. (63)
(1) (2) (1) (2)
Following Lemma 1, clearly we have ΥN > 0 and ΥN > Since ΥN , and ΥN has been proved to be strictly positive,
0 from (60). In fact, in the case EuN = 0 and uN 6= 0, thus EuN can be presented as
equation (60) becomes (2) (2)
EuN = −[ΥN ]−1 MN ExN . (64)
(1)
J(N ) = E[u′N ΥN uN ] > 0.
By plugging (64) into (62), the optimal controller uN given
Thus
(1)
ΥN > 0 can be obtained by using Lemma 1 and Remark by (13) with k = N can be verified.
1. Next we will show λN −1 has the form of (23) associated
On the other hand, if uN = EuN 6= 0, i.e., uN is with (20)-(21) for k = N .
deterministic controller, then (60) can be reduced to Notice (12) and (11), we have that
n Q x + Q̄ Ex
(2) N N N N
J(N ) = u′N ΥN uN > 0. λN −1 = E
0
(2) ′
Similarly, it holds from Lemma 1 and Remark 1 that ΥN > 0.

AN + wN CN ĀN + wN C̄N
o
+ λN FN −1

Further the optimal controller uN is to be calculated as 0 AN + ĀN
follows. n
QN xN + Q̄N ExN

Using (1) and (12), from (10) with k replaced by N , we =E
0
have that ′" (1)
#
′ AN +wN CN ĀN +wN C̄N PN +1 P̄N +1
+

n B + wN DN (2) (3)
0=E RN uN +R̄N EuN + N λN 0 AN +ĀN P̄N +1 P̄N +1
0
xN +1
" ′ # o
B̄N + wN D̄N
o × FN −1 . (65)
+E λN FN −1
ExN +1
BN + B̄N
n By using the optimal controller (13) and the system dynam-
= E RN uN + R̄N EuN ics (1), each element of λN −1 can be calculated as follows,
(1)
+ (BN + wN DN )′ (PN +1 xN +1 + P̄N +1 ExN +1 ) E[(AN + wN CN )′ PN +1 xN +1 |FN −1 ]
(1)

+ E[(B̄N +wN D̄N )′ (PN +1 xN +1+P̄N +1 ExN +1 )] = A′N PN +1 AN + σ 2 CN′
PN +1 CN
o
(2) (3)
+ E[(BN +B̄N )′ (P̄N +1 xN +1+P̄N +1 ExN +1 )]FN −1

+ A′N PN +1 BN KN + σ 2 CN′
PN +1 DN KN xN
′
= (RN + BN PN +1 BN + σ 2 DN′
PN +1 DN )uN h
h + A′N PN +1 ĀN + σ 2 CN
′
PN +1 C̄N
′ 2 ′
+ R̄N + BN PN +1 B̄N + σ DN PN +1 D̄N
+ A′N PN +1 BN K̄N + σ 2 CN
′
PN +1 DN K̄N
′ (1) ′ (1)
+ BN P̄N +1 (BN + B̄N ) + B̄N P̄N +1 (BN +B̄N ) ′
+ AN PN +1 B̄N (KN + K̄N )
+ ′
B̄N PN +1 BN + σ 2 D̄N
′
PN +1 DN
i
+ σ 2 CN
′
PN +1 D̄N (KN + K̄N ) ExN , (66)
′ 2 ′
+ B̄N PN +1 B̄N + σ D̄N PN +1 D̄N (1)
(2) (3)
i E[(AN + wN CN )′ P̄N +1 ExN +1 |FN −1 ]
+ (BN + B̄N )′ (P̄N +1 + P̄N +1 )(BN + B̄N ) EuN n
(1)
= A′N P̄N +1 (AN + ĀN )
′
+ (BN PN +1 AN + σ 2 DN
′
PN +1 CN )xN o
(1)
+ A′N P̄N +1 (BN + B̄N )(KN + K̄N ) ExN , (67)
h
′ 2 ′
+ BN PN +1 ĀN + σ DN PN +1 C̄N n
′
+ BN
(1) ′
P̄N +1 (AN + ĀN ) + B̄N
(1)
P̄N +1 (AN +ĀN ) E (ĀN +wN C̄N )′ PN +1
o
(2)
′
+ B̄N PN +1 AN + σ 2 D̄N
′
PN +1 CN + (AN + ĀN )′ P̄N +1 xN +1 FN −1

′
+ B̄N PN +1 ĀN + σ 2 D̄N
′
PN +1 C̄N n
i = Ā′N PN +1 AN + σ 2 C̄N′
PN +1 CN
(2) (3)
+ (BN + B̄N )′ (P̄N +1+P̄N +1 )(AN +ĀN ) ExN . (61)
+ Ā′N PN +1 BN KN + σ 2 C̄N
′
PN +1 DN KN
(2) (3)
+ (AN + ĀN )′ P̄N +1 AN + (AN + ĀN )′ P̄N +1 (BN + B̄N )(KN + K̄N )
(3)
o
(2)
+ (AN + ĀN )′ P̄N +1 BN KN xN + (AN + ĀN )′ P̄N +1 (AN + ĀN ), (73)
(1) (2) (3)
n
+ Ā′N PN +1 ĀN + σ 2 C̄N
′
PN +1 C̄N with P̄N +1 = P̄N +1 , P̄N +1 = P̄N +1 = 0.
Similarly, PN is given as
+ Ā′N PN +1 BN K̄N + σ 2 C̄N
′
PN +1 DN K̄N
′
+ ĀN PN +1 B̄N (KN + K̄N ) PN = QN + A′N PN +1 AN + σ 2 CN
′
PN +1 CN
+ σ 2 C̄N
′
PN +1 D̄N (KN + K̄N ) + A′N PN +1 BN KN + σ 2 CN
′
PN +1 DN KN
(2) = QN + A′N PN +1 AN + σ 2 CN
′
PN +1 CN
+ (AN + ĀN )′ P̄N +1 BN K̄N
(1) (1)
(2) − (A′N PN +1 BN + σ 2 CN
′
PN +1 DN )[ΥN ]−1 MN
+ (AN + ĀN )′ P̄N +1 B̄N (KN + K̄N )
(2)
o = QN + A′N PN +1 AN + σ 2 CN
′
PN +1 CN
+ (AN + ĀN )′ P̄N +1 ĀN ExN , (68) (1) (1) (1)
− [MN ]′ [ΥN ]−1 MN , (74)
and (1)
which is exactly (20) for k = N . Now we show P̄N = P̄N +
(1) (2) (3)
E{[(ĀN +wN C̄N )′ P̄N +1 P̄N + P̄N obeys (21). In fact, it holds from (71)-(74) that
(3)
+ (AN + ĀN )′ P̄N +1 ]ExN +1 |FN −1 } (1)
P̄N = P̄N + P̄N + P̄N
(2) (3)
n
(1)
= Ā′N P̄N +1 (AN + ĀN ) = Q̄N + A′N PN +1 ĀN + σ 2 CN
′
PN +1 C̄N
(1)
+ Ā′N P̄N +1 (BN + B̄N )(KN + K̄N ) + Ā′N PN +1 ĀN + σ 2 C̄N
′
PN +1 C̄N
(3) + Ā′N PN +1 AN + σ 2 C̄N
′
PN +1 CN
+ (AN + ĀN )′ P̄N +1 (BN + B̄N )(KN + K̄N ) h
+ A′N PN +1 BN + σ 2 CN ′
PN +1 DN
o
(3)
+ (AN + ĀN )′ P̄N +1 (AN + ĀN ) ExN . (69)
+ Ā′N PN +1 BN + σ 2 C̄N
′
PN +1 DN
By plugging (66)-(69) into (65), we know that λN −1 is
+ A′N PN +1 B̄N + σ 2 CN
′
PN +1 D̄N
given as,
" # + Ā′N PN +1 B̄N + σ 2 C̄N
′
PN +1 D̄N
(1)
PN P̄N xN ′
i
λN −1 = (2) (3) , (70) + (AN + ĀN ) P̄N +1 (AN + ĀN ) (KN + K̄N )
P̄N P̄N ExN
− (A′N PN +1 BN + σ 2 CN
′
PN +1 DN )KN
(1) (2) (3)
where P̄N , P̄N , P̄N are respectively calculated in the = Q̄N + A′N PN +1 ĀN + σ 2 CN
′
PN +1 C̄N
following,
+ Ā′N PN +1 ĀN + σ 2 C̄N
′
PN +1 C̄N
(1)
P̄N = Q̄N + A′N PN +1 ĀN + σ 2 CN
′
PN +1 C̄N + Ā′N PN +1 AN + σ 2 C̄N
′
PN +1 CN
+ A′N PN +1 BN K̄N + σ 2 CN
′
PN +1 DN K̄N + (AN + ĀN )′ P̄N +1 (AN + ĀN )
′
+ AN PN +1 B̄N (KN + K̄N ) (1) (1) (1)
+ [MN ]′ [ΥN ]−1 MN − [MN ]′ [ΥN ]−1 MN .
(2) (2) (2)
(75)
2 ′
+ σ CN PN +1 D̄N (KN + K̄N ) (1) (2) (3)
′ (1) where P̄N +1 + P̄N +1 + P̄N +1 = P̄N +1 has been inserted to
+ AN P̄N +1 (AN + ĀN ) the second equality of (75).
(1)
+ A′N P̄N +1 (BN + B̄N )(KN + K̄N ), (71) Thus, (23) associated with (20)-(21) have been verified for
(2) k = N.
P̄N = Ā′N PN +1 AN + σ 2 C̄N
′
PN +1 CN
Therefore we have shown the necessity for k = N in the
+ ĀN PN +1 BN KN + σ 2 C̄N
′ ′
PN +1 DN KN above. To complete the induction, take 0 ≤ l ≤ N , for any
′ (2) (2)
+ (AN +ĀN ) P̄N +1 AN +(AN +ĀN )′ P̄N +1 BN KN , k ≥ l + 1, we assume that:
(1) (2)
(72) • Υk and Υk in (16) and (18) are all strictly positive;
(3) • The costate λk−1 is given by (23), Pk satisfies (20) and
P̄N = Ā′N PN +1 ĀN + σ 2 C̄N
′
PN +1 C̄N (1) (2) (3)
P̄k , P̄k , P̄k satisfy (71)-(73) with N replaced by k,
+ ĀN PN +1 BN K̄N + σ 2 C̄N
′ ′
PN +1 DN K̄N (1) (2) (3)
respectively. Furthermore, P̄k + P̄k + P̄k = P̄k and
′
+ ĀN PN +1 B̄N (KN + K̄N ) P̄k obeys (21);
+ σ 2 C̄N
′
PN +1 D̄N (KN + K̄N ) • The optimal controller uk is as in (13).
(2)
+ (AN + ĀN )′ P̄N +1 ĀN We will show the above statements are also true for k = l.
(1) (2)
Firstly, we show Υl and Υl are positive definite if
(2)
+ (AN + ĀN )′ P̄N +1 BN K̄N Problem 1 has a unique solution.
(2) By applying the maximum principle (10)-(11) and (1), we
+ (AN + ĀN )′ P̄N +1 B̄N (KN + K̄N )
(1)
can obtain
+ Ā′N P̄N +1 (AN + ĀN ) n ′ ′ o
xk xk+1
(1)
+ Ā′N P̄N +1 (BN + B̄N )(KN + K̄N ) E λk−1 − λk
Exk Exk+1
n ′ n ′ h
xk Ak + wk Ck Āk + wk C̄k + Ex′l Q̄l + A′l Pl+1 Āl + σ 2 Cl′ Pl+1 C̄l + Ā′l Pl+1 Al
o
=E E λk Fk−1

Exk 0 Ak + Āk
′ + σ 2 C̄l′ Pl+1 Cl + Ā′l Pl+1 Āl + σ 2 C̄l′ Pl+1 C̄l
xk Qk xk + Q̄k Exk
+
i
Exk 0 + (Al + Āl )′ P̄l+1 (Al + Āl ) Exl
′ ′
+ x′l A′l Pl+1 Bl + σ 2 Cl′ Pl+1 Dl ul

xk Ak + wk Ck Āk + wk C̄k
− λk
Exk 0 Ak + Āk + u′l Bl′ Pl+1 Al + σ 2 Dl′ Pl+1 Cl xl

′ ′ o
uk Bk + wk Dk B̄k + wk D̄k
h
− λk + Ex′l A′l Pl+1 B̄l + σ 2 Cl′ Pl+1 D̄l + Ā′l Pl+1 Bl
Euk 0 Bk + B̄k
′ n ′ + σ 2 C̄l′ Pl+1 Dl + Ā′l Pl+1 B̄l + σ 2 C̄l′ Pl+1 D̄l
n o
=E E λk Fk−1
i
Exk 0 Ak + Āk + (Al + Āl )′ P̄l+1 (Bl + B̄l ) Eul
′
xk Qk xk + Q̄k Exk
h
+ + Eu′l Bl′ Pl+1 Āl + σ 2 Dl′ Pl+1 C̄l + B̄l′ Pl+1 Al
Exk 0
′ ′ + σ 2 D̄l′ Pl+1 Cl + B̄l′ Pl+1 Āl + σ 2 D̄l′ Pl+1 C̄l
− λk i
Exk 0 Ak + Āk + (Bl + B̄l )′ P̄l+1 (Al + Āl ) Exl
′ n B̄ + w D̄ ′ oo
Bk + wk Dk + u′l Rl + Bl′ Pl+1 Bl + σ 2 Dl′ Pl+1 Dl ul

k k k
−u′k λk−u′k E λk
0 Bk + B̄k h
n x ′ Q x +Q̄ Ex o + Eu′l Bl′ Pl+1 B̄l + σ 2 Dl′ Pl+1 D̄l + B̄l′ Pl+1 Bl
k k k k k
=E +E(u′k Rk uk+Eu′k R̄k Euk ) + σ 2 D̄l′ Pl+1 Dl + B̄l′ Pl+1 B̄l + σ 2 D̄l′ Pl+1 D̄l
Exk 0
i o
= E(x′k Qk xk+Ex′k Q̄k Exk+u′k Rk uk+Eu′k R̄k Euk ). + R̄l + (Bl + B̄l )′ P̄l+1 (Bl + B̄l ) Eul
Adding from k = l + 1 to k = N on both sides of the above = E(x′l Pl xl + Ex′l P̄l Exl )
equation, we have
n
(1)
+ E [ul − Eul − Kl (xl − Exl )]′ Υl
n ′
xl+1
o o
E λl−x′N +1 PN +1 xN +1−Ex′N +1 PN +1 ExN +1 × [ul − Eul − Kl (xl − Exl )]
Exl+1
(2)
N
X + [Eul−(Kl+K̄l )Exl ]′ Υl [Eul−(Kl+K̄l )Exl ], (78)
= E(x′k Qk xk+Ex′k Q̄k Exk+u′k Rk uk+Eu′k R̄k Euk ).
k=l+1 (1) (2)
where Υl and Υl are respectively given by (16) and (18)
Thus, it follows from (58) that for k = l.
Equation (58) indicates that xl is the initial state in min-
J(l) = E(x′l Ql xl + Ex′l Q̄l Exl + u′l Rl ul + Eu′l R̄l Eul ) (1)
imizing J(l). Now we show Υl > 0 and Υl > 0. We
(2)
N
X choose xl = 0, then (78) becomes
+ E(x′k Qk xk+Ex′k Q̄k Exk+u′k Rk uk+Eu′k R̄k Euk )
n o
k=l+1 (1) (2)
J(l)=E (ul−Eul )′ Υl (ul−Eul )+Eu′l Υl Eul . (79)
+ E(x′N +1 PN +1 xN +1+Ex′N +1 PN +1 ExN +1 )
n
= E x′l Ql xl + Ex′l Q̄l Exl + u′l Rl ul + Eu′l R̄l Eul It follows from Assumption 1 that the minimum of J(l)
′ o satisfies J ∗ (l) ≥ 0. By (79), it is obvious that ul = 0 is the
xl+1
+ λl , (76) optimal controller and the associated optimal cost function
Exl+1
J ∗ (l) = 0. The uniqueness of the optimal control implies that
Note that (23) is assumed to be true for k = l + 1, i.e., for any ul 6= 0, J(l) must be strictly positive. Thus, following
(1) (2)
"
(1)
# the discussion of (60) for J(N ), we have Υl > 0 and Υl >
Pl+1 P̄l+1 xl+1 0.
λl = (2) (3) , (77)
P̄l+1 P̄l+1 Exl+1 (1) (2)
Since Υl > 0 and Υl > 0, the optimal controller can be
(1) (2) (3) given from (61)-(62) as (13) for k = l, and the optimal cost
where Pl+1 follows the iteration (20) and P̄l+1 , P̄l+1 , P̄l+1 is function is given as (22) for k = l.
calculated as (71)-(73) with N replaced by l + 1, respectively, Now we will show that (23) associated with (20)-(21) are
(1) (2) (3)
and P̄l+1 + P̄l+1 + P̄l+1 = P̄l+1 , where P̄l+1 is given as (21). true for k = l. Since (23) is assumed to be true for k = l + 1,
By substituting (77) into (76) and using the system dynam- i.e., λl is given by (77). By substituting (77) into (11) for
ics (1), J(l) can be calculated as k = l, and applying the same lines for (65)-(75), it is easy to
(1) (2)
J(l) verify that (23) is true with Pl satisfying (20) and P̄l , P̄l ,
(3)
P̄l given as (71)-(73) with N replaced by l, furthermore
= E(x′l Ql xl + Ex′l Q̄l Exl + u′l Rl ul + Eu′l R̄l Eul (1) (2) (3)
P̄l + P̄l + P̄l = P̄l , and P̄l obeys (21) for k = l.
+ x′l+1 Pl+1 xl+1 + Ex′l+1 P̄l+1 Exl+1 )
n Therefore, the proof of necessity is complete by using
= E x′l Ql + A′l Pl+1 Al + σ 2 Cl′ Pl+1 Cl xl

induction method.
"Sufficiency": Under Assumption 1, suppose $\Upsilon_k^{(1)}$ and $\Upsilon_k^{(2)}$, $k = 0, \cdots, N$, are strictly positive definite; we will show that Problem 1 is uniquely solvable.

$V_N(k, x_k)$ is denoted as
\[
V_N(k, x_k) \triangleq E(x_k'P_kx_k) + Ex_k'\bar P_kEx_k,
\tag{80}
\]
where $P_k$ and $\bar P_k$ satisfy (20) and (21), respectively. It follows that
\[
\begin{aligned}
& V_N(k, x_k) - V_N(k+1, x_{k+1}) \\
&= E\Big\{ x_k'P_kx_k + Ex_k'\bar P_kEx_k
 - x_k'\big(A_k'P_{k+1}A_k + \sigma^2C_k'P_{k+1}C_k\big)x_k \\
&\quad - Ex_k'\big[ A_k'P_{k+1}\bar A_k + \sigma^2C_k'P_{k+1}\bar C_k + \bar A_k'P_{k+1}A_k + \sigma^2\bar C_k'P_{k+1}C_k \\
&\qquad\quad + \bar A_k'P_{k+1}\bar A_k + \sigma^2\bar C_k'P_{k+1}\bar C_k + (A_k+\bar A_k)'\bar P_{k+1}(A_k+\bar A_k) \big]Ex_k \\
&\quad - x_k'\big(A_k'P_{k+1}B_k + \sigma^2C_k'P_{k+1}D_k\big)u_k
 - u_k'\big(B_k'P_{k+1}A_k + \sigma^2D_k'P_{k+1}C_k\big)x_k \\
&\quad - Ex_k'\big[ A_k'P_{k+1}\bar B_k + \sigma^2C_k'P_{k+1}\bar D_k + \bar A_k'P_{k+1}B_k + \sigma^2\bar C_k'P_{k+1}D_k \\
&\qquad\quad + \bar A_k'P_{k+1}\bar B_k + \sigma^2\bar C_k'P_{k+1}\bar D_k + (A_k+\bar A_k)'\bar P_{k+1}(B_k+\bar B_k) \big]Eu_k \\
&\quad - Eu_k'\big[ B_k'P_{k+1}\bar A_k + \sigma^2D_k'P_{k+1}\bar C_k + \bar B_k'P_{k+1}A_k + \sigma^2\bar D_k'P_{k+1}C_k \\
&\qquad\quad + \bar B_k'P_{k+1}\bar A_k + \sigma^2\bar D_k'P_{k+1}\bar C_k + (B_k+\bar B_k)'\bar P_{k+1}(A_k+\bar A_k) \big]Ex_k \\
&\quad - u_k'\big(B_k'P_{k+1}B_k + \sigma^2D_k'P_{k+1}D_k\big)u_k \\
&\quad - Eu_k'\big[ B_k'P_{k+1}\bar B_k + \sigma^2D_k'P_{k+1}\bar D_k + \bar B_k'P_{k+1}B_k + \sigma^2\bar D_k'P_{k+1}D_k \\
&\qquad\quad + \bar B_k'P_{k+1}\bar B_k + \sigma^2\bar D_k'P_{k+1}\bar D_k + (B_k+\bar B_k)'\bar P_{k+1}(B_k+\bar B_k) \big]Eu_k \Big\} \\
&= E\Big\{ x_k'\big\{Q_k - [M_k^{(1)}]'[\Upsilon_k^{(1)}]^{-1}M_k^{(1)}\big\}x_k
 + Ex_k'\big\{ \bar Q_k + [M_k^{(1)}]'[\Upsilon_k^{(1)}]^{-1}M_k^{(1)} - [M_k^{(2)}]'[\Upsilon_k^{(2)}]^{-1}M_k^{(2)} \big\}Ex_k \\
&\quad - x_k'[M_k^{(1)}]'u_k - u_k'M_k^{(1)}x_k + u_k'R_ku_k + Eu_k'\bar R_kEu_k \\
&\quad - Ex_k'[M_k^{(2)} - M_k^{(1)}]'Eu_k - Eu_k'[M_k^{(2)} - M_k^{(1)}]Ex_k
 - u_k'\Upsilon_k^{(1)}u_k - Eu_k'[\Upsilon_k^{(2)} - \Upsilon_k^{(1)}]Eu_k \Big\} \\
&= E\{ x_k'Q_kx_k + Ex_k'\bar Q_kEx_k + u_k'R_ku_k + Eu_k'\bar R_kEu_k \} \\
&\quad - E\big\{ [u_k - Eu_k - K_k(x_k - Ex_k)]'\Upsilon_k^{(1)}[u_k - Eu_k - K_k(x_k - Ex_k)] \big\} \\
&\quad - [Eu_k - (K_k+\bar K_k)Ex_k]'\Upsilon_k^{(2)}[Eu_k - (K_k+\bar K_k)Ex_k],
\end{aligned}
\tag{81}
\]
where $K_k$ and $\bar K_k$ are respectively as in (14) and (15). Adding from $k = 0$ to $k = N$ on both sides of (81), the cost function (3) can be rewritten as
\[
\begin{aligned}
J_N &= \sum_{k=0}^{N} E\big\{ [u_k - Eu_k - K_k(x_k - Ex_k)]'\Upsilon_k^{(1)}(N)[u_k - Eu_k - K_k(x_k - Ex_k)] \big\} \\
&\quad + \sum_{k=0}^{N} [Eu_k - (K_k+\bar K_k)Ex_k]'\Upsilon_k^{(2)}(N)[Eu_k - (K_k+\bar K_k)Ex_k] \\
&\quad + E(x_0'P_0x_0) + Ex_0'\bar P_0Ex_0.
\end{aligned}
\tag{82}
\]
Noting that $\Upsilon_k^{(1)} > 0$ and $\Upsilon_k^{(2)} > 0$, we have
\[
J_N \ge E(x_0'P_0x_0) + Ex_0'\bar P_0Ex_0,
\]
thus the minimum of $J_N$ is given by (22), i.e.,
\[
J_N^* = E(x_0'P_0x_0) + Ex_0'\bar P_0Ex_0.
\]
In this case the controller will satisfy that
\[
u_k - Eu_k - K_k(x_k - Ex_k) = 0,
\tag{83}
\]
\[
Eu_k - (K_k + \bar K_k)Ex_k = 0.
\tag{84}
\]
Hence, the optimal controller can be uniquely obtained from (83)-(84) as (13).

In conclusion, Problem 1 admits a unique solution. The proof is complete.

APPENDIX C
PROOF OF LEMMA 3

Proof. Since $K_k(N) = -[\Upsilon_k^{(1)}(N)]^{-1}M_k^{(1)}(N)$, it holds from (20) that
\[
[M_k^{(1)}(N)]'[\Upsilon_k^{(1)}(N)]^{-1}M_k^{(1)}(N)
= -[M_k^{(1)}(N)]'K_k(N) - K_k(N)'M_k^{(1)}(N) - K_k(N)'\Upsilon_k^{(1)}(N)K_k(N).
\]
Thus, $P_k(N)$ in (20) can be calculated as
\[
\begin{aligned}
P_k(N) &= Q + A'P_{k+1}(N)A + \sigma^2C'P_{k+1}(N)C + [M_k^{(1)}(N)]'K_k(N) \\
&\quad + K_k'(N)M_k^{(1)}(N) + K_k'(N)\Upsilon_k^{(1)}(N)K_k(N) \\
&= Q + K_k'(N)RK_k(N) + [A + BK_k(N)]'P_{k+1}(N)[A + BK_k(N)] \\
&\quad + \sigma^2[C + DK_k(N)]'P_{k+1}(N)[C + DK_k(N)].
\end{aligned}
\tag{85}
\]
Notice from Assumption 1 that $Q \ge 0$ and $P_{N+1}(N) = P_{N+1} = 0$; (85) indicates that $P_N(N) \ge 0$. Using induction, assume $P_k(N) \ge 0$ for $l+1 \le k \le N$; by (85), we immediately obtain $P_l(N) \ge 0$.

Therefore, for any $0 \le k \le N$, $P_k(N) \ge 0$.

Moreover, using a derivation similar to that of (85), from (16)-(19) we have that
\[
\begin{aligned}
[M_k^{(2)}(N)]'[\Upsilon_k^{(2)}(N)]^{-1}M_k^{(2)}(N)
&= -[M_k^{(2)}(N)]'[K_k(N) + \bar K_k(N)] - [K_k(N) + \bar K_k(N)]'M_k^{(2)}(N) \\
&\quad - [K_k(N) + \bar K_k(N)]'\Upsilon_k^{(2)}(N)[K_k(N) + \bar K_k(N)].
\end{aligned}
\]
Thus, $P_k(N) + \bar P_k(N)$ can be calculated as
\[
\begin{aligned}
P_k(N) + \bar P_k(N)
&= Q + \bar Q + (A + \bar A)'[P_{k+1}(N) + \bar P_{k+1}(N)](A + \bar A)
 + \sigma^2(C + \bar C)'P_{k+1}(N)(C + \bar C) \\
&\quad - [M_k^{(2)}(N)]'[\Upsilon_k^{(2)}(N)]^{-1}M_k^{(2)}(N) \\
&= Q + \bar Q + [K_k(N) + \bar K_k(N)]'(R + \bar R)[K_k(N) + \bar K_k(N)] \\
&\quad + \big\{A + \bar A + (B + \bar B)[K_k(N) + \bar K_k(N)]\big\}'[P_{k+1}(N) + \bar P_{k+1}(N)]
 \big\{A + \bar A + (B + \bar B)[K_k(N) + \bar K_k(N)]\big\} \\
&\quad + \sigma^2\big\{C + \bar C + (D + \bar D)[K_k(N) + \bar K_k(N)]\big\}'P_{k+1}(N)
 \big\{C + \bar C + (D + \bar D)[K_k(N) + \bar K_k(N)]\big\}.
\end{aligned}
\tag{86}
\]
Since $Q + \bar Q \ge 0$ as in Assumption 1, and $P_{N+1} = \bar P_{N+1} = 0$, we have $P_{N+1}(N) + \bar P_{N+1}(N) = P_{N+1} + \bar P_{N+1} = 0$. Furthermore, using induction as above, we conclude that $P_k(N) + \bar P_k(N) \ge 0$ for any $0 \le k \le N$. The proof is complete.

APPENDIX D
PROOF OF LEMMA 5

Proof. It follows from Lemma 3 that $P_k(N) \ge 0$ and $P_k(N) + \bar P_k(N) \ge 0$ for all $N \ge 0$. Via a time-shift, we can obtain $P_k(N) = P_0(N-k)$. Therefore, what we need to show is that there exists $\bar N_0 > 0$ such that $P_0(\bar N_0) > 0$ and $P_0(\bar N_0) + \bar P_0(\bar N_0) > 0$.

Suppose this is not true, i.e., for arbitrary $N > 0$, $P_0(N)$ and $P_0(N) + \bar P_0(N)$ are both positive semi-definite but not positive definite. Now we construct two sets as follows,
\[
X_N^{(1)} \triangleq \big\{ x^{(1)} : x^{(1)} \ne 0,\ E\{[x^{(1)}]'P_0(N)x^{(1)}\} = 0,\ Ex^{(1)} = 0 \big\},
\tag{87}
\]
\[
X_N^{(2)} \triangleq \big\{ x^{(2)} : x^{(2)} \ne 0,\ [x^{(2)}]'[P_0(N) + \bar P_0(N)]x^{(2)} = 0,\ x^{(2)} = Ex^{(2)} \text{ is deterministic} \big\}.
\tag{88}
\]
From Lemma 1 and Remark 1, we know that $X_N^{(1)}$ and $X_N^{(2)}$ are not empty.

Recall from Theorem 2 that, to minimize the cost function (3) with time-invariant weighting and coefficient matrices and final condition $P_{N+1}(N) = \bar P_{N+1}(N) = 0$, the optimal controller is given by (13), and the optimal cost function is presented as (22), i.e.,
\[
\begin{aligned}
J_N^* &= \min \sum_{k=0}^{N} E[x_k'Qx_k + Ex_k'\bar QEx_k + u_k'Ru_k + Eu_k'\bar REu_k] \\
&= \sum_{k=0}^{N} E[x_k^{*\prime}Qx_k^* + Ex_k^{*\prime}\bar QEx_k^* + u_k^{*\prime}Ru_k^* + Eu_k^{*\prime}\bar REu_k^*] \\
&= E[x_0'P_0(N)x_0] + (Ex_0)'\bar P_0(N)(Ex_0) \\
&= E[(x_0 - Ex_0)'P_0(N)(x_0 - Ex_0)] + (Ex_0)'[P_0(N) + \bar P_0(N)](Ex_0).
\end{aligned}
\tag{89}
\]
In the above equation, $x_k^*$ and $u_k^*$ represent the optimal state trajectory and the optimal controller, respectively.

Since $J_N \le J_{N+1}$, for any initial state $x_0$ we have $J_N^* \le J_{N+1}^*$, and it holds from (89) that
\[
\begin{aligned}
& E[(x_0 - Ex_0)'P_0(N)(x_0 - Ex_0)] + (Ex_0)'[P_0(N) + \bar P_0(N)](Ex_0) \\
&\le E[(x_0 - Ex_0)'P_0(N+1)(x_0 - Ex_0)] + (Ex_0)'[P_0(N+1) + \bar P_0(N+1)](Ex_0).
\end{aligned}
\tag{90}
\]
For any initial state $x_0 \ne 0$ with $Ex_0 = 0$, (90) can be reduced to
\[
E[x_0'P_0(N)x_0] \le E[x_0'P_0(N+1)x_0],
\]
i.e., $E\{x_0'[P_0(N) - P_0(N+1)]x_0\} \le 0$. By Lemma 1 and Remark 1, we can therefore obtain
\[
P_0(N) \le P_0(N+1),
\tag{91}
\]
which implies that $P_0(N)$ increases with respect to $N$.

On the other hand, for arbitrary initial state $x_0 \ne 0$ with $x_0 = Ex_0$, i.e., $x_0 \in R^n$ is arbitrary and deterministic, equation (90) indicates that
\[
x_0'[P_0(N) + \bar P_0(N)]x_0 \le x_0'[P_0(N+1) + \bar P_0(N+1)]x_0.
\]
Note that $x_0$ is arbitrary; then, using Remark 1, we have
\[
P_0(N) + \bar P_0(N) \le P_0(N+1) + \bar P_0(N+1),
\tag{92}
\]
which implies that $P_0(N) + \bar P_0(N)$ increases with respect to $N$, too. Furthermore, the monotonic increase of $P_0(N)$ and $P_0(N) + \bar P_0(N)$ indicates that
• If $E\{[x^{(1)}]'P_0(N+1)x^{(1)}\} = 0$ holds, then we can conclude $E\{[x^{(1)}]'P_0(N)x^{(1)}\} = 0$;
• If $[x^{(2)}]'[P_0(N+1) + \bar P_0(N+1)]x^{(2)} = 0$, then we can obtain $[x^{(2)}]'[P_0(N) + \bar P_0(N)]x^{(2)} = 0$,
i.e., $X_{N+1}^{(1)} \subset X_N^{(1)}$ and $X_{N+1}^{(2)} \subset X_N^{(2)}$.

As $\{X_N^{(1)}\}$ and $\{X_N^{(2)}\}$ are both non-empty finite dimensional sets,
\[
1 \le \cdots \le \dim(X_2^{(1)}) \le \dim(X_1^{(1)}) \le \dim(X_0^{(1)}) \le n,
\]
and
\[
1 \le \cdots \le \dim(X_2^{(2)}) \le \dim(X_1^{(2)}) \le \dim(X_0^{(2)}) \le n,
\]
where dim means the dimension of the set.

Hence, there exists a positive integer $N_1$ such that for any $N > N_1$, we can obtain
\[
\dim(X_N^{(1)}) = \dim(X_{N_1}^{(1)}), \qquad \dim(X_N^{(2)}) = \dim(X_{N_1}^{(2)}),
\]
which leads to $X_N^{(1)} = X_{N_1}^{(1)}$ and $X_N^{(2)} = X_{N_1}^{(2)}$, i.e.,
\[
\bigcap_{N \ge 0} X_N^{(1)} = X_{N_1}^{(1)} \ne \emptyset, \qquad \bigcap_{N \ge 0} X_N^{(2)} = X_{N_1}^{(2)} \ne \emptyset.
\]
Therefore, there exist nonzero $x^{(1)} \in X_{N_1}^{(1)}$ and $x^{(2)} \in X_{N_1}^{(2)}$ satisfying
\[
E\{[x^{(1)}]'P_0(N)x^{(1)}\} = 0,
\tag{93}
\]
\[
[x^{(2)}]'[P_0(N) + \bar P_0(N)]x^{(2)} = 0.
\tag{94}
\]
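The nonnegativity in Lemma 3 and the monotonicity in (91)-(92) can be checked numerically. The following is a minimal sketch, assuming the time-invariant recursion is iterated for $P_k$ and for the sum $\Pi_k = P_k + \bar P_k$ in the forms suggested by (85)-(86), with $\Upsilon^{(1)}, M^{(1)}, \Upsilon^{(2)}, M^{(2)}$ written out as $R + B'PB + \sigma^2D'PD$, $B'PA + \sigma^2D'PC$, and their mean-field counterparts. The function names and the scalar system values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def riccati_step(P, Pi, sys, s2):
    """One backward step for P_k and Pi_k = P_k + Pbar_k (time-invariant case)."""
    A, Ab, B, Bb, C, Cb, D, Db, Q, Qb, R, Rb = sys
    As, Bs, Cs, Ds = A + Ab, B + Bb, C + Cb, D + Db
    # Upsilon^(1), M^(1): depend on P only
    Ups1 = R + B.T @ P @ B + s2 * D.T @ P @ D
    M1 = B.T @ P @ A + s2 * D.T @ P @ C
    # Upsilon^(2), M^(2): depend on P and on the sum Pi = P + Pbar
    Ups2 = R + Rb + Bs.T @ Pi @ Bs + s2 * Ds.T @ P @ Ds
    M2 = Bs.T @ Pi @ As + s2 * Ds.T @ P @ Cs
    P_new = Q + A.T @ P @ A + s2 * C.T @ P @ C - M1.T @ np.linalg.solve(Ups1, M1)
    Pi_new = (Q + Qb + As.T @ Pi @ As + s2 * Cs.T @ P @ Cs
              - M2.T @ np.linalg.solve(Ups2, M2))
    return P_new, Pi_new

def p0_sequence(sys, s2, horizons):
    """Return [(P_0(N), P_0(N) + Pbar_0(N)) for N = 0..horizons-1], zero terminal condition."""
    n = sys[0].shape[0]
    P, Pi = np.zeros((n, n)), np.zeros((n, n))
    out = []
    for _ in range(horizons):
        P, Pi = riccati_step(P, Pi, sys, s2)
        out.append((P.copy(), Pi.copy()))
    return out

def mat(x):
    """Wrap a scalar as a 1x1 matrix so the same code covers the matrix case."""
    return np.array([[x]], dtype=float)

# Illustrative scalar data (assumed): A, Abar, B, Bbar, C, Cbar, D, Dbar, Q, Qbar, R, Rbar
SYS = tuple(mat(v) for v in (1.1, 0.2, 1.0, 0.1, 0.5, 0.1, 0.3, 0.1, 1.0, 0.5, 1.0, 0.5))
SEQ = p0_sequence(SYS, s2=1.0, horizons=30)
```

Working with the pair $(P, P + \bar P)$ rather than $(P, \bar P)$ is a design choice of this sketch: it mirrors the way the lemmas state their conclusions, so both sequences can be tested for positive semi-definiteness and monotonicity directly.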
1) Let the initial state of system (24) be $x_0 = x^{(1)}$, where $x^{(1)}$ is as defined in (87). Then, from (89) and using (93), the optimal value of the cost function can be calculated as
\[
J_N^* = \sum_{k=0}^{N} E[x_k^{*\prime}Qx_k^* + Ex_k^{*\prime}\bar QEx_k^* + u_k^{*\prime}Ru_k^* + Eu_k^{*\prime}\bar REu_k^*]
= E\{[x^{(1)}]'P_0(N)x^{(1)}\} = 0,
\tag{95}
\]
where $Ex^{(1)} = 0$ has been used in the last equality. Noting that $R > 0$, $R + \bar R > 0$, $Q \ge 0$ and $Q + \bar Q \ge 0$, from (95) we obtain that
\[
u_k^* = 0, \quad Eu_k^* = 0, \quad 0 \le k \le N,
\]
and
\[
0 = E[x_k^{*\prime}Qx_k^* + Ex_k^{*\prime}\bar QEx_k^*]
= E[(x_k^* - Ex_k^*)'Q(x_k^* - Ex_k^*) + Ex_k^{*\prime}(Q + \bar Q)Ex_k^*], \quad 0 \le k \le N,
\]
i.e., $Q^{1/2}(x_k^* - Ex_k^*) = 0$ and $(Q + \bar Q)^{1/2}Ex_k^* = 0$.

By Assumption 3, $(A, \bar A, C, \bar C, Q^{1/2})$ is exactly observable, i.e.,
\[
\begin{bmatrix} Q^{1/2} & 0 \\ 0 & (Q + \bar Q)^{1/2} \end{bmatrix}
\begin{bmatrix} x_k - Ex_k \\ Ex_k \end{bmatrix} = 0 \ \Rightarrow\ x_0 = 0,
\]
then we have $x^{(1)} = x_0 = Ex_0 = 0$, which contradicts $x^{(1)} \ne 0$.

Thus, there exists $\bar N_0 > 0$ such that $P_0(\bar N_0) > 0$.

2) Let the initial state of system (24) be $x_0 = x^{(2)}$, where $x^{(2)}$ is given by (88). Then, by using (89) and (94), the minimum of the cost function can be rewritten as
\[
J_N^* = \sum_{k=0}^{N} E[x_k^{*\prime}Qx_k^* + Ex_k^{*\prime}\bar QEx_k^* + u_k^{*\prime}Ru_k^* + Eu_k^{*\prime}\bar REu_k^*]
= [x^{(2)}]'[P_0(N) + \bar P_0(N)]x^{(2)} = 0.
\]
Using a method similar to that in 1), by Assumption 3, we can conclude that $x^{(2)} = x_0 = Ex_0 = 0$, which contradicts $x^{(2)} \ne 0$.

In conclusion, there exists $\bar N_0 > 0$ such that $P_0(\bar N_0) > 0$ and $P_0(\bar N_0) + \bar P_0(\bar N_0) > 0$. Via a time-shift, we hence have that, for any $k \ge 0$, there exists a positive integer $N_0 \ge 0$ such that $P_k(N_0) > 0$ and $P_k(N_0) + \bar P_k(N_0) > 0$.

The proof is complete.

APPENDIX E
PROOF OF THEOREM 3

Proof. 1) Firstly, from the proof of Lemma 5, we know that $P_0(N)$ and $P_0(N) + \bar P_0(N)$ are monotonically increasing, i.e., for any $N > 0$,
\[
P_0(N) \le P_0(N+1), \qquad P_0(N) + \bar P_0(N) \le P_0(N+1) + \bar P_0(N+1).
\]
Next we will show that $P_0(N)$ and $P_0(N) + \bar P_0(N)$ are bounded. Since system (24) is stabilizable in the mean square sense, there exists $u_k$ of the form
\[
u_k = Lx_k + \bar LEx_k,
\tag{96}
\]
with constant matrices $L$ and $\bar L$ such that the closed-loop system (24) satisfies
\[
\lim_{k \to +\infty} E(x_k'x_k) = 0.
\tag{97}
\]
As $(Ex_k)'Ex_k + E(x_k - Ex_k)'(x_k - Ex_k) = E(x_k'x_k)$, equation (97) implies $\lim_{k \to +\infty}(Ex_k)'Ex_k = 0$.

Substituting (96) into (24), we can obtain
\[
x_{k+1} = [(A + w_kC) + (B + w_kD)L]x_k + [(B + w_kD)\bar L + (\bar A + w_k\bar C) + (\bar B + w_k\bar D)(L + \bar L)]Ex_k,
\tag{98}
\]
\[
Ex_{k+1} = [(A + \bar A) + (B + \bar B)(L + \bar L)]Ex_k.
\tag{99}
\]
Denote $\mathbb X_k \triangleq \begin{bmatrix} x_k \\ Ex_k \end{bmatrix}$ and $X_k \triangleq E[\mathbb X_k\mathbb X_k']$. Following from (98) and (99), it holds that
\[
\mathbb X_{k+1} = \mathbb A\mathbb X_k,
\tag{100}
\]
where $\mathbb A = \begin{bmatrix} \mathbb A_{11} & \mathbb A_{12} \\ 0 & \mathbb A_{22} \end{bmatrix}$, $\mathbb A_{11} = (A + w_kC) + (B + w_kD)L$, $\mathbb A_{12} = (B + w_kD)\bar L + (\bar A + w_k\bar C) + (\bar B + w_k\bar D)(L + \bar L)$, and $\mathbb A_{22} = (A + \bar A) + (B + \bar B)(L + \bar L)$.

The mean square stabilization $\lim_{k \to +\infty} E(x_k'x_k) = 0$ implies $\lim_{k \to +\infty} X_k = 0$; thus, it follows from [2] that
\[
\sum_{k=0}^{\infty} E(x_k'x_k) < +\infty, \quad \text{and} \quad \sum_{k=0}^{\infty} (Ex_k)'(Ex_k) < +\infty.
\]
Therefore, there exists a constant $c$ such that
\[
\sum_{k=0}^{\infty} E(x_k'x_k) \le cE(x_0'x_0).
\tag{101}
\]
Since $Q \ge 0$, $Q + \bar Q \ge 0$, $R > 0$ and $R + \bar R > 0$, there exists a constant $\lambda$ such that
$\begin{bmatrix} Q & 0 \\ 0 & Q + \bar Q \end{bmatrix} \le \lambda I$ and
$\begin{bmatrix} L'RL & 0 \\ 0 & (L + \bar L)'(R + \bar R)(L + \bar L) \end{bmatrix} \le \lambda I$. Using (96) and (101), we obtain that
\[
\begin{aligned}
J &= \sum_{k=0}^{\infty} E[x_k'Qx_k + u_k'Ru_k + Ex_k'\bar QEx_k + Eu_k'\bar REu_k] \\
&= \sum_{k=0}^{\infty} E\Big\{ x_k'(Q + L'RL)x_k + Ex_k'\big[\bar Q + L'R\bar L + \bar L'RL + \bar L'R\bar L + (L + \bar L)'\bar R(L + \bar L)\big]Ex_k \Big\} \\
&= \sum_{k=0}^{\infty} E\Big\{ \begin{bmatrix} x_k - Ex_k \\ Ex_k \end{bmatrix}'
\begin{bmatrix} Q & 0 \\ 0 & Q + \bar Q \end{bmatrix}
\begin{bmatrix} x_k - Ex_k \\ Ex_k \end{bmatrix} \Big\} \\
&\quad + \sum_{k=0}^{\infty} E\Big\{ \begin{bmatrix} x_k - Ex_k \\ Ex_k \end{bmatrix}'
\begin{bmatrix} L'RL & 0 \\ 0 & (L + \bar L)'(R + \bar R)(L + \bar L) \end{bmatrix}
\begin{bmatrix} x_k - Ex_k \\ Ex_k \end{bmatrix} \Big\} \\
&\le 2\lambda\sum_{k=0}^{\infty} E[(Ex_k)'Ex_k + (x_k - Ex_k)'(x_k - Ex_k)] \\
&= 2\lambda\sum_{k=0}^{\infty} E(x_k'x_k) \le 2\lambda cE(x_0'x_0).
\end{aligned}
\tag{102}
\]
On the other hand, by (22), notice the fact that
\[
E[x_0'P_0(N)x_0] + (Ex_0)'\bar P_0(N)(Ex_0) = J_N^* \le J,
\]
thus, (102) yields
\[
E[x_0'P_0(N)x_0] + (Ex_0)'\bar P_0(N)(Ex_0) \le 2\lambda cE(x_0'x_0).
\tag{103}
\]
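For a scalar system, the closed loop (98)-(99) can be propagated through its first two moments, which gives a quick way to see whether a candidate pair $(L, \bar L)$ achieves (97). The sketch below assumes that $w_k$ is scalar, zero-mean with variance $\sigma^2$, and independent of $x_k$; under that assumption $m_k = Ex_k$ follows (99) and $v_k = E(x_k - Ex_k)^2$ follows a linear recursion obtained by subtracting (99) from (98). The function name and all numerical values are hypothetical.

```python
def closed_loop_moments(L, Lb, m0, v0, steps,
                        A=1.1, Ab=0.2, B=1.0, Bb=0.1,
                        C=0.5, Cb=0.1, D=0.3, Db=0.1, s2=1.0):
    """Scalar moment recursion for u_k = L x_k + Lb E x_k.

    m_k = E x_k, v_k = E (x_k - E x_k)^2; returns [E(x_k^2)] for k = 0..steps.
    Assumes w_k is zero-mean, variance s2, independent of x_k.
    """
    a_mean = (A + Ab) + (B + Bb) * (L + Lb)     # mean dynamics, eq. (99)
    a_dev = A + B * L                           # drift acting on x_k - E x_k
    c_dev = C + D * L                           # noise gain on x_k - E x_k
    c_mean = (C + Cb) + (D + Db) * (L + Lb)     # noise gain on E x_k
    m, v, out = m0, v0, []
    for _ in range(steps + 1):
        out.append(v + m * m)                   # E(x_k^2) = v_k + m_k^2
        m, v = a_mean * m, (a_dev**2 + s2 * c_dev**2) * v + s2 * c_mean**2 * m * m
    return out

# A stabilizing pair versus the open loop (illustrative values only)
stabilized = closed_loop_moments(-1.15, -0.05, m0=1.0, v0=1.0, steps=20)
open_loop = closed_loop_moments(0.0, 0.0, m0=1.0, v0=1.0, steps=20)
```

With these assumed values the stabilized second moment decays geometrically while the open-loop one grows, which is the behavior that (97) and (101) describe.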
Now we let the initial state be a random vector with zero mean, i.e., $Ex_0 = 0$; it follows from (103) that
\[
E[x_0'P_0(N)x_0] \le 2\lambda cE(x_0'x_0).
\]
Since $x_0$ is arbitrary with $Ex_0 = 0$, by Lemma 1 and Remark 1, we have
\[
P_0(N) \le 2\lambda cI.
\]
Similarly, let the initial state be arbitrary and deterministic, i.e., $x_0 = Ex_0$; (103) yields that
\[
x_0'[P_0(N) + \bar P_0(N)]x_0 = J_N^* \le J \le 2\lambda cx_0'x_0,
\]
which implies
\[
P_0(N) + \bar P_0(N) \le 2\lambda cI.
\]
Therefore, both $P_0(N)$ and $P_0(N) + \bar P_0(N)$ are bounded. Recalling that $P_0(N)$ and $P_0(N) + \bar P_0(N)$ are monotonically increasing, we conclude that $P_0(N)$ and $P_0(N) + \bar P_0(N)$ are convergent, i.e., there exist $P$ and $\bar P$ such that
\[
\lim_{N \to +\infty} P_k(N) = \lim_{N \to +\infty} P_0(N-k) = P, \qquad
\lim_{N \to +\infty} \bar P_k(N) = \lim_{N \to +\infty} \bar P_0(N-k) = \bar P.
\]
Furthermore, in view of (16)-(19), we know that $\Upsilon_k^{(1)}(N)$, $M_k^{(1)}(N)$, $\Upsilon_k^{(2)}(N)$ and $M_k^{(2)}(N)$ are convergent, i.e.,
\[
\lim_{N \to +\infty} \Upsilon_k^{(1)}(N) = \Upsilon^{(1)} \ge R > 0,
\tag{104}
\]
\[
\lim_{N \to +\infty} M_k^{(1)}(N) = M^{(1)},
\tag{105}
\]
\[
\lim_{N \to +\infty} \Upsilon_k^{(2)}(N) = \Upsilon^{(2)} \ge R + \bar R > 0,
\tag{106}
\]
\[
\lim_{N \to +\infty} M_k^{(2)}(N) = M^{(2)},
\tag{107}
\]
where $\Upsilon^{(1)}$, $M^{(1)}$, $\Upsilon^{(2)}$, $M^{(2)}$ are given by (32)-(35). Taking limits on both sides of (20) and (21), we know that $P$ and $\bar P$ satisfy the coupled ARE (30)-(31).

2) From Lemma 5, for any $k \ge 0$, there exists $N_0 > 0$ such that $P_k(N_0) > 0$ and $P_k(N_0) + \bar P_k(N_0) > 0$; hence we have
\[
P = \lim_{N \to +\infty} P_k(N) \ge P_k(N_0) > 0, \qquad
P + \bar P = \lim_{N \to +\infty} [P_k(N) + \bar P_k(N)] \ge P_k(N_0) + \bar P_k(N_0) > 0.
\]
This ends the proof.

APPENDIX F
PROOF OF THEOREM 4

Proof. "Sufficiency": Under Assumptions 2 and 3, we suppose that $P$ and $\bar P$ are the solution of (30)-(31) satisfying $P > 0$ and $P + \bar P > 0$; we will show that (36) stabilizes (24) in the mean square sense.

Similar to (80), we define the Lyapunov function candidate $V(k, x_k)$ as
\[
V(k, x_k) \triangleq E(x_k'Px_k) + Ex_k'\bar PEx_k.
\tag{108}
\]
Apparently we have
\[
V(k, x_k) = E[(x_k - Ex_k)'P(x_k - Ex_k) + Ex_k'(P + \bar P)Ex_k] \ge 0.
\tag{109}
\]
We claim that $V(k, x_k)$ monotonically decreases. Actually, following the derivation of (81), we have
\[
\begin{aligned}
V(k, x_k) - V(k+1, x_{k+1})
&= E[x_k'Qx_k + Ex_k'\bar QEx_k + u_k'Ru_k + Eu_k'\bar REu_k] \\
&\quad - E\{[u_k - Eu_k - K(x_k - Ex_k)]'\Upsilon^{(1)}[u_k - Eu_k - K(x_k - Ex_k)]\} \\
&\quad - [Eu_k - (K + \bar K)Ex_k]'\Upsilon^{(2)}[Eu_k - (K + \bar K)Ex_k] \\
&= E[x_k'Qx_k + Ex_k'\bar QEx_k + u_k'Ru_k + Eu_k'\bar REu_k] \\
&\ge 0, \quad k \ge 0,
\end{aligned}
\tag{110}
\]
where $u_k = Kx_k + \bar KEx_k$ is used in the last identity. The last inequality implies that $V(k, x_k)$ decreases with respect to $k$; also, from (109) we know that $V(k, x_k) \ge 0$, thus $V(k, x_k)$ is convergent.

Let $l$ be any positive integer. By adding from $k = l$ to $k = l + N$ on both sides of (110), we obtain that
\[
\sum_{k=l}^{l+N} E[x_k'Qx_k + Ex_k'\bar QEx_k + u_k'Ru_k + Eu_k'\bar REu_k]
= V(l, x_l) - V(l+N+1, x_{l+N+1}).
\tag{111}
\]
Since $V(k, x_k)$ is convergent, by taking the limit in $l$ on both sides of (111), it holds that
\[
\lim_{l \to +\infty}\sum_{k=l}^{l+N} E[x_k'Qx_k + Ex_k'\bar QEx_k + u_k'Ru_k + Eu_k'\bar REu_k]
= \lim_{l \to +\infty}[V(l, x_l) - V(l+N+1, x_{l+N+1})] = 0.
\tag{112}
\]
Recall from (89) that
\[
J_N = \sum_{k=0}^{N} E[x_k'Qx_k + Ex_k'\bar QEx_k + u_k'Ru_k + Eu_k'\bar REu_k]
\ge J_N^* = E[x_0'P_0(N)x_0] + Ex_0'\bar P_0(N)Ex_0.
\tag{113}
\]
Thus, taking limits on both sides of (113), via a time-shift of $l$ and using (112), it yields that
\[
\begin{aligned}
0 &= \lim_{l \to +\infty}\sum_{k=l}^{l+N} E[x_k'Qx_k + Ex_k'\bar QEx_k + u_k'Ru_k + Eu_k'\bar REu_k] \\
&\ge \lim_{l \to +\infty} E\big[ x_l'P_l(l+N)x_l + Ex_l'\bar P_l(l+N)Ex_l \big] \\
&= \lim_{l \to +\infty} E\big\{ (x_l - Ex_l)'P_l(l+N)(x_l - Ex_l) + Ex_l'[P_l(l+N) + \bar P_l(l+N)]Ex_l \big\} \\
&= \lim_{l \to +\infty} E\big\{ (x_l - Ex_l)'P_0(N)(x_l - Ex_l) + Ex_l'[P_0(N) + \bar P_0(N)]Ex_l \big\} \ge 0.
\end{aligned}
\tag{114}
\]
Hence, it follows from (114) that
\[
\lim_{l \to +\infty} E[(x_l - Ex_l)'P_0(N)(x_l - Ex_l)] = 0,
\tag{115}
\]
\[
\lim_{l \to +\infty} Ex_l'[P_0(N) + \bar P_0(N)]Ex_l = 0.
\tag{116}
\]
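The decrease identity (110) can be sanity-checked in the scalar case: along the closed loop with $u_k = Kx_k + \bar KEx_k$, the drop $V(k, x_k) - V(k+1, x_{k+1})$ should equal the stage cost. The sketch below is illustrative only. It assumes the steady-state $P$ and $P + \bar P$ are obtained by iterating the finite-horizon recursion to convergence, that the gains are $K = -M^{(1)}/\Upsilon^{(1)}$ and $K + \bar K = -M^{(2)}/\Upsilon^{(2)}$, and that $w_k$ is zero-mean with variance $\sigma^2$ and independent of $x_k$. All names and numbers are hypothetical.

```python
# Illustrative scalar data (assumed, not from the paper)
A, Ab, B, Bb, C, Cb, D, Db = 1.1, 0.2, 1.0, 0.1, 0.5, 0.1, 0.3, 0.1
Q, Qb, R, Rb, S2 = 1.0, 0.5, 1.0, 0.5, 1.0
As, Bs, Cs, Ds = A + Ab, B + Bb, C + Cb, D + Db

def ups_m(P, Pi):
    """Return (Upsilon1, M1, Upsilon2, M2) for scalar P and Pi = P + Pbar."""
    return (R + (B * B + S2 * D * D) * P, (B * A + S2 * D * C) * P,
            R + Rb + Bs * Bs * Pi + S2 * Ds * Ds * P, Bs * As * Pi + S2 * Ds * Cs * P)

def steady_state(iters=500):
    """Iterate the coupled recursion from zero; return (P, Pi, K, K + Kbar)."""
    P = Pi = 0.0
    for _ in range(iters):
        U1, M1, U2, M2 = ups_m(P, Pi)
        P, Pi = (Q + (A * A + S2 * C * C) * P - M1 * M1 / U1,
                 Q + Qb + As * As * Pi + S2 * Cs * Cs * P - M2 * M2 / U2)
    U1, M1, U2, M2 = ups_m(P, Pi)
    return P, Pi, -M1 / U1, -M2 / U2

def lyapunov_gap(m, v):
    """V(k) - V(k+1) - stage cost, for mean m = E x_k and variance v of x_k."""
    P, Pi, K, Ks = steady_state()
    m_next = (As + Bs * Ks) * m
    v_next = ((A + B * K) ** 2 + S2 * (C + D * K) ** 2) * v + S2 * (Cs + Ds * Ks) ** 2 * m * m
    V_now, V_next = P * v + Pi * m * m, P * v_next + Pi * m_next * m_next
    stage = (Q + R * K * K) * v + (Q + Qb + (R + Rb) * Ks * Ks) * m * m
    return V_now - V_next - stage
```

A gap close to zero for several choices of $(m, v)$ is consistent with (110); a nonzero gap would indicate that the iterated $P$, $\bar P$ have not reached a fixed point of the recursion.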
By Lemma 5, we know that there exists $N_0 \ge 0$ such that $P_0(N) > 0$ and $P_0(N) + \bar{P}_0(N) > 0$ for any $N > N_0$. Thus, from (115) and (116), we have
$$\lim_{l\to+\infty} E[(x_l - Ex_l)'(x_l - Ex_l)] = 0, \quad \lim_{l\to+\infty} Ex_l' Ex_l = 0, \qquad (117)$$
which indicates that $\lim_{l\to+\infty} E(x_l' x_l) = 0$.
In conclusion, (36) stabilizes (24) in the mean square sense.
Next we will show that controller (36) minimizes the cost function (25). Summing (110) from $k = 0$ to $k = N$, we have
$$\begin{aligned}
&\sum_{k=0}^{N} E[x_k' Q x_k + Ex_k' \bar{Q} Ex_k + u_k' R u_k + Eu_k' \bar{R} Eu_k] \\
&\quad = V(0, x_0) - V(N+1, x_{N+1}) \\
&\qquad + \sum_{k=0}^{N} E\Big\{[u_k - Eu_k - K(x_k - Ex_k)]' \Upsilon^{(1)} [u_k - Eu_k - K(x_k - Ex_k)]\Big\} \\
&\qquad + \sum_{k=0}^{N} [Eu_k - (K + \bar{K}) Ex_k]' \Upsilon^{(2)} [Eu_k - (K + \bar{K}) Ex_k].
\end{aligned} \qquad (118)$$
Moreover, following from (108) and (117), we have that
$$0 \le \lim_{k\to+\infty} V(k, x_k) = \lim_{k\to+\infty} E\{x_k' P x_k + Ex_k' \bar{P} Ex_k\} = 0.$$
Thus, letting $N \to +\infty$ on both sides of (118) and noting (25), we have
$$\begin{aligned}
J &= \sum_{k=0}^{\infty} E[x_k' Q x_k + Ex_k' \bar{Q} Ex_k + u_k' R u_k + Eu_k' \bar{R} Eu_k] \\
&= E[(x_0 - Ex_0)' P (x_0 - Ex_0)] + Ex_0' (P + \bar{P}) Ex_0 \\
&\quad + \sum_{k=0}^{\infty} E\Big\{[u_k - Eu_k - K(x_k - Ex_k)]' \Upsilon^{(1)} [u_k - Eu_k - K(x_k - Ex_k)]\Big\} \\
&\quad + \sum_{k=0}^{\infty} [Eu_k - (K + \bar{K}) Ex_k]' \Upsilon^{(2)} [Eu_k - (K + \bar{K}) Ex_k].
\end{aligned} \qquad (119)$$
Note that $\Upsilon^{(1)} > 0$ and $\Upsilon^{(2)} > 0$. Following the discussion in the sufficiency proof of Theorem 2, the cost function (25) is minimized by controller (36). Furthermore, directly from (119), the optimal cost function is given by (39).
“Necessity”: Under Assumptions 2 and 3, if (24) is stabilizable in mean square sense, we will show that the coupled ARE (30)-(31) has a unique solution $P$ and $\bar{P}$ satisfying $P > 0$ and $P + \bar{P} > 0$. The existence of a solution to (30)-(31) satisfying $P > 0$ and $P + \bar{P} > 0$ has been verified in Theorem 3. The uniqueness of the solution remains to be shown.
Let $S$ and $\bar{S}$ be another solution of (30)-(31) satisfying $S > 0$ and $S + \bar{S} > 0$, i.e.,
$$S = Q + A' S A + \sigma^2 C' S C - [T^{(1)}]' [\Delta^{(1)}]^{-1} T^{(1)}, \qquad (120)$$
$$\begin{aligned}
\bar{S} &= \bar{Q} + A' S \bar{A} + \sigma^2 C' S \bar{C} + \bar{A}' S A + \sigma^2 \bar{C}' S C \\
&\quad + \bar{A}' S \bar{A} + \sigma^2 \bar{C}' S \bar{C} + (A + \bar{A})' \bar{S} (A + \bar{A}) \\
&\quad + [T^{(1)}]' [\Delta^{(1)}]^{-1} T^{(1)} - [T^{(2)}]' [\Delta^{(2)}]^{-1} T^{(2)},
\end{aligned} \qquad (121)$$
where
$$\begin{aligned}
\Delta^{(1)} &= R + B' S B + \sigma^2 D' S D, \\
T^{(1)} &= B' S A + \sigma^2 D' S C, \\
\Delta^{(2)} &= R + \bar{R} + (B + \bar{B})' (S + \bar{S}) (B + \bar{B}) + \sigma^2 (D + \bar{D})' S (D + \bar{D}), \\
T^{(2)} &= (B + \bar{B})' (S + \bar{S}) (A + \bar{A}) + \sigma^2 (D + \bar{D})' S (C + \bar{C}).
\end{aligned}$$
Notice that the optimal cost function has been proved to be (39), i.e.,
$$J^* = E(x_0' P x_0) + Ex_0' \bar{P} Ex_0 = E(x_0' S x_0) + Ex_0' \bar{S} Ex_0. \qquad (122)$$
For any initial state $x_0$ satisfying $x_0 \ne 0$ and $Ex_0 = 0$, equation (122) implies that
$$E[x_0' (P - S) x_0] = 0.$$
By Lemma 1 and Remark 1, we can conclude that $P = S$.
Moreover, if $x_0 = Ex_0$ is an arbitrary deterministic initial state, it follows from (122) that
$$x_0' (P + \bar{P} - S - \bar{S}) x_0 = 0,$$
which indicates $P + \bar{P} = S + \bar{S}$.
Hence we have $S = P$ and $\bar{S} = \bar{P}$, i.e., the uniqueness has been proven. The proof is complete.

APPENDIX G
PROOF OF THEOREM 5

Proof. “Necessity:” Under Assumptions 2 and 4, suppose mean-field system (24) is stabilizable in mean square sense. We will show that the coupled ARE (30)-(31) has a unique solution $P$ and $\bar{P}$ with $P \ge 0$ and $P + \bar{P} \ge 0$.
Actually, from (89)-(92) in the proof of Lemma 5, we know that $P_0(N)$ and $P_0(N) + \bar{P}_0(N)$ are monotonically increasing. Then, following the lines of (96)-(103), the boundedness of $P_0(N)$ and $P_0(N) + \bar{P}_0(N)$ can be obtained. Hence, $P_0(N)$ and $P_0(N) + \bar{P}_0(N)$ are convergent, so there exist $P$ and $\bar{P}$ such that
$$\lim_{N\to+\infty} P_k(N) = \lim_{N\to+\infty} P_0(N - k) = P,$$
$$\lim_{N\to+\infty} \bar{P}_k(N) = \lim_{N\to+\infty} \bar{P}_0(N - k) = \bar{P}.$$
From Lemma 3, we know that $P_k(N) \ge 0$ and $P_k(N) + \bar{P}_k(N) \ge 0$, thus we have $P \ge 0$ and $P + \bar{P} \ge 0$. Furthermore, in view of (16)-(19), $\Upsilon^{(1)}$, $\Upsilon^{(2)}$, $M^{(1)}$, $M^{(2)}$ in (32)-(35) can be obtained. Taking the limit on both sides of (20) and (21), we know that $P$ and $\bar{P}$ satisfy the coupled ARE (30) and (31). Under Assumption 2, Lemma 4 yields that Problem 1 has a unique solution; then, following the steps of (120)-(122) in Theorem 4, the uniqueness of $P$ and $\bar{P}$ can be obtained. Finally, taking the limit on both sides of (13) and (22), the unique optimal controller can be given as (36), and the optimal cost function is given by (39). The necessity proof is complete.
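The limiting argument in the necessity proof can be illustrated numerically. The following is a minimal scalar sketch, not the authors' implementation: it iterates the scalar form of (120)-(121), written for the pair $(P, P + \bar{P})$, starting from zero, and then propagates the closed-loop second moment. All numerical values are illustrative assumptions. The gain formulas $K = -[\Delta^{(1)}]^{-1} T^{(1)}$ and $K + \bar{K} = -[\Delta^{(2)}]^{-1} T^{(2)}$ are also assumptions (chosen because they make (120)-(121) agree with (123)-(124)), as is the standard mean-field form assumed for system (24), with $w_k$ zero-mean with variance $\sigma^2$.

```python
# Scalar sketch of the Riccati iteration used in the necessity proof.
# Every number below is an illustrative assumption, not data from the paper.
A, A_bar = 1.1, 0.2
B, B_bar = 1.0, 0.5
C, C_bar = 0.3, 0.1
D, D_bar = 0.2, 0.1
Q, Q_bar = 1.0, 0.5
R, R_bar = 1.0, 0.5
sigma2 = 1.0  # assumed variance of the noise w_k

As, Bs, Cs, Ds = A + A_bar, B + B_bar, C + C_bar, D + D_bar


def riccati_terms(P, Pi):
    """Scalar Delta^(1), T^(1), Delta^(2), T^(2), with Pi standing for P + P_bar."""
    delta1 = R + B * P * B + sigma2 * D * P * D
    t1 = B * P * A + sigma2 * D * P * C
    delta2 = R + R_bar + Bs * Pi * Bs + sigma2 * Ds * P * Ds
    t2 = Bs * Pi * As + sigma2 * Ds * P * Cs
    return delta1, t1, delta2, t2


def riccati_step(P, Pi):
    """One backward step: (120) for P, and (120) + (121) for Pi = P + P_bar."""
    delta1, t1, delta2, t2 = riccati_terms(P, Pi)
    P_next = Q + A * P * A + sigma2 * C * P * C - t1 * t1 / delta1
    Pi_next = Q + Q_bar + As * Pi * As + sigma2 * Cs * P * Cs - t2 * t2 / delta2
    return P_next, Pi_next


def solve_coupled_are(tol=1e-13, max_iter=10_000):
    """Iterate from the zero terminal condition until the pair stops changing."""
    P, Pi = 0.0, 0.0
    for _ in range(max_iter):
        P_next, Pi_next = riccati_step(P, Pi)
        if abs(P_next - P) + abs(Pi_next - Pi) < tol:
            return P_next, Pi_next
        P, Pi = P_next, Pi_next
    raise RuntimeError("Riccati iteration did not converge")


def closed_loop_second_moment(P, Pi, x0_mean=1.0, x0_var=1.0, steps=60):
    """Return E(x_k^2) after `steps` steps under the assumed feedback gains."""
    delta1, t1, delta2, t2 = riccati_terms(P, Pi)
    K = -t1 / delta1        # assumed form of the gain K
    K_sum = -t2 / delta2    # assumed form of K + K_bar
    a1, c1 = A + B * K, C + D * K
    a2, c2 = As + Bs * K_sum, Cs + Ds * K_sum
    m, v = x0_mean, x0_var  # m = E x_k, v = E (x_k - E x_k)^2
    for _ in range(steps):
        # the cross term vanishes because E(x_k - E x_k) = 0
        v = (a1 * a1 + sigma2 * c1 * c1) * v + sigma2 * c2 * c2 * m * m
        m = a2 * m
    return v + m * m


if __name__ == "__main__":
    P, Pi = solve_coupled_are()
    print("P =", P, " P_bar =", Pi - P)
    print("E(x_k^2) after 60 steps:", closed_loop_second_moment(P, Pi))
```

Working with the pair $(P, P + \bar{P})$ rather than $(P, \bar{P})$ mirrors the block-diagonal structure that the sufficiency proof exploits: the deviation $x_k - Ex_k$ and the mean $Ex_k$ are governed by separate scalar recursions.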
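“Sufficiency:” Under Assumptions 2 and 4, if $P$ and $\bar{P}$ are the unique solution to (30)-(31) satisfying $P \ge 0$ and $P + \bar{P} \ge 0$, we will show that (36) stabilizes system (24) in mean square sense.
Following from (85)-(86), the coupled ARE (30)-(31) can be rewritten as follows:
$$P = Q + K' R K + (A + BK)' P (A + BK) + \sigma^2 (C + DK)' P (C + DK), \qquad (123)$$
$$\begin{aligned}
P + \bar{P} &= Q + \bar{Q} + (K + \bar{K})' (R + \bar{R}) (K + \bar{K}) \\
&\quad + [A + \bar{A} + (B + \bar{B})(K + \bar{K})]' (P + \bar{P}) [A + \bar{A} + (B + \bar{B})(K + \bar{K})] \\
&\quad + \sigma^2 [C + \bar{C} + (D + \bar{D})(K + \bar{K})]' P [C + \bar{C} + (D + \bar{D})(K + \bar{K})],
\end{aligned} \qquad (124)$$
in which $K$ and $\bar{K}$ are respectively given by (37) and (38).
Recalling that the Lyapunov function candidate is defined in (108) and using the optimal controller (36), we rewrite (110) as
$$\begin{aligned}
&V(k, x_k) - V(k+1, x_{k+1}) \\
&\quad = E\{x_k' (Q + K' R K) x_k + Ex_k' [\bar{Q} + \bar{K}' R K + K' R \bar{K} + \bar{K}' R \bar{K} + (K + \bar{K})' \bar{R} (K + \bar{K})] Ex_k\} \\
&\quad = E\{(x_k - Ex_k)' (Q + K' R K)(x_k - Ex_k) + Ex_k' [Q + \bar{Q} + (K + \bar{K})' (R + \bar{R})(K + \bar{K})] Ex_k\} \\
&\quad = E(\mathbb{X}_k' \tilde{Q} \mathbb{X}_k) \ge 0,
\end{aligned} \qquad (125)$$
where
$$\tilde{Q} = \begin{bmatrix} Q + K' R K & 0 \\ 0 & Q + \bar{Q} + (K + \bar{K})' (R + \bar{R})(K + \bar{K}) \end{bmatrix} \ge 0 \quad \text{and} \quad \mathbb{X}_k = \begin{bmatrix} x_k - Ex_k \\ Ex_k \end{bmatrix}.$$
Taking summation on both sides of (125) from $0$ to $N$ for any $N > 0$, we have that
$$\begin{aligned}
\sum_{k=0}^{N} E(\mathbb{X}_k' \tilde{Q} \mathbb{X}_k) &= V(0, x_0) - V(N+1, x_{N+1}) \\
&= E(x_0' P x_0) + (Ex_0)' \bar{P} Ex_0 - [E(x_{N+1}' P x_{N+1}) + (Ex_{N+1})' \bar{P} Ex_{N+1}] \\
&= E(\mathbb{X}_0' \mathbb{P} \mathbb{X}_0) - E(\mathbb{X}_{N+1}' \mathbb{P} \mathbb{X}_{N+1}),
\end{aligned} \qquad (126)$$
in which $\mathbb{P} = \begin{bmatrix} P & 0 \\ 0 & P + \bar{P} \end{bmatrix}$.
Using the symbols defined above, mean-field system (24) with controller (36) can be rewritten as
$$\mathbb{X}_{k+1} = \tilde{A} \mathbb{X}_k + \tilde{C} \mathbb{X}_k w_k, \qquad (127)$$
where
$$\tilde{A} = \begin{bmatrix} A + BK & 0 \\ 0 & A + \bar{A} + (B + \bar{B})(K + \bar{K}) \end{bmatrix} \quad \text{and} \quad \tilde{C} = \begin{bmatrix} C + DK & C + \bar{C} + (D + \bar{D})(K + \bar{K}) \\ 0 & 0 \end{bmatrix}.$$
Thus, the stabilization of system (24) with controller (36) is equivalent to the stability of system (127), i.e., $(\tilde{A}, \tilde{C})$ for short.
Following the proof of Theorem 4 and Proposition 1 in [22], we know that the exact detectability of system (26), i.e., $(A, \bar{A}, C, \bar{C}, Q^{1/2})$, implies that the following system is exactly detectable:
$$\begin{aligned}
\mathbb{X}_{k+1} &= \tilde{A} \mathbb{X}_k + \tilde{C} \mathbb{X}_k w_k, \\
\tilde{Y}_k &= \tilde{Q}^{1/2} \mathbb{X}_k,
\end{aligned} \qquad (128)$$
i.e., for any $N \ge 0$,
$$\tilde{Y}_k = 0, \ \forall\, 0 \le k \le N \ \Rightarrow \ \lim_{k\to+\infty} E(\mathbb{X}_k' \mathbb{X}_k) = 0.$$
Now we will show that the initial state $\mathbb{X}_0$ is an unobservable state of system (128), i.e., $(\tilde{A}, \tilde{C}, \tilde{Q}^{1/2})$ for simplicity, if and only if $\mathbb{X}_0$ satisfies $E(\mathbb{X}_0' \mathbb{P} \mathbb{X}_0) = 0$.
In fact, if $\mathbb{X}_0$ satisfies $E(\mathbb{X}_0' \mathbb{P} \mathbb{X}_0) = 0$, from (126) we have
$$0 \le \sum_{k=0}^{N} E(\mathbb{X}_k' \tilde{Q} \mathbb{X}_k) = -E(\mathbb{X}_{N+1}' \mathbb{P} \mathbb{X}_{N+1}) \le 0, \qquad (129)$$
i.e., $\sum_{k=0}^{N} E(\mathbb{X}_k' \tilde{Q} \mathbb{X}_k) = 0$. Thus, we can obtain
$$\sum_{k=0}^{N} E(\tilde{Y}_k' \tilde{Y}_k) = \sum_{k=0}^{N} E(\mathbb{X}_k' \tilde{Q} \mathbb{X}_k) = 0,$$
which means that, for any $k \ge 0$, $\tilde{Y}_k = \tilde{Q}^{1/2} \mathbb{X}_k = 0$. Hence, $\mathbb{X}_0$ is an unobservable state of system $(\tilde{A}, \tilde{C}, \tilde{Q}^{1/2})$.
Conversely, suppose we choose $\mathbb{X}_0$ as an unobservable state of $(\tilde{A}, \tilde{C}, \tilde{Q}^{1/2})$, i.e., $\tilde{Y}_k = \tilde{Q}^{1/2} \mathbb{X}_k \equiv 0$, $k \ge 0$. Noting that $(\tilde{A}, \tilde{C}, \tilde{Q}^{1/2})$ is exactly detectable, it holds that $\lim_{N\to+\infty} E(\mathbb{X}_{N+1}' \mathbb{P} \mathbb{X}_{N+1}) = 0$. Thus, from (126) we can obtain that
$$E(\mathbb{X}_0' \mathbb{P} \mathbb{X}_0) = \sum_{k=0}^{\infty} E(\mathbb{X}_k' \tilde{Q} \mathbb{X}_k) = \sum_{k=0}^{\infty} E(\tilde{Y}_k' \tilde{Y}_k) = 0. \qquad (130)$$
Therefore, we have shown that $\mathbb{X}_0$ is an unobservable state if and only if $\mathbb{X}_0$ satisfies $E(\mathbb{X}_0' \mathbb{P} \mathbb{X}_0) = 0$.
Next we will show system (24) is stabilizable in mean square sense in two different cases.
1) $\mathbb{P} > 0$, i.e., $P > 0$ and $P + \bar{P} > 0$.
In this case, $E(\mathbb{X}_0' \mathbb{P} \mathbb{X}_0) = 0$ implies that $\mathbb{X}_0 = 0$, i.e., $x_0 = Ex_0 = 0$. Following the discussion above, we know that system $(\tilde{A}, \tilde{C}, \tilde{Q}^{1/2})$ is exactly observable. Thus it follows from Theorem 4 that mean-field system (24) is stabilizable in mean square sense.
2) $\mathbb{P} \ge 0$.
Firstly, it is noticed from (123) and (124) that $\mathbb{P}$ satisfies the following Lyapunov equation:
$$\mathbb{P} = \tilde{Q} + \tilde{A}' \mathbb{P} \tilde{A} + \sigma^2 [\tilde{C}^{(1)}]' \mathbb{P} \tilde{C}^{(1)} + \sigma^2 [\tilde{C}^{(2)}]' \mathbb{P} \tilde{C}^{(2)}, \qquad (131)$$
where
$$\tilde{C}^{(1)} = \begin{bmatrix} C + DK & 0 \\ 0 & 0 \end{bmatrix}, \quad \tilde{C}^{(2)} = \begin{bmatrix} 0 & C + \bar{C} + (D + \bar{D})(K + \bar{K}) \\ 0 & 0 \end{bmatrix},$$
and $\tilde{C}^{(1)} + \tilde{C}^{(2)} = \tilde{C}$.
Since $\mathbb{P} \ge 0$, there exists an orthogonal matrix $U$ with $U' = U^{-1}$ such that
$$U' \mathbb{P} U = \begin{bmatrix} 0 & 0 \\ 0 & P_2 \end{bmatrix}, \quad P_2 > 0. \qquad (132)$$
Obviously, from (131) we can obtain that
$$U' \mathbb{P} U = U' \tilde{Q} U + U' \tilde{A}' U \cdot U' \mathbb{P} U \cdot U' \tilde{A} U$$
$$\quad + \sigma^2 U' [\tilde{C}^{(1)}]' U \cdot U' \mathbb{P} U \cdot U' \tilde{C}^{(1)} U$$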
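$$\quad + \sigma^2 U' [\tilde{C}^{(2)}]' U \cdot U' \mathbb{P} U \cdot U' \tilde{C}^{(2)} U. \qquad (133)$$
Partition
$$U' \tilde{A} U = \begin{bmatrix} \tilde{A}_{11} & \tilde{A}_{12} \\ \tilde{A}_{21} & \tilde{A}_{22} \end{bmatrix}, \quad U' \tilde{Q} U = \begin{bmatrix} \tilde{Q}_1 & \tilde{Q}_{12} \\ \tilde{Q}_{21} & \tilde{Q}_2 \end{bmatrix},$$
$$U' \tilde{C}^{(1)} U = \begin{bmatrix} \tilde{C}^{(1)}_{11} & \tilde{C}^{(1)}_{12} \\ \tilde{C}^{(1)}_{21} & \tilde{C}^{(1)}_{22} \end{bmatrix} \quad \text{and} \quad U' \tilde{C}^{(2)} U = \begin{bmatrix} \tilde{C}^{(2)}_{11} & \tilde{C}^{(2)}_{12} \\ \tilde{C}^{(2)}_{21} & \tilde{C}^{(2)}_{22} \end{bmatrix}.$$
Then we have that
$$U' \tilde{A}' U \cdot U' \mathbb{P} U \cdot U' \tilde{A} U = \begin{bmatrix} \tilde{A}_{21}' P_2 \tilde{A}_{21} & \tilde{A}_{21}' P_2 \tilde{A}_{22} \\ \tilde{A}_{22}' P_2 \tilde{A}_{21} & \tilde{A}_{22}' P_2 \tilde{A}_{22} \end{bmatrix},$$
$$U' \{\tilde{C}^{(1)}\}' U \cdot U' \mathbb{P} U \cdot U' \tilde{C}^{(1)} U = \begin{bmatrix} \{\tilde{C}^{(1)}_{21}\}' P_2 \tilde{C}^{(1)}_{21} & \{\tilde{C}^{(1)}_{21}\}' P_2 \tilde{C}^{(1)}_{22} \\ \{\tilde{C}^{(1)}_{22}\}' P_2 \tilde{C}^{(1)}_{21} & \{\tilde{C}^{(1)}_{22}\}' P_2 \tilde{C}^{(1)}_{22} \end{bmatrix},$$
$$U' \{\tilde{C}^{(2)}\}' U \cdot U' \mathbb{P} U \cdot U' \tilde{C}^{(2)} U = \begin{bmatrix} \{\tilde{C}^{(2)}_{21}\}' P_2 \tilde{C}^{(2)}_{21} & \{\tilde{C}^{(2)}_{21}\}' P_2 \tilde{C}^{(2)}_{22} \\ \{\tilde{C}^{(2)}_{22}\}' P_2 \tilde{C}^{(2)}_{21} & \{\tilde{C}^{(2)}_{22}\}' P_2 \tilde{C}^{(2)}_{22} \end{bmatrix}.$$
Thus, by comparing each block element on both sides of (133) and noting $P_2 > 0$, we have that $\tilde{A}_{21} = 0$, $\tilde{C}^{(1)}_{21} = \tilde{C}^{(2)}_{21} = 0$ and $\tilde{Q}_1 = \tilde{Q}_{12} = \tilde{Q}_{21} = 0$, i.e.,
$$U' \tilde{A} U = \begin{bmatrix} \tilde{A}_{11} & \tilde{A}_{12} \\ 0 & \tilde{A}_{22} \end{bmatrix}, \quad U' \tilde{C} U = \begin{bmatrix} \tilde{C}_{11} & \tilde{C}_{12} \\ 0 & \tilde{C}_{22} \end{bmatrix}, \quad U' \tilde{Q} U = \begin{bmatrix} 0 & 0 \\ 0 & \tilde{Q}_2 \end{bmatrix}, \qquad (134)$$
where $\tilde{Q}_2 \ge 0$, $\tilde{C}_{11} = \tilde{C}^{(1)}_{11} + \tilde{C}^{(2)}_{11}$, $\tilde{C}_{12} = \tilde{C}^{(1)}_{12} + \tilde{C}^{(2)}_{12}$ and $\tilde{C}_{22} = \tilde{C}^{(1)}_{22} + \tilde{C}^{(2)}_{22}$.
Substituting (132) and (134) into (133) yields that
$$P_2 = \tilde{Q}_2 + \tilde{A}_{22}' P_2 \tilde{A}_{22} + \sigma^2 \{\tilde{C}^{(1)}_{22}\}' P_2 \tilde{C}^{(1)}_{22} + \sigma^2 \{\tilde{C}^{(2)}_{22}\}' P_2 \tilde{C}^{(2)}_{22}. \qquad (135)$$
Define $U' \mathbb{X}_k = \bar{\mathbb{X}}_k = \begin{bmatrix} \bar{\mathbb{X}}^{(1)}_k \\ \bar{\mathbb{X}}^{(2)}_k \end{bmatrix}$, where the dimension of $\bar{\mathbb{X}}^{(2)}_k$ is the same as the rank of $P_2$. Thus, from (127) we have
$$U' \mathbb{X}_{k+1} = U' \tilde{A} U \, U' \mathbb{X}_k + U' \tilde{C} U \, U' \mathbb{X}_k w_k,$$
i.e.,
$$\bar{\mathbb{X}}^{(1)}_{k+1} = \tilde{A}_{11} \bar{\mathbb{X}}^{(1)}_k + \tilde{A}_{12} \bar{\mathbb{X}}^{(2)}_k + (\tilde{C}_{11} \bar{\mathbb{X}}^{(1)}_k + \tilde{C}_{12} \bar{\mathbb{X}}^{(2)}_k) w_k, \qquad (136)$$
$$\bar{\mathbb{X}}^{(2)}_{k+1} = \tilde{A}_{22} \bar{\mathbb{X}}^{(2)}_k + \tilde{C}_{22} \bar{\mathbb{X}}^{(2)}_k w_k. \qquad (137)$$
Next we will show the stability of $(\tilde{A}_{22}, \tilde{C}_{22})$.
Actually, recalling (126) and (134), we have that
$$\begin{aligned}
\sum_{k=0}^{N} E[(\bar{\mathbb{X}}^{(2)}_k)' \tilde{Q}_2 \bar{\mathbb{X}}^{(2)}_k] &= \sum_{k=0}^{N} E(\mathbb{X}_k' \tilde{Q} \mathbb{X}_k) \\
&= E(\mathbb{X}_0' \mathbb{P} \mathbb{X}_0) - E(\mathbb{X}_{N+1}' \mathbb{P} \mathbb{X}_{N+1}) \\
&= E[(\bar{\mathbb{X}}^{(2)}_0)' P_2 \bar{\mathbb{X}}^{(2)}_0] - E[(\bar{\mathbb{X}}^{(2)}_{N+1})' P_2 \bar{\mathbb{X}}^{(2)}_{N+1}].
\end{aligned} \qquad (138)$$
Similar to the discussion from (129) to (130), we conclude that $\bar{\mathbb{X}}^{(2)}_0$ is an unobservable state of $(\tilde{A}_{22}, \tilde{C}_{22}, \tilde{Q}_2^{1/2})$ if and only if $\bar{\mathbb{X}}^{(2)}_0$ obeys $E[(\bar{\mathbb{X}}^{(2)}_0)' P_2 \bar{\mathbb{X}}^{(2)}_0] = 0$. Since $P_2 > 0$, $(\tilde{A}_{22}, \tilde{C}_{22}, \tilde{Q}_2^{1/2})$ is exactly observable as discussed in 1). Therefore, following from Theorem 4, we know that
$$\lim_{k\to+\infty} E[(\bar{\mathbb{X}}^{(2)}_k)' \bar{\mathbb{X}}^{(2)}_k] = 0, \qquad (139)$$
i.e., $(\tilde{A}_{22}, \tilde{C}_{22})$ is stable in mean square sense.
Thirdly, the stability of $(\tilde{A}_{11}, \tilde{C}_{11})$ will be shown as below.
We might as well choose $\bar{\mathbb{X}}^{(2)}_0 = 0$; then from (137) we have $\bar{\mathbb{X}}^{(2)}_k = 0$ for any $k \ge 0$. In this case, (136) becomes
$$\mathbb{Z}_{k+1} = \tilde{A}_{11} \mathbb{Z}_k + \tilde{C}_{11} \mathbb{Z}_k w_k, \qquad (140)$$
where $\mathbb{Z}_k$ is the value of $\bar{\mathbb{X}}^{(1)}_k$ with $\bar{\mathbb{X}}^{(2)}_k = 0$. Thus, for an arbitrary initial state $\mathbb{Z}_0 = \bar{\mathbb{X}}^{(1)}_0$, we have
$$E[\tilde{Y}_k' \tilde{Y}_k] = E[\mathbb{X}_k' \tilde{Q} \mathbb{X}_k] = E[(\bar{\mathbb{X}}^{(2)}_k)' \tilde{Q}_2 \bar{\mathbb{X}}^{(2)}_k] \equiv 0. \qquad (141)$$
From the exact detectability of $(\tilde{A}, \tilde{C}, \tilde{Q}^{1/2})$, it holds that
$$\lim_{k\to+\infty} E(\bar{\mathbb{X}}_k' \bar{\mathbb{X}}_k) = \lim_{k\to+\infty} E(\bar{\mathbb{X}}_k' U' U \bar{\mathbb{X}}_k) = \lim_{k\to+\infty} E(\mathbb{X}_k' \mathbb{X}_k) = 0. \qquad (142)$$
Therefore, in the case of $\bar{\mathbb{X}}^{(2)}_0 = 0$, (142) indicates that
$$\begin{aligned}
\lim_{k\to+\infty} E(\mathbb{Z}_k' \mathbb{Z}_k) &= \lim_{k\to+\infty} E[(\bar{\mathbb{X}}^{(1)}_k)' \bar{\mathbb{X}}^{(1)}_k] \\
&= \lim_{k\to+\infty} \{E[(\bar{\mathbb{X}}^{(1)}_k)' \bar{\mathbb{X}}^{(1)}_k] + E[(\bar{\mathbb{X}}^{(2)}_k)' \bar{\mathbb{X}}^{(2)}_k]\} = \lim_{k\to+\infty} E(\bar{\mathbb{X}}_k' \bar{\mathbb{X}}_k) = 0,
\end{aligned} \qquad (143)$$
i.e., $(\tilde{A}_{11}, \tilde{C}_{11})$ is mean square stable.
Finally we will show that system (24) is stabilizable in mean square sense. In fact, we denote $\tilde{\mathcal{A}} = \begin{bmatrix} \tilde{A}_{11} & 0 \\ 0 & \tilde{A}_{22} \end{bmatrix}$, $\tilde{\mathcal{C}} = \begin{bmatrix} \tilde{C}_{11} & 0 \\ 0 & \tilde{C}_{22} \end{bmatrix}$. Hence, (136)-(137) can be reformulated as
$$\bar{\mathbb{X}}_{k+1} = \Big\{\tilde{\mathcal{A}} \bar{\mathbb{X}}_k + \begin{bmatrix} \tilde{A}_{12} \\ 0 \end{bmatrix} \mathbb{U}_k\Big\} + \Big\{\tilde{\mathcal{C}} \bar{\mathbb{X}}_k + \begin{bmatrix} \tilde{C}_{12} \\ 0 \end{bmatrix} \mathbb{U}_k\Big\} w_k, \qquad (144)$$
where $\mathbb{U}_k$ is the solution to equation (137) with initial condition $\mathbb{U}_0 = \bar{\mathbb{X}}^{(2)}_0$. The stability of $(\tilde{A}_{11}, \tilde{C}_{11})$ and $(\tilde{A}_{22}, \tilde{C}_{22})$ as proved above indicates that $(\tilde{\mathcal{A}}, \tilde{\mathcal{C}})$ is stable in mean square sense.
Obviously, from (139) it holds that $\lim_{k\to+\infty} E(\mathbb{U}_k' \mathbb{U}_k) = 0$ and $\sum_{k=0}^{\infty} E(\mathbb{U}_k' \mathbb{U}_k) < +\infty$. By using Proposition 2.8 and Remark 2.9 in [10], we know that there exists a constant $c_0$ such that
$$\sum_{k=0}^{\infty} E(\bar{\mathbb{X}}_k' \bar{\mathbb{X}}_k) < c_0 \sum_{k=0}^{\infty} E(\mathbb{U}_k' \mathbb{U}_k) < +\infty. \qquad (145)$$
Hence, $\lim_{k\to+\infty} E(\bar{\mathbb{X}}_k' \bar{\mathbb{X}}_k) = 0$ can be obtained from (145). Furthermore, it is noted from (142) that
$$\begin{aligned}
\lim_{k\to+\infty} E(x_k' x_k) &= \lim_{k\to+\infty} E[(x_k - Ex_k)'(x_k - Ex_k) + Ex_k' Ex_k] \\
&= \lim_{k\to+\infty} E(\mathbb{X}_k' \mathbb{X}_k) = \lim_{k\to+\infty} E(\bar{\mathbb{X}}_k' \bar{\mathbb{X}}_k) = 0.
\end{aligned}$$
Note that system $(\tilde{A}, \tilde{C})$ given in (127) is exactly mean-field system (24) with controller (36). In conclusion, mean-field system (24) is stabilizable in the mean square sense. The proof is complete.

REFERENCES
[1] Abou-Kandil, H., Freiling, G., Ionescu, V., & Jank, G. (2003). Matrix
Riccati equations in control and systems theory. Basel: Birkhäuser.
[2] Ait Rami, M., Chen, X., Moore, J. B., & Zhou, X. (2001). Solvability
and asymptotic behavior of generalized Riccati equations arising in
indefinite stochastic LQ controls. IEEE Transactions on Automatic
Control, 46(3), 428-440.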
[3] Ait Rami, M., & Zhou, X. (2000). Linear matrix inequalities, Riccati
equations, and indefinite stochastic linear quadratic control. IEEE Trans-
actions on Automatic Control, 45(6), 1131-1142.
[4] Anderson, B. D. O., & Moore, J. B. (2007). Optimal control: linear
quadratic methods. Dover Publications.
[5] Buckdahn, R., Djehiche, B., & Li, J. (2011). A general stochastic
maximum principle for SDEs of mean-field type. Applied Mathematics
and Optimization, 64(2), 197-216.
[6] Buckdahn, R., Djehiche, B., Li, J., & Peng, S. (2009). Mean-field
backward stochastic differential equations: a limit approach. Annals of
Probability, 37, 1524-1565.
[7] Dawson, D. A. (1983). Critical dynamics and fluctuations for a mean-
field model of cooperative behavior. Journal of Statistical Physics, 31,
29-85.
[8] Dawson, D. A. & Gärtner, J. (1987). Large deviations from the McKean-
Vlasov limit for weakly interacting diffusions. Stochastics: An Interna-
tional Journal of Probability and Stochastic Processes, 20(4), 247-308.
[9] Elliott, R. J., Li, X., & Ni, Y. (2013). Discrete time mean-field stochastic
linear quadratic optimal control problems. Automatica, 49(11), 3222-
3233.
[10] El Bouhtouri, A., Hinrichsen, D., & Pritchard, A. J. (1998). H∞-type
control for discrete-time stochastic systems. International Journal of
Robust and Nonlinear Control, 9, 923-948.
[11] Gärtner, J. (1988). On the McKean-Vlasov limit for interacting diffu-
sions. Mathematische Nachrichten, 137, 197-248.
[12] Huang, Y., Zhang, W., & Zhang, H. (2008). Infinite horizon linear
quadratic optimal control for discrete-time stochastic systems. Asian
Journal of Control, 10(5), 608-615.
[13] Huang, J., Li, X., & Yong, J. (2015). A linear quadratic optimal con-
trol problem for mean-field stochastic differential equations in infinite
horizon. Mathematical Control and Related Fields, 5(1), 97-139.
[14] Kac, M. (1956). Foundations of kinetic theory. In Proceedings of The
third Berkeley symposium on mathematical statistics and probability, 3,
171-197.
[15] Li, Z., Wang, Y., Zhou, B., & Duan, G. (2009). Detectability and
observability of discrete-time stochastic systems and their applications.
Automatica, 45, 1340-1346.
[16] McKean, H. P. (1966). A class of Markov processes associated with
nonlinear parabolic equations. Proceedings of the National Academy of
Sciences of the United States of America, 56, 1907-1911.
[17] Ni, Y., Elliott, R. J., & Li, X. (2015). Discrete-time mean-field stochastic
linear-quadratic optimal control problems, II: Infinite horizon case.
Automatica, 57, 65-77.
[18] Qi, Q., & Zhang, H. Optimal control and stabilization for mean-field
systems, part II: continuous-time case. To be submitted.
[19] Yong, J., & Zhou, X. (1999). Stochastic controls: Hamiltonian systems
and HJB equations. New York: Springer.
[20] Yong, J. (2013). A linear-quadratic optimal control problem for mean-
field stochastic differential equations. SIAM Journal on Control and
Optimization, 51(4), 2809-2838.
[21] Zhang, H., & Qi, Q. Optimal control for mean-field system: discrete-time
case. Accepted by the 55th IEEE Conference on Decision and Control,
2016, USA.
[22] Zhang, W., & Chen, B. S. (2004). On stabilizability and exact observabil-
ity of stochastic systems with their application. Automatica, 40, 87-94.
[23] Zhang, W., Zhang, H., & Chen, B. S. (2008). Generalized Lya-
punov Equation Approach to State-Dependent Stochastic Stabiliza-
tion/Detectability Criterion. IEEE Transactions on Automatic Control,
53(7), 1630-1642.
