
A simple approximate dynamic programming based on integral sliding mode control for unknown linear systems with input disturbance

Nguyen Thanh Long, Nguyen Van Huong, Dao Phuong Nam, Mai Xuan Sinh, Nguyen Thu Ha

Abstract— This work presents an adaptive optimal control algorithm based on an integral sliding mode control law for a class of continuous-time systems with input disturbance and/or uncertain and unknown parameters. The main objective is to find a general form of integral sliding mode control law that forces the system states onto a sliding surface from the initial time. An adaptive optimal control based on the approximate dynamic programming method is then responsible for the robust stability of the closed-loop system. Finally, theoretical analysis and simulation results demonstrate the performance of the proposed algorithm on a wheel inverted pendulum system.

Keywords— approximate/adaptive dynamic programming (ADP), wheel inverted pendulum (WIP) system, integral sliding mode control (ISMC), robust stability.

Dao Phuong Nam, Nguyen Thu Ha, Nguyen Van Huong, and Mai Xuan Sinh are with Hanoi University of Science and Technology. Nguyen Thanh Long is with Hung Yen University of Technology and Education.

I. INTRODUCTION

Sliding mode control (SMC) has been studied since the 1950s (Utkin, 1977, 1992; Pisano and Usai, 2011), and many studies of the sliding mode control method have been carried out in recent years (Man and Yu, 1997; Drakunov, 1992; Ting et al., 2012). The most advantageous feature of sliding mode control is the complete compensation of the so-called matched disturbances (i.e., disturbances acting on the control input channel) once the system is in the sliding phase and a sliding mode is enforced. The sliding mode develops when the state lies on a suitable subspace of the state space, called the sliding manifold (or sliding surface).

The integral sliding mode (ISM) technique was first proposed in [4], [8] as a solution to the reaching-phase problem for systems with matched disturbances only. In order to avoid the reaching phase and to obtain robustness from the initial time, the concept of integral sliding mode was introduced in [3], [9]. ISM has found many applications in industrial processes such as robots and electromechanical systems [10], [11].

Furthermore, reinforcement learning (Sutton & Barto, 1998) [16] and approximate/adaptive dynamic programming (ADP) (Werbos, 1974) [17] have been broadly applied to solve optimal control problems for uncertain systems in recent years. Because disturbances are inevitable in most practical systems, it is necessary to find an optimal control scheme in the presence of disturbances and uncertainties. Based on a redefined infinite-horizon cost function and the nominal system, Wang et al. [1] and Liu et al. [2] presented a novel strategy to design robust controllers for a class of nonlinear systems with perturbations that are bounded by a known state-dependent function. Combining two-player zero-sum differential game theory and ADP, nonlinear $H_\infty$ control problems were approximately solved in [5]–[7].

In this paper, we propose a combination of approximate dynamic programming and integral sliding mode control for designing controllers for unknown systems with input disturbance. Unlike general ISM control, the controller contains a part that is learnt online and guarantees the stability and nearly optimal performance of the sliding-mode dynamics.

II. PROBLEM STATEMENT

We study a class of continuous-time systems described by

$\dot{x} = Ax + B\big(u + f(x, u, t)\big)$  (1)

where $x \in \mathbb{R}^n$ is the measured component of the state available for feedback control and $u \in \mathbb{R}^m$ ($m \le n$) is the input. Suppose that $A \in \mathbb{R}^{n \times n}$ is an unknown constant matrix and $f(x, u, t) \in \mathbb{R}^m$ is the disturbance and/or uncertainty of the system.

Assumption 1: The matrix $B$ has linearly independent columns, i.e. $\operatorname{rank}(B) = m$.

Assumption 2: There exist a constant $\rho > 0$, a continuous function $\mu(\cdot)$, and a continuous function $\lambda(t)$ with $0 \le \lambda(t) \le \lambda_{\max} < 1$ for all $t$, such that the disturbance and uncertainty of the system satisfy

$\|f(x, u, t)\| \le \rho + \mu(\|x(t)\|) + \lambda(t)\|u(t)\|.$

Remark 1: The term $\lambda(t)\|u(t)\|$ in the bound on $f(x, u, t)$ is a component that has not been considered in previous articles using ADP algorithms. Because of this component, an ISM controller is needed to solve the problem.
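Assumption 1 can be checked numerically: it guarantees that $B^{\top}B$ is invertible, so $B$ admits a left inverse. A minimal sketch with hypothetical dimensions $n = 6$, $m = 2$ (a random matrix, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 2))           # n = 6 states, m = 2 inputs (hypothetical)

assert np.linalg.matrix_rank(B) == 2      # Assumption 1: rank(B) = m

# hence B'B is invertible and (B'B)^{-1} B' is a left inverse of B
Bplus = np.linalg.inv(B.T @ B) @ B.T
assert np.allclose(Bplus @ B, np.eye(2))
```

A random Gaussian matrix has full column rank with probability one, so both assertions hold.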
We define $B^{+}$ as the Moore–Penrose pseudoinverse of $B$. By Assumption 1, $B^{+}$ can be computed as

$B^{+} = (B^{\top}B)^{-1}B^{\top}.$

Assumption 3: There exists a number $\sigma > 0$ such that $\|MA\| \le \sigma$, where $M = B^{+} = (B^{\top}B)^{-1}B^{\top}$.

Define the sliding surface as

$\{x \in \mathbb{R}^n : s(t) = 0\}$  (2)

where $s(t)$ is defined as

$s(t) = M\big(x - x(0)\big) - \int_0^t v(\tau)\, d\tau$  (3)

with $v$ a control component to be designed later.

Theorem 1: The control signal $u = v - k(t)\, s/\|s\|$, with

$k(t) = \dfrac{1}{1 - \lambda(t)}\big[\rho + \mu(\|x\|) + \sigma\|x\| + \lambda(t)\|v(t)\| + c\big]$  (4)

and $c$ a positive constant, guarantees that the system states are forced onto the sliding surface from the initial time.

Proof: The time derivative of (3) is given by

$\dot{s} = (B^{\top}B)^{-1}B^{\top}\big(Ax + B(u + f)\big) - v = MAx + u + f - v.$  (5)

Since $\|u\| \le \|v\| + k(t)$, Assumptions 2 and 3 together with (4) yield

$k(t) \ge \|f(x, u, t)\| + \|MA\|\,\|x(t)\| + c.$  (6)

We consider the candidate Lyapunov function

$V = \tfrac{1}{2}\, s^{\top} s.$  (7)

The derivative of $V$ is computed as

$\dot{V} = s^{\top}\dot{s} = s^{\top}\big[MAx + u + f - v\big] \le \|s\|\big(\|MA\|\,\|x\| + \|f\| - k(t)\big) \le -c\|s\| = -\sqrt{2}\, c\, V^{1/2}.$  (8)

Integrating (8) over the time interval $0 \le \tau \le t$ we obtain

$V^{1/2}(t) - V^{1/2}(0) \le -\dfrac{c}{\sqrt{2}}\, t.$

Consequently, $V(t)$ can reach zero in a finite time $t_s$ that is bounded by

$t_s \le \dfrac{\sqrt{2}\, V^{1/2}(0)}{c}.$

In view of (3), $s(0) = 0$ and hence $V(0) = 0$. Based on the above proof, we derive that for all $t \ge 0$ we have

$s(t) = s(0) = 0.$

Therefore, the proof of Theorem 1 is completed.

Remark 2: The bound on $t_s$ shows that the time of convergence to the sliding surface is finite.

Remark 3: In practice, the discontinuous function may cause the undesirable chattering phenomenon in the SMC action. In order to remarkably attenuate this chattering, the discontinuous function can be replaced by a continuous approximation such as $s/(\|s\| + K)$, where $K$ denotes a positive constant. As $K$ gets close to zero, there is almost no performance difference between the approximated control law and the original control law [12].

When $s(t) = \dot{s}(t) = 0$, from (5) we have the equivalent control

$u_{eq} = -f(x, u_{eq}, t) - MAx + v.$

The system can then be rewritten as

$\dot{x} = Ax + B(-MAx + v) = (A - BMA)x + Bv$

$\Rightarrow \dot{x} = \bar{A}x + \bar{B}v$  (9)

where $\bar{A} = A - BMA$ and $\bar{B} = B$.

According to Theorem 1, the equivalent control $u = v - k(t)\, s/\|s\|$ makes the solution $x(t)$ of system (1) equivalent to the solution of (9). The original dynamics are equivalent to the sliding-mode dynamics (9) when the sliding-mode controller $u$ is designed to guarantee the reachability of the sliding manifold. The control objective is to find an approximate optimal control $v$ such that the sliding-mode dynamics (9) are robustly stable with nearly optimal performance.
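The algebra behind the equivalent dynamics (9) can be verified numerically. The sketch below (random $A$, $B$ with hypothetical sizes) checks that $M$ is a left inverse of $B$, that $M\bar{A} = 0$ (so the sliding variable stays at rest on the manifold), and that the equivalent control cancels an arbitrary matched disturbance sample exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))           # full column rank w.p. 1 (Assumption 1)

M = np.linalg.inv(B.T @ B) @ B.T          # M = B+ as in Assumption 3
Abar = A - B @ M @ A                      # sliding-mode dynamics matrix in (9)

assert np.allclose(M @ B, np.eye(m))      # M is a left inverse of B
# the matched direction is removed: M*Abar = MA - (MB)MA = 0,
# so on the manifold M*xdot = v and sdot remains zero
assert np.allclose(M @ Abar, 0, atol=1e-10)

# the equivalent control cancels the matched disturbance exactly
x = rng.standard_normal(n)
v = rng.standard_normal(m)
f = rng.standard_normal(m)                # arbitrary matched disturbance sample
u_eq = -f - M @ A @ x + v
xdot = A @ x + B @ (u_eq + f)
assert np.allclose(xdot, Abar @ x + B @ v)
```

The last assertion is exactly the substitution step that produces (9) from (1).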
The cost function is defined as

$J = \int_0^{\infty} \big(x^{\top}Qx + v^{\top}Rv\big)\, d\tau$

where $Q$ and $R$ are symmetric positive definite matrices. We present an ADP method that learns a nearly optimal control $v$ based on the state information of the original system and the equivalent sliding-mode dynamics (9).

Theorem 2 [13]: Let $K_0$ be any matrix such that $\bar{A} - \bar{B}K_0$ is Hurwitz, and repeat the following steps for $k = 0, 1, \ldots$

Step 1: Solve for the real symmetric positive definite solution $P_k$ of the Lyapunov equation

$A_k^{\top} P_k + P_k A_k + Q + K_k^{\top} R K_k = 0$  (10)

where $A_k = \bar{A} - \bar{B}K_k$.

Step 2: Update the gain matrix by

$K_{k+1} = R^{-1}\bar{B}^{\top} P_k.$  (11)

Then, the following properties hold:
(a) $\bar{A} - \bar{B}K_k$ is Hurwitz;
(b) $P^{*} \le P_{k+1} \le P_k$;
(c) $\lim_{k \to \infty} K_k = K^{*}$, $\lim_{k \to \infty} P_k = P^{*}$,

with $K^{*} = R^{-1}\bar{B}^{\top}P^{*}$ and $P^{*}$ the unique symmetric positive definite matrix such that

$\bar{A}^{\top}P^{*} + P^{*}\bar{A} + Q - P^{*}\bar{B}R^{-1}\bar{B}^{\top}P^{*} = 0.$  (12)

Then, we find an approximate optimal control policy by using online measurements of the closed-loop system (9). We consider

$V(x) = x^{\top} P_k x.$  (13)

So, we have

$x^{\top}(t+T)\, P_k\, x(t+T) - x^{\top}(t)\, P_k\, x(t) = \int_t^{t+T} \dot{V}\big(x(\tau)\big)\, d\tau.$  (14)

Applying the Kronecker product representation, we obtain

$\Phi_k \operatorname{vec}(P_k) = \Psi_k.$  (15)

Assumption 4: There exists a number $G > 0$ such that $\Phi_k$ has full column rank for all $k \in \mathbb{Z}^{+}$, $k \ge G$.

By using Assumption 4, $P_k$ can be uniquely determined by

$\operatorname{vec}(P_k) = \big(\Phi_k^{\top}\Phi_k\big)^{-1}\Phi_k^{\top}\Psi_k.$  (16)

Choose an exploration noise $e$ and implement the following algorithm.

Algorithm 1:
Select $K_0$ such that $\bar{A} - \bar{B}K_0$ is Hurwitz and a threshold $\xi > 0$. Let $k \leftarrow 0$.
Repeat
1. Apply $v = -K_k x + e$ and solve $P_k$ from (16).
2. Update $K_{k+1}$ by using (11).
3. $k \leftarrow k + 1$.
Until $\|P_k - P_{k-1}\| < \xi$.
$k^{*} \leftarrow k$.
We obtain the approximated optimal control policy $v = -K_{k^{*}} x$.

Remark 4: Choosing the exploration noise is not a trivial task for general reinforcement learning problems and other related machine learning problems, especially for high-dimensional systems. In solving practical problems, several types of exploration noise have been adopted, such as random noise [14] and exponentially decreasing probing noise [18].

Lemma 1: Under Assumption 4, by using Algorithm 1 we have $\lim_{k \to \infty} K_k = K^{*}$ and $\lim_{k \to \infty} P_k = P^{*}$.

Proof: From (13) and (14), one sees that the $P_k$ and $K_{k+1}$ obtained from (10) and (11) satisfy conditions (14) and (15). In addition, by Assumption 4, they are uniquely determined by (16). Therefore, from Theorem 2, we obtain $\lim_{k \to \infty} K_k = K^{*}$ and $\lim_{k \to \infty} P_k = P^{*}$.

Lemma 2: There exists a sufficiently small constant $\varepsilon > 0$ such that, for every symmetric matrix $P > 0$ satisfying $\|P - P^{*}\| < \varepsilon$, the system (9) is stabilized by $v = -R^{-1}\bar{B}^{\top}Px$.

Proof: Because $Q + P^{*}\bar{B}R^{-1}\bar{B}^{\top}P^{*} > 0$, there exists $\alpha > 0$ such that $Q + P^{*}\bar{B}R^{-1}\bar{B}^{\top}P^{*} > \alpha I$. For any symmetric matrix $P > 0$, along the trajectories of (9) with $v = -R^{-1}\bar{B}^{\top}Px$ and $V(x) = x^{\top}Px$ we have

$\dot{V} = x^{\top}\big(\bar{A}^{\top}P + P\bar{A} - 2P\bar{B}R^{-1}\bar{B}^{\top}P\big)x.$

Using (12),

$\bar{A}^{\top}P + P\bar{A} - 2P\bar{B}R^{-1}\bar{B}^{\top}P = -\big[Q + 2P\bar{B}R^{-1}\bar{B}^{\top}P - P^{*}\bar{B}R^{-1}\bar{B}^{\top}P^{*} + (P^{*} - P)\bar{A} + \bar{A}^{\top}(P^{*} - P)\big].$

At $P = P^{*}$ the bracketed matrix equals $Q + P^{*}\bar{B}R^{-1}\bar{B}^{\top}P^{*} > \alpha I$. Because of continuity, there exists a sufficiently small constant $\varepsilon > 0$ such that for every symmetric matrix $P > 0$ satisfying $\|P - P^{*}\| < \varepsilon$,

$Q + 2P\bar{B}R^{-1}\bar{B}^{\top}P - P^{*}\bar{B}R^{-1}\bar{B}^{\top}P^{*} + (P^{*} - P)\bar{A} + \bar{A}^{\top}(P^{*} - P) > Q + P^{*}\bar{B}R^{-1}\bar{B}^{\top}P^{*} - \alpha I > 0.$

Hence $\dot{V} < 0$, and with the Lyapunov function $V(x) = x^{\top}Px$ the system (9) is globally asymptotically stable. This completes the proof.
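Theorem 2 is the classical Kleinman iteration, of which Algorithm 1 is the data-driven counterpart. When a model is available, the iteration can be sketched directly; below is a minimal NumPy sketch on a hypothetical double-integrator pair standing in for $(\bar{A}, \bar{B})$, with the Lyapunov equation (10) solved by Kronecker vectorization:

```python
import numpy as np

def lyap(Acl, Qk):
    """Solve Acl' P + P Acl + Qk = 0 by Kronecker vectorization."""
    n = Acl.shape[0]
    L = np.kron(Acl.T, np.eye(n)) + np.kron(np.eye(n), Acl.T)
    return np.linalg.solve(L, -Qk.reshape(-1)).reshape(n, n)

# hypothetical double-integrator stand-in for (Abar, Bbar)
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
Q, R = np.eye(2), np.eye(1)

K = np.array([[1., 2.]])                  # K0: A - B*K0 is Hurwitz (eigs -1, -1)
for _ in range(15):
    P = lyap(A - B @ K, Q + K.T @ R @ K)  # Step 1, eq. (10)
    K = np.linalg.solve(R, B.T @ P)       # Step 2, eq. (11)

# P solves the algebraic Riccati equation (12)
res = A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)
assert np.allclose(res, 0, atol=1e-8)
assert np.all(np.linalg.eigvalsh(P) > 0)  # P is symmetric positive definite
```

Starting from any stabilizing $K_0$, the iterates $P_k$ decrease monotonically to the Riccati solution $P^{*}$, which the final assertions check.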
Remark 5: From Lemma 1 and Lemma 2, it is clear that choosing the threshold $\xi$ in Algorithm 1 small enough guarantees the robust stability of system (9).

Remark 6: Algorithm 1 requires knowledge of the full state $x$ of system (9). When it is not possible to measure the state, since any exploration noise $e$ satisfying the persistence of excitation condition ensures the convergence of the matrices $K$ and $P$, we can design an observer and use output-feedback control according to the following algorithm.

Algorithm 2:
Step 1: The control of system (1) is chosen as

$u = v - k(t)\, \dfrac{s}{\|s\|}, \qquad s(t) = M_C\big(y - y(0)\big) - \int_0^t v_C(\tau)\, d\tau$  (17)

where $y = Cx$ is the output of (1) and

$M_C = \big((CB)^{\top}CB\big)^{-1}(CB)^{\top}.$

Assumption 5: The matrix $CB$ has linearly independent columns.

Similar to Theorem 1, we must find an approximate optimal control $v_C$ for the system

$\dot{x} = A_C x + B_C v$  (18)

with $A_C = A - B M_C C A$ and $B_C = B$.

Step 2: Choose a Hurwitz matrix $A_L$ such that the pair $(A_C, C)$ is observable, and let $\hat{A}$ denote the adaptive estimate of $A_C - A_L$. The state observer for (18) is given by

$\dot{\hat{x}} = A_L \hat{x} + \hat{A}\,\delta(\hat{x}) + B_C v_C + L\big(y - C\hat{x}\big)$

where $L$ is an observer gain selected such that $A_C - LC$ is a Hurwitz matrix, $\delta(\hat{x})$ is a vector of activation functions, and $\hat{A}$ is updated by the law

$\dot{\hat{A}} = \lambda_1 A_L^{\top} C^{\top}\big(y - C\hat{x}\big)\,\delta^{\top}(\hat{x}) - \lambda_2 \big\|y - C\hat{x}\big\|\, \hat{A}.$

We can choose $\lambda_1, \lambda_2 > 0$ to guarantee that the state estimate of the observer converges to the true state (this was shown in [15]).

Step 3: Apply an algorithm similar to Algorithm 1:
Select $K_0$ such that $A_C - B_C K_0$ is Hurwitz and a threshold $\xi > 0$. Let $k \leftarrow 0$.
Repeat
1. Apply $v_C = -K_k \hat{x} + e$ and solve $P_k$ from (16).
2. Update $K_{k+1}$ by using (11).
3. $k \leftarrow k + 1$.
Until $\|P_k - P_{k-1}\| < \xi$.
$k^{*} \leftarrow k$.
We obtain the approximated optimal control policy $v_C = -K_{k^{*}} \hat{x}$.

It is clear that Algorithm 2 uses only the input and output of system (1) and the state estimate of the observer, without requiring any knowledge of the state of system (1).

IV. SIMULATION RESULTS

In this section, we apply the proposed approximate optimal control based on the integral sliding mode control law to a wheel inverted pendulum described by (19) and Table 1. With the state $x = [x, \dot{x}, \psi, \dot{\psi}, \phi, \dot{\phi}]^{\top}$ and the input $u = [u_1, u_2]^{\top}$, the model is

$\dot{x} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \zeta_1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & \zeta_2 & 0 \end{bmatrix} x + \begin{bmatrix} 0 & 0 \\ \zeta_3 & \zeta_3 \\ 0 & 0 \\ \zeta_4 & -\zeta_4 \\ 0 & 0 \\ \zeta_5 & \zeta_5 \end{bmatrix}\left(\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} + f\right)$  (19)

$y = [1\;\; 1\;\; 1\;\; 1\;\; 1\;\; 1]\, x$

where, with $D = (M_b d^2 + I_x)(M_b R^2 + 2M_w R^2 + 2I_a) - (M_b dR)^2$,

$\zeta_1 = \dfrac{M_b^2 d^2 g R^2}{D}, \qquad \zeta_2 = \dfrac{M_b g d\,(M_b R^2 + 2M_w R^2 + 2I_a)}{D},$

$\zeta_3 = \dfrac{R\,(M_b d^2 + I_x + M_b dR)}{D}, \qquad \zeta_4 = \dfrac{L}{R\big[2(M_w + I_a/R^2)L^2 + I_z\big]},$

$\zeta_5 = \dfrac{M_b R^2 + 2M_w R^2 + 2I_a + M_b dR}{D}.$
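The structure of model (19) can be probed numerically. In the sketch below the coefficients $\zeta_1, \ldots, \zeta_5$ are placeholder values (not computed from Table 1); it only checks that a WIP model of this form is controllable, which is what the existence of a stabilizing initial gain $K_0$ requires:

```python
import numpy as np

# placeholder coefficients (hypothetical, NOT computed from Table 1)
z1, z2, z3, z4, z5 = 1.0, 2.0, 0.5, 0.8, 0.7

# state: [x, xdot, psi, psidot, phi, phidot] as in (19)
A = np.array([[0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, z1, 0],
              [0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, z2, 0]], dtype=float)
B = np.array([[0, 0], [z3, z3], [0, 0], [z4, -z4], [0, 0], [z5, z5]], dtype=float)

# controllability matrix [B, AB, ..., A^5 B]
Mc = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(6)])
assert np.linalg.matrix_rank(Mc) == 6     # the pair (A, B) is controllable
```

The sum $u_1 + u_2$ drives the translation/pitch subsystem and the difference $u_1 - u_2$ drives the yaw subsystem, so the full pair is controllable for generic nonzero coefficients.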
The disturbance used in the simulation is

$f = \begin{bmatrix} 0.01 & 0.01 & 0.01 & 0.15 & 0.1 & 0 \\ 0.02 & 0.01 & 0.03 & 0.01 & 0.02 & 0.01 \end{bmatrix} x \sin t + \dfrac{u \cos 2t}{5}.$

Figures 1 and 2 show the state and control signals of the system when Algorithm 1 is not used. Figures 3 and 4 show the state and control signals when Theorem 1 and Algorithm 1 are used. Figures 5 and 6 show the state and control signals when Algorithm 2 is used. Figures 7 and 8 show the convergence of the matrices $K$ and $P$ for the proposed Algorithm 1 and Algorithm 2, and the tracking errors converge to zero.

Table 1. The parameters and variables of the wheeled inverted pendulum

Parameter | Symbol | Value
Mass of main body | $M_b$ | 13.3 kg
Mass of each wheel | $M_w$ | 1.89 kg
Centre of mass from base | $d$ | 0.13 m
Diameter of wheel | $R$ | 0.13 m
Distance between the wheels | $L$ | 0.325 m
Moment of inertia of body about the x-axis | $I_x$ | 0.1935 kg·m²
Moment of inertia of body about the z-axis | $I_z$ | 0.3379 kg·m²
Moment of inertia of wheel about its centre | $I_a$ | 0.1229 kg·m²
Acceleration due to gravity | $g$ | 9.81 m/s²

Figure 1. The state of the system (without Algorithm 1)
Figure 2. The control signal (without Algorithm 1)
Figure 3. The state of the system (with Algorithm 1)
Figure 4. The control signal (with Algorithm 1)
Figure 5. The state of the system (with Algorithm 2)
Figure 6. The control signal (with Algorithm 2)

Figure 7. The convergence of the matrices in Algorithm 1
Figure 8. The convergence of the matrices in Algorithm 2

V. CONCLUSION

This paper presents an approximate optimal control algorithm based on an integral sliding mode control law for continuous-time systems with unknown system dynamics and external disturbance. The proposed algorithm guarantees the robust stability of the system and a bound on the cost function. The theoretical analysis and simulation results illustrate the effectiveness of the proposed algorithm.

REFERENCES

[1] D. Wang, D. Liu, and H. Li, "Policy iteration algorithm for online design of robust control for a class of continuous-time nonlinear systems," IEEE Trans. Autom. Sci. Eng., vol. 11, no. 2, pp. 627–632, Apr. 2014.
[2] D. Liu, D. Wang, F.-Y. Wang, H. Li, and X. Yang, "Neural-network-based online HJB solution for optimal robust guaranteed cost control of continuous-time uncertain nonlinear systems," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2834–2847, Dec. 2014.
[3] M. Taleb, "High order integral sliding mode control with gain adaptation," in Proc. European Control Conference (ECC), Jul. 2013.
[4] G. P. Matthews and R. A. DeCarlo, "Decentralized tracking for a class of interconnected nonlinear systems using variable structure control," Automatica, vol. 24, pp. 187–193, 1988.
[5] H.-N. Wu and B. Luo, "Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H∞ control," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 12, pp. 1884–1895, Dec. 2012.
[6] H. Zhang, C. Qin, B. Jiang, and Y. Luo, "Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2706–2718, Dec. 2014.
[7] M. Abu-Khalaf, F. L. Lewis, and J. Huang, "Neurodynamic programming and zero-sum games for constrained control systems," IEEE Trans. Neural Netw., vol. 19, no. 7, pp. 1243–1252, Jul. 2008.
[8] V. Utkin and J. Shi, "Integral sliding mode in systems operating under uncertainty conditions," in Proc. 35th IEEE Conf. Decision Control, Kobe, Japan, Dec. 1996, pp. 4591–4596.
[9] C.-C. Feng, "Integral sliding-based robust control," in Recent Advances in Robust Control – Novel Approaches and Design Methods, InTech, 2011, ISBN 978-953-307-339-2.
[10] J. K. Lin et al., "Integral sliding mode control and its application on active suspension system," in Proc. 4th Int. Conf. Power Electronics Systems and Applications, The Hong Kong Polytechnic University, Hong Kong, 2011.
[11] V. I. Utkin et al., Sliding Mode Control in Electro-Mechanical Systems, 2nd ed. Boca Raton, FL: CRC Press, 2009.
[12] R. A. DeCarlo, S. H. Zak, and G. P. Matthews, "Variable structure control of nonlinear multivariable systems: A tutorial," Proc. IEEE, vol. 76, no. 3, pp. 212–232, Mar. 1988.
[13] D. Kleinman, "On an iterative technique for Riccati equation computations," IEEE Trans. Autom. Control, vol. 13, no. 1, pp. 114–115, 1968.
[14] H. Xu, S. Jagannathan, and F. L. Lewis, "Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses," Automatica, vol. 48, no. 6, pp. 1017–1030, 2012.
[15] F. Abdollahi, H. A. Talebi, and R. V. Patel, "A stable neural network observer with application to flexible-joint manipulators," in Proc. 9th Int. Conf. Neural Information Processing, 2006, pp. 1910–1914.
[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[17] P. J. Werbos, "Beyond regression: New tools for prediction and analysis in the behavioural sciences," Ph.D. thesis, Harvard University, 1974.
[18] K. G. Vamvoudakis and F. L. Lewis, "Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations," Automatica, vol. 47, no. 8, pp. 1556–1569, 2011.