Optimal Control of a Double Inverted Pendulum on a Cart
1 Nomenclature

m_i        mass
l_i        distance from a pivot joint to the i-th pendulum link center of mass
L_i        length of the i-th pendulum link
\theta_0   wheeled cart position
\theta_1, \theta_2   pendulum angles
I_i        moment of inertia of the i-th pendulum link w.r.t. its center of mass
g          gravity constant
u          control force
T          kinetic energy
P          potential energy
L          Lagrangian
w          neural network (NN) weights
N_h        number of neurons in the hidden layer
\Delta t   digital control sampling time

Subscripts 0, 1, 2 used with the aforementioned parameters refer to the cart, the first (bottom) pendulum, and the second (top) pendulum, respectively.

2 Introduction

In this report, an optimal nonlinear stabilization problem is addressed: stabilize the DIPC while minimizing an accumulative cost functional quadratic in states and controls. For linear systems, this leads to linear feedback control, which is found by solving a Riccati equation [5], and is thus referred to as the linear quadratic regulator (LQR). The DIPC, however, is a highly nonlinear system, and its linearization is far from adequate for control design purposes. Nonetheless, the LQR will be considered as a baseline controller, and the other control designs will be tested and compared against it. To solve the nonlinear optimal control problem, we employ several different approaches: a nonlinear extension of the LQR called the state-dependent Riccati equation (SDRE) technique; direct nonlinear optimization using a neural network (NN); and combinations of the direct NN optimization with the LQR and the SDRE.
3 Modeling

The kinetic energy of the system is the sum of the kinetic energies of the cart and the two pendulum links,

$$ T = T_0 + T_1 + T_2, $$

where

$$ T_0 = \tfrac{1}{2} m_0 \dot\theta_0^2, $$

$$ T_1 = \tfrac{1}{2} m_1 \big[ (\dot\theta_0 + l_1 \dot\theta_1 \cos\theta_1)^2 + (l_1 \dot\theta_1 \sin\theta_1)^2 \big] + \tfrac{1}{2} I_1 \dot\theta_1^2
      = \tfrac{1}{2} m_1 \dot\theta_0^2 + \tfrac{1}{2} (m_1 l_1^2 + I_1) \dot\theta_1^2 + m_1 l_1 \dot\theta_0 \dot\theta_1 \cos\theta_1, $$

$$ T_2 = \tfrac{1}{2} m_2 \big[ (\dot\theta_0 + L_1 \dot\theta_1 \cos\theta_1 + l_2 \dot\theta_2 \cos\theta_2)^2 + (L_1 \dot\theta_1 \sin\theta_1 + l_2 \dot\theta_2 \sin\theta_2)^2 \big] + \tfrac{1}{2} I_2 \dot\theta_2^2 $$
$$ \phantom{T_2} = \tfrac{1}{2} m_2 \dot\theta_0^2 + \tfrac{1}{2} m_2 L_1^2 \dot\theta_1^2 + \tfrac{1}{2} (m_2 l_2^2 + I_2) \dot\theta_2^2 + m_2 L_1 \dot\theta_0 \dot\theta_1 \cos\theta_1 + m_2 l_2 \dot\theta_0 \dot\theta_2 \cos\theta_2 + m_2 L_1 l_2 \dot\theta_1 \dot\theta_2 \cos(\theta_1 - \theta_2). $$

The potential energy is

$$ P = P_0 + P_1 + P_2, \qquad P_0 = 0, \qquad P_1 = m_1 g l_1 \cos\theta_1, \qquad P_2 = m_2 g (L_1 \cos\theta_1 + l_2 \cos\theta_2). $$

With the Lagrangian $L = T - P$ and the generalized force vector $Q = (u \;\; 0 \;\; 0)^T$, the Lagrange equations are

$$ \frac{d}{dt}\frac{\partial L}{\partial \dot\theta} - \frac{\partial L}{\partial \theta} = Q. \tag{1} $$

Or explicitly,

$$ u = \Big(\sum_i m_i\Big) \ddot\theta_0 + (m_1 l_1 + m_2 L_1)\cos(\theta_1)\,\ddot\theta_1 + m_2 l_2 \cos(\theta_2)\,\ddot\theta_2 - (m_1 l_1 + m_2 L_1)\sin(\theta_1)\,\dot\theta_1^2 - m_2 l_2 \sin(\theta_2)\,\dot\theta_2^2, $$

$$ 0 = (m_1 l_1 + m_2 L_1)\cos(\theta_1)\,\ddot\theta_0 + (m_1 l_1^2 + m_2 L_1^2 + I_1)\,\ddot\theta_1 + m_2 L_1 l_2 \cos(\theta_1 - \theta_2)\,\ddot\theta_2 + m_2 L_1 l_2 \sin(\theta_1 - \theta_2)\,\dot\theta_2^2 - (m_1 l_1 + m_2 L_1)\, g \sin\theta_1, $$

$$ 0 = m_2 l_2 \cos(\theta_2)\,\ddot\theta_0 + m_2 L_1 l_2 \cos(\theta_1 - \theta_2)\,\ddot\theta_1 + (m_2 l_2^2 + I_2)\,\ddot\theta_2 - m_2 L_1 l_2 \sin(\theta_1 - \theta_2)\,\dot\theta_1^2 - m_2 l_2\, g \sin\theta_2. $$

The Lagrange equations for the DIPC system can be written in a more compact matrix form:
$$ D(\theta)\,\ddot\theta + C(\theta, \dot\theta)\,\dot\theta + G(\theta) = H u, \tag{2} $$

where

$$ D(\theta) = \begin{pmatrix} d_1 & d_2 \cos\theta_1 & d_3 \cos\theta_2 \\ d_2 \cos\theta_1 & d_4 & d_5 \cos(\theta_1 - \theta_2) \\ d_3 \cos\theta_2 & d_5 \cos(\theta_1 - \theta_2) & d_6 \end{pmatrix}, \tag{3} $$

$$ C(\theta, \dot\theta) = \begin{pmatrix} 0 & -d_2 \sin(\theta_1)\,\dot\theta_1 & -d_3 \sin(\theta_2)\,\dot\theta_2 \\ 0 & 0 & d_5 \sin(\theta_1 - \theta_2)\,\dot\theta_2 \\ 0 & -d_5 \sin(\theta_1 - \theta_2)\,\dot\theta_1 & 0 \end{pmatrix}, \tag{4} $$

$$ G(\theta) = \big( 0, \; -f_1 \sin\theta_1, \; -f_2 \sin\theta_2 \big)^T, \tag{5} $$

$$ H = (1 \;\; 0 \;\; 0)^T. $$

Assuming that the centers of mass of the pendulums are in the geometric centers of the links, which are solid rods, we have $l_i = L_i/2$ and $I_i = m_i L_i^2/12$. Then for the elements of the matrices $D(\theta)$, $C(\theta, \dot\theta)$ and $G(\theta)$ we get:
$$ d_1 = m_0 + m_1 + m_2, $$
$$ d_2 = m_1 l_1 + m_2 L_1 = (\tfrac{1}{2} m_1 + m_2) L_1, $$
$$ d_3 = m_2 l_2 = \tfrac{1}{2} m_2 L_2, $$
$$ d_4 = m_1 l_1^2 + m_2 L_1^2 + I_1 = (\tfrac{1}{3} m_1 + m_2) L_1^2, $$
$$ d_5 = m_2 L_1 l_2 = \tfrac{1}{2} m_2 L_1 L_2, $$
$$ d_6 = m_2 l_2^2 + I_2 = \tfrac{1}{3} m_2 L_2^2, $$
$$ f_1 = (m_1 l_1 + m_2 L_1)\, g = (\tfrac{1}{2} m_1 + m_2) L_1 g, $$
$$ f_2 = m_2 l_2\, g = \tfrac{1}{2} m_2 L_2\, g. $$

Note that the matrix $D(\theta)$ is symmetric and nonsingular. Introducing the state vector $x = (\theta^T, \dot\theta^T)^T$, the dynamics (2) take the standard state-space form

$$ \dot x = \begin{pmatrix} \dot\theta \\ -D^{-1}(\theta)\big[ C(\theta, \dot\theta)\,\dot\theta + G(\theta) \big] + D^{-1}(\theta) H u \end{pmatrix} \equiv f_c(x, u). \tag{6} $$
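To make the model concrete, the following is a minimal NumPy sketch of (2)-(6). The parameter values are those listed later in the Simulation Results section, with g = 9.81 m/s² assumed; the RK4 integrator step() is a convenience added here, not something prescribed by the text.

```python
import numpy as np

# Parameters from the simulation section; solid-rod relations l_i = L_i/2,
# I_i = m_i L_i^2 / 12 as assumed in the text.
m0, m1, m2, L1, L2, g = 1.5, 0.5, 0.75, 0.5, 0.75, 9.81
l1, l2 = L1 / 2, L2 / 2
I1, I2 = m1 * L1**2 / 12, m2 * L2**2 / 12

# Elements d_1..d_6 and gravity terms f_1, f_2
d1 = m0 + m1 + m2
d2 = m1 * l1 + m2 * L1
d3 = m2 * l2
d4 = m1 * l1**2 + m2 * L1**2 + I1
d5 = m2 * L1 * l2
d6 = m2 * l2**2 + I2
f1 = (m1 * l1 + m2 * L1) * g
f2 = m2 * l2 * g

H = np.array([1.0, 0.0, 0.0])          # input matrix of (2)

def D_matrix(t1, t2):
    """Inertia matrix D(theta), eq. (3)."""
    return np.array([
        [d1,              d2 * np.cos(t1),       d3 * np.cos(t2)],
        [d2 * np.cos(t1), d4,                    d5 * np.cos(t1 - t2)],
        [d3 * np.cos(t2), d5 * np.cos(t1 - t2),  d6]])

def C_matrix(t1, t2, dt1, dt2):
    """Coriolis/centrifugal matrix C(theta, theta_dot), eq. (4)."""
    return np.array([
        [0.0, -d2 * np.sin(t1) * dt1,      -d3 * np.sin(t2) * dt2],
        [0.0,  0.0,                         d5 * np.sin(t1 - t2) * dt2],
        [0.0, -d5 * np.sin(t1 - t2) * dt1,  0.0]])

def fc(x, u):
    """Right-hand side of the state-space form (6); x = (theta, theta_dot)."""
    t1, t2 = x[1], x[2]
    dth = x[3:]
    G = np.array([0.0, -f1 * np.sin(t1), -f2 * np.sin(t2)])
    ddth = np.linalg.solve(D_matrix(t1, t2),
                           H * u - C_matrix(t1, t2, dth[1], dth[2]) @ dth - G)
    return np.concatenate([dth, ddth])

def step(x, u, dt=0.02):
    """One RK4 step of (6) at the 0.02 s sampling time."""
    k1 = fc(x, u)
    k2 = fc(x + 0.5 * dt * k1, u)
    k3 = fc(x + 0.5 * dt * k2, u)
    k4 = fc(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```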
4 Control

In this report, optimal nonlinear stabilization control design is addressed: stabilize the DIPC minimizing an accumulative cost functional quadratic in states and controls. The general problem of designing an optimal control law involves minimizing the cost function

$$ J_t = \sum_{k=t}^{t_{final}} L_k(x_k, u_k), \tag{7} $$

where in our case the loss is quadratic in states and controls:

$$ L_k(x_k, u_k) = x_k^T Q x_k + R u_k^2. \tag{8} $$

4.1 Linear quadratic regulator

The linear quadratic regulator yields an optimal solution to the control problem (7)-(8) when the system dynamics are linear. Since the DIPC is nonlinear, as described by (6), it can only be linearized to derive an approximate linear solution to the optimal control problem. Linearization of (6) around $x = 0$ yields

$$ \dot x = A x + B u, \tag{9} $$

where

$$ A = \begin{pmatrix} 0 & I \\ -D^{-1}(0)\,\dfrac{\partial G}{\partial \theta}(0) & 0 \end{pmatrix}, \tag{10} $$

$$ B = \begin{pmatrix} 0 \\ D^{-1}(0)\, H \end{pmatrix}. \tag{11} $$

The optimal control for the linearized system is a linear state feedback

$$ u = -R^{-1} B^T P_c\, x \equiv -K_c x, \tag{12} $$

where $P_c$ solves the continuous-time algebraic Riccati equation [5]. In the discrete-time implementation, the linearized system is discretized with the sampling time $\Delta t$ into $x_{k+1} = \Phi x_k + \Gamma u_k$, and the control becomes

$$ u_k = -R^{-1} \Gamma^T P\, x_k \equiv -K x_k, \tag{13} $$

where $P$ solves the corresponding discrete-time Riccati equation.
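Continuing the sketch above, the LQR design (9)-(12) might be implemented as follows; solve_continuous_are is SciPy's Riccati solver, and the weights Q, R are the values given in the Simulation Results section.

```python
from scipy.linalg import solve_continuous_are

# Linearization (9)-(11): C(0, 0) = 0 and dG/dtheta(0) = diag(0, -f1, -f2)
D0 = D_matrix(0.0, 0.0)
dG0 = np.diag([0.0, -f1, -f2])
A = np.block([[np.zeros((3, 3)), np.eye(3)],
              [-np.linalg.solve(D0, dG0), np.zeros((3, 3))]])
B = np.concatenate([np.zeros(3), np.linalg.solve(D0, H)]).reshape(6, 1)

# LQR gain (12): u = -Kc @ x
Q = np.diag([5.0, 50.0, 50.0, 20.0, 700.0, 700.0])
R = np.array([[1.0]])
P = solve_continuous_are(A, B, Q, R)
Kc = np.linalg.solve(R, B.T @ P)
```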
4.2 State-dependent Riccati equation

The state-dependent Riccati equation (SDRE) technique extends the LQR to nonlinear systems. The system dynamics are represented in the state-dependent coefficient (SDC) form

$$ \dot x = A(x)\, x + B(x)\, u, \tag{14} $$

which at each time step is discretized into

$$ x_{k+1} = \Phi(x_k)\, x_k + \Gamma(x_k)\, u_k, \tag{16} $$

and the control is then computed, by solving a Riccati equation with the state-dependent matrices $\Phi(x_k)$ and $\Gamma(x_k)$, as

$$ u_k = -R^{-1} \Gamma^T(x_k)\, P(x_k)\, x_k \equiv -K(x_k)\, x_k. \tag{17} $$

To represent the DIPC dynamics (6) in the SDC form, it remains to factor the gravity term as $G(\theta) = G_{sd}(\theta)\,\theta$, where

$$ G_{sd}(\theta) = \mathrm{diag}\Big( 0, \; -f_1 \frac{\sin\theta_1}{\theta_1}, \; -f_2 \frac{\sin\theta_2}{\theta_2} \Big). $$

This yields the SDC system equations

$$ \dot x = \begin{pmatrix} 0 & I \\ -D^{-1}(\theta)\, G_{sd}(\theta) & -D^{-1}(\theta)\, C(\theta, \dot\theta) \end{pmatrix} x + \begin{pmatrix} 0 \\ D^{-1}(\theta)\, H \end{pmatrix} u. \tag{18} $$

The derived system equations (18) (compare with the linearized system (9)-(11)) are discretized at each time step into (16), and the control is then computed as given by (17).

Remark. In the neighborhood of the equilibrium $x = 0$, the system equations in the SDC form (18) turn into the linearized equations (9) used in the LQR design. This can be checked either by direct computation or by noting that, on one hand, linearization yields

$$ \dot x = A x + B u + O(\|x\|^2), \tag{15} $$

and, on the other hand, the SDC form of the system dynamics is (14). Since $O(\|x\|^2) \to 0$ as $x \to 0$, $A(x) \to A$ and $B(x) \to B$. Therefore, it is natural to expect that the performance of the SDRE regulator will be very close to that of the LQR in the vicinity of the equilibrium, and that the difference will show at larger pendulum deflections. This is exactly what is illustrated in the Simulation Results section.
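Below is a sketch of one SDRE step, reusing D_matrix, C_matrix, f1, f2, H, Q and R from the earlier sketches. Two implementation choices are made here that the text does not prescribe: a first-order (Euler) discretization of (16), and the standard discrete Riccati gain (R + Γ'PΓ)⁻¹Γ'PΦ rather than the simplified form written in (17).

```python
from scipy.linalg import solve_discrete_are

def sdre_gain(x, dt=0.02):
    """State-dependent gain K(x_k) of (17) via the SDC form (18)."""
    t1, t2, dth = x[1], x[2], x[3:]
    Dinv = np.linalg.inv(D_matrix(t1, t2))
    # Gsd(theta) uses sin(t)/t, which tends to 1 as t -> 0
    Gsd = np.diag([0.0,
                   -f1 * np.sinc(t1 / np.pi),
                   -f2 * np.sinc(t2 / np.pi)])
    Ax = np.block([[np.zeros((3, 3)), np.eye(3)],
                   [-Dinv @ Gsd, -Dinv @ C_matrix(t1, t2, dth[1], dth[2])]])
    Bx = np.concatenate([np.zeros(3), Dinv @ H]).reshape(6, 1)
    Phi = np.eye(6) + dt * Ax           # Euler discretization of (16)
    Gam = dt * Bx
    P = solve_discrete_are(Phi, Gam, Q, R)
    return np.linalg.solve(R + Gam.T @ P @ Gam, Gam.T @ P @ Phi)
```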
4.3 Neural network control

A neural network (NN) control is often popular in control of nonlinear systems due to the universal function approximation capabilities of NNs. Neural networks with only one hidden layer and an arbitrarily large number of neurons represent nonlinear mappings, which can be used for approximation of any nonlinear function $f \in C(\mathbb{R}^n, \mathbb{R}^m)$ over a compact subset of $\mathbb{R}^n$ [7, 4, 12]. In this section, we utilize the function approximation properties of NNs to approximate a solution of the nonlinear optimal control problem (7)-(8) subject to the system dynamics (6), and thus design an optimal NN regulator. The problem can be solved by directly implementing a feedback controller (see Figure 2) as

$$ u_k = NN(x_k, w). $$

[Figure 2: NN feedback control scheme. The NN maps the state $x_k$ into the control $u_k$ applied to the double inverted pendulum on a cart; $q^{-1}$ denotes the unit delay, and the running cost accumulates $x_k^T Q x_k$ and $R u_k^2$.]

Next, an optimal set of weights $w$ is computed to solve the optimization problem. To do this, a standard calculus of variations approach is taken: minimization of a functional subject to the equality constraints $x_{k+1} = f(x_k, u_k)$, where $f$ denotes the discretized dynamics (6). Let $\lambda_k$ be a vector of Lagrange multipliers in the augmented cost function

$$ H = \sum_{k=t}^{t_{final}} \big\{ L(x_k, u_k) + \lambda_k^T \big( x_{k+1} - f(x_k, u_k) \big) \big\}. \tag{19} $$
Setting to zero the derivatives of (19) with respect to the states $x_k$ yields the adjoint system, which is run backward in time:

$$ \lambda_k = \Big[ \frac{\partial f}{\partial x_k} + \frac{\partial f}{\partial u_k} \frac{\partial NN}{\partial x_k} \Big]^T \lambda_{k+1} + \Big[ \frac{\partial L}{\partial x_k} + \frac{\partial L}{\partial u_k} \frac{\partial NN}{\partial x_k} \Big]^T, \qquad \lambda_{t_{final}+1} = 0, \tag{20} $$

where the loss derivatives are

$$ \frac{\partial L}{\partial x_k} = 2 Q x_k, \tag{22} $$

$$ \frac{\partial L}{\partial u_k} = 2 R u_k. \tag{23} $$

The discrete-time system Jacobians $\partial f/\partial x_k$ and $\partial f/\partial u_k$ follow from the Jacobians of $f_c(x, u)$, the right-hand side of the continuous system (6), i.e. $\dot x = f_c(x, u)$:

$$ \frac{\partial f_c}{\partial x} = \begin{pmatrix} 0 & I \\ M & -2 D^{-1} C \end{pmatrix}, \qquad \frac{\partial f_c}{\partial u} = \begin{pmatrix} 0 \\ D^{-1} H \end{pmatrix}, \tag{21} $$

where the factor 2 in $-2D^{-1}C$ arises because $C(\theta, \dot\theta)\,\dot\theta$ is quadratic in $\dot\theta$, and the $i$-th column $M_i$ of the matrix $M$ is

$$ M_i = D^{-1} \frac{\partial D}{\partial \theta_i} D^{-1} \big( C \dot\theta + G - H u \big) - D^{-1} \Big( \frac{\partial C}{\partial \theta_i} \dot\theta + \frac{\partial G}{\partial \theta_i} \Big). \tag{24} $$

[Figure 3: Backward (adjoint) pass. The pendulum Jacobians $\partial f_c/\partial x$, $\partial f_c/\partial u$ and the NN Jacobian $dNN/dx_k$ propagate $\lambda_{k+1}$ into $\lambda_k$, driven by the loss derivatives $2 Q x_k$ and $2 R u_k$.]
The NN is a one-hidden-layer perceptron with $N_h$ tanh neurons,

$$ u = \sum_{i=1}^{N_h} w_{2i} \tanh\Big( \sum_{j=1}^{N_0} W_{1,ij}\, x_j + w_{1b,i} \Big) + w_{2b}, $$

where $N_0 = 6$ is the dimension of the NN input (the state), $W_1$, $w_{1b}$ are the hidden-layer weights and biases, and $w_2$, $w_{2b}$ are the output-layer weights and bias. Training proceeds by gradient descent; each training epoch consists of the following steps:

1. Run the closed-loop system (Figure 2) forward in time over the training epoch, recording the trajectory $x_k$, $u_k$.
2. Run the adjoint system backward in time to accumulate the Lagrange multipliers (20) (see Figure 3). System Jacobians are evaluated analytically or by perturbation.
3. Update the weights using the accumulated gradients (26)-(27).

For the backward pass, the NN Jacobian is computed analytically:

$$ \frac{\partial NN}{\partial x} = w_2^T\, \mathrm{dtanh}\big( W_1 x + w_{1b} \big)\, W_1, \tag{25} $$

where $\mathrm{dtanh}(\cdot)$ denotes the diagonal matrix of the hidden-layer activation derivatives $1 - \tanh^2(\cdot)$.
[Figure: one-hidden-layer neural network architecture. Inputs $x_1, \ldots, x_6$, $N_h$ tanh hidden units with weights $W_1$ and biases $w_{1b}$, and a linear output unit with weights $W_2$ and bias $w_{2b}$.]
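In NumPy, the controller and its Jacobian (25) might look as follows; the shapes assume a state of dimension 6, W1 of shape (N_h, 6), and w2 of shape (N_h,).

```python
def nn_control(x, W1, w1b, w2, w2b):
    """u = w2' tanh(W1 x + w1b) + w2b, the one-hidden-layer controller."""
    return w2 @ np.tanh(W1 @ x + w1b) + w2b

def nn_jacobian(x, W1, w1b, w2):
    """dNN/dx of (25): w2' diag(1 - tanh^2(W1 x + w1b)) W1."""
    h = np.tanh(W1 @ x + w1b)
    return (w2 * (1.0 - h**2)) @ W1
```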
Setting to zero the derivatives of (19) with respect to the NN weights yields the gradients used in the weight updates. Denoting by

$$ e_k = \lambda_{k+1}^T \frac{\partial f}{\partial u_k} + \frac{\partial L}{\partial u_k} $$

the scalar error signal at step $k$, the gradient-descent updates at iteration $n$ with learning rate $\eta$ are

$$ w_2^{(n+1)} = w_2^{(n)} - \eta \sum_{k=t}^{t_{final}} e_k \tanh\big( W_1^{(n)} x_k + w_{1b}^{(n)} \big), \qquad
   w_{2b}^{(n+1)} = w_{2b}^{(n)} - \eta \sum_{k=t}^{t_{final}} e_k, \tag{26} $$

$$ W_1^{(n+1)} = W_1^{(n)} - \eta \sum_{k=t}^{t_{final}} e_k\, \mathrm{dtanh}\big( W_1^{(n)} x_k + w_{1b}^{(n)} \big)\, w_2^{(n)}\, x_k^T, \qquad
   w_{1b}^{(n+1)} = w_{1b}^{(n)} - \eta \sum_{k=t}^{t_{final}} e_k\, \mathrm{dtanh}\big( W_1^{(n)} x_k + w_{1b}^{(n)} \big)\, w_2^{(n)}. \tag{27} $$
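A sketch of the backward sweep follows: the adjoint recursion (20) together with the gradient sums of (26)-(27), with the system Jacobians evaluated by perturbation, one of the two options the text mentions. Here Q is passed as a 6x6 matrix and R as a scalar; both are caller-supplied.

```python
def dfdu(x, u, eps=1e-6):
    """Discrete-time df/du by central differences (perturbation)."""
    return (step(x, u + eps) - step(x, u - eps)) / (2 * eps)

def dfdx(x, u, eps=1e-6):
    """Discrete-time df/dx by central differences, column by column."""
    J = np.zeros((6, 6))
    for j in range(6):
        dx = np.zeros(6)
        dx[j] = eps
        J[:, j] = (step(x + dx, u) - step(x - dx, u)) / (2 * eps)
    return J

def bptt_grads(xs, us, dfdx, dfdu, W1, w1b, w2, Q, R, lam_T=None):
    """Adjoint recursion (20) and weight-gradient sums of (26)-(27).
    xs has one more entry than us (it includes the final state); lam_T
    allows a nonzero terminal multiplier, used by the receding horizon."""
    lam = np.zeros(6) if lam_T is None else lam_T       # lambda_{T+1}
    gW1, gw1b = np.zeros_like(W1), np.zeros_like(w1b)
    gw2, gw2b = np.zeros_like(w2), 0.0
    for k in reversed(range(len(us))):
        h = np.tanh(W1 @ xs[k] + w1b)
        dNNdx = (w2 * (1.0 - h**2)) @ W1                # eq. (25)
        e = lam @ dfdu(xs[k], us[k]) + 2.0 * R * us[k]  # scalar error signal
        gw2 += e * h                                    # eq. (26)
        gw2b += e
        gW1 += e * np.outer(w2 * (1.0 - h**2), xs[k])   # eq. (27)
        gw1b += e * (w2 * (1.0 - h**2))
        lam = (dfdx(xs[k], us[k]) + np.outer(dfdu(xs[k], us[k]), dNNdx)).T @ lam \
              + 2.0 * Q @ xs[k] + dNNdx * (2.0 * R * us[k])   # eq. (20)
    return gW1, gw1b, gw2, gw2b
```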
4.4 Combining the NN control with the LQR/SDRE

The direct NN optimization can be combined with the LQR or SDRE designs by adding the NN output to the baseline linear feedback:

$$ u_k = NN(w, x_k) + K x_k, $$

where $K x_k$ is the baseline LQR (12)-(13) or SDRE (17) feedback. The NN is trained exactly as in the previous section; the only change is that the NN Jacobian $\partial NN/\partial x_k$ in the adjoint recursion (20) is replaced by $\partial NN/\partial x_k + K$:

$$ \lambda_k = \Big[ \frac{\partial f}{\partial x_k} + \frac{\partial f}{\partial u_k} \Big( \frac{\partial NN}{\partial x_k} + K \Big) \Big]^T \lambda_{k+1} + \Big[ \frac{\partial L}{\partial x_k} + \frac{\partial L}{\partial u_k} \Big( \frac{\partial NN}{\partial x_k} + K \Big) \Big]^T. $$
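On top of the earlier sketches, the combined controller is a one-line change; note that K here is assumed to already carry the sign convention of (12)/(17), e.g. K = -Kc.

```python
def combined_control(x, W1, w1b, w2, w2b, K):
    """u_k = NN(w, x_k) + K x_k: the NN learns a residual correction
    on top of the baseline LQR/SDRE feedback (e.g. K = -Kc)."""
    return nn_control(x, W1, w1b, w2, w2b) + (K @ x).item()
```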
4.5 Receding horizon NN control

Is it possible to further improve the control scheme designed in the previous section? Recall that the NN was trained numerous times over a wide range of the pendulum angles to minimize the cost functional (19). The complexity of the approach is a function of the final time $t_{final}$, which determines the length of a training epoch. The solution suggested in this section is to apply a receding horizon framework and train the NN in a limited range of the pendulum angles, along a trajectory starting at the current point and having a relatively short duration (horizon) $N$. This is accomplished by rewriting the cost function (19) as

$$ H = \sum_{k=t}^{t+N-1} \big\{ L(x_k, u_k) + \lambda_k^T \big( x_{k+1} - f(x_k, u_k) \big) \big\} + V(x_{t+N}), \tag{28} $$

where $V(x_{t+N})$ is a terminal cost that accounts for the truncated tail of the original sum.
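A receding-horizon training loop in the spirit of (28) might look as below, assuming the helpers from the earlier sketches (step, nn_control, bptt_grads, dfdx, dfdu, and the LQR Riccati solution P). The horizon length, learning rate, inner-iteration count and the 10 degree initial deflection are illustrative values, not taken from the text; likewise, the terminal cost V(x) = x'Px with the continuous-time LQR solution P is just one plausible choice.

```python
N, eta, n_inner = 50, 1e-3, 5        # horizon, learning rate, inner updates
x = np.array([0.0, np.deg2rad(10.0), np.deg2rad(10.0), 0.0, 0.0, 0.0])
W1 = 0.1 * np.random.randn(40, 6)    # Nh = 40 as in the simulation section
w1b, w2, w2b = np.zeros(40), np.zeros(40), 0.0
Qw, Rw = np.diag([5.0, 50.0, 50.0, 20.0, 700.0, 700.0]), 1.0

for t in range(500):
    for _ in range(n_inner):                       # re-train along the horizon
        xs, us = [x], []
        for k in range(N):                         # forward rollout under the NN
            u = nn_control(xs[-1], W1, w1b, w2, w2b)
            us.append(u)
            xs.append(step(xs[-1], u))
        lam_T = 2.0 * P @ xs[-1]                   # dV/dx for V(x) = x'Px
        g1, gb1, g2, gb2 = bptt_grads(xs, us, dfdx, dfdu,
                                      W1, w1b, w2, Qw, Rw, lam_T)
        W1 -= eta * g1
        w1b -= eta * gb1
        w2 -= eta * g2
        w2b -= eta * gb2
    x = step(x, nn_control(x, W1, w1b, w2, w2b))   # apply the first control only
```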
[Figures: receding horizon NN control and training schemes. A neural network emulator of the DIPC dynamics closes the training loop: the NN controller and the emulator are unrolled forward over the horizon, and the backward pass propagates the adjoint variables through the NN-emulator Jacobian and the NN Jacobian, driven by $2 Q x_k$ and $2 R u_k$.]
5 Simulation results

To evaluate the control performance, two sets of simulations were conducted. In the first set, the LQR, SDRE, NN and NN+LQR/SDRE control schemes were tested to stabilize the DIPC when both pendulums were initially deflected from their vertical position. The system and controller parameters used in the simulations are listed below:

Parameter   Value
m_0         1.5 kg
m_1         0.5 kg
m_2         0.75 kg
L_1         0.5 m
L_2         0.75 m
\Delta t    0.02 s
Q           diag(5 50 50 20 700 700)
R           1
N_h         40
t_final     500 steps
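Putting the pieces together, a closed-loop run of a baseline controller over the 500-step epoch can be sketched as follows; the 10 degree initial deflection of both pendulums is an illustrative value, since the exact initial conditions behind the figures are not recoverable here.

```python
# Stabilization from a 10 degree initial deflection of both pendulums
x = np.array([0.0, np.deg2rad(10.0), np.deg2rad(10.0), 0.0, 0.0, 0.0])
hist = [x]
for k in range(500):                     # 500 steps at dt = 0.02 s, i.e. 10 s
    u = -(Kc @ x).item()                 # LQR (12); for SDRE use
    # u = -(sdre_gain(x) @ x).item()     # the state-dependent gain (17)
    x = step(x, u)
    hist.append(x)
```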
6 Conclusions
References

[1] R. W. Brockett and H. Li. A light weight rotary double pendulum: maximizing the domain of attraction. In Proceedings of the 42nd IEEE Conference on Decision and Control, Maui, Hawaii, December 2003.

[2] A. Calise and R. Rysdyk. Nonlinear adaptive flight control using neural networks. IEEE Control Systems Magazine, 18(6), December 1998.

[9] A. Jadbabaie, J. Yu, and J. Hauser. Stabilizing receding horizon control of nonlinear systems: a control Lyapunov function approach. In Proceedings of the American Control Conference, 1999.

[10] A. Jadbabaie, J. Yu, and J. Hauser. Unconstrained receding horizon control of nonlinear systems. In Proceedings of the IEEE Conference on Decision and Control, 1999.

[11] E. Johnson, A. Calise, R. Rysdyk, and H. El-Shirbiny. Feedback linearization with neural network augmentation applied to X-33 attitude control. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, August 2000.

[12] W. T. Miller, R. S. Sutton, and P. J. Werbos. Neural Networks for Control. MIT Press, Cambridge, MA, 1990.

[13] M. Sznaier, J. Cloutier, R. Hull, D. Jacques, and C. Mracek. Receding horizon control Lyapunov function approach to suboptimal regulation of nonlinear systems. Journal of Guidance, Control and Dynamics, 23(3):399–405, May–June 2000.

[14] E. Wan, A. Bogdanov, R. Kieburtz, A. Baptista, M. Carlsson, Y. Zhang, and M. Zulauf. Model predictive neural control for aggressive helicopter maneuvers. In T. Samad and G. Balas, editors, Software Enabled Control: Information Technologies for Dynamical Systems, chapter 10, pages 175–200. IEEE Press, John Wiley & Sons, 2003.

[15] E. A. Wan and A. A. Bogdanov. Model predictive neural control with applications to a 6 DOF helicopter model. In Proceedings of the IEEE American Control Conference, Arlington, VA, June 2001.

[16] P. Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, October 1990.

[17] W. Zhong and H. Röck. Energy and passivity based control of the double inverted pendulum on a cart. In Proceedings of the IEEE International Conference on Control Applications, Mexico City, Mexico, September 2001.
[Figures: simulation results. Time histories of the pendulum angles $\theta_1$, $\theta_2$ (deg), the pendulum angular velocities (deg/s), the cart position $\theta_0$ (m), the cart velocity (m/s), and the control force $u$ (N), comparing the LQR, SDRE, NN, NN+LQR and NN+SDRE controllers for several initial deflections; the final set of panels (pendulum angles, pendulum velocities, cart position, control force) compares the SDRE and NN+SDRE controllers.]