You are on page 1of 10

Asymptotically optimal stabilising quadratic control

of an inverted pendulum

R.F. Harrison

Abstract: A method for the design and synthesis of near-optimal, nonlinear control laws is
examined, based on a generalisation of linear quadratic optimal control theory and which
effectively provides a near-optimal gain schedule. The method is simple to apply and affords
greater design flexibility (via state-dependent weighting) than conventional approaches. The
resulting regulator can, in principle, he implemented in real time owing to the causal nature of
the required computations. However, the need to solve an algebraic Riccati equation at every time
point is burdensome, and a number of algorithms that would permit parallel computation are
discussed. The problem of stabilising an inverted pendulum is used to illustrate the method and
proves an exacting task that highlights a number of issues.

1 Introduction A further disadvantage of the LQ philosophy is that,


being a linear feedback, the control signal is affected in the
The aim of this paper is to examine a simple algorithm for same way by small and large signals. In many applications
the quadratic optimal control of a wide class of nonlinear it may be preferable to ignore small error signals (due, for
systems that is causal and can, in principle, he implemen- instance, to measurement noise) as far as possible, while
ted in real time. The method is based on a generalisation of responding optimally to large errors. It may also be desi-
the well known linear quadratic optimal control theory. We rable to switch attention between control objectives
concentrate on the use of the approach and its advantages depending on their current values. The receding horizon
and disadvantages, rather than the theoretical aspects of its strategy approximates thc former behaviour to a certain
derivation, which can be found in References [1-3]. The extent, hut does not address the latter.
method is demonstrated via the well established problem of For nonlinear dynamics the situation is exacerbated with
balancing a single inverted pendulum on a cart. few explicit solutions for their optimal quadratic control as
Linear quadratic (LQ) optimal control theory is a highly yet known, except thosc based on serics cxpansions. These
developed approach for the synthesis of linear optimal are unrealisable, unless via truncation, leading to a loss of
control laws that has been widely applied. In particular, the optimality and possibly stability. Normally optimal quad-
infinite-time-horizon solution has appeal for the regulation ratic control for nonlinear systems is conducted numeri-
of processes that are well modelled by linear time-invariant cally and tends, inherently, to be noncausal.
dynamics because the solution comprises a set of static In this paper we make use of a result that generalises the
gains that are calculated once, off-line, and are implemen- LQ theory to nonlinear systems to provide a nonlinear
ted causally thereafter. The finite-time-horizon solution, design method [ I , 21 (also [3], although [I] is not cited
while being more general, and admitting of time-varying therein) that overcomes some of the difficulties mentioned
dynamics or weighting parameters, is essentially an off-line above. This nonlinear quadratic (NLQ) method applies to
procedure because the associated (differential) Riccati systems having a broad class of nonlinear dynamics with
equation must he solved backwards in time. Incorporation state-dependent weighting matrices (the design degrees of
of such a scheme as a closed-loop, real-time process i s not freedom). In brief, it turns out that the infinite-time-horizon
therefore possible. The receding horizon approach attempts LQ regulator problem when solved afresh at every point on
to overcomc this difficulty by repeatedly solving the open- the state trajectory leads to an asymptotically optimal
loop, finite-time-horizon problem for short periods into the control policy in that it converges to the optimal control
future, and using this for the feedback gain over the time close to the origin [2]. For admissible system dynamics, the
step. It does this at the expense ofoptimality, which may or weighting parameters can be made to be functions of the
may not be important from the point of view of the state variables. Thus, in addition to handling nonlinear
practising engineer. dynamics, the design stage allows for the introduction of
state-dependence in the weighting matrices, leading to a
more flexible control strategy [4, 51. Although, in princi-
ple, time-varying weighting parameters are allowed in the
ta IEE, 200s (finite-time) LQ approach, lack of prior knowledge of
IEE Pmceeding.? online no. 20030014 potential disturbances and the acausal calculation for the
DOI: 1n.tn49iip~cta:2nn3no14 solution makes the introduction of these difficult in prac-
Paper first received 1st July and in rcvised fom? 4th November 2002 tice. The required amplitude dependence can never, there-
The author is with rhc Drpsnment of Automatic Conirol and Systcms fore, be achieved through the LQ approach. The solution to
Engineering, The University of Sheffield. Mappin Street. Sheffield, the proposed method is causal, hut has considerable
SI 3JD. UK computational overhead. However, by using a solution to
IEE P~~o.-ConiroI
Theo.ors.App., vol. l j o . ,No 1. .iilnimr.? Zoo3 7
the Riccati equation based upon the matrix sign function proposed solution is identical to the onc obtained from
[6], it is possible to derive a parallel algorithm [7] that may (3)-(5) when the dynamics arc linear.
be suitable for real-time implementation. Because the control synthesis takes place point-wise, the
designer is now free to select Q and R in ways that may be
7 . 7 Linear quadratic regulator (LORI more directly applicable to the control engineering objec-
tives. These can he made to be functions of the instan-
The LQ optimal regulation problem is expressed as tancous values of the state variables so that
follows: minimise with respect to the control signal, U,
.m
the cost hnction
J =
n
(x'Q(X)x + u'R(i)u)dI (7)
J = l ( x f Q x + u'Ru)dt (1)
subject to the requirements for the solution of the Riccati
equation and the invertahility of R.
subject to the linear time invariant dynamics
Ensuring that A(%),B(%),R ( i ) and Q(T) satisfy these
X=Ax+Bu (2) requirements a priori is difficult in general; however, for
functions whose controllability (observability) matrices are
where x is an n-vector of system states, U is an ni-vector of full rank except at isolated points (e.g. those admitting
control variables, A and B are matrices of appropriate convergent Taylor series expansions), the required proper-
dimension, and the superscript, I, indicates transposition. ties can only be lost at isolated points and may not,
The matrices Q and R are positive semi-definite and therefore, be harmful in practice provided no attempt is
definite, respectively, and are used to penalise particular made to remain in their vicinity. We postulate that, since
states and controls according to the cngineering objective.
many systems can be well approximated in the region of
It is well known, e.g. [8], that the control policy which interest by polynomial functions, loss of stabilisability
solves the above problem is a linear combination of the (detectability) is unlikely to be gcneric.
system states, given by:
u=Kx (3) 2 Solving the matrix Riccati equation
K = -R-' B'P (4)
To calculate the optimal solution, it is necessary to solve
where P is the positive definite solution of the ARE the matrix Riccati equation at each point in time. I n
(algebraic matrix Riccati equation), (9, practice this will be done on a computer and it will be
0 = PA + A'P - PBR-IB'P + Q (5)
necessary to solve the equation at each discretc time step.
The usual approach to the solution of this problem is via an
A unique, positive definite solution to the above exists if eigen-decomposition of the Hamiltonian matrix for thc
the pair (A, B) is stabilisable and (A, r) is detectable, with system [9]. For sizeable dynamics such an approach is
Q = r*r. computationally intcosive and may not be able to deliver
solutions at the required sample rate (i.e. in 'real time'). It
7.2 Nonlinear quadratic regulator INLQR) can also be numerically sensitive, depending as it does on
the numerical solution of an eigen-problem.
The extension of the above to nonlinear systems looks The matrix-sign-function (MSF) is an appealing altcr-
identical [1-3], except that, instead of performing a single native to eigen-decomposition owing to its simplicity [lo],
optimisation and applying the resulting gain matrix for all requiring only the operations of matrix inversion, addition
time, the optimisation has to be carried out at every time and scalar multiplication [6] as follows [7]:
step. Consider a nonlinear dynamical system that can he
expressed in the form:
X = f(x. U) = A(x)x f B(x)u
f(0,O) = 0 (6) where c i = ~det(Z,)I"'" is a scaling used to speed conver-
gence, and n is the dynamical order. H is the Hamiltonian
where the JaCOblaIIS of A(x) and B(x) are subject to some matrix of order 2n, given by
bounded growth conditions (Lipschitz). Then at each point,
i, on the state trajectory, a linear system is defined with
fixed A =A(%)and B = B(T). Provided that the matrices, A
and B, equate to their conventional Taylor serics linearisa-
tion at the origin it can be shown [2] that solving the Assuming H has no eigen-values on the imaginary axis, Z j
infinite-time LQ optimal control problem, point-wise on converges to sign(H) = S, say [6]. For the solution of the
the state trajectory and applying the resultant control, Riccati equation we require the quantity [IO]
results in an asymptotically optimal, stabilising quadratic
control policy for systems described by (6). The behaviour s+= sign+(S)=-(I,,
A 1
2 -
+ S)
of A and B at the origin is needed to ensure that P(x) tends
to the solution for the linearised problem at the origin. Now, by decomposing S+ thus: S+= [V W], we write P,
Hence, in a small enough neighbourhood of the origin the the solution of the Riccati equation, (5), thus:
feedback from thc algorithm is arbitrarily close to the
P = V'W(W'W)f (9)
optimal feedback [2]. The Appendix (Section 7) contains
a brief, heuristic justification of the theory [2]. The convergencc of the algorithm has been investigated in
So, by choosing the U that minimises the usual quadratic [ I O ] and its global convergence is established in [ I I]. Its
cost function at every time point, we have a near-optimal simplicity also suits the MSF algorithm to parallel compu-
control policy for a very wide class of nonlinear systems. tation [6]. In [7] an algorithm for diagonal pivoting
Evidently, A(%),B(i), Q and R are subject, point-wise, to factorisation-a form of Gaussian elimination with partial
the same conditions as for the linear case. It is clear that the pivoting-is used to develop a parallel algorithm involving
8 IEE Pmc.-Coniroi T h w v AppI., Phi. 150. N o I , hanuan. ZOO3
3
no sequential computations. Complexity is of order ( 2 n ) , where .rI =I,x2 =i, x; = 8, x4 = 0 and U = F: Recall that
which dominates the communication burden. The speed-up we require the origin of state-space to he an equilibrium
achieved over the conventional method [9] is demonstrated point, and here the equilibrium of interest is x = 0 . For
on a hypercube architecture. More recently, a Jacobi-like convenience wc define: d(.r;) = m'L2cos2(.r,) - (J +
method has been proposed with a simple function of the mL2)(M+m) and write:
matrix size to predict the number o f iterations needed for
convergence [12]. This, of course, does not guarantee that A(x) =
a solution can be found within any arbitrary sampling rn I 0 01
period.
We do not implement the parallel solution here. The
discussion is included simply to underline that real-time
operation may he possible.

3 Asymptotically optimal quadratic control of a


single inverted pendulum

It is well known from energetic considerations that the -(J t mL2)


motion of the inverted pendulum on a cart constrained to 4x3)
move in a plane (Fig. I ) can be described thus: B(x) =
0
(M
+
+ m)ji = n7~b'sin(8) - mLcos(O)8 + F
(.I m~')ij = mgL sin(8) - m ~ c o s ( ~ ) i (10)
1 mL cos(x;)

or, altematively,
4x3) J
where M and m (kg) denote the mass of the cart and the A(x) =
pendulum, respectively, L (m) is the length of the pendu-
0 1 0 0

I
lum and J (kgm') is its moment of inertia about the
centre of mass. H ( t ) (rad) denotes angular displacement m2L2gsi&) cos(x,) m L sin(x;)x,(J+mL')
0 0
from the vertical and x ( t ) (m) denotes lateral displacement d(xjb3 d(4
from the mid-point of the track. F(t) (N) is the applied 0 0 0 1
force.
By substitution in (10) we obtain the explicit state space 0 0
-mL sin(x3)g(hf +m) m2L2sin(x,)x, cos(.r;)
form: &3)x3 4x3)
(14)
4,=s2
Evidently, the terms sin ( x 3 ) / x 3 in (12) are well behaved at
-(mL.d sin(x,)(J + mL2) - ni2L2gsin(x3) cos(x,) the origin and d(0) is nonzero. It can be easily verified that
x* = +(J + mL')u) in either case, (12) or (14), A(x), B(x) reduce to their
conventional linearisations at the origin. To evaluate the
m2L2cos2(x3) - (J + mL2)(M + m )
stahilisability of A(x), B(x) we simply form the deter-
i3= .iI minant of the controllability matrix, given by
+
-mL(g sin(~r3)(M m ) - niD;: cos(.r,) sin@,)
DJx) = det([B AB A2B A'BI)
i-4 -- - COS(S3)U)
-
- cos*(x; ) sin2(x3) ~ ~ m ~ g ~
m2L2cos2(x;) - ( J + mL2)(M + ni) .<d4(.r3)
(15)

which becomes zero only for x3 = fkn/2, k = 1,. . . . In


Fig. 2 a plot of this determinant (choosing parameters as
per the next section) indicates regions of poor control-
flahility around the zeros of O,(x) (approximately 80" <
101 < 100" and 170" i101 < 18O0). Further, local analyses
of the system in these regions confirms that it is the

35

25 A

5j_
a200 -150 -100 -50 0 50 100 150 200

8, deg
Fig. 2 Detenninanl of conlmllabilil?, matrix as n Jmction of
angular displcrcement
9
unstable mode that becomes uncontrollable in both cases, a 4/5th order, variable step-length Rung-Kutta formula.
leading to a loss of stabilisability and hence a failure to Default parameter settings for the solver were used. The
fulfil the necessary requirements for the solution of the states arc sampled at an interval of 0.1 s, which means
ARE. Trajectories entering these regions can be expected that the solver is forced to evaluate the solution at the
to cause numerical difficulties in the ARE solver, and this demanded output points whenever these do not coincide
will indeed be seen to be the case. Clearly, the detectability with its automatic choices. Whilst, in general, the
requirement can be guaranteed assuming a diagonal Q, by choices of solver typc and parameters may have a consi-
ensuring that all qii > 0, i = I , . . . , 4 . derable effect on the numerical solution of nonlinear
ordinary differential equations, it has becn found that the
dominating practical factor in the following simulations is
4 Simulation results the difficulty of solving the ARE in thc vicinity
of lI= *7r.
For our simulations we use parameter settings correspond-
ing to the inverted pendulum rig that resides in the
Department of Automatic Control & Systems Engineering 4.1 Comparison with linear LOR solution
of the University of Sheffield, namely: M= 1.92 kg, First we look simply at how the NLQR solution behaves in
m=O.I85kg, L=0.508m and with g = 9 . 8 1 msC2. The comparison with the LQR solution, i.e. the nonlinear
pendulum comprises a uniform pole and no 'hob', giving system compensated with a set of linear gains computed
J = mL-'/3. All simulations are carried out in the Simulink at the origin for the same values of Q, R. We choose,
environment of Matlab v6. I. The chosen solver is based on arbitrarily, Q=diag{ 100.0, 0.1, 1.0, 0.1) and R = I to

time. 5 time, s

a b

40 100

20
0

P
U 0
.n-

-20

4 0 -200
.i . . . .. . .
O(0)-40"

2 4 2 4

time, s time, s

C d
reflect that the linear displacement will be a major (0(0)=55") we note substantial differences in the two
constraint on any practical implementation and consider behaviours (Fig. 4). While the nonlinear regulator results
only initial angular displacements, all other initial values in state trajectories are qualitatively similar to those in
being set to zero. After some numerical experimentation Fig. 3-with correspondingly larger magnitudes-the
(using 0.1" increments) we find that the LQR solution is linear compensator demonstrates wilder behaviour and a
unstable for initial conditions IS(0)l >_ 55.2", while the far higher total variation.
NLQR solution remains stable for IQ(0)lS73.8" and Fig. 5 indicates that the behaviour of the control force
96.0" 5 IS(0)l 5 179.9". This indicates a degree of asym- can he similarly summarised. It is interesting to note that
metry in the numerical conditioning of the problem in that behaviour up to the first reversal in the states a n 4 to a
it is possible to approach the singularity at -+180" much certain extent, the force, is approximately equal for both
more closely than it is possible to approach *90". approaches.
As might he expected, for small excursions the NLQR Bevond the 'linear stahilitv limit' we can onlv revort the
and LQR solutions are approximately equal. Fig. 3 illus- heha;iour of the NLQR s o h i o n . As noted earlier, while
trates this for an initial angular displacement of 40'. This is only isolated points cause a failure to meet the required
the value at which the two solutions first begin to deviate. stabilisahility conditions, numerical sensitivity excludes
Notice that the nonlinear controller generates less marked larger regions from effective control. Fig. 6 shows the
excursions, in line with what is expected. state evolution for two large initial angles, O(0) = 7O", 1 IO"
Moving now into the vicinity of the limit at which the that straddle the difficult region. In the first instance, the
current LQR design is able to stabilise the system physical situation corresponds more closely with the


<

... ..
........

-L
2 4 2 4

time, s time, s
a b

100,

-1 00
0 2 4 0 2 4
lime. s time, s
C d
Fig. 4 Comparisoa of srnre evolution jw .system under nonlineur (NLQR) and linear (LQR) conhol fur an inirial angle of 55"

/€E Pmc.~ConlrulTheory Appl., Yo/. 150. No. 1. Januan. ZOO3 11


displacement rather than the previous oscillatory approach
to zero. Whilc the state trajectories appear reasonably
smooth in these cases, the required control force is
highly nonsmooth. This arises from the need to pass

_:I
through regions of low .stabilisahility a n d indeed, to
cross the singularity at B = 90". This causes an impulsive
force to be generated and a reversal in sign. Evidently,
system inertia serves to smooth this effect but, nonetheless,
it can be seen as a serious limitation. The fact that the
system was able to recover reinforces our earlier assertion
=
U- 20 that it may still be practically possible to pass through
regions of low stahilisability provided that it is done
quickly enough. Fig. 7 shows the control forces for both
cases.

I \ /? 4.2 Nonlinearity in the control-saturation


V Realistically there will be hard constraints on the control
. .. . .. . . force that can be achieved and this is usually modelled
0(0)=40" as a piecewise linear, saturation nonlinearity. From (6) it
-)OI is clear that control nonlinearities are not permitted,
-401 and to overcome this we solve a closely related
0 2 4
problem that introduces integral action through state
time, s augmentation. We outline this for the case of scalar
inputs only. Let the control input be given by
a U = b(x,,+,) then, defining &+, = v, we can re-write the

lool
model in (6) thus:
lsoi

(16)
and find the optimal policy that minimises

where we have used the subscript a to indicate the


augmented system. The state weighting matrix now
becomes

Q' [ O
@*(x,+,)R I
... .. and is itself state-dependcnt. Subject to the point-wise
..
p F
.. . . .. . .
0(0)= 5 5 O
detectability and stabilisability conditions being satisfied,
a problem that is close to the original (assuming R, is
small) can be solved.
-1 00 I
0 2 4
Returning now to the example, and modelling the satu-
ration as a smooth function,
time, s x > i

Fig. 5 Comparison of control force for system under nonlinear


(NLQR) and linear (LQR) control for an initial angles of 40" we apply a saturation limit of 40 N, choose R, = 0.001 and
and 55" re-address the situation of Fig. 3.
Using this choice of saturation function, the determinant
previous configurations, i.e. the pendulum starts above of the new controllability matrix now becomes a function
horizontal. The nonlinear regulator attempts, therefore, to of both x3 and xj; thus,
lift the rod to the vertical, demanding a large initial control
force to do this. The state and control force evolution is
qualitatively similar to that of the earlier examples but is
more vigorous still. For the initial angle that leaves the rod
hanging helow the horizontal we see qualitatively different
behaviour.
The early linear motion is in the opposite direction, where D=(x) is the determinant of the controllability
leading to a more nearly monotonic reduction in angular matrix for the original system (15). Evidently, the
12 IEE Prm-Conirol Theov Appl.. 61. 150. No. 1. . / m u u v 2003
time, s time, s
a b

200 500

100
.... -

. . .. . . .

-01- -500
0 2 4 2 4

time, s time, s
C d

Fig. 6 Conprison ofsiure evolution under nonlineur cvnrmljor l a q e inirial angles

stabilisability of the augmented system behaves as the tuning of the parameters R and R, but this is not
original with respect to angular displacement hut will considered further.
additionally be compromised for large values of xs. This We note that beyond 10ol =47" the computation of the
is demonstrated in Fig. 8. ARE solution becomes ill-conditioned. This occurs
We simulate for the maximum feasible initial because the point in (x3. xs)-space enters the region of
angle 0(0)=47" and also show the unconstrained NLQR poor controllability, and saturation prevents its recovery in
solution so that the difference in demanded force and any reasonable time. This situation is exacerbated in the
the effect on the displacements can easily be examined case of initial conditions drawn from the controllable
(Fig. 9). regions that do not include the origin. In these cases no
It is clear from Fig. 9 that the demanded force (top return to upright is possible owing to very large early
right) now never exceeds what can be delivered by the requirements on .r5.
hardware, while the unconstrained solution demands
more than twice the allowable force-the trade-off
being larger excursions and greatcr total variation in the 5 Conclusions
angular and linear displacements. We do not show
the corresponding velocities for reasons of space, and A new method for the design and synthesis of near-
instead the input to the integrator-the new control signal optimal, nonlinear control laws is examined, based on a
1,-is shown (bottom right). It is possible to achieve generalisation of LQ optimal control theory. The method is
a closer match to the original problem by judicious simple to apply and affords greater design flexibility via
IEE Pmc.-Cmlml Theory Appl., 1'01. 150, h'o. I , .lanuoqa ZOO3 13
contours of 1Ndet MA
3001

100
-1 o
-1 2

-14

-1 6

-1 8

8. deg
1- 8(0)=70" 1 Fig. 8 Contorm of In(det(MJ) .for the augmented .sv.ytem

time, s
that would permit a parallel solution are discussed.
a Notwithstanding the technical difficulties of real-time
implementation, it is safe to say that the essential problem
5007 of real-time, nonlinear optimal control, namely the acausal
nature of the solution, can be overcome by this approach,
and a way has been opened towards real-time optimal
quadratic regulation.
A common benchmark problem-stabilisation of a
simple inverted pendulum-is used to illustrate the
method. The application throws up some interesting
points and proves an exacting task. In particular, the
requirement for local stabilisability breaks down at a
number of points in state space, points that are neces-
sarily visited for large initial angular displacements.
While these points are isolated they induce neighbour-
hoods of poor stabilisability that cause numerical ill-
conditioning in the ARE solver. Nonetheless, the NLQR
algorithm manages to recover in these situations, albeit
with an impractically large force requirement. Clearly it
would be important to ensure that such problems are
avoided in any real application and so one should think
of thc method as widening the range of operation rather
than as providing a global solution. A method to over-
come nonlinearity in the control signal is presented and
demonstrated for a saturation nonlinearity in the target
problem. This exploits thc capacity for state-dependent
time, s weighting functions in the cost functional. It is seen
from this how such a constraint affects the regions of
b stabilisability, in particular, how the new problem
becomes practically unsolvable for large initial angles.
Fig. 7 Control forces generated f o r simslations of Fig. 6
A further use of state-dependent weighting might be to
Note that for initial angles greater than 90" large impulsive forces are
required at times. These arc associated with regions on the state
discourage large lateral excursions of the cart in order
trajectory corresponding to poor stabilisability and generate very that any limits on track length may be accommodated.
sharp reversals in linear velocity This is not considcred here but will be necessary for any
practical implementation.
Finally, the question of state estimation arises as
an adjunct to the present work. The extended
Kalman filter is a natural candidate here but work has
state-dependent weighting than do conventional been carried out that exploits the duality of regulation
approaches (including receding horizon control). The and estimation problems [I31 and which provides a
resulting regulator can, in principle, be implemented in nonlinear dual estimator that fits the present framework.
real time owing to the causal nature of the required Future work will incorporate both regulator and estima-
computations. However, the need to solve an ARE every tor in the control of the physical device referred
time step is burdensome, and a number of algorithms to above.
14 IEE Proc.-Cmrroi Theon. Appt.. Yo/. 150. No. I . .hnuuw 2003
3 100
NLQRI
. ... . . .
8(0)=47'
2
50

E z
< ' U-

C
0

-1 -5c
2 4 2 4

time, s time, s
a b

50

\
2000

P > o
nm- o

-2000

-50
2 4 2 4
time, s time, s
C d

Fig. 9 Conrparison of sfore and conrrol evolrrrior, trnder nonlinear. (NLQR) and nonlinear intrgrul (N1,QRU conrrol for an inifial
mgle "f47'

I 0 ROBERTS, J.D.: 'Linear model reduction and solution ofthe algebrdic


6 References Kiccati cquvtion by use of thc sign function', Int J Conrml, 1980, 32,
pp. 677-687
I BANKS, S.P., and MHANA, K.J.: optimal control and stabili7ation for II BALZER, L.A.: 'Accelerated convergence of the matfix sign function
non-linear systems', I,IU J M d h . Conlml It$, 1992. 9, pp. 179-196 method of solving Lyapunov, Riccati and other matfix cquations'. Inf J
2 MCCAFFREY. D.. and BANKS. S.P.: 'Lagrangian manifolds and Cmtrol, 1980.32, pp. 1057-1078
asymptotically optimal stabilizing feedback control'. S p l . Conlrnl 12 BUNSE-GERSTNER. A.. and FADBENDER, H.: 'A Jacobi-like method
Lei!., 2001, 43. pp. 219-224 for solving algebraic Riccati equations on pamllcl computers', IEEE
3 MRACEK. C.P.. and CLOUTIER, J.R.: 'Control designs for the non- Puns. Aiiiorri. Conirol, 1997.42, pp. 1071-1084
linear bench-mark Droblem via the state-dcoendent R i m t i cquation 13 MCCAFFREY. D., HARRISON. R.F., and BANKS, S.P.: 'Asympto-
tically optimal nonlinear fillenng'. Research Rcputi 734, The Unwenity
ofSheReld 1998
14 VAN DER SCHAFT, A.J.: 'On a state space appmach to H, nowlinear
shop on Inteliigenl & m m n e ~ & for vehicles. ICv98, Srvilla, %in, control', SJ.~I.Control Leri., 1991, 16, pp. 1-8
1998. pp. 4 1 7 4 2 2
5 HARRISON. R.F., and BANKS, S.P.: 'Towards real-time, non-linear
quadratic optimal regulation'. Proceedings of the IFAC Workshop on
Algorithms and architectures for real-time control. AARTC'98, Cancun, 7 Appendix
M&o. I Y Y X , pp. 107-1 I?
6 GARDINER. J.D.. and LAUB, A.J.: 'A genenliiation of the matrix- A brief outline of thc NLQR derivation [Z] follows.
siun-funcrion for alnehraic Riccati eaualions'. h l . J. Cmrrol. 1986. 44.
Consider the following infinite-time, optimal control
problem:

methods, McCraw-Hill Bo& Co.. NcGYork. 1987)


9 LAUB. A.J.: 'A Schur method for solving algebraic Riccati equations'. - J ( x o )= inf i r x ' Q ( x ) x + u'R(x)udr (21)
"CL2(0.N)2 0
IEEE Ponr. A i m n t . Conlml, 1979. 24, pp. 913-921

IEE P ~ ~ ~ ~ Theon. I ~isa


. ~ c ~ ~p p~l . .%I. ~ NO
I I i. JOWV 2003 IS
+
subject to the system dynamics x = f(x) B(x)u; x E 88"; Under the 'linearisation' f(x) = A(x)x, V,f(x) = A(x) and
U E R": f(0) = 0 . Thc functions f, B, Q > 0 Vx, and R z 0 sincc V:J(X)I,=~=O [14], we can write V:J(x)=P(x)x.
Vx are analytic and of appropriate dimension. The cost Substituting into (23) and noting that 2x'P(x)A(x)x =
functional, J(x), is a solution of the Hamilton-Jacobi- +
x'(P(x)A(x) A'(x)P(x))x, we obtain
Bellman equation
x'(-P'(x)A(x) - A'(x)P(x) + P'(x)B(x)R-'(x)B(x)P
4
max -VxJ(x)(f(x)
2
I
+ B(x)u) - -x'Q(x)x - -u'R(x)u
2I 1
=0

(22)
- Q(x))x = 0

The approximation to the solution ignores the fact that


(24)

where v . ~ [a/a.\-, ajax,. . . ajax,,]. if (v,f(x)l.,o,


A B(o), P(x)x should equal the gradient of some function and,
Q1" (0)) is stabilisable and detectable, a smooth solution, instead, assumes that P(x) is symmetrical. Then, at each x,
J(x), of (22), is known to exist [ 141. Let f(x) = A(x)x such the algorithm simply finds the positive semi-definite solu-
that lim,, oA(x) + V,f(x)lx=o; then, in the region where tion to the state-dependent ARE,
J(x) is a smooth solution to (22). the maximum is attained
when U = -R-'(x)B'(x)P:J(x) and J(x) is the solution of P(x)A(x) + A'(x)P(x) - P(x)B(x)R-'(x)B(x)P + Q(x) = 0
I (25)
- VxJ(x)f(x) t -VXJ(x)B(~)R-'(x)Bi(x)V:J(x)
2
1 and applies, at that x, the control U = -R-'(x)B'(x)P(x)x.
- - x'Q(x)x = 0 (23) It should be noted that, while smoothness in the solution
2 is used in the derivation, it is not a requirement for
by substitution application.

16

You might also like