of an inverted pendulum
R.F. Harrison
Abstract: A method for the design and synthesis of near-optimal, nonlinear control laws is
examined, based on a generalisation of linear quadratic optimal control theory and which
effectively provides a near-optimal gain schedule. The method is simple to apply and affords
greater design flexibility (via state-dependent weighting) than conventional approaches. The
resulting regulator can, in principle, be implemented in real time owing to the causal nature of
the required computations. However, the need to solve an algebraic Riccati equation at every time
point is burdensome, and a number of algorithms that would permit parallel computation are
discussed. The problem of stabilising an inverted pendulum is used to illustrate the method and
proves an exacting task that highlights a number of issues.
where M and m (kg) denote the mass of the cart and the pendulum, respectively, L (m) is the length of the pendulum and J (kg m^2) is its moment of inertia about the centre of mass. θ(t) (rad) denotes angular displacement from the vertical and x(t) (m) denotes lateral displacement from the mid-point of the track. F(t) (N) is the applied force.

By substitution in (10) we obtain the explicit state space form:

\[
\begin{aligned}
\dot{x}_1 &= x_2 \\
\dot{x}_2 &= \frac{-\left(mL\,x_4^2 \sin(x_3)(J + mL^2) - m^2L^2 g \sin(x_3)\cos(x_3) + (J + mL^2)u\right)}{m^2L^2\cos^2(x_3) - (J + mL^2)(M + m)} \\
\dot{x}_3 &= x_4 \\
\dot{x}_4 &= \frac{-mL\left(g \sin(x_3)(M + m) - mL\,x_4^2 \cos(x_3)\sin(x_3) - \cos(x_3)u\right)}{m^2L^2\cos^2(x_3) - (J + mL^2)(M + m)}
\end{aligned}
\]

[...]

or, alternatively,

\[
A(x) =
\begin{bmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & \dfrac{m^2L^2 g \sin(x_3)\cos(x_3)}{d(x_3)\,x_3} & \dfrac{-mL \sin(x_3)\,x_4\,(J + mL^2)}{d(x_3)} \\
0 & 0 & 0 & 1 \\
0 & 0 & \dfrac{-mL \sin(x_3)\,g\,(M + m)}{d(x_3)\,x_3} & \dfrac{m^2L^2 \sin(x_3)\,x_4 \cos(x_3)}{d(x_3)}
\end{bmatrix}
\tag{14}
\]

Evidently, the terms sin(x_3)/x_3 in (12) are well behaved at the origin and d(0) is nonzero. It can be easily verified that in either case, (12) or (14), A(x), B(x) reduce to their conventional linearisations at the origin. To evaluate the stabilisability of A(x), B(x) we simply form the determinant of the controllability matrix, given by

\[
D_c(x) = \det\left([B \;\; AB \;\; A^2B \;\; A^3B]\right) = \frac{\cos^2(x_3)\,\sin^2(x_3)\,m^4L^4g^2}{x_3^2\,d^4(x_3)}
\tag{15}
\]
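As a rough numerical check of (15), the sketch below builds A(x), B(x) in the form of (14) for the rig parameters quoted in Section 4, forms the controllability matrix and evaluates its determinant. It is written in Python with NumPy rather than the Matlab environment used for the simulations. The function names, the choice d(x3) = m^2 L^2 cos^2(x3) - (J + mL^2)(M + m) (the common denominator of the state equations), the input vector B(x) read off from the state equations, and the restriction to x4 = 0 are all assumptions made for illustration.

```python
import numpy as np

# Rig parameters quoted in the simulation section.
M, m, L, g = 1.92, 0.185, 0.508, 9.81
J = m * L**2 / 3.0  # uniform pole, no bob


def d(x3):
    """Assumed d(x3): common denominator of the state equations."""
    return m**2 * L**2 * np.cos(x3)**2 - (J + m * L**2) * (M + m)


def sdc_matrices(x3, x4=0.0):
    """State-dependent A(x), B(x) in the form of (14).

    np.sinc(x3 / pi) equals sin(x3)/x3 and is well defined at x3 = 0.
    """
    s_over_x = np.sinc(x3 / np.pi)
    s, c, den = np.sin(x3), np.cos(x3), d(x3)
    A = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, m**2 * L**2 * g * s_over_x * c / den,
         -m * L * s * x4 * (J + m * L**2) / den],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, -m * L * g * (M + m) * s_over_x / den,
         m**2 * L**2 * s * x4 * c / den],
    ])
    B = np.array([[0.0], [-(J + m * L**2) / den], [0.0], [m * L * c / den]])
    return A, B


def ctrb_det(x3, x4=0.0):
    """Determinant of the controllability matrix [B AB A^2B A^3B]."""
    A, B = sdc_matrices(x3, x4)
    cols = [B]
    for _ in range(3):
        cols.append(A @ cols[-1])
    return np.linalg.det(np.hstack(cols))


def closed_form(x3):
    """Closed-form determinant of (15), valid here for x4 = 0."""
    return (np.cos(x3) * np.sinc(x3 / np.pi) * m**2 * L**2 * g)**2 / d(x3)**4


for deg in (0.0, 40.0, 90.0, 180.0):
    x3 = np.radians(deg)
    print(f"theta = {deg:6.1f} deg  numerical = {ctrb_det(x3):.6e}  "
          f"closed form = {closed_form(x3):.6e}")
```

Under these assumptions the determinant is positive at the origin and falls to zero (to within rounding error) at θ = ±90° and ±180°, which are the angles at which stabilisability is lost.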
Fig. 2 Determinant of controllability matrix as a function of angular displacement (horizontal axis: θ, deg)
unstable mode that becomes uncontrollable in both cases, leading to a loss of stabilisability and hence a failure to fulfil the necessary requirements for the solution of the ARE. Trajectories entering these regions can be expected to cause numerical difficulties in the ARE solver, and this will indeed be seen to be the case. Clearly, the detectability requirement can be guaranteed, assuming a diagonal Q, by ensuring that all q_ii > 0, i = 1, ..., 4.

4 Simulation results

For our simulations we use parameter settings corresponding to the inverted pendulum rig that resides in the Department of Automatic Control & Systems Engineering of the University of Sheffield, namely: M = 1.92 kg, m = 0.185 kg, L = 0.508 m and with g = 9.81 m s^-2. The pendulum comprises a uniform pole and no 'bob', giving J = mL^2/3. All simulations are carried out in the Simulink environment of Matlab v6.1. The chosen solver is based on a 4/5th order, variable step-length Runge-Kutta formula. Default parameter settings for the solver were used. The states are sampled at an interval of 0.1 s, which means that the solver is forced to evaluate the solution at the demanded output points whenever these do not coincide with its automatic choices. Whilst, in general, the choices of solver type and parameters may have a considerable effect on the numerical solution of nonlinear ordinary differential equations, it has been found that the dominating practical factor in the following simulations is the difficulty of solving the ARE in the vicinity of θ = ±π.

4.1 Comparison with linear LQR solution

First we look simply at how the NLQR solution behaves in comparison with the LQR solution, i.e. the nonlinear system compensated with a set of linear gains computed at the origin for the same values of Q, R. We choose, arbitrarily, Q = diag{100.0, 0.1, 1.0, 0.1} and R = 1 to
[Figure: panels a-d plotted against time, s; legend θ(0) = 40°]
reflect that the linear displacement will be a major constraint on any practical implementation and consider only initial angular displacements, all other initial values being set to zero. After some numerical experimentation (using 0.1° increments) we find that the LQR solution is unstable for initial conditions |θ(0)| ≥ 55.2°, while the NLQR solution remains stable for |θ(0)| ≤ 73.8° and 96.0° ≤ |θ(0)| ≤ 179.9°. This indicates a degree of asymmetry in the numerical conditioning of the problem in that it is possible to approach the singularity at ±180° much more closely than it is possible to approach ±90°.

As might be expected, for small excursions the NLQR and LQR solutions are approximately equal. Fig. 3 illustrates this for an initial angular displacement of 40°. This is the value at which the two solutions first begin to deviate. Notice that the nonlinear controller generates less marked excursions, in line with what is expected.

Moving now into the vicinity of the limit at which the current LQR design is able to stabilise the system (θ(0) = 55°) we note substantial differences in the two behaviours (Fig. 4). While the nonlinear regulator results in state trajectories that are qualitatively similar to those in Fig. 3, with correspondingly larger magnitudes, the linear compensator demonstrates wilder behaviour and a far higher total variation.

Fig. 5 indicates that the behaviour of the control force can be similarly summarised. It is interesting to note that behaviour up to the first reversal in the states and, to a certain extent, the force, is approximately equal for both approaches.

Beyond the 'linear stability limit' we can only report the behaviour of the NLQR solution. As noted earlier, while only isolated points cause a failure to meet the required stabilisability conditions, numerical sensitivity excludes larger regions from effective control. Fig. 6 shows the state evolution for two large initial angles, θ(0) = 70°, 110°, that straddle the difficult region. In the first instance, the physical situation corresponds more closely with the
Fig. 4 Comparison of state evolution for system under nonlinear (NLQR) and linear (LQR) control for an initial angle of 55°
through regions of low stabilisability and, indeed, to cross the singularity at θ = 90°. This causes an impulsive force to be generated and a reversal in sign. Evidently, system inertia serves to smooth this effect but, nonetheless, it can be seen as a serious limitation. The fact that the system was able to recover reinforces our earlier assertion that it may still be practically possible to pass through regions of low stabilisability provided that it is done quickly enough. Fig. 7 shows the control forces for both cases.
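The growth in control effort near θ = 90° can be illustrated by solving the ARE pointwise and inspecting the resulting gain. The sketch below is an illustration under stated assumptions, not the implementation used for the simulations: it is written in Python with NumPy, it uses the rig parameters and the weights Q = diag{100.0, 0.1, 1.0, 0.1}, R = 1 quoted in Section 4, it evaluates A(x), B(x) at zero velocities, and it solves the ARE by the textbook Hamiltonian eigenvector method as a stand-in solver. The names `sdc_matrices`, `solve_are` and `nlqr_gain` are hypothetical.

```python
import numpy as np

# Rig parameters and weights quoted in Section 4.
M, m, L, g = 1.92, 0.185, 0.508, 9.81
J = m * L**2 / 3.0
Q = np.diag([100.0, 0.1, 1.0, 0.1])
R = np.array([[1.0]])


def sdc_matrices(x3):
    """A(x), B(x) at zero velocities; np.sinc(x3/pi) equals sin(x3)/x3."""
    c = np.cos(x3)
    # Assumed d(x3): common denominator of the state equations.
    den = m**2 * L**2 * c**2 - (J + m * L**2) * (M + m)
    s_over_x = np.sinc(x3 / np.pi)
    A = np.zeros((4, 4))
    A[0, 1] = A[2, 3] = 1.0
    A[1, 2] = m**2 * L**2 * g * s_over_x * c / den
    A[3, 2] = -m * L * g * (M + m) * s_over_x / den
    B = np.array([[0.0], [-(J + m * L**2) / den], [0.0], [m * L * c / den]])
    return A, B


def solve_are(A, B, Q, R):
    """Stabilising ARE solution from the stable subspace of the Hamiltonian."""
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.solve(R, B.T)], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    stable = np.argsort(w.real)[:n]      # the n eigenvalues in the left half-plane
    X1, X2 = V[:n, stable], V[n:, stable]
    P = np.real(X2 @ np.linalg.inv(X1))
    return 0.5 * (P + P.T)               # remove rounding asymmetry


def nlqr_gain(x3):
    """Pointwise gain K(x) so that u = -K(x) x."""
    A, B = sdc_matrices(x3)
    P = solve_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)


for deg in (0.0, 40.0, 70.0, 88.0):
    K = nlqr_gain(np.radians(deg))
    print(f"theta = {deg:5.1f} deg  ||K|| = {np.linalg.norm(K):.3f}")
```

Under these assumptions the gain norm at 88° is considerably larger than at the origin, which is in keeping with the large forces demanded when a trajectory passes close to the singularity. At the origin the gain coincides with the linear LQR gain, since A(x), B(x) reduce to the conventional linearisation there.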
model in (6) thus:

[...] (16)

and find the optimal policy that minimises

[...]

[...] and is itself state-dependent. Subject to the point-wise detectability and stabilisability conditions being satisfied, a problem that is close to the original (assuming R, is small) can be solved.

Returning now to the example, and modelling the saturation as a smooth function,

[...]

[Figure panels plotted against time, s; legend θ(0) = 55°]
stabilisability of the augmented system behaves as the original with respect to angular displacement but will additionally be compromised for large values of x5. This is demonstrated in Fig. 8.

Fig. 8 Contours of ln(det(M_c)) for the augmented system (horizontal axis: θ, deg)

We simulate for the maximum feasible initial angle θ(0) = 47° and also show the unconstrained NLQR solution so that the difference in demanded force and the effect on the displacements can easily be examined (Fig. 9).

It is clear from Fig. 9 that the demanded force (top right) now never exceeds what can be delivered by the hardware, while the unconstrained solution demands more than twice the allowable force, the trade-off being larger excursions and greater total variation in the angular and linear displacements. We do not show the corresponding velocities for reasons of space, and instead the input to the integrator (the new control signal) is shown (bottom right). It is possible to achieve a closer match to the original problem by judicious tuning of the parameters R and R, but this is not considered further.

We note that beyond |θ0| = 47° the computation of the ARE solution becomes ill-conditioned. This occurs because the point in (x3, x5)-space enters the region of poor controllability, and saturation prevents its recovery in any reasonable time. This situation is exacerbated in the case of initial conditions drawn from the controllable regions that do not include the origin. In these cases no return to upright is possible owing to very large early requirements on x5.

5 Conclusions

A new method for the design and synthesis of near-optimal, nonlinear control laws is examined, based on a generalisation of LQ optimal control theory. The method is simple to apply and affords greater design flexibility via state-dependent weighting than do conventional approaches (including receding horizon control). The resulting regulator can, in principle, be implemented in real time owing to the causal nature of the required computations. However, the need to solve an ARE every time step is burdensome, and a number of algorithms that would permit a parallel solution are discussed. Notwithstanding the technical difficulties of real-time implementation, it is safe to say that the essential problem of real-time, nonlinear optimal control, namely the acausal nature of the solution, can be overcome by this approach, and a way has been opened towards real-time optimal quadratic regulation.

A common benchmark problem, stabilisation of a simple inverted pendulum, is used to illustrate the method. The application throws up some interesting points and proves an exacting task. In particular, the requirement for local stabilisability breaks down at a number of points in state space, points that are necessarily visited for large initial angular displacements. While these points are isolated they induce neighbourhoods of poor stabilisability that cause numerical ill-conditioning in the ARE solver. Nonetheless, the NLQR algorithm manages to recover in these situations, albeit with an impractically large force requirement. Clearly it would be important to ensure that such problems are avoided in any real application and so one should think of the method as widening the range of operation rather than as providing a global solution. A method to overcome nonlinearity in the control signal is presented and demonstrated for a saturation nonlinearity in the target problem. This exploits the capacity for state-dependent weighting functions in the cost functional. It is seen from this how such a constraint affects the regions of stabilisability, in particular, how the new problem becomes practically unsolvable for large initial angles.

A further use of state-dependent weighting might be to discourage large lateral excursions of the cart in order that any limits on track length may be accommodated. This is not considered here but will be necessary for any practical implementation.

Finally, the question of state estimation arises as an adjunct to the present work. The extended Kalman filter is a natural candidate here but work has been carried out that exploits the duality of regulation and estimation problems [13] and which provides a nonlinear dual estimator that fits the present framework. Future work will incorporate both regulator and estimator in the control of the physical device referred to above.

Fig. 7 Control forces generated for simulations of Fig. 6
Note that for initial angles greater than 90° large impulsive forces are required at times. These are associated with regions on the state trajectory corresponding to poor stabilisability and generate very sharp reversals in linear velocity

IEE Proc.-Control Theory Appl., Vol. 150, No. 1, January 2003
Fig. 9 Comparison of state and control evolution under nonlinear (NLQR) and nonlinear integral (NLQRI) control for an initial angle of 47°
(22)
- Q(x))x = 0