
review articles

DOI:10.1145/2743132

Robots move to act. While actions operate in a physical space, motions begin in a motor control space. So how do robots express actions in terms of motions?

BY JEAN-PAUL LAUMOND, NICOLAS MANSARD, AND JEAN BERNARD LASSERRE

Optimization as Motion Selection Principle in Robot Action

key insights

For robots and living beings, the link between actions expressed in the physical space and motions originated in the motor space, turns to geometry in general and, in particular, to linear algebra. In life science the application of optimality principles in sensorimotor control unravels empirical observations. The idea to express robot actions as motions to be optimized has been developed in robotics since the 1970s.

Among all possible motions performing a given action, optimization algorithms tend to choose the best motion according to a given performance criterion. More than that, they also allow the realization of secondary actions.

Optimal motions are action signatures. How to reveal what optimality criterion underlies a given action? The question opens challenging issues to inverse optimal control.

ILLUSTRATION BY PETER CROWTHER ASSOCIATES

MOVEMENT IS A fundamental characteristic of living systems (see Figure 1). Plants and animals must move to survive. Animals are distinguished from plants in that they have to explore the world to feed. The carnivorous plant remains at a fixed position to catch the imprudent insect. Plants must make use of self-centered motions. At the same time the cheetah goes out looking for food. Feeding is a paragon of action. Any action in the physical world requires self-centered movements, exploration movements, or a combination of both. By analogy, a manipulator robot makes use of self-centered motions, a mobile robot moves to explore the world, and a humanoid robot combines both types of motions.

Actions take place in the physical space. Motions originate in the motor control space. Robots, as any living system, access the physical space only indirectly through sensors and motors. Robot motion planning and control explore the relationship between physical, sensory, and motor spaces; the three spaces that are the foundations of geometry.32 How to translate actions expressed in the physical space into a motion expressed in motor coordinates? This is the fundamental robotics issue of inversion.

In life sciences, it is recognized that optimality principles in sensorimotor control explain quite well empirical observations, or at least better than other principles.40 The idea of expressing robot actions as motions to be optimized was first developed in robotics in the 1970s with the seminal work by Whitney.41 It is now well developed in classical robot control,37 and also along new paradigms jointly developed in multidisciplinary approaches.35 Motion optimization appears to be a natural principle for action selection. However, as we explained in the companion article,24 optimality equations are intractable most of the time and numerical optimization is notoriously slow in practice. The article aims

64 COMMUNICATIONS OF THE ACM | MAY 2015 | VOL. 58 | NO. 5



to provide a short overview of recent progress in the area. We first show how robot motion optimization techniques should be viewed as motion selection principles for redundant robots. In that perspective, we review results and challenges stimulated by recent applications to humanoid robotics. The remainder of the article is devoted to inverse optimal control as a means to better understand natural phenomena and to translate them into engineering. The question opens highly challenging problems. In that context, methods based on recent polynomial optimization techniques appear complementary to classical machine learning approaches.

Figure 1. Stones and hammers do not move by themselves.
Movement is a prerogative of living (and robot) systems. Plants (and manipulator robots) move to bring the world to them via self-centered movements. Animals (and mobile robots) navigate to explore the world. Human (and humanoid) actions are built from both types of movement.

Power and Limits of Linearization
Translating actions in terms of motions expressed in the robot control space has been expressed in many ways, from the operational space formulation19 to the task function approach,34 to cite a few. The notion of task encompasses the notion of action expressed in the physical space. The task space may be the physical space (like for putting a manipulator end effector to some position defined in a world frame) or a sensory space (like tracking an object in a robot camera frame). The role of the so-called task function is to make the link between the task space and the control space.

Due to the underlying highly nonlinear transformations, the inversion problem is very costly to solve (minutes or hours of computation for seconds of motion). To meet the time constraints imposed by the control frequency of the robots, the problem is addressed only locally by considering the tangent spaces of both the task space and the configuration space. Such a linearization involves the Jacobian matrix31 and resorts to all the machinery of linear algebra. The linearization is particularly interesting as the tangent space of the configuration space gathers the configuration velocities that usually contain the robot control inputs. Dynamic extensions of this principle allow considering torque-based controls.19

The Jacobian matrix varies with the robot configuration, making the search for a trajectory nonlinear. However, for a given configuration, it defines a linear problem linking the unknown system velocity to the velocity in the task space given as references. From a numerical point of view, this problem is linear and can easily be solved at each instant to obtain the system velocity. The integration of this velocity from the initial configuration over a time interval draws a trajectory tending to fulfill the task. The velocity can similarly be applied in real time by the robot to control it toward the goal. The linear problem is re-initialized at each new configuration updated with the sensor measurements and the process is iterated. This iterative principle corresponds to the iterative descent algorithms (like the gradient descent or the Newton-Raphson descent), which are used to numerically compute the zero value of a given function. However, the method gives more: the sequence of descent iterations, assuming small descent steps, is a discretization of the real trajectory from the initial configuration to the goal. The drawback of the instantaneous linearization is that it provides no look-ahead capabilities to the control, which might lead the robot to a local minimum, typically when approaching non-convex obstacles. This is the well-known curse of linearization.

Motion Selection
The dimension of the task space can be equal to, greater than, or lower than the dimension of the control space. For the sake of simplification, let us consider the task space as a manifold that expresses the position of the end effector of a fully actuated manipulator. When the dimensions of both the task space and the configuration space are equal, each point in the task space defines a single configuration(a) and the task function can be used to drive the robot to a unique configuration. There is no problem of motion selection. The Jacobian matrix is square invertible and solving the linear problem is easy. The task function approach was initially proposed in this context to define admissibility properties to connect two points of the configuration space while avoiding singularities.34

(a) The non-linearities in the task function can generate a discrete set of configurations accomplishing the task, corresponding to several options. In such cases, the discussion holds, but only locally.

Optimization is used as a motion selection principle in the other cases. The choice of the optimization criterion determines the way to invert the Jacobian matrix, as we will explain.

When the task space has a larger dimension than the configuration space, it is not always possible to find a configuration satisfying the task target: the task function is not onto, that is, the Jacobian matrix has more rows than columns. It is then not possible to find a velocity in the configuration tangent space that corresponds to the velocity in the task tangent space. For instance, this is the case in visual servoing when many points should be tracked in a camera frame.4 This is also the case in simultaneous localization and mapping when optimizing the positions of the camera with respect


to the landmarks.13 Optimization is used to find a velocity that minimizes the error in the task tangent space. The problem is then to minimize the distance to the reference task vector. Generally, the reference task vector cannot be reached. In the special case when the reference belongs to the image space of the Jacobian, the residual of the optimization is null. This is the case in visual servoing, when the target image has been acquired from a real scene with no noise.

On the contrary, if the dimension of the task space is smaller than the dimension of the configuration space, several configurations correspond to a single task. The task function is not one-to-one; that is, the Jacobian matrix has more columns than rows. For instance, in the case of a 30-joint humanoid robot picking a ball with its hand, the dimension of the task space is three while the dimension of the configuration space is 30. Several motions may fulfill the task. The system is said to be redundant with respect to the task. Optimization is then used as a criterion to select one motion among all the admissible ones. In that case, several vectors in the configuration tangent space produce the same effect in the task space. Equivalently, some velocities produce no effect in the task space. This subset of the configuration tangent space is the kernel of the Jacobian matrix and is called the null space of the task. Any velocity in the null space leaves the task unchanged. Adding up a given configuration tangent vector satisfying the task with the null space gives the set of all velocities satisfying the task. The minimization problem consists in selecting one sample in this set, according to some criteria, for example, the least-norm velocity.

In general, the task function may neither be onto nor one-to-one; that is, the Jacobian matrix is neither full row rank (its rows are not linearly independent) nor full column rank (its columns are not linearly independent). In general, no vector in the configuration tangent space satisfies the task (since the transformation is not onto), and there are infinitely many vectors that minimize the distance to the task vector in the task tangent space (since the transformation is not one-to-one). Therefore, the selection problem becomes a double minimization problem: we simultaneously minimize the distance to the task and the norm of the configuration velocity. A solution for this double minimization problem is given by the Moore-Penrose pseudo-inverse,2 also called the least-square inverse. Notice that other minimization criteria may be considered in the same framework by changing the metrics in the tangent spaces. For instance, we can use weighted pseudo-inverses in which the components of the system input (columns of the Jacobian matrix) and the task residual (rows of the Jacobian) do not receive the same weight. As before, the sum of the optimal vector with the null space gives the set of all solutions that are optimal for the first problem (smallest distance to the task) but only suboptimal for the second one (smallest velocity norm).

Optimization as Selection Principle
Stack of tasks for redundant systems. When a robot is redundant with respect to a task, it is interesting to allocate it a secondary task. This is the case for humanoid robots that can perform two tasks simultaneously. Consider two distinct task functions dealing with the positions of the right and left hands respectively. How to check if both tasks are compatible? A simple idea consists of ordering the two tasks. At each time step of the integration process, a vector of the configuration tangent space associated to the first task is selected. Then the secondary task is considered only within the restricted velocity set lying in the kernel of the first task. The reasoning that applies to the first task also applies to the projected secondary task: the task function may be onto, one-to-one, or neither. In particular, if it is not onto, the task is said to be singular (in the sense that the Jacobian is rank deficient). Two cases can be distinguished. If the (not projected) secondary task function is not onto, then the singularity is said to be kinematic: it is intrinsically due to the secondary task. Conversely, if the (not projected) secondary task function is onto, then the singularity is said to be algorithmic: because of a conflict with
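The redundant case described above can be made concrete with a short numerical sketch. It assumes a hypothetical planar arm with three revolute joints and a two-dimensional position task; the helper `planar_jacobian`, the joint values, the link lengths, and the weight matrix `W` are illustrative assumptions, not taken from the article. The sketch computes the least-norm velocity with the Moore-Penrose pseudo-inverse, builds the null-space projector, and shows one weighted pseudo-inverse (valid here because the Jacobian has full row rank).

```python
import numpy as np

def planar_jacobian(q, lengths):
    """Position Jacobian (2 x n) of a planar arm with n revolute joints."""
    angles = np.cumsum(q)  # absolute orientation of each link
    J = np.zeros((2, len(q)))
    for j in range(len(q)):
        # Joint j rotates every link from j to the tip.
        J[0, j] = -np.sum(lengths[j:] * np.sin(angles[j:]))
        J[1, j] = np.sum(lengths[j:] * np.cos(angles[j:]))
    return J

# Illustrative configuration, link lengths, and reference task velocity.
q = np.array([0.2, 0.5, -0.4])
lengths = np.array([1.0, 0.8, 0.6])
x_dot_ref = np.array([0.1, -0.05])

J = planar_jacobian(q, lengths)          # 2 rows, 3 columns: redundant
J_pinv = np.linalg.pinv(J)               # Moore-Penrose pseudo-inverse

q_dot_min = J_pinv @ x_dot_ref           # least-norm velocity fulfilling the task
N = np.eye(3) - J_pinv @ J               # projector onto the null space of the task

# Any null-space component can be added without changing the task velocity.
q_dot_other = q_dot_min + N @ np.array([1.0, 0.0, 0.0])

# Weighted pseudo-inverse: minimizes q_dot^T W q_dot instead of the
# Euclidean norm (here, motion of the first joint is penalized more).
W = np.diag([10.0, 1.0, 1.0])
W_inv = np.linalg.inv(W)
J_wpinv = W_inv @ J.T @ np.linalg.inv(J @ W_inv @ J.T)
q_dot_w = J_wpinv @ x_dot_ref
```

All three velocities produce the same task velocity; they differ only by a null-space component, which is what leaves room for selecting one of them by optimization.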


Figure 2. Examples of motions generated by the stack of tasks.

Left column: The stack of tasks is composed of three constraints (both feet and center of mass should remain at a fixed position; all of them are always feasible and satisfied) and two tasks to control the gaze and the right hand. At the final configuration (e), the robot has reached the ball with its right hand and the ball is centered in the robot field of view. The left hand has moved only to regulate the position of the center of mass. Right column: The motion is similar but a task has been added to control the position of the left hand: the desired position imposed to the left hand is the final position reached by the left hand in the previous movement. The two motions look very similar, but their meanings are different. In the left motion (a-e), the left hand moves to regulate the balance; in the right motion (f-j), the left hand moves to reach a specific position.

[Panels (a)-(e): Right-hand reaching while enforcing the balance: the left hand has moved only to correct the balance.]

[Panels (f)-(j): Simultaneous right-hand and left-hand reaching: the target imposed to the left hand is the final position reached in the previous (right-hand only) movement.]

[Plot: distance (m) versus time (s), from 0.0 to 3.0 s. Curves: right-hand reaching only; right-hand and left-hand reaching.] The motions of the left hand look similar but they are not exactly the same. The curves show the distance function for the left hand going from its initial position to its final position in both cases. The curves are different. The exponential decrease of the distance function (red curve) signs the presence of the left-hand reaching task.
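The ordering of two tasks used by the stack of tasks can be sketched numerically. The function below is a minimal illustration of the classical two-level projection, in which the secondary task is solved only inside the null space of the first one; the function name `two_task_velocity` and the small Jacobians are hypothetical and chosen so the result can be checked by hand.

```python
import numpy as np

def two_task_velocity(J1, e1, J2, e2):
    """Velocity for two ordered tasks: task 1 has strict priority over task 2."""
    J1_pinv = np.linalg.pinv(J1)
    q_dot_1 = J1_pinv @ e1                      # best velocity for the first task
    N1 = np.eye(J1.shape[1]) - J1_pinv @ J1     # null-space projector of task 1
    J2_proj = J2 @ N1                           # secondary task seen inside that null space
    # Second argument is the cutoff for small singular values, so that a
    # numerically vanishing projected Jacobian is treated as singular.
    correction = np.linalg.pinv(J2_proj, 1e-9) @ (e2 - J2 @ q_dot_1)
    return q_dot_1 + correction

# Hypothetical 4-joint system. Task 1 constrains joints 0 and 1.
J1 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0]])
e1 = np.array([1.0, 2.0])

# Compatible secondary task: it can still use joint 2.
J2 = np.array([[1.0, 0.0, 1.0, 0.0]])
e2 = np.array([3.0])
q_dot = two_task_velocity(J1, e1, J2, e2)

# Conflicting secondary task: it only involves joint 0, already used by task 1.
# Its own Jacobian is onto, but its projection vanishes (algorithmic singularity).
J2_conflict = np.array([[1.0, 0.0, 0.0, 0.0]])
e2_conflict = np.array([5.0])
q_dot_conflict = two_task_velocity(J1, e1, J2_conflict, e2_conflict)
```

In the compatible case both tasks are met exactly. In the conflicting case the first task is still met exactly and the secondary reference is simply not reached, which is the behavior a strict priority is meant to produce.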


the main task, the secondary one becomes singular.5 Outside of algorithmic singularities, the two tasks are said to be compatible and the order between them is irrelevant.

The projection process can be iterated for other tasks, resulting in a so-called stack of tasks.26 In doing so, the dimension of the successive null spaces decreases. The process stops either when all tasks have been processed, or as soon as the dimension of the null-space vanishes (see Figure 3). In the latter, we can still select the minimum-norm vector among those in the remaining null space.

The null space was first used in the frame of numerical analysis for guiding the descent of sequential optimization.33 It was used in robotics to perform a positioning task with a redundant robot while taking care of the joint limits.25 A generalization to any number of tasks was proposed in Nakamura,30 and its recursive expression was proposed in Siciliano38 (see also Baerlocher1 in the context of computer animation).

Here, we limit the presentation to inverse kinematics, that is, computing the robot velocities from reference velocity task constraints. The same approach can be used in inverse dynamics, to compute the system torques19 (typically, joint torques, but also tendon forces or other actuation parameters) from homogeneous operational constraints (typically, reference forces or accelerations). In that case, the Euclidean norm is irrelevant, and weighted inverses are generally preferred to enforce minimum energy along the motion.

Stack of tasks, quadratic programming, and inequality constraints. A stack of tasks can be compared to quadratic programming: a quadratic program is an optimization problem that involves a set of linear constraints and a quadratic cost (for example, a linear function to be approximated in the least-square sense). It is then similar to a stack with two tasks: the first (with higher priority) would be the constraint; the secondary would be the cost to minimize. However, the similarity is not total: the constraint in a quadratic program is supposed to be admissible (at least one feasible solution exists), while it is not the case for the main task. Also, a stack of tasks can be extended to more than two tasks.

Up to now, we have considered that a task corresponds to an equality in the configuration tangent space, to be satisfied at best in the least-square sense. Consider a region defined by a set of inequalities: the robot can move freely inside the region but should stay inside it; when the task becomes infeasible, it should minimize its distance to the region in the least-square sense. Such inequality constraints cannot be solved directly with the method described here.

Historically, the first solution has been to set a zero velocity in the task space when the inequality constraint is satisfied. This is the artificial potential field approach proposed by Khatib:18 the target region is described with a low or null cost, while the cost increases when approaching the limit of the region, following the behavior of the barrier functions used in the interior-point numerical algorithms. The gradient of the function then acts as a virtual force that pushes the robot inside the region when approaching the region boundaries while it has zero or very little influence inside the region.

For robot control, penalty functions are generally preferred to barrier functions to prevent bad numerical behavior when the robot is pushed to the limits. For a single task or when the inequality task has the lowest priority, the obtained behavior is always satisfactory: the robot does not have to move when the inequality is satisfied. However, it is difficult to enforce a hierarchy using this approach. The gradient-projection method27 can be used if the inequality task has a secondary importance, in particular, when enforcing the robot constraints in a very redundant context (for instance, a three-dimensional reaching task performed by a six-joint robot arm). When the inequality task has the priority, the saturation of one boundary of the task region will correspond to the allocation of one degree of freedom,(b) which is allocated to fix the velocity orthogonal to the boundary. This degree of freedom is thus not available anymore for any secondary task. Moreover, when freely moving inside the region (far from the boundaries), this degree of freedom can be used by the secondary tasks. In order to take advantage of the redundancy offered inside the region defined by the inequality constraints, the corresponding degrees of freedom should be dynamically allocated. If the inequality constraint is satisfied, the degree of freedom is left unallocated and can be used by a secondary task: the constraint is said to be inactive. If the constraint is violated, then the corresponding degree of freedom is used to satisfy the constraint at best: the constraint is said to be active.

(b) A degree of freedom is a linear combination of controls in the configuration tangent space.

The set of all active constraints is called the active set. Active-set-search algorithms are iterative resolution schemes searching over all possible active constraints. A candidate solution is computed at each iteration that fits the active constraints. Depending on the status of the active and inactive constraints with respect to the candidate solution, the active set is modified and the process is iterated. Active-set search algorithms are classical to solve inequality-constrained quadratic programs. The priority order between multiple conflicting objectives can be introduced, leading to hierarchical quadratic programs.9

Let us illustrate the stack-of-tasks framework from the worked-out example of the HRP2 humanoid robot performing two simultaneous reaching tasks, while respecting equilibrium constraints. All elementary tasks are embedded into a single global trajectory. We will see that the hierarchy introduced in quadratic programming induces a structure in the task vector space. Doing so, the global trajectory appears as a composition of elementary movements, each of them characterizing a given task (or subtask). Reverse engineering can then be used to identify the meaning of the motion, that is, the various tasks the motion is embedding.

Motion as Action Signature
We review here two practical applications of the stack of tasks on the humanoid robot, HRP2.(c) The first one shows how to express complex actions while involving all body segments and respecting physical constraints.

(c) A detailed presentation appeared in Hak et al.12
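The gradient-projection idea for an inequality of secondary importance can be sketched as follows. This is a minimal illustration under stated assumptions: the inequality is taken to be a box of joint limits, the penalty is quadratic and exactly zero inside the box, and the names `penalty_velocity` and `gradient_projection` as well as all numerical values are hypothetical.

```python
import numpy as np

def penalty_velocity(q, lower, upper, gain=1.0):
    """Negative gradient of a quadratic penalty that is zero inside [lower, upper]."""
    # Signed amount by which each joint lies outside its admissible interval.
    violation = np.minimum(q - lower, 0.0) + np.maximum(q - upper, 0.0)
    return -gain * violation

def gradient_projection(J, e, z):
    """Main task solved in the least-norm sense; z acts only in its null space."""
    J_pinv = np.linalg.pinv(J)
    N = np.eye(J.shape[1]) - J_pinv @ J
    return J_pinv @ e + N @ z

# Hypothetical 3-joint system: the main task involves joints 0 and 1 only.
J = np.array([[1.0, 1.0, 0.0]])
e = np.array([0.2])

lower = np.array([-1.0, -1.0, -1.0])
upper = np.array([1.0, 1.0, 1.0])
q = np.array([0.0, 0.0, 1.5])      # joint 2 is beyond its upper limit

z = penalty_velocity(q, lower, upper)
q_dot = gradient_projection(J, e, z)

# Inside the region the penalty produces no motion at all.
z_inside = penalty_velocity(np.array([0.0, 0.5, -0.5]), lower, upper)
```

Here the violated limit pulls joint 2 back toward its interval without disturbing the main task, while a configuration inside the limits yields a zero secondary velocity. Because the penalty acts only through the null-space projector, it cannot take priority over the main task, which is the limitation the text attributes to this approach.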


The second application shows how it is possible to recognize actions from motion observation using reverse engineering techniques.

From action to motion: The optimization-based selection principle at work. The stack of tasks is a generic tool to generate and to control a robot's motion. Given an initial configuration, a motion is generated by adding a set of tasks into the stack and integrating the resulting velocity until the convergence of all active tasks. The stack of tasks can be used in various robotics scenarios. In humanoid robotics, classical tasks deal with reaching (expressed as the placement of an end effector), visual servoing (expressed as the regulation of the gaze on the position of an object in the image plane of the robot camera), or quasi-static balance (expressed as the regulation of the center-of-mass in such a way that its projection on the floor lies inside the support polygon of the feet).

For example, the motion in Figure 2 (left column) is generated by constraining the two feet and the center of mass to remain at their initial positions and by setting two tasks to control the right hand and the gaze both to the ball in front of the robot. The robot bends forward to reach the ball. In doing so, it pushes the center of mass forward. The left hand moves backward to compensate for this motion of the center of mass. The motion of the left hand does not answer a specific action. It is a side effect of the balance maintenance.

Setting new tasks or changing the desired value of the active tasks can easily modify the motion. For example, the motion in Figure 2 (right column) is generated by adding a task that regulates the position and orientation of the left hand to the final placement of the left hand in the first scenario. This new task is a reaching task: the left hand must reach a goal. The two movements of the left hand in both scenarios look very similar, but their meanings are different. In the first case, the motion is not intentional: the left hand moves to regulate the center-of-mass position; its motion is then a side effect of the other tasks. In the second case, the motion is intentional: the left hand explicitly moves to reach a given target. A careful analysis of slight differences between the two left-hand motions eliminates the ambiguity. This is made possible by a reverse-engineering approach.

From motion to action: A reverse-engineering approach of action recognition. The hierarchy artificially decouples the tasks of the stack in order to prevent any conflict between two different tasks. A side effect is that the trajectory in a given active task space is not influenced by any other task. For example, in Figure 2 (right side) the stack of tasks enforces a decoupling for the left hand, which moves independently of the two other tasks. The trajectory in one task space then constitutes a signature of the activity of the task in the generation of the robot motion.

Consider the following problem: we observe the motion of a system whose possible controllers are known. Observing only the joint trajectory, the question is to reconstruct which of the possible controllers were active and which were the associated parameters. Recovering one task is easy: the configuration trajectory is projected in all the candidate task spaces using the corresponding task function. The best task is selected by fitting the projected trajectory with the task model (once more, the fitting and thus the selection is done by optimization). However, although the stack of tasks artificially decouples the active tasks, some coupling between the candidate tasks may occur: for example, there are a lot of similarities between the trajectories of the wrist and the elbow due to their proximity in the kinematic chain. These similarities can lead to false positives in the detection. To avoid this problem, only the most relevant task is chosen first. The motion due to this task is then canceled by projecting the configuration trajectory in the null space of the detected task. The detection algorithm then iterates until all the tasks have been found, that is, until the remaining quantity of movement after successive projections is null.12

This detection algorithm can be used to disambiguate the two similar-looking motions performed in Figure 2, without using any contextual information. An illustration of the successive projections is given in Figure 3. The tasks are removed in the order given by the detection algorithm. The right-hand task is removed first (second row), followed by the center of mass (third row): this cancels most of the motion of the left hand because the coupling between the three tasks is important; however, a small part of left-hand movement remains. On the contrary, the head movement, which is nearly decoupled from the right-hand and center-of-mass, remains important. It is totally nullified after removing the gaze task (fourth row). The remaining motion of the left hand can only be explained by the left-hand task, which is detected active and then removed. Finally, the two feet constraints are detected and removed. The effect of the first foot removal (sixth row) is noticeable.

The algorithm achieves very good performance in recognizing actions and in telling the differences between similar-looking robot motions. Beyond robotics, the method can be applied to human action recognition. However, it requires a critical prerequisite: the knowledge of the optimality principles grounding the motion generation of intentional actions. Indeed, the algorithm is based on action signatures that are the typical results of a particular cost function. The approach also requires a computational model of the coordination strategies used by the human to compose several simultaneous motion primitives. These are promising routes for future research combining computational neuroscience and robotics.

At this stage, we have seen how optimization principles and the notion of tasks help to ground a symbolic representation of actions from motions: an action is viewed as the result of an optimization process whose cost represents the signature of the action. The next section addresses the dual problem of identifying action signatures from motions.

Inverse Optimal Control
Let us introduce the section by a case study taken from humanoid robotics. Suppose we want a humanoid robot to walk as a human, that is, following human-like trajectories. So the question is: What are the computational foundations of human locomotion trajectories? In a first stage, we showed that locomotor trajectories are highly stereotypical across repetitions and subjects. The methodology is based


Figure 3. Successive projection of the motion after detecting each of the seven tasks.

From top to bottom: original movement; removing the right-hand task; removing the center-of-mass task; removing the gaze task; removing the left-hand task; removing the left-foot task; removing the right-foot task. On the last row, all the tasks are canceled and the projected movement is totally nullified.
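The successive projections illustrated in Figure 3 can be sketched in a strongly simplified form. The sketch below works on a single joint velocity rather than on a whole trajectory, and it ranks candidate tasks by how much of the remaining motion each one can explain, which replaces the fitting of projected trajectories to a task model described in the text. The function name `detect_tasks`, the task names, and the Jacobians are hypothetical.

```python
import numpy as np

def detect_tasks(q_dot, candidates, tol=1e-9):
    """Greedy detection: pick the most relevant task, cancel its motion, repeat."""
    residual = np.array(q_dot, dtype=float)
    remaining = dict(candidates)
    detected = []
    while remaining and np.linalg.norm(residual) > tol:
        # Part of the remaining motion that each candidate task can explain
        # (component of the residual in the row space of its Jacobian).
        explained = {name: np.linalg.pinv(J) @ (J @ residual)
                     for name, J in remaining.items()}
        best = max(explained, key=lambda name: np.linalg.norm(explained[name]))
        if np.linalg.norm(explained[best]) <= tol:
            break  # no candidate explains what is left
        detected.append(best)
        # Equivalent to projecting the residual in the null space of the detected task.
        residual = residual - explained[best]
        del remaining[best]
    return detected, residual

# Hypothetical 5-joint system with three candidate tasks acting on disjoint joints.
candidates = {
    "right_hand": np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0, 0.0, 0.0]]),
    "gaze": np.array([[0.0, 0.0, 1.0, 0.0, 0.0]]),
    "left_hand": np.array([[0.0, 0.0, 0.0, 1.0, 1.0]]),
}

# Observed velocity: only joints 0, 1, and 2 move.
observed = np.array([1.0, 2.0, 0.5, 0.0, 0.0])
detected, residual = detect_tasks(observed, candidates)
```

With these decoupled candidates the right-hand task is detected first, then the gaze task, and the left-hand task is never reported because no motion remains. With coupled candidates, such as the wrist and elbow mentioned in the text, a greedy ranking of this kind is exactly where false positives could arise, so this sketch should not be read as the published algorithm.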


on statistical analysis of a huge mo- Note that, in inverse optimization, the problem is to extend the methodology
tion capture data basis of trajectories main difficulty lies in having a tractable developed for inverse polynomial opti-
(seven subjects, more than 1,500 tra- characterization of global optimality for mization22 to the context of inverse op-
jectories).15 Next, in a second stage, it a given point and some candidate cost timal control. Note that the Hamilton-
is assumed that human locomotion trajectories obey some optimality principle. This is a frequent hypothesis in human or animal motion studies. So the question is: Which cost functional is minimized in human locomotion? In practice, we consider that the human and the robot obey the same model, that is, we know precisely the differential equation that describes the motions under some control action, and the constraints the state of the system should satisfy. A database of trajectories is available. Based on this knowledge, determining a cost functional that is minimized in human locomotion becomes an inverse optimal control problem.

Pioneering work for inverse optimization in control dates back to the 1960s in systems theory applied to economics.29 Similarly, for optimal stabilization problems, it was known that every value function of an optimal stabilization problem is also a Lyapunov function for the closed-loop system. Freeman and Kokotovic10 have shown that the converse is true: namely, every Lyapunov function for every stable closed-loop system is also a value function for a meaningful optimal stabilization problem.

In static optimization, the direct problem consists in finding in some set K of admissible solutions a feasible point x that minimizes some given cost function f.

We state the associated inverse optimization problem as follows: given a feasible point y in K, find a cost criterion g that minimizes the norm of the error (g − f), with g being such that y is an optimal solution of the direct optimization problem with cost criterion g (instead of f). When f is the null function, this is the static version of the inverse optimal control problem. Pioneering works date back to the 1990s for linear programs, and for the Manhattan norm. For the latter, the inverse problem is again a linear program of the same form. Similar results also hold for inverse linear programs with the infinity norm. The interested reader will find a nice survey on inverse optimization for linear programming and combinatorial optimization problems in Heuberger.14

criterion. This is why most of the above works address linear programs or combinatorial optimization problems for which some characterization of global optimality is available and can sometimes be effectively used for practical computation. This explains why inverse (nonlinear) optimization has not attracted much attention in the past.

Recently, some progress has been made in inverse polynomial optimization; that is, inverse optimization problems with a polynomial objective function and a semi-algebraic set as the feasible set of solutions.22 Powerful representation results in real algebraic geometry21 describe the global optimality constraint via some certificate of positivity. These can be stated as linear matrix inequalities (LMIs) on the unknown vector of coefficients of the polynomial cost function. The latter set is a convex set on which we can optimize efficiently via semi-definite programming,23 a powerful technique of convex optimization. We can then show that computing an inverse optimal solution reduces to solving a hierarchy of semi-definite programs of increasing size.

Back to the inverse optimal control problem for anthropomorphic locomotion, we can consider a basis of functions to express the cost function candidate. The method proposed in Mombaur et al.28 is based on two main algorithms: an efficient direct multiple shooting technique to handle optimal control problems, and a state-of-the-art optimization technique to guarantee a match between a solution of the (direct) optimal control problem and measurements. Once an optimal cost function has been identified (in the given class of basis functions), we can implement a direct optimal control solution on the humanoid robot. So far, the method is rather efficient, at least on a sample of test problems. However, it requires defining a class of basis functions a priori. Moreover, the direct shooting method provides only a locally optimal solution at each iteration of the algorithm. Thus, there is no guarantee of global optimality.

An alternative way to consider the

Jacobi-Bellman (HJB) equation is the perfect tool to certify global optimality of a given state-control trajectory whenever the optimal value function is known. The basic idea is to use a relaxed version of the HJB-optimality equation as a certificate of global optimality for the experimental trajectories stored in the database. The optimal value function, which is generally assumed to be continuous, can be approximated on a compact domain by a polynomial. If we search for an integral cost functional whose integrand h is also a polynomial, then this certificate of global optimality can be used to compute h and an associated (polynomial) optimal value function by solving a semi-definite program. Proceeding as in Lasserre,22 we solve a hierarchy of semi-definite programs of increasing size. At each step of this hierarchy, either the semi-definite program has no solution or any optimal solution h is such that the trajectories of the database are globally optimal solutions for the problem with polynomial cost function h as integrand. The higher in the hierarchy, the better the quality of the solution (but also the higher the computational cost).

Apart from polynomial optimization techniques, other approaches have been recently introduced with a geometric perspective of optimal control theory. Significant results have been obtained in the context of pointing motions:3 based on Thom transversality theory, the cost structure is deduced from qualitative properties highlighted by the experimental data. These can be, for instance, the characterization of inactivity intervals of the muscles during the motion. Such a qualitative approach has also been successfully applied to human locomotion.6

We have introduced the inverse optimal control problem from the perspective of biomimetic approaches to robot control. The question is: how to synthesize natural motion laws to deduce from them optimal control models for robots? We emphasized recent developments in inverse polynomial optimization. We should note this article is far from covering all the approaches to inverse optimal control. Inverse optimal control is also an active research area in machine learning. In the context of reinforcement learning,16,20 inverse reinforcement learning constitutes another resolution paradigm based on Markov decision processes with spectacular results on challenging problems such as helicopter control.7 The method corpus comes from stochastic control (see Kober et al.20 and references therein).

Computation: A Practical or Theoretical Problem?

In computer animation, optimization-based motion generation gives excellent results in terms of realism in mimicking nature. For instance, it is possible to use numerical optimization to simulate very realistic walking, stepping, or running motions for human-like artifacts. These complex body structures include up to 12 body segments and 25 degrees of freedom.36 At first glance, the approach a priori applies to humanoid robotics. Figure 4 (top) provides an example of the way HRP2 steps over a very large obstacle.

Figure 4. Two stepping movements obtained with (top) a whole-body trajectory optimization36 (courtesy of K. Mombaur) and (bottom) a linearized-inverted-pendulum based walking pattern generator17 (courtesy of O. Stasse39). The whole-body optimization enables the robot to reach higher performances but the numerical resolution is still too slow to obtain an effective controller.

However, robotics imposes physical constraints that are absent from virtual worlds and that demand computational performance. Biped walking is a typical example where the technological limitation implies a search for alternative formulations. The bottleneck is the capacity of the control algorithm to meet the real-time constraints.

In the current model-based simulation experiments the time of computation is evaluated in minutes. A minute is not a time scale compatible with real time. For instance, computation time upper bounds of a few milliseconds are required to ensure the stability of a standing humanoid robot. So taking advantage of general optimization techniques for real-time control necessarily requires building simplified models or developing dedicated methods. The issue constitutes an active line of research combining robot control and numerical optimization.

An example is given by the research on walking motion generation for humanoid robots. The most popular walking pattern generator is based on a simplified model of the anthropomorphic body: the linearized inverted pendulum model. It was introduced in Kajita et al.17 and developed for the HRP2 humanoid robot. The method is based on two major assumptions: (1) the first one simplifies the control model by imposing a constant altitude of the center of mass, (2) the second one assumes the knowledge of the footprints. Assumption (1) has the advantage of transforming the original nonlinear problem into a linear one. The corresponding model is low-dimensional and it is possible to address (1) via an optimization formulation.8 With this formulation, assumption (2) is no longer required. The method then gives rise to an on-line walking motion generator with automatic footstep placement. This is made possible by a linear model-predictive control whose associated quadratic program allows much faster control loops than the original ones in Kajita et al.17 Indeed, running the full quadratic program takes less than 1 ms with state-of-the-art solvers. More than that, in this specific context, it is possible to devise an optimized algorithm that reduces the computation time of a solution by a factor of 100.8

An example of this approach is given in Figure 4 (bottom) that makes the real HRP2 step over an obstacle. The approach based on model reduction enables the robot to be controlled in real time. However, the reduced model does not make complete use of the robot dynamics. The generated movement is less optimal than when optimizing the robot's whole-body trajectory. Consequently, it is not possible to reach the same performance (in this case, the same obstacle height): the whole-body optimization enables the robot to reach


higher performances, but only offline. Reaching the same performance online requires either more powerful computers (running the same algorithms) or more clever algorithms.

Conclusion

The notion of robot motion optimality is diverse in both its definitions and its application domains. One goal of this article was to summarize several points of view and references spread out over various domains: robotics, control, differential geometry, numerical optimization, machine learning, and even neurophysiology.

The objective was to stress the expressive power of optimal motion in robot action modeling and to present current challenges in numerical optimization for real-time control of complex robots, like the humanoids. A second objective was to report recent issues in inverse optimal control. While its stochastic formulation is popular in machine learning, other paradigms are currently emerging in differential geometric control theory and polynomial optimization.

As testified in a companion article,24 robotics offers rich benchmarks for optimal control theory. Due to real-time computation constraints imposed by effective applications, robotics also induces challenges to numerical optimization. The difficulty for roboticists is to find the right compromise between generality and specificity. General algorithms suffer from the classical curse of dimensionality that constitutes a bottleneck for robot control. Therefore, they may be used for offline motion generation, but they are inefficient for real-time applications. Real-time robot control requires very fast computations. It requires dedicated numerical optimization methods. We have seen how bipedal walking illustrates this tension between generality and specificity. Roboticists are today asking optimization theorists for more efficient algorithms, while they are developing at the same time a specific know-how to this end.

Last but not least, let us conclude by referring to a controversy introduced by neurophysiologist K. Friston. In a recent paper,11 he asks the provocative question: Is optimal control theory useful for understanding motor behavior or is it a misdirection? He contrasts optimal control with the competing notion of active inference. While the paper is mainly dedicated to motor control in life sciences, the issue is of crucial and utmost interest for roboticists and calls for a reinforcement of the cooperation between life and engineering sciences.

Acknowledgments

This article benefits from comments by Quang Cuong Pham, from a careful reading by Joel Chavas, and above all, from the quality of the reviews. The work has been partly supported by ERC Grant 340050 Actanthrope, by a grant of the Gaspard Monge Program for Optimization and Operations Research of the Fondation Mathématique Jacques Hadamard (FMJH) and by the grant ANR 13-CORD-002-01 Entracte.

References

1. Baerlocher, P. and Boulic, R. An inverse kinematic architecture enforcing an arbitrary number of strict priority levels. The Visual Computer 6, 20 (2004), 402-417.
2. Ben-Israel, A. and Greville, T. Generalized inverses: Theory and applications. CMS Books in Mathematics. Springer, 2nd edition, 2003.
3. Berret, B. et al. The inactivation principle: Mathematical solutions minimizing the absolute work and biological implications for the planning of arm movements. PLoS Comput Biol. 4, 10 (2008), e1000194.
4. Chaumette, F. and Hutchinson, S. Visual servo control, Part I: Basic approaches. IEEE Robotics and Automation Magazine 13, 4 (2006), 82-90.
5. Chiaverini, S. Singularity-robust task-priority redundancy resolution for real-time kinematic control of robot manipulators. IEEE Trans. on Robotics and Automation 13, 3 (1997), 398-410.
6. Chitour, Y., Jean, F. and Mason, P. Optimal control models of goal-oriented human locomotion. SIAM J. Control and Optimization 50 (2012), 147-170.
7. Coates, A., Abbeel, P. and Ng, A. Apprenticeship learning for helicopter control. Commun. ACM 52, 7 (July 2009), 97-105.
8. Dimitrov, D., Wieber, P.-B., Stasse, O., Ferreau, H. and Diedam, H. An optimized linear model predictive control solver. Recent Advances in Optimization and its Applications in Engineering. Springer, 2010, 309-318.
9. Escande, A., Mansard, N. and Wieber, P.-B. Hierarchical quadratic programming. The Intern. J. Robotics Research 33, 7 (2014), 1006-1028.
10. Freeman, R. and Kokotovic, P. Inverse optimality in robust stabilization. SIAM J. Control Optim. 34 (1996), 1365-1391.
11. Friston, K. What is optimal about motor control? Neuron 72, 3 (2011), 488-498.
12. Hak, S., Mansard, N., Stasse, O. and Laumond, J.-P. Reverse control for humanoid robot task recognition. IEEE Trans. Sys. Man Cybernetics 42, 6 (2012), 1524-1537.
13. Harris, C. and Pike, J. 3d positional integration from image sequences. Image and Vision Computing 6, 2 (1988), 87-90.
14. Heuberger, C. Inverse combinatorial optimization: A survey on problems, methods and results. J. Comb. Optim. 8 (2004), 329-361.
15. Hicheur, H., Pham, Q., Arechavaleta, G., Laumond, J.-P. and Berthoz, A. The formation of trajectories during goal-oriented locomotion in humans. Part I: A stereotyped behaviour. European J. Neuroscience 26, 8 (2007), 2376-2390.
16. Kaelbling, L., Littman, M. and Moore, A. Reinforcement learning: A survey. J. Artificial Intelligence Research 4 (1996), 237-285.
17. Kajita, S. et al. Biped walking pattern generation by using preview control of zero-moment point. In Proceedings of the IEEE Int. Conf. on Robotics and Automation (2003), 1620-1626.
18. Khatib, O. Real-time obstacle avoidance for manipulators and mobile robots. The Intern. J. Robotics Research 5, 1 (1986), 90-98.
19. Khatib, O. A unified approach for motion and force control of robot manipulators: The operational space formulation. The Intern. J. Robotics Research 3, 1 (1987), 43-53.
20. Kober, J., Bagnell, J. and Peters, J. Reinforcement learning in robotics: A survey. The Intern. J. Robotics Research 32, 11 (Sept. 2013).
21. Lasserre, J.B. Moments, Positive Polynomials and Their Applications. Imperial College Press, London, 2010.
22. Lasserre, J.B. Inverse polynomial optimization. Math. Oper. Res. 38, 3 (Aug. 2013), 418-436.
23. Lasserre, J.B. and Anjos, M., eds. Semidefinite, Conic and Polynomial Optimization. Springer, 2011.
24. Laumond, J.-P., Mansard, N. and Lasserre, J.B. Optimality in robot motion (1): Optimality versus optimized motion. Commun. ACM 57, 9 (Sept. 2014), 82-89.
25. Liégeois, A. Automatic supervisory control of the configuration and behavior of multibody mechanisms. IEEE Trans. Systems, Man and Cybernetics 7 (1977), 868-871.
26. Mansard, N. and Chaumette, F. Task sequencing for sensor-based control. IEEE Trans. on Robotics 23, 1 (2008), 60-72.
27. Marchand, E. and Hager, G. Dynamic sensor planning in visual servoing. In Proceedings of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (1998), 1988-1993.
28. Mombaur, K., Truong, A. and Laumond, J.-P. From human to humanoid locomotion: An inverse optimal control approach. Autonomous Robots 28, 3 (2010).
29. Mordecai, K. On the inverse optimal problem: Mathematical systems theory and economics, I, II. Lecture Notes in Oper. Res. and Math. Economics 11, 12 (1969).
30. Nakamura, Y., Hanafusa, H. and Yoshikawa, T. Task-priority based redundancy control of robot manipulators. The Intern. J. Robotics Research 6, 2 (1987), 3-15.
31. Paul, R. Robot Manipulators: Mathematics, Programming, and Control. MIT Press, Cambridge, MA, 1st edition, 1982.
32. Poincaré, H. On the foundations of geometry (1898). From Kant to Hilbert: A Source Book in the Foundations of Mathematics. W. Ewald, Ed. Oxford University Press, 1996.
33. Rosen, J. The gradient projection method for nonlinear programming. Part II, Nonlinear constraints. SIAM J. Applied Mathematics 9, 4 (1961), 514-532.
34. Samson, C., Borgne, M.L. and Espiau, B. Robot Control: The Task Function Approach. Clarendon Press, 1991.
35. Schaal, S., Mohajerian, P. and Ijspeert, A. Dynamics system vs. optimal control: A unifying view. Prog. Brain Research 165 (2007), 425-445.
36. Schultz, G. and Mombaur, K. Modeling and optimal control of human-like running. IEEE/ASME Trans. Mechatronics 15, 5 (2010), 783-792.
37. Siciliano, B., Sciavicco, L., Villani, L. and Oriolo, G. Robotics: Modeling, Planning and Control. Springer, 2009.
38. Siciliano, B. and Slotine, J.-J. A general framework for managing multiple tasks in highly redundant robotic systems. In Proceedings of the IEEE Int. Conf. on Advanced Robotics, 1991.
39. Stasse, O., Verrelst, B., Vanderborght, B. and Yokoi, K. Strategies for humanoid robots to dynamically walk over large obstacles. IEEE Trans. Robotics 25 (2009), 960-967.
40. Todorov, E. Optimality principles in sensorimotor control. Nature Neuroscience 7, 9 (2004), 905-915.
41. Whitney, D. Resolved motion rate control of manipulators and human prostheses. IEEE Trans. Man-Machine Systems 10, 2 (1969), 47-53.

Jean-Paul Laumond (jpl@laas.fr) is a CNRS director of research at LAAS, Toulouse, France.

Nicolas Mansard (nmansard@laas.fr) is a CNRS researcher at LAAS, Toulouse, France.

Jean Bernard Lasserre (lasserre@laas.fr) is a CNRS director of research at LAAS, Toulouse, France.

© 2015 ACM 0001-0782/15/05 $15.00
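As an illustration of the linearized inverted pendulum formulation discussed in the section on computation, the following is a minimal sketch, not the implementation used in the cited works. It assumes a constant center-of-mass height `h`, a sampling period `T`, a piecewise-constant jerk as control input, and a hypothetical regularization weight `alpha`; all function names and numerical values are illustrative. The inequality constraints that keep the zero-moment point (ZMP) inside the support polygon, as well as automatic footstep placement, are deliberately omitted, so the quadratic program reduces to a single linear solve.

```python
import numpy as np


def lipm_matrices(T, h, g=9.81):
    """Discrete-time linear inverted pendulum along one horizontal axis.

    State x = [position, velocity, acceleration] of the center of mass,
    input u = jerk held constant over one period T (assumed model).
    Output z = ZMP position = c - (h / g) * c_ddot.
    """
    A = np.array([[1.0, T, T**2 / 2.0],
                  [0.0, 1.0, T],
                  [0.0, 0.0, 1.0]])
    B = np.array([T**3 / 6.0, T**2 / 2.0, T])
    C = np.array([1.0, 0.0, -h / g])
    return A, B, C


def prediction_matrices(A, B, C, N):
    """Return Px (N x 3) and Pu (N x N) such that the ZMP positions over
    the next N steps are Z = Px @ x0 + Pu @ U for a jerk sequence U."""
    Px = np.zeros((N, 3))
    Pu = np.zeros((N, N))
    responses = []          # responses[k] = C A^k B
    Ak = np.eye(3)
    for k in range(N):
        responses.append(C @ Ak @ B)
        Ak = A @ Ak
        Px[k] = C @ Ak      # C A^(k+1)
    for i in range(N):
        for j in range(i + 1):
            Pu[i, j] = responses[i - j]
    return Px, Pu


def mpc_jerks(x0, z_ref, Px, Pu, alpha=1e-6):
    """Minimize ||Px x0 + Pu U - z_ref||^2 + alpha ||U||^2 over U.

    Without inequality constraints the QP has a closed-form solution
    given by the normal equations."""
    N = Pu.shape[1]
    H = Pu.T @ Pu + alpha * np.eye(N)
    return -np.linalg.solve(H, Pu.T @ (Px @ x0 - z_ref))


if __name__ == "__main__":
    A, B, C = lipm_matrices(T=0.1, h=0.8)
    Px, Pu = prediction_matrices(A, B, C, N=16)
    # Reference ZMP: shift of 10 cm halfway through the horizon.
    z_ref = np.concatenate([np.zeros(8), 0.1 * np.ones(8)])
    x = np.zeros(3)
    U = mpc_jerks(x, z_ref, Px, Pu)
    # Receding horizon: only the first jerk is applied, then re-plan.
    x = A @ x + B * U[0]
    print("first jerk:", U[0], "next state:", x)
```

In a receding-horizon loop only the first jerk of each solution is applied before the problem is solved again from the new state. The small size of the linear system is what makes control periods of a few milliseconds plausible; a practical generator would add the ZMP inequality constraints and hand the resulting quadratic program to a dedicated solver.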

74 COMMUNICATIONS OF THE ACM | MAY 2015 | VOL. 58 | NO. 5
