1
0,
1
l
T
, x
2
T
,
1
2
2
0,
2
l
T
, ...
T
.
(1)
For the time integration of the system position,
dierent options exist for the function q
(t+t)
=
(q
(t)
, v, t), the simplest one being the rstorder
explicit Euler formula q
(t+t)
= q
(t)
+t q
(t)
. Indeed,
for body positions a straightforward rstorder dier
ential approximation is used: x
(t+t)
= x
(t)
+t x
(t)
.
However, if the same approach is used for quaternions,
as in
(t+t)
=
(t)
+ t
(t)
, an annoying situation
can happen: quaternions may slowly lose the unimod
ularity and drift away from the S
3
manifold, unless
some stabilization is used to enforce [[[[ = 1. There
fore, we prefer to integrate the rotations using the
following exponential map that preserves the unimod
ularity of the quaternions:
(t+t)
=
(t)
e
{0,
1
2
l
t}
, (2)
so we get
(t+t)
= (q
(t)
, v, t) =
_
_
x
1,(t)
+ t x
1,(t)
1,(t)
e
{0,
1
2
1
l
t}
x
2,(t)
+ t x
2,(t)
2,(t)
e
{0,
1
2
2
l
t}
...
_
_
.
(3)
We can explicitly compute the second factor of the
quaternion product thanks to the property e
{0,u}
=
cos , usin of quaternion exponentials; we obtain
e
{0,
1
2
l
t}
= e
{0,(
l
/
l
)
1
2

l
t}
=
= cos
1
2
[
l
[t,
l
[
l
[
sin
1
2
[
l
[t. (4)
Since (2) preserves the norm of the quaternions,
large t time steps can be used (although, once in a
while, it is safer to normalize all quaternions because
numerical roundo can accumulate small errors). Yet
we can demonstrate that, for t 0, the formula (2)
still corresponds to
(t+t)
=
(t)
+ t . In fact,
= lim
t0
(t+t)
(t)
t
. (5)
Hence, substituting (2) in (5), we can write
= lim
t0
(t)
cos
1
2
[
l
[t,
l

l

sin
1
2
[
l
[t 1, 0
t
.
(6)
Applying the H opital theorem and simplifying, we
obtain =
1
2
0,
l
, as expected.
2.2. Bilateral constraints
Most kinematic pairs, such as revolute joints, pris
matic joints, and glyphs, can be expressed by means
of holonomic constraints over the relative position of
two bodies. In general, we introduce a set (
B
of scalar
equations
i
(q, t) = 0, i (
B
. (7)
3
The size of the set (
B
is the number of basic scalar
bilateral constraints, not necessarily corresponding to
complex threedimensional mechanical joints
2
.
We assume that
i
(q, t) is smooth, so that it can
be dierentiated to obtain the Jacobian
q
i
=
_
i
/q
T
.
Constraints must be respected also at the velocity
level: the full timederivative of the ith constraint equa
tion is
d
i
(q, t)
dt
=
i
q
q +
i
t
=
q
i
T
q +
i
t
= 0
d
i
(q, t)
dt
=
q
i
T
(q, v) +
i
t
= 0.
For simplicity, from now on we will dene
i
T
=
i
T
(q, v).
Note that the term
i
t
is nonzero only for rheonomic
(timedependent) constraints such as motors and im
posed trajectories.
For each bilateral constraint, there exists a Lagrange
multiplier
i
B
such that the force acting on the system
by the ith bilateral constraint is
i
B
i
[20].
2.3. Unilateral contact constraints
Since rigid bodies cannot overlap each other, given
the set of body shapes =
1
,
2
, ...,
n
, we assume
that there exists a set of (
P
distance functions (q, )
that must satisfy the unilateral constraint conditions:
i
(q, ) 0, i (
P
. (8)
An example of such a mapping is the signed distance
function [23].
For convenience, we pose the problem in terms of
contact points, since in most cases we can compute a
minimal set of contact normals belonging to one of two
neighboring bodies: distances (q, ) are measured
along those normals.
For example, in the case of spherical bodies with
positions x
j
s
and radius r
s
, for all body pairs j, k the
signed distance function is = [[x
j
s
x
k
s
[[ 2r
s
. Note,
however, that, for bodies with generic shapes, nding
a proper set of contact points (and dening their
i
distance functions) is not always trivial [3,4]. In fact,
there could be multiple contact points, or it could even
happen that dening a dierentiable signed distance
function is not possible, as in the case of concave shapes
[3].
Nevertheless, since we are interested in enforcing
nonpenetration, what truly matters is that a signed
distance function be dened up to some value of the
penetration [4]. We thus assume that (q, ) can be
dierentiably dened at least on a neighborhood of the
set (q, ) 0. Such an assumption does hold for
smooth and strictly convex bodies, such as spheres [3].
2
In the context of this work, bilateral constraints are always
considered scalar, because complex mechanical joints can be
modeled by using multiple basic scalar constraints. For exam
ple, kinematic pairs such as a ball joint require three scalar
equations, a prismatic guide requires ve scalar equations, and
so on.
In addition, piecewise smooth bodies can be accommo
dated in a xedtimestep framework, by decomposing
the distance function in components attached to each
pair of features, such as point and piecewise smooth
surface or curved edge and curved edge, and using all
normals attached to such pairs [19].
Also, we consider only a subset (
A
(q, , ) (
P
of all potential contacts, that is, only those contacts
whose surfaces are under a distance threshold :
(
A
(q, , ) =
_
i
i (
P
,
i
(q, )
_
. (9)
Special attention must be paid in implementing an
ecient and robust collision algorithm for the genera
tion of the (
A
(q, , ) set of contact points.
A preliminary algorithm, called broadphase colli
sion detection, discards pairs of shapes that are farther
than , in order to avoid a combinatorial waste of time
if checking for collision points with all pairs of bodies.
We use the sweep and prune (SAP) algorithm to this
end [17].
The following narrowphase step operates on the
pairs of bodies that passed the broadphase check: it
nds the contact points and their normals. We adopt
the GJK algorithm for this purpose because it features
high eciency and robustness even in the case of non
smooth surfaces [16].
Concave shapes, if any, undergo an oline conver
sion into sets of convex shapes using a convex de
composition algorithm; a middlephase AABB binary
tree traversal is used to check collisions between these
compounds of shapes without running into superlinear
time complexity.
A thin envelope is added around all shapes using
the Minkowski sums in order to allow a small amount
of interpenetration. If larger overlapping occurs, the
Expanding Polytope (EPA) algorithm is used [11].
2.4. Frictional constraints
In the following section we introduce friction by
means of conic constraints, which are an extension of
complementarity models discussed in [6,40].
2.4.1. The Coulomb friction model
The original Coulomb model introduces static
s
and kinetic
k
friction coecients as the only param
eters to characterize the frictional phenomena at the
surface. Although simple, this model was proven to be
realistic and practical in many situations. Usually the
kinetic coecient is slightly lower than the static coef
cient, but in this work we consider both to have the
same value .
If a position q is feasible and the contact is active,
that is, (q, ) = 0, then at the contact we have a
normal force and a tangential force.
Let n be the normal at the contact pointing from
the second body to the rst body, and let t
1
and t
2
be
the tangents at the contact. Here n, t
1
, t
2
are mutually
orthogonal vectors of length one in three dimensions.
The vectors n, t
1
, and t
2
are a function of the position
q, but we ignore this fact until the end of this section.
4
The reaction force is impressed on the system by
means of multipliers
n
0,
u
, and
v
. The normal
component of the force is F
N
=
n
n, and the tangen
tial component of the force is F
T
=
u
t
1
+
v
t
2
.
The Coulomb model consists of the following con
straints:
n
0, (q) 0, (q)
n
= 0,
n
_
2
u
+
2
v
, [[v
T
[[
_
n
_
2
u
+
2
v
_
= 0,
F
T
, v
T
) = [[F
T
[[ [[v
T
[[ ,
(10)
where v
T
is the relative tangential velocity at contact.
The eect of the friction over the dynamical system is
dened by the friction coecient R
+
, which typi
cally has a value between 0 and 1 for most materials.
The rst part of the constraint can be restated as
F = F
N
+F
T
=
n
n +
u
t
1
+
v
t
2
/,
where / is a cone in three dimensions, whose slope is
arctan().
The constraint F
T
, v
T
) = [[F
T
[[ [[v
T
[[ requires
that the tangential force be opposite to the tangential
velocity. As a result, the reaction force is dissipative.
In fact, an equivalent convenient way of expressing this
constraint is by using the maximum dissipation prin
ciple [40,38,39],
(
u
,
v
) = argmin
_
2
u
+
2
v
n
(
u
t
1
+
v
t
2
)
T
v
T
.
These constraints are represented by mapping the
vectors n, t
1
, t
2
from contact coordinates to general
ized coordinates [3].
We denote the generalized vector version of n, t
1
, t
2
by D
n
, D
u
, D
v
.
In generalized coordinates, the Coulomb model be
comes [8]
F
N
=
n
D
n
, F
T
=
u
D
u
+
v
D
v
, (11)
n
0, (q) 0,
n
(q) = 0, (12)
where the tangential multipliers
u
,
v
are determined
from the maximum dissipation principle
(
u
,
v
) = argmin
_
2
u
+
2
v
n
(
u
D
u
+
v
D
v
)
T
v.
The last relation is obtained fromthe identities D
T
u
v =
t
T
1
v
T
and D
T
v
v = t
T
2
v
T
.
2.5. The Overall dynamical model
The other dynamical data needed for the model are
the mass matrix M(q), which is symmetric positive
denite; the external force f
e
(t, q, v); and the inertial
force f
c
(q, v), containing the centrifugal and Coriolis
forces.
We can dene the total force
f
t
(t, q, v) = f
e
(t, q, v) +f
c
(q, v). (13)
Assume now that we have multiple contact con
straints
i
(q, ) 0, i (
A
and multiple bilateral
constraints
i
(q, t) = 0, i (
B
. Note that the unilat
eral condition
n
0, 0,
n
= 0 can be written
as a complementarity constraint
n
0 0.
The continuous model is a dierential variational
inequality [37]:
M(q
l
)
dv
dt
=
iGA
_
i
n
D
i
n
+
i
u
D
i
u
+
i
v
D
i
v
_
+
+
iGB
i
B
i
+f
t
(t, q, v)
q = (q, v)
i
(q, t) = 0 i (
B
i
n
0
i
(q, ) 0, i (
A
_
i
u
,
i
v
_
= argmin
i
n
_
(
i
u
)
2
+(
i
v
)
2
i (
A
v
T
_
u
D
i
u
+
v
D
i
v
_
.
(14)
Unfortunately, the introduction of the Coulomb
friction model may lead to an inconsistent model. It
is known [9] that paradoxical congurations exist for
which such a model does not have a solution in terms
of unknown accelerations and reaction forces. Such
congurations are called Painleve paradoxes [39]. Nev
ertheless, a weaker formulation of the problem can be
solved in terms of vector measures, using a nonsmooth
timestepping scheme where reaction impulses are the
unknowns at each time step [39].
To this end we dene the following stepping scheme,
with time step h, known positions q
(l)
, and velocity
v
(l)
; the scheme is an equation problem with equi
librium constraints, where the unknowns are q
(l+1)
,
v
(l+1)
, and constraint impulses
n
= h
n
,
u
= h
u
,
v
= h
v
,
B
= h
B
:
M
(l)
(v
(l+1)
v
l
) =
iGA
_
i
n
D
i
n
+
i
u
D
i
u
+
i
v
D
i
v
_
+
+
iGB
_
i
B
i
_
+ hf
t
(t
(l)
, q
(l)
, v
(l)
)(15)
0 =
1
h
i
(q
(l)
) +
i
T
v
(l+1)
+
i
t
, i (
B
(16)
0
1
h
i
(q
(l)
) +
i
T
v
(l+1)
(17)
i
n
0, i (
A
_
i
u
,
i
v
_
= argmin
i
n
(
i
u
)
2
+(
i
v
)
2
i (
A
_
v
T
(
u
D
i
u
+
v
D
i
v
)
(18)
q
(l+1)
= (q
(l)
, v
(l+1)
, h). (19)
To simplify notation, we denoted M(q
l
) by M
(l)
.
In previous work, we have shown that the scheme is
convergent, as the time step h goes to 0, to the solution
of a measure dierential inclusion [2].
For the special case of zero friction, the subproblem
simplies to (1517), that is, a linear complementarity
problem. Such problems can be solved by Lemkes al
gorithm [14,6]. Introducing the Coulomb friction (18),
however, turns the problem into a nonlinear comple
mentarity problem that poses more diculties. If the
nonlinear constraint cone (the Coulomb cone) is ap
proximated by a piecewise linear cone, the subproblem
(1518) becomes again an LCP solvable by Lemkes al
gorithm [6]. Nevertheless, in [5] we have also demon
strated that, as the number of constraints in the prob
5
lem increases, the computational cost of typical LCP
solvers increases far faster than linearly with the size of
the problem. Moreover, the approximation of friction
cones by means of faceted pyramids would introduce
unwanted anisotropy.
To overcome these diculties, we modied the time
stepping scheme by relaxing the constraint (17) as
0
1
h
i
(q
(l)
) +
i
T
v
(l+1)
i
_
(D
i,T
u
v)
2
+ (D
i,T
v
v)
2
i
n
0, i (
A
.
(20)
This results in a cone complementarity problem that
can be solved with a xedpoint iteration approach, as
demonstrated in our earlier work [8].
2.6. Cone complementarity formulation
Developing the optimality conditions for the equi
librium constraint in (18), we obtain that there exists
a Lagrange multiplier
i
such that, for any i (
A
,
i
u
= D
i,T
u
v,
i
i
v
= D
i,T
v
v,
i
0
i
i
n
_
(
i
u
)
2
+ (
i
v
)
2
0.
(21)
The rst two equations imply that
i
(
i
u
)
2
+(
i
v
)
2
=
_
(D
i,T
u
v)
2
+(D
i,T
v
v)
2
, while the complementarity con
straint implies that
0 =
i
_
(
i
u
)
2
+ (
i
v
)
2
_
i
n
_
(
i
u
)
2
+ (
i
v
)
2
_
and, in turn, that
i
n
_
(D
i,T
u
v)
2
+(D
i,T
v
v)
2
=
i
_
_
i
u
_
2
+
_
i
v
_
2
_
.
(22)
We now dene, for all potential contacts, the vectors
u
i
A
=
_
1
h
i
(q
(l)
) +
i
T
v
(l+1)
, D
i,T
u
v, D
i,T
v
v
_
T
(23)
i
A
=
_
i
n
,
i
u
,
i
v
_
T
, i (
A
. (24)
We calculate the scalar product using (20),(21):
u
i
A
,
i
A
_
=
i
n
_
1
h
i
+
i
T
v
_
+
i
u
D
i,T
u
v
+
i
v
D
i,T
v
v
=
i
i
n
_
(D
i,T
u
v)
2
+(D
i,T
v
v)
2
i
_
_
i
u
_
2
+
_
i
v
_
2
_
=0 u
i
A
i
A
. (25)
We recall that the dual cone of a convex cone / is
the set /
= /
.
We now dene the friction cone T(
i
such that the
Coulomb friction model is satised if
i
A
T(
i
:
T(
i
=
_
x, y, z R
3
[
i
x
_
y
2
+ z
2
_
.
Then, from (18), (20), and (25), the frictional contact
constraints can be expressed by means of the following
cone complementarity constraints:
u
i
A
T(
i
i
A
T(
i
, i (
A
. (26)
To obtain a unied formalism, we can represent also
the bilateral constraints (16) in terms of cone comple
mentarity constraints. Of course, multipliers
i
B
, with
i (
B
, are not restrained into some special subset of
R, but even R itself is a convex cone. Thus we can in
troduce the scalar
u
i
B
=
1
h
i
(q
(l)
) +
i
T
v
(l+1)
+
t
, i (
B
, (27)
which allows us to write the bilateral constraints (16)
as
u
i
B
B(
i
i
B
B(
i
, i (
B
, (28)
where B(
i
= R and B(
i
= 0, and
u
i
b
,
i
b
_
= 0 is
always satised for u
i
B
B(
i
.
We now dene the vector
k
(l)
= M
(l)
v
(l)
+ hf
t
(t
(l)
, q
(l)
, v
(l)
). (29)
Then, equations (29), (28), and (26), together with
(15) and the denition of u
i
A
,
i
A
, u
i
B
, and
i
B
, result
in the following problem:
M
(l)
v
(l+1)
=
iGA
_
i
n
D
i
n
+
i
u
D
i
u
+
i
v
D
i
v
_
+
+
iGB
_
i
B
i
_
+
k
(l)
,
u
i
A
T(
i
i
A
T(
i
, i (
A
u
i
B
B(
i
i
B
B(
i
, i (
B
.
(30)
If we want to obtain a cone complementarity
problem as expressed in the typical form /
i1
, 0, 0,
1
h
i2
, 0, 0, . . . ,
1
h
in
A
, 0, 0
_
T
A
=
_
i1
n
,
i1
u
,
i1
v
,
i2
n
,
i2
u
,
i2
v
, . . . ,
in
A
n
,
in
A
u
,
in
A
v
_
T
b
B
=
_
1
h
1
+
1
t
,
1
h
2
+
2
t
, . . . ,
1
h
nB
+
nB
t
_
T
B
=
_
1
B
,
2
B
, . . . ,
nB
B
_
T
u
A
=
_
u
1
T
A
, u
2
T
A
, . . . , u
n
A
T
A
_
T
, u
B
=
_
u
1
B
, u
2
B
, . . . , u
nB
B
_
T
.
(31)
It is useful to merge these vectors, joining data from
both frictional constraints and bilateral constraints,
obtaining vectors with n
E
= 3n
A
+n
B
scalar elements:
b
E
=
_
b
T
A
, b
T
B
_
T
,
E
=
_
T
A
,
T
B
_
T
, u
E
=
_
u
T
A
, u
T
B
_
T
.
(32)
For each frictional contact i (
A
we also dene the
following threecolumn matrix:
D
i
=
_
D
i
n
[D
i
u
[D
i
v
. (33)
As before, it is useful to merge all Jacobians from
both frictional constraints and bilateral constraints in
a single, large matrix having n
E
columns and m
v
rows:
D
E
=
_
D
i1
[D
i2
[ . . . [D
in
A
[
1
[
2
[ . . . [
nB
.
(34)
6
From the denitions (23),(27), (31), (32), (33), and
(34) one can see that
u
E
= D
T
E
v
(l+1)
+b
E
. (35)
Also, premultiplying by M
(l)
1
equation (15), one gets
v
(l+1)
= M
(l)
1
D
E
E
+ M
(l)
1
k. (36)
Hence it is possible to substitute (36) into (35) to ob
tain
u
E
= D
T
E
M
(l)
1
D
E
E
+ D
T
E
M
(l)
1
k +b
E
. (37)
To make the expressions more compact, we intro
duce the following:
N =D
T
E
M
(l)
1
D
E
(38)
r =D
T
E
M
(l)
1
k +b
E
(39)
In this way, we can write
u
E
= N
E
+r. (40)
Consider the multidimensional cone obtained by
performing the direct sum of all T( and B( cones
and its embedding in the corresponding vector space
direct sums:
=
_
iGA
T(
i
_
iGB
B(
i
_
. (41)
From (41), (40), (31), the complementarity relation
ship in (30), and the property of convex cones [22], =
i
i
i
i,
, we can write the problem as
a cone complementarity problem (CCP):
(N
E
+r)
E
. (42)
The separable structure of the cone will allow us
to dene an algorithm based on block matrices, with
relatively small blocks (dimension no larger than 3 in
the case of the contact problem).
2.7. Physical eects of the relaxation
In [2] we demonstrated that, for h 0, the solu
tion of the timestepping scheme with the relaxed con
straints (20) will approach the solution of the same
measure dierential inclusion as the scheme that uses
the unrelaxed constraints (17). In addition, iterates
produced by the modied scheme approach the ones of
the original scheme even for one time step at xed h,
provided that
i
i
n
_
(D
i,T
u
v)
2
+ (D
i,T
v
v)
2
<< 1, that
is, with either low friction or low tangential speed [5].
We note that this regime happens frequently in gran
ular ow applications, such as in the simulation of the
refueling of pebble bed nuclear reactors [34] and dense
packing of granular matter. On the other hand, the
good behavior of our scheme occurs even beyond this
regime, for reasons we now explain.
Note that the /h term achieves constraint stabi
lization. When the termis positive and the constraint is
active, it biases the normal impulse to be smaller than
the one at exact contact and allows for soft landing
projection onto the contact manifold. When the term
is negative, it biases the normal impulse to be larger
than the one at exact contact and allows restoration
to the contact manifold. The square root term in (20),
which can be written also as [[v
i
T
[[, does not appear in
the original model and, as discussed in [2], does result
in a larger value of the normal velocity. Therefore, it
could potentially produce a departure from the predic
tion of the original model. As we discuss below, how
ever, the constraint stabilization term substantially al
leviates this eect. Indeed, the normal speed is v
i
T
=
i
T
v, so it follows that
i
(q) =
i
[[v
i
T
[[h, v
i
N
= 0, (43)
satises the complementarity constraints (20) exactly.
Hence, a solution to (20) that involves the term
i
[[v
i
T
[[
need not result in an increase value of the normal ve
locity as long as the gap
i
(q) is about
i
[[v
i
T
[[h. All
the meaningful simulations that we have carried out
indicate that this is indeed what happens, and thus the
solution approaches the solution of the original scheme
insofar as normal velocity. Hence, the model eectively
includes a separating boundary layer of size
i
[[v
i
T
[[h
that plays the role of zeroorder eective compliance.
Smaller values of
i
(q) result in higher than steady
state impulse, whereas larger values of
i
(q) result in
smaller than steadystate impulse, both of which push
the value of the gap function to the value
i
v
i
T
. This
eect is immediately seen for one contact, which leaves
from rest with zero tangential velocity or which lands
in an inelastic fashion. This is harder to prove for multi
ple contacts, though it always held in our experiments.
There is one caveat, however, as shown in [2]. That
is, if one starts with
i
(q) <<
i
v
i
T
h, then the nor
mal component of the velocity is much larger than the
one predicted by the original scheme and then what
is needed to satisfy approximately (43) in a few steps,
so the contact suers an articial takeo. This corre
sponds to the only case in which we have seen our ar
gument fail, the one of high initial tangential velocity
at a contact. Since this case occurs rarely in practice,
our relaxation approximates in most cases the solution
that would have been obtained by the original, unre
laxed scheme.
3. The iterative method
To solve the CCP (42), we propose a xedpoint
iteration with the following form:
r+1
E
=
r
E
B
r
_
N
r
E
+r + K
r
_
r+1
E
r
E
___
+
+(1 )
r
E
,
r = 0, 1, 2, . . . .
(44)
This iteration uses block matrices B
r
and K
r
. Ma
trix B is null except for blocks on the diagonal; in our
implementation blocks that correspond to the ith fric
tional contact are scaled identity matrices B
i
A
=
i
A
I,
B
i
A
R
3
, while blocks that correspond to the ith bi
lateral constraint are scalars B
i
B
=
i
B
, B
i
B
R. Matrix
K is an upper or lower block matrix, with null blocks
on the diagonal corresponding to the blocks of B; in
7
both cases the method is similar to a projected block
GaussSeidel. Another option is to use a null K ma
trix, so that the method is similar to a projected block
GaussJacobi.
Also,
: R
nE
R
nE
is the orthogonal projection
operator, whose calculation, as we show in the subse
quent sections, is considerably simplied by the sepa
rable cone structure (41).
For convergence of the scheme we need the following
assumptions about the algorithm.
A1 The matrix N of the CCP problem is symmetric and
positive semidenite.
A2 There exists a positive number, > 0 such that, at
any iteration r, r = 0, 1, 2, . . ., we have that B
r
I.
A3 There exists a positive number > 0 such that,
at any iteration r, r = 0, 1, 2, . . ., we have that
(
r+1
r
)
T
_
(B
r
)
1
+ K
r
N
2
_
(
r+1
r
)
_
_
r+1
r
_
_
2
.
Note that the rst assumption is always assured in
multibody systems because the mass matrix is denite
positive, B
r
and are under our control for satisfying
the second assumption, and it is always possible to
adjust the free and parameters in order to satisfy
the third assumption. Our main convergence result [8,
Corollary 1] is given in Theorem 1.
Theorem 1 Assume that
0 ,=
E
N
E
,= 0
(that is, there does not exist a choice of reaction forces
whose net eect is zero; bodies do not get stuck).
Then the algorithm (44) for the cone complementarity
problem applied to (42) produces a bounded sequence,
and any accumulation point results in the same velocity
solution.
We note that, even if in [8] we have demonstrated the
use of our algorithm for contacts only, Theorem 1 ap
plies to the bilateral case as well. Nevertheless, the key
observation is that we consider the bilateral constraints
as linearized constraints for which we develop the addi
tional algorithmic machinery in this paper. The direct
application of the formulation in [8], by formulating a
bilateral constraint as two unilateral constraints will
violate the main assumption in Theorem 1. Therefore,
we had to show that direct treatment of bilateral con
traints will result in the same formal abstraction (42).
3.1. Ecient computation of the projection onto the
friction cone
Because of the separable cone structure of the convex
subspace
=
_
FC
1 (
1
A
)
T
,
FC
2 (
2
A
)
T
, . . . ,
FC
n
A(
nA
A
)
T
,
BC
1(
2
B
)
T
,
BC
2 (
2
B
)
T
, . . .
BC
n
B (
nB
B
)
T
_
T
,
where
FC
i () : R
3
R
3
with i (
A
, and
BC
i () :
R R with i (
B
. The projection operator must
behave as
() = argmin
[[ [[ in order to be
FC
i
A
FC
(
A
)
n nn n
v
u
n
r
t tt t
1
t tt t
2
t tt t
g
t tt t
e
FC
i o oo o
A
f
f
n
n
r
<
n
r
<(1/)
n
O
B
A
C
D
E
t tt t
g
t tt t
e
FC
i
FC
(
A
)
FC
i o oo o
Fig. 2. A single friction cone and its dual cone, with examples
of projections.
globally nonexpansive for projection on convex sub
space
i
.
For the multipliers introduced by friction con
straints, it is enough to introduce a straightforward
mapping
FC
i () : R
3
R
3
to be applied multiple
times on all the triplets of contact multipliers
i
A
, i
(
A
. In Fig. 2 three subcases are detected:
when
i
A
is inside T(
i
, the vector is left untouched;
when
i
A
is inside the T(
i
o
polar cone, it maps to
the origin 0, 0, 0;
when
i
A
is projected to the nearest point on the
T(
i
cone.
Introducing
r
=
_
2
u
+
2
v
, applying Pythagorass
theoremon triangle OCA, and using similarity between
triangles OCA and OCD, we easily get
n
=
r
+
n
2
+ 1
,
r
=
n
,
u
=
u
r
,
v
=
v
r
.
(45)
One can verify that, i (
A
, such a mapping ex
hibits [[
FC
i (a)
FC
i (b)[[ [[a b[[; hence the
nonextensibility property holds.
Including also the case of bilateral constraints B, the
projection operator becomes
8
_
i (
B
BC
i =
i
B
i (
A
r
<
i
n
FC
i =
i
A
r
<
1
n
FC
i = 0, 0, 0
T
r
>
i
n
,
r
>
1
n
FC
i
n
=
r
i
+
n
2
i
+ 1
FC
i
u
=
u
FC
i
n
FC
i
v
=
v
FC
i
n
r
.
(46)
4. Implementation
The CCP method proposed here can be applied to
the simulation of multibody systems with a large num
ber of parts and contacts because, where an upper limit
on the number of iteration is enforced, the iteration
(44) can run in O(n) space and O(n) time.
Previous sections showed that generic multibody
problems with frictional contacts, expressed with the
system (15)(19), embed the cone complementarity
problem (42), which can be solved by the iterative
method (44).
Given (31), one can consider the nal timestepping
scheme as a sequence of three main operations: a CCP
problem that nds unknown reactions
E
(47a), an
ane scaling (47b) that gives the new speeds v
(l+1)
,
and a position update (47c):
(N
E
+r)
o
E
(47a)
v
(l+1)
= M
1
_
k + D
E
E
_
(47b)
q
(l+1)
= (q
(l)
, v
(l+1)
, h). (47c)
The biggest computational overhead is caused by
the rst problem, that is, the CCP (47a). In fact, (47c)
is immediate, and (47b) can be computed quickly be
cause in most cases the matrix M is diagonal and its
inverse M
1
can be precomputed easily.
The convergence theory about the iterative scheme
(44) leaves some degrees of freedom in choosing
i
A
,
i
B
values that build the diagonal blocks of the iteration
matrix B. A trivial choice could be to use the same
i
A
=
i
B
= value for all diagonal blocks, that is,
B = I, and then use the overrelaxation parameter
to control the convergence. However, this may slow
convergence in systems with large mass ratios, even
with an optimal . A more practical approach, which
copes better with systems aected by uneven masses, is
inspired by the GaussJacobi idea of using the inverse
of the diagonal of the system matrix N, so we use
i
B
=
1
i,T
M
1
i
and
i
A
=
1
gi
, where g
i
is the average of
the diagonal values of the ith block of the N matrix.
We note that g
i
can be computed easily from the trace
of the 3 3 matrix D
i,T
M
1
D
i
, as
g
i
=
Trace(D
i,T
M
1
D
i
)
3
. (48)
We recall that the matrix N is a product of large
matrices; N = D
T
E
M
1
D
E
, and it is full even if D
and M are sparse. For systems with a large number of
contacts, the size of N would be prohibitive and clearly
would not satisfy the goal of O(n) space complexity. To
this end, direct multiplication of vectors and matrices
in (44) must be avoided; otherwise the eort and the
space requirement would be superlinear in the number
of constraint.
For the reasons above, a scheme that does not need
the explicit building of N, B, and K has been devel
oped, exploiting the sparsity of M and D.
The K matrix in (44) can be chosen freely, within
the convergence limits posed by assumptions [A1]
[A3]. Among the most noticeable options, we have the
case K = 0, which results in a scheme like a projected
GaussJacobi, or the case where K is built by using
the lower blocks of N.
4.1. Optimized projected GaussJacobi CCP
Considering the case K = 0, and recalling Eq. (38),
we can express the rth step of the iteration (44) as an
inner loop with index i = 1 . . . n
A
on all n
A
friction
cones T(
i
:
i,r+1
A
=
i,r
A
i
A
_
D
i,T
M
1
_
nA
z=1
D
z
z,r
A
+
+
nB
z=1
z,r
B
+
k
_
+b
i
A
_
(49)
i,r+1
A
=
FC
i
_
i,r+1
A
_
+ (1 )
i,r
A
, (50)
followed by an inner loop with index i = 1 . . . n
B
on
all n
B
bilateral constraints (which we do not report
because it is like (49)(50) except for
B
instead of
A
subscripts).
However, for each iteration, the previous loop i =
1 . . . n
A
would require quadratic time in terms of po
tential contacts n
A
because of the presence of the sum
mations
nA
z=1
. This major source of slow performance
can be eliminated if one computes the algorithm in in
cremental form. In fact, from (36) it follows that
v
r
= M
1
_
nA
z=1
D
z
z,r
A
+
nB
z=1
z,r
B
+
k
_
. (51)
Substituting (51) into (49), we can write
i,r+1
A
=
i,r
A
i
A
_
D
i,T
v
r
+b
i
A
.
_
(52)
Considering the optimizations above, we can express
the nal CCP algorithm with the pseudocode of Algo
rithm 1.
In the proposed algorithm, for achieving high per
formance, some auxiliary data can be precomputed be
fore starting the iteration. Specically, we introduce
the m
v
3 matrix E
i
A
= M
1
D
i
and the vector E
i
B
=
M
1
i
.
The iterations, usually stopped when an approxi
mation threshold has been reached, can be also pre
maturely aborted when r exceeds a limit r
max
on the
9
Algorithm 1: Solve complementarity  PGJ CCP
(1) // Precompute some data for friction constraints
(2) for i := 1 to n
A
(3) E
i
A
= M
1
D
i
(4)
i
A
=
3
Trace(D
i,T
E
i
A
)
(5) // Precompute some data for bilateral constraints
(6) for i := 1 to n
B
(7) E
i
B
= M
1
i
(8)
i
B
=
1
i,T
E
i
B
(9)
(10) // Initialize impulses
(11) if warm start with initial guess
E
(12)
0
E
=
E
(13) else
(14)
0
E
= 0
(15)
(16) // Initialize speeds
(17) v
0
=
n
A
i=1
E
i
A
i,0
A
+
n
B
i=1
E
i
B
i,0
B
+M
1
k
(18)
(19) // Main iteration loop
(20) for r := 0 to rmax
(21) // Loop on frictional constraints
(22) for i := 1 to n
A
(23)
i,r+1
A
=
_
i,r
A
i
A
_
D
i,T
v
r
+b
i
A
__
;
(24)
i,r+1
A
=
i,r+1
A
_
+ (1 )
i,r
A
;
(25) // Loop on bilateral constraints
(26) for i := 1 to n
B
(27)
i,r+1
B
=
_
i,r
B
i
B
_
i,T
v
r
+b
i
B
__
;
(28)
i,r+1
B
=
i,r+1
B
_
+ (1 )
i,r
B
;
(29) // Update speeds
(30) v
r+1
=
n
A
i=1
E
i
A
i,r+1
A
+
n
B
i=1
E
i
B
i,r+1
B
+
M
1
k
(31)
(32) return
E
, v
maximum number of iterations if the simulation must
meet hardrealtime requirements.
With minimal modications to the
() operator,
the proposed method can be easily adapted to the case
of friction in 2D or the case of generic unilateral con
straints.
In our simulations, we chose = 1 and = 1, except
for the K = 0 case, where we used = 0.2. We cannot
guarantee a priori that this will satisfy condition [A3],
but it did for all our simulations. In addition, the ma
trix sequences K
r
and B
r
were constant. We can there
fore claim that Theorem 1 does apply and, since the
sequence did not diverge, any accumulation point is a
solution of the cone complementarity problem (47a).
In addition, our proofs of the theoretical results allow
for similar conclusions if varies from iteration to it
eration. Therefore, we could ensure that at some iter
ation the appropriate is chosen after decreasing its
value a few times until assumption [A3] holds. It can
be shown that if the value of is halved each time [A3]
does not hold and the respective iteration is rejected,
then [A3] will eventually be satised after a nite num
ber of steps. In our experiments, however, the values
we have chosen for and have worked for all itera
tions without need of further adjustment.
4.2. Optimized projected GaussSeidel CCP
Another option is to take K as the lower block struc
ture of N, which results in a scheme similar to a pro
Algorithm 2: Solve complementarity  PGS CCP
(1) // Precompute some data for friction constraints
(2) for i := 1 to n
A
(3) E
i
A
= M
1
D
i
(4)
i
A
=
3
Trace(D
i,T
E
i
A
)
(5) // Precompute some data for bilateral constraints
(6) for i := 1 to n
B
(7) E
i
B
= M
1
i
(8)
i
B
=
1
i,T
E
i
B
(9)
(10) // Initialize impulses
(11) if warm start with initial guess
E
(12)
0
E
=
E
(13) else
(14)
0
E
= 0
(15)
(16) // Initialize speeds
(17) v =
n
A
i=1
E
i
A
i,0
A
+
n
B
i=1
E
i
B
i,0
B
+M
1
k
(18)
(19) // Main iteration loop
(20) for r := 0 to rmax
(21) // Loop on frictional constraints
(22) for i := 1 to n
A
(23)
i,r+1
A
=
_
i,r
A
i
A
_
D
i,T
v
r
+b
i
A
__
;
(24)
i,r+1
A
=
i,r+1
A
_
+ (1 )
i,r
A
;
(25)
i,r+1
A
=
i,r+1
A
i,r
A
;
(26) v := v +E
i
A
i,r+1
A
.
(27) // Loop on bilateral constraints
(28) for i := 1 to n
B
(29)
i,r+1
B
=
_
i,r
B
i
B
_
i,T
v
r
+b
i
B
__
;
(30)
i,r+1
B
=
i,r+1
B
_
+ (1 )
i,r
B
;
(31)
i,r+1
B
=
i,r+1
B
i,r
B
;
(32) v := v +E
i
B
i,r+1
B
.
(33)
(34) return
E
, v
jected GaussSeidel. The K matrix does not need to be
explicitly built: its eect is that, as soon as computed,
a reaction impulse
i
will be used also for computing
the following
i+1
impulse, and so on for all i, without
needing to nish a single iteration. In practical terms
this means that the
nA
z=1
D
z
z,r
A
termin Eqs. (49) and
(51) is split in
i1
z=1
D
z
z,r+1
A
+
nA
z=i
D
z
z,r
A
, and the
same for bilateral constraints; so the dierence from
Algorithm 1 is that after the update of a single multi
plier we immediately update v
(l+1)
, as shown in Algo
rithm 2. Note that, to update the speeds, we avoid the
full summation (36) and add only the contributions
caused by the change of the single multiplier after the
projection (50):
i,r+1
A
=
i,r+1
A
i,r
A
; (53)
v
(l+1),i+1
=v
(l+1),i
+ M
1
D
i
i,r+1
A
. (54)
Hence, the computational overhead is not dierent
from Algorithm 1.
Numerical tests show that this scheme converges
faster than the case of K = 0; moreover, K = 0 is
more prone to slow convergence in case of redundant
constraints.
3
3
Redundancy in constraints is frequent in simulations of de
generate contact situations, such as at surface against at sur
face, where the collision engine may create a large amount of
superuous contact points.
10
i,T
=
E E E E
i,T
=
b
i
=
B
i
=
i
=
B
B
B
i,T
{ A
i,T
{ B
Pointer to rigid body A
Pointer to rigid body B
ith bilateral constraint
D
i,T
=
E
i,T
=
A
b bb b
i
=
A
i
=
i
=
A
A
i
=
Pointer to rigid body A
Pointer to rigid body B
ith frictional contact
M
j
= f f f f
j
= v v v v
j
=
x x x x
j
j
. .. .
l
{
{
jth rigid body
Fig. 3. Sparse data structures used by the solver.
5. Optimizations and improvements to the
method
The proposed method can be further developed
by introducing some algorithmic and theoretical im
provements, obtaining dierent avours of the original
scheme. This section discusses the most signicant
optimizations.
5.1. Transient data structures
Figure 3 shows how the multibody model is repre
sented by structures that are placed in memory. This
transient data can be allocated on the heap during
runtime, creating lists with unlimited numbers of con
straints and rigid bodies.
Basically, each object that builds up the lists of bi
lateral constraints encapsulates the pointers to the two
connected bodies, the Lagrange multiplier
i
B
, the con
straint residual b
i
B
, the scalar value
i
B
, and the Jaco
bian
i
.
The Jacobian should be a long row with m
v
ele
ments, but this would mean that the storage require
ment for each constraint depends on the number of
bodies, hence leading to an algorithm with superlin
ear space complexity. Instead, we store only the two
small portions of
i
that are not null, that is, the
two parts
i
A
and
i
B
corresponding to the two
connected bodies, hence requiring only a xed num
ber of 12 elements per Jacobian. When the
i,T
v
r
multiplication must be performed, it is computed as
i,T
A
v
r
A
+
i,T
B
v
r
B
. Similar considerations apply to
the data structure for frictional contacts, except that
also
i
coecients must be stored and that Jacobians
are made of two larger blocks, each with three rows.
Thanks to this major optimization, the memory re
quirement per constraint and per contact is constant,
so we obtain an optimal O(n
E
) space complexity.
4
Looking at Algorithms 1 and 2, one can see that the
solver can operate directly on these structures. Hence,
no temporary matrices are necessary. Our method is
thus a matrixfree method.
5.2. Stabilization factor
The stabilization terms
1
h
i
(q
(l)
) and
1
h
i
(q
(l)
) in
(16) and (17) are used to avoid constraint drifting dur
ing the time integration [4]. In fact, if these terms were
missing, equations (16) and (17) would simply enforce
the closure of constraints at the speed level, but errors
might slowly accumulate in constraint positions after
several integration steps.
5
.
If the model were linear (that is, the matrix D
E
were
constant in space, and, implicitly, in time), these terms
would close constraint gaps, if any, in a single step.
We experienced that, for bilateral constraints, this ap
proach works well even if the time integration step h is
small. However, this is not always the case when deal
ing with unilateral constraints, for the reason that fol
lows. Because of numerical issues and nonlinearities,
it is not possible to avoid some amount of penetration
in contact constraints. For a contact with penetration,
the term
1
h
i
(q
(l)
) has a negative value. Hence, the in
equality (17) requires that
i
v
(l+1)
(the speed of de
tachment of the contact) be large enough to bring the
two surfaces at zero distance in a single step h. After
the time integration step, at the next step the two sur
faces might keep the
1
h
i
(q
(l)
) separation speed, hence
causing a bouncy behavior. Apart from being not pre
dictable, this side eect gets worse if small timesteps
h are used, because even the smallest interpenetration
might cause macroscopic eects that can be perceived
as bouncy motions even in contacts that should have
no restitution.
We investigated dierent strategies to improve the
original stabilization method. One idea was to scale
the terms by K
i
factors, with 0 < K
i
< 1:
K
i
h
i
(q
(l)
) ,
K
i
h
i
(q
(l)
).
This diminishes, but does not exclude, the risk of ex
ceeding the interpenetration correction in a single step.
Moreover, it has the drawback of creating articial soft
ness in stacked objects, since contact stabilization is
somewhat delayed in successive frames.
A second approach, which we successfully used in
many tests, involves using a clamping A as the maxi
mum orthogonal speed for penetration recovery. This
does not eliminate the risk of popping out from an in
tersecting contact, but at least the residual speed of
separation, if any, is often negligible (comparable with
4
We experienced that in many cases of granular ow simula
tions, the number of contacts tends to scale linearly with the
number of rigid bodies n. Hence, under those circumstances
the algorithm shows approximate linear space complexity O(n)
also in the number of bodies.
5
This would happen either because of the numerical integra
tion, whose nite precision cannot catch all geometric nonlin
earities, or because of numerical truncation and errors.
11
the parameter A, which can be adjusted by the user)
and independent of the time step h. Thus, the stabi
lization terms would become
1
h
i
(q
(l)
) , max
_
1
h
i
(q
(l)
), A
_
.
Note that in this case, if the contact surfaces are sep
arated, we have
i
> 0 and the A clamping has no
eect; thus it behaves as the original scheme with the
1
h
i
(q
(l)
) term.
Until now we discussed how the stabilization term
can correct the contact penetration errors when the
surface distance is negative. However, it also has a use
ful side eect also when surfaces are not yet in con
tact. In fact, Eq. (17) can be interpreted as follows: the
contact constraint is enforced only if the two surfaces
are approaching fast enough to close the positive gap
i
(q
(l)
) in a h time step. This allows us to include in
the multibody system also contact constraints that are
not yet in contact but simply within a warning enve
lope; later, the cone complementarity solver will do
the rest.
The third approach is more expensive in terms of
CPU eort, because it avoids constraint drifting by
solving an additional complementarity problem over
body positions, at each time step [12]. In this case, one
solves the speed CCP problem using no stabilization
terms on bilateral constraints, except for contacts that
are not yet in contact, where it is useful to have
max
_
1
h
i
(q
(l)
), 0
_
,
so that contacts with clearance are still allowed to ap
proach until contact, because of positive
1
h
i
, while
surfaces already in contact are forced to have a separa
tion speed greater than or equal to 0, regardless of the
amount of penetration. Later, after the time step ad
vancement, one performs the following poststabiliza
tion step, that is a a linear complementarity problem
that corrects the positions q:
Mq =
iGA
_
i
n
D
i
n
_
+
iGB
_
i
b
i
_
(55)
0 =
i
(q
(l)
) +
i
T
q, i (
B
(56)
0
i
(q
(l)
) +
i
T
q
i
n
0, i (
A
(57)
This problem can be solved by using the same solver
used for the speed CCP problem. The poststabilization
idea performs better in the case of very large penetra
tions and illposed initial conditions.
In Fig. 4 a benchmark shows that the eort in reduc
ing the maximum penetration error [[
q
[[
with post
stabilization iterations is almost the same as that of
the basic approach using only the stabilization coe
cient in the speed CCP. However some iterations on
the speed CCP still must be performed. For instance,
in our complex simulations involving dense granular
ow, the number of iterations of the poststabilization
is comparable to the number of iterations for the speed
CCP, so computational eorts are almost doubled.
A family of methods can be generated by balancing
the total overhead toward the speed CCP iterations
20 30 40 50 60 70 80
20
30
40
50
60
70
80
0
.0
0
2
0
.0
0
2
0.004
0.004
0
.0
0
6 0
.0
0
8 0
.0
1
0
.0
1
2
0.001
post stabilization iterations
s
p
e
e
d
i
t
e
r
a
t
i
o
n
s

q

WU
Fig. 4. Penetration error: tradeo between speed iterations
and poststabilization iterations, compared to WU overhead
(GPU working units).
or the poststabilization iterations. In the extreme case
where no iterations for the speed CCP are performed
and iterations are performed only for the poststabi
lization, this method becomes similar to the position
based dynamics algorithm, proposed in [27] as a robust
method for realtime simulations.
Note that in this section we did not discuss the case
of collisions with restitution and we assumed the sim
ulation of contacts with fully plastic collisions. The
reader interested in modications to the method, for
simulating also of the restitution phase, can read [1].
5.3. Parallelization of the algorithm
Algorithm 2 is inherently sequential, so Algorithm 1
can be easier to implement on computational architec
tures exploiting parallel processing because it does not
feature data dependency for most of the inner loop.
In Algorithm 1, loops at rows 23 and 27 can be easily
executed in parallel because they do not need to write
at the same memory address at the same time. In the
best scenario, one could run up to n
A
+n
B
independent
threads, each performing an update of its multiplier .
However, the sums at row 30 are more critical if par
allelized in the same way, that is, on the basis of a
thread per contact, since there is the risk that multi
ple threads will need to update the same element of
the speed vector at the same time (this situation hap
pens, for instance, if two contacts that refer to the same
body are processed simultaneously). This problem can
be solved in many ways, for example, by using a vec
tor of Boolean mutexes, one per rigid body, which can
prevent one thread to modify the speed of a rigid body
if some other thread is writing into it. Otherwise, if the
number of physical threads is very large as in GPU ar
chitectures, it is worth performing the sums in 30 using
parallelreduction algorithms.
We implemented this parallel algorithm on both a
multicore processor and on an NVIDIA Tesla C870
GPU board featuring a massively multithreaded archi
tecture, thanks to a stream processor that is capable of
processing hundreds of threads in parallel. In the lat
ter case, we experienced a speedup of 15 times respect
to the serial implementation [44].
12
6. Examples
We present two benchmarks that we used to test the
performance of our algorithm and an application to
the simulation of the refueling of a nuclear reactor.
6.1. Forklift truck simulator
As a complex mechanical system, a forklift truck
represents a signicative benchmark for our algorithm
because it entails most aspects that we discussed in
this article: rigid bodies, bilateral constraints, applied
forces, and unilateral contact constraints with friction.
In detail, our truck model is made of seven rigid bodies
(the frame, three wheels, the steering strut, the mast,
the carriage with the forks) connected by six revolute
and prismatic joints. The tilting of the mast, the lifting
of the carriage, and the steering are obtained by intro
ducing rheonomic bilateral constraints. This is a sim
plied approach, but more detailed models of the ac
tuators, with hydraulic components and feedback con
trollers, are also possible within this framework. Mo
tors and brakes provide torques to the wheels; motors,
brakes, and actuators can be controlled in real time by
the user, using a joystick or a keyboard, or by using
automatic procedures.
In order to also test frictional constraints, a seventh
rigid body is added: a wooden pallet of EUR/ISO1
type: the truck can pick and move such a pallet with
the forks. Specic collision shapes have been dened to
detect the contact points between the pallets, the en
vironment, and the truck. Collision shapes have been
used also for frame of the forklift and its overhead
guard, so that we can simulate the rollover of the ve
hicle and other hazardous events.
On our test system, a dualcore Centrino T2600 2.17
GHz with 2 GBof RAM, the proposed algorithmis able
to simulate the truck in faster than real time, using a
xed timestep h = 0.005 s.
To assess the eciency and capability of our matrix
less approach, we simulated an increasing number of
forklift trucks, up to 1,600 vehicles with 1,600 pal
lets, see Fig.5. In the largest simulation scenario, with
12,800 rigid bodies, the algorithm must handle 54,400
bilateral constraints and, on average, 19,000 frictional
contacts: this means more than 110,000 primal and
dual variables.
We used Algorithm 1 with a limit of 20 iterations.
Table 1, averaged over 100 steps, shows that the CPU
overheadgrows almost linearly with the number of dual
variables 3n
A
+ n
B
, that is, somehow proportional to
the number of bodies.
Note that the frame rate in the case of hundreds of
trucks, although not real time as for few trucks, is still
fast and interactive.
Increasing the number of iterations results in an im
provement of the precision: going from 20 iterations to
80 iterations, the largest errors in constraint position
and in constraint velocity decrease, respectively, from
0.0310 mm and 0.024 m/s to 0.006 mm and 0.002 m/s.
These results about precision are not dependent on
Table 1
Timestep performance.
No. of Trucks CCP Solve [ms] Collision [ms] Bodies 3n
A
+n
B
1 0.22 0.1 8 70
400 135 39 3200 28000
800 268 79 6400 56000
1200 396 130 9600 84000
1600 550 200 12800 112000
the number of trucks, because the convergence of the
solver is not aected by the increasing complexity of
this type of benchmark, where the dynamics of each
vehicle is uncoupled. On the other hand, our results in
dicate the eciency of our algorithms, which is due to
our customization to rigidbody dynamics structure.
For example, the most ecient otheshelf algorithms
in [32], which are either of either the interiorpoint or
the projected gradient type, still need around 20 s per
time step for about 10,000 dual variables. Those algo
rithms solve the same problem as here. If linear scaling
would hold for those methods a big assumption in
their favor our algorithm will still be more than 100
times faster.
How precision and convergence can be aected by
systems with more stringent topology is investigated
in the following example.
6.2. Dense granular packing benchmark
Dense stacking of granular material is one of the
hardest problems involving nonsmooth rigid body dy
namics. Indeed, the benchmark described in this sec
tion involves the simulation of the progressive stacking
of many convex shapes in an empty box, with dierent
settings.
The rigid bodies have a mass m = 10 kg, the mo
ments of inertia are I
xx
= I
yy
= I
zz
= 10.24 kgm
2
,
and the friction coecient is = 0.4. The horizontal
section of the box measures 20 m20 m. We performed
tests with dierent types of colliding shapes, but for
the graphs presented here we used spheres with a ra
dius r = 1.6 m. At the beginning of the simulation the
box is lled by 220 spheres, with randomized positions
and with an initial volume fraction of 0.4, on average.
After the spheres have settled, we obtained the plots.
Figures 6 and 7 show a typical convergence pattern
of the PGS CCP algorithm:
v
is the violation of the
constraints at the speed level. For increasing , the iter
ation shows a faster convergence. However, large val
ues may lead to nonmonotonic behavior (that does not
necessarily lead to divergence). We found that a good
tradeo between convergence speed and a monotonic
nondivergent iteration, for scenarios involving equally
sized spheres, is = 1.
Figure 8 shows that the parameter acts like a
smooting factor over the iteration. The higher the pa
rameter the slower is the convergence, but the lower
is the risk of nonmonotonic or divergent patterns. The
optimal value depends on the type of simulation; we
experienced that, on average, good default values can
13
Fig. 5. Benchmark: simulation of thousands of forklifts.
0 10 20 30 40 50 60 70 80
10
1
iterations


v


=0.2
=1.4
<1.0
=1.0
>1.0
Fig. 6. Convergence of the residual for varying , for a sample
time step in the 220sphere benchmark.
0 10 20 30 40 50 60 70 80
10
1
iterations




=0.2
=1.4
<1.0
=1.0
>1.0
Fig. 7. Convergence of for varying , for a sample time
step in the 220sphere benchmark.
0 10 20 30 40 50 60 70 80
10
1.6
10
1.4
10
1.2
=0.2
=1.0
iterations


v


v


rmax = 20
rmax = 40
rmax = 60
rmax = 80
rmax = 100
Fig. 9. Eect of the vertical size of the stack on the conver
gence, for varying number of iterations.
0 10 20 30 40 50 60 70 80
10
1
iterations


v


PGS, =0.2
PGS, =0.4
PGS, =0.6
PGJ, =0.2
PGJ, =0.4
PGJ, =0.6
Fig. 10. Comparison of the convergence of the PGS and PGJ
algorithms during the granular stacking benchmark, and ex
ample of divergence.
1 2 3 4 5 6 7 8 9 10
x 10
3
20
30
40
50
60
70
80
0
.
0
0
0
2
0
.
0
0
0
2
0
.
0
0
0
2
0
.
0
0
0
4
0
.
0
0
0
4
0
.
0
0
0
4
0
.
0
0
0
6
0
.
0
0
0
6
0
.
0
0
0
8
0
.
0
0
0
8
0
.0
0
1
0
.0
0
1
0
.0
0
1
2
0
.0
0
1
4
0
.0
0
1
6
0
.0
0
1
8
h timestep [s]
i
t
e
r
a
t
i
o
n
s
1
e
0
0
5
1
e
0
0
5
1
e
0
0
5
5
e
0
0
5
5
e
0
0
5
5
e
0
0
5
0
.
0
0
0
1
0
.
0
0
0
1
0
.
0
0
0
1

q
