Abstract— Impedance control is one of the most effective methods for controlling the interaction between a robotic manipulator and its environment. It can regulate the dynamic properties of the manipulator's end-effector through the mechanical impedance parameters and the desired trajectory. In general, however, these are difficult to determine because they must be designed according to the task. In this paper, we propose a new on-line learning method using neural networks to regulate robot impedance while identifying the characteristics of the task environment by means of four kinds of neural networks: three for the position, velocity, and force control of the end-effector, and one for the identification of the environment. The validity of the proposed method is demonstrated via a set of computer simulations of a contact task performed by a multi-joint robotic manipulator.

I. INTRODUCTION

When a manipulator performs a task in contact with its environment, both position and force control are required because of the constraints imposed by the environmental conditions. For such contact tasks, the impedance control method [1] is one of the most effective ways of controlling a manipulator in contact with its environment. This method realizes the desired dynamic properties of the end-effector through appropriate impedance parameters together with the desired trajectory. In general, however, it is difficult to design these according to the task and its environment, which may include nonlinear or time-varying factors.

There have been many studies based on optimization techniques with task-dependent objective functions. Such methods can adapt the desired trajectory of the end-effector to the task, but it remains unclear how to design the desired impedance parameters. Moreover, these methods cannot be applied to contact tasks in which the characteristics of the environment are nonlinear or unknown. To address this problem, several methods using neural networks (NNs) have been proposed that regulate robot impedance properties through NN learning while accounting for model uncertainties in the manipulator dynamics and its environment. Most such methods assume that the desired impedance parameters are given in advance, while several others try to obtain the desired impedance of the end-effector by regulating the impedance parameters as well as the reference trajectory of the end-effector according to the task and environmental conditions. However, there is no effective method for regulating impedance parameters that can be applied when the environmental conditions change during task execution.

In this paper, a new on-line learning method using NNs is proposed to regulate all impedance parameters as well as the desired trajectory at the same time. The paper is organized as follows: Section II describes related work on the impedance control method. The proposed learning method using NNs is then explained in Sections III and IV. Finally, the effectiveness of the proposed method is verified through a set of computer simulations of a task that includes transitions from free to contact movements and modeling errors of the environment.

II. RELATED WORKS

Asada [2] showed that a nonlinear viscosity of the end-effector could be realized by using a NN model as a force feedback controller. Cohen and Flash [3] proposed a method to regulate the stiffness and viscosity of the end-effector. Yang and Asada [4] proposed a progressive learning method that obtains the desired impedance parameters by modifying the desired velocity trajectory. Tsuji et al. [5], [6] then proposed iterative learning methods using NNs that can regulate all impedance parameters and the desired end-point trajectory at the same time; their method realizes smooth transitions of the end-effector from free to contact movements. Venkataraman et al. [7] developed an on-line learning method using a NN for the compliant control of space robots along a given target trajectory of the end-effector, identifying unknown environments with a nonlinear viscoelastic model.

The previous methods [2], [3], [7] cannot deal with a contact task that includes free movements, while the methods of [4], [5], [6] apply only to cyclical tasks in which the environmental conditions are constant, because the learning is conducted off-line. To make a robot perform realistic tasks in a general environment, the present paper develops a new method in which the robot copes with an unknown task by regulating the control properties of its movements in real time according to changes in the environmental circumstances, including nonlinear and uncertain factors.
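As background for the discussion above, the target relation that impedance control imposes between the end-effector displacement and the contact force can be sketched numerically. This is a minimal illustration, not the paper's controller: the matrices Me, Be, Ke, the function name, and the explicit Euler integration are our own arbitrary choices.

```python
import numpy as np

# Minimal sketch (not the paper's controller): impedance control [1] makes the
# end-effector displacement error dX = X - Xd obey the target dynamics
#     Me dX_ddot + Be dX_dot + Ke dX = Fc.
# Parameter values and the explicit Euler integration are arbitrary choices.

Me = np.diag([1.0, 1.0])      # desired inertia
Be = np.diag([20.0, 20.0])    # desired viscosity
Ke = np.diag([100.0, 100.0])  # desired stiffness

def impedance_step(dX, dX_dot, Fc, dt=0.001):
    """Advance (dX, dX_dot) by one sampling interval under external force Fc."""
    dX_ddot = np.linalg.solve(Me, Fc - Be @ dX_dot - Ke @ dX)
    return dX + dt * dX_dot, dX_dot + dt * dX_ddot
```

Under a constant contact force the displacement settles at Ke⁻¹Fc, which is how the stiffness parameter shapes the steady-state interaction.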
Fig. 2. Proposed impedance control system including three neural networks

Fig. 3. The structure of the tracking control part using neural networks

trajectory during free movements. The FCN is then trained, with the PCN and VCN fixed, during contact movements, where the desired trajectory Xd is also regulated to reduce the errors of the end-point force as much as possible.

B. Learning During Free Movements

Figure 3 shows the detailed structure of the tracking control part using the PCN and VCN in the proposed impedance control system. The force control input Fact during free movements is defined by

Fact = Ft + Ẍd = Fp + Fv + Ẍd = [o_p1ᵀ; o_p2ᵀ; … ; o_plᵀ] dX + [o_v1ᵀ; o_v2ᵀ; … ; o_vlᵀ] dẊ + Ẍd,   (13)

where Fp, Fv ∈ ℝˡ are the control vectors computed with the outputs of the PCN and VCN, respectively.

The energy function for the learning of the PCN and VCN is given by

Et(t) = (1/2) dX(t)ᵀ dX(t) + (1/2) dẊ(t)ᵀ dẊ(t).   (14)

The synaptic weights in the PCN, w_ij^(p), and the VCN, w_ij^(v), are modified in the direction of the gradient descent reducing Et as

Δw_ij^(p)(t) = −ηp ∂Et(t)/∂w_ij^(p)(t),   (15)

Δw_ij^(v)(t) = −ηv ∂Et(t)/∂w_ij^(v)(t),   (16)

∂Et(t)/∂w_ij^(p)(t) = [∂Et(t)/∂X(t)] [∂X(t)/∂Fp(t)] [∂Fp(t)/∂Op(t)] [∂Op(t)/∂w_ij^(p)(t)],   (17)

∂Et(t)/∂w_ij^(v)(t) = [∂Et(t)/∂Ẋ(t)] [∂Ẋ(t)/∂Fv(t)] [∂Fv(t)/∂Ov(t)] [∂Ov(t)/∂w_ij^(v)(t)],   (18)

where ηp and ηv are the learning rates for the PCN and VCN, respectively. The partial derivatives ∂Et(t)/∂X(t), ∂Et(t)/∂Ẋ(t), ∂Fp(t)/∂Op(t), and ∂Fv(t)/∂Ov(t) can be derived from (13) and (14), while ∂Op(t)/∂w_ij^(p)(t) and ∂Ov(t)/∂w_ij^(v)(t) can be obtained by the error back propagation method [8]. However, ∂X(t)/∂Fp(t) and ∂Ẋ(t)/∂Fv(t) cannot be computed directly because of the dynamics of the manipulator. To deal with this computational problem, ∂X(t)/∂Fp(t) and ∂Ẋ(t)/∂Fv(t) are approximated by finite variations [9] as follows:

∂X(t)/∂Fp(t) ≈ Δts² I,   (19)

∂Ẋ(t)/∂Fv(t) ≈ Δts I,   (20)

where Δts is the sampling interval, and I is the l-dimensional unit matrix.

When the learning is finished, the trained PCN and VCN may express the optimal impedance parameters Me⁻¹Ke and Me⁻¹Be as the output values of the networks Op(t) and Ov(t), respectively.

Fig. 4. Identification of the environment using EIN

C. Identification of Environments by NN

In the proposed learning method for contact movements, the FCN is trained using the positional error of the end-effector estimated from the force error information, without any positional information on the environment. Consequently, an environment model F̂c needs to be identified to generate the inputs to the FCN:

F̂c = ĝ(dXo, dẊo, dẌo, t).   (21)
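The chain-rule update (15), (17) with the finite-variation approximation (19) can be sketched as follows. This is an illustrative fragment, not the authors' code: the Jacobians dFp_dOp and dOp_dw stand in for quantities that the network structure and error back propagation [8] would supply.

```python
import numpy as np

# Illustrative sketch of one PCN weight update following (15), (17), (19).
# dFp_dOp and dOp_dw are stand-in assumptions for quantities that error back
# propagation would provide; this is not the paper's implementation.

dt_s = 0.001       # sampling interval Delta t_s
eta_p = 13000.0    # PCN learning rate (value used in Section V)

def pcn_weight_update(dX, dFp_dOp, dOp_dw):
    """Return the weight change for one gradient-descent step.

    dX       : (l,)     position error, so dEt/dX = dX by (14)
    dFp_dOp  : (l, m)   Jacobian of the control vector w.r.t. the NN outputs
    dOp_dw   : (m, n)   Jacobian of the NN outputs w.r.t. the weights
    """
    dEt_dX = dX                                # from (14)
    dX_dFp = dt_s**2 * np.eye(len(dX))         # finite-variation approx. (19)
    grad = dEt_dX @ dX_dFp @ dFp_dOp @ dOp_dw  # chain rule (17)
    return -eta_p * grad                       # gradient descent (15)
```

The VCN update (16), (18) has the same shape, with dẊ in place of dX, the approximation Δts·I from (20), and the learning rate ηv.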
In some target tasks, the environment model for the manipulator can be expressed with the following linear, time-invariant model:

Fcm = gm(dXo, dẊo, dẌo) = Kc dXo + Bc dẊo + Mc dẌo,   (22)

where Kc, Bc, and Mc ∈ ℝˡˣˡ denote the stiffness, viscosity, and inertia matrices of the environment model, respectively. It is quite likely, however, that there exist modeling errors between the real environment g in (2) and the defined environment model gm. In the proposed control system, such modeling errors are dealt with by the Environment Identification Network (EIN), which is placed in parallel with the environment model gm as shown in Fig. 4. The inputs to the EIN are the end-point force, position, velocity, and acceleration, while the output is the force compensation Fcn for the force error caused by the modeling errors. Accordingly, the end-point force F̂c is estimated as follows:

F̂c = Fcm + Fcn.   (23)

The energy function for the learning of the EIN is defined as

Ee(t) = (1/2) {F̂c(t) − Fc(t)}ᵀ {F̂c(t) − Fc(t)}.   (24)

The synaptic weights in the EIN, w_ij^(e), are modified in the direction of the gradient descent reducing Ee as follows:

Δw_ij^(e)(t) = −ηe ∂Ee(t)/∂w_ij^(e)(t),   (25)

∂Ee(t)/∂w_ij^(e)(t) = [∂Ee(t)/∂F̂c(t)] [∂F̂c(t)/∂Fcn(t)] [∂Fcn(t)/∂w_ij^(e)(t)],   (26)

where ηe is the learning rate for the EIN. The terms ∂Ee(t)/∂F̂c(t) and ∂F̂c(t)/∂Fcn(t) can be computed from (23) and (24), while ∂Fcn(t)/∂w_ij^(e)(t) is obtained by error back propagation learning.

The condition F̂c = Fc should be established when the EIN minimizes the energy function Ee(t), so that F̂c can be utilized for the learning of contact movements even if the exact environment model is unknown.

D. Learning During Contact Movements

The FCN is trained to realize the desired end-point force Fd by utilizing the estimated end-point force F̂c. Note that the synaptic weights of the PCN and VCN are fixed during the FCN learning, and no positional information on the environment is given to the manipulator.

The learning in this stage is performed by exchanging the force control input Fact given in (13) with

Fact = Ft + Ff + Ẍd = Ft − [o_f1ᵀ; o_f2ᵀ; … ; o_flᵀ] (Fd − Fc) + Ẍd.   (27)

The energy function for the FCN learning is defined as

Ef(t) = (1/2) {Fd(t) − Fc(t)}ᵀ {Fd(t) − Fc(t)}.   (28)

The synaptic weights in the FCN, w_ij^(f), are modified in the direction of the gradient descent reducing Ef as follows:

Δw_ij^(f)(t) = −ηf ∂Ef(t)/∂w_ij^(f)(t),   (29)

∂Ef(t)/∂w_ij^(f)(t) = [∂Ef(t)/∂Fc(t)] { [∂Fc(t)/∂X(t)] [∂X(t)/∂Ff(t)] + [∂Fc(t)/∂Ẋ(t)] [∂Ẋ(t)/∂Ff(t)] + [∂Fc(t)/∂Ẍ(t)] [∂Ẍ(t)/∂Ff(t)] } [∂Ff(t)/∂Of(t)] [∂Of(t)/∂w_ij^(f)(t)],   (30)

where ηf is the learning rate for the FCN. The terms ∂Ef(t)/∂Fc(t) and ∂Ff(t)/∂Of(t) can be computed from (27) and (28), and ∂Of(t)/∂w_ij^(f)(t) by error back propagation learning. Moreover, ∂X(t)/∂Ff(t), ∂Ẋ(t)/∂Ff(t), and ∂Ẍ(t)/∂Ff(t) can be approximated as ∂X(t)/∂Ff(t) ≈ Δts² I, ∂Ẋ(t)/∂Ff(t) ≈ Δts I, and ∂Ẍ(t)/∂Ff(t) = I, respectively, in a similar way to the learning rules for free movements. On the other hand, ∂Fc(t)/∂X(t), ∂Fc(t)/∂Ẋ(t), and ∂Fc(t)/∂Ẍ(t) are computed using the estimated end-point force F̂c in order to take into account the dynamic characteristics of the environment during contact movements. When the EIN is trained well enough to realize F̂c = Fc, those partial derivatives can be computed with (22) and (23).

In addition, the desired trajectory, as well as the impedance parameters, is regulated to reduce the learning burden of the FCN as much as possible. The modification of the desired trajectory ΔXd(t) is executed by

ΔXd(t) = −ηd ∂Ef(t)/∂Xd(t),   (31)

∂Ef(t)/∂Xd(t) = [∂Ef(t)/∂F̂c(t)] [∂F̂c(t)/∂X(t)] [∂X(t)/∂Ff(t)] [∂Ff(t)/∂Xd(t)],   (32)

where ηd is the rate of modification. The desired velocity trajectory is also regulated in the same way. By minimizing the force error of the end-point with the designed learning rule, the FCN may express the optimal impedance parameter Me⁻¹ as the output value of the network Of(t).

The designed learning rules during contact movements can be utilized provided that the EIN has been trained well enough that F̂c ≈ Fc. However, the estimation error of F̂c may increase considerably because of unexpected changes in the environment or the learning error of the EIN. Therefore, the learning rates ηf and ηd during contact movements are defined as time-varying functions of Ee(t) given in (24) as follows:

ηf(t) = ηf^MAX / (1 + p Ee(t)),   (33)

ηd(t) = ηd^MAX / (1 + p Ee(t)),   (34)
where ηf^MAX and ηd^MAX are the maximum values of ηf(t) and ηd(t), respectively, and p is a positive constant. The learning rates defined here can be expected to become small automatically, so as to avoid wrong learning when the learning error Ee(t) is large.

V. COMPUTER SIMULATIONS

A series of computer simulations is performed with a four-joint planar manipulator to verify the effectiveness of the proposed method. The length of each link of the manipulator is 0.2 [m], its mass 1.57 [kg], and its moment of inertia 0.8 [kgm²]. The given task is a circular motion of the end-effector including free movements and contact movements as shown in Fig. 5, in which the end-effector rotates counterclockwise in 8 [s] and its desired trajectory is generated by a fifth-order polynomial with respect to the time t. The impedance control law used in this paper is designed by means of the multi-point impedance control method for a redundant manipulator [10], and the sampling interval of the dynamics computations was set at Δts = 0.001 [s].

Fig. 5. An example of a contact task

The PCN and VCN used in the simulations were four-layered networks with eight input units, two hidden layers of twenty units each, and four output units. The initial values of the synaptic weights were randomly generated under |w_ij^(p)|, |w_ij^(v)| < 0.01, and the learning rates were set at ηp = 13000 and ηv = 15. The parameter ai in (9) was determined so that the output of the NNs was limited to −100 ∼ 100. The FCN and EIN were structured as four-layered networks with ten and eight input units, respectively, two hidden layers of twenty units each, and four output units. The parameters for the learning of the NNs were set as ηf^MAX = 0.0001, ηd^MAX = 0.01, p = 10, and ηe = 0.001, with the desired end-point force Fd = (0, 5)ᵀ [N].

In the computer simulation, the estimated environment model under x < −0.1 [m] was the same as the task environment gm in (22) with Kc = diag(0, 10000) [N/m], Bc = diag(0, 20) [Ns/m], and Mc = diag(0, 0.1) [kg]; otherwise, the following nonlinear characteristics were assumed as modeling errors:

Fcy = Kcy dyo² + Bcy dẏo² + Mcy dÿo,   (35)

where Fcy is the normal force from the environment to the end-effector, and the tangential force Fcx = 0. Simulations were conducted with Kcy = 1000000 [N/m²], Bcy = 2000 [Ns/m²], and Mcy = 0.1 [kg].

Figure 6 shows the changes in arm postures and end-point forces Fc of the manipulator in the process of learning contact movements after the learning of free movements. A large interaction force is observed until the learning of the FCN has progressed sufficiently, because the manipulator tries to follow the initial desired trajectory with the PCN and VCN trained for free movements. The end-point force then converges to the desired one (5 [N]) through regulating the FCN and the desired trajectory of the end-effector at the same time, as shown in Figs. 7 and 8, where (i, j) in Fig. 7 represents the element of Me⁻¹. It can be seen that the manipulator can generate the desired end-point force just after contact with the environment once the learning of contact movements is completed (at the 10th trial). However, a force error still remains momentarily when the characteristics of the environment change discontinuously, because the environment model estimated by the trained EIN includes some identification errors.

Fig. 6. End-point force of the manipulator during learning of a contact movement: (a) stick pictures; (b) end-point force
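The simulated environment described above can be summarized in code. This is a sketch: the constants are the values quoted in this section, while the function names are ours.

```python
import numpy as np

# Sketch of the simulated environment of Section V.  The linear model gm of
# (22) is what the controller assumes; (35) is the nonlinear normal force used
# as a modeling error.  Constants are the simulation values; names are ours.

Kc = np.diag([0.0, 10000.0])   # stiffness [N/m]
Bc = np.diag([0.0, 20.0])      # viscosity [Ns/m]
Mc = np.diag([0.0, 0.1])       # inertia   [kg]

def linear_env_force(dXo, dXo_dot, dXo_ddot):
    """Fcm = Kc dXo + Bc dXo_dot + Mc dXo_ddot  -- linear model (22)."""
    return Kc @ dXo + Bc @ dXo_dot + Mc @ dXo_ddot

def nonlinear_normal_force(dy, dy_dot, dy_ddot,
                           Kcy=1.0e6, Bcy=2000.0, Mcy=0.1):
    """Fcy = Kcy dy^2 + Bcy dy_dot^2 + Mcy dy_ddot  -- nonlinear model (35)."""
    return Kcy * dy**2 + Bcy * dy_dot**2 + Mcy * dy_ddot
```

For a 1 mm normal penetration at rest, the linear model gives 10 N while the nonlinear environment gives only 1 N, which illustrates the size of the modeling error the EIN has to compensate.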
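The time-varying learning rates (33), (34) that gate the contact-movement learning can be sketched directly; the constants are the Section V values, and the function name is ours.

```python
# Sketch of the adaptive learning rates (33), (34): both shrink as the EIN's
# identification error Ee(t) grows, suppressing FCN and trajectory learning
# while the environment estimate is unreliable.  Constants from Section V.

ETA_F_MAX = 0.0001   # maximum of eta_f(t)
ETA_D_MAX = 0.01     # maximum of eta_d(t)
P = 10.0             # positive constant p

def adaptive_rates(Ee):
    """Return (eta_f, eta_d) for the current identification error Ee(t)."""
    eta_f = ETA_F_MAX / (1.0 + P * Ee)   # (33)
    eta_d = ETA_D_MAX / (1.0 + P * Ee)   # (34)
    return eta_f, eta_d
```

When the EIN is well trained (Ee ≈ 0) both rates sit at their maxima; an identification error of Ee = 0.9 already cuts them by a factor of ten.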
Fig. 7. The elements (i, j) of Me⁻¹ during learning of a contact movement: (a) first trial

Fig. 8. Virtual trajectories of the manipulator during learning of a contact movement

REFERENCES

[5] T. Tsuji, M. Nishida, and K. Ito, "Iterative Learning of Impedance