To cite this article: S. G. Khan, M. Tufail, S. H. Shah & I. Ullah (2019): Reinforcement
learning based compliance control of a robotic walk assist device, Advanced Robotics, DOI:
10.1080/01691864.2019.1690574
FULL PAPER
© 2019 Informa UK Limited, trading as Taylor & Francis Group and The Robotics Society of Japan
feedback controller for the hip and knee joints [15], robust adaptive assist-as-needed control [18], model-free adaptive nonsingular terminal sliding mode control [19], sliding mode control [20], and integral sliding mode control [21]. Quintero et al. [15] use a combination of motion control and impedance control: the former is used to enforce the desired trajectory during the swing phase, and the latter to facilitate movement toward a given equilibrium point (e.g. during the transition from sitting to standing).

[Table (garbled in extraction): survey of robotic walk assist devices, covering control schemes (position control, force control), actuation (backdrivable harmonic drive servomotors, electric DC actuators, brushless DC motors, electric DC SEAs, hydraulic), sensing (potentiometers, inertial sensors, foot-sole force sensors, incremental encoders), and example devices such as the Honda walking assist device [12,13], the Vanderbilt exoskeleton (marketed as Indego by Parker Hannifin) [14], and TTI-Knuckle1 [17].]
each joint (e.g. hip, knee, and ankle) according to desired parameters that are characterized by studies on human gait [24,26].

The success of impedance control greatly depends on the availability of complete knowledge of the robot dynamic model. In real applications, the target impedance is therefore hard to achieve due to a mismatch between the approximated and actual dynamics of the system. To solve this problem, robust impedance control techniques have been developed, such as sliding mode control [27] and a robust impedance controller based on the passivity approach [25].

Like any other serial manipulator, exoskeleton structures have a nonlinear dynamic model. Any control strategy based on an estimated dynamic model (e.g. computed torque control, impedance control, etc.) to achieve the desired behavior would have to deal with the underlying time-varying uncertainties arising from the robot itself, the human limb, and their mutual and environmental interactions. Moreover, a structure that fits human subjects with different conditions and assistance requirements necessitates the development of a design that is inherently robust and a control strategy that optimally adapts itself to these differing scenarios. Traditional adaptive control techniques rely on the availability of an analytical robot model, which is, first, hard to derive; second, unable to capture all types of uncertainties (e.g. friction, disturbances, etc.); and third, demanding of extensive computational resources in real time (due to its nonincremental nature).

In comparison, model-free adaptive control techniques, especially those that incorporate bio-inspired learning such as reinforcement learning [15], artificial neural networks (ANN) [28], Gaussian process regression [29], and learning with piecewise linear models using nonparametric regression techniques [30], are not vulnerable to these problems. These methods are also called direct learning, as they achieve nonlinear function approximation without going through the rigorous system identification process. Schaal and Atkeson [31] present a survey of different approaches to robot learning.

Recently, reinforcement learning has drawn the attention of the research community for the control of lower extremity exoskeletons (see, for example, [30,32]). In control paradigms, RL-based control is inherently both adaptive and optimal: adaptive in the sense that it lets the controller adapt to uncertainty and unforeseen changes in the robot dynamics, and optimal in the sense that general optimization objectives can be achieved, usually defined as a set of sequential decisions leading to a goal or the best possible outcome. The goal (encoded as cost functions) could be, for example, to include penalties on the peak force exerted on the environment during physical interaction, or to execute the desired trajectory while assuring safety by respecting joint or actuator limits [33].

In this paper, the main aim is to introduce optimality (control) for the RWAD via a dynamic-model-free RL scheme, in contrast to traditional adaptive control, which is not optimal. We focus on a model reference compliance control that uses an external mass-spring-damper system as a compliant reference model. Hence, the main contribution of the paper is the use of a dynamic-model-free RL technique to bring adaptivity and optimality to the RWAD; the suggested scheme will automatically adjust itself to any user. In addition, compliance will bring a degree of safety to RWAD users. To the best of our knowledge, no one else has applied this type of RL-based optimal adaptive control to a robotic walk assist device.

The remaining paper is organized as follows. Section 2 describes the muscle function and introduces our robotic walk assist device. In Section 3, the reinforcement learning control algorithm is explained. In Section 4, simulation results are presented and discussed. In Section 5, the paper is concluded.

2. Walk assist device

The human leg is usually modeled as a skeletal model coupled with the model of a muscle-tendon complex (MTC). The skeletal model is an articulated rigid body that accepts applied moments on the joints as an input and produces the generated motion as an output. The MTC model takes the neural signals from the brain and the MTC length as inputs and outputs the force in the MTC. This force can be converted to moments by multiplying by the moment arm and then fed to the skeletal model as input.

For simplicity, we model the human leg (see Figures 1 and 2) as an open kinematic chain of only three rigid bodies, i.e. the hip, the thigh, and the lower leg, connected together by two rotational joints. The hip joint is considered fixed to the body, the thigh rotates with respect to the hip, and the lower leg rotates with respect to the thigh. The joints are taken as one-degree-of-freedom rotational joints, and thus the entire motion is planar with only two degrees of freedom.

Justification for a 2-DOF planar model for the RWAD comes from available studies [34] of the normal range of joint motion in human adults. The knee joint (approximated as a hinge) rotates in the sagittal plane with flexion-extension motion in the range 0-160 degrees, controlled by active muscles. The other rotation (in the transverse plane) of the knee joint is small (about +/- 10 degrees) and
[Figure (block diagram, garbled in extraction): the reference model maps the demand qr to the compliant reference qd, which is fed to an actor-critic reinforcement learning controller commanding the robotic walk assist device.]
qd is the new 2 × 1 reference position vector used to compensate for the human effort (the interaction torque, sensed by the torque sensors in the hip and knee joints). Therefore, Js, bs and ks can be employed to tune the level of compliance; e.g. if ks is decreased, the robot becomes more compliant. In this paper, all these values are selected manually; learning suitable compliance levels is a potential direction for future work.
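The compliant reference model described above can be sketched in a few lines of code. This is an illustrative sketch, not the authors' implementation: it assumes the common second-order form Js·q̈d + bs·q̇d + ks·(qd − qr) = τh per joint (with qr held constant over a step), integrated with a semi-implicit Euler step; all names and parameter values are hypothetical.

```python
import numpy as np

def compliant_reference_step(qr, qd, qd_dot, tau_h, Js, bs, ks, dt):
    """One semi-implicit Euler step of a mass-spring-damper reference model.

    Assumed form (one per joint): Js*qdd + bs*qd_dot + ks*(qd - qr) = tau_h.
    Decreasing ks lets a given human torque tau_h deflect qd further
    from qr, i.e. the robot behaves more compliantly.
    """
    qdd = (tau_h - bs * qd_dot - ks * (qd - qr)) / Js
    qd_dot = qd_dot + dt * qdd          # update velocity first (semi-implicit)
    qd = qd + dt * qd_dot
    return qd, qd_dot

# Hip and knee joints: qd deflects under a human torque pulse on the hip,
# then relaxes back to qr once the torque is removed.
qr = np.array([0.3, 0.6])               # demanded joint positions (rad)
qd, qd_dot = qr.copy(), np.zeros(2)
Js, bs, ks, dt = 1.0, 4.0, 20.0, 0.001
for k in range(5000):                   # 5 s of simulated time
    tau_h = np.array([2.0, 0.0]) if k < 2000 else np.zeros(2)
    qd, qd_dot = compliant_reference_step(qr, qd, qd_dot, tau_h, Js, bs, ks, dt)
print(np.round(qd - qr, 3))             # deflection has decayed essentially to zero
```

When τh = 0 the reference qd settles back to qr, and lowering ks enlarges the deflection produced by a given human torque, which matches the compliance tuning role of Js, bs and ks described above.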
where r(qi, τi, qdi) = r̃(qi, qdi) + τiᵀRτi, with R > 0. The vector qdi is the demand, so that r̃(qi, qdi) ≥ 0 represents a cost for tracking.

The optimal policy can be rewritten as:

h∗(qk, qdi) = arg min_{τk} Q∗(qk, τk, qdi).  (6)

The control inputs are calculated by solving (∂Q∗/∂τ)(qk, τk) = 0 for τk, assuming Q∗ is differentiable.

3.3. Algorithm

A neural network λ(·) with weights hi is employed to approximate the cost of the control problem. The following parameterization was employed for estimating the cost function:

Q̂(qk, τk, qdi, hi) = hiᵀ λ(zk(qk, τk, qdi)).  (7)

The function zk(qk, τk, qdi) is employed to simplify the definition of the NN nodes λ(·), explained later in the paper. Q̂(zk, hi+1) has to fit δ(·):

δ(λk(zk(qk, τk, qdi)), hi) = r̃(qi, qdi) + τ̂(qk)ᵀRτ̂(qk) + Q̂i(qk+1, τ̂(qk+1), qdk+1).  (8)

hi+1 is then calculated via the least squares method.
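The batch least-squares step for the weight vector hi+1 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: λ(·) is taken as the quadratic monomials of zk (the upper triangle of the Kronecker/outer product zk·zkᵀ), and the data and Bellman targets are synthetic stand-ins for the quantities in (8).

```python
import numpy as np

def quad_features(z):
    """Quadratic polynomial features: the upper triangle of the outer
    (Kronecker) product z z^T, so each monomial z_i*z_j appears once."""
    outer = np.outer(z, z)
    iu = np.triu_indices(len(z))
    return outer[iu]

def fit_q_weights(Z, targets):
    """Least-squares fit of h in Q_hat(z) = h^T lambda(z).

    Z       : (N, n) matrix of stacked z_k samples from one learning period
    targets : (N,) vector of Bellman targets, i.e. the right-hand side of (8)
    """
    Lam = np.array([quad_features(z) for z in Z])
    h, *_ = np.linalg.lstsq(Lam, targets, rcond=None)
    return h

# Synthetic check: recover a known quadratic cost z^T P z exactly.
rng = np.random.default_rng(0)
n = 4                                   # toy dimension; the paper's z_k is larger
P = np.diag([1.0, 0.5, 2.0, 0.1])
Z = rng.normal(size=(200, n))
targets = np.einsum('ij,jk,ik->i', Z, P, Z)   # z^T P z for each sample
h = fit_q_weights(Z, targets)
Q_hat = quad_features(Z[0]) @ h
print(bool(np.isclose(Q_hat, Z[0] @ P @ Z[0])))  # True
```

Note that for a 12-element zk this upper-triangular construction gives 12·13/2 = 78 features, which is consistent with the 78 neurons reported later in the paper.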
At the end of each learning period, the following update law is used to update hi (see the flow chart in Figure 5):

h(i+1),app = α h(i+1) + (1 − γ) h(i,app),  (9)

where 0 < α < 1 is the forgetting factor. The control policy is then implemented with the updated gain h(i+1),app; these gains remain unchanged until the end of the next learning cycle. The selection of an appropriate learning period is necessary for the solution of the recursive least squares problem (Figures 4 and 5). A shorter learning period leads to quicker convergence; however, the period should still be long enough to capture the dynamic behavior of the system fully. For a linear system, fewer data points may be required, as in the case of [44]; for a complex nonlinear system, more data points may be required, as suggested in [33]. A convergence and stability proof for this scheme can be found in [45].

As mentioned above, a polynomial-based NN is used for cost function estimation. For the simulation in this
paper, 78 neurons are employed, which produce satisfactory results.

The function λ(zk(qk, τk, qdi)) in Equation (7) is calculated by the Kronecker product of zk = [τ1, τ2, e1, e2, ė1, ė2, e1², e2², ė1², ė2², q2², q̇2²]ᵀ, where τ1 and τ2 are the hip and knee joint input torques respectively. The hip and knee joint position errors are given by e1 and e2 respectively, and the hip and knee joint angular positions by q1 and q2 (Figure 4).

The hip and knee joint position errors [e1, e2] are calculated from qd − q, where q = [q1−hip, q2−knee]ᵀ and qd is the modified (based on the mass-spring-damper system model) reference angular position for the hip and knee joints when the human is applying some torque. In any other case qd will be equal to qr, the reference position for hip and knee; see Figures 3 and 4.

4. Simulation

The control scheme proposed above is simulated using a model of the RWAD developed in the SimMechanics toolbox (Matlab/Simulink). The dynamics model can be described by the following equation:

m(q)q̈ + v(q, q̇) + g(q) = τ,  (10)

where q = [q1−hip, q2−knee]ᵀ contains the hip and knee joint angular positions. The matrix m ∈ ℝ2×2 is the combined inertia of both the walk assist device and the human leg. Similarly, the vector v ∈ ℝ2×1 represents the Coriolis/centripetal torques. Gravitational torques are given by g ∈ ℝ2×1, and the hip and knee joint input torques by τ ∈ ℝ2×1. The above dynamics model was simulated in Matlab/Simulink.

A sampling time of 0.0005 s was used in the simulation; a step of 0.001 s or larger leads to instability within the first 10 s. Figure 6 shows the tracking results of the hip and knee joints. The cost estimate via the Q-function and the control torques are shown in Figures 7 and 8 respectively. The cost estimates improve as time passes, and the magnitudes of the control input torques diminish significantly after the first learning cycle. The oscillations in the control inputs are due to the band-limited white noise which is added to the control inputs to satisfy the condition of 'persistency of excitation'.

As mentioned before, when torque applied by the human is detected by the torque sensor in the hip joint, the demand qr is modified to become qd based on the compliant mass-spring-damper reference model. Similarly, in Figures 9 and 10 the trajectory tracking performance of the hip and knee joints is shown. When human effort is present, the demand position is modified in the desired compliant manner, i.e. qr is changed into qd to allow the human to exert the torque. Figure 12 shows the first 6 s (three learning cycles, 5.4 s in total, with each cycle 1.8 s long). In the first learning cycle, i.e. the first 1.8 s, an initial PD controller is applied. At the end of each cycle, the new RL controller gains are applied; hence, in the second cycle some improvement is observed, and in the third cycle the tracking becomes even better. These results show the efficiency of this scheme. Figure 11 shows that the Q-function is estimating the cost function very well. Please note that qr and qd are the same in this time interval (as there is no human torque or effort).

5. Conclusion

Robotic walk assist devices can help regain mobility for those who have walking disabilities due to weak muscles, etc. The work presented here focuses on improving the control performance of such devices, which lays a foundation for producing affordable, safe, and user-friendly devices. Suitable control techniques can help optimize control efforts as well as learn the dynamics online, resulting in improved performance of the system. In this paper, a dynamic-model-free, bio-inspired RL-based optimal adaptive compliance control scheme has been proposed and implemented in simulation on a robotic walk assist device, employing hip and knee joints. Compliance control is introduced through an external mass-spring-damper model. A polynomial-based neural network was used for estimating the Q-function, minimization of which leads to the derivation of the control torque. Simulation results validate the improved tracking and compliant behavior of the RL-based controller of the two-DOF robotic walk assist device.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes on contributors

Said Ghani Khan earned his Ph.D. degree at the Bristol Robotics Laboratory, University of the West of England (and University of Bristol) in 2012. He did his M.Sc. in Robotics at the University of Plymouth, United Kingdom in 2006. He received his B.Sc. in Mechanical Engineering from the University of Engineering and Technology Peshawar, Pakistan in 2003. Dr. Khan is currently working as an Assistant Professor in the Department of Mechanical Engineering, Taibah University (Yanbu Branch), Saudi Arabia. He is also an honorary member of the research staff at the University of Bristol, United Kingdom. Dr. Khan is the author of many journal articles, book chapters, and conference papers, and has co-authored a book on bio-inspired control of a humanoid robot arm, published by Springer.

Muhammad Tufail received an MSc degree in Mechatronics Engineering from the Asian Institute of Technology (2007) and
a Ph.D. degree in Mechanical Engineering (with specialization in Manufacturing and Mechatronics) from the University of British Columbia, Canada (2015). He then worked as a postdoctoral research fellow at the Industrial Automation Laboratory at UBC (2016). He has worked as a consultant and research associate on robotics and automation projects with industry, including the Canadian oil giant Cenovus Energy. He has more than ten years of research and teaching experience, including three years of teaching industrial robotics at both undergraduate and graduate levels at UBC, Canada (2014–2017). Since 2017, he has been working as an Assistant Professor at the Department of Mechatronics Engineering, University of Engineering and Technology, Peshawar. His research interests include robot control, vision, haptics, and teleoperation.

Syed Humayoon Shah recently received his Master's degree in Mechatronics Engineering from the University of Engineering and Technology Peshawar, Pakistan. He graduated in Electronic Engineering from BUITEMS Quetta, Pakistan in 2014. His research interests are in the fields of robotics, nonlinear control, reinforcement learning, control theory, and machine vision. He is currently working on rehabilitation robotics and artificial muscles.

Irfan Ullah is a Mechanical Engineering honors graduate of the University of Engineering and Technology Peshawar, Pakistan. He received his Masters (1983) and PhD (1994) in Mechanical Engineering from The University of Michigan, Ann Arbor. His research interests include the design and dynamics of machinery, including mechanisms and robots. He is currently working as a Professor at the College of Engineering, Taibah University, Yanbu, KSA.

References

[1] Horst RW. A bio-robotic leg orthosis for rehabilitation and mobility enhancement. Proceedings of the 31st annual international conference of the IEEE engineering in medicine and biology society; Minneapolis, Minnesota, USA: IEEE; 2009. p. 5030–5033.
[2] Young AJ, Ferris DP. Analysis of state of the art and future directions for robotic exoskeletons. IEEE Trans Neural Syst Rehabil Eng. 2017;25(2):171–182.
[3] Weerasingha A, Withanage W, Pragnathilaka A, et al. Powered ankle exoskeletons: existent designs and control systems. Proceedings of international conference on artificial life and robotics; Vol. 23. 2018. p. 76–83.
[4] Meng W, Liu Q, Zhou Z, et al. Recent development of mechanisms and control strategies for robot-assisted lower limb rehabilitation. Mechatronics. 2015;31:132–145. doi:10.1016/j.mechatronics.2015.04.005
[5] Huo W, Mohammed S, Moreno JC, et al. Lower limb wearable robots for assistance and rehabilitation: a state of the art. IEEE Syst J. 2016;10(3):1068–1081.
[6] Al-Shuka HF, Rahman MH, Leonhardt S, et al. Biomechanics, actuation, and multi-level control strategies of power-augmentation lower extremity exoskeletons: an overview. Int J Dyn Control. 2019;1–27.
[7] Dollar AM, Herr H. Lower extremity exoskeletons and active orthoses: challenges and state-of-the-art. IEEE Trans Robot. 2008;24(1):144–158.
[8] Viteckova S, Kutilek P, Jirina M. Wearable lower limb robotics: a review. Biocybern Biomed Eng. 2013;33(2):96–105. doi:10.1016/j.bbe.2013.03.005
[9] Kazerooni H, Steger R. The Berkeley lower extremity exoskeleton. J Dyn Syst Meas Control. 2006;128(1):14–25.
[10] Zoss AB, Kazerooni H, Chu A. Biomechanical design of the Berkeley lower extremity exoskeleton (BLEEX). IEEE ASME Trans Mechatron. 2006;11(2):128–138.
[11] Kawamoto H, Hayashi T, Sakurai T, et al. Development of single leg version of HAL for hemiplegia. 2009 annual international conference of the IEEE engineering in medicine and biology society; IEEE; 2009. p. 5038–5043.
[12] Kusuda Y. In quest of mobility – Honda to develop walking assist devices. Ind Robot: Int J. 2009;36(6):537–539.
[13] Ikeuchi Y, Ashihara J, Hiki Y, et al. Walking assist device with bodyweight support system. 2009 IEEE/RSJ international conference on intelligent robots and systems; IEEE; 2009. p. 4073–4079.
[14] Murray SA, Ha KH, Hartigan C, et al. An assistive control approach for a lower-limb exoskeleton to facilitate recovery of walking following stroke. IEEE Trans Neural Syst Rehabil Eng. 2015;23(3):441–449.
[15] Quintero HA, Farris RJ, Goldfarb M. A method for the autonomous control of lower limb exoskeletons for persons with paraplegia. J Med Device. 2012;6(4):041003.
[16] Pratt JE, Krupp BT, Morse CJ, et al. The RoboKnee: an exoskeleton for enhancing strength and endurance during walking. IEEE international conference on robotics and automation, proceedings ICRA'04; Vol. 3. IEEE; 2004. p. 2430–2435.
[17] Asl HJ, Narikiyo T, Kawanishi M. An assist-as-needed control scheme for robot-assisted rehabilitation. 2017 American control conference (ACC); IEEE; 2017. p. 198–203.
[18] Hussain S, Xie S. Assist-as-needed control of an intrinsically compliant robotic gait training orthosis. IEEE Trans Ind Electron. 2017;64(2):1675–1685.
[19] Han S, Wang H, Tian Y. Model-free based adaptive nonsingular fast terminal sliding mode control with time-delay estimation for a 12 DOF multi-functional lower limb exoskeleton. Adv Eng Softw. 2018;119:38–47.
[20] Ahmed S, Wang H, Tian Y. Model-free control using time delay estimation and fractional-order nonsingular fast terminal sliding mode for uncertain lower-limb exoskeleton. J Vib Control. 2018;24(22):5273–5290.
[21] Shah SH, Khan SG, Shah K, et al. Compliance control of robotic walk assist device via integral sliding mode control. 2019 16th international Bhurban conference on applied sciences and technology (IBCAST); 2019 Jan. p. 515–520.
[22] Aguirre-Ollinger G, Colgate JE, Peshkin MA, et al. Active-impedance control of a lower-limb assistive exoskeleton. IEEE 10th international conference on rehabilitation robotics; Noordwijk, The Netherlands; 2007. p. 188–195.
[23] Marchal-Crespo L, Reinkensmeyer DJ. Review of control strategies for robotic movement training after neurologic injury. J Neuroeng Rehabil. 2009;6(1):1–15.
[24] Winter DA. The biomechanics and motor control of human gait: normal, elderly and pathological. University of Waterloo Press; 1991. https://trid.trb.org/view.aspx?id=770965
[25] Mohammadi H, Richter H. Robust tracking/impedance control: application to prosthetics. 2015 American control conference (ACC); IEEE; 2015. p. 2673–2678.
[26] Sup F, Bohara A, Goldfarb M. Design and control of a powered transfemoral prosthesis. Int J Robot Res. 2008;27(2):263–273.
[27] Madani T, Daachi B, Djouani K. Non-singular terminal sliding mode controller: application to an actuated exoskeleton. Mechatronics. 2016 Feb;33:136–145. https://www.sciencedirect.com/science/article/abs/pii/S0957415815001737
[28] Jabbari Asl H, Narikiyo T, Kawanishi M. Neural network-based bounded control of robotic exoskeletons without velocity measurements. Control Eng Pract. 2018;80:94–104. doi:10.1016/j.conengprac.2018.08.005
[29] Nguyen-Tuong D, Peters JR, Seeger M. Local Gaussian process regression for real time online model learning. Advances in neural information processing systems; 2009. p. 1193–1200.
[30] Huang R, Cheng H, Guo H, et al. Hierarchical interactive learning for a human-powered augmentation lower exoskeleton. Proceedings – IEEE international conference on robotics and automation; Stockholm, Sweden: IEEE; 2016. p. 257–263.
[31] Schaal S, Atkeson CG. Learning control in robotics. IEEE Robot Autom Mag. 2010;17(2):20–29.
[32] Wang Y, Xu W, Tao C, et al. Reinforcement learning-based shared control for walking-aid robot and its experimental verification. Adv Robot. 2015;29(22):1463–1481. doi:10.1080/01691864.2015.1070748
[33] Khan SG, Herrmann G, Lewis FL, et al. Reinforcement learning and optimal adaptive control: an overview and implementation examples. Annu Rev Control. 2012;36(1):42–59. doi:10.1016/j.arcontrol.2012.03.004
[34] Hamilton N, Weimar W, Luttgens K. Kinesiology: scientific basis of human motion. New York: McGraw-Hill; 2011.
[35] Wang Y, Wang S, Ishida K, et al. High path tracking control of an intelligent walking-support robot under time-varying friction and unknown parameters. Adv Robot. 2017;31(14):739–752.
[36] Buchli J, Theodorou E, Stulp F, et al. Variable impedance control – a reinforcement learning approach. Robotics: science and systems; 2010.
[37] Hamaya M, Matsubara T, Noda T, et al. Learning assistive strategies for exoskeleton robots from user-robot physical interaction. Pattern Recognit Lett. 2017;99:67–76.
[38] Kuan CP, Young KY. Reinforcement learning and robust control for robot compliance tasks. J Intell Robot Syst. 1998;23:165–182.
[39] Kim B, Park J, Park S, et al. Impedance learning for robotic contact tasks using natural actor-critic algorithm. IEEE Trans Syst Man Cybern Part B: Cybern. 2010;40(2):433–443.
[40] Seraji H. Adaptive compliance control: an approach to implicit force control in compliant motion. 1994. citeseer.ist.psu.edu/404588.html
[41] Colbaugh R, Seraji H, Glass K. Adaptive compliant motion control for dexterous manipulators. Int J Robot Res. 1995;14(3):270–280.
[42] Colbaugh R, Wedeward K, Glass K, et al. New results on adaptive compliant motion control for dexterous manipulators. Int J Robot Autom. 1996;11(1):230–238.
[43] Khan SG, Herrmann G, Pipe T, et al. Safe adaptive compliance control of a humanoid robotic arm with anti-windup compensation and posture control. Int J Soc Robot. 2010 Sep;2(3):305–319.
[44] Al-Tamimi A, Lewis F, Abu-Khalaf M. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H∞ control. Automatica. 2007;43(3):473–481.
[45] Al-Tamimi A, Lewis F, Abu-Khalaf M. Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans Syst Man Cybern Part B: Cybern. 2008 Aug;38(4):943–949.
[46] Khan S, Herrmann G, Lewis F, et al. A novel Q-learning based adaptive optimal controller implementation for a humanoid robotic arm. 18th IFAC World Congress; Vol. 44; 2011. p. 13528–13533.
[47] Vrabie D, Vamvoudakis KG, Lewis FL. Optimal adaptive control and differential games by reinforcement learning principles. London: IET Press; 2012.