To cite this article: S. G. Khan, M. Tufail, S. H. Shah & I. Ullah (2019): Reinforcement
learning based compliance control of a robotic walk assist device, Advanced Robotics, DOI:
10.1080/01691864.2019.1690574
FULL PAPER
© 2019 Informa UK Limited, trading as Taylor & Francis Group and The Robotics Society of Japan
feedback controller for the hip and knee joints [15], robust adaptive assist-as-needed control [18], model-free adaptive nonsingular terminal sliding mode control [19], sliding mode control [20], and integral sliding mode control [21]. Quintero et al. [15] use a combination of motion control and impedance control: the former is used to enforce the desired trajectory during the swing phase, and the latter to facilitate movement toward a given equilibrium point (e.g. during the transition from sitting to standing).

[Table (garbled in extraction): survey of robotic walk assist devices, covering control schemes (position control, force control), actuation (backdrivable harmonic drive servomotors, electric DC actuators, brushless DC motors, electric DC SEAs, hydraulic), sensing (potentiometers, inertial sensors, foot-sole force sensors, incremental encoders), and example devices such as the Honda walking assist device [12,13], the Vanderbilt exoskeleton (marketed as Indego by Parker Hannifin) [14], and TTI-Knuckle1 [17].]
each joint (e.g. hip, knee, and ankle) according to desired parameters that are characterized by studies on human gait [24,26].

The success of impedance control greatly depends on the availability of complete knowledge of the robot dynamic model. In real applications, the target impedance is therefore hard to achieve due to a mismatch between the approximated and actual dynamics of the system. To solve this problem, robust impedance control techniques have been developed, such as sliding mode control [27] and a robust impedance controller based on the passivity approach [25].

Like any other serial manipulator, exoskeleton structures have a nonlinear dynamic model. Any control strategy based on an estimated dynamic model (e.g. computed torque control, impedance control, etc.) to achieve the desired behavior would have to deal with the underlying time-varying uncertainties arising from the robot itself, the human limb, and their mutual and environmental interactions. Moreover, a structure that fits human subjects with different conditions and assistance requirements necessitates the development of a design that is inherently robust and a control strategy that optimally adapts itself to these differing scenarios. Traditional adaptive control techniques rely on the availability of an analytical robot model, which is, first, hard to derive; second, unable to capture all types of uncertainties (e.g. friction, disturbances, etc.); and third, demanding of extensive computational resources in real time (due to its nonincremental nature).

In comparison, model-free adaptive control techniques, especially those that incorporate bio-inspired learning such as reinforcement learning [15], artificial neural networks (ANN) [28], Gaussian process regression [29], and learning with piecewise linear models using nonparametric regression techniques [30], are not vulnerable to these problems. These methods are also called direct learning, as they achieve nonlinear function approximation without going through the rigorous system identification process. Schaal and Atkeson [31] present a survey of different approaches to robot learning.

Recently, reinforcement learning has drawn the attention of the research community for the control of lower extremity exoskeletons (see, for example, [30,32]). In control paradigms, RL-based control is inherently both adaptive and optimal: adaptive in the sense that it lets the controller adapt to uncertainty and unforeseen changes in the robot dynamics, and optimal in the sense that general optimization objectives can be achieved, usually defined as a set of sequential decisions leading to a goal or the best possible outcome. The goal (encoded as cost functions) could be, for example, to include penalties on the peak force exerted on the environment during physical interaction, or to execute the desired trajectory while assuring safety by respecting joint or actuator limits [33].

In this paper, the main aim is to introduce optimality (control) for the RWAD via a dynamic-model-free RL scheme, in contrast to traditional adaptive control, which is not optimal. We focus on a model reference compliance control that uses an external mass-spring-damper system as a compliant reference model. Hence, the main contribution of the paper is the use of a dynamic-model-free RL technique to bring adaptivity and optimality to the RWAD; the suggested scheme will automatically adjust itself to any user. In addition, compliance will bring a degree of safety to RWAD users. To the best of our knowledge, no one else has applied this type of RL-based optimal adaptive control to a robotic walk assist device.

The remaining paper is organized as follows. Section 2 describes the muscle function and introduces our robotic walk assist device. In Section 3, the reinforcement learning control algorithm is explained. In Section 4, simulation results are presented and discussed. In Section 5, the paper is concluded.

2. Walk assist device

The human leg is usually modeled as a skeletal model coupled with the model of a muscle-tendon complex (MTC). The skeletal model is an articulated rigid body that accepts applied moments on the joints as an input and produces the generated motion as an output. The MTC model takes the neural signals from the brain and the MTC length as inputs and outputs the force in the MTC. This force can be converted to moments by multiplying by the moment arm and then fed to the skeletal model as input.

For simplicity, we model the human leg (see Figures 1 and 2) as an open kinematic chain of only three rigid bodies, i.e. the hip, the thigh, and the lower leg, connected together by two rotational joints. The hip joint is considered fixed to the body, the thigh rotates with respect to the hip, and the lower leg rotates with respect to the thigh. The joints are taken as one-degree-of-freedom rotational joints, and thus the entire motion is planar with only two degrees of freedom.

Justification for a 2-DOF planar model for the RWAD comes from available studies [34] of the normal range of joint motion in human adults. The knee joint (approximated as a hinge) rotates in the sagittal plane with flexion-extension motion in the range 0-160 degrees, controlled by active muscles. The other rotation (in the transverse plane) of the knee joint is small (about +/- 10 degrees) and
[Figure (block diagram, garbled in extraction): the reference model maps the demand qr to the compliant reference qd, which is fed to an actor-critic reinforcement learning controller commanding the robotic walk assist device.]
qd is the new 2 × 1 reference position vector used to compensate for the human effort (the interaction torque, sensed by the torque sensors in the hip and knee joints). Therefore, Js, bs and ks can be employed to tune the level of compliance; e.g. if ks is decreased, the robot becomes more compliant. In this paper, all these values are selected manually; learning suitable compliance levels is a potential direction for future work.
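The compliant reference model described above can be sketched in a few lines of code. This is an illustrative sketch, not the authors' implementation: it assumes the common second-order form Js·q̈d + bs·q̇d + ks·(qd − qr) = τh per joint (with qr held constant over a step), integrated with a semi-implicit Euler step; all names and parameter values are hypothetical.

```python
import numpy as np

def compliant_reference_step(qr, qd, qd_dot, tau_h, Js, bs, ks, dt):
    """One semi-implicit Euler step of a mass-spring-damper reference model.

    Assumed form (one per joint): Js*qdd + bs*qd_dot + ks*(qd - qr) = tau_h.
    Decreasing ks lets a given human torque tau_h deflect qd further
    from qr, i.e. the robot behaves more compliantly.
    """
    qdd = (tau_h - bs * qd_dot - ks * (qd - qr)) / Js
    qd_dot = qd_dot + dt * qdd          # update velocity first (semi-implicit)
    qd = qd + dt * qd_dot
    return qd, qd_dot

# Hip and knee joints: qd deflects under a human torque pulse on the hip,
# then relaxes back to qr once the torque is removed.
qr = np.array([0.3, 0.6])               # demanded joint positions (rad)
qd, qd_dot = qr.copy(), np.zeros(2)
Js, bs, ks, dt = 1.0, 4.0, 20.0, 0.001
for k in range(5000):                   # 5 s of simulated time
    tau_h = np.array([2.0, 0.0]) if k < 2000 else np.zeros(2)
    qd, qd_dot = compliant_reference_step(qr, qd, qd_dot, tau_h, Js, bs, ks, dt)
print(np.round(qd - qr, 3))             # deflection has decayed essentially to zero
```

When τh = 0 the reference qd settles back to qr, and lowering ks enlarges the deflection produced by a given human torque, which matches the compliance tuning role of Js, bs and ks described above.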
where r(qi, τi, qdi) = r̃(qi, qdi) + τiᵀRτi, with R > 0. The vector qdi is the demand, so that r̃(qi, qdi) ≥ 0 represents a cost for tracking.

The optimal policy can be rewritten as:

h∗(qk, qdi) = arg min_{τk} Q∗(qk, τk, qdi).  (6)

The control inputs are calculated by solving (∂Q∗/∂τ)(qk, τk) = 0 for τk, assuming Q∗ is differentiable.

3.3. Algorithm

A neural network λ(·) with weights hi is employed to approximate the cost of the control problem. The following parameterization was employed for estimating the cost function:

Q̂(qk, τk, qdi, hi) = hiᵀ λ(zk(qk, τk, qdi)).  (7)

The function zk(qk, τk, qdi) is employed to simplify the definition of the NN nodes λ(·), explained later in the paper. Q̂(zk, hi+1) has to fit δ(·):

δ(λk(zk(qk, τk, qdi)), hi) = r̃(qi, qdi) + τ̂(qk)ᵀRτ̂(qk) + Q̂i(qk+1, τ̂(qk+1), qdk+1).  (8)

hi+1 is then calculated via the least squares method.
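The batch least-squares step for the weight vector hi+1 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: λ(·) is taken as the quadratic monomials of zk (the upper triangle of the Kronecker/outer product zk·zkᵀ), and the data and Bellman targets are synthetic stand-ins for the quantities in (8).

```python
import numpy as np

def quad_features(z):
    """Quadratic polynomial features: the upper triangle of the outer
    (Kronecker) product z z^T, so each monomial z_i*z_j appears once."""
    outer = np.outer(z, z)
    iu = np.triu_indices(len(z))
    return outer[iu]

def fit_q_weights(Z, targets):
    """Least-squares fit of h in Q_hat(z) = h^T lambda(z).

    Z       : (N, n) matrix of stacked z_k samples from one learning period
    targets : (N,) vector of Bellman targets, i.e. the right-hand side of (8)
    """
    Lam = np.array([quad_features(z) for z in Z])
    h, *_ = np.linalg.lstsq(Lam, targets, rcond=None)
    return h

# Synthetic check: recover a known quadratic cost z^T P z exactly.
rng = np.random.default_rng(0)
n = 4                                   # toy dimension; the paper's z_k is larger
P = np.diag([1.0, 0.5, 2.0, 0.1])
Z = rng.normal(size=(200, n))
targets = np.einsum('ij,jk,ik->i', Z, P, Z)   # z^T P z for each sample
h = fit_q_weights(Z, targets)
Q_hat = quad_features(Z[0]) @ h
print(bool(np.isclose(Q_hat, Z[0] @ P @ Z[0])))  # True
```

Note that for a 12-element zk this upper-triangular construction gives 12·13/2 = 78 features, which is consistent with the 78 neurons reported later in the paper.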
At the end of each learning period, the following update law is used to update hi (see the flow chart in Figure 5):

h(i+1),app = α h(i+1) + (1 − γ) h(i,app),  (9)

where 0 < α < 1 is the forgetting factor. The control policy is then implemented with the updated gain h(i+1),app; these gains remain unchanged until the end of the next learning cycle. The selection of an appropriate learning period is necessary for the solution of the recursive least squares problem (Figures 4 and 5). A shorter learning period leads to quicker convergence; however, the period should still be long enough to capture the dynamic behavior of the system fully. For a linear system, fewer data points may be required, as in the case of [44]; for a complex nonlinear system, more data points may be required, as suggested in [33]. A convergence and stability proof for this scheme can be found in [45].

As mentioned above, a polynomial-based NN is used for cost function estimation. For the simulation in this
paper, 78 neurons are employed, which produce satisfactory results.

The function λ(zk(qk, τk, qdi)) in Equation (7) is calculated by the Kronecker product of zk = [τ1, τ2, e1, e2, ė1, ė2, e1², e2², ė1², ė2², q2², q̇2²]ᵀ, where τ1 and τ2 are the hip and knee joint input torques respectively. The hip and knee joint position errors are given by e1 and e2 respectively, and the hip and knee joint angular positions by q1 and q2 (Figure 4).

The hip and knee joint position errors [e1, e2] are calculated from qd − q, where q = [q1−hip, q2−knee]ᵀ and qd is the modified (based on the mass-spring-damper system model) reference angular position for the hip and knee joints when the human is applying some torque. In any other case qd will be equal to qr, the reference position for hip and knee; see Figures 3 and 4.

4. Simulation

The control scheme proposed above is simulated using a model of the RWAD developed in the SimMechanics toolbox (Matlab/Simulink). The dynamics model can be described by the following equation:

m(q)q̈ + v(q, q̇) + g(q) = τ,  (10)

where q = [q1−hip, q2−knee]ᵀ contains the hip and knee joint angular positions. The matrix m ∈ ℝ2×2 is the combined inertia of both the walk assist device and the human leg. Similarly, the vector v ∈ ℝ2×1 represents the Coriolis/centripetal torques. Gravitational torques are given by g ∈ ℝ2×1, and the hip and knee joint input torques by τ ∈ ℝ2×1. The above dynamics model was simulated in Matlab/Simulink.

A sampling time of 0.0005 s was used in the simulation; a step of 0.001 s or larger leads to instability within the first 10 s. Figure 6 shows the tracking results of the hip and knee joints. The cost estimate via the Q-function and the control torques are shown in Figures 7 and 8 respectively. The cost estimates improve as time passes, and the magnitudes of the control input torques diminish significantly after the first learning cycle. The oscillations in the control inputs are due to the band-limited white noise which is added to the control inputs to satisfy the condition of 'persistency of excitation'.

As mentioned before, when torque applied by the human is detected by the torque sensor in the hip joint, the demand qr is modified to become qd based on the compliant mass-spring-damper reference model. Similarly, in Figures 9 and 10 the trajectory tracking performance of the hip and knee joints is shown. When human effort is present, the demand position is modified in the desired compliant manner, i.e. qr is changed into qd to allow the human to exert the torque. Figure 12 shows the first 6 s (three learning cycles, 5.4 s in total, with each cycle 1.8 s long). In the first learning cycle, i.e. the first 1.8 s, an initial PD controller is applied. At the end of each cycle, the new RL controller gains are applied; hence, in the second cycle some improvement is observed, and in the third cycle the tracking becomes even better. These results show the efficiency of this scheme. Figure 11 shows that the Q-function is estimating the cost function very well. Please note that qr and qd are the same in this time interval (as there is no human torque or effort).

5. Conclusion

Robotic walk assist devices can help regain mobility for those who have walking disabilities due to weak muscles, etc. The work presented here focuses on improving the control performance of such devices, which lays a foundation for producing affordable, safe, and user-friendly devices. Suitable control techniques can help optimize control efforts as well as learn the dynamics online, resulting in improved performance of the system. In this paper, a dynamic-model-free, bio-inspired RL-based optimal adaptive compliance control scheme has been proposed and implemented in simulation on a robotic walk assist device, employing hip and knee joints. Compliance control is introduced through an external mass-spring-damper model. A polynomial-based neural network was used for estimating the Q-function, minimization of which leads to the derivation of the control torque. Simulation results validate the improved tracking and compliant behavior of the RL-based controller of the two-DOF robotic walk assist device.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes on contributors

Said Ghani Khan earned his Ph.D. degree at the Bristol Robotics Laboratory, University of the West of England (and University of Bristol) in 2012. He did his M.Sc. in Robotics at the University of Plymouth, United Kingdom in 2006. He received his B.Sc. in Mechanical Engineering from the University of Engineering and Technology Peshawar, Pakistan in 2003. Dr. Khan is currently working as an Assistant Professor in the Department of Mechanical Engineering, Taibah University (Yanbu Branch), Saudi Arabia. He is also an honorary member of the research staff at the University of Bristol, United Kingdom. Dr. Khan is the author of many journal articles, book chapters, and conference papers, and has co-authored a book on bio-inspired control of a humanoid robot arm, published by Springer.

Muhammad Tufail received an MSc degree in Mechatronics Engineering from the Asian Institute of Technology (2007) and
a Ph.D. degree in Mechanical Engineering (with specialization in Manufacturing and Mechatronics) from the University of British Columbia, Canada (2015). He then worked as a postdoctoral research fellow at the Industrial Automation Laboratory at UBC (2016). He has worked as a consultant and research associate on robotics and automation projects with industry, including the Canadian oil giant Cenovus Energy. He has more than ten years of research and teaching experience, including three years of teaching industrial robotics at both undergraduate and graduate levels at UBC, Canada (2014–2017). Since 2017, he has been working as an Assistant Professor at the Department of Mechatronics Engineering, University of Engineering and Technology, Peshawar. His research interests include robot control, vision, haptics, and teleoperation.

Syed Humayoon Shah recently received his Master's degree in Mechatronics Engineering from the University of Engineering and Technology Peshawar, Pakistan. He graduated in Electronic Engineering from BUITEMS Quetta, Pakistan in 2014. His research interests are in the fields of robotics, nonlinear control, reinforcement learning, control theory, and machine vision. He is currently working on rehabilitation robotics and artificial muscles.

Irfan Ullah is a Mechanical Engineering honors graduate of the University of Engineering and Technology Peshawar, Pakistan. He received his Masters (1983) and PhD (1994) in Mechanical Engineering from The University of Michigan, Ann Arbor. His research interests include the design and dynamics of machinery, including mechanisms and robots. He is currently working as a Professor at the College of Engineering, Taibah University, Yanbu, KSA.

References

[1] Horst RW. A bio-robotic leg orthosis for rehabilitation and mobility enhancement. Proceedings of the 31st annual international conference of the IEEE engineering in medicine and biology society; Minneapolis, Minnesota, USA: IEEE; 2009. p. 5030–5033.
[2] Young AJ, Ferris DP. Analysis of state of the art and future directions for robotic exoskeletons. IEEE Trans Neural Syst Rehabil Eng. 2017;25(2):171–182.
[3] Weerasingha A, Withanage W, Pragnathilaka A, et al. Powered ankle exoskeletons: existent designs and control systems. Proceedings of international conference on artificial life and robotics; Vol. 23. 2018. p. 76–83.
[4] Meng W, Liu Q, Zhou Z, et al. Recent development of mechanisms and control strategies for robot-assisted lower limb rehabilitation. Mechatronics. 2015;31:132–145. doi:10.1016/j.mechatronics.2015.04.005
[5] Huo W, Mohammed S, Moreno JC, et al. Lower limb wearable robots for assistance and rehabilitation: a state of the art. IEEE Syst J. 2016;10(3):1068–1081.
[6] Al-Shuka HF, Rahman MH, Leonhardt S, et al. Biomechanics, actuation, and multi-level control strategies of power-augmentation lower extremity exoskeletons: an overview. Int J Dyn Control. 2019;1–27.
[7] Dollar AM, Herr H. Lower extremity exoskeletons and active orthoses: challenges and state-of-the-art. IEEE Trans Robot. 2008;24(1):144–158.
[8] Viteckova S, Kutilek P, Jirina M. Wearable lower limb robotics: a review. Biocybern Biomed Eng. 2013;33(2):96–105. doi:10.1016/j.bbe.2013.03.005
[9] Kazerooni H, Steger R. The Berkeley lower extremity exoskeleton. J Dyn Syst Meas Control. 2006;128(1):14–25.
[10] Zoss AB, Kazerooni H, Chu A. Biomechanical design of the Berkeley lower extremity exoskeleton (BLEEX). IEEE ASME Trans Mechatron. 2006;11(2):128–138.
[11] Kawamoto H, Hayashi T, Sakurai T, et al. Development of single leg version of HAL for hemiplegia. 2009 annual international conference of the IEEE engineering in medicine and biology society; IEEE; 2009. p. 5038–5043.
[12] Kusuda Y. In quest of mobility – Honda to develop walking assist devices. Ind Robot: Int J. 2009;36(6):537–539.
[13] Ikeuchi Y, Ashihara J, Hiki Y, et al. Walking assist device with bodyweight support system. 2009 IEEE/RSJ international conference on intelligent robots and systems; IEEE; 2009. p. 4073–4079.
[14] Murray SA, Ha KH, Hartigan C, et al. An assistive control approach for a lower-limb exoskeleton to facilitate recovery of walking following stroke. IEEE Trans Neural Syst Rehabil Eng. 2015;23(3):441–449.
[15] Quintero HA, Farris RJ, Goldfarb M. A method for the autonomous control of lower limb exoskeletons for persons with paraplegia. J Med Device. 2012;6(4):041003.
[16] Pratt JE, Krupp BT, Morse CJ, et al. The RoboKnee: an exoskeleton for enhancing strength and endurance during walking. IEEE international conference on robotics and automation, proceedings ICRA'04; Vol. 3. IEEE; 2004. p. 2430–2435.
[17] Asl HJ, Narikiyo T, Kawanishi M. An assist-as-needed control scheme for robot-assisted rehabilitation. 2017 American control conference (ACC); IEEE; 2017. p. 198–203.
[18] Hussain S, Xie S. Assist-as-needed control of an intrinsically compliant robotic gait training orthosis. IEEE Trans Ind Electron. 2017;64(2):1675–1685.
[19] Han S, Wang H, Tian Y. Model-free based adaptive nonsingular fast terminal sliding mode control with time-delay estimation for a 12 DOF multi-functional lower limb exoskeleton. Adv Eng Softw. 2018;119:38–47.
[20] Ahmed S, Wang H, Tian Y. Model-free control using time delay estimation and fractional-order nonsingular fast terminal sliding mode for uncertain lower-limb exoskeleton. J Vib Control. 2018;24(22):5273–5290.
[21] Shah SH, Khan SG, Shah K, et al. Compliance control of robotic walk assist device via integral sliding mode control. 2019 16th international Bhurban conference on applied sciences and technology (IBCAST); 2019 Jan. p. 515–520.
[22] Aguirre-Ollinger G, Colgate JE, Peshkin MA, et al. Active-impedance control of a lower-limb assistive exoskeleton. IEEE 10th international conference on rehabilitation robotics; Noordwijk, The Netherlands; 2007. p. 188–195.
[23] Marchal-Crespo L, Reinkensmeyer DJ. Review of control strategies for robotic movement training after neurologic injury. J Neuroeng Rehabil. 2009;6(1):1–15.
[24] Winter DA. The biomechanics and motor control of human gait: normal, elderly and pathological. University of Waterloo Press; 1991. https://trid.trb.org/view.aspx?id=770965
[25] Mohammadi H, Richter H. Robust tracking/impedance control: application to prosthetics. 2015 American control conference (ACC); IEEE; 2015. p. 2673–2678.
[26] Sup F, Bohara A, Goldfarb M. Design and control of a powered transfemoral prosthesis. Int J Robot Res. 2008;27(2):263–273.
[27] Madani T, Daachi B, Djouani K. Non-singular terminal sliding mode controller: application to an actuated exoskeleton. Mechatronics. 2016 Feb;33:136–145. https://www.sciencedirect.com/science/article/abs/pii/S0957415815001737
[28] Jabbari Asl H, Narikiyo T, Kawanishi M. Neural network-based bounded control of robotic exoskeletons without velocity measurements. Control Eng Pract. 2018;80:94–104. doi:10.1016/j.conengprac.2018.08.005
[29] Nguyen-Tuong D, Peters JR, Seeger M. Local Gaussian process regression for real time online model learning. Advances in neural information processing systems; 2009. p. 1193–1200.
[30] Huang R, Cheng H, Guo H, et al. Hierarchical interactive learning for a human-powered augmentation lower exoskeleton. Proceedings – IEEE international conference on robotics and automation; Stockholm, Sweden: IEEE; 2016. p. 257–263.
[31] Schaal S, Atkeson CG. Learning control in robotics. IEEE Robot Autom Mag. 2010;17(2):20–29.
[32] Wang Y, Xu W, Tao C, et al. Reinforcement learning-based shared control for walking-aid robot and its experimental verification. Adv Robot. 2015;29(22):1463–1481. doi:10.1080/01691864.2015.1070748
[33] Khan SG, Herrmann G, Lewis FL, et al. Reinforcement learning and optimal adaptive control: an overview and implementation examples. Annu Rev Control. 2012;36(1):42–59. doi:10.1016/j.arcontrol.2012.03.004
[34] Hamilton N, Weimar W, Luttgens K. Kinesiology: scientific basis of human motion. New York: McGraw-Hill; 2011.
[35] Wang Y, Wang S, Ishida K, et al. High path tracking control of an intelligent walking-support robot under time-varying friction and unknown parameters. Adv Robot. 2017;31(14):739–752.
[36] Buchli J, Theodorou E, Stulp F, et al. Variable impedance control – a reinforcement learning approach. Robotics: science and systems; 2010.
[37] Hamaya M, Matsubara T, Noda T, et al. Learning assistive strategies for exoskeleton robots from user-robot physical interaction. Pattern Recognit Lett. 2017;99:67–76.
[38] Kuan CP, Young KY. Reinforcement learning and robust control for robot compliance tasks. J Intell Robot Syst. 1998;23:165–182.
[39] Kim B, Park J, Park S, et al. Impedance learning for robotic contact tasks using natural actor-critic algorithm. IEEE Trans Syst Man Cybern Part B: Cybern. 2010;40(2):433–443.
[40] Seraji H. Adaptive compliance control: an approach to implicit force control in compliant motion. 1994. citeseer.ist.psu.edu/404588.html
[41] Colbaugh R, Seraji H, Glass K. Adaptive compliant motion control for dexterous manipulators. Int J Robot Res. 1995;14(3):270–280.
[42] Colbaugh R, Wedeward K, Glass K, et al. New results on adaptive compliant motion control for dexterous manipulators. Int J Robot Autom. 1996;11(1):230–238.
[43] Khan SG, Herrmann G, Pipe T, et al. Safe adaptive compliance control of a humanoid robotic arm with anti-windup compensation and posture control. Int J Soc Robot. 2010 Sep;2(3):305–319.
[44] Al-Tamimi A, Lewis F, Abu-Khalaf M. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H∞ control. Automatica. 2007;43(3):473–481.
[45] Al-Tamimi A, Lewis F, Abu-Khalaf M. Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans Syst Man Cybern Part B: Cybern. 2008 Aug;38(4):943–949.
[46] Khan S, Herrmann G, Lewis F, et al. A novel Q-learning based adaptive optimal controller implementation for a humanoid robotic arm. 18th IFAC World Congress; Vol. 44; 2011. p. 13528–13533.
[47] Vrabie D, Vamvoudakis KG, Lewis FL. Optimal adaptive control and differential games by reinforcement learning principles. London: IET Press; 2012.