Ijerph 17 05502 PDF

International Journal of
Environmental Research
and Public Health
Article
Shared Control of an Electric Wheelchair Considering
Physical Functions and Driving Motivation
Lele Xi * and Motoki Shino
Department of Human and Engineered Environmental Studies, Graduate School of Frontier Sciences,
The University of Tokyo (UT), Chiba 277-8561, Japan; motoki@k.u-tokyo.ac.jp
* Correspondence: xi.lele@atl.k.u-tokyo.ac.jp; Tel.: +81-4-7136-4667

Received: 14 June 2020; Accepted: 22 July 2020; Published: 30 July 2020
Abstract: Individuals with severe physical impairments have difficulties operating electric wheelchairs
(EWs), especially in situations where fine steering abilities are required. Automatic driving partly
solves the problem, although excessive reliance on automatic driving is not conducive to maintaining
their residual physical functions and may cause more serious diseases in the future. The objective of
this study was to develop a shared control system that can be adapted to different environments by
completely utilizing the operating ability of the user while maintaining the motivation of the user
to drive. The operating characteristics of individuals with severe physical impairments were first
analyzed to understand their difficulties when operating EWs. Subsequently, a novel reinforcement
learning-based shared control method was proposed to adjust the control weight between the user
and the machine to meet the requirements of fully exploiting the operating abilities of the users while
assisting them when necessary. Experimental results showed that the proposed shared control system
gradually adjusted the control weights between the user and the machine, providing safe operation
of the EW while ensuring full use of the control signals from the user. It was also found that the
shared control results were deeply affected by the types of users.
Keywords: electric wheelchair; physical impairments; shared control; driving motivation;

reinforcement learning; user-machine interaction
1. Introduction
Recently, the number of individuals with mobility problems has increased in Japan [1]. Moreover,
maintaining mobility is one of the most important prerequisites for improving quality of life (QOL) [2,3].
Many assistive devices, such as electric wheelchairs (EWs), exoskeletal robotics, and assistive walkers
have been developed to help individuals with mobility problems. For severely physically impaired
people, their physical functions are extremely weak and only a few muscles can be used to operate
assistive robots; therefore, EWs are almost their only tool for maintaining mobility.
With the development of control technologies, sensors, and artificial intelligence technologies, EWs
with intelligent controllers have become a growing trend in recent years. With these new technologies,
automatic driving and shared control EWs have been developed to help individuals who are unable to
operate regular EWs regain their mobility. These state-of-the-art technologies mainly include intention
estimation [4], path planning [5], controller design [6], and environmental sensing [7].
Individuals with severe physical impairments usually experience difficulties operating EWs,
especially in situations where precise control is necessary, such as approaching an obstacle or turning
in a corridor. Automatic driving can solve this problem; however, excessive reliance on automatic
driving does not help maintain their residual physical functions and may cause more serious illnesses
in the future [8].
Int. J. Environ. Res. Public Health 2020, 17, 5502; doi:10.3390/ijerph17155502 www.mdpi.com/journal/ijerph
Shared control offers the possibility for the user to work together with the machine. Shared
control is typically defined as “If the function of the machine augments or replaces one or more of
the human senses, the user must be able to reach his goal and also understand what the machine is
doing. This
Int. J.two-way
Environ. Res. interaction must
Public Health 2020, exist in any shared control system” [9].
17, 5502 2 of 19
As discussed above, the use of physical functions is important to maintain the residual physical
functions of Shared
individuals with severe physical impairments. However, their extremely weak physical
control offers the possibility for the user to work together with the machine. Shared control
functionsis limit
typically their operating
defined as “If theability.
functionTheyof the cannot
machine properly
augments ordrive
replaces EWsonewithout
or more ofproper
the humanassistance,
especially in situations
senses, the user mustwhere be precise control
able to reach his and
goal steering abilities are
and also understand required.
what However,
the machine is doing.too much
This two-way interaction must exist in any shared control system”
assistance usually leads them to rely on the system and not to actively try to operate the EW,[9].
generating aAs discussed
lazy signalabove,
[10]. the use of physical
Therefore, functions is
the objective ofimportant
this study to maintain
was to the residualaphysical
develop shared control
functions of individuals with severe physical impairments. However, their extremely weak physical
system that can be adapted to different environments by completely utilizing the operating ability of
functions limit their operating ability. They cannot properly drive EWs without proper assistance,
users and maintaining
especially their where
in situations driving motivation.
precise control and Specifically, work
steering abilities are related
required.to shared too
However, control
much for EWs
is introduced
assistancein Section 2, and
usually leads themthe operating
to rely characteristics
on the system of individuals
and not to actively try to operatewho
the EW,have difficulties in
generating
a lazy signal [10]. Therefore, the objective of this study was to
operating EWs, along with the requirements of the shared control system, are discussed in develop a shared control system that
Section 3.
can be adapted to different environments by completely utilizing
The construction of the shared control system is presented in Section 4. The design of the the operating ability of users and
maintaining their driving motivation. Specifically, work related to shared control for EWs is introduced
requirements-based algorithm is discussed in Section 5. The results of the experiments that were
in Section 2, and the operating characteristics of individuals who have difficulties in operating EWs,
performedalongtowith test
the the validityof of
requirements the shared
the shared control
control system, aresystem
discussedand to analyze
in Section the interaction
3. The construction
characteristics between
of the shared controlthesystem
machine and the
is presented in different users
Section 4. The are of
design presented in Section 6.algorithm
the requirements-based Finally, Section
is discussed
7 presents in Section 5. The results of the experiments that were performed to test the validity of
the conclusions.
the shared control system and to analyze the interaction characteristics between the machine and the
different users are presented in Section 6. Finally, Section 7 presents the conclusions.
2. Related Works
The2.construction
Related Works
of a shared control for EWs is shown in Figure 1. According to the definition
The construction of a shared
of shared control, the user first control
expresses for EWs
their is shown
intention in Figuresignals
through 1. According
such toasthe
thedefinition
movement of a
of shared control, the user first expresses their intention through signals such as the movement of a
joystick. A shared controller is then designed to allow the user and the machine to work together to
joystick. A shared controller is then designed to allow the user and the machine to work together to
operate an EW to the intended destination. Therefore, the challenge was how to design the machine
operate an EW to the intended destination. Therefore, the challenge was how to design the machine
and the shared controller.
and the shared controller.
Machine
+ Shared
controller
−
Give signals
User
Figure Figure 1. Construction

1. Construction of aofshared
a sharedcontrol
control system
systemforfor
electric wheelchairs
electric (EWs).(EWs).
wheelchairs
Many types of shared control systems have been developed for EWs. Essentially, they are divided
Manyintotypes of shared
three categories control
based on howsystems have been
users participate developed
in the driving processfor EWs.The
[11,12]: Essentially,
first categorythey are
divided corresponds
into three categories basedshared
to behavior-based on how users
control participate
systems, which arein typically
the driving process
designed [11,12]: The first
for individuals
categorywith
corresponds to behavior-based
disabilities who maintain a constant andshared control
significant systems,
operating which
ability. are typically
These systems designed for
only intervene
in a few pre-defined situations, such as avoiding an obstacle or following
individuals with disabilities who maintain a constant and significant operating ability. These a wall [13]. The key issue systems
of this approach is how to execute the behavior. Many EWs are coupled with methods such as the
only intervene in a few pre-defined situations, such as avoiding an obstacle or following a wall [13].
dynamic window approach (DWA) [14] or the vector field histogram (VFH) [15], which are used when
The key defined
issue ofsituations
this approach is how
are detected to execute
or triggered the The
by users. behavior. Many EWs
second approach are coupled
corresponds with methods
to goal-based
such as the dynamic
shared window
control systems approach
[16]. (DWA)
These systems first[14] or the
estimate vector goal
a potential fieldorhistogram
sub-goal from (VFH) [15], which
the input
are used when defined situations are detected or triggered by users. The second approach
signal of the user. Subsequently, several possible paths are generated by global and local planners,
and the
corresponds to most feasible driving
goal-based sharedpath is selected
control systems based on the
[16]. inputsystems
These of the user. Although
first estimate this
a system
potential goal
or sub-goal from the input signal of the user. Subsequently, several possible paths are generated by
global and local planners, and the most feasible driving path is selected based on the input of the
Int. J. Environ. Res. Public
Int. Public Health
Health 2020,
2020, 17,
17, x5502 of 19
3 of 19
user. Although this system takes into account the intentions of the user, the EWs simply select the
takes intopath
prepared account the intentions
calculated of the user,
by the planners, the EWs
thereby simply select
significantly thethe
limiting prepared
control path calculated
authority of the
by
user. The third approach corresponds to continuous shared control systems [6]. With theseapproach
the planners, thereby significantly limiting the control authority of the user. The third types of
corresponds
systems, user to continuous
performance shared control systems
is evaluated online[6].
viaWith these types
predefined of systems,
cost functions,user performance
and is
the control
evaluated online via predefined cost functions, and the control authority of the user
authority of the user is then determined by online evaluation results. However, driving an EW is a is then determined
by online evaluation
complicated, long-term results. However,
process, driving an
where operation is EW is a complicated,
different even under the long-term process, where
same conditions. Real-
operation is different even under the same conditions. Real-time optimization limits
time optimization limits user performance and the construction of cost functions is always difficult. user performance
and the construction
This study focused of cost functions is
on designing alwayscontrol
a shared difficult.
system that considered the physical functions
Thiswhile
of users study maintaining
focused on designing a shared
their driving control system
motivation. To make thatfull
considered
use of thethephysical
physical functions
functions of
of
users while maintaining their driving motivation. To make full use of the physical
users, this shared control system needed to fully understand the operating characteristics of the usersfunctions of users,
this shared driving
in different control system needed
situations. to fully understand
Additionally, the operating
driving motivation characteristics
also needed to be fullyofconsidered
the users in
to
different
encourage driving
users tosituations. Additionally,
actively participate driving
in the motivation
EW driving also needed to be fully considered to
process.
encourage users to actively participate in the EW driving process.
3. System Requirements Based on Target User
3. System Requirements Based on Target User
3.1. Operating Characteristics of Individuals with Severe Physical Impairments
3.1. Operating Characteristics of Individuals with Severe Physical Impairments
Many types of input devices have been developed for individuals with severe physical
Many types of input devices have been developed for individuals with severe physical impairments,
impairments, such as those using user-generated sound signals and electromyography signals.
such as those using user-generated sound signals and electromyography signals. However, one of the
However, one of the purposes of this study was to make better use of the residual physical functions
purposes of this study was to make better use of the residual physical functions of users; therefore,
of users; therefore, this study decided on the use of continuous input devices, such as joysticks, for
this study decided on the use of continuous input devices, such as joysticks, for the target users.
the target users. Previous research showed that although some input devices have been developed
Previous research showed that although some input devices have been developed considering their
considering their physical functions, they still have difficulties operating such EWs, especially in
physical functions, they still have difficulties operating such EWs, especially in situations where
situations where fine steering abilities are necessary [17,18]. Figure 2 shows a driver inner model. The
fine steering abilities are necessary [17,18]. Figure 2 shows a driver inner model. The driver first
driver first recognizes the environmental information and EW movements, then makes a judgment
recognizes the environmental information and EW movements, then makes a judgment based on the
based on the recognition and gives operations based on the judgment results. Those who have
recognition and gives operations based on the judgment results. Those who have difficulty driving an
difficulty driving an EW have one or more problems in the recognition process. As shown in Figure
EW have one or more problems in the recognition process. As shown in Figure 2, visual impairment
2, visual impairment usually affects the recognition process. It usually manifests itself as low vision,
usually affects the recognition process. It usually manifests itself as low vision, low range of head/neck
low range of head/neck movement, impaired eye movement, or visual field loss [19]. Individuals
movement, impaired eye movement, or visual field loss [19]. Individuals with such problems cannot
with such problems cannot accurately perceive the position of obstacles. The ability to judge spatial
accurately perceive the position of obstacles. The ability to judge spatial relationships and the ability to
relationships and the ability to solve problems are important for operating an EW [20]. Therefore,
solve problems are important for operating an EW [20]. Therefore, individuals with these cognitive
individuals with these cognitive problems cannot make accurate judgments. However, even with
problems cannot make accurate judgments. However, even with good recognition and judgment,
good recognition and judgment, the input signals of individuals with weak physical functions may
the input signals of individuals with weak physical functions may be insufficient or excessive [17,18].
be insufficient or excessive [17,18]. Problems in all three processes result in the input signals given by
Problems in all three processes result in the input signals given by these users always being insufficient,
these users always being insufficient, advanced, or delayed.
advanced, or delayed.
Figure 2. Driver
Figure 2. Driver inner
inner model.
model.
The
The target
target users
users of
of this
this study
study are
are individuals
individuals with
withdifficulties
difficultiesdriving
drivingEWs.
EWs. Their
Their input
input
characteristics are summarized as follows:
characteristics are summarized as follows:
1. Limited input range. Individuals with extremely weak physical functions usually lack
appropriate input devices. Although some devices have been developed with their physical
functions in mind, during the development of their input devices, the user’s physical
Int. J. Environ. Res. Public Health 2020, 17, 5502 4 of 19
1. Limited input range. Individuals with extremely weak physical functions usually lack appropriate
input devices. Although some devices have been developed with their physical functions in
mind, during the development of their input devices, the user’s physical functions continue
diminishing; thus, they cannot properly operate the designed devices after development.
2. Delayed or advanced input signals. For individuals with the problems discussed above, their
input only intuitively represents the intention of the subjects, although driving an EW in
real environments corresponds to complicated behavior. More accurate and frequent input
is necessary, especially for situations where fine steering skills are required, which is almost
impossible for them.
3.2. Driving Environment Settings

The living environment of individuals with severe physical impairments is extremely limited.
Most of them move between their bedrooms, living rooms, and bathrooms [21]. Therefore, going
straight and turning are the two most basic and important movements to complete these driving tasks.
In this study, driving straight and turning left was used as the driving course, because we regarded
turning left and right as an equivalent and symmetrical challenge.
3.3. System Requirements

The purpose of driving an EW is to allow users to reach their destinations; therefore, the system
must be able to accomplish this goal. It is also necessary to make the driving process gradually
safer and more comfortable, because safety and comfort are the two most important factors for EW
driving [11]. As discussed above, the characteristics of target users and driving environments vary;
therefore, the shared control system should be able to adapt to various types of users and driving
environments. The objective of this study was to design a shared control system for individuals with
weak physical functions by considering their residual physical functions and driving motivation.
It is important to use their operating signals to the greatest extent possible to maintain their residual
physical functions. Therefore, to utilize the operating capacity of users completely, it was necessary to
develop a novel shared control method that assists by considering the entire driving process of an
EW. Given the requirements of the EW driving process, the physical functions of users, and the input
characteristics as discussed above, the requirements of the shared control system are summarized
as follows:
1. The system should assist users in reaching their destination.

2. The complete driving process should be safe.
3. The complete driving process should be comfortable.
4. The system should gradually adapt to different users.
5. The system should gradually adapt to different environments.
6. The system should completely utilize the residual physical functions of individuals
with disabilities.
7. The system should consider the driving motivation of users.
4. Construction of the Shared Control System
4.1. Modeling an EW
Figure 3 shows the top view of the proposed EW. The EW is driven by two rear wheels, and two
casters are used as the front wheels. The meanings of the parameters are listed in Table 1. The two
assumptions are set as follows:
1. The slip between the wheels and the road can be neglected.
2. The EW will not move in the pitch direction, which means that the casters will not leave the road.
Int. J. Environ. Res. Public Health 2020, 17, x 5 of 19
Casters
Vehicle body
Rear wheel
r
Figure3.3. Top
Figure Topview
viewof
ofthe
theproposed
proposedEW
EWmodel.
model.
Casters
Table 1. Parameters
Vehicle bodyof the EW model.
Table 1. Parameters of the EW model.
Parameters Meaning of the Parameters
ωR , ωL ω , Angular
Angular velocity
velocity ofof the
the right
right (left)
(left) wheel
wheel
r Radius
Radius ofof
thethe rear
rear wheels
wheels
W Width Rear wheel
. Width ofof
thetheEW EW
v, ψ , Velocity
Velocity ofof
thethe rstraight
straight (yaw)
(yaw) motion
motion
x, y , Position in world coordinate
Position in world coordinate system system
COG Center of gravity
COR Center of rotation
COR Center
Figure 3. Top view of the of rotation
proposed EW model.
The straight
4.2. Concept of theand rotational
Shared Table 1.ofParameters
Controlmotions
System an EW can of the
beEW model. as (1) and (2), respectively.
expressed
Based on the requirements, the concept ofr(the ωR novel
+ofωthe shared control system is shown in Figure 4.
L ) right (left) wheel
ω , Angular velocity
The user-machine relationship is similar vto=the relationship
Radius 2 between a driving coach and a learner
of the rear wheels
(1)
driver. This scheme is very similar to the continuous Widthshared control, as discussed in previous research
of the EW
[6]. The difference is that, to make , better .
Velocity
use (ofωthe
rof the ωL ) (yaw)
R −straight
residual motion functions of users, the coach
physical
,
ψ=
Position in world coordinate system
(2)
W
does not continuously correct the performance of users. The coach should intervene at an appropriate
time depending on the performance
4.2. Concept of the Shared ControlCOR System
of the learner driver.
Center The coach should also be able to intervene
of rotation
in different ways depending on the different training purposes. For example, to use the residual
Based
physical on the
functions
4.2. Concept requirements,
ofShared
of the users,Control
it was the concepttoofdecrease
possible
System the novel theshared controlappropriately
intervention system is shown for in Figure 4.
individuals
The user-machine
with error-correcting relationship
abilities. is similar to the relationship between a driving coach and a learner driver.
Based on the requirements, the concept of the novel shared control system is shown in Figure 4.
This scheme is very similar to the continuous shared control, as discussed in previous research [6].
The user-machine relationship is similar to the relationship between a driving coach and a learner
The difference
driver. Thisisscheme
that, toismake better to
very similar use
theofcontinuous
the residual physical
shared functions
control, of users,
as discussed the coach
in previous does not
research
continuously correct the
[6]. The difference performance
is that, of users.
to make better use ofThe coach should
the residual physical intervene
functionsat ofan appropriate
users, the coach time
depending
does noton the performance
continuously correct theof the learner driver.
performance of users.The coachshould
The coach should also beatable
intervene to intervene in
an appropriate
different
timeways depending
depending on the on the different
performance training
of the learnerpurposes.
driver. TheForcoachexample, to use
should also be the
ableresidual physical
to intervene
in different ways depending on the different training purposes. For example,
functions of users, it was possible to decrease the intervention appropriately for individuals with to use the residual
physical functions
error-correcting of users, it was possible to decrease the intervention appropriately for individuals
abilities.
with error-correcting abilities.
Figure 4. Shared control system concept.
4.3. Framework of the Shared Control System

To meet the requirements, the proposed shared control system (coach) was divided into two
parts: the planner and online learning parts, as shown in Figure 5a. According to the analysis in the
previous section, the coach has a standard in mind. When the operation of the user does not meet this
standard, the coach will interveneFigure 4.4.Shared
Shared control
appropriately.
Figure system
Thesystem
control concept.
planner part can be regarded as the operation
concept.
standard of the coach, whereas the online learning part is the judge of the coach, to decide when and
how4.3.
to Framework
intervene.ofUsers
the Shared Control aSystem
provided velocity command ( , ) through some type of input
standard, the coach will intervene appropriately. The planner part can be regarded as the operation
4.3. Framework of the Shared Control System

standard, the coach will intervene appropriately. The planner part can be regarded as the operation
standard of the coach, whereas the online learning part is the judge of the coach, to decide when and
.
how to Int.intervene.
J. Environ. Res.Users provided
Public Health a xvelocity command (vuser , ψuser ) through some type of input
2020, 17, 6 of device
19
.
while operating the EW. Subsequently, user signals, planner velocity signals (vPlanner , ψPlanner ), and
device while operating the EW. Subsequently, user signals, planner velocity signals
environmental information (x, y, ψ) were used in the online learning system. The control weight for
( , ), and environmental information ( , , ) were used in the online learning
the user was defined as kv and kψ. , whereas the control weight for the nonexpert planner was defined as
system. The control weight for the user was defined as and , whereas the control weight for
(1 − kvthe − kψ. ). Theplanner
, 1nonexpert onlinewas learning part is shown
defined as (1 − , 1 −
in Figure 5b. For individuals with severe disabilities,
) . The online learning part is shown in Figure
the operating characteristics
5b. For individuals withfor going
severe straight and
disabilities, the turning arecharacteristics
operating significantly for
different, and the and
going straight straight
and yaw motions of the EW were always analyzed separately [22]; thus,
turning are significantly different, and the straight and yaw motions of the EWvwere always k and kψ
. were calculated
analyzed
separately.
separatelyThe [22];
shared controland
thus, equations arecalculated
were expressedseparately.
as The shared control equations are
expressed as 
= kv vuser + (1 − kv )vDWA
 .v =

. + (1 − )

. (3)
 ψ = kψ. ψuser + (1 − kψ. )ψDWA (3)


= + (1 − )
( , )
( , , )
( , )
( , )
( , , , , )
(a)
( , )
+
( , )
+
+ ( , )
( , ) −
+
( , )
+
Online learning
shared control
(b)
Figure 5. Construction of the shared control system: (a) framework of the shared control system and
Figure 5. Construction of the shared control system: (a) framework of the shared control system and
(b) online learning
(b) online part.part.
learning
Int. J. Environ.
Int. J. Environ. Res.
Res. Public
Public Health
Health 2020, 17, x5502
2020, 17, 77 of
of 19
19
5. Online Learning System Design for the Shared Control System

5. Online Learning System Design for the Shared Control System
When a driving coach tries to teach a learner driver, the driving behavior is also observed, and
When a driving coach tries to teach a learner driver, the driving behavior is also observed, and
the learner driver is only assisted when necessary. Therefore, the coach also needed to understand
the learner driver is only assisted when necessary. Therefore, the coach also needed to understand
the behavior of the learner driver. Unlike teaching people how to drive a car, the people targeted in
the behavior of the learner driver. Unlike teaching people how to drive a car, the people targeted in
this study were individuals who have difficulties driving EWs. They cannot properly operate an EW,
this study were individuals who have difficulties driving EWs. They cannot properly operate an EW,
even if they received enough training. Therefore, it was important to ensure that both the user and
even if they received enough training. Therefore, it was important to ensure that both the user and
coach understood each other as quickly as possible. The coach was also asked to completely use the
coach understood each other as quickly as possible. The coach was also asked to completely use the
operating ability of the users. In addition, driving an EW is a complicated, long-term process, where
operating ability of the users. In addition, driving an EW is a complicated, long-term process, where
the operation is different even under the same conditions. Therefore, reinforcement learning (RL)
the operation is different even under the same conditions. Therefore, reinforcement learning (RL) was
was considered a suitable long-term optimization method for this shared control. This section
considered a suitable long-term optimization method for this shared control. This section discusses the
discusses the definition and application of reinforcement learning in this study.
definition and application of reinforcement learning in this study.
5.1. Definition of Reinforcement Learning
5.1. Definition of Reinforcement Learning
Figure 6 shows the structure of RL [23]. The decision-maker is called the agent, and it interacts
Figure 6 shows the structure of RL [23]. The decision-maker is called the agent, and it interacts
with the environment at each step. The agent acts, and the environment responds to the action and
with the environment at each step. The agent acts, and the environment responds to the action and
presents a new situation to the agent. Specifically, the agent interacts with the environment at a
presents a new situation to the agent. Specifically, the agent interacts with the environment at a
sequence of discrete time steps, T = [ , , … ]. The environmental information received by the
sequence of discrete time steps, T = [t1 , t2 , t3 . . . tn ]. The environmental information received by the
agent at each time ( ) is defined as the state ( ), and ( ) denotes the action sets that the agent
agent at each time (t) is defined as the state (St ), and (At ) denotes the action sets that the agent possibly
possibly takes at this state. After taking the action, the agent receives the reward ( ) and finds itself
takes at this state. After taking the action, the agent receives the reward (R) and finds itself in a new
in a new state ( ). How the agent chooses the action is called the agent policy, where π ( | )
state (St+1 ). How the agent chooses the action is called the agent policy, where πt (as) denotes the
denotes the possibility that = if = . The goal of RL is to obtain the optimal policy (π ),
possibility that At = a if St = s . The goal of RL is to obtain the optimal policy (π0 ), maximizing the
maximizing the accumulated reward. Specifically, the optimal policy (π ( )) is represented in (4) by
accumulated reward. Specifically, the optimal policy (π0 (S)) is represented in (4) by the state-action
the state-action value function ( , ) .
value function Q(S, a).
π =0 arg max ( π, )
π = argmax∈ Q (S, a) (4)
a∈A
( )
Agent
Action
Environment
Figure 6. Structure of reinforcement learning.

Figure 6. Structure of reinforcement learning.
5.2. Reinforcement Learning for the Shared Control System to Determine the Control Weights
5.2. Reinforcement Learning for the Shared Control System to Determine the Control Weights
The online learning part aims to assign user control weights in different environments, considering
Thefunctions
physical online learning
and driving partmotivation.
aims to assign user control
As discussed above, weights in different
reinforcement learningenvironments,
is a long-term
considering physical
optimization functions
method that could and driving
be used motivation.
to solve As discussed
this problem above,
by designing an reinforcement learning
appropriate reward (R)
is a long-term optimization method that could be used to solve this problem by
and state (S), and choosing a suitable reinforcement learning algorithm. The structure of the shared designing an
appropriate reward ( ) and state ( ), and choosing a suitable reinforcement learning algorithm.
control system using reinforcement learning is shown in Figure 7. The reward (R) was first calculated The
structure
using userofinput,
the shared
planner control
input,system using reinforcement
and environmental learning
information. is shown in Figure
Subsequently, the user7.and
Theplanner
reward
(inputs,
) was first calculated using user input, planner input, and environmental
environmental information, and rewards (R) were used in the reinforcement learning algorithm information.
Subsequently,
to calculate thetheuseruser and weights.
control planner inputs, environmental information, and rewards ( ) were used
in the reinforcement learning algorithm to calculate the user control weights.
( , )
+ ( , )
( , )
Figure 7.Figure 7. Structure

Structure of sharedcontrol
of shared control using
usingreinforcement learning.
reinforcement learning.
In reinforcement learning, the purpose of the agent is formalized using a scalar value reward (R)
In reinforcement learning,
from the environment theagent.
to the purpose In thisof the agent
research, is formalized
the purpose using
of the agent was toa assist
scalar thevalue
users reward ( )
in completing the driving task by considering their physical functions and maintaining
from the environment to the agent. In this research, the purpose of the agent was to assist the users their driving
motivation. In other words, the reward (R) should consist of information regarding the evaluation of
in completing the driving task by considering their physical functions and maintaining their driving
the driving process, such as reaching the goal, safety, comfort, adaptation to users and environments,
motivation.useIn of
other words,
physical theand
functions, reward ( ) should
maintenance of drivingconsist of information
motivation. regarding
This aspect of the the evaluation of
study is discussed
the driving process, such as reaching the goal, safety, comfort, adaptation to users and environments,
in Section 5.3.
Conversely, controlling a dynamic robot using reinforcement learning usually requires many
use of physical functions, and maintenance of driving motivation. This aspect of the study is
pre-training trials. However, the target users were individuals with weak physical functions who have
discussed indifficulties
Sectiondriving
5.3. EWs; therefore, too many training trials could fatigue them. In addition, individuals
Conversely, controlling
with extremely a dynamic
weak physical functionsrobot using
could have reinforcement
their physical functionslearning usually
(operating ability) requires many
affected
by many factors, such as the surrounding temperature and the holding
pre-training trials. However, the target users were individuals with weak physical functions who position of the input device,
resulting in different operating characteristics at different times. Therefore, this system had to be able
have difficulties driving EWs; therefore, too many training trials could fatigue them. In addition,
to learn without pre-training and reach convergence within certain training trials. Proper design of
individuals states
withand extremely weaksolve
action sets could physical functions
this problem to somecould have istheir
extent, which physical
discussed functions
in Section 5.4. (operating
ability) affected by many factors, such as the surrounding temperature and the holding position of
5.3. Reward Design for the Shared Control System
the input device, resulting in different operating characteristics at different times. Therefore, this
As discussed above, the reward (R) should consist of information about the design requirements.
system hadHowever,
to be able to first
it was learn without
necessary pre-training
to understand the EWand reach
driving convergence
process. Therefore, within certain training
in this section,
trials. Proper
the design of states and
driving characteristics action
are first sets could
investigated through solve this problem
experiments, to reward
and then the some design
extent, which is
discussed inmethod
Sectionis proposed
5.4. based on the investigation and design requirements.
5.3.1. Driving Characteristics Investigation
5.3. Reward Design for the Shared Control System
To describe the characteristics of the drivers in more detail, we first asked several healthy
people to drive an EW. The driving characteristics of the EW were then analyzed using the driving
As discussed above, the reward ( ) should consist of information about the design requirements.
data. As discussed in Section 3, driving an EW in the setting of going straight and turning left was
However, itinvestigated
was first here.
necessary to understand the EW driving process. Therefore, in this section, the
driving characteristics arehere
The EW used first
was investigated through
the EMC730, produced by theexperiments,
IMASEN Company. andFourthen theyoung
healthy reward design
volunteers were asked to drive the EW. They were asked
method is proposed based on the investigation and design requirements. to practice driving for 5 min before the
experiments, and each volunteer was asked to drive along the course 10 times during each experiment.
The experimental setup is shown in Figure 8a. The course shown in Figure 8b was set up so that it was
5.3.1. Driving
1.2 Characteristics
m wide to meet the Investigation
corridor width requirement (feet method). A motion capture system (Cortex,
Motion Analysis Corporation, Santa Rosa, CA, USA) was used to record the EW trajectories and speeds.
To describe the characteristics of the drivers in more detail, we first asked several healthy people
The input signals of the volunteers were also recorded.
to drive an EW. The driving characteristics of the EW were then analyzed using the driving data. As
discussed in Section 3, driving an EW in the setting of going straight and turning left was investigated
here.
The EW used here was the EMC730, produced by the IMASEN Company. Four healthy young
Int. J. Environ. Res. Public
Int. J. Environ. Health
Res. Public 2020,
Health 17,17,
2020, x 5502 9 of 19 9 of 19
Detected area using

motion capture system
Detected area using
motion capture system
End 1.6 m
End 1.6 m
3 m1.2m
3m 5m
5 mm
1.6
1.2m 1.6 m
The directionThe direction of EW

of EW
1.2m
Start
1.2m
Start
(a) (b)
(a) (b)
Figure 8. Experimental setup: (a) experimental scene and (b) course setting.
Figure 9a shows the EW trajectories and body speeds. Each line shows the trajectory of the
Figure 9a shows the EW trajectories and body speeds. Each line shows the trajectory of the
midpoints of the rear wheels. Additionally, the color bar represents the body speed of the midpoint:
midpoints
Figure 9a the of the rear
shows wheels. Additionally, and
the color barspeeds.
represents the body speed of the midpoint:
the brighter color,the EW trajectories
the faster the body speed. body
Figure 9b shows Each
an example line shows
of body speedthe
and trajectory
EW of the
the brighter the color, the faster the body speed. Figure 9b shows an example of body speed and EW
midpoints of Itthe
direction. canrear wheels.
be seen Additionally,
that users thecontinuously
tend to operate color bar represents
when drivingthethebody speed
EW going of the midpoint:
straight
direction. It can be seen that users tend to operate continuously when driving the EW going straight
the brighter theleft.
and turning
and turning color,
left. the faster the body speed. Figure 9b shows an example of body speed and EW
direction. It can be seen that users tend to operate continuously when driving the EW going straight
2.5 1.5 3
and turning left. Direction of EW
Body speed
2
Body speed [km/h]
Direction of EW [rad]
Body Speed [km/h]

2.5 1 1.5 2 3
Direction of EW
1.5
Body speed
1.2m 2
Body speed [km/h]
1
Direction of EW [rad]
0.5 1
Body Speed [km/h]

1.2m
1 2
0.5
1.5
1.2m 0 0 0
1 2 2.5 3 3.5 4 4.5 5 5.5
Time [s]
1.2m 0.5 1
(a) 0.5 (b)

Figure 9. Experimental results of the driving characteristics: (a) trajectories and body speeds and
Figure 9. Experimental results of the0driving characteristics: (a) trajectories
0 and body speeds and (b) 0
(b) motion of an EW. 2 2.5 3 3.5 4 4.5 5 5.5
motion of an EW. Time [s]
5.3.2. Reward Design for the Shared Control System
5.3.2. Reward Design for the Shared Control System
To meet the design (a) requirements, the reward for continuous driving was divided (b) into four parts:
To meet the design requirements, the reward for continuous
current reward, predicted reward, participation reward, and task reward. The currentdriving was divided into four parts:
reward was
current reward,
determined predicted reward,
by the difference participation reward, and task reward. The current reward was
Figure 9. Experimental resultsbetween the speed
of the driving of the current(a)
characteristics: moment and the
trajectories and previous moment,
body speeds and (b)
determined
considering byindicators
the difference
such between
as comfort thewhen
speed of the an
driving current
EW moment and the previous moment,
[11].
motion of an EW.
considering indicators
A Hybrid such [24]
A* (HA*) as comfort when driving
based method an EW [11].
was designed to calculate the predicted reward based
A Hybrid A* (HA*) [24] based method was designed
on whether the EW was driven to the goal. The HA* is a heuristic to calculate the predicted
method that reward based onthe
determines
5.3.2.whether
Reward the Design
EW was for the to
driven Shared
the Control
goal. The HA* System
is a heuristic method that determines the cost fromthe
cost from the current state to the goal state. The HA* method was initially used to calculate
thecost
current
of the state to the goal state. The HA* method was initially used tothe
calculate the cost of thethe
To meet thecurrent
designstate. The
requirements,differencethe between
reward thefor
cost calculated
continuous for
driving current
wasmoment
dividedand into four parts:
current state. The difference between the cost calculated for the current moment
next moment could be used as the predicted reward, assuming that the inputs of the user and the and the next moment
current
couldreward,
be used predicted
as the predictedreward,
reward, participation
assuming that reward,
the inputs and task reward. The current reward was
planner remained unchanged during one cycle. The items forofthe
thecost
userfunction
and the planner remained
of this HA* method
determined
unchanged byduring
are shown
the difference
in Tableone2,cycle.
which
between
The items
were
the
for
designed thespeed of theofcurrent
costconsidering
by function HA*moment
thisrequirements
the method and and
are the in
shown previous
Table of moment,
characteristics
2, individuals
which indicators
considering werewith designedsuch by asconsidering
comfort theG requirements
when driving anand EWcharacteristics
[11]. of individuals with
severe disabilities. Here, n denotes the positive gain for each cost item, HC denotes
severe
A the disabilities.
Hybrid
heuristicA*cost Here,
(HA*) [24]
using denotes
A*,based the
method
d denotes positive gain
was designed
the distance for each
to the closest cost item,
to wall,
calculate HC
α is the denotes
the the
predicted
steering heuristic
angle, reward
and CC based on
cost using A*, d denotes the distance to the closest wall, is the steering angle, and CC and BC are
whether the EW was driven to the goal. The HA* is a heuristic method that determines the cost from
the direction change and backward penalty, respectively. Figure 10 shows an example for states
the current state to the goal state. The HA* method was initially used to calculate the cost of the
current state. The difference between the cost calculated for the current moment and the next moment
could be used as the predicted reward, assuming that the inputs of the user and the planner remained
and BC are the direction change and backward penalty, respectively. Figure 10 shows an example for
states < x = 2 m, y = 0.1 : 2.4 m, ψ = 15 : 225◦ >, where the color represents the cost for each state:
as the cost increases, the color becomes lighter.
Table 2. Items for the cost function in HA* to calculate the predicted reward.
Cost Items Calculating Method

Approaching the goal G1 ∗ HC
Safety
Int. J. Environ. Res. Public Health 2020, 17, x G2 ∗ d 10 of 19
Magnitude of the input G3 ∗ α
Input change ∘ G4 ∗ δα
= 2 m, = 0.1: 2.4 m, = 15: 225 , where the color represents the cost for each state: as the cost
Forward/backward change G5 ∗ CC
increases, the color becomes Backward lighter. G6 ∗ BC
Figure
Figure 10.
10. Example
Example of
of cost determination using
using Hybrid
HybridA* (HA*)<x
A*(HA*) <x == 2 m, y =
= 0.1:2.4 m, ψ =
= 15:225◦∘>.
15:225 >.
The participation
Table 2.reward
Items forwas
thesimply in HA*bytoPcalculate
calculated
cost function 2 ∗ kt−1 , to
theevaluate
predictedwhether
reward. the input of the
user was used to drive the EW, where kt−1 denotes the control weight of the user at the last moment
Cost Items Calculating Method
and P2 denotes the gain for this reward. A larger P2 might make the algorithm more inclined to use
Approaching the goal ∗
the input of the user. Safety ∗
Finally, a task reward was set to evaluate
Magnitude whether the user completed
of the input ∗ the task. The system obtains
a positive reward (R goal ) if the EW achieved
Input change the goal, a negative ∗ reward (−R collide ) if the EW collided,
and zero task reward otherwise. Forward/backward change ∗
The entire reward function Backward
is shown in (5). The predicted reward ∗ corresponded to the difference
between the cost of the state for the next moment and the current moment, and represented the
The participation
evaluation of the currentreward wasofsimply
behavior the EW.calculated
The current bycost∗was calculated
, to evaluatefrom whether the input
the change in speedof
the user
of the was
EW, as used to driveisthe
acceleration theEW,mainwherefactor affectingdenotes the control
comfort. weight of
Participation the user
rewards at the
were usedlast
to
moment and
evaluate whether denotes the gain
a user actively for this reward.
participated A larger
in the driving process.might make
Finally, the taskthe reward
algorithm more
aided the
inclined
system intoevaluating
use the input of thebehavior
driving user. throughout the entire driving process.
Finally, a task reward was set to evaluate whether the user completed the task. The system
obtains a positive reward Reward
( ) if=the EW
Cost achieved
current − Costnextthe+goal, a negative
P1 ∗abs (vt − vt−1reward
) (− ) if the EW
collided, and zero task reward otherwise.
| {z } | {z }
Predicted reward Current reward
The entire reward function (5)
+ P2 is∗ (shown
1 − kt−1in) (5).
+ RThe predicted reward corresponded to the difference
goal − Rcollision (P1 , P2 < 0),
between the cost of the state|for the {z next } moment
| {zand the } current moment, and represented the
Participation reward
evaluation of the current behavior of the EW. The current cost was calculated from the change in
Task reward
speed of the EW, as acceleration is the main factor affecting comfort. Participation rewards were used
to evaluate whether a user actively participated in the driving process. Finally, the task reward aided
the system in evaluating driving behavior throughout the entire driving process.
Reward= − + ∗ abs(v − v )
+ ∗ (1 − ) + − ( , 0), (5)
5.4. States and Action Sets Design for the Shared Control System
The states of the entire shared control system were defined as < x, y, ψ >, where x, y denotes the
position information and ψ represents the orientation of the EW. When discretizing the states, if the
density of the states was too low, it could not accurately reflect the motion of the EW. In contrast, if
the density of the states was too high, it could affect the convergence of the reinforcement learning
algorithm. Therefore, it was important to design the resolution of the state to achieve the purpose
of this application. To reduce the EW orientation classifications, we classified the EW orientation in
the current state by considering the ultimate goal of EW driving rather than considering only the size
of the orientation in Cartesian coordinates. The EW orientation (ψ) was classified into the following
three categories: correct direction, middle correct direction, and wrong direction. In this application,
the resolution of the orientation was set at 30◦ . As shown in Figure 10, concerning the position
(x = 2 m, y = 0.1 m), ψ between 75◦ and 105◦ , and 45◦ and 75◦ were considered the correct direction
and middle correct direction, respectively, whereas ψ different from these ranges were considered the
wrong direction.
The frequency of this system was set at 10 Hz. Given that the speed of an EW in indoor
environments typically ranges from 1.5 to 2.5 km/h, the EW moved approximately 4–7 cm within
a cycle. Position information was presented using grid cells that corresponded to squares of a
certain length. We compared the different learning results by using grid lengths of 10, 20, and 30 cm
through simulations. The results show that a rapid and accurate convergence was obtained after
approximately 10 training trials when the grid size was 20 cm. In the case of 10 cm, the policy converged
but required more than 30 training trials, and in the case of 30 cm, the policy did not converge even
after more than 50 training trials. This was because the side length of the grid size was too large.
The position of the EW could vary significantly when the EW passes through the grid under different
conditions. Thus, the system could not accurately learn the correct policy for user control weight.
5.5. Sarsa Learning Based Shared Control Algorithm

Various methods have been developed to solve reinforcement learning problems (value function
estimation), and can be classified into three types: dynamic programming-based optimal control
approaches, the Monte Carlo method, and temporal difference methods such as the Q-learning and
Sarsa learning methods.
Sarsa learning is a widely used online learning algorithm, especially in the field of dynamic
planning within unknown environments [25,26]. In this study, Sarsa learning was applied to calculate
the optimal policy used to select the optimal user control weight in different states. The state sequence
(St ), action (At ), reward function (R), action value function (Q), and optimal policy were designed in
the aforementioned parts of this section. The step for updating the action-value function is shown
in (6), where αt denotes the learning factor and γ denotes the discount factor.
Q(St , At ) = Q(St , At ) + αt [Rt + γQ(St+1 , At+1 ) − Q(St , At )], (6)
Furthermore, we applied the softmax method to select the actions [23]. This is a method that
varies the probabilities of the actions as a graded function of the estimated value. Additionally, we used
the Boltzmann distribution to show the action selection probability of each given state. The Boltzmann
distribution is shown in (7), commonly used in the action selection process in RL problems, where
τ denotes a positive temperature parameter affecting the exploration/exploitation trade-off and PAN
denotes the possibility of action (AN ).
eQ(AN )/τ
pAN = Pn , (7)
Q(Ai )/τ
i=1 e
The complete algorithm is shown in Table 3. The system obtained a set of data {St , At , Rt } after
each update.
Table 3. Shared control algorithm for EW driving.
Algorithm: Sarsa Learning-Based Shared Control Algorithm

1. t = 0, γ = 0.8, αt = 0.05, Rt calculated by reward function
2. Recursively compute until the stop condition is met
3. Recursively compute until reaching the goal
4. Obtain current stateSt
5. Decide action At = kv , kψ. by Sarsa learning
.
6. Send kv , kψ. to the system, calculate output v, ψ
.
7. Send output v, ψ to the EW
8. Calculate the next state St+1
9. Update the Q-table, Load Rt
The stop condition of the training process was set as follows:
1. Q (s,a) for each state only changes to a certain value.

2. The rank Q (s,a) for each state only changes to a certain value.
3. The latest trajectory does not include new states.
4. Consecutive successes of the above condition.
5. The number of training trials exceeds a certain number.
6. Experiments: System Effectiveness and User-Machine Interactions

There were two purposes for these experiments. The first was to verify the effectiveness of the
shared controller with real users. The second was to analyze the characteristics of user-machine
interactions. The experiments were first carried out in a virtual reality (VR) platform, as this system
requires several training trials and for safety reasons.
6.1. Experimental Setup

The experimental setup is shown in Figure 11a. The user provided the operating signals through
a joystick and the input signals were then transmitted to MATLAB through the user datagram protocol
(UDP). MATLAB was used to calculate the real-time positions and orientations using information
about the environment and the input signals. Subsequently, the calculation results were sent to Unity
(Unity Technologies) through the UDP. Users could train their driving ability in the virtual reality
world through a head-mounted virtual reality display (Oculus). The sampling frequency of this shared
control system was set as 10 Hz. Figure 11b shows the experimental scene, where the course was set at
1.2 m width, as discussed in Section 5.3.
As discussed in Section 3, there were two input characteristics for individuals who have difficulties
operating an EW: insufficient input and oversteered (excessive) input. Ten volunteers participated in
the experiments. In the case of insufficient input, the volunteers were required to use a restraining
device to limit the leftward movement of the thumb. As shown in Figure 12, the restraining device
was made of polyform splinting material (SAKAIMED Company, Tokyo, Japan), and the shape of the
material could be changed by heating. The most important reason for the oversteering phenomenon
is that the user provides more steering signals than expected, causing new errors, and this vicious
cycle leads to the oversteering phenomenon of the entire driving process. To achieve the oversteering
There were two purposes for these experiments. The first was to verify the effectiveness of the
shared controller with real users. The second was to analyze the characteristics of user-machine
interactions. The experiments were first carried out in a virtual reality (VR) platform, as this system
requires several training trials and for safety reasons.

6.1. Experimental Setup
The experimental setup is shown in Figure 11a. The user provided the operating signals through
effect,
a joystickthe and
following
the inputalgorithm
signalswas were added
then to deal with to
transmitted theMATLAB
input from the user:
through the Ifuser
the datagram
input for
changing direction (more than 30% of the maximum) lasted more
protocol (UDP). MATLAB was used to calculate the real-time positions and orientations using than 0.3 s, the input gain would be
increased three
information abouttimes.
the Figure 13 shows
environment examples of normal users, users with insufficient input (with
Int. J. Environ. Res. Public Health 2020, 17, x and the input signals. Subsequently, the calculation results13were of 19
the restraining device), and users with
sent to Unity (Unity Technologies) through the UDP. oversteered input (by changing
Users could train thetheir
input gain), ability
driving showing in that
the
theJ.auxiliary
virtual
Int. As reality settings
discussed
Environ. Res. world
Public inused in this
through
Section
Health 2020, 3,axsection
17, there to reproduce
head-mounted
were two virtual
inputthe input characteristics
reality
characteristics were correct.
displayfor(Oculus).
individualsThe who It should
sampling
13have
of 19
be noted here
frequency
difficulties thatshared
ofoperating
this all volunteers
an control were required
system
EW: insufficient was settoas
input understand
10 oversteered
and their
Hz. Figure input
11b showscharacteristics
(excessive) the byvolunteers
experimental
input. Ten operating
scene,
without
where As the shared
discussed
the course
participated in thewascontrol
in system
Section 3, 5
set at 1.2 mInwidth,
experiments. times
there
the case before
were two starting
as discussed input the experiments.
characteristics
in Section
of insufficient the volunteers were required tohave
input, 5.3. for individuals who use
difficulties operating an EW: insufficient input and oversteered
a restraining device to limit the leftward movement of the thumb. As shown in Figure 12, the (excessive) input. Ten volunteers
participated
restraining device in the was
experiments. In the casesplinting
made of polyform of insufficient
materialinput, the volunteers
(SAKAIMED Company,were required to use
Tokyo, Japan),
a restraining
and the shapedeviceof thetomaterial
limit the couldleftward movement
be changed of the thumb.
by heating. The most As important
shown in reason Figure for 12, the
the
restraining device was made of polyform splinting material (SAKAIMED
oversteering phenomenon is that the user provides more steering signals than expected, causing new Company, Tokyo, Japan),
and
errors, theand
shape
this of the material
vicious cycle leadscould to thebe oversteering
changed by phenomenon
heating. The of most important
the entire reason
driving for the
process. To
oversteering
achieve phenomenoneffect,
the oversteering is thatthethefollowing
user provides more steering
algorithm was added signals thanwith
to deal expected,
the inputcausing
fromnew the
errors,
user: If and this vicious
the input cycle leads
for changing to the(more
direction oversteering
than 30% phenomenon of the lasted
of the maximum) entire more
driving process.
than To
0.3 s, the
achieve the oversteering effect, the following algorithm was added
input gain would be increased three times. Figure 13 shows examples of normal users, users with to deal with the input from the
user: If the input
insufficient inputfor changing
(with direction device),
the restraining (more than and30%usersof the
withmaximum)
oversteered lasted
input more
(by than 0.3 s, the
changing the
input gain would be increased three times. Figure 13 shows examples
input gain), showing that the auxiliary settings used in this section to reproduce the input of normal users, users with
insufficient
characteristics input
were(with the restraining
correct. It should bedevice), and that
noted here usersallwith oversteered
volunteers input (bytochanging
were required understand the
input gain),
their input showing that
characteristics by the auxiliary
operating settings
without used in
the shared this system
control section 5to timesreproduce the input
before starting the
characteristics
experiments. were correct. It should be noted here that all volunteers were required to understand
(a) (b)
their input characteristics by operating without the shared control system 5 times before starting the
experiments. Figure 11. (a) Experimental setup and (b) experimental scene.
Figure 11. (a) Experimental setup and (b) experimental scene.
Figure 12. Use of a restraining device to reproduce the insufficient input scenario.
Collide
Collide
(a) (b) (c)

Figure 13. Driving examples with different input characteristics: (a) normal user, (b) user with
(a)
Figure 13. Driving examples with different input(b) characteristics: (a) normal user, (c)
(b) user with
insufficient input, and (c) user with oversteered input.
insufficient input, and (c) user with oversteered input.
Figure 13. Driving
6.2. Experimental Methodexamples with different input characteristics: (a) normal user, (b) user with
insufficient input,
6.2. Experimental Method and (c) user with oversteered input.
Two experiments were conducted with each participant to verify the effectiveness of the shared
Two
control experiments
system
6.2. Experimentaland were the
analyze
Method conducted with each
user-machine participant
interactions. to verify
Before eachthe effectiveness
experiment, eachofparticipant
the shared
control system and analyze the user-machine interactions. Before each experiment, each participant
was Two
askedexperiments
to reproducewere
theconducted with
insufficient andeach participant
oversteered to verify
inputs usingthethe
effectiveness of the shared
methods introduced in
control system and analyze the user-machine interactions. Before each experiment, each
Section 6.1 to understand their input characteristics. After each experiment, a questionnaire participant
survey
was asked to reproduce
was conducted. theofinsufficient
The content and oversteered
the questionnaire inputs using the methods introduced in
is shown below:
Section
1. The6.1 to understand
control is easy. their input characteristics. After each experiment, a questionnaire survey
was conducted. The content of the questionnaire is shown below:
4. The training duration is long.

5. The EW is controlled by the user.
6. J. The
Int. control
Environ. is mostly
Res. Public satisfactory.
Health 2020, 17, 5502 14 of 19
Subjects were asked to complete the experiments and answer a questionnaire based on a Likert
scale (i.e., 1 = strongly disagree, 2 = disagree, 3 = neither agree nor disagree, 4 = agree, 5 = strongly
was asked to reproduce the insufficient and oversteered inputs using the methods introduced in
agree) [27]. The participants were also asked to answer the Driving Behavior Questionnaire (DBQ)
Section 6.1 to understand their input characteristics. After each experiment, a questionnaire survey
[28] and the Driving Style Questionnaire (DSQ) [29], which were used to analyze the driving
was conducted. The content of the questionnaire is shown below:
characteristics of the users.
1. The control is easy.
6.3. Experimental
2. Results concentration.
The control requires
3. The objective
The control causes physical
results mainlyfatigue.
showed the effectiveness of the shared control system and how the
4.
shared The training
control duration
system is long.
assisted and interacted with the users, whereas the subjective feelings of the
usersThe
5. aboutEWthe shared control
is controlled by thesystem
user. are shown in the subjective results.
6. The control is mostly satisfactory.
6.3.1. Objective Results
Subjects were asked to complete the experiments and answer a questionnaire based on a Likert
scale Figure
(i.e., 1 =14strongly
shows an example2 of
disagree, the insufficient
= disagree, case, agree
3 = neither wherenor thedisagree,
participant
4 =can hardly
agree, 5 = give a
strongly
signal [27].
agree) with The
the restraining
participantsdevice.
were alsoTaking
askedthe
to 7th training
answer trial (the
the Driving last training
Behavior trial) as an
Questionnaire example,
(DBQ) [28]
the planner began to dominate the control of after 2.7 s. It was found that the
and the Driving Style Questionnaire (DSQ) [29], which were used to analyze the driving characteristics user could still take
most
of the of the control in the
users. direction, because during the 2.7–4.7 s range, the user input ( ) and
DWA input ( ) were very similar.
6.3. Experimental
Figures 15 and Results
16 show the results of users with oversteered input characteristics. It was found
that, besides the physical functions (operating characteristics), the user’s personality also affected the
The objective results mainly showed the effectiveness of the shared control system and how the
results of the experiments. Users with oversteered input characteristics could be divided into two
shared control system assisted and interacted with the users, whereas the subjective feelings of the
categories based on how users adapted to the controller. The first category corresponds to active users,
users about the shared control system are shown in the subjective results.
who were positively involved in the training process. An example of experimental results for an
activeObjective
6.3.1. user is shown
Resultsin Figure 15. The second category corresponds to passive users. The passive
users tended to rely on the fact that the shared controller gradually helped them to complete the .
Figure
driving, and14they
shows an example
maintained the of the operating
same insufficient case, where
pattern duringthe participant
training. can hardly
An example of agive
passiveaψ
signal
user iswith
shown theinrestraining device.
Figure 16. The Taking
pattern the 7th
of their training trial (the last training trial) as an example,
.input signals remained unchanged, making the shared
the
EWplanner
control began to dominate
more similar to autonomous of ψ afterOf2.7
the controldriving. s. ten
the It was found seven
subjects, that the useractive
were couldusers
still take
and
most of the control in the
three of them were passive users. v direction, because during the 2.7–4.7 s range, the user input (vuser ) and
DWA input (vDWA ) were very similar.
1st trial 1 1st trial

0.5 4th trial 0.5 4th trial
8th trial 0 8th trial
0 -0.5
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Time [s] Time [s]
1
0.5 0.5
0
0 -0.5
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Time [s] Time [s]
1
0.5 0.5
0
0 -0.5
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Time [s] Time [s]
1 1
0.5 0.5
0 0
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Time [s] Time [s]
(a) (b)
Figure 14. Velocity and control weight change in the insufficient case: (a) vuser , vDWA , v, and kv change
.
Figure 14. .
Velocity .
and control weight change in the insufficient case: (a) , , , and
and (b) ψuser , ψDWA , ψ, and kψ. change.
change and (b) , , , and change.
Figures 15 and 16 show the results of users with oversteered input characteristics. It was found
that, besides the physical functions (operating characteristics), the user’s personality also affected the
results of the experiments. Users with oversteered input characteristics could be divided into two
categories based on how users adapted to the controller. The first category corresponds to active users,
who were positively involved in the training process. An example of experimental results for an active
user is shown in Figure 15. The second category corresponds to passive users. The passive users
tended to rely on the fact that the shared controller gradually helped them to complete the driving,
and they maintained the same operating pattern during training. An example of a passive user is
shown in Figure 16. The pattern of their input signals remained unchanged, making the shared EW
control more similar to autonomous driving. Of the ten subjects, seven were active users and three of
them were0.5 passive users. 5th trial 0.5 5th trial
0 -0.5
0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5 2 2.5 3 3.5 4
Time [s] 1 Time [s]
1st trial 1st trial
0.5 9th trial 0
0.5 9th trial
0 -0.50
0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5 2 2.5 3 3.5 4
0 -0.5
0 0.5 1 1.5 Time
2 [s] 2.5 3 3.5 4 0 0.5 1 1.5 Time
2 [s] 2.5 3 3.5 4
Time [s] 1 Time [s]
0.5 0.51
0.5 0
0.5
0 -0.50
0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5 2 2.5 3 3.5 4
0 -0.5
0 0.5 1 1.5 Time
2 [s] 2.5 3 3.5 4 0 0.5 1 1.5 Time
2 [s] 2.5 3 3.5 4
Time [s] 1 Time [s]
0.51 0.51
0
0.5 0.5
0 -0.5
0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5 2 2.5 3 3.5 4
0 0
0 0.5 1 1.5 Time
2 [s] 2.5 3 3.5 4 0 0.5 1 1.5 Time
2 [s] 2.5 3 3.5 4
1 Time [s] 1 Time [s]
0.5 0.5
0 0
0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5 2 2.5 3 3.5 4
Time [s] Time [s]
(a) (b)
Figure 15. Velocity and control weight change in the oversteer case (active user): (a)
(a) (b)
, , , and change, (b) , , , and change.
Figure 15. Velocity and control weight change in the oversteer case (active user): (a) vuser , vDWA , v, and kv
Figure 15. . Velocity
. and
. control weight change in the oversteer case (active user): (a)
change, (b) ψuser , ψDWA , ψ, and kψ. change.
, , , and change, (b) , , , and change.
0 -0.5
0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3
Time [s] 1 Time [s]
1st trial 1st trial
0.5 8th trial 0
0.5 8th trial
0 -0.50
0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3
0 -0.5
0 0.5 1 Time
1.5 [s] 2 2.5 3 0 0.5 1 Time
1.5 [s] 2 2.5 3
Time [s] 1 Time [s]
0.5 0.51
0.5 0
0.5
0 -0.50
0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3
0 -0.5
0 0.5 1 Time
1.5 [s] 2 2.5 3 0 0.5 1 Time
1.5 [s] 2 2.5 3
Time [s] 1 Time [s]
0.51 0.51
0
0.5 0.5
0 -0.5
0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3
0 0
0 0.5 1 Time
1.5 [s] 2 2.5 3 0 0.5 1 Time
1.5 [s] 2 2.5 3
1 Time [s] 1 Time [s]
0.5 0.5
0 0
0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3
Time [s] Time [s]
(a) (b)
Figure 16. Velocity and control weight change in the oversteer case (passive user): (a) vuser , vDWA , v, and kv
Figure 16. . Velocity
(a)and control weight change in the oversteer case (passive user): (a)
. .
change, (b) ψuser , ψDWA , ψ, and k . change. (b)
, , , and change, (b)ψ , , , and change.
6.3.2.Figure
Subjective 16. Results
Velocity and control weight change in the oversteer case (passive user): (a)
6.3.2. Subjective
, Results
, , and change, (b) , , , and change.
As almost all training sessions did not exceed 3 min, the shared system did not cause user fatigue
As almost
or demand too allmuchtraining sessions
attention, anddid not also
users exceedfound3 min,
thethe sharedtime
training system did not cause
acceptable. user fatigue
However, if the
6.3.2.
or Subjective
demand too Results
much attention, and users also found the training time acceptable. However, if the
training trials exceeded 15 times, users often felt that the training duration was too long. As shown in
training
As trials
almost exceeded
all training15 times,
sessions users
did often
not felt
exceed that
3 the
min, training
the shared duration
system was
did
Figure 17a, the points for the sixth item (the control is mostly satisfactory) tended to be high if the too
not long.
cause As
user shown
fatigue
in Figure
or
points for 17a,
demand thetoo the points
much
fourth for the
attention,
item (the sixth item
and users
training (the
alsocontrol
duration long)is
found
is themostly
were satisfactory)
training
low. 17b tended
time acceptable.
Figure to be
shows However,
the high ifif the
relationship the
points
between for
training trialsthe
the fifthfourth
exceeded item (the training
15 times,
item (the EW isusers duration
often felt
controlled is
by the long) were
thatuser) low.
the training Figure
and theduration 17b
sixth item shows
was the
toocontrol
(the relationship
long. As shown
is mostly
between
in Figure the
17a, fifth
the item
points (the
for EW
the is
sixthcontrolled
item (the by the
control user)
is and
mostly the sixth item
satisfactory)
satisfactory). Some passive users also thought that the EW was under their control; this may be because (the
tended control
to be is
highmostly
if the
satisfactory).
points
they for thethat
thought Some
fourth passive
theyitem
had(the users also
training
already thought
duration
understood that the
theisassistance EW was
long) weremethod under
low. Figure their
of the17b control;
shows
shared this may
the relationship
control system and be
because they thought that they had already understood the assistance method
between the fifth item (the EW is controlled by the user) and the sixth item (the control is mostly of the shared control
system and the
satisfactory). Somedriving process
passive was
users within
also theirthat
thought imagination.
the EW was From the personality
under their control; tests (DBQ
this mayand be
DSQ), it was found that those who defined themselves as defensive drivers
because they thought that they had already understood the assistance method of the shared control were active users.
system and the driving process was within their imagination. From the personality tests (DBQ and
DSQ), it was found that those who defined themselves as defensive drivers were active users.
Int. J. Environ.
the driving processRes. Public
was Health 2020,
within 17, x imagination. From the personality tests (DBQ and DSQ),
their 16 of it
19was
found that those who defined themselves as defensive drivers were active users.
(a) (b)
17. Results
FigureFigure of the
17. Results of questionnaire. The
the questionnaire. Thelarger
largerthe
thecircle, the greater
circle, the greaterthe
thenumber
number of users
of users in that
in that
situation: (a) relationship
situation: between
(a) relationship betweentraining
trainingduration
duration and
and overall satisfaction,
overall satisfaction, and
and (b)(b) relationship
relationship
between the feeling
between of under
the feeling control
of under andand
control overall satisfaction.
overall satisfaction.
6.4. Discussion
6.4. Discussion
The experimental
The experimental results first
results showed
first showedthe theeffectiveness
effectiveness of of the
theproposed
proposed shared
shared control
control system,
system,
which whichcould could
be used be by users
used with different
by users inputinput
with different characteristics. TheThe
characteristics. shared
sharedcontrol system
control systemwas found
was
found to help users make a safe left turn by increasing at least one of
to help users make a safe left turn by increasing at least one of the DWA control weights. According to the DWA control weights.
SectionAccording
5.3.2, gains to in
Section 5.3.2, gains
the reward in the
function (5)reward
might function
have a large(5) might haveon
influence a large influence interactions.
user-machine on user-
machine interactions. Therefore, it could be inferred that adequately improving one of the
Therefore, it could be inferred that adequately improving one of the participation gains (e.g., a P2
.
participation gains (e.g., a 𝑃2 increase in 𝜓̇ ) might give users more opportunities to train their
increase in ψ) might give users more opportunities to train their steering control ability. The input
steering control ability. The input characteristics of passive and active users were also very different.
characteristics
Figure 18 of passive
shows and active
the input changes users were also
of different users,very different.
where Figure
the red and blue18 shows
lines the input
represent changes
the active
of different users, where the red and blue lines represent the active and passive
and passive users, respectively. The variation was calculated using (8), which represents the average . users, respectively.
The variation
input changewasincalculated
the and using 𝜓̇ directions,
(8), whichwhere represents
t and △ the average
denote the timeinput
and change
the samplingin thetime, n ψ
v and
denotes
directions, wherethe ttotal
andsampling
4t denotenumber,
the time andandInput(t) denotes time,
the sampling the relative inputthe
n denotes signal
totalatsampling
time t. It number,
was
found denotes
and Input(t) that active theusers wereinput
relative more signal
inclined atto change
time t. It their operations,
was found especially
that active were𝜓̇ more
in the
users direction.
inclined
. ̇
Based on the above discussion, the operating ability to control and 𝜓 could be trained separately.
toInt.
change their
J. Environ. Res.operations,
Public Health 17, x in the ψ direction. Based on the above discussion, the operating
especially
2020,
. Section 17 of 19
As discussed in 3, the residual physical functions of users varied, and a more rational
abilitychoice
to control v and ψ could be trained separately.
of parameters might improve user satisfaction. In addition, it was possible to make those
passive users more actively involved in the shared control system with some prior education and
instructions.
𝑛
|𝐼 𝑝𝑢 ( 1) 𝐼 𝑝𝑢 ( )|
Variation = ∑( ) (8)
△
𝑡=1
(a) (b)
Figure 18..Input
Figure18. Inputchanges
changes of
of active
active and
and passive users: (a)
passive users: (a) input
inputchanges in v direction
changesin direction and
and (b)
(b) input
input
change in ψ direction.
̇
change in 𝜓 direction
7. Conclusions
In this study, a novel shared control system was developed for individuals with difficulties in
operating EWs. This novel shared control system focused on utilizing the residual physical functions
of users and maintaining their driving motivation. The requirements of the shared control system
As discussed in Section 3, the residual physical functions of users varied, and a more rational
choice of parameters might improve user satisfaction. In addition, it was possible to make those passive
users more actively involved in the shared control system with some prior education and instructions.

Xn Input(t + 1) − Input(t)
Variation = ( ) (8)
n4t
t=1
7. Conclusions
In this study, a novel shared control system was developed for individuals with difficulties
in operating EWs. This novel shared control system focused on utilizing the residual physical
functions of users and maintaining their driving motivation. The requirements of the shared control
system were first obtained by analyzing the operating characteristics of the target users and their
lifestyles. Then, a novel reinforcement learning-based shared control scheme was proposed based
on the design requirements. The effectiveness of the shared control system was verified through
VR-based experiments, in which the shared control system could be adapted to various types of users.
The experimental results also showed that some users could not actively participate in the shared
control system, as they tended to rely on adaptation from the machine side. For these users, it was
important to provide some directions beforehand.
In the future, the parameters of the shared control system, such as reward gains, could be
designed considering the operational ability of users and the training purpose. This paper only
discussed individuals with severe physical impairments who still have some physical functions to
operate the interfaces with, such as tiny joysticks. For those people who cannot use joystick-type
controllers, the proposed system should be extended for more interfaces, such as sound and EMG
signals. Considering different operating devices will lead to different driving characteristics, so the
parameters for the shared control system and reward design will need to be modified accordingly.
Author Contributions: Conceptualization, L.X. and M.S.; methodology, L.X. and M.S.; software, L.X.; validation,
L.X.; formal analysis, L.X.; investigation, L.X.; resources, M.S.; data curation, L.X.; writing—original draft
preparation, L.X.; writing—review and editing, L.X. and M.S.; visualization, L.X.; supervision, M.S.; project
administration, M.S.; funding acquisition, M.S. All authors have read and agreed to the published version of
the manuscript.
Funding: This research was funded by JSPS KAKENHI, grant number 24700591.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Cabinet Office. Annual Report on Individuals of Disabilities. Available online: https://www8.cao.go.jp/
shougai/whitepaper/h30hakusho/zenbun/siryo_02.html (accessed on 10 April 2020).
2. Rosenbloom, S. The mobility needs of the elderly. In Transportation in an Aging Society, Improving Mobility and
Safety for Older Persons; Transportation Research Board: Washington, DC, USA, 1988; Volume 2, pp. 21–71.
3. Sasaki, T. Application of mechatronics to support of movement for the handicapped. J. Soc. Mech. Eng.
1998, 101, 57–61.
4. Vanhooydonck, D.; Demeester, E.; Nuttin, M.; Van Brussel, H. Shared control for intelligent wheelchairs:
An implicit estimation of the user intention. In Proceedings of the 1st International Workshop on Advances
in Service Robotics, Bardolino, Italy, 13–15 March 2003.
5. Zeng, Q.; Teo, C.L.; Rebsamen, B.; Burdet, E. Collaborative path planning for a robotic wheelchair.
Disabil. Rehabil. Assist. Technol. 2008, 3, 315–324. [CrossRef] [PubMed]
6. Urdiales, C.; Peula, J.M.; Fdez-Carmona, M.; Barrue, C.; Perez, E.J.; Sanchez-Tato, I.; Caltagirone, C. A new
multi-criteria optimization strategy for shared control in wheelchair assisted navigation. Auton. Robot.
2011, 30, 179–197. [CrossRef]
7. Simpson, R.; LoPresti, E.; Hayashi, S.; Guo, S.; Ding, D.; Ammer, W.; Cooper, R. A prototype power
assist wheelchair that provides for obstacle detection and avoidance for those with visual impairments.
J. Neuroeng. Rehabil. 2005, 2, 30. [CrossRef] [PubMed]
8. Suzuki, M.; Takeyoshi, D. Mobility support robotics for the elderly (in Japanese). BME 1996, 10, 18–23.
9. Aigner, P.; McCarragher, B. Shared control framework applied to a robotic aid for the blind. IEEE Control
Syst. Mag. 1999, 19, 40–46.
10. Vanhooydonck, D.; Demeester, E.; Huntemann, A.; Philips, J.; Vanacker, G.; Van Brussel, H.; Nuttin, M.
Adaptable navigational assistance for intelligent wheelchairs by means of an implicit personalized user model.
Rob. Auton. Syst. 2010, 58, 963–977. [CrossRef]
11. Cooper, R.A. Intelligent control of power wheelchairs. IEEE Eng. Med. Biol. 1995, 14, 423–431. [CrossRef]
12. Poncela, A.; Urdiales, C.; Perez, E.J.; Sandoval, F. A new efficiency-weighted strategy for continuous
human/robot cooperation in navigation. IEEE Trans. Syst. Man Cybern. B Cybern. 2009, 39, 486–500.
[CrossRef]
13. Yanco, H.A. Shared User-Computer Control of a Robotic Wheelchair System. Ph.D. Thesis, Massachusetts
Institute of Technology, Cambridge, MA, USA, 2000.
14. Fox, D.; Burgard, W.; Thrun, S. The dynamic window approach to collision avoidance. IEEE Robot Autom. Mag.
1997, 4, 23–33. [CrossRef]
15. Simpson, R.C.; Levine, S.P.; Bell, D.A.; Jaros, L.A.; Koren, Y.; Borenstein, J. NavChair: An assistive wheelchair
navigation system with automatic adaptation. In Assistive Technology and Artificial Intelligence; Springer
Verlag GMBH: Berlin, Germany, 1998; pp. 235–255.
16. Demeester, E.; Huntemann, A.; Vanhooydonck, D.; Vanacker, G.; Van Brussel, H.; Nuttin, M. User-adapted
plan recognition and user-adapted shared control: A Bayesian approach to semi-autonomous wheelchair
driving. Auton. Robot. 2008, 24, 193–211. [CrossRef]
17. Shino, M.; Yamamoto, Y.; Mikata, T.; Inoue, T. Input device for electric wheelchairs considering physical
functions of persons with severe Duchenne muscular dystrophy. Technol. Disabil. 2014, 26, 105–115.
[CrossRef]
18. Xi, L.; Yamamoto, Y.; Mikata, T.; Shino, M. One-dimensional input device of electric wheelchair for persons
with severe Duchenne Muscular Dystrophy. Technol. Disabil. 2019, 31, 101–113. [CrossRef]
19. Simpson, R.C.; LoPresti, E.F.; Cooper, R.A. How many people would benefit from a smart wheelchair?
J. Rehabil. Res. Dev. 2008, 45, 53–71. [CrossRef] [PubMed]
20. Furumasu, J.; Guerette, P.; Tefft, D. The development of a powered wheelchair mobility program for
young children. Technol. Disabil. 1996, 5, 41–48. [CrossRef]
21. Sim, J.; Shino, M.; Kamata, M. Analysis on the upper extremities to persons with Duchenne Muscular
Dystrophy for the operation of electronic wheelchair. In Proceedings of the Society of Life Supporting
Engineering, Osaka, Japan, 18–20 September 2010.
22. Oh, S.; Hata, N.; Hori, Y. Integrated motion control of a wheelchair in the longitudinal, lateral, and pitch
directions. IEEE Trans. Ind. Electron. 2008, 55, 1855–1862.
23. Sutton, R.S.; Barto, A.G. Finite Markov Decision Processes. In Reinforcement Learning: An Introduction, 2nd ed.;
MIT Press: London, UK, 2014; pp. 53–82.
24. Petereit, J.; Emter, T.; Frey, C.W.; Kopfstedt, T.; Beutel, A. Application of Hybrid A* to an autonomous
mobile robot for path planning in unstructured outdoor environments. In Proceedings of the 7th German
Conference on Robotics, Munich, Germany, 21–22 May 2012.
25. Xu, W.; Huang, J.; Wang, Y.; Tao, C.; Cheng, L. Reinforcement learning-based shared control for walking-aid
robot and its experimental verification. Adv. Robot. 2015, 29, 1463–1481. [CrossRef]
26. Ramachandran, D.; Gupta, R. Smoothed sarsa: Reinforcement learning for robot delivery tasks. In Proceedings
of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009.
27. Likert, R. A technique for the measurement of attitudes. Arch. Psychol. 1932, 22, 55.
28. Komada, Y.; Shinohara, K.; Kimura, T.; Miura, T. The classification of drivers by their self-report driving
behavior and accident risk. IATSS Rev. 2009, 34, 230–237.
29. Ishibashi, M.; Okuwa, M.; Akamatsu, M. Development of driving style questionnaire and workload sensitivity
questionnaire for driver’s characteristic identification. JSAE 2002, 55-02, 9–12.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Ijerph 17 05502 PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Ijerph 17 05502 PDF

Uploaded by

Copyright:

Available Formats

International Journal of

Keywords: electric wheelchair; physical impairments; shared control; driving motivation;

Figure Figure 1. Construction

3.2. Driving Environment Settings

3.3. System Requirements

1. The system should assist users in reaching their destination.

4. Construction of the Shared Control System

Int. J. Environ. Res. Public Health 2020, 17, x 5 of 19

Figure 4. Shared control system concept.

4.3. Framework of the Shared Control System

4.3. Framework of the Shared Control System

5. Online Learning System Design for the Shared Control System

Figure 6. Structure of reinforcement learning.

Figure 7.Figure 7. Structure

Detected area using

The directionThe direction of EW

Body Speed [km/h]

Body Speed [km/h]

(a) 0.5 (b)

Cost Items Calculating Method

5.5. Sarsa Learning Based Shared Control Algorithm

Q(St , At ) = Q(St , At ) + αt [Rt + γQ(St+1 , At+1 ) − Q(St , At )], (6)

Table 3. Shared control algorithm for EW driving.

Algorithm: Sarsa Learning-Based Shared Control Algorithm

The stop condition of the training process was set as follows:

1. Q (s,a) for each state only changes to a certain value.

6. Experiments: System Effectiveness and User-Machine Interactions

6.1. Experimental Setup

Int. J. Environ. Res. Public Health 2020, 17, 5502 13 of 19

(a) (b) (c)

4. The training duration is long.

1st trial 1 1st trial

You might also like