Endalkachew - D and Zekarias - G Case Study

ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY
COLLEGE OF ELECTRICAL AND MECHANICAL ENGINEERING
ELECTROMECHANICAL ENGINEERING DEPARTMENT
Modelling and Simulation of Mechatronic Systems (EMEg-6422) Project-Two
Case Study on Utility Function Method for Behaviour Selection in Autonomous Robots
By:
1. Endalkachew Degarege, GSR230/14
2. Zekarias Girma, GSR237/14
Submitted to: Dr. Beteley Teka
26/06/2022 G.C
Table of Contents
List of Figures
Abstract
1. Utility Function Method for Behaviour Selection in Autonomous Robots
1.1. Introduction
1.2. Behaviour Selection
1.3. Behavioural selection in robots
1.4. Utility function method
1.4.1. Utility concept
1.4.2. The utility function method concept
1.5. General guidelines for utility function Method
1.6. Case study - a Transportation Task
1.6.1. Required Task of the Robot
1.6.2. Behaviours
1.6.3. Behaviour selection
1.6.4. Simulations and result
1.7. Extended UF Method
References
List of Figures
Figure 1: A schematic illustration of the transportation robot [3]
Figure 2: Variation in the utility values [3]
Figure 3: A 3D snapshot from the simulator, showing the robot in action [3]
Abstract
This paper presents a condensed review of a case study on the utility function method for
behaviour selection in autonomous robots. Autonomous robots, from their origins to their
current state, are reviewed on the basis of several published sources. The paper then describes
what is meant by utility, utility function, and behaviour selection. Finally, the utility function
method is discussed in detail using a transportation task as a case study.
1. Utility Function Method for Behaviour Selection in Autonomous Robots

1.1. Introduction
Autonomy is a central issue in robotics and is closely related to decision making. A truly
autonomous robot is one that can perceive its environment, make decisions based on what it
perceives or has been programmed to recognize, and then carry out a movement or
manipulation within that environment [1]. One of the main challenges for autonomous agents that
have to survive in unknown and changing environments is to make the “right” decisions in their
interactions with the environment. Controlling an autonomous intelligent robotic agent that
operates in an unstructured, changing environment involves many difficulties. A major one
concerns the characteristics of the environment in which the agent must operate [2].
In behaviour-based robotics (BBR), the artificial brain of a robot is built from a repertoire of basic
behaviours which are activated or deactivated through a process of behaviour selection that uses
the state of the robot and, possibly, of its environment as input. An obstacle facing the
behaviour-based approach is the problem of behaviour selection, i.e., the problem of activating
appropriate behaviours at all times. For simple robots with limited behavioural repertoires,
behaviour selection can be specified by hand, which is in fact how most behaviour selection
methods work [3].
In contrast to the systems built in traditional artificial intelligence (AI), which are more
deliberative but often relatively slow, many behaviour-based systems are strongly reactive,
i.e., there is a more or less immediate relationship between perception and action. In practice, a
robotic brain is often designed by combining the top-down approach of classical AI with the
bottom-up approach of BBR [3].
1.2. Behaviour Selection

Behaviour-based robotics (BBR) is an approach that focuses on robots that exhibit
complex-appearing behaviours despite having little internal state with which to model their
immediate environment, and that gradually correct their actions via sensory-motor links.
Behaviour-based robots respond to external stimuli, which enables them to avoid obstacles and
complete their tasks.
For example, suppose a robot has to deliver an instrument to a particular hospital room.
Mapping the whole hospital is of little benefit, because the environment keeps changing in real
time and a previously empty room may become too crowded for the robot to move through.
Mapping only a small area around the robot lets it work efficiently with a less complex
instruction set, since it only has to implement actions such as wandering, avoiding obstacles and
delivering the package. A classical AI-based robot, by contrast, would need a map of the whole
hospital and would have to find the best path on that map, which might not work in a real-time
scenario.
Whenever we design behaviour for autonomous robots, we look for a suitable metric to evaluate
performance, track progress during learning, or compare algorithms with one another. When
a specific goal is assigned, the task usually offers a built-in way to gauge success. For example,
walking can be gauged by its velocity and its stability under perturbations. Where behaviour is
learned through optimization of a global objective function, creating the behaviour and
quantifying it commonly go hand in hand. This also holds, in principle, for behaviour derived
from task-independent objectives, an approach that has recently become increasingly successful
in producing emergent autonomous behaviour in robots [4] [5] [6].
Several methods for behaviour selection have been suggested in the literature, e.g.,
subsumption, the potential fields method, DAMN, and activation networks. However, most of these
methods require the experimenter to set several important parameters by hand. It is therefore
not far-fetched to use a biologically inspired method when generating behavioural organization
systems for autonomous robots. One such method is based on utility, a concept that plays an
important role in behaviour selection in animals. Behaviour selection invariably involves a
trade-off in which less important tasks must be sacrificed in favour of more urgent ones.
1.3. Behavioural selection in robots

In robots with larger behavioural repertoires, specifying behavioural selection by hand is a
daunting task, not least because of the difficulty in comparing, at all times and in all situations, the
relative merits of several behaviours. Such comparison requires a common currency which, in
economic theory and game theory, goes under the name utility, a concept that has also been
introduced in ethology and, more recently, in robotics [7].
1.4. Utility function method

1.4.1. Utility concept
The concept of utility can be used for modelling decision making in both biological organisms and
artificial ones such as robots. To overcome the difficulties associated with behaviour selection,
a method known as the utility function (UF) method has been developed. In this method,
behaviour selection is based on the values of utility functions that are evolved rather than
hand-coded, thus minimizing the bias introduced by the user of the method. In this paper, the UF
method is illustrated by means of an example, a transportation task taken directly from the
literature (Mechatronics in Action) [7].
1.4.2. The utility function method concept

A central concept in the UF method is the robot’s state that is obtained by measuring the values of
a set of state variables (z). The state variables are of three different kinds:
✓ External variables (s) such as the readings of IR sensors, sonars, cameras and laser range
finders
✓ Internal physical variables (p) such as the readings of a battery sensor
✓ Abstract variables (x).
The internal variables (physical and abstract) measure the internal state of the robot. Whereas the
physical variables are, as their name implies, obtained through measurement of physical quantities,
the abstract variables roughly correspond to signalling molecules (hormones) in biological systems.
Thus, the internal abstract variables (henceforth referred to as hormone variables) provide the robot
with a rudimentary endocrine system, allowing a sort of short-term memory independent of the
readings of physical quantities, internal or external.
Note that not all utility functions must depend on all state variables. In many cases, the utility
functions depend only on a subset of the available state variables, and different utility functions
may use different subsets of the variables as inputs [7] [8].
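As a minimal sketch of the state described above, the three kinds of variables can be grouped in one structure from which each utility function reads only the subset it needs. The class name `RobotState`, its field names and the example values are illustrative assumptions, not taken from the cited sources.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RobotState:
    """Hypothetical container for the state variables z = (s, p, x)."""
    external: Dict[str, float] = field(default_factory=dict)  # s: sensor readings
    physical: Dict[str, float] = field(default_factory=dict)  # p: e.g. battery level
    hormones: Dict[str, float] = field(default_factory=dict)  # x: abstract variables

    def select(self, names: List[str]) -> List[float]:
        """Return the values of the named variables, in the given order."""
        merged = {**self.external, **self.physical, **self.hormones}
        return [merged[name] for name in names]


# Example values are made up for illustration
state = RobotState(
    external={"s1": 0.0, "s2": 1.0},
    physical={"battery": 0.8},
    hormones={"x1": 0.3},
)
# Inputs for a utility function that depends only on one hormone and the battery
inputs = state.select(["x1", "battery"])
```

Keeping the subset selection separate from the utility function itself makes it easy to give different behaviours different inputs.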
1.5. General guidelines for utility function Method

The general steps for applying the utility function method to behaviour selection in an
autonomous robot are as follows [7]:
❖ Specify the configuration for the simulations:
✓ Define the characteristics of the robot, i.e., its shape as well as its sensors and actuators.
✓ Define a set of suitable behaviours for the task at hand.
✓ Define a set of state variables, e.g., the readings of IR proximity sensors and a few hormone
variables.
✓ Define the structure of the utility functions, i.e., the polynomial degree and the input
variables.
✓ Define the arena in which the robot is supposed to operate.
✓ Define a suitable objective function (fitness function).
❖ Generate a random population of behaviour selection systems specified by the parameters
determining the utility functions together with the parameters specifying the hormone variable
dynamics.
❖ Run the optimization procedure:
✓ Evaluate the population of behaviour selection systems
✓ Generate new behaviour selection systems through the evolutionary processes of selection,
crossover, and mutation.
✓ Repeat the above two steps until a user-specified termination criterion has been met.
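The optimization procedure above can be sketched as a small genetic algorithm. This is only an illustration under stated assumptions: the function `evolve`, tournament selection, one-point crossover, elitism, the gene range [0, 1] and the toy fitness function are all choices made here. In the case study, each fitness evaluation would instead be a full robot simulation.

```python
import random
from typing import Callable, List

Chromosome = List[float]


def evolve(fitness: Callable[[Chromosome], float],
           n_genes: int,
           pop_size: int = 20,
           generations: int = 50,
           mutation_rate: float = 0.1,
           seed: int = 0) -> Chromosome:
    """Toy genetic algorithm with tournament selection, one-point crossover,
    per-gene mutation and elitism. Genes are kept in [0, 1]."""
    rng = random.Random(seed)
    # Random initial population
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]

    def tournament(scored):
        # Pick two individuals at random and return the fitter one
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):  # termination criterion: fixed generation count
        scored = [(fitness(c), c) for c in population]       # evaluation
        best = max(scored, key=lambda fc: fc[0])[1]
        new_population = [best[:]]                            # elitism
        while len(new_population) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)   # selection
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]                       # crossover
            child = [rng.random() if rng.random() < mutation_rate else g
                     for g in child]                          # mutation
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)


def toy_fitness(c: Chromosome) -> float:
    """Placeholder objective: highest when every gene equals 0.5."""
    return -sum((g - 0.5) ** 2 for g in c)


best = evolve(toy_fitness, n_genes=4)
```

Because the best individual is copied unchanged into each new generation, the best fitness in the population never decreases from one generation to the next.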
1.6. Case study - a Transportation Task

To clarify the description of the utility function method, we need to consider a specific example,
namely, a transportation task [3].
1.6.1. Required Task of the Robot

The robot is required to transport objects between given points in an arena representing a typical
office environment. In this case, the robot must transport objects from point A to point B in
Figure 1.
Figure 1: A schematic illustration of the transportation robot [3]

The center of the arena contains a staircase and elevators. These two regions are off-limits for the
robot. Thus, the robot is constrained to move in the corridors and offices. In the simulations, a
differentially steered robot with cylindrical cross section was used. The robot was equipped with
wheel encoders (for odometry), touch sensors, and a laser range finder (LRF) mounted on a pole.
1.6.2. Behaviours
Three behaviours were considered in the behavioural repertoire of this case study:
✓ Path navigation (B1)
✓ Localization (B2)
✓ Obstacle avoidance (B3)
In B1, the robot navigated through a sequence of waypoints generated by the combined use of a
grid-based map and an A* search algorithm. The last waypoint coincided with the target location.
Whenever the robot reached a target, a new target location was generated (and thus a new sequence
of waypoints). Furthermore, if B1 was deactivated, a new sequence of waypoints was generated
upon its reactivation, connecting the current (estimated) position to the target location. It should
be noted that B1 relies solely on odometry.
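To illustrate how a waypoint sequence can be obtained from a grid-based map, the following is a minimal A* sketch. The grid encoding (0 = free, 1 = blocked), 4-connected unit-cost moves and the Manhattan heuristic are assumptions made for this example. The source does not specify the details of the map or the search.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def a_star(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path on a grid where 0 = free and 1 = blocked.
    Returns the list of cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> int:
        # Manhattan distance: admissible for 4-connected unit-cost moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from: Dict[Cell, Cell] = {}
    best_cost: Dict[Cell, int] = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk back through the predecessors to rebuild the path
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None


# Small example: the middle column is blocked except in the bottom row
grid = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
waypoints = a_star(grid, start=(0, 0), goal=(0, 2))
```

As in the description of B1, the last entry of `waypoints` is the target cell. Re-planning after a reactivation would amount to calling `a_star` again from the current estimated position.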
It is the purpose of the localization behaviour (B2) to maintain odometric accuracy by recalibrating
the odometric estimate of the robot’s position and heading. B2 is based on the readings of the (2D)
LRF which is mounted on a pole extending vertically from the top surface of the robot to avoid
including moving objects (if any) in the scan. The basic idea is to match the current readings of
the actual LRF to the readings of a virtual LRF placed (virtually) in various locations in the map
in the vicinity of the estimated position of the robot, and then to generate a new position estimate
using the best-matching virtual LRF reading.
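The matching step can be sketched as a search over candidate poses near the odometric estimate. The function `best_matching_pose`, the sum-of-squared-differences mismatch and the one-dimensional corridor used as a toy "map" are assumptions for illustration only. The source does not state which matching measure was used.

```python
from typing import Callable, Sequence, Tuple

Pose = Tuple[float, float, float]  # x, y, heading


def best_matching_pose(actual_scan: Sequence[float],
                       candidate_poses: Sequence[Pose],
                       virtual_scan: Callable[[Pose], Sequence[float]]) -> Pose:
    """Return the candidate pose whose virtual scan has the smallest
    sum of squared differences to the actual scan."""
    def mismatch(pose: Pose) -> float:
        return sum((a - v) ** 2 for a, v in zip(actual_scan, virtual_scan(pose)))
    return min(candidate_poses, key=mismatch)


def corridor_scan(pose: Pose) -> Tuple[float, float]:
    """Toy virtual LRF: distances to two walls at x = 0 and x = 10."""
    x = pose[0]
    return (x, 10.0 - x)


# Candidates placed around an odometric estimate of x = 4.0
candidates = [(3.5, 0.0, 0.0), (4.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
new_estimate = best_matching_pose((4.4, 5.6), candidates, corridor_scan)
```

Here the actual readings are closest to those predicted at x = 4.5, so that candidate replaces the odometric estimate.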
In contrast with the two behaviours just described, the obstacle avoidance behaviour is very simple:
When activated by the behaviour selection system, this behaviour simply sets the speed of the
motors to equal, negative values so that the robot will move backwards in a straight line.
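The standard differential-drive kinematic model shows why equal, negative wheel speeds give straight-line backward motion: the turning rate depends only on the difference between the wheel speeds. The function name `step_pose` and the numerical values (wheel speed, wheel base, Euler integration) are assumptions for this sketch.

```python
import math
from typing import Tuple


def step_pose(x: float, y: float, heading: float,
              v_left: float, v_right: float,
              wheel_base: float, dt: float) -> Tuple[float, float, float]:
    """Advance a differential-drive pose by one Euler step."""
    v = 0.5 * (v_left + v_right)              # forward speed of the robot centre
    omega = (v_right - v_left) / wheel_base   # turning rate
    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    heading += omega * dt
    return x, y, heading


# Obstacle avoidance: both wheels at the same negative speed (values assumed)
x, y, heading = step_pose(0.0, 0.0, 0.0, v_left=-0.2, v_right=-0.2,
                          wheel_base=0.4, dt=0.01)
```

With equal wheel speeds the turning rate is zero, so the heading is unchanged and the robot moves backwards along its current heading.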
1.6.3. Behaviour selection

In this investigation, two hormone variables $x_1$ and $x_2$ were introduced, and these were the only
state variables used for B1 and B2. For B3, the readings of the three touch sensors ($s_1$, $s_2$ and $s_3$)
were used as state variables, together with the two hormone variables.
Thus, the utility function polynomials were specified as follows for B1, B2 and B3, respectively:

$$U_1 = U_1(x_1, x_2) = a_{00}^{(1)} + a_{10}^{(1)} x_1 + a_{20}^{(1)} x_1^2 + \cdots$$

$$U_2 = U_2(x_1, x_2) = a_{00}^{(2)} + a_{10}^{(2)} x_1 + a_{20}^{(2)} x_1^2 + \cdots$$

and

$$U_3 = U_3(x_1, x_2, s_1, s_2, s_3) = a_{00000}^{(3)} + a_{10000}^{(3)} x_1 + a_{20000}^{(3)} x_1^2 + \cdots$$

The maximum value of the hormone variables ($x_{\max}$) was set to 1 for both variables, eliminating
two parameters from the optimization procedure [3].
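A polynomial utility function of this kind can be evaluated by storing each coefficient under the tuple of exponents it multiplies. The sketch below assumes that the subscripts of each coefficient give the powers of the corresponding state variables. The function name `utility` and the coefficient values are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Coefficients = Dict[Tuple[int, ...], float]


def utility(coeffs: Coefficients, z: Sequence[float]) -> float:
    """Evaluate a multivariate polynomial: each key is a tuple of exponents,
    one per state variable, and each value is the matching coefficient."""
    total = 0.0
    for exponents, a in coeffs.items():
        term = a
        for value, power in zip(z, exponents):
            term *= value ** power
        total += term
    return total


# Hypothetical coefficients for a utility function of two hormone variables:
# U = 0.5 - 1.0*x1 + 0.25*x1^2 + 2.0*x2
u1_coeffs = {(0, 0): 0.5, (1, 0): -1.0, (2, 0): 0.25, (0, 1): 2.0}
u1 = utility(u1_coeffs, z=(0.4, 0.1))
```

For these made-up numbers the result is 0.5 - 0.4 + 0.04 + 0.2 = 0.34. The same function handles the five-variable case by using exponent tuples of length five.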
1.6.4. Simulations and result

In each evaluation, the robot was allowed to move for T = 150 s. The time step length was set to
dt = 0.01 s, and simulations were terminated if the body of the robot collided with an object (e.g.,
a wall), but not, of course, in cases where only the touch sensors were in contact with the object.
In each time step, the values of the five state variables $x_1$, $x_2$, $s_1$, $s_2$ and $s_3$ were obtained, and
the utility values $U_1$, $U_2$ and $U_3$ were calculated. Then, the robot activated (or kept active) the
behaviour with the highest utility value.
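The selection rule amounts to taking the behaviour with the largest utility in every time step. In the sketch below, the function name `select_behaviour`, the tie-breaking rule (keep the active behaviour when it shares the highest value) and the utility values are assumptions. Only T = 150 s and dt = 0.01 s come from the text.

```python
from typing import Dict, Optional


def select_behaviour(utilities: Dict[str, float],
                     active: Optional[str] = None) -> str:
    """Return the behaviour with the highest utility. If the currently
    active behaviour ties for the highest value, keep it active."""
    best = max(utilities.values())
    if active is not None and utilities.get(active) == best:
        return active
    return max(utilities, key=utilities.get)


T, DT = 150.0, 0.01
n_steps = round(T / DT)  # number of selection steps per evaluation: 15000

# One selection step with made-up utility values
active = select_behaviour({"B1": 0.2, "B2": 0.7, "B3": -0.1}, active="B1")
```

With these values B2 has the highest utility, so the robot would switch from navigation to localization in this step.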
During optimization, the parameters determining the variation of the hormone variables
$x_1$ and $x_2$ as well as the parameters specifying the utility functions were encoded in two
chromosomes. Thus, the behaviour selection system used in a given evaluation was obtained in a
decoding step, during which the parameters were read off from the chromosomes. In this
application, the total number of parameters was equal to 100.
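A decoding step of this kind can be sketched as a mapping from genes to parameter values. The encoding is not given in the text, so the sketch assumes genes in [0, 1] that are scaled linearly to a parameter range. The function name `decode`, the ranges and the short example chromosomes are hypothetical.

```python
from typing import List, Sequence


def decode(genes: Sequence[float], lower: float, upper: float) -> List[float]:
    """Linearly map genes in [0, 1] to parameter values in [lower, upper]."""
    return [lower + g * (upper - lower) for g in genes]


# Two chromosomes: one for the hormone dynamics, one for the utility functions
hormone_chromosome = [0.0, 0.5, 1.0]
utility_chromosome = [0.25, 0.75]

behaviour_selection_system = {
    "hormone_params": decode(hormone_chromosome, 0.0, 1.0),
    "utility_coeffs": decode(utility_chromosome, -1.0, 1.0),
}
```

In the case study the two chromosomes together would hold 100 parameters rather than the five shown here.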
Figure 2: Variation in the utility values [3]

After optimization, the resulting robot was capable of carrying out the intended task, switching
between the three available behaviours at the correct moment. The robot spent about 27 % of its
time in B2 (localization) and almost all the remaining time in B1 (navigation). On rare occasions,
the obstacle avoidance behaviour was activated [3].
Figure 3: A 3D snapshot from the simulator, showing the robot in action [3]
1.7. Extended UF Method

The current version of the UF method described above is a pure arbitration method, i.e., it allows
only a single behaviour to be active at any given time. In tasks centered on locomotion, such as
navigation tasks, this approach is normally sufficient since a given actuator can only carry out one
particular movement at any given time.
However, in more complex tasks, a robot may be equipped with several non-motor (cognitive)
behaviours that may very well run concurrently with a motor behaviour. Work is therefore underway
to allow parallel activation of more than one behaviour. Allowing parallel activation makes the
procedure of activating appropriate behaviours even more complicated, and many issues must
still be resolved before the extended utility function (EUF) method is complete [3].
References
[1] Á. Castro-González, M. Malfaz and M. A. Salichs, "Learning the Selection of Actions for an
Autonomous Social Robot by Reinforcement Learning Based on Motivations," International
Journal of Social Robotics, vol. 3, pp. 427-441, November 2011.
[2] H. Hagras and T. Sobh, "Intelligent learning and control of autonomous robotic agents
operating in unstructured environments," Information Sciences, 28 November 2001.
[3] M. Wahde, "A method for behavioural organization for autonomous robots based on
evolutionary optimization of utility functions," Journal of Systems and Control Engineering,
pp. 249-258, 2003.
[4] D. Y. Yoon, S. R. Oh, G. T. Park and B. J. You, "A behaviour-based approach to reactive
navigation for autonomous robot," IFAC Proc, vol. 35, pp. 109-114, 2002.
[5] J. Sequeira and M. Isabel Ribeiro, "Behavior-based control for semi-autonomous robots,"
IFAC Proc, vol. 37, pp. 18-23, 2004.
[6] G. Martius and E. Olbrich, "Quantifying emergent behavior of autonomous robots," vol. 17,
pp. 7266-7297, 2015.
[7] N. B. Santos, R. S. Bavaresco, J. E. R. Tavares, G. de O. Ramos and J. L. V. Barbosa, "A
systematic mapping study of robotics in human care," vol. 144, 2021.
[8] I. Cos, I. Cos-aguilera, L. Canamero and G. Hayes, "Learning Object Functionalities in the
Context of Behaviour Selection," 2014.