
Proceedings of the 2017 IEEE

International Conference on Information and Automation (ICIA)


Macau SAR, China, July 2017

Simulation of Intelligent Traffic Control for Autonomous Vehicles
Terje Kristensen and Nnamdi Johnson Ezeora
Department of Computing, Mathematics and Physics
Western Norway University of Applied Sciences
Bergen, Norway
terje.kristensen@hvl.no & nnjoh9@gmail.com

Abstract— Urban cities are becoming increasingly congested with vehicular traffic, and most traffic control systems are not smart enough to detect and give priority to emergency vehicles. This results in inadequate services delivered by public emergency agencies, and in unnecessary congestion for other road users at intersection points. In this paper we present an effective reinforced road traffic control policy that reduces the waiting time of emergency vehicles at road intersections and also reduces the travel time of other vehicles.

The work involved simulating traffic control at an intersection using a multi-agent system development framework (JADE) and an agent-based traffic simulator (SUMO). In order to exploit the full potential of SUMO, a third tool (TraSMAPI) was used to connect JADE and SUMO and to provide a higher abstraction of SUMO. In this way, the simulation is not restricted to what SUMO alone can offer, but also permits us to control and manipulate the behaviour of the simulation runs.

The results show that intelligent traffic control fulfils its objectives by significantly reducing the travel time of emergency and other normal vehicles. It is shown in the paper that intelligent traffic control is at least 96% better at reducing the waiting time of emergency vehicles at intersections than non-intelligent fixed traffic control.

Index Terms— SUMO, intersection, traffic light, multi-agent system, emergency vehicles

I. INTRODUCTION

The mathematical theory of the Markov decision process (MDP) [1] is considered an integral breakthrough in solving decision-making problems that commonly occur in time-discrete stochastic games, where the values computed at each stage of a transition depend on the values calculated at previous stages. To illustrate this principle, assume a time-discrete stochastic process in which each step is associated with a given state, and a decision maker must choose an action available in that state. Any chosen action moves the process into a new state and yields a reward to the decision maker [2]. MDPs are applied to many optimization problems in the fields of Artificial Intelligence (AI), economics, manufacturing, etc. Unquestionably, Dynamic Programming [3] and Reinforcement Learning (RL) [4] algorithms are used to solve many MDP decision problems such as traffic control.

Q-Learning (QL) [5] is a learning algorithm for agents that is modelled on the reinforcement learning paradigm. The following questions are addressed in this paper: Are reinforced Java Agent Development Framework (JADE) [6] agents and Simulation of Urban Mobility (SUMO) [7] relevant tools for traffic simulation? Can SUMO emergency vehicles be given priority when approaching intersection points? It is pertinent to highlight some of the challenges facing road traffic control, with a focus on intersection points.

Wolfgang Lutz [8] has projected a 20% probability increase in the world population size by the year 2050, based on the peak increase in population size at the end of each year [8]. Consequently, the growing human population has translated into a global increase in road traffic, as more people travel for either economic purposes or personal needs. The ability to control this expected increase in road traffic, especially at intersections, is becoming quite challenging. This has led to loss of productivity and to deterioration in the living standards of urban city dwellers due to high carbon emissions.

Researchers have applied different approaches to problems related to traffic control for autonomous vehicles. In [9] a control system without traffic lights was presented, based on the principle of a reservation-based control algorithm. In this paper, however, we choose to retain traffic lights, reinforced using the QL algorithm. We also go further and model different vehicle types using SUMO vehicle-type demand modelling. In addition, a new traffic control policy that grants emergency vehicles priority to cross intersections has been developed.

Other similar approaches to traffic control using reinforcement learning were described in the work of Arel et al. [4]. They modelled a traffic light scheduling system using Multi-agent Reinforcement Learning (MARL) [4]. Their work made novel use of a multi-agent system and the RL algorithm to obtain what was adjudged an efficient traffic signal control policy.

II. MULTI-AGENT SYSTEM

A multi-agent system is composed of multiple interacting computing elements known as agents [10]. Wooldridge and Jennings [11] described an intelligent agent as "hardware or (more usually) a software-based computer system that has the following properties":
• Reactivity: intelligent agents perceive and respond in a timely fashion to changes that occur in their environment in order to satisfy their design objectives.
• Pro-activeness: the ability to exhibit goal-directed behaviour by taking the initiative in order to satisfy their design objectives.
• Social ability: intelligent agents are capable of interacting with other agents (and possibly humans), through negotiation and/or cooperation, to satisfy their design objectives.
• Autonomy: agents operate without the direct intervention of humans or others, and have some kind of control over their actions and internal state.

A. Agent Environment

An agent's environment is everything that surrounds the agent without being part of the agent itself [12]. The environment plays a major role in an agent's life cycle. The roles are bidirectional, in the sense that the environment affects the agent's behaviour and the agent, in turn, can alter the environment. This intertwined relationship is clearly exhibited by intelligent agents situated in a dynamic environment, where an agent's next action depends on the changing state of its environment, which is in turn triggered by the agent's current action. An agent environment may consist of more than two independent interacting objects. For instance, the following objects can be found in a traffic control environment: junctions or intersections, traffic lights, traffic signals, moving vehicles, lanes, etc.

B. Reinforcement Learning Using a Greedy Strategy

Reinforcement learning is an unsupervised, goal-directed machine learning method. Agents act in their environment based on the immediate reward value associated with every state in the environment; thus, agents tend to map actions to states based on a numerical reward value, where a high value increases the probability of taking the action. The learning process is continuous, and agents have explicit goals they want to accomplish.

The Q-learning algorithm implemented here selects the action to perform based on a greedy policy [13]: only the action that guarantees the highest value function is selected at each step of probabilistic decision making. The task facing the agent is therefore to find the action that maximises the total discounted expected reward. The Q-value is calculated using (1):

$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$   (1)

where $a$ is the action executed in state $s$, leading to the subsequent state $s'$ and yielding a reward $r$; $\alpha$ is the learning rate and $\gamma$ is the discount factor.
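To make the update in (1) concrete, the following is a minimal Java sketch of a tabular Q-learner with greedy action selection. The state/action encoding and the values of α and γ are assumptions for illustration, not the paper's settings.

```java
// Minimal tabular Q-learning sketch for update rule (1).
// State/action encoding and parameter values are illustrative only.
public class QLearner {
    private final double[][] q;          // Q-table: q[state][action]
    private final double alpha = 0.1;    // learning rate (assumed value)
    private final double gamma = 0.9;    // discount factor (assumed value)

    public QLearner(int numStates, int numActions) {
        q = new double[numStates][numActions];
    }

    // Greedy action selection: pick the action with the highest Q-value.
    public int selectAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) best = a;
        }
        return best;
    }

    // Apply update rule (1) after observing reward r and next state sNext.
    public void update(int s, int a, double r, int sNext) {
        double maxNext = q[sNext][selectAction(sNext)];
        q[s][a] += alpha * (r + gamma * maxNext - q[s][a]);
    }
}
```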
1) State Definitions: Several RL approaches for traffic control define states as the actual traffic light states (red, green, yellow) responsible for controlling the different lanes in a road network. In our approach, however, we defined the states based on the different behaviours of the environment, in order to minimize the complexity of the computations. For instance, learning the traffic light state that gives priority to an emergency vehicle would lead to a high degree of exploration within the system, because emergency vehicles approach intersections at non-fixed time intervals, and because a large number of vehicles is needed to make a meaningful simulation. Therefore, two states were modelled, and they simply define the behaviours of the environment. The first (fixed) behaviour defines the traffic light states and phases when no emergency vehicle is perceived at the intersection. The second (non-fixed) behaviour defines what happens when an oncoming emergency vehicle is perceived approaching an intersection.

2) Traffic Light Actions: The function of the traffic light agents is to provide a well coordinated sequence of traffic control plans using the traffic lights. The traffic light agents were therefore programmed to learn the behaviour that maximizes reward: a non-fixed behaviour has a higher reward value, while a fixed behaviour has a lower one. Table I defines the control plan of the fixed behaviour, where Phase numbers the navigations or manoeuvres at an intersection and Duration is the total signal time for each phase. The intersection consists of 12 control or phase states, labelled S1 to S12, as shown in Tables I and II. Each phase is allocated a fixed period of time with a combination of traffic light colour schemes that represents the navigation state, as illustrated in Fig. 1. The navigation options are controlled by the traffic light controller using the control policy shown in Fig. 2. The traffic light colours G, r, g and y stand for green ("GO"), red ("STOP"), flashing yellow ("GO WITH CAUTION") and yellow ("SAFE STOP"), respectively.

TABLE I
FIXED CONTROL PLAN

Phase  S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12  Duration
  1    G  G  G  G  r  r  r  r  r  r   r   r      31
  2    y  y  y  y  r  r  r  r  r  r   r   r       4
  3    G  r  r  G  G  G  G  r  r  G   r   r      31
  4    y  r  r  y  y  y  y  r  r  y   y   y       4
  5    G  r  r  r  r  r  G  G  G  G   r   r      31
  6    y  r  r  r  r  r  y  y  y  y   r   r       4
  7    G  r  r  G  r  r  G  r  r  G   G   G      31
  8    y  r  r  y  r  r  y  r  r  y   y   y       4
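The fixed plan in Table I maps naturally onto SUMO-style phase definitions, in which each phase is a string of per-connection signal colours plus a duration. The following Java sketch encodes Table I in that form; it mirrors SUMO's traffic-light state strings, but it is an assumed representation, not the paper's actual implementation.

```java
// Hypothetical encoding of the fixed control plan in Table I as
// SUMO-style phases: one colour character per link S1..S12 plus a duration.
public final class FixedPlan {
    record Phase(String states, int durationSeconds) {}

    static final Phase[] PLAN = {
        new Phase("GGGGrrrrrrrr", 31),  // phase 1
        new Phase("yyyyrrrrrrrr", 4),   // phase 2
        new Phase("GrrGGGGrrGrr", 31),  // phase 3
        new Phase("yrryyyyrryyy", 4),   // phase 4
        new Phase("GrrrrrGGGGrr", 31),  // phase 5
        new Phase("yrrrrryyyyrr", 4),   // phase 6
        new Phase("GrrGrrGrrGGG", 31),  // phase 7
        new Phase("yrryrryrryyy", 4),   // phase 8
    };
}
```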

Fig. 1. An example of an intersection in SUMO.

Fig. 2. Traffic control policy.

Fig. 3. An example of intelligent traffic control at an intersection: (a) first phase, vehicles approaching an intersection; (b) second phase, a normal vehicle has reached the intersection before the emergency vehicle; (c) third phase, the normal vehicle is denied crossing even though it arrived first and its traffic light was initially green; (d) fourth phase, the emergency vehicle has crossed the intersection.

Conversely, a non-fixed behaviour will alert the other agents to change their traffic light states. The traffic light for the lane with the oncoming emergency vehicle is switched to green, while the others move to a red state. Thus, the change in the traffic control plan shown in Table II is the result of the change in the environment when an emergency vehicle is perceived, as illustrated in Fig. 3 (only the first four phases are shown). However, the fixed behaviour is restored in scenarios where two or more emergency vehicles approach an intersection, while lanes without emergency vehicles stay in a g state to avoid collisions. Fig. 4 shows the sequence diagram for emergency vehicle detection.

TABLE II
NON-FIXED CONTROL PLAN

Phase  S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12  Duration
  1    G  G  G  G  r  r  r  r  r  r   r   r      31
  2    y  y  y  y  r  r  r  r  r  r   r   r       4
  3    G  r  r  G  r  r  G  r  r  G   G   G       4
  4    y  r  r  y  r  r  y  r  r  y   y   y       6
  5    G  r  r  G  G  G  G  r  r  G   r   r      31
  6    y  r  r  y  y  y  y  r  r  y   r   r       4
  7    G  r  r  r  r  r  G  G  G  G   r   r      31
  8    y  r  r  r  r  r  y  y  y  y   r   r       4

Fig. 4. A sequence diagram of emergency vehicle detection.

III. MULTI-AGENT DEVELOPMENT USING THE JADE FRAMEWORK

Multi-agent systems have been used efficiently in simulating complex problems that occur in the real world. Results generated from the simulations are compared to real-world scenarios and further help in finding solutions to problems that are too costly or risky to be modelled using practical engineering methods. Correspondingly, the JADE framework has been used in this work to simulate the cooperation of intelligent traffic light agents controlling the traffic at intersections. The objective of the simulation using multi-agent systems is to create a traffic model that provides an imitation of a real road traffic scenario.

JADE is a popular software framework for programming and developing interoperable multi-agent systems that can be distributed over connected host networks. The goal of Tilab [14], the developers of JADE, is to simplify the development of agents that comply with the FIPA standardization specifications [15]. JADE possesses many intriguing features, which have made it the choice of many researchers. These features were well described by Bellifemine et al. [6]:
• "A directory facilitator (DF), which provides a yellow pages service that allows agents to register their services, and locates other agents based on the services they provide.
• An agent management system (AMS) for creating and destroying agents.
• A transport mechanism and interface to send/receive messages to/from other agents.
• Light-weight transport of ACL messages inside the same agent platform, as messages are transferred encoded as Java objects rather than strings.
• A GUI to manage several agents and agent platforms from the same agent."

A. Communication in JADE

Communication in JADE is based on the ACL, which complies with the FIPA standards. Agent communication is essential to facilitate agent interaction and cooperation. JADE offers different FIPA interaction protocols, which provide an admissible sequence of message exchanges between the agents. The interaction protocols found in [15] are quite exhaustive, but the choice of protocol depends on the designer and the intended goal. Each interaction protocol contains a set of performatives.

The FIPA-Request interaction protocol is used in this work and is implemented in two different ways: first, to alert and request other traffic light agents to cooperate whenever an oncoming emergency vehicle is perceived in the environment, and secondly, to request a reward after cooperation.

According to the authors in [16], performatives are expressions or statements that perform an act by the very fact of being uttered. An example is "I declare you husband and wife". Austin discovered that some English words carry an action just by their mere utterance. These are categorised as illocutionary acts [16] and have been adopted by FIPA as a group of performatives for conversation management. Some examples are: request, confirm, inform, refuse, etc.
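To make the FIPA-Request usage described above concrete, the following is a minimal JADE sketch of a traffic light agent alerting a peer when an emergency vehicle is perceived. The agent name ("trafficLight2") and the message content are hypothetical; the classes and constants (Agent, OneShotBehaviour, ACLMessage, AID, FIPANames) belong to the JADE framework itself.

```java
import jade.core.AID;
import jade.core.Agent;
import jade.core.behaviours.OneShotBehaviour;
import jade.domain.FIPANames;
import jade.lang.acl.ACLMessage;

// Hypothetical traffic light agent: on detecting an emergency vehicle it
// sends a FIPA-Request asking a neighbouring agent to switch its lights.
public class TrafficLightAgent extends Agent {
    @Override
    protected void setup() {
        addBehaviour(new OneShotBehaviour() {
            @Override
            public void action() {
                ACLMessage request = new ACLMessage(ACLMessage.REQUEST);
                request.setProtocol(FIPANames.InteractionProtocol.FIPA_REQUEST);
                request.addReceiver(new AID("trafficLight2", AID.ISLOCALNAME));
                request.setContent("emergency-approaching lane=north"); // assumed content format
                send(request);
            }
        });
    }
}
```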

IV. AGENT-DIRECTED TRAFFIC SIMULATION WITH SUMO

SUMO is a portable, microscopic, agent-based traffic simulator designed to handle large road networks [7]. It is not only a traffic simulator but also an application suite that can be redesigned or configured to simulate traffic in different scenarios [17]. SUMO is available as open source, which helps researchers to further contribute to and improve the framework. SUMO can be used for both simple and complex traffic simulations, and it is bundled with a remote control interface (TraCI) that allows simulations to be adapted online and connected to external applications [17]. The interoperability of SUMO with a multi-agent framework was demonstrated in a recent research work that involved running JADE agents on the SUMO environment [18].
A. Generating Road Network

Building a road network is the first stage when starting a SUMO simulation. Roads in SUMO consist of nodes (junctions) and edges (the roads connecting the various junctions), which are tied together using a connection file. On the whole, a SUMO road network file with the extension .net.xml comprises edges, nodes (junctions), connections and traffic light control (if specified).

A road network can be created in the following three ways (a sketch of the second option follows the list):
• Manual creation
• Using the netgenerate tool
• Using the netconvert tool, which is widely used when generating a network from OpenStreetMap [19]
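As a hedged illustration of the second option, a grid network similar to the one used in the experiments below could be generated by invoking netgenerate from Java. The option names are those documented for SUMO's netgenerate; the grid size and output file name are assumptions.

```java
import java.io.IOException;

// Sketch: generate a small grid network with SUMO's netgenerate tool.
// File name and grid size are assumptions; adjust to the local SUMO install.
public class BuildNetwork {
    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "netgenerate",
                "--grid",
                "--grid.number=3",               // 3x3 junctions (illustrative value)
                "--output-file=grid.net.xml")
            .inheritIO()
            .start();
        System.exit(p.waitFor());
    }
}
```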
B. Modelling SUMO Vehicles

A SUMO route file, generated using the duarouter command-line tool, contains the data for all the traffic and vehicles that will flow on a generated road network. Route files therefore contain all the vehicular information that will steer through the SUMO environment. Vehicle types are defined using attributes such as colour, maximum speed, acceleration, vehicle class (bus, passenger, emergency, train, bicycle, etc.) and vehicle length. The speed of each vehicle is calculated using a car-following model given in [7]. The model computes a vehicle's speed by taking into account its own speed, its distance to the leading vehicle and the leading vehicle's speed, as expressed in (2) [20]. This quantity is regarded as the safe velocity (v_safe) [20]. The model is implemented by default in SUMO and is indeed essential to avoid vehicle collisions along the edges.

$$v_{\mathrm{safe}} = v_{n-1}(t) + \frac{g_n(t) - v_{n-1}(t)\,\tau}{\dfrac{v_n(t) + v_{n-1}(t)}{2b} + \tau} \qquad (2)$$

where $g_n(t)$ is the gap to the leading vehicle, $v_n(t)$ is the speed of the following vehicle at time $t$, $v_{n-1}(t)$ is the speed of the leading vehicle at time $t$, $b$ is the desired deceleration of the vehicle, and $\tau$ is the reaction time.
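For clarity, (2) transcribes directly into Java as follows; the numerical values in the usage example are illustrative only, not parameters taken from the paper or from SUMO.

```java
// Safe velocity from the car-following model in (2).
// gap:     g_n(t), distance to the leading vehicle [m]
// vFollow: v_n(t), speed of the following vehicle [m/s]
// vLead:   v_{n-1}(t), speed of the leading vehicle [m/s]
// b:       desired deceleration [m/s^2]; tau: reaction time [s]
public final class CarFollowing {
    static double safeVelocity(double gap, double vFollow, double vLead,
                               double b, double tau) {
        return vLead + (gap - vLead * tau)
                     / ((vFollow + vLead) / (2.0 * b) + tau);
    }

    public static void main(String[] args) {
        // Illustrative values: 20 m gap, both vehicles at 10 m/s,
        // 4.5 m/s^2 deceleration, 1 s reaction time.
        System.out.println(safeVelocity(20.0, 10.0, 10.0, 4.5, 1.0));
    }
}
```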

C. TraCI

TraCI is a Transmission Control Protocol (TCP) based client/server architecture that grants external applications access to SUMO. It also makes interfacing with Python scripts possible, which is useful for modelling adaptive traffic control systems [21]. TraCI is used in this work to establish the connection between JADE (the external application) and the SUMO simulator through TraSMAPI (which interprets and exchanges commands with TraCI). In this case SUMO acts as the server, and it is started by adding the command-line option --remote-port <INT> to the SUMO configuration, where <INT> is the port number.
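A minimal sketch of this client/server setup follows, assuming sumo is on the PATH and that a configuration file named sim.sumocfg exists (both assumptions); the raw socket merely stands in for the binary TraCI session that TraSMAPI manages internally.

```java
import java.io.IOException;
import java.net.Socket;

// Sketch: start SUMO as a TraCI server and open the TCP connection that an
// external application (here standing in for TraSMAPI/JADE) would use.
public class TraciConnect {
    public static void main(String[] args) throws IOException, InterruptedException {
        new ProcessBuilder("sumo", "-c", "sim.sumocfg", "--remote-port", "8813")
                .inheritIO()
                .start();
        Thread.sleep(2000); // crude wait until SUMO listens; use a retry loop in practice
        try (Socket traci = new Socket("localhost", 8813)) {
            System.out.println("Connected to SUMO on port " + traci.getPort());
        }
    }
}
```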
D. TraSMAPI

TraSMAPI is an open-source application programming interface implemented in Java that offers complete integration between the multi-agent system and the simulator interaction modules. In order to manipulate the agents in a SUMO traffic simulation and obtain a higher abstraction of SUMO, a tool is needed that provides both an interface for multi-agent development and the communication with the SUMO server. TraSMAPI, developed by the Artificial Intelligence and Computer Science Laboratory (LIACC) at the University of Porto, is the right tool to provide the necessary abstractions [22].

A real-time connection between the JADE traffic light agents and their corresponding visualization on the SUMO simulator is thus realized through a TCP socket connection between TraCI and TraSMAPI. Through this communication, important data are retrieved (e.g. the identity of an oncoming emergency vehicle) and used for traffic control by the external application. Fig. 5 illustrates the architecture of the system, derived from the TraSMAPI core architecture.

Fig. 5. The system architecture.
V. EXPERIMENTS AND RESULTS

A. Average Travel Time

In this section, the total average travel time spent by the two different vehicle types in a simulation run is calculated. In the setup, vehicles are only allowed to halt when approaching a vehicle queue or an intersection, and vehicles are not permitted to overtake each other when queued up.

1) Experimental Setup: The simulation was composed of two lanes (there are no pedestrians or bicycles) spanning opposite directions and connected by five junctions that replicate a 2 x 2 grid network. The speed limit on each lane is 15 metres per second (15 m/s). A total of 300 vehicles are loaded, and the simulation is run for at least 1100 time steps with a delay of 120 ms. Results for the experiment are generated using the SUMO output option --tripinfo-output <file>, where <file> specifies the destination of the output dump.

2) Results: The distribution of the vehicle types was varied in order to gain an in-depth understanding of the traffic system's behaviour. Generally, in real life, a road network with only one emergency vehicle would always yield the least travel time for it, even when no priority is awarded. For this reason, the vehicle distribution levels were varied as shown in Table III. At level 4 the vehicle types were equally distributed (the numbers of normal and emergency vehicles are equal), which is a rare scenario in the real world. Notwithstanding the uniform distribution, emergency vehicles managed to complete their trips ahead of the normal vehicles. This significant performance was also recorded at levels 1, 2 and 3, as shown in Fig. 6.

TABLE III
AVERAGE TRAVEL TIME (ROUNDED TO THE NEAREST WHOLE NUMBER) BY VEHICLE TYPE

Level  Vehicle type  Number of vehicles  Total (s)  Average (s)
  1    Normal        184                 38797      211
       Emergency      16                  1335       83
  2    Normal        155                 27830      180
       Emergency      45                  4490      100
  3    Normal        204                 28254      139
       Emergency      96                 11569      121
  4    Normal        150                 19091      127
       Emergency     150                 16671      111
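As a hedged sketch, per-type averages like those in Table III can be computed from the --tripinfo-output dump mentioned in the setup above by reading the duration and vType attributes of each tripinfo element; the file name is an assumption.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: average trip duration per vehicle type from a SUMO tripinfo dump.
public class TripinfoStats {
    public static void main(String[] args) throws Exception {
        NodeList trips = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("tripinfo.xml"))      // assumed file name
                .getElementsByTagName("tripinfo");

        Map<String, double[]> sums = new HashMap<>(); // type -> {total, count}
        for (int i = 0; i < trips.getLength(); i++) {
            Element t = (Element) trips.item(i);
            double[] s = sums.computeIfAbsent(t.getAttribute("vType"),
                                              k -> new double[2]);
            s[0] += Double.parseDouble(t.getAttribute("duration"));
            s[1]++;
        }
        sums.forEach((type, s) -> System.out.printf(
                "%s: %d trips, average %.0f s%n", type, (long) s[1], s[0] / s[1]));
    }
}
```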

Fig. 6. Average travel time for normal vs. emergency vehicles, plotted against the vehicle distribution levels.

B. Average Waiting Time at Intersection

Having shown that the system reduces the total travel time for emergency vehicles, we now want to show how much time each vehicle type spends waiting at an intersection. The results obtained in this experiment indicate whether the system is smart. In this experiment the following restrictions were maintained: no change of lanes, no halting unless in a queue, no turning except at intersections, and no overtaking.

1) Experimental Setup: The simulation environment was modelled as a 2 x 2 grid network, and 300 vehicles were loaded into the network. The vehicles were distributed such that emergency cars accounted for 25% of the total number of vehicles. This was done because in the real world emergency vehicles hardly constitute over 25% of all the vehicles in a road network, except in disaster management.

Two SUMO edge-based traffic measure output files were generated using an additional file. Each of the files contains all the output traffic data produced on the whole network's edges for a particular vehicle type. The vehicle speed was limited to 15 m/s, and the simulation was run for at least 1000 time steps with a delay of 120 ms. The results were collected at intervals of 50 - 1000 time steps.

2) Results: The results of this experiment, shown in Fig. 7, suggest that emergency vehicles spend less time waiting at an intersection. These results were collected on the 16 edges with traffic light control out of the total of 24 edges of the 2 x 2 grid network.

Fig. 7. Total waiting time on the lanes connected to an intersection, plotted against the edge IDs.

C. Intelligent vs. Fixed Control

In this experiment the reinforcement of the intelligent control was removed, and the control was subsequently remodelled as a fixed-time traffic control policy similar to the default traffic control implementation in SUMO.

1) Experimental Setup: A 2 x 2 grid network consisting of 16 edges with traffic light control out of the entire 24 edges in the network was modelled. 300 vehicles were loaded into the simulation in order to carefully monitor the system behaviour. Vehicular speeds were restricted to 15 m/s, and two SUMO edge-based traffic measure output files were used to generate separate data for the two control policies. The simulation was run for at least 1000 time steps and results were collected at intervals of 50 - 1000 time steps.

2) Results: The results suggest that intelligent traffic control outperformed the non-intelligent fixed plan. Fixed control performed slightly better at edge "0/2to0/1", as illustrated in Fig. 8; this is the result of the high number of vehicles routed along that edge. However, the experiments show that, on the whole, the performance of intelligent control is at least 33% better than that of the fixed control plan.

Fig. 8. Total waiting time of all vehicles (fixed vs. intelligent) at an intersection, plotted against the edge IDs.

VI. DISCUSSION

The first research question was to confirm or reject the hypothesis that emergency cars are prioritized. At the fourth level of the results presented in Fig. 6, the network was populated with an equal number of the two vehicle types. Impressively, the emergency cars completed their journeys with an average travel time a factor of 1.14 lower than that of the normal vehicles. In fact, such an even distribution rarely occurs in a real-world scenario. Similarly, a better performance was obtained when the number of emergency cars is less than the number of normal vehicles.

Furthermore, by testing the waiting time of emergency vehicles separately under both fixed and intelligent control, it is clearly shown that the latter performs better (by 96%). The result thus provides additional confirmation that emergency vehicles are always given priority.

Finally, it is pertinent to note that SUMO, TraSMAPI and JADE are relevant tools for traffic control simulations. Through this work it has become very apparent that SUMO is robust, flexible and easy to use. The modelling of different vehicle types used in this work is one of the main advantages of SUMO.

VII. CONCLUSION

The purpose of the current study was to model and simulate an effective traffic control system for autonomous vehicles at intersections that also gives priority to emergency vehicles. The system developed uses SUMO as a microscopic traffic simulator, TraSMAPI to provide a higher abstraction of SUMO, and JADE for the development of the traffic light agents. To make the traffic lights more intelligent, the traffic light agents are reinforced using Q-Learning. Two vehicle types that mimic normal passenger cars and emergency cars were modelled using the SUMO tools. Experimental results from this work show that intelligent control reduces the waiting time of emergency vehicles at intersection points, even in severe scenarios where the number of emergency cars equals the number of normal vehicles.

Furthermore, the system can be used by transport engineers in the design and planning of road networks that give priority to emergency vehicles. This may help engineers to analyze traffic performance and the strategies needed to improve the travel time and safety of emergency and other vehicles in general.

VIII. FURTHER WORK

In future work, the system may be extended to assign emergency vehicles a separate lane of the network. The traffic control algorithm can also be improved by incorporating pedestrians and trams into the system.

ACKNOWLEDGMENT

The two authors contributed equally to this work and would like to offer special thanks to the SUMO-user Sourceforge online community for their helpful suggestions.

REFERENCES

[1] M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Proceedings of the Eleventh International Conference on Machine Learning, vol. 157, 1994, pp. 157–163.
[2] B. Givan and R. Parr, "An introduction to Markov decision processes," Purdue University, 2001.
[3] D. P. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 1995, vol. 1, no. 2.
[4] I. Arel, C. Liu, T. Urbanik, and A. Kohls, "Reinforcement learning-based multi-agent system for network traffic signal control," IET Intelligent Transport Systems, vol. 4, no. 2, pp. 128–135, 2010.
[5] C. J. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[6] F. Bellifemine, A. Poggi, and G. Rimassa, "JADE - a FIPA-compliant agent framework," in Proceedings of PAAM, vol. 99, no. 97-108. London, 1999, p. 33.
[7] M. Behrisch, L. Bieker, J. Erdmann, and D. Krajzewicz, "SUMO - simulation of urban mobility," in The Third International Conference on Advances in System Simulation (SIMUL 2011), Barcelona, Spain, 2011.
[8] W. Lutz, W. Sanderson, and S. Scherbov, "The end of world population growth," Nature, vol. 412, no. 6846, pp. 543–545, 2001.
[9] T. Kristensen and K. Smith, "Intelligent traffic simulation by a multi-agent system," in 2015 Third World Conference on Complex Systems (WCCS). IEEE, 2015, pp. 1–7.
[10] M. Wooldridge, An Introduction to MultiAgent Systems. John Wiley & Sons, 2009.
[11] M. Wooldridge and N. R. Jennings, "Intelligent agents: Theory and practice," The Knowledge Engineering Review, vol. 10, no. 2, pp. 115–152, 1995.
[12] W. J. Teahan, Artificial Intelligence - Agents and Environments. BookBoon, 2010.
[13] S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári, "Convergence results for single-step on-policy reinforcement-learning algorithms," Machine Learning, vol. 38, no. 3, pp. 287–308, 2000.
[14] JADE, http://jade.tilab.com/, 2016, [Online; accessed 08-February-2016].
[15] FIPA, http://www.fipa.org/repository/ips.php3, 2002, [Online; accessed 07-March-2016].
[16] J. R. Searle, Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, 1969, vol. 626.
[17] D. Krajzewicz, J. Erdmann, M. Behrisch, and L. Bieker, "Recent development and applications of SUMO - simulation of urban mobility," International Journal on Advances in Systems and Measurements, vol. 5, no. 3&4, 2012.
[18] G. Soares, J. Macedo, Z. Kokkinogenis, and R. Rossetti, "An integrated framework for multi-agent traffic simulation using SUMO and JADE," in SUMO2013, The First SUMO User Conference, 2013, pp. 15–17.
[19] OSM, http://www.openstreetmap.org, 2016, [Online; accessed 01-January-2016].
[20] V. Kanagaraj, G. Asaithambi, C. N. Kumar, K. K. Srinivasan, and R. Sivanandan, "Evaluation of different vehicle following models under mixed traffic conditions," Procedia - Social and Behavioral Sciences, vol. 104, pp. 390–401, 2013.
[21] K. Pandit, D. Ghosal, H. M. Zhang, and C.-N. Chuah, "Adaptive traffic signal control with vehicular ad hoc networks," IEEE Transactions on Vehicular Technology, vol. 62, no. 4, pp. 1459–1471, 2013.
[22] I. J. Timóteo, M. R. Araújo, R. J. Rossetti, and E. C. Oliveira, "TraSMAPI: An API oriented towards multi-agent systems real-time interaction with multiple traffic simulators," in 2010 13th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE, 2010, pp. 1183–1188.
