A comparison of genetic algorithm and reinforcement learning for autonomous driving
KTH Bachelor Thesis Report
Ziyi Xiang
The research problem can be formulated as follows: how does the learning efficiency of reinforcement learning compare with that of the genetic algorithm for autonomous navigation through a dynamic environment?
Keywords
Abstract
This paper compares two different methods, reinforcement learning and genetic algorithm, for designing control systems for autonomous cars in a dynamic environment.
Keywords
Acknowledgements
I would like to offer my special thanks to my supervisor Jana Tumová as well as the examiner Örjan Ekeberg. The assistance provided by Jana was greatly appreciated. I would also like to extend my thanks to Shuai Wu, Oskar Nehlin and Wen Yin for their great advice.
Authors
Ziyi Xiang
KTH Royal Institute of Technology
Stockholm, Sweden

Examiner
Örjan Ekeberg
KTH Royal Institute of Technology

Supervisor
Jana Tumová
KTH Royal Institute of Technology
Contents

1 Introduction
  1.1 Problem statement
  1.2 Delimitations
2 Theoretical Background
  2.1 Reinforcement learning
  2.2 Genetic algorithm
3 Methods
  3.1 Measurement evaluation
  3.2 The simulation
  3.3 The agent's control system
  3.4 The reinforcement learning's car controller
  3.5 The genetic algorithm's car controller
4 Result
  4.1 Reinforcement learning test
  4.2 Genetic algorithm test
  4.3 Comparison
5 Conclusion
6 Discussion
  6.1 Safety
  6.2 Implementation difficulty and time cost
  6.3 Further study
References
1 Introduction
Autonomous driving is widely debated among experts. It has good potential for better safety than human drivers: autonomous cars cannot get distracted and always obey traffic rules.
Autonomous cars are already being developed by many companies such as Volvo, Tesla, and Google [6, 14]. The cars use sensors such as radar or cameras to observe the environment. The movement control system predicts the environment based on the sensors' observations and makes the movement control decisions. These solutions still need to be optimized, since full automation has not been achieved; most of them are at a partially automated level. [2]
Autonomous cars are closely associated with machine learning and artificial intelligence. Machines in the industry need to become smarter in order to accomplish tasks of increasing difficulty. Machine learning is considered a technology that helps robots learn from and interact with the environment.
Deep reinforcement learning is widely used and is a very efficient way to design AI behaviours. [5] On the other hand, the genetic algorithm has also proved to be a successful technique for optimizing automatic robot and vehicle guidance. [4] Both methods require a large set of learning samples, which can make producing sample data expensive. [4, 5]
The purpose of this study is to investigate which of the two methods can provide safe driving behaviour with fewer learning samples.
1.1 Problem statement
1.2 Delimitations
¹ A term which describes the robots used in machine learning.
2 Theoretical Background
This study focuses on reinforcement learning and the genetic algorithm for designing a car control system. Both methods are capable of handling decision-making and optimization problems. The action prediction is a decision-making problem based on observations, and an optimized solution to such a problem can be obtained by an optimization algorithm. [4, 5]
In this section, the paper introduces some basic theoretical concepts and research findings for both algorithms.
2.1 Reinforcement learning
The state is defined by the agent itself and the environment surrounding the agent, such as velocity, position, mass or distance to obstacles. In a real-life scenario, the environment can be observed by cameras and sensors; the data obtained from these observations in turn represents the states of the vehicle.
The agent can take different actions in a given state, such as accelerating and rotating. The action set A is defined by the available choices the agent can make in a particular state.
A transition function:
P(s, a, s')
The transition function gives, when we take the current state s as the starting point and take an action a, the probability that the agent lands in the state s'.
A discount factor γ indicates how much the agent should care about future rewards compared to current rewards.
A reward function:
R(s, a, s')
The reward function represents the reward the agent receives when it takes action a in state s and lands in the new state s'.
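Taken together, these components form what is commonly called a Markov decision process (MDP), often written as the tuple

$$(S, A, P, R, \gamma)$$

where $S$ is the set of states, $A$ the set of actions, $P$ the transition function, $R$ the reward function and $\gamma$ the discount factor.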
The agent constantly observes the environment and acts based on these observations. After the agent performs an action, the reward associated with the observation of what just happened is calculated. The policy² of actions is updated based on what the agent has learned, so that future decision-making is influenced by previous attempts. This creates a feedback cycle, and the process repeats until the agent finds a policy that maximizes its reward.
$$\max_{\pi} \; \mathbb{E}\!\left[\sum_{t=0}^{H} \gamma^{t} R(s_t, a_t, s_{t+1}) \,\middle|\, \pi\right]$$

[1]
Dynamic programming and policy optimization are two key approaches to solving MDP problems. [1, 10]
A famous method for solving MDPs using the concept of dynamic programming is called Q-learning. Q-learning learns the policy by trial and error from stored data. It uses past experiences to calculate the expected reward for each action in a given state and iteratively updates these reward estimates.
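As an illustration of this idea, the following is a minimal sketch of a tabular Q-learning update in C#. It is only meant to make the trial-and-error update concrete; it is not the implementation used later in this thesis, and the learning rate and discount factor values are assumptions.

using System;

// Minimal tabular Q-learning sketch (illustrative only).
class QLearningSketch
{
    private readonly double[,] q;          // estimated expected reward per (state, action)
    private readonly double alpha = 0.1;   // learning rate (assumed value)
    private readonly double gamma = 0.9;   // discount factor (assumed value)
    private readonly int numActions;
    private readonly Random rng = new Random();

    public QLearningSketch(int numStates, int numActions)
    {
        this.numActions = numActions;
        q = new double[numStates, numActions];
    }

    // Store the outcome of one trial-and-error step and update the estimate.
    public void Update(int state, int action, double reward, int nextState)
    {
        double bestNext = double.NegativeInfinity;
        for (int a = 0; a < numActions; a++)
            bestNext = Math.Max(bestNext, q[nextState, a]);

        // Move the old estimate toward the reward plus the discounted best future estimate.
        q[state, action] += alpha * (reward + gamma * bestNext - q[state, action]);
    }

    // Epsilon-greedy selection: mostly exploit the best known action, sometimes explore.
    public int ChooseAction(int state, double epsilon = 0.1)
    {
        if (rng.NextDouble() < epsilon) return rng.Next(numActions);
        int best = 0;
        for (int a = 1; a < numActions; a++)
            if (q[state, a] > q[state, best]) best = a;
        return best;
    }
}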
The problem with dynamic programming is the huge time and space complexity when the problem size scales up. If the state and action spaces are large or infinite, there is not enough memory to store all the data and the calculation becomes slow. [1]
Policy optimization is a method where the agent directly learns the policy function without calculating the reward of each state. [10] The algorithm acts with the current policy and improves it through learning. The policy is not forced to choose the action that gives the highest reward. Instead, it uses a probability function to randomly select actions in order to discover better solutions.
The objective of policy optimization is to find a policy function π with parameters θ which maximizes the total reward. Here θ is a parameter or weight vector for the policy π. A gradient descent function³ can be used to update the policy by changing θ.
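A standard textbook form of such an update, not necessarily the exact estimator used by ML-Agents, moves θ in the direction that increases the expected reward:

$$\theta \leftarrow \theta + \alpha \, \nabla_{\theta} J(\theta), \qquad \nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}}\!\left[\nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, R_t\right]$$

where $\alpha$ is the step size and $R_t$ is the discounted return obtained after time $t$.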
By using policy optimization, the algorithm increases the probability of taking actions that give higher rewards and decreases the probability of actions that perform worse than the latest experience. It calculates the performance of the policy and uses it to influence the next iteration.
The optimization method slightly changes the θ of the policy based on the latest performance. Unlike Q-learning, it does not store offline data in memory. It learns directly from what the agent is doing. Once the policy updates, the old experience is discarded.
In policy optimization, sampling is not efficient, since the data is only used for one policy update and is discarded after use. Moreover, the result is not stable due to large changes in the distribution of observations and the policy. If the program takes a step too far from the previous policy, it changes the entire distribution of behaviour in the environment. Recovering the old policy is then difficult, and the policy can end up in a bad position. Therefore, a more stable algorithm is required.
PPO can be used to limit the policy update by defining a maximum distance, which is called a region. The algorithm optimizes the local approximation and finds the optimal point within this bounded region; as a result, the updated policy can no longer move too far away from the old policy.

³ An optimization method that iteratively adds or subtracts a value to find the input that optimizes the result of a function.
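For reference, the clipped surrogate objective from the PPO paper [7] expresses exactly this bounded update. Here $r_t(\theta)$ is the probability ratio between the new and the old policy, $\hat{A}_t$ is the advantage estimate and $\epsilon$ is the clipping range:

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$$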
Deep reinforcement learning is based on neural networks. A neural network is a matrix-based network system inspired by biological neural networks and animal brains. [3]
The input layer is a set of values representing the state or observations. The information passes through the network and is mapped into the actions, which are returned from the output layer.
For each pair of connected neurons i and j, there is a weight value assigned to the connection between them. With each update, the neurons in the left layer update their connected neurons by adding the connected neurons' values multiplied by the weights of the connections. [3]
Once the neurons in the input layer receive the inputs, they update the connected neurons on the right by multiplying their own values with the connection weights and adding the result to the target neurons' values. This iteration repeats until all input values have passed through the network. A mathematical function such as the sigmoid function [3] is applied to limit the values between 0 and 1. This is used in models where we have to predict probabilities in the range 0 to 1.
Sigmoid function:

$$g(z) = \frac{1}{1 + e^{-z}}$$
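A tiny C# illustration of how the sigmoid squashes arbitrary values into the range (0, 1); purely illustrative:

using System;

static class SigmoidDemo
{
    static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    static void Main()
    {
        // Large negative inputs approach 0, large positive inputs approach 1.
        Console.WriteLine(Sigmoid(-4.0)); // ~0.018
        Console.WriteLine(Sigmoid(0.0));  // 0.5
        Console.WriteLine(Sigmoid(4.0));  // ~0.982
    }
}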
The layers between the input layer and the output layer are called hidden layers. [3] The hidden layers are mainly used to increase the complexity of the matrix, which gives the neural network the opportunity to create a solution with a more advanced mapping. This is especially useful for solving large and complex problems, but an increased complexity can also lead to an increased difficulty of learning.
There are many variants of the neural network, but the basic concepts are similar. After observing the result, the network adjusts the values of the weights along the path using backpropagation. The details of backpropagation are not needed to understand the following chapters and will therefore not be discussed here. For more detail and further explanation, see chapter 4 of the book "Artificial Intelligence Engines: A Tutorial Introduction to the Mathematics of Deep Learning" by James Stone. [11]
After training the network, it should be able to find the pattern of the inputs and map them to the actions.
2.2 Genetic algorithm
Genetic algorithms start from a set of randomly generated solutions to the problem at hand. [9] By observing the solutions' performance, the algorithm selects the most successful solutions for the reproduction of new solutions. The algorithm repeatedly creates multiple sample solutions and observes their performance. After evaluating multiple iterations, the solutions evolve toward the optimum.
The solutions are called chromosomes and are represented in the form of an array or a matrix. [9] An iteration in a genetic algorithm is called a generation, and the most successful agents in each generation are called parents. Each agent has its own individual solution in the form of an array or matrix, which is a mutated version of the reproduced solution inherited from its parents.
In each generation, the algorithm selects the two parents who performed best in the previous simulation and uses their chromosomes to create children. The parents' chromosomes are crossed over to create a new chromosome by combining their solutions. The program takes the child solution and replicates it to create multiple solutions, which will be used in the next generation.
Figure 2.2: Genetic algorithm illustration
The neural network is an efficient way to represent the mapping between states and actions. In deep reinforcement learning, the algorithm uses the neural network to represent the policy; by utilizing the network it maps the states into the actions.
As mentioned in section 2.1.5, there is a weight value between each pair of connected neurons. These weight values can represent the chromosomes in the genetic algorithm. [9] In each generation, the program selects the parents, then crosses over and mutates their weight values to create the children. The children inherit their parents' weights and evolve toward the optimum after evaluating multiple iterations. The weights in the first generation can be randomly generated.
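The following is a compact C# sketch of one generation as described above, with the flattened weight array standing in for the chromosome. All names and the mutation probability are illustrative assumptions, not the thesis' actual implementation:

using System;
using System.Linq;

// One generation: select the two longest-surviving agents, cross over their weights,
// then replicate and mutate the child to form the next population.
class GeneticTrainerSketch
{
    private readonly Random rng = new Random();
    private readonly double mutationProbability = 0.05;   // assumed value

    public float[][] NextGeneration(float[][] population, float[] survivalTimes)
    {
        // Selection: the two agents that survived the longest become the parents.
        int[] ranked = Enumerable.Range(0, population.Length)
                                 .OrderByDescending(i => survivalTimes[i])
                                 .ToArray();
        float[] father = population[ranked[0]];
        float[] mother = population[ranked[1]];

        // Crossover: each weight of the child is taken randomly from father or mother.
        var child = new float[father.Length];
        for (int i = 0; i < child.Length; i++)
            child[i] = rng.Next(2) == 1 ? father[i] : mother[i];

        // Reproduction: replicate the child and mutate each copy independently.
        var next = new float[population.Length][];
        for (int c = 0; c < next.Length; c++)
        {
            var copy = (float[])child.Clone();
            for (int i = 0; i < copy.Length; i++)
                if (rng.NextDouble() < mutationProbability)
                    copy[i] = (float)rng.NextDouble();
            next[c] = copy;
        }
        return next;
    }
}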
3 Methods
3.1 Measurement evaluation
A successful test is defined as when a car trained with the respective method can finish 100 laps (Figure 3.2a) on the racing track without any collision. Any contact with the walls destroys the car (Figure 3.1a).
3.2 The simulation
The simulator is created in Unity 3D, which is widely used software for game development. [8] In Unity, we can import 3D assets such as roads and cars from the asset store instead of making our own, which saves time. Unity also has useful plugins such as the Unity Machine Learning Agents Toolkit (ML-Agents) for training intelligent agents. [8] ML-Agents is a library that can be used for building reinforcement learning.
3.2.2 The motion controller
The motion controller is a C# script attached to each car which allows our agents to control the car's position (Figure 3.2a).
In our simulation, the car moves in the horizontal plane with a given speed and a facing-direction angle. The speed and the facing direction are controlled by our agents.
The rotation and acceleration control works as follows. The update function is called once per frame; it calculates the car's new rotation angle and current speed based on the neural network's output. The agent controls the motions by sending the updated values to the UpdateMovement function.
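A minimal Unity sketch of what such a motion controller could look like. Apart from UpdateMovement, which is named in the text, the class and field names are assumptions rather than the thesis' actual script:

using UnityEngine;

public class MotionController : MonoBehaviour
{
    private float speed;            // current forward speed
    private float directionAngle;   // current facing direction in degrees

    // Called by the agent with the changes produced by its controller.
    public void UpdateMovement(float speedChange, float angleChange)
    {
        speed += speedChange;
        directionAngle += angleChange;
    }

    // Unity calls Update once per frame: apply the rotation and move in the horizontal plane.
    private void Update()
    {
        transform.rotation = Quaternion.Euler(0f, directionAngle, 0f);
        transform.position += transform.forward * speed * Time.deltaTime;
    }
}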
The sensor script allows the agent to observe the current state by continuously scanning the surrounding environment in each frame. The sensors consist of 7 laser beams with 30-degree angles in between, which are placed at the front of each car (Figure 3.3).
The laser beams are built into Unity; they can classify the type of object they hit and calculate the distance between the current position and the target position. The multiple laser beams produce an array of distances, which can represent the input layer of the neural network.
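A hedged sketch of how such a sensor reading could be implemented with Unity's raycasts; the class name, the ray layout and the assumed maximum range are illustrative, not the thesis' exact script:

using UnityEngine;

public class Sensors : MonoBehaviour
{
    public float maxDistance = 8f;   // assumed sensor range (the text later mentions a length of 8.0)

    // Seven rays, 30 degrees apart, fanning out from -90 to +90 degrees of the car's heading.
    public float[] ReadDistances()
    {
        var distances = new float[7];
        for (int i = 0; i < 7; i++)
        {
            Vector3 direction = Quaternion.Euler(0f, -90f + 30f * i, 0f) * transform.forward;
            distances[i] = Physics.Raycast(transform.position, direction, out RaycastHit hit, maxDistance)
                ? hit.distance
                : maxDistance;
        }
        return distances;
    }
}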
Figure 3.3: Sensors
3.3 The agent's control system
The study illustrates two high-level figures which contain an overview of the control system design.
In the genetic algorithm, the program does not need a reward system, but it requires functions that can select the parents and cross over and mutate the chromosomes (Figure 3.5a).
The genetic algorithm builds on the most successful agents from the previous attempts. In each so-called generation, a chosen number of cars is created and simultaneously sent along the track. When all cars have collided with the walls, the algorithm picks the last two cars that stayed on the track to be the parents.
The simulation sends 5 or 10 cars along the path (Figure 3.6); when all the cars have been destroyed, it starts a new generation with updated neural networks. This process repeats until the program reaches the goal we have set.
Figure 3.6: Simulation of the GA with 5 cars per generation
3.4 The reinforcement learning's car controller
The car controller for the reinforcement learning agent interacts with the neural network and sends the speed and rotation changes to the motion controller.
In each frame update, the car receives an observation, which is an array of floating-point values from the sensors, and sends the observation to the neural network. There are a total of 7 lasers, so the size of the input layer is also 7. After passing through the neural network, two corresponding actions are returned, the acceleration and the rotation, so the output layer has a size of 2 (Figure 3.7).
The neural network is created inside ML-Agents, but we are able to configure its size. The size selection is based on the complexity of the problem (refer to section 2.1.5). A larger neural network creates solutions with a more advanced mapping from the states to the actions, but it also increases the learning cost. A smaller network learns faster, but the mapping from the states to the actions will be simpler, which means it can be difficult to solve large problems with multiple inputs.
The simulation uses a neural network of size 2x64. A neural network of size 2x64 has 2 hidden layers of length 64, which is considered large enough to solve our problem.
The program can use a built-in function from ML-Agents to directly access the output layer of the network. The first element of the output array represents the rotation direction. The second element represents the acceleration. The rotation speed and acceleration speed are two predefined variables, which can be changed.
Rotation(outputLayer[0]):
    if outputLayer[0] is 1 then
        // make a right turn
        directionAngle <- directionAngle + rotationSpeed
    if outputLayer[0] is 2 then
        // make a left turn
        directionAngle <- directionAngle - rotationSpeed
    if outputLayer[0] is 0 then
        // do nothing
    return directionAngle

Acceleration(outputLayer[1]):
    if outputLayer[1] is 1 then
        // accelerate
        speed <- speed + accelerationSpeed
    if outputLayer[1] is 2 then
        // decelerate
        speed <- speed - accelerationSpeed
    if outputLayer[1] is 0 then
        // do nothing
    return speed
The controller passes the observed states using a built-in function called AddVectorObs(). AddVectorObs() takes a vector, which comes from our sensors, and sends the vector to the neural network.
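A hedged sketch of how the controller could pass the sensor readings to the network, written against the 2019-era ML-Agents Agent API (CollectObservations / AddVectorObs); the class and field names are assumptions:

using MLAgents;   // namespace of the 2019-era toolkit (assumption based on the version used)

public class CarAgent : Agent
{
    public Sensors sensors;   // the sensor script from section 3.3 (hypothetical name)

    public override void CollectObservations()
    {
        // One observation per laser beam: seven floats, matching the input layer size of 7.
        foreach (float distance in sensors.ReadDistances())
            AddVectorObs(distance);
    }
}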
The reward system defines how much a policy should be rewarded or punished based on the observed result.
If the car keeps a good distance from the wall, the action is rewarded; otherwise, if the car drives too close to the wall or gets destroyed, it is punished.
The length of the sensor vector is set to 8.0. We create a constant r, which represents the ratio of the reward gains.
In each frame update, if the distance between the car and the wall is in the range 0 to 2.5, the program adds a penalty of 5 times r for dangerous driving. If the car crashes, a penalty of 100 times r is added; otherwise it gets a reward of 1.
CheckReward(sensors):
    for i <- 0 to sensors.length - 1
        if colliding then
            AddReward(-r * 100)
        else if sensors[i].distanceToWall <= 2.5 then
            AddReward(-r * 5)
        else
            AddReward(1)
The controller and the reward system are the only things that need to be implemented; ML-Agents contains the implementation of reinforcement learning inside its library, which handles all the network changes and iteration updates.
3.5 The genetic algorithm's car controller
The size of the neural network remains the same as the network in reinforcement learning, which is 2x64.
The interaction with the neural network gives the agent rotation and acceleration control; it is implemented in a similar way to reinforcement learning.
Rotation(outputLayer[0]):
    if outputLayer[0] is in range of 0.21 and 0.6 then
        // make a right turn
        directionAngle <- directionAngle + rotationSpeed
    if outputLayer[0] is in range of 0.61 and 1 then
        // make a left turn
        directionAngle <- directionAngle - rotationSpeed
    if outputLayer[0] is in range of 0 and 0.2 then
        // do nothing
    return directionAngle

Acceleration(outputLayer[1]):
    if outputLayer[1] is in range of 0.21 and 0.6 then
        // accelerate
        speed <- speed + accelerationSpeed
    if outputLayer[1] is in range of 0.61 and 1 then
        // decelerate
        speed <- speed - accelerationSpeed
    if outputLayer[1] is in range of 0 and 0.2 then
        // do nothing
    return speed
The neural network itself looks like a 2-dimensional matrix, which consists of multiple arrays of different lengths (Figure 3.8). The car controller receives the input array from the sensors and sends it to the neural network. The function evaluates the neural network using the weight matrix and maps the observation to the actions.
The genetic algorithm uses the weights to represent the chromosomes. The crossover and mutation functions are then applied to the matrix of weights. For each connected pair of neurons there is a weight value, and the neural network has multiple layers; therefore the weights form a 3-dimensional matrix. The indices i, j, k in weights[i][j][k] stand for the layer index, the position in the current layer, and the position in the previous layer (Figure 3.8).
Here is the pseudocode which describes the mapping from the observations to the
actions.
FeedForward(inputs):
    neurons[0] <- inputs
    for i <- 1 to numberOfLayers - 1
        // from the second layer through the network
        for j <- 0 to currentLayerSize - 1
            value <- 0
            for k <- 0 to previousLayerSize - 1
                // the connection weight between the two neurons
                // times the value of the connected neuron in the previous layer
                value <- value + weights[i-1][j][k] * neurons[i-1][k]
            neurons[i][j] <- Sigmoid(value)
    return neurons[numberOfLayers - 1]   // output layer
For each connected neuron in the left layer, the neuron multiplies its value by the connection weight and sends the result to its connected neurons on the right. The neurons on the right then sum the results from the whole left layer and apply a sigmoid function [12] to bound the result between 0 and 1. This process repeats until all the updates from the left-hand side have passed through the hidden layers, and a solution is returned.
3.5.1 Selection, crossover and mutation functions
In each generation, the genetic algorithm selects the two agents that survive the longest and combines their chromosomes using the crossover function. As a result, the children randomly inherit from their parents.
Crossover(father, mother):
    for each i, j, k up to the size of weights
        rand <- Random(0 or 1)
        if rand is 1 then
            weights[i][j][k] <- father[i][j][k]
        else
            weights[i][j][k] <- mother[i][j][k]
    return weights
The mutation function goes through all elements in the weight matrix and, with a probability value p, replaces each value with a random number.
Mutate(weights):
    for each i, j, k up to the size of weights
        if RandomFloat(0 to 1) < p then
            weights[i][j][k] <- RandomFloat(0 to 1)
    return weights
4 Result
The results of the simulation are presented in this chapter and are used to answer the research question.
4.1 Reinforcement learning test
The statistics from the tables (Table 4.1, Table 4.2) are used to compare the learning cost required to reach the optimized solution. In the simulation of reinforcement learning, the program performs 10 tests with the same configuration to generate a reliable result. The simulation destroyed 3763 cars on average to optimize the solution, where the maximum learning cost is 7054 cars and the minimum is 2934. The data points tend to cluster around three thousand, which indicates the reliability of this data set.
Table 4.1: Number of cars destroyed to finish 100 laps

Test      Sample
Test 1    3186
Test 2    2982
Test 3    3414
Test 4    4826
Test 5    3006
Test 6    7054
Test 7    3446
Test 8    3781
Test 9    2934
Test 10   3008
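As a check, the reported average can be reproduced directly from the table:

$$\bar{x} = \frac{3186 + 2982 + 3414 + 4826 + 3006 + 7054 + 3446 + 3781 + 2934 + 3008}{10} = \frac{37637}{10} = 3763.7,$$

consistent with the roughly 3763 cars on average stated above.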
Figure 4.1: Number of cars destroyed to finish 100 laps
Reinforcement learning cannot find a good solution within just a few attempts and takes a longer time to learn, but the variance of the learning cost is considered smaller and more stable. Unlike the genetic algorithm, reinforcement learning optimizes the solution in a stable way by constantly making small progress within a bounded region, which is defined by the ML-Agents implementation of reinforcement learning.
4.2 Genetic algorithm test
The result we obtained from the genetic algorithm is 3005 destroyed cars on average when using 10 cars in each generation, which is lower than the 3763 from reinforcement learning.
The data points are spread far apart, which means the simulation cannot reproduce the same results over multiple trials because the variation is large. The maximum learning cost is 18712, compared with the minimum of 819.
Table 4.2: Number of cars destroyed to finish 100 laps
The statistics show that the genetic algorithm has a lower average learning cost, but it does not learn as stably as reinforcement learning. The genetic algorithm constantly creates new solutions by crossing over the parents' chromosomes; a problem is that two agents with the same performance might still have very different weight matrices. Combining such pairs is more likely to disrupt the entire solution, which has a negative impact on learning efficiency.
4.3 Comparison
Referring to the box diagrams 4.1 and 4.2, the genetic algorithm starts strong, finding optimized solutions efficiently, but this performance is unstable. There are multiple trials with a small learning cost, which lowers the average learning cost, but this could be surpassed by the reinforcement learning variant with an increased number of tests.
5 Conclusion
6 Discussion
At the current stage, the learning rates are very similar for both algorithms. It is very difficult to draw a simple conclusion from the current simulation, but the genetic algorithm is likely to have a lower learning cost than reinforcement learning for finishing 100 laps. Reinforcement learning has a higher learning cost, but the results have a smaller variance compared with the genetic algorithm. A smaller variance indicates that the data points tend to be close to the mean, which gives better statistical significance to the mean.
6.1 Safety
Because of the randomly generated solutions, the cars from the genetic algorithm often make dangerous turns when steering and drive close to the wall, while the cars from reinforcement learning do not. This is thought to be because reinforcement learning is designed to keep a distance in order to earn the reward, in contrast to the genetic algorithm, which has no sub-condition to achieve other than surviving 100 laps. The cars from reinforcement learning keep their distance, which means they have good potential for better safety compared to the genetic algorithm.
6.2 Implementation difficulty and time cost
In the genetic algorithm, the probability of mutation and the number of cars in each generation could be optimized, which might result in a better learning rate and accuracy. For reinforcement learning, there is room for improvement in the design of the reward system.
For each algorithm, we only performed 10 tests, which is a small number, due to time limitations. There is a limited number of laser beams and the detection does not cover all angles; therefore, the agent cannot obtain complete information about the surrounding environment, which can lead to wrong predictions. The inputs and outputs are single observations and actions, but ideally they should be a combination of actions over a period of time, instead of one single predicted action per state.
6.3 Further study
Future studies could include moving obstacles, e.g. cars controlled by other agents; the input should then include a set of observations over time to be able to predict the moving direction of these objects. Future studies could focus on improving the simulation, increasing the input size, and allowing the agents to detect a moving obstacle and respond with a combination of actions. These changes would bring the study closer to a real-life scenario.
References
[2] Bimbraw, Keshav. Autonomous Cars: Past, Present and Future - A Review
of the Developments in the Last Century, the Present Scenario and the
Expected Future of Autonomous Vehicle Technology. Tech. rep. Thapar
University, 2015.
[5] Fridman, Lex. “MIT 6.S094: Deep Learning for Self-Driving Cars ”. In:
(2019). URL: https://selfdrivingcars.mit.edu/resources/.
[6] Hendrickson, Josh. "What Are the Different Self-Driving Car "Levels" of Autonomy?" In: (2019). URL: https://www.howtogeek.com/401759/what-are-the-different-self-driving-car-levels-of-autonomy, visited 2019-5-10.
[7] Schulman, John et al. Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347, 2017.
[8] Juliani, A. et al. "Unity: A General Platform for Intelligent Agents". arXiv preprint arXiv:1809.02627 (2018). URL: https://github.com/Unity-Technologies/ml-agents, visited 2019-3-1.
[9] MathWorks. "What Is the Genetic Algorithm?" In: (2019). URL: https://de.mathworks.com/help/gads/what-is-the-genetic-algorithm.html, visited 2019-4-1.
[11] Stone, James. Artificial Intelligence Engines: A Tutorial Introduction
to the Mathematics of Deep Learning. Apr. 2019, pp. 37–62. ISBN:
9780956372819.
[13] SAE International, Warrendale, PA. "SAE International Releases Updated Visual Chart for Its "Levels of Driving Automation" Standard for Self-Driving Vehicles". In: (2018). URL: https://www.sae.org/news/press-room/2018/12/sae-international-releases-updated-visual-chart-for-its-%E2%80%9Clevels-of-driving-automation%E2%80%9D-standard-for-self-driving-vehicles, visited 2019-5-10.
[14] White, Joseph and Khan, Shariq. "Waymo says it will build self-driving cars in Michigan". In: (2019). URL: https://www.reuters.com/article/us-autonomous-waymo/waymo-says-it-will-build-self-driving-cars-in-michigan-idUSKCN1PG22R, visited 2019-5-10.
TRITA-EECS-EX-2019:505
www.kth.se