
Making the Car Faster on Highway in DeepTraffic

Final Project Report for AAE 561 Convex Optimization

Jianhang Chen
School of Electrical and Computer Engineering, Purdue University
West Lafayette, IN

Abstract—This project proposal is mainly for designing an
optimization method to make a car faster in DeepTraffic, a
simulation environment provided by MIT 6.S094. The goal of
this project is to achieve the highest speed, or to pass as many
cars as possible, within a given time in the simulator. A standard
model-free reinforcement learning algorithm (Q-learning) is
used to solve this problem.

Keywords—traffic optimization; neural network; reinforcement
learning; autonomous vehicles


A. Problem Statement

There are five fundamental problems in autonomous driving:
localization, perception, decision logic, control execution, and
validation/verification [1]. Decision making is an important part
of the autonomous vehicle problem. In traffic planning, a better
solution could improve the overall efficiency of vehicles.

The objective of this project is to create an algorithm that
increases the speed of a car in an environment with uncertainty.
There are five types of actions: acceleration, deceleration, no
action, and changing lanes to the left or to the right. There are
safety constraints on the car, which are introduced in detail in
part B. Basically, the car is forced to slow down when any other
car is in its safety region. In this project, we can define the
sensor input of the car, which means the car receives information
from a certain area around it. A larger region means more
information, but it also makes the model harder to train when a
neural network is used. With the sensor inputs and safety
constraints, this project aims to develop an algorithm that
maximizes the speed of the car in the DeepTraffic environment,
where other cars move randomly.

B. DeepTraffic Simulation
DeepTraffic is a gamified simulation of typical highway
traffic developed by Lex Fridman for MIT 6.S094 [2]. Fig. 1
shows the interface of the simulator. All the cars run on a
system of lanes that consists of grid cells.

Fig. 1. DeepTraffic Simulator Interface

1) Sensor Inputs:
The inputs from the sensor can be defined arbitrarily. Through
the simulator, the number of lanes the car detects on each side
can be defined. The number of patches the car detects ahead and
behind can also be specified. Fig. 2 shows the overlay of the
road region detected by the sensors with parameters
lanesSide = 3, patchesAhead = 4, and patchesBehind = 2.

Fig. 2. Sensor Inputs of the Car
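
As a rough illustration of how the sensor parameters determine the input size, the sketch below counts the grid cells covered by the overlay. It assumes the car observes its own lane plus lanesSide lanes on each side, and patchesAhead + patchesBehind cells along each observed lane. The function name num_sensor_inputs is hypothetical. Under this assumption, the setting used later in this report (50 patches ahead, 10 behind, 3 lanes per side) gives 420 cells, which agrees with the input layer size listed in Fig. 6.

```python
def num_sensor_inputs(lanes_side: int, patches_ahead: int, patches_behind: int) -> int:
    """Count the grid cells covered by the sensor overlay.

    Assumption: the car sees its own lane plus `lanes_side` lanes on each
    side, and `patches_ahead + patches_behind` cells along each of them.
    """
    lanes_seen = 2 * lanes_side + 1
    patches_per_lane = patches_ahead + patches_behind
    return lanes_seen * patches_per_lane


# Parameters of Fig. 2: 7 lanes x 6 patches
print(num_sensor_inputs(3, 4, 2))    # prints 42
# Parameters used in the experiment: 7 lanes x 60 patches
print(num_sensor_inputs(3, 50, 10))  # prints 420
```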

2) Safety Constraints:
The safety system in the DeepTraffic simulator slows the car
down if other cars are in its safety region. The safety region
grows as the car speeds up. In Fig. 3, the red area is the region
covered by the safety system. Lane switching is disabled when
any other car is in the safety region.

Fig. 3. Safety Constraints of the Car

II. METHOD

A. Deep Reinforcement Learning
Deep reinforcement learning [3] is used in this part.
High-dimensional sensor inputs are fed directly into the first
layer of a neural network, which updates the policy and decides
the action of the car. Reinforcement learning is one of the
important frameworks for decision making.

Fig. 4. The agent–environment interaction in a Markov decision process [5]

In reinforcement learning, we treat the learning process as an
interaction between an agent and an environment. At each step,
the agent performs an action and receives a new state and a
reward; at the same time, the environment receives the action and
emits an observation and a reward to the agent.

Fig. 5. Markov Decision Process [4]

Another important concept in reinforcement learning is the
Markov decision process. A Markov decision process relies on the
Markov assumption that the probability of the next state
$s_{i+1}$ depends only on the current state $s_i$ and action
$a_i$, and not on preceding states or actions.

B. Deep Q-Learning Algorithm
In this project, I use the Deep Q-Learning algorithm
specifically. The update function of Deep Q-learning is:

$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha \left( R_{t+1} + \gamma \max_{a} Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right)$$

where $Q$ is the Q function that evaluates a state-action pair,
$Q_{t+1}(s_t, a_t)$ is the updated value of that pair,
$Q_t(s_t, a_t)$ is the old value, $\alpha$ is the learning rate,
$R_{t+1}$ is the reward, and $\gamma$ is a discount factor that
balances long-term and short-term goals.

The detailed algorithm is as follows:

1. Initialize replay memory D
2. Initialize action-value function Q with random weights
3. Observe initial state s
4. repeat
   a. select an action a
      i. with probability $\epsilon$ select a random action
      ii. with probability $1 - \epsilon$ select $a = \arg\max_{a'} Q(s, a')$
   b. carry out action a
   c. observe reward r and new state s'
   d. store experience <s, a, r, s'> in replay memory D
   e. sample random transitions <ss, aa, rr, ss'> from replay
      memory and calculate the target for each minibatch transition
      i. if ss' is a terminal state then tt = rr
      ii. otherwise $tt = rr + \gamma \max_{aa'} Q(ss', aa')$
   f. train the Q network using $(tt - Q(ss, aa))^2$ as loss
   g. s = s'
5. until terminated
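
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used in this project: the Q network is replaced by a lookup table, the environment is a hypothetical five-cell chain rather than DeepTraffic, and the names step, select_action, td_target, and train are made up for the example. For a table entry, one gradient step on the squared loss of step 4f reduces to the Q-learning update rule given earlier. The learning rate and exploration rate are chosen for the toy problem only.

```python
import random
from collections import defaultdict, deque

ACTIONS = [0, 1]  # hypothetical toy actions: 0 = stay, 1 = move right
GOAL = 4          # hypothetical terminal cell of a 5-cell chain


def step(state, action):
    """Toy environment (an assumption, not DeepTraffic): reward 1 at GOAL."""
    next_state = min(state + action, GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return reward, next_state, next_state == GOAL


def select_action(q, state, epsilon, rng):
    """Step 4a: epsilon-greedy action selection."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def td_target(q, reward, next_state, terminal, gamma):
    """Step 4e: tt = rr if terminal, else rr + gamma * max_a' Q(ss', a')."""
    if terminal:
        return reward
    return reward + gamma * max(q[(next_state, a)] for a in ACTIONS)


def train(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.2, batch_size=8, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)       # step 2, simplified: table starts at zero
    memory = deque(maxlen=1000)  # step 1: replay memory D
    for _ in range(episodes):
        state, terminal = 0, False  # step 3: observe initial state
        while not terminal:
            action = select_action(q, state, epsilon, rng)       # step 4a
            reward, next_state, terminal = step(state, action)   # steps 4b, 4c
            memory.append((state, action, reward, next_state, terminal))  # 4d
            batch = rng.sample(list(memory), min(batch_size, len(memory)))
            for ss, aa, rr, ss_next, done in batch:
                tt = td_target(q, rr, ss_next, done, gamma)
                # Step 4f: gradient step on 0.5 * (tt - Q)^2 for one table entry
                q[(ss, aa)] += alpha * (tt - q[(ss, aa)])
            state = next_state  # step 4g
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)]
print(greedy_policy)
```

After training, the greedy policy should prefer moving right in every non-terminal cell, since staying only delays the discounted reward. In the real task the table would be replaced by the neural network and the toy chain by the simulator.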

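The Q function in this report is a small fully connected network whose layer sizes are listed in Fig. 6. As a rough illustration of what a forward pass through such a network computes, the sketch below builds the same layer sizes in plain Python. The weight initialization range and the linear (no activation) output layer are assumptions, since the report does not state them, and the helper names init_layers, forward, and count_parameters are hypothetical.

```python
import math
import random

LAYER_SIZES = [420, 30, 20, 20, 20, 5]  # layer sizes as listed in Fig. 6


def init_layers(sizes, rng):
    """Create (weights, biases) per layer; the init range is an assumption."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Return one Q-value per action: tanh hidden layers, linear output (assumed)."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * x for w, x in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activations = pre if is_output else [math.tanh(v) for v in pre]
    return activations


def count_parameters(sizes):
    """Weights plus biases over all fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


rng = random.Random(0)
layers = init_layers(LAYER_SIZES, rng)
sensor_input = [rng.random() for _ in range(LAYER_SIZES[0])]  # placeholder input
q_values = forward(layers, sensor_input)
best_action = max(range(len(q_values)), key=lambda i: q_values[i])
print(len(q_values), count_parameters(LAYER_SIZES))  # prints: 5 14195
```

The five outputs correspond to the five actions of the car, and the greedy action is the index of the largest output.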
Fig. 6 gives an overview of the neural network used for deep
reinforcement learning.

Input Layer (420)
Fully Connected Layer 1 – 30 Neurons with tanh activation
Fully Connected Layer 2 – 20 Neurons with tanh activation
Fully Connected Layer 3 – 20 Neurons with tanh activation
Fully Connected Layer 4 – 20 Neurons with tanh activation
Output Layer Fully Connected 5 – 5 Neurons

Fig. 6. Overview of the Neural Network Architecture

III. SIMULATION RESULT
A simulation video and the average speed, shown in Fig. 7, are
provided for evaluating the trained model. In my experiment, the
car detects 50 patches ahead, 10 patches behind, and 3 lanes on
each of the left and right sides. In the Deep Q-learning update
equation, the learning rate $\alpha$ is 0.01 and the discount
factor $\gamma$ is 0.9. In the simulation, the average speed is
71.25 mph.
For the demo video and source code instructions, please visit

Fig. 7. Simulation Result

IV. DISCUSSION
The power of deep learning has become well known, and the
field is growing rapidly. Although there are many clichés about
AI surpassing human beings following the development of AlphaGo,
deep learning, and especially deep reinforcement learning, still
has major limitations. Even though reinforcement learning is not
supervised learning, it is still a form of semi-supervised
learning. We still need a large amount of data from interaction
with the environment. In this project, about 500,000 iterations
were needed to reach good performance. That is a big challenge
for the training cost in real environments such as autonomous
vehicles.

Another thing I learned is that it is really hard to train a deep
network across different hyperparameter settings. There are many
hyperparameters, such as the number of layers, the number of
neurons per layer, the type of activation function, the learning
rate, and the discount factor. For this specific task, I also
tried many task-specific hyperparameters, such as the number of
patches and the number of lanes detected. A larger and deeper
network may have better generalization capacity. However, it is
harder to train and sometimes does not converge. In this task,
more inputs mean more information, but they also make it harder
to learn the essence of the process.

In conclusion, this project was fun thanks to the visualized
simulator, and I learned a lot through the process.

REFERENCES

[1] R. Langari, "Autonomous vehicles," in 2017 American Control
Conference (ACC), 2017, pp. 4018-4022.
[2] L. Fridman, "Deep Traffic," in MIT 6.S094: Deep Learning for
Self-Driving Cars.
[3] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D.
Wierstra, et al., "Playing Atari with deep reinforcement learning," arXiv
preprint arXiv:1312.5602, 2013.
[4] T. Matiisen, "Demystifying Deep Reinforcement Learning,"
(last checked on 24.11.2016).
[5] R. S. Sutton and A. G. Barto, "Reinforcement Learning: An
Introduction," 2011.