
Jianhang Chen

School of Electrical and Computer Engineering, Purdue University

West Lafayette, IN

chen2670@purdue.edu

This project applies a reinforcement learning optimization method to make a car faster in DeepTraffic, a simulation environment provided by MIT 6.S094. The goal of this project is to achieve the fastest possible speed, or equivalently to pass as many cars as possible within a given time, in the simulator. A standard (model-free) reinforcement learning algorithm (Q-learning) is used to solve this problem.

Keywords: …learning; autonomous vehicles

I. INTRODUCTION

A. Problem Statement

According to [1], autonomous vehicle problems consist of localization, perception, decision logic, control execution, and validation/verification. Decision making is an important part of these problems. In traffic planning, a better solution can improve the overall efficiency of vehicles.

Fig. 1. Deep Traffic Simulator Interface

The objective of this project is to create an algorithm that accelerates a car in an environment with uncertainty. There are five types of actions: acceleration, deceleration, no action, and changing lanes to the left or right side. There are safety constraints on the car, which are introduced in detail in part B; basically, the car is forced to slow down when any other car is in its safety region. In this project, we can define the inputs of the car, meaning the car receives information from a certain area around itself. A larger region means more information, but also a model that is harder to train when using a neural network algorithm. With these sensor inputs and safety constraints, this project is intended to develop an algorithm that maximizes the speed of the car in the DeepTraffic environment, where the other cars move randomly.

B. DeepTraffic Simulation

DeepTraffic is a gamified simulation of typical highway traffic developed by Lex Fridman for MIT 6.S094 [2]. Fig. 1 shows the simulation interface. All the cars run on a system of lanes consisting of grids.

1) Sensor Inputs:

The inputs from the sensor can be arbitrarily defined. Through the simulator, the lanes detected by the car can be defined, and the number of patches the car can detect ahead and behind can also be specified. Fig. 2 shows the overlay of the road area detected by the sensors with parameters lanesSide = 3, patchesAhead = 4, and patchesBehind = 2.

2) Safety Constraints:

The safety system in the DeepTraffic simulator slows the car down if other cars are in its safety region. The area of the safety region grows as the car speeds up. In Fig. 3, the red area is the region covered by the safety system. The lane-switching function is disabled when there is any other car in the safety area.
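As a rough illustration (not code from the project), the size of the observed grid implied by the parameters lanesSide, patchesAhead, and patchesBehind can be computed as follows; the helper assumes one value per detected patch, which is a simplification of the simulator's actual input encoding.

```python
def sensor_input_size(lanes_side: int, patches_ahead: int, patches_behind: int) -> int:
    """Number of grid patches the car observes, assuming one value per patch.

    The observed window spans the car's own lane plus lanes_side lanes on
    each side, and patches_ahead + patches_behind rows of patches.
    """
    lanes = 2 * lanes_side + 1
    rows = patches_ahead + patches_behind
    return lanes * rows

# The Fig. 2 example: lanesSide = 3, patchesAhead = 4, patchesBehind = 2
print(sensor_input_size(3, 4, 2))  # 7 lanes x 6 rows = 42 patches
```

This makes concrete the trade-off noted above: widening the window grows the input size, and with it the difficulty of training the network.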

Fig. 3. Safety Constraints of the Car

A. Deep Reinforcement Learning

Deep reinforcement learning [3] is used in this part: the high-dimensional sensor inputs are fed directly into the first layer of a neural network, which updates the policy and decides the action of the car. Reinforcement learning is one of the important frameworks for decision making.

Fig. 4. The agent–environment interaction in a Markov decision process [5]

In reinforcement learning, we treat the learning process as an agent–environment interaction. At each step, the agent performs an action and receives a new state and a reward; at the same time, the environment receives the action and emits an observation and a reward to the agent. This interaction can be modeled as a Markov decision process. A Markov decision process relies on the Markov assumption: the probability of the next state s(i+1) depends only on the current state s(i) and action a(i), not on preceding states or actions.

In this project, I use the Deep Q-learning algorithm specifically. The update function of Deep Q-learning is:

Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α (R_{t+1} + γ max_a Q_t(s_{t+1}, a) − Q_t(s_t, a_t))

where Q is the Q function evaluating a state–action pair, Q_{t+1}(s_t, a_t) is the new state value, Q_t(s_t, a_t) is the old state value, α is the learning rate, R_{t+1} is the reward, and γ is a discount factor that balances long-term and short-term goals.

The training procedure with experience replay is:

1. initialize replay memory D
2. initialize action-value function Q with random weights
3. observe initial state s
4. repeat
   a. select an action a
      i. with probability ε select a random action
      ii. with probability 1 − ε select a = argmax_{a′} Q(s, a′)
   b. carry out action a
   c. observe reward r and new state s′
   d. store experience <s, a, r, s′> in replay memory D
   e. sample random transitions <ss, aa, rr, ss′> from replay memory and calculate the target for each minibatch transition:
      i. if ss′ is a terminal state, then tt = rr
      ii. otherwise, tt = rr + γ max_{aa′} Q(ss′, aa′)
   f. train the Q network using (tt − Q(ss, aa))² as the loss
   g. s = s′
5. until terminated
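As an illustrative sketch (not the project's actual code, which trains a neural network inside the DeepTraffic simulator), the update equation and the ε-greedy loop above can be written with a tabular Q function standing in for the network; the action names, state encoding, and rewards here are hypothetical.

```python
import random
from collections import defaultdict, deque

ACTIONS = ["accelerate", "decelerate", "no-op", "left", "right"]
ALPHA, GAMMA, EPSILON = 0.01, 0.9, 0.1  # learning rate, discount factor, exploration rate

Q = defaultdict(float)         # Q[(state, action)] -> value; a table stands in for the network
replay = deque(maxlen=10_000)  # replay memory D

def select_action(state):
    """Step 4a: epsilon-greedy action selection."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next, terminal=False):
    """One application of the Q-learning update equation."""
    target = r if terminal else r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def training_step(s, a, r, s_next, batch=32):
    """Steps 4d-4f: store the transition, then learn from a random minibatch."""
    replay.append((s, a, r, s_next))
    for ss, aa, rr, ss_next in random.sample(list(replay), min(batch, len(replay))):
        q_update(ss, aa, rr, ss_next)
```

Starting from Q = 0 everywhere, q_update moves Q(s, a) toward r + γ·max Q(s′, ·) by a fraction α, exactly as in the update equation; the project replaces the table lookup with the neural network of Fig. 6.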

The neural network architecture used for deep reinforcement learning consists of:

Fully Connected Layer 1 – 30 neurons with tanh activation
Fully Connected Layer 2 – 20 neurons with tanh activation
Fully Connected Layer 3 – 20 neurons with tanh activation
Fully Connected Layer 4 – 20 neurons with tanh activation
Output Layer (Fully Connected Layer 5) – 5 neurons

Fig. 6. Overview of the Neural Network Architecture

III. SIMULATION RESULT

A simulation video and the average speed are provided for evaluating the trained model, as shown in Fig. 7. In my experiment, the car detects 50 patches ahead, 10 patches behind, and 3 lanes on each of the left and right sides. In the Deep Q-learning update equation, the learning rate α is 0.01 and the discount factor γ is 0.9. In the simulation, the average speed is 71.25 mph.

Fig. 7. Simulation Result

For the demo video and source code instructions, please visit https://github.com/JianHangChen/DeepTraffic.

The power of deep learning has become well known, and the field has seen a tremendous boost. Although there are many clichés about AI surpassing human beings because of the development of AlphaGo, deep learning, and especially deep reinforcement learning, still has major limitations. Even though reinforcement learning is not supervised learning, it is still a form of semi-supervised learning in practice: we still need a large amount of data from interaction with the environment. In this project, about 500,000 iterations were needed to reach good performance. That training cost is a big challenge in real environments such as autonomous vehicles.

Another lesson is that it is really hard to train a deep network across different hyperparameters. There are many of them, such as the number of layers, the number of neurons, the type of activation function, the learning rate, and the discount factor. For this specific task, I also tried many task-specific hyperparameters, such as the patch counts and the number of detected lanes. A larger and deeper network may have better generalization capacity, but it is harder to train and sometimes does not converge. In this task, more inputs mean more information, but also make it harder to learn the essence of the process. In conclusion, this project was fun thanks to the visualized simulator, and I learned a lot through the process.

REFERENCES

[1] … Conference (ACC), 2017, pp. 4018–4022.
[2] L. Fridman, "Deep Traffic," in MIT 6.S094: Deep Learning for Self-Driving Cars, https://selfdrivingcars.mit.edu/deeptrafficjs/
[3] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, et al., "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[4] T. Matiisen, "Demystifying Deep Reinforcement Learning," https://www.nervanasys.com/demystifying-deep-reinforcement-learning (last checked 24.11.2016).
[5] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2011.
