
Hybrid AI Agent on a 2D Racing Game Using Neural Networks and Reinforcement Learning

Karl Dame Angelo S. Abad
BS in Computer Science
University of Mindanao, Matina, Davao City, 8000
+639776972206
karldame.abad17@gmail.com

Cris B. Maloloy-on
BS in Computer Science
University of Mindanao, Matina, Davao City, 8000
+639506457720
crisbmaloloyon@gmail.com

Jules Cabanlit
BS in Computer Science
University of Mindanao, Matina, Davao City, 8000
julescabanlit23@gmail.com

Randy F. Ardeña, MCS


Adviser, College of Computing Education
University of Mindanao, Matina
Davao City, 8000
randy_ardena@umindanao.edu.ph

Categories and Subject Descriptors
I.2.6 [Computing Methodologies]: Learning - Machine Learning, Neural Networks, Reinforcement Learning, Deep Neuroevolution.

General Terms
Algorithms, Performance, Experimentation, Reliability.

Keywords
AI Agent, Hybrid, 2D, Artificial Neural Network, Reinforcement Learning, Deep Learning.

1. INTRODUCTION

1.1 Background of the Study
Artificial Intelligence has been growing more sophisticated. In the past, it offered only a handful of specialized functions. Today, AI is used to collect and organize large amounts of information and to draw insights and make predictions that are beyond our capabilities as humans [6]. AI has become so advanced that one company even made an AI a board member because of its capability to predict market movements and trends [6]. AI Agents are created to do tasks that ordinary human beings cannot do easily.

An AI Agent is an autonomous entity which acts and directs its activity towards achieving goals in an environment using observation and trials. Agents may also learn and use knowledge in order to achieve their goals, and they can use objects or tools that they have learned to handle inside the environment. An intelligent agent is a program that can make decisions or perform a service based on its environment, user input, and experience [1]. These programs can be used to gather information autonomously on a regular, programmed schedule or when prompted by the user in real time. AI agents are developed using neural networks, which act as their brains and develop over time.

Artificial neural networks are one of the main tools used in machine learning. They are brain-inspired systems intended to imitate the way we learn. They are used to find and recognize patterns that are too complex for humans to work out on their own [2].

One of the problems with these neural networks is the amount of time it takes to train them, because they require a considerable amount of time to learn to accomplish complex tasks. Another problem is that they do not have exact decision-making skills and need data in order to actually learn those skills [2].

Reinforcement Learning is an unsupervised type of learning for agents. It offers the agent or neural network a reward system, much like asking a child to perform a task better by giving them rewards. Reinforcement Learning makes these networks improve their performance because of the reward system it offers. Because of this, they tend to become better in real-life environments, where they face a large number of possible actions and reactions, rather than being limited to options coming from video games [3].

Deep Neuroevolution aims to use evolutionary techniques in order to reduce the time consumed in the learning process of the neural network or agent. It measures performance more efficiently and does not require large amounts of data to be fed to the neural network. However, this technique requires a lot of patience due to the consistent need for trial and error [4].

The researchers' main goal is to develop a Hybrid AI Agent that addresses the problems with neural networks mentioned above. Building a hybrid agent by combining and optimizing Neural Networks, Reinforcement Learning, and Deep Neuroevolution will improve the performance, learning process, efficiency, and decision-making skills of the agent. The researchers will test and compare the Hybrid Agent against existing agents by means of a 2D game; 2D games are well suited for testing because of their simpler environments. Combining these technologies will result in a more intelligent agent that takes less time to learn while being more efficient in achieving its goals.

1.2 Purpose and Description
The purpose of this study is to experiment with neural networks by optimizing them and combining them with Reinforcement Learning and Deep Neuroevolution to develop a more reliable and efficient AI Agent that learns faster and adapts more easily to its environment. This Hybrid AI Agent will be compared to agents developed using DQN and A3C technology.

Reinforcement Learning will improve the agent's learning process because the agent will learn on its own, developing improved decision-making skills. Deep Neuroevolution will make the agent's learning process faster because of its evolutionary technique.

1.3 Objectives
The researchers listed the following objectives to test and compare the efficiency, performance, and reliability of the agents.

1.3.1 General
The main objective of the study is to develop a Hybrid AI Agent that is able to learn how to move on its own without supervision and with fewer errors, adapt to the environment faster, conquer obstacles effectively, and learn faster compared to the usual AI Agent.

1.3.2 Specific
To address the general objective of the study, the researchers specifically aim to develop an AI Agent that:

1.3.2.1 Takes less time to learn specific knowledge by implementing reinforcement learning;

1.3.2.2 Provides more exact decision-making skills by optimizing the neural network with reinforcement learning so that the agent learns faster through a reward system;

1.3.2.3 Consumes less time in learning knowledge by applying Neuroevolution's optimizer technique;

1.3.2.4 Converges regardless of the learning process by implementing reinforcement learning to keep the agent deeply engaged in learning how to play the game.

1.4 Scope and Limitations
This study proposes an AI Agent that will be applied only to 2D games. The agent will be placed inside a 2D game to let it learn how to play the game by itself. The hybrid agent is expected to learn faster and be more efficient in playing the game.

The developers can only develop the Hybrid Agent for 2D games, since it is hard to create models for 3D games within the limited time available. Building models and environments for 3D games takes longer and demands more effort from the developers, making this a limitation of the study.

2. REVIEW OF RELATED LITERATURE
In this part of the study, related studies and systems dealing with similar topics in optimizing neural networks to create an AI Agent were gathered from the internet. The studies discussed here were created or developed to solve the same problems found in AI Agents by optimizing neural networks. By discussing these studies, the researchers aim to show that the proposed study is feasible and can solve the problems stated above.

2.1 Playing ATARI with Deep Reinforcement Learning
In this study, the researchers created a deep learning model that can learn control policies using Reinforcement Learning. The model is a convolutional neural network that takes raw pixels as input and outputs value functions, and it is trained using the reward system of Reinforcement Learning. The researchers applied the model to seven ATARI 2600 games from the Arcade Learning Environment. The study showed that the model outperformed all previous approaches on six of the ATARI games and surpassed a human expert on three of them.

The researchers also found that Reinforcement Learning has several challenges. First, training the model using RL algorithms can take a long time because of delays in the learning process. Another issue is that RL algorithms learn new behaviors each time, which results in a problematic deep learning process [7].

2.2 Learning Deep Architectures for AI
This study discussed important topics and information about Machine Learning. The researcher discussed the principles of Machine Learning and the methods for unsupervised learning (Reinforcement Learning), as well as the process of training deep architectures.

In order to develop models that can imitate or surpass human knowledge, large quantities of information need to be processed so the model can acquire the knowledge it needs to learn [8]. The study discussed the whole learning process of a model, from gathering data to processing it and producing the expected outputs. Deep Learning starts by gathering lower-level features and developing them into higher-level features. Letting a model learn from basic to advanced features using Deep Learning allows it to learn complex functions without requiring a lot of human supervision [8].

2.3 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
In this study, the researchers used Evolution Strategies instead of Reinforcement Learning to train agents to play ATARI games. They used this method to show that Evolution Strategies can cut the training time of agents while producing very competitive results compared to Reinforcement Learning. The performance of the agent trained using Evolution Strategies was exceptional because of the improved parameters used. Evolution Strategies is well suited for agent training because it operates on complete information and episodes. It requires only minimal communication and supervision, making training easier for both the developer and the agent, and it is not affected by delayed rewards even though these occur from time to time [9].

2.4 Asynchronous Methods for Deep Reinforcement Learning
In this study, the researchers proposed a framework that uses a simple and lightweight concept for Reinforcement Learning to optimize and improve neural networks. The asynchronous method was stated as the best-performing method by the researchers because it cuts the training time for an agent in half compared to previous approaches in the ATARI domain.

Previous approaches to Reinforcement Learning rely heavily on specialized hardware such as powerful GPUs, CPUs, and large amounts of RAM, while these asynchronous methods do not need strong hardware because they can run on a standard machine without any problem. The A3C, or asynchronous advantage actor-critic, method is the best method proposed because of the superior mastery it offers. A3C also trains agents more easily because of its parallel training [10].

2.5 Deep Reinforcement Learning
In this study, the researcher discussed Deep Reinforcement Learning as a whole in an overview style.

The researcher compared several elements of Reinforcement Learning, including functions, rewards, memory, and representation, and also discussed the challenges, problems, and opportunities in Reinforcement Learning.

Unsupervised learning was another main topic in this study. The researcher stated that attention and memory were the main focus in this approach: attention is a mechanism for focusing on the salient parts, while memory provides long-term data storage [11].

Reinforcement Learning offers a lot for both the developer and the agent or model because it makes the training process shorter and helps the agent or model process knowledge better, making it more efficient [11].

2.6 Markov Decision Processes: Concepts and Algorithms
This study introduces the MDP, or Markov Decision Process, and its algorithms. The researcher's main goal in this study is to show what MDPs offer to developers and other researchers. The study introduces the algorithms and policies involved in learning optimal behaviors for agents.

An MDP, or Markov Decision Process, is a decision-theoretic planning method for stochastic domains. The main goal of this process is to improve the performance and maximize the functions of a model. MDPs are used by developers to solve problems concerning planning, learning, and game playing [12].

2.7 Recent Advances in Autoencoder-Based Representation Learning
In this study, the researchers reviewed recent advances in autoencoder-based representation learning. The researchers used meta-priors to organize the results of their review. Meta-priors are derived from general assumptions about the world, and they determine how useful the results of the representation learning process are in the real world [13].

The researchers stated four important meta-priors that greatly impact unsupervised learning. First, Disentanglement: this meta-prior aims to improve the efficiency of the learning process through abstract representations of data. Second, Hierarchical Organization of explanatory factors: this meta-prior exposes various levels of granularity for objects, giving them more concrete descriptions and making them easier to understand. Third, Semi-supervised Learning: this meta-prior offers synergy because it combines supervised and unsupervised learning, making the learning process much more efficient. Lastly, Clustering Structure: this meta-prior categorizes real-world data sets, making the distribution of the data efficient [13].

2.8 How do Mixture Density RNNs Predict the Future?
In this study, the researchers analyzed the predictions created by MD-RNNs, or Mixture Density Recurrent Neural Networks. They chose this type of RNN because, according to them, these networks learn to model predictions that are very effective for problems with several possible futures. These networks use Gaussian distributions that complement their prediction features. According to this study, MD-RNNs are very interesting for prediction tasks, since the recurrent part of the network allows forecasting of sequences and allows predictions to be creative.

MD-RNNs make predictions by sampling from different probabilities and sub-distributions. According to this study, MD-RNNs can learn to predict the future from a large quantity of real-world observations [14].

2.9 Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network
This study introduces the fundamentals of recurrent and long short-term memory networks. The researcher's goal in this study is to explain the most essential fundamentals of these networks in a concise manner. Difficulties in training RNNs were also discussed, along with how to convert or transform RNNs into LSTMs using logical arguments.

According to the researcher, the LSTM is a type of RNN. The basic LSTM is often called the Vanilla LSTM because of the model's flexibility and generality. According to the researcher, RNN systems experience problems in practice, despite their stability, because during training the networks suffer from vanishing and exploding gradients. LSTMs were invented to solve the vanishing gradient problem by incorporating nonlinear, data-dependent controls into the RNN. This ensures that the gradients do not vanish and makes the network versatile [15].

2.10 Understanding Convolutional Neural Networks
In this study, the researcher's goal is to explain how CNNs work and why they work. According to the researcher, CNNs perform extraordinarily well on most machine learning (ML) tasks, but their mathematical properties are not completely understood. CNNs process their input by passing it through a series of convolutional filters, and they have shown excellent performance in solving ML problems.

CNNs can be analyzed using wavelet transformation frameworks. CNNs transform their input with a series of linear operators and point-wise non-linearities. The researcher used the scattering transform to study the properties of CNNs. The scattering transform is obtained by convolving a single channel. This process provides intuition on how CNNs work; however, the transformation suffers from high variance and loss of information because only single-channel convolutions are considered. Channel combinations should be allowed in order to analyze the properties of CNN architectures efficiently [16].

3. TECHNICAL BACKGROUND
This chapter discusses the process the researchers follow for the study and the interaction between the model and the environment.

3.1 Conceptual Framework

Figure 1: Conceptual Framework

First, the researchers build the environment in order to get image frames for the agent.

The output of the environment, which is a frame, is used to generate the VAE dataset; the dataset contains the recorded actions and the recorded observations.
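To illustrate this data-gathering step, the following minimal sketch collects observations and actions from the CarRacing-v0 environment (Figure 8) using a random policy. It assumes the standard OpenAI Gym API; the number of rollouts and the output file name are placeholders rather than the study's actual settings.

import gym
import numpy as np

# Sketch: gather (observation, action) pairs from CarRacing-v0 for the VAE dataset.
env = gym.make("CarRacing-v0")
observations, actions = [], []

for episode in range(10):                     # number of random rollouts (placeholder)
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()    # random policy used only for data gathering
        observations.append(obs)              # raw frame seen by the agent
        actions.append(action)                # action taken on that frame
        obs, reward, done, info = env.step(action)

env.close()
np.savez_compressed("vae_dataset.npz",        # placeholder file name
                    observations=np.array(observations, dtype=np.uint8),
                    actions=np.array(actions, dtype=np.float32))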

Using the generated VAE dataset, the researchers can then train the Variational AutoEncoder, which outputs weights to be used for the Recurrent Neural Network. The quality of the VAE weights depends on the amount of data gathered and the amount of training time: the larger the VAE dataset and the longer the training time, the better. The Variational AutoEncoder provides an encoder and a decoder; it encodes the input, creates a representation, and decodes the representation into something slightly different from the input, which allows the agent to see alternate versions of what it observed.

Using the weights from the VAE, the researchers can generate the RNN dataset, which is used to train the MDN-RNN.

Using the generated RNN dataset, the researchers can then train the Mixture Density Network combined with the Recurrent Neural Network. The quality of the MDN-RNN weights likewise depends on the amount of data gathered and the amount of training time: the larger the RNN dataset and the longer the training time, the better. The Recurrent Neural Network allows the agent to predict future occurrences based on what it has already seen. Because many complex environments are very random (stochastic) in nature, the Recurrent Neural Network is combined with a Mixture Density Network so that the output is a probabilistic prediction rather than a deterministic one.

Lastly, the Controller is responsible for determining the next action to take in order to maximize the expected reward of the agent during the rollout of the environment. The Controller is a single-layer feed-forward neural network that outputs an action vector for motor control. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is included in this stage in order to optimize the Controller's parameters, which operate on the outputs of the VAE and the MDN-RNN.

After training, the researchers will deploy the finished model into the environment with the weights gained from training. How the model interacts with the environment is shown in the model interaction diagram.

The following sections discuss the details of the neural networks displayed in the conceptual framework.
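As a sketch of this last step, the code below defines a single-layer feed-forward controller acting on the VAE latent vector and the RNN hidden state, and tunes its parameters with a simplified evolution-strategy loop. This loop stands in for CMA-ES purely for illustration; the layer sizes and the dummy fitness function are assumptions, and in the actual setup the fitness would be the cumulative reward of a rollout in the environment.

import numpy as np

# Sketch: single-layer feed-forward controller tuned by a simple evolution strategy
# (a simplified stand-in for CMA-ES). Sizes and the fitness function are placeholders.
Z_DIM, H_DIM, ACTION_DIM = 32, 256, 3         # assumed latent, hidden, and action sizes

def controller_action(params, z, h):
    # One linear layer over the concatenated [z, h] vector, squashed to an action vector.
    W = params[:(Z_DIM + H_DIM) * ACTION_DIM].reshape(ACTION_DIM, Z_DIM + H_DIM)
    b = params[(Z_DIM + H_DIM) * ACTION_DIM:]
    return np.tanh(W @ np.concatenate([z, h]) + b)

def fitness(params):
    # Placeholder objective; the real objective is the cumulative rollout reward.
    rng = np.random.default_rng(0)
    z, h = rng.normal(size=Z_DIM), rng.normal(size=H_DIM)
    return -np.sum(controller_action(params, z, h) ** 2)

n_params = (Z_DIM + H_DIM) * ACTION_DIM + ACTION_DIM
mean, sigma, pop_size = np.zeros(n_params), 0.1, 16

for generation in range(50):
    noise = np.random.randn(pop_size, n_params)
    candidates = mean + sigma * noise                    # sample a population of controllers
    rewards = np.array([fitness(c) for c in candidates])
    ranks = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    mean = mean + (sigma / pop_size) * noise.T @ ranks   # move the mean toward better candidates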

4
3.2 Convolutional Neural Network
Since the VAE used here is built as a convolutional neural network, the researchers first explain the Convolutional Neural Network.

A Convolutional Neural Network (CNN) is a class of deep neural networks most commonly applied to analyzing visual imagery. CNNs are regularized versions of fully connected networks. Convolutional networks are inspired by biological processes, in that the connectivity pattern between neurons resembles the organization of the animal visual cortex.

A convolutional neural network consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of a series of convolutional layers that convolve with a multiplication or other dot product. The activation function is commonly a ReLU layer, and it is subsequently followed by additional layers such as pooling layers, fully connected layers, and normalization layers, referred to as hidden layers because their inputs and outputs are masked by the activation function and the final convolution. The final convolution, in turn, often involves backpropagation in order to weight the end product more accurately [1].

For instance, using an input that holds the raw pixel values of an image with a width of 32, a height of 32, and three color channels (R, G, B), there are three layers used to build ConvNets: 1. the Convolutional Layer, 2. the Pooling Layer, and 3. the Fully-Connected Layer [17].

Figure 2: Convolutional Neural Network and its Layers

The Convolutional Layer computes the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and the small region they are connected to in the input volume. This may result in a volume such as [32x32x12] if we decide to use 12 filters.

The ReLU (Rectified Linear Unit) activation function is used in order to increase the non-linearity of the image.

Figure 3: ReLU Activation Function

The Pooling Layer performs a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as [16x16x12].

Lastly, the Fully-Connected Layer computes the class scores, resulting in a volume of size [1x1x10], where each of the 10 numbers corresponds to a class score, such as among the 10 categories of CIFAR-10. As with ordinary neural networks, and as the name implies, each neuron in this layer is connected to all the numbers in the previous volume.
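A minimal tf.keras sketch of the layer stack described above is shown below. It mirrors the 32x32x3 input, 12 convolutional filters, 2x2 pooling, and 10 class scores used in the example; the kernel size and padding are illustrative assumptions rather than prescribed values.

import tensorflow as tf

# Sketch: the three-layer ConvNet example above
# (32x32x3 input -> 32x32x12 -> 16x16x12 -> 10 class scores).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(12, (3, 3), padding="same", activation="relu",
                           input_shape=(32, 32, 3)),     # Convolutional Layer: 12 filters
    tf.keras.layers.MaxPooling2D((2, 2)),                # Pooling Layer: downsample to 16x16x12
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),     # Fully-Connected Layer: 10 class scores
])
model.summary()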

3.3 Variational AutoEncoder
An AutoEncoder is a neural network that takes a high-dimensional input and compresses it into a small representation. It does this using two parts: an Encoder and a Decoder. The Encoder takes the input and compresses it into a representation of smaller dimension than the actual input. The Decoder reconstructs the compressed representation back into the actual input, the difference being that the reconstruction now carries less information than the original input. The Variational AutoEncoder changes the AutoEncoder by mapping the input to a distribution instead of a fixed vector: the middle part of the network is replaced by two separate vectors, a mean vector and a standard deviation vector [18].

Figure 4: AutoEncoder

Figure 5: Variational AutoEncoder
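Below is a minimal tf.keras sketch of this idea: the encoder produces a mean vector and a log-variance vector, a latent vector z is sampled from that distribution, and the decoder reconstructs the input. The input and latent dimensions are placeholders, and the KL-divergence term of the training loss is omitted; this is an illustration only, not the study's actual implementation.

import numpy as np
import tensorflow as tf

INPUT_DIM, LATENT_DIM = 784, 32               # placeholder sizes

# Encoder: maps the input to a mean vector and a log-variance vector.
enc_in = tf.keras.layers.Input(shape=(INPUT_DIM,))
h = tf.keras.layers.Dense(256, activation="relu")(enc_in)
z_mean = tf.keras.layers.Dense(LATENT_DIM)(h)
z_log_var = tf.keras.layers.Dense(LATENT_DIM)(h)
encoder = tf.keras.Model(enc_in, [z_mean, z_log_var])

# Decoder: maps a sampled latent vector back to a reconstruction of the input.
dec_in = tf.keras.layers.Input(shape=(LATENT_DIM,))
dec_h = tf.keras.layers.Dense(256, activation="relu")(dec_in)
dec_out = tf.keras.layers.Dense(INPUT_DIM, activation="sigmoid")(dec_h)
decoder = tf.keras.Model(dec_in, dec_out)

# Sampling (reparameterization): z = mean + std * epsilon, with epsilon ~ N(0, 1).
x = np.random.rand(1, INPUT_DIM).astype("float32")   # dummy flattened input frame
mean, log_var = encoder.predict(x)
z = mean + np.exp(0.5 * log_var) * np.random.randn(1, LATENT_DIM)
reconstruction = decoder.predict(z)
# Training would minimize reconstruction error plus a KL-divergence term (omitted here).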

5
3.4 Recurrent Neural Network
A Recurrent Neural Network is a type of neural network used to predict what comes next by processing real data sequences one step at a time. Assuming the predictions are probabilistic, novel sequences can be generated from a trained network by iteratively sampling from the network's output distribution and then feeding the sample back in as the input at the next step; in other words, the network is made to treat its inventions as if they were real, much like a person dreaming [19]. By using a Recurrent Neural Network, the agent is able to remember the previous state, giving it the ability to learn from its mistakes as it trains in the specific environment.

Figure 6: Recurrent Neural Network

3.5 Mixture Density Network
A Mixture Density Network is a type of neural network that changes the network's output from a deterministic value into a range, or probability distribution [20].

Figure 7: Mixture Density Network
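To sketch how the recurrent network and the mixture density output are combined into the MDN-RNN of the conceptual framework, the example below defines an LSTM whose output at each step is split into mixture weights, means, and standard deviations for the next latent vector. The latent size, sequence length, and number of mixture components are assumptions for illustration only.

import tensorflow as tf

LATENT_DIM, SEQ_LEN, N_MIX = 32, 100, 5       # placeholder sizes

# Sketch: an MDN-RNN. The LSTM reads a sequence of latent vectors; a dense layer
# outputs the parameters of a Gaussian mixture over the next latent vector.
inputs = tf.keras.layers.Input(shape=(SEQ_LEN, LATENT_DIM))
h = tf.keras.layers.LSTM(256, return_sequences=True)(inputs)
mdn_params = tf.keras.layers.Dense(N_MIX * (2 * LATENT_DIM + 1))(h)
mdn_rnn = tf.keras.Model(inputs, mdn_params)
mdn_rnn.summary()

# At prediction time the last dimension is split into three groups:
#   mixture weights (N_MIX values), means (N_MIX * LATENT_DIM values),
#   and log standard deviations (N_MIX * LATENT_DIM values).
# A mixture component is sampled from the weights, then the next latent vector is
# drawn from the chosen Gaussian, giving a probabilistic rather than deterministic output.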
3.6 OpenAI Gym
OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It makes no assumptions about the structure of the agent and is compatible with any numerical computation library, such as TensorFlow or Theano. OpenAI Gym addresses two problems in training an AI Agent. 1. The need for better benchmarks: in supervised learning, progress has been driven by large labeled datasets like ImageNet; in RL, the closest equivalent would be a large and diverse collection of environments, but the existing open-source collections of RL environments do not have enough variety and are often difficult to even set up and use. 2. The lack of standardization of environments used in publications: subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task's difficulty, which makes it difficult to reproduce published research and compare results from different papers [21].

Figure 8: CarRacing-v0, an environment to be used

REFERENCES
[1] Rouse, M. (2019) Definition - Intelligent Agent. Retrieved on August 18, 2019 from https://searchenterpriseai.techtarget.com/definition/agent-intelligent-agent.
[2] Dormehl, L. (2019) What is an Artificial Neural Network. Retrieved on August 18, 2019 from https://www.digitaltrends.com/cool-tech/what-is-an-artificial-neural-network/.
[3] Nicholson, C. (2019) Deep Reinforcement Learning. Retrieved on August 18, 2019 from https://skymind.ai/wiki/deep-reinforcement-learning.
[4] Frolov, S. (2018) NeuroEvolution. Retrieved on August 19, 2019 from https://www.inovex.de/blog/neuroevolution/.
[5] Souza, A. (2016) Neural Network Programming with Java. Retrieved on August 19, 2019 from http://pzs.dstu.dp.ua/DataMining/bibl/practical/Neural%20Network%20Programming%20with%20Java.pdf.
[6] Aguis, C. (2019) Evolution of AI: Past, Present, Future. Retrieved on August 29, 2019 from https://medium.com/datadriveninvestor/evolution-of-ai-past-present-future-6f995d5f964a.
[7] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M. (2013) Playing Atari with Deep Reinforcement Learning - DeepMind Technologies. Retrieved on August 31, 2019.
[8] Bengio, Y. Learning Deep Architectures for AI - Foundations and Trends in Machine Learning. Retrieved on August 31, 2019 from https://www.nowpublishers.com/article/Details/MAL-006.
[9] Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I. (2017) Evolution Strategies as a Scalable Alternative to Reinforcement Learning - OpenAI. Retrieved on August 31, 2019.
[10] Mnih, V., Badia, A., Mirza, M., Graves, A., Harley, T., Lillicrap, T., Silver, D., Kavukcuoglu, K. (2016) Asynchronous Methods for Deep Reinforcement Learning. Retrieved on August 31, 2019.
[11] Li, Y. (2018) Deep Reinforcement Learning. Retrieved on August 31, 2019.
[12] Otterlo, M. (2009) Markov Decision Processes: Concepts and Algorithms. Retrieved on September 1, 2019.
[13] Tschannen, M., Bachem, O., Lucic, M. (2018) Recent Advances in Autoencoder-Based Representation Learning. Retrieved on September 2, 2019.
[14] Ellefsen, K., Martin, C., Torresen, J. (2019) How do Mixture Density RNNs Predict the Future?. Retrieved on September 3, 2019.

[15] Sherstinsky, A. (2018) Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Retrieved on September 3, 2019.
[16] Koushik, J. (2016) Understanding Convolutional Neural Networks. Retrieved on September 3, 2019.
[17] CS231n Convolutional Neural Networks for Visual Recognition. Retrieved on September 2019 from https://cs231n.github.io/convolutional-networks/.
[18] Kingma, D. P., Welling, M. (2013) Auto-Encoding Variational Bayes. https://arxiv.org/abs/1312.6114.
[19] Graves, A. (2013) Generating Sequences with Recurrent Neural Networks. http://arxiv.org/abs/1308.0850.
[20] Vossen, J. et al. (2018) Probabilistic Forecasting of Household Electrical Load Using Artificial Neural Networks. https://www.researchgate.net/publication/325194613_Probabilistic_Forecasting_of_Household_Electrical_Load_Using_Artificial_Neural_Networks.
[21] OpenAI Gym Official Website. Retrieved on August 2019 from https://gym.openai.com/docs.
