Lecture Notes in Artificial Intelligence 10400
Wei Peng, Damminda Alahakoon, Xiaodong Li (Eds.)
AI 2017: Advances in
Artificial Intelligence
30th Australasian Joint Conference
Melbourne, VIC, Australia, August 19–20, 2017
Proceedings
Editors

Wei Peng
La Trobe University
Melbourne, Australia

Xiaodong Li
RMIT University
Melbourne, Australia

Damminda Alahakoon
La Trobe University
Melbourne, Australia
This volume contains the papers presented at the 30th Australasian Joint Conference on Artificial Intelligence (AI 2017), held during August 19–20, 2017, in Melbourne, Australia. The Australasian Joint Conference on Artificial Intelligence is an annual conference that has been dedicated to fostering research communication and collaboration within the AI community since its inception.
AI 2017 was co-hosted by RMIT University and La Trobe University and co-located with the International Joint Conference on Artificial Intelligence (IJCAI 2017). This volume covers a wide spectrum of research streams in artificial intelligence, ranging from machine learning and optimization to big data science and their practical applications. Out of 58 submissions from authors in 19 countries, 29 manuscripts were accepted for oral presentation at AI 2017. Each submission was peer-reviewed by three or more international Program Committee members and reviewers.
We would like to thank our international Program Committee members and
reviewers for their support and contributions to AI 2017. We also acknowledge the
support from RMIT University and La Trobe University for hosting this conference. The
assistance and sponsorship from Springer were essential for this volume to be published
in the Lecture Notes in Artificial Intelligence series, and we are very grateful for that.
Conference Chairs

Xiaodong Li (General Chair), RMIT University, Australia
Damminda Alahakoon (Program Co-chair), La Trobe University, Australia
Wei Peng (Program Co-chair), La Trobe University, Australia
Publicity Chairs
Kai Qin Swinburne University of Technology, Australia
Yutao Qi Xidian University, China
Webmaster
Indika Kumara La Trobe University, Australia
Program Committee
Ajendra Dwivedi RMIT University, Australia
Andy Chun City University of Hong Kong, SAR China
Andy Song RMIT University, Australia
Bing Xue Victoria University of Wellington, New Zealand
C. Maria Keet University of Cape Town, South Africa
Chao Chen Deakin University, Australia
Damminda Alahakoon La Trobe University, Australia
Danchi Jiang University of Tasmania, Australia
Daniel Schmidt University of Melbourne, Australia
Fei Liu La Trobe University, Australia
Fei Zheng AUSTRAC, Australia
Haohui Chen DATA61, Australia
Hui Ma Victoria University of Wellington, New Zealand
Ivan Varzinczak CNRS - Université d'Artois, France
Jie Yin DATA61, Australia
Junfu Yin University of Technology, Sydney, Australia
Kathryn Merrick University of New South Wales, Australia
Ke Deng RMIT University, Australia
Kok-Leong Ong La Trobe University, Australia
Maolin Tang Queensland University of Technology, Australia
Markus Wagner The University of Adelaide, Australia
Michael Cree University of Waikato, New Zealand
Mohamed Gaber Birmingham City University, UK
Mohammad Nabi Omidvar University of Birmingham, UK
Nilupulee Nathawitharana La Trobe University, Australia
Nina Narodytska Samsung Research, USA
Nung Kion Lee Universiti Malaysia Sarawak, Malaysia
Partha Bhowmick Indian Institute of Technology, Kharagpur, India
Qing Liu DATA61, Australia
Rafal Rzepka Hokkaido University, Japan
Ricardo Sosa Auckland University of Technology, New Zealand
Seyedali Mirjalili Universiti Teknologi, Malaysia
Su Nguyen La Trobe University, Australia
Sung-Bae Cho Yonsei University, Japan
Terence Chen Telstra, Australia
Additional Reviewers
Machine Learning
Optimization
1 Introduction
With the rapid development of information technology, people immersed in an ocean of information find it increasingly difficult to obtain the information they need in time; this is the problem of "information overload" [1]. Search engines (such as Google and Baidu) are the most common tools for finding information, but they still cannot meet personalized information needs that vary with background, purpose, and time, and thus cannot effectively solve the problem [2]. As an important research branch in the field of personalized services, recommender systems, which help users find items of interest (online goods, music, movies, services, etc.) in large amounts of data by mining the relationships between users and items, can effectively address this problem. Within this domain, context-aware recommender systems (CARS), which fully utilize contextual information to further improve recommendation accuracy and user satisfaction, have recently become one of the hottest topics [3].
Under the assumption that each rating of an item relates to a typical context, the training process of contextual modeling is associated with each quadruple <user, item, context, rating>. Since observations are sparse, and the additional environmental contexts come in many types and therefore have high dimensionality, the sparsity problem is seriously intensified. In recent years, many scholars [2, 4–6] have put forward approaches to this problem, which can be roughly divided into two categories: explicit context reduction and dimensional compression. The first requires identifying contextual data segments, which are expensive to determine, and the dimensionality of the explicit contexts remains large enough to present a sparsity challenge [2, 5]. The second uses traditional dimension-compression methods, such as Principal Component Analysis (PCA) and auto-encoders, to mine latent context factors [6] whose dimensionality is greatly reduced relative to explicit contexts. Although dimensional compression can deal with the sparsity problem to a certain extent, the compression rate cannot be made small enough, so the method still suffers from sparsity.
To overcome the sparsity problem, building on the idea that the impact of context on an item has inner patterns (e.g., sweaters are popular in winter but not in summer; most visitors to the Great Wall come from China, only a few from India), we propose to first cluster context information at item grain to abstract item-wise context clusters, and then to incorporate them into a Matrix Factorization (MF) model. In our proposed method, the item-wise context clusters are of high quality and thus have little opportunity to intensify the sparsity problem. Additionally, on the basis of the IC-CARS model, we propose a hybrid model that integrates explicit context factors with IC-CARS, and we find that the hybrid model outperforms IC-CARS when distinguishing explicit context factors are applied. Extensive experiments show that our proposal performs outstandingly in handling data sparsity and outperforms several state-of-the-art recommendation methods. Moreover, a complexity analysis establishes its scalability, so it can be applied to large datasets efficiently.
The remainder of the paper is organized as follows. The next section describes related work. The proposed model and its complexity analysis are presented in Sect. 3. Section 4 presents and analyzes the experimental results. Finally, conclusions and future work appear in Sect. 5.
2 Related Work
The area of CARS deals with modeling user preferences and tastes by incorporating available contextual information into the recommendation process. As Adomavicius et al. [7, 8] suggested, there are three main approaches to recommending items while considering context: pre-filtering, post-filtering, and contextual modeling. The pre- and post-filtering methods filter recommendations using contextual restrictions before or after the recommendation list is generated, while contextual modeling incorporates context into the recommender framework directly. Among the three, contextual modeling generally performs best [8].
In recent years, research on CARS has made considerable progress. Chen [9] studied it early: reasoning that similarity calculation should consider context conditions, he proposed integrating context information, represented as a multidimensional vector model, together with an item-grain context correlation coefficient and user-grain context similarity, into collaborative filtering, and the proposal achieved good results.
Context-Aware Recommender Systems Based on IC-CARS 5
Wang [10] and Shi [11] integrated context information into the user and item vectors, and studied CARS from the perspective of user-context similarity and item-context similarity. From the perspective of context similarity, Shin [12] first measured the similarity between a user's current context and historical contexts using cosine similarity, and then combined the user's preferences under historical contexts to predict the user's preference in the current context. Adomavicius et al. [7] suggested a multi-dimensional context model of user-preference similarity, which calculates each "user-item-context" preference value to find nearest neighbors in high-dimensional context data and predicts the user's preferences in potential contexts. Although it performs poorly in computational efficiency and real-time performance, it has advantages in recommendation accuracy and diversity.
The research above focuses mainly on context-modeling accuracy while ignoring the sparsity problem caused by the high dimensionality of context information. Recent research has drawn increasing attention to the sparsity problem. In the approach proposed by Baltrunas et al. [5], circumstances and situations are modeled as explicit context factors at three different granularities, namely item grain, category grain, and global grain. Although explicit contexts can be better interpreted by users and human experts, their dimensionality is still large enough to trigger the sparsity problem. To overcome these defects, Unger et al. [6] suggested using latent contexts, which can be acquired automatically by dimension-reduction methods, and further described a novel recommendation technique that uses latent contexts to improve recommendation accuracy. In their experiment, the initial context dimensionality was approximately 520; after dimension reduction it dropped dramatically, and the final recommendation accuracy increased by nearly 20%. Although this result is impressive, the compression rate cannot be made small enough, so the method still suffers from the sparsity problem.
Although explicit and latent contexts can reduce context dimensionality to some degree, they still cannot solve the sparsity problem well. In our research, we first cluster context information for each item to abstract item-wise context clusters, which express the inner patterns in the impact of context on an item, and then incorporate them into the recommendation framework using the Matrix Factorization method. In the context clustering step, we apply an elastic intra-cluster distance and a maximum cluster number to the K-means algorithm, both of which have a great effect on the recommendation results; owing to the high quality and small number of item-wise context clusters, the sparsity problem is effectively mitigated.
3 Method
In this section, we first introduce the item-grain context clustering model based on the K-means method. After that, the MF-based context-aware recommendation framework is described in detail. Finally, we present the complexity analysis.
which further intensifies the sparsity problem. We suggest extracting context clusters for each item using the K-means method, and then assigning a contextual-factor parameter to each item-wise context cluster in the MF-based recommendation framework; this parameter introduces no opportunity for the sparsity problem because the number of context clusters is very small.
To simplify modeling, we assume in this paper that the context is non-hierarchical (our model also fits hierarchical context, which is more in line with actual needs, when hierarchical clustering methods are used), and we choose Euclidean distance [13] as the distance function $Dis(C_e, C_f)$ between context vectors $C_e$ and $C_f$. The item-grain context clustering model first extracts a context training set $S_j = \{C_1, C_2, C_3, \ldots\}$ for each item $j$, and then mines context clusters for each item $j$. To control the quality of the context clusters, we impose some restrictions on the clustering process. Specifically, for each item $j$, we define $P$ as the maximum number of context clusters, and the minimum cluster size as $T_j = |S_j| / (2P)$; additionally, the maximum intra-cluster distance $\delta_j$ is defined as the $\delta \cdot |S_j|(|S_j| - 1)$-th smallest value of $\{Dis(C_e, C_f) \mid C_e, C_f \in S_j,\ e \neq f\}$, where $\delta \in (0, 1)$. In practice, $\delta_j$ is obtained by sampling to improve computational efficiency. Among these definitions, only $P$ and $\delta$ are model parameters; both are trade-off parameters with an important influence on the quality of the context clusters.
Algorithm 1 describes the item-grain context clustering model based on K-means
method.
Algorithm 1. Item-grain context clustering model based on K-means method.
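As an illustration, the item-grain clustering step with the restrictions above can be sketched as follows. This is a simplified sketch, not the authors' implementation: it uses a plain Lloyd's K-means with a deterministic initialisation, and a simple sorted-list quantile in place of the sampled ordinal statistic; all function and variable names are ours.

```python
import math

def dis(c_e, c_f):
    """Euclidean distance Dis(C_e, C_f) between two context vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_e, c_f)))

def pairwise(S):
    """All pairwise distances over a context set (unordered pairs)."""
    return [dis(S[e], S[f]) for e in range(len(S)) for f in range(e + 1, len(S))]

def kmeans(S, k, iters=50):
    """Plain Lloyd's K-means with a deterministic spread-out initialisation."""
    idx = [round(i * (len(S) - 1) / (k - 1)) for i in range(k)] if k > 1 else [0]
    centroids = [list(S[i]) for i in idx]
    labels = [0] * len(S)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dis(x, centroids[j])) for x in S]
        for j in range(k):
            members = [S[i] for i in range(len(S)) if labels[i] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def item_context_clusters(S_j, P, delta):
    """Item-grain context clustering with the quality restrictions of Sect. 3:
    at most P clusters, minimum cluster size T_j = |S_j| / (2P), and a cap
    delta_j on the intra-cluster distance, taken here (for simplicity) as the
    value at fraction `delta` of the sorted pairwise context distances."""
    T_j = len(S_j) / (2 * P)
    d = sorted(pairwise(S_j))
    delta_j = d[int(delta * (len(d) - 1))]
    labels, centroids = kmeans(S_j, P)
    clusters = []
    for k in range(P):
        members = [S_j[i] for i in range(len(S_j)) if labels[i] == k]
        if len(members) < T_j:
            continue  # violates the minimum-size restriction: discard
        if len(members) > 1 and max(pairwise(members)) > delta_j:
            continue  # violates the intra-distance restriction: discard
        clusters.append(centroids[k])
    return clusters
```

On two well-separated groups of context vectors with $P = 2$, this returns one centroid per group; clusters that are too small or too spread out are simply dropped, which is one reasonable reading of the restrictions.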
The basic MF prediction rule is

$\hat{R}_{i,j} = b_i + q_j + U_i^T V_j \quad (1)$

where $\hat{R}_{i,j}$ represents the predicted rating of user $i$ for item $j$; $b_i$, the baseline estimator of user $i$, captures the overall preference of user $i$; $q_j$, the baseline estimator of item $j$, indicates the overall popularity of item $j$; and the $D$-dimensional column vectors $U_i$ and $V_j$ represent the user-specific and item-specific latent feature vectors, respectively.
Based on the prediction rule in Eq. 1, IC-CARS adds a new component to provide context-aware recommendation, as presented in Eq. 2:

$\hat{R}_{i,j,C} = b_i + q_j + U_i^T V_j + W_{j,K_j(C)}\, Q_{j,K_j(C)} \quad (2)$

where $\hat{R}_{i,j,C}$ represents the predicted rating of user $i$ for item $j$ under context $C$, a $t$-dimensional context vector $(c_1, c_2, \ldots, c_t)^T$ expressing factors such as time, location, and price. $K_j(\cdot)$, obtained by the item-grain context clustering method, is the clustering function of item $j$; $W_j$ is the weight vector of item $j$, whose element $W_{j,k}$ is the proportion of the $k$-th context cluster of item $j$; and $Q_{j,k}$ is the rating bias of the $k$-th context cluster of item $j$, which captures the contextual effect.
IC-CARS handles the data sparsity problem well, but it does not perform well when some context factors are especially sensitive for recommendation results, because IC-CARS treats every context factor equally. Some context factors may have a distinguishing influence on recommendation results; for example, the influence of the season on sweater sales is much larger than that of location. We therefore suggest a hybrid model that adds explicit context factors to IC-CARS. The prediction rule of the hybrid rating model is presented in Eq. 3.
8 Y. Shi et al.
$$\hat{R}_{i,j,C,e_1,\ldots,e_d} = b_i + q_j + U_i^T V_j + W_{j,K_j(C)}\, Q_{j,K_j(C)} + \sum_{l=1}^{d} E_{j,e_l} \quad (3)$$
where the vector $(e_1, \ldots, e_d)$ denotes the explicit context factors, which may be very sensitive for recommendation results and can be chosen by human experts. When $d = 0$, the hybrid rating model reduces to IC-CARS, which takes no explicit context factors into account. $E_{j,e_l}$ is the rating bias of item $j$ under the explicit context condition $e_l$; e.g., if a sweater is more popular under the explicit context $e_{season}$ = "winter", the bias value will be higher. In scenarios where the explicit context is missing, the bias has no effect on the predicted rating.
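For concreteness, the hybrid prediction rule can be sketched in code. Every container and name below is a hypothetical illustration of Eq. 3, not the authors' implementation.

```python
def predict_rating(b, q, U, V, W, Q, E, i, j, context, K_j, explicit=()):
    """Hybrid prediction rule (Eq. 3). The parameter containers are
    hypothetical illustrations: b[i] and q[j] are the user and item baseline
    estimators, U[i] and V[j] the D-dimensional latent vectors, W[j][k] and
    Q[j][k] the weight and rating bias of item j's k-th context cluster, and
    E[j] a dict of biases for explicit context conditions. K_j maps a context
    vector to item j's cluster index; with no explicit factors (d = 0) the
    rule reduces to IC-CARS (Eq. 2)."""
    k = K_j(context)                                          # item-wise cluster of C
    r = b[i] + q[j] + sum(u * v for u, v in zip(U[i], V[j]))  # Eq. 1 baseline
    r += W[j][k] * Q[j][k]                                    # context-cluster effect
    for e in explicit:                                        # factors e_1 .. e_d
        r += E[j].get(e, 0.0)                                 # missing context: no bias
    return r
```

Looking up a missing explicit condition with `get(e, 0.0)` mirrors the statement above that a missing explicit context contributes no bias to the predicted rating.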
In order to provide rating predictions, the variables of the models must be learned. As shown in Eq. 4, we define an optimization problem that minimizes the squared error between the actual and predicted ratings according to the prediction rule. To counter over-fitting, we add a regularization component controlled by the parameter $\lambda$, and to solve the optimization problem we apply stochastic gradient descent.
$$\min_{b,q,U,V,Q,E} \sum_{R_{i,j,C,e_1,\ldots,e_d} \in R} \left( R_{i,j,C,e_1,\ldots,e_d} - \hat{R}_{i,j,C,e_1,\ldots,e_d} \right)^2 + \lambda \left( \|b\|^2 + \|q\|^2 + \|U\|^2 + \|V\|^2 + \|Q\|^2 + \|E\|^2 \right) \quad (4)$$
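A single stochastic-gradient step for this objective can be sketched as follows. The dict-based parameter layout is our own illustration; note that $W$ is held fixed, because the cluster proportions come from the clustering step rather than from learning, matching the variable list in Eq. 4.

```python
def sgd_step(params, i, j, k, explicit, rating, r_hat, lr=0.001, lam=0.01):
    """One stochastic-gradient update for the squared-error objective (Eq. 4),
    for a single observed <user, item, context, rating> tuple. `params` is a
    hypothetical dict holding the model variables of Eq. 3; `k` is the item's
    context-cluster index K_j(C) and `r_hat` the current prediction."""
    b, q, U, V, W, Q, E = (params[n] for n in ("b", "q", "U", "V", "W", "Q", "E"))
    err = rating - r_hat                       # residual of the prediction rule
    b[i] += lr * (err - lam * b[i])
    q[j] += lr * (err - lam * q[j])
    # update U_i and V_j simultaneously, using the old values of both
    U[i], V[j] = ([u + lr * (err * v - lam * u) for u, v in zip(U[i], V[j])],
                  [v + lr * (err * u - lam * v) for u, v in zip(U[i], V[j])])
    Q[j][k] += lr * (err * W[j][k] - lam * Q[j][k])  # W fixed by clustering
    for e in explicit:                         # explicit-context biases E_{j,e}
        E[j][e] = E[j].get(e, 0.0) + lr * (err - lam * E[j].get(e, 0.0))
```

With a small learning rate, each step moves the prediction for the sampled tuple toward its observed rating while the $\lambda$ terms shrink the parameters toward zero.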
4 Experimental Evaluation
In this section, we first describe the experimental datasets, and then compare effectiveness against other state-of-the-art approaches. After that, we analyze the impact of the trade-off parameters in item-grain context clustering. Finally, we present an experiment exploring the effect of explicit context factors.

In the comparison experiment, we use an 80:20 split for training and testing and repeat each split five times. We use cross-validation for calibration; the learning rate is 0.001, the regularization parameter $\lambda$ is selected from {0.0005, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2}, the maximum number of context clusters $P$ is selected from {5, 10, 15, 20, 25, 30, 35}, and the elastic parameter $\delta$ is selected from {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. We employ RMSE to assess recommendation quality for $D = 10$, $D = 15$, and $D = 20$; Table 2 displays the comparison results.
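For reference, the RMSE used for the comparison in Table 2 is the standard root-mean-square error over held-out ratings, which a minimal sketch computes as:

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between observed and predicted ratings."""
    assert len(actual) == len(predicted) and actual
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))
```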
From Table 2, we observe that the MF-based context-aware recommendation models significantly outperform the regular MF model, which fuses no context information; this further proves the effectiveness of incorporating context information into the MF model. The three homogeneous context-aware models, CAMF-C, CAMF-CC, and CAMF-CI, which fuse explicit context factors at different granularities, behave differently on different datasets. Specifically, CAMF-C, the coarsest, which assumes that each contextual condition has a global influence on all ratings, performs best on the TencentWeibo dataset, while CAMF-CI, the most fine-grained, performs best on the MBookCrossing dataset. Compared with these three explicit-context models, LCMF performs better, indicating that latent contexts can improve accuracy to some extent, mainly because they reduce the opportunity for the data sparsity problem. IC-CARS performs best among all the compared models. This significant performance demonstrates that the item-wise context clusters hidden in contextual information have a good effect on recommendation results, and it further shows the effectiveness of IC-CARS.
the same as in the comparison experiment in the last subsection, except that we fix the regularization parameter $\lambda$ to 0.01, set $D = 15$, and select $P$ from {10, 20, 30}. Figure 1 shows the experiment results.
From Fig. 1, we can see that as $\delta$ increases from the initial value 0.1, RMSE gradually decreases (the lower the better); however, once $\delta$ reaches a certain threshold (0.3 for TencentWeibo and 0.5 for MBookCrossing, as observed in Fig. 1), RMSE starts to increase. This is mainly because a lower value of $\delta$ makes the model more fine-grained, while a higher value makes it coarser, and there is a trade-off between model granularity and sample sparsity. Additionally, Fig. 1 shows that $P$ is also a trade-off parameter, though a more stable one than $\delta$. Specifically, $P = 20$ is the most suitable on both TencentWeibo and MBookCrossing, and the performance difference caused by $P$ is much smaller than that caused by $\delta$.
The huge difference shown in Fig. 2 tells us that applying some distinguishing explicit context factors in the hybrid model may improve performance greatly, while meaningless explicit context factors can instead reduce performance.
5 Conclusion and Future Work

In this paper, based on the intuition that the impact of context on an item has inner patterns affecting users' behavior toward items, we first presented a novel item-grain context clustering method that mines context clusters for each item, and then incorporated these clusters into a Matrix Factorization model to provide context-aware recommendation. The experiment results show that our proposal outperforms state-of-the-art baseline models, and the complexity analysis indicates that it scales to large datasets. Additionally, since our proposal cannot handle distinguishing explicit context factors well, we further proposed a hybrid model that adds explicit context factors to IC-CARS; the relevant experiment results show that the hybrid model outperforms IC-CARS when distinguishing explicit context factors are applied. However, how to select and recognize such distinguishing explicit factors remains an open problem, which is worthy of further study and will be the focus of our next research.
References
1. Ricci, F., Rokach, L., Shapira, B.: Recommender Systems Handbook. Springer, New York
(2011)
2. Wang, L.C., Meng, X.W., Zhang, Y.J.: Context-aware recommender systems. J. Softw. 23,
1–20 (2012)
3. Natarajasivan, D., Govindarajan, M.: Location based context aware user interface
recommendation system. In: International Conference on Informatics and Analytics,
Pondicherry, pp. 78–83. ACM (2016)
4. Karatzoglou, A., Amatriain, X., Baltrunas, L.: Multiverse recommendation: n-dimensional
tensor factorization for context-aware collaborative filtering. In: ACM Conference on
Recommender Systems, Barcelona, pp. 79–86. ACM (2010)
5. Baltrunas, L., Ludwig, B., Ricci, F.: Matrix factorization techniques for context aware
recommendation. In: ACM Conference on Recommender Systems, Chicago, pp. 301–304.
ACM (2011)
6. Unger, M., Bar, A., Shapira, B.: Towards latent context-aware recommendation systems. J. Knowl.-Based Syst. 104, 165–178 (2016)
7. Adomavicius, G., Sankaranarayanan, R., Sen, S.: Incorporating contextual information in
recommender systems using a multidimensional approach. J. ACM Trans. Inf. Syst. 23, 103–
145 (2005)
8. Adomavicius, G., Tuzhilin, A.: Context-aware recommender systems. In: ACM Conference
on Recommender Systems, Lausanne, pp. 335–336. ACM (2010)
9. Chen, A.: Context-aware collaborative filtering system: predicting the user’s preference in the
ubiquitous computing environment. In: Strang, T., Linnhoff-Popien, C. (eds.) LoCA 2005.
LNCS, vol. 3479, pp. 244–253. Springer, Heidelberg (2005). doi:10.1007/11426646_23
10. Wang, L.C., Meng, X.W., Zhang, Y.J., Shi, Y.C.: New approaches to mood-based hybrid
collaborative filtering. In: 2010 Workshop on CAMRa 2010, pp. 28–33. ACM, New York
(2010)
11. Shi, Y., Larson, M., Hanjalic, A.: Mining mood-specific movie similarity with matrix
factorization for context-aware recommendation. In: Proceedings of the CAMRa 2010,
pp. 34–40. ACM, New York (2010)
12. Shin, D., Lee, J.W., Yeon, J., Lee, S.G.: Context-aware recommendation by aggregating user
context. In: Proceedings of the CEC 2009, pp. 423–430. IEEE, Washington (2009)
13. Kuzelewska, U.: Clustering algorithms in hybrid recommender system on MovieLens data.
J. Stud. Logic Gramm. Rhetor. 37, 125–139 (2014)
14. Gao, Q.L., Ling, G., Yang, J.F.: A preference elicitation method based on users’ cognitive
behavior for context-aware recommender system. J. Chin. J. Comput. 9, 1767–1776 (2015)
Competitive Reinforcement Learning
in Atari Games
1 Introduction
Reinforcement learning is a distinct area within the broader fields of machine learning and artificial intelligence; it learns through action, using its understanding of the environment to choose the best course of action for an interpreted state so as to maximize long-term reward [11]. Iteratively, a reinforcement learning approach investigates its available actions and, after some period of searching, refines a course of action that achieves its objective. Recent progress in applying reinforcement learning with deep networks led to the DeepMind deep Q-learning network (DQN) [8], which can autonomously learn to play Atari video games [1,7] at or above the level of expert humans [10]. This system was able to consistently outperform a skilled human player across a variety of games with different objectives, and was shown to learn the same high-level strategies adopted by expert human players. As a result, the DQN is considered the state-of-the-art approach to reinforcement learning.
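The learning-through-action principle above can be made concrete with the tabular Q-learning update that the DQN approximates with a deep network. This is a textbook formulation, not code from [8]:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
        Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Q is a dict mapping (state, action) pairs to value estimates; unseen
    pairs default to 0. alpha is the step size, gamma the discount factor
    that trades immediate against long-term reward."""
    target = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
```

The DQN replaces the table with a convolutional network over raw game frames, but the update target has the same form.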
© Springer International Publishing AG 2017
W. Peng et al. (Eds.): AI 2017, LNAI 10400, pp. 14–26, 2017.
DOI: 10.1007/978-3-319-63004-5_2

While DQNs were demonstrated to outperform human players at Atari games, one limitation is that the DQN agents received specialist training on a single game, whereas human agents are able to learn and play multiple games. The problem of interest in our research is the ability of a DQN agent to perform multiple tasks, which we assess through its ability to learn multiple Atari games.
Previous research into reinforcement learning agents performing multiple tasks produced the technique known as distilling, first presented by [2] and later adapted for neural networks by [3]. In this approach, DQN agents are first trained on individual games and then fused together to form one agent that applies to both games [9].
Recent work has demonstrated that, when learning two tasks sequentially, a DQN suffers from catastrophic forgetting [4], where the process of learning a second task destroys the network weights associated with the first task. This issue with sequential learning can be mitigated using elastic weight consolidation, in which weights that are important for performing the first task are protected from large alterations when learning the second task [4].
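Elastic weight consolidation can be sketched as a quadratic penalty added to the loss of the second task; the flat array-of-weights interface here is our simplification of the scheme in [4]:

```python
def ewc_penalty(weights, weights_star, fisher, lam=1.0):
    """Elastic weight consolidation penalty: a quadratic cost that anchors
    each weight w_i near its post-task-1 value w*_i, scaled by the Fisher
    information F_i estimating that weight's importance to task 1:
        penalty = (lam / 2) * sum_i F_i * (w_i - w*_i) ** 2
    Added to the task-2 loss, it lets unimportant weights (small F_i) move
    freely while protecting important ones, slowing forgetting of task 1."""
    return 0.5 * lam * sum(f * (w - ws) ** 2
                           for w, ws, f in zip(weights, weights_star, fisher))
```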
The training method we propose is to learn two games simultaneously, so that they effectively 'compete' for representational space within the neural network. This differs from the original DeepMind DQN [8], where the entire representational space of the network was specialized for a single game.
An interesting research question is how the representational space of the neural network will be allocated in a competitive learning environment: will the representational space be shared or segregated between the two tasks? An example is how well a DQN agent can perform at both Breakout and Space Invaders, where one game requires avoiding falling objects and the other requires hitting and returning falling objects. Our hypothesis is that the DQN will suffer significantly when learning competing tasks, with the full extent of the performance detriment being a function of the differences between the tasks at hand. A corollary supposition is that the architecture will increasingly segregate expressive power across the network for the alternate tasks as a function of the perceived differences in action for reward. This supposition is corroborated by the success of the distilling approach [9], where a fused network is formed from two individual networks.

Another hypothesis is that a DQN agent trained in a competitive environment will generalise to new, unforeseen tasks better than a specialist DQN agent trained on a single task.
Our contribution is an evaluation of one DQN agent acting on two environments; this differs from [9], where two agents learning on separate environments are combined, and from [4], where one agent acts and learns on a sequence of environments.
16 M. McKenzie et al.

The data used for training and testing was presented in [7,8], in which Atari 2600 games are used for DQN performance evaluation. For the purposes of this research, pairs of games were selected according to different concepts, of which two are shown here: a pair of similar games, i.e. games with functionally similar objectives and reward schemes, and a pair of high-performing games, i.e. games on which the original code base easily converged to a suitable reward strategy. A common learning agent was then expected to learn both environment pairs competitively, as depicted in Fig. 1. A baseline performance run was completed to ensure that the changes made to the code to assess multiple games did not alter its ability to converge to a performance similar to the original research.
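The competitive scheme depicted in Fig. 1 can be illustrated as an interleaved training loop. The agent/environment interface below (reset, step, act, learn) is a hypothetical simplification; none of these names come from the original code base of [8].

```python
def competitive_training(agent, env_a, env_b, episodes=1000):
    """Interleaved training sketch for the proposed competitive setting:
    a single DQN agent alternates episodes between two Atari environments,
    so both games compete for the network's representational space."""
    for ep in range(episodes):
        env = env_a if ep % 2 == 0 else env_b   # alternate games each episode
        state, done = env.reset(), False
        while not done:
            action = agent.act(state)                       # epsilon-greedy etc.
            next_state, reward, done = env.step(action)
            agent.learn(state, action, reward, next_state, done)
            state = next_state
```

Alternating per episode is only one possible schedule; interleaving at the level of replay-buffer samples would be another way to realise the same competition.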
3 Experimental Results
The following section details the analysis results for the two presented scenarios: a pair of similar games and a pair of high-performing games. The results show the relative performance decrease caused by competitive environment learning, as well as how well the learnt competitive agent then performed on unseen games. All comparisons are made relative to the performance measured in the original research by Mnih et al. [8].
Fig. 2. Measured performance of the game Space Invaders across the competitive reinforcement learning process of Space Invaders and Demon Attack.
Fig. 3. Measured performance of the game Demon Attack across the competitive reinforcement learning process of Space Invaders and Demon Attack.