IEEE Symposium Series on Computational Intelligence
IEEE SSCI 2021
December 4th–7th, 2021, Orlando, Florida, USA

IEEE SSCI is an established flagship annual international series of symposia on computational intelligence sponsored by the IEEE Computational Intelligence Society. IEEE SSCI 2021 promotes and stimulates discussion on the latest theory, algorithms, applications, and emerging topics on computational intelligence across several unique symposia, providing an opportunity for cross-fertilization of research and future collaborations. For more information visit http://attend.ieee.org/ssci-2021

Organizing Committee
General Chairs: Keeley Crockett, UK; Sanaz Mostaghim, Germany
Program Chairs: Alice Smith, USA; Carlos A. Coello Coello, Mexico
Finance Chair: Piero Bonissone, USA
Publications Chairs: Dipti Srinivasan, Singapore; Anna Wilbik, Netherlands
Conflict of Interest Chair: Marley Vellasco, Brazil
Registration Chairs: Julia Cheung, Taiwan; Steven Corns, USA
Keynote Chairs: Bernadette Bouchon-Meunier, France; Gary Yen, USA; Manuel Roveri, Italy
Special Session Chair: Marde Helbig, Australia
Tutorial Chairs: Sansanee Auephanwiriyakul, Thailand; Daniel Ashlock, Canada; Manuel Roveri, Italy
Submission Chair: Nicolò Navarin, Italy
Local Organizing Chair: Zhen Ni, USA
Travel Grant Chair: Pauline Haddow, Norway
Webmaster Chair: Jen-Wei Huang
Whova Chair: Albert Lam, Hong Kong
Conference Activities Chair: Bing Xue, New Zealand
Social Event Chair: Jo-Ellen Synder, USA
Publicity Chairs: Jialin Liu, China; Joao Carvalho, Portugal; Matt Garrat, Australia

IEEE Computational Intelligence Symposia on:
Adaptive Dynamic Programming and Reinforcement Learning (IEEE ADPRL)
CI in Agriculture (IEEE CIAg)
CI for Brain Computer Interfaces (IEEE CIBCI)
Computational Intelligence in Big Data (IEEE CIBD)
CI in Biometrics and Identity Management (IEEE CIBIM)
CI in Control and Automation (IEEE CICA)
CI in Healthcare and E-health (IEEE CICARE)
CI in Cyber Security (IEEE CICS)
CI in Data Mining (IEEE CIDM)
CI in Dynamic and Uncertain Environments (IEEE CIDUE)
CI and Ensemble Learning (IEEE CIEL)
CI for Engineering Solutions (IEEE CIES)
CI for Human-like Intelligence (IEEE CIHLI)
CI in IoT and Smart Cities (IEEE CIIoT)
CI for Multimedia Signal and Vision Processing (IEEE CIMSIVP)
CI in Remote Sensing (IEEE CIRS)
CI for Security and Defence Applications (IEEE CISDA)
Deep Learning (IEEE DL)
Evolving and Autonomous Learning Systems (IEEE EALS)
Explainable Data Analytics in Computational Intelligence (IEEE EDACI)
Evolutionary Learning (IEEE EL)
Evolutionary Neural Architecture Search and Applications (IEEE ENASA)
Evolutionary Scheduling and Combinatorial Optimisation (IEEE ESCO)
Ethical, Social and Legal Implications of Artificial Intelligence (IEEE ETHAI)
CI in Feature Analysis, Selection and Learning in Image and Pattern Recognition (IEEE FASLIP)
Foundations of CI (IEEE FOCI)
Evolvable Systems (IEEE ICES)
Immune Computation (IEEE IComputation)
Intelligent Agents (IEEE IA)
Multi-agent System Coordination and Optimization (IEEE MASCO)
Model-Based Evolutionary Algorithms (IEEE MBEA)
Multicriteria Decision-Making (IEEE MCDM)
Nature-Inspired Computation in Engineering (IEEE NICE)
Robotic Intelligence in Informationally Structured Space (IEEE RiiSS)
Cooperative Metaheuristics (IEEE SCM)
Differential Evolution (IEEE SDE)
Swarm Intelligence Symposium (IEEE SIS)
Important Dates
Special Session Proposals: May 28th, 2021
Tutorial Proposals: May 28th, 2021
Paper Submissions: August 6th, 2021
Paper Acceptance: September 17th, 2021
Camera-Ready Papers: October 15th, 2021

Call for Papers
Papers for IEEE SSCI 2021 should be submitted electronically using the conference website: http://attend.ieee.org/ssci-2021, and will be reviewed by experts in the fields and ranked based on the criteria of originality, significance, quality, and clarity.
Features
10  Evolutionary Multi-Objective Model Compression for Deep Neural Networks
    by Zhehui Wang, Tao Luo, Miqing Li, Joey Tianyi Zhou, Rick Siow Mong Goh, and Liangli Zhen
22  Fast and Unsupervised Neural Architecture Evolution for Visual Representation Learning
    by Song Xue, Hanlin Chen, Chunyu Xie, Baochang Zhang, Xuan Gong, and David Doermann
33  Self-Supervised Representation Learning for Evolutionary Neural Architecture Search
    by Chen Wei, Yiping Tang, Chuang Niu, Haihong Hu, Yue Wang, and Jimin Liang
50  Forecasting Wind Speed Time Series Via Dendritic Neural Regression
    by Junkai Ji, Minhui Dong, Qiuzhen Lin, and Kay Chen Tan
67  A Self-Adaptive Mutation Neural Architecture Search Algorithm Based on Blocks
    by Yu Xue, Yankang Wang, Jiayu Liang, and Adam Slowik

On the Cover: ©SHUTTERSTOCK.COM/ANDREY SUSLOV
CIM Editorial Board

Editor-in-Chief
Chuan-Kang Ting
National Tsing Hua University
Department of Power Mechanical Engineering
No. 101, Section 2, Kuang-Fu Road
Hsinchu 30013, TAIWAN
(Phone) +886-3-5742611
(Email) ckting@pme.nthu.edu.tw

Founding Editor-in-Chief
Gary G. Yen, Oklahoma State University, USA

Past Editors-in-Chief
Kay Chen Tan, Hong Kong Polytechnic University, HONG KONG
Hisao Ishibuchi, Southern University of Science and Technology, CHINA

Editors-At-Large
Piero P. Bonissone, Piero P Bonissone Analytics LLC, USA
David B. Fogel, Natural Selection, Inc., USA
Vincenzo Piuri, University of Milan, ITALY
Marios M. Polycarpou, University of Cyprus, CYPRUS
Jacek M. Zurada, University of Louisville, USA

Associate Editors
José M. Alonso, University of Santiago de Compostela, SPAIN
Erik Cambria, Nanyang Technological University, SINGAPORE
Liang Feng, Chongqing University, CHINA
Barbara Hammer, Bielefeld University, GERMANY
Eyke Hüllermeier, University of Munich, GERMANY
Sheng Li, University of Georgia, USA
Hsuan-Tien Lin, National Taiwan University, TAIWAN
Hongfu Liu, Brandeis University, USA
Zhen Ni, Florida Atlantic University, USA
Yusuke Nojima, Osaka Prefecture University, JAPAN
Nelishia Pillay, University of Pretoria, SOUTH AFRICA
Danil Prokhorov, Toyota R&D, USA
Kai Qin, Swinburne University of Technology, AUSTRALIA
Rong Qu, University of Nottingham, UK
Ming Shao, University of Massachusetts Dartmouth, USA
Vincent S. Tseng, National Chiao Tung University, TAIWAN
Kyriakos G. Vamvoudakis, Georgia Tech, USA
Nishchal K. Verma, Indian Institute of Technology Kanpur, INDIA
Handing Wang, Xidian University, CHINA
Dongrui Wu, Huazhong University of Science and Technology, CHINA
Bing Xue, Victoria University of Wellington, NEW ZEALAND

IEEE Periodicals/Magazines Department
Managing Editor, Mark Gallaher
Senior Managing Editor, Geri Krolin-Taylor
Senior Art Director, Janet Dudar
Associate Art Director, Gail A. Schnitzer
Production Coordinator, Theresa L. Smith
Director, Business Development—Media & Advertising, Mark David
Advertising Production Manager, Felicia Spagnoli
Production Director, Peter M. Tuohy
Editorial Services Director, Kevin Lisankie
Senior Director, Publishing Operations, Dawn Melley

Editor's Remarks
Chuan-Kang Ting, National Tsing Hua University, TAIWAN

The Evolution of Neural Networks

The focus on evolutionary neural architecture search in this issue has driven me to ponder the evolution of biological neural networks, or rather, the human brain. Researchers in neuroanatomy found that the cognitive and mental development of humans is attributed to the increase of our brain size throughout evolution. Yet bigger is not always better. It was also discovered that, once the brain reaches a certain size, further growth will only render the brain less efficient. In addition, the brain is limited and affected by its inherent architecture and signal processing time. These discoveries in neuroscience are interestingly analogous to the advances in neural networks and evolutionary computation. Stacking of perceptrons empowers artificial neural networks to solve complex problems but detracts from their efficiency. Indeed, human brain evolution and artificial neural network evolution both face the trade-off between capacity and efficiency. The workings of nature truly give us a lot to mull over.

This issue includes five Features articles. The first article proposes an evolutionary multi-objective model compression approach to simultaneously optimize the model size and accuracy of a deep neural network. The second and third articles adopt the notion of self-supervised learning in neural architecture evolution, resulting in state-of-the-art performance. To improve the accuracy of wind speed forecasting, the fourth article uses an evolutionary algorithm to optimize the architecture of dendritic neural regression. The fifth article presents a self-adaptive mutation strategy for block-based evolutionary neural architecture search on convolutional neural networks.

In the Columns, the article proposes a multi-view feature construction method based on genetic programming and ensemble techniques. The approach can automatically generate high-level features from multiple views; moreover, a new fitness function is developed to improve the discriminability of these features. Experimental results show the effectiveness of the proposed approach in rolling bearing fault diagnosis with a limited amount of data.

We hope you will enjoy the articles in this issue. If you have any suggestions or feedback for this magazine, please do not hesitate to contact me at ckting@pme.nthu.edu.tw.
Ethics in Artificial Intelligence (AI) is discussed everywhere, in governmental circles as well as scientific forums. It is very often associated with the two widely used concepts of diversity and eXplainable Artificial Intelligence (XAI). The latter was promoted by DARPA1, and it is expected to propose methods preserving the rights of users to understand how the AI systems work and why decisions are made. Computational Intelligence methods have much to contribute to this effort to ensure that AI systems are ethical.

In July 2020, the European Union published an Assessment List for Trustworthy Artificial Intelligence (ALTAI)2 covering a wide range of ethical issues. Among the requirements listed, we find transparency, which encompasses the traceability of the data and model used to make a decision, the explainability of decisions, and the communication with the user. Transparency mainly corresponds to the general approach of XAI, and Computational Intelligence has much to contribute to this effort toward the transparency of XAI, by means of the various methods of automatic generation of explanations, the use of counterfactuals, the help of visualization, or the construction of fuzzy systems providing interpretability and expressiveness, for example.

Another ALTAI requirement sounds familiar to us, dealing with diversity, non-discrimination, and the avoidance of unfair bias. Such concerns are at the heart of the IEEE Computational Intelligence Society's actions, as the Society is deeply committed to their respect in its scientific and associative life. It is therefore natural to also consider these properties as essential for all algorithms and products that are parts of AI systems, not only in the collection of data and choice of attributes leading to a decision, but also in the ability of all categories of users to interact with the AI system regardless of their characteristics. Privacy and data protection are another aspect of the respect of human users.

Other requirements focus more on the cognitive components of the construction of AI systems, the effect they can have on human users, the possible governance of human agents in AI systems, and the integration of AI systems in human society. The technical robustness of the AI system, its reliability, its security, and its capacity to identify and mitigate risks in a transparent way are no less important.

Machine ethics and the notions of responsibility of AI systems or artificial moral agents3 are also topics of research that deserve a lot of attention. Consciousness in AI has been debated for a long time4 and it can take a form different from human consciousness5. Multidisciplinarity, and in particular the cooperation with cognitive

1 https://www.cc.gatech.edu/~alanwags/DLAI2016/(Gunning)%20IJCAI-16%20DLAI%20WS.pdf
2 https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai
3 https://plato.stanford.edu/entries/ethics-ai/#MachEthi
4 McCarthy, John (1996). Making robots conscious of their mental states. In S. Muggleton (ed.), Machine Intelligence 15. Oxford University Press.
5 Pitrat, Jacques (2009). Artificial Beings: The Conscience of a Conscious Machine. Wiley.

Vice President – Publications: Kay Chen Tan, Hong Kong Polytechnic University, HONG KONG
Vice President – Technical Activities: Luis Magdalena, Universidad Politecnica de Madrid, SPAIN

Publication Editors
IEEE Transactions on Neural Networks and Learning Systems: Haibo He, University of Rhode Island, USA
IEEE Transactions on Fuzzy Systems: Jon Garibaldi, University of Nottingham, UK
IEEE Transactions on Evolutionary Computation: Carlos A. Coello Coello, CINVESTAV-IPN, MEXICO
IEEE Transactions on Games: Julian Togelius, New York University, USA
IEEE Transactions on Cognitive and Developmental Systems: Yaochu Jin, University of Surrey, UK
IEEE Transactions on Emerging Topics in Computational Intelligence: Yew Soon Ong, Nanyang Technological University, SINGAPORE
IEEE Transactions on Artificial Intelligence: Hussein Abbass, University of New South Wales, AUSTRALIA

Administrative Committee
Term ending in 2021: David Fogel, Natural Selection, Inc., USA; Barbara Hammer, Bielefeld University, GERMANY; Yonghong (Catherine) Huang, McAfee LLC, USA; Xin Yao, Southern University of Science and Technology, CHINA; Jacek M. Zurada, University of Louisville, USA
Term ending in 2022: Cesare Alippi, Politecnico di Milano, ITALY; James C. Bezdek, USA; Gary Fogel, Natural Selection, Inc., USA; Yaochu Jin, University of Surrey, UK; Alice E. Smith, Auburn University, USA
Term ending in 2023: Piero P. Bonissone, Piero P Bonissone Analytics LLC, USA; Oscar Cordón, University of Granada, SPAIN; Pauline Haddow, Norwegian University of Science and Technology, NORWAY; Haibo He, University of Rhode Island, USA; Hisao Ishibuchi, Southern University of Science and Technology, CHINA
Deep neural networks (DNNs) have shown significantly promising performance in addressing real-world problems, such as image recognition, natural language processing, and self-driving. The achievements of DNNs are largely owed to their deep architectures. However, designing an optimal deep architecture for a particular problem requires rich domain knowledge of both the investigated data and the neural network domain, which is not necessarily held by end-users. Neural architecture search (NAS), an emerging technique to automatically design optimal deep architectures without requiring such expertise, is drawing increasing attention from industry and academia. However, NAS is theoretically a non-convex and non-differentiable optimization problem, and existing methods are incapable of addressing it well. Evolutionary computation approaches, particularly genetic algorithms, particle swarm optimization, and genetic programming, have shown superiority in addressing real-world problems due largely to their powerful abilities in searching for global optima, dealing with non-convex/non-differentiable problems, and requiring no rich domain knowledge. In this regard, deep neural architecture design by evolutionary computation approaches, so-called evolutionary neural architecture search (ENAS), has attracted the interest of many researchers.

This special issue has brought together researchers to report state-of-the-art contributions on the latest research and development, up-to-date issues, challenges, and applications in the field of ENAS. Following a rigorous peer review process, five papers have been accepted for publication in this special issue.

The first paper included in the special issue is entitled "Evolutionary Multi-Objective Model Compression for Deep Neural Networks" authored by Z. Wang et al., which aims at accelerating the inference speed of DNNs by optimizing the model size and accuracy simultaneously with evolutionary algorithms. Architecture population evolution was employed to explore and exploit the network space of pruning and quantization. In addition, a two-stage co-optimizing strategy of pruning and quantization was proposed to significantly reduce the time cost of the architecture search process. To further lower energy consumption and reduce the model size, various dataflow designs and parameter coding schemes were also considered in the optimization process. Unlike most related research, which solely focuses on reducing the model size while maintaining the model accuracy, the work in this paper can achieve a trade-off between different model sizes and model accuracies, which meets the requirements of most edge devices in real-world scenarios. The experimental results demonstrated that the proposed algorithm can obtain a broad range of compact DNNs for diverse memory usage and energy consumption requirements.

The second paper, titled "Fast and Unsupervised Neural Architecture Evolution for Visual Representation Learning" by S. Xue et al., proposes the FaUNAE algorithm to improve the unsupervised visual representation learning ability against the supervised peer competitors. Specifically, FaUNAE employed the evolutionary algorithm to search for promising neural architectures starting from an existing architecture designed by experts or by existing NAS algorithms that focus on the transferability from a small dataset to a larger dataset. To reduce the search cost and enhance the search efficiency, prior knowledge was exploited, and the inferior as well as least promising operations were eliminated during the evolutionary process. In addition, the contrastive loss function was utilized as the evaluation metric in a student-teacher framework to achieve self-supervised evolution. FaUNAE was evaluated on four widely used large-scale benchmark datasets. The results demonstrated the effectiveness of FaUNAE on various downstream applications, including object recognition, object detection, and instance segmentation.

The third paper, entitled "Self-Supervised Representation Learning for Evolutionary Neural Architecture Search" by C. Wei et al., proposes a novel performance predictor, aiming to enhance the efficiency of ENAS algorithms. Specifically, ENAS algorithms are often computationally expensive in practice because hundreds of DNNs need to be trained during the search process. Performance ...
I. Introduction
Deep neural networks (DNNs) are artificial neural networks with more than three layers (i.e., more than one hidden layer), which progressively extract higher-level features from the raw input in the learning process. They have delivered state-of-the-art accuracy on various real-world problems, such as image classification, face recognition, and language translation [1]. The superior accuracy of DNNs, however, comes at the cost of high computational and space complexity. For example, the VGG-16 model [2] has about 138 million parameters, which require over 500 MB of memory for storage and 15.5G multiply-and-accumulates (MACs) to process an input image with 224 × 224 pixels. In myriad application scenarios, it is desirable to make the inference on edge devices rather than on the cloud, for reducing the latency and the dependency on connectivity and improving privacy and security. Many of the edge devices that run DNN inference have stringent limitations on energy consumption, memory capacity, etc. Large-scale DNNs [3], [4] are usually difficult to deploy on edge devices, thus hindering their wide application. Efficient processing of DNNs for inference has become increasingly important for deployment on edge devices. For generating efficient DNNs, many neural architecture search (NAS) approaches have been developed in recent years [5]–[7]. One way of carrying out NAS is to search from scratch [8], [9]. In contrast, model compression1 [10] searches for the optimal networks starting from a well-trained network. For instance, to reduce the storage requirement of DNNs, Han et al. proposed a three-stage pipeline (i.e., pruning, trained quantization, and Huffman coding) to compress redundant weights [10]. Wang et al. suggested removing redundant convolution filters to reduce the model size [11]. Rather than reducing the model size, a few attempts [12], [13] have been made to compress DNNs directly by taking the energy consumption as the feedback signal. They have achieved promising results in reducing the size of weight parameters (or energy consumption). However, these approaches require the model to achieve approximately no loss of accuracy, rendering the solution less flexible.

In practice, different users often have distinct preferences on desirable objectives, e.g., accuracy, model size, energy efficiency, and latency, when they select the optimal DNN model for their applications. In this paper, a novel approach, called Evolutionary Multi-Objective Model Compression (EMOMC), is proposed to optimize energy efficiency/model size and accuracy simultaneously. By considering network pruning and quantization, the model compression is formulated as a multi-objective problem under different dataflow designs and parameter coding schemes. Each candidate architecture can be regarded as an individual in the [...] Without significant loss of accuracy, the proposed approach attempts to achieve a good balance between desired objectives, to meet the requirements of different edge devices. Experimental results demonstrate that the proposed approach can obtain a diverse population of compact DNNs for customized requirements of accuracy, memory capacity, and energy consumption.

The novelty and main contributions of this work can be summarized as follows:
❏ The model compression problem is formulated as a multi-objective problem. The optimal solutions are searched in the network pruning and quantization space using a population-based algorithm.
❏ To speed up the population evolution, a two-stage pruning/quantization co-optimization strategy is developed based on the orthogonality between pruning and quantization.
❏ The trade-offs between accuracy, energy efficiency, and model size in model compression are explored by considering different dataflow designs and parameter coding schemes.

The experimental results demonstrate that the proposed method can obtain a set of diverse Pareto optimal solutions in a single run. Also, it achieves a considerably higher energy efficiency than current state-of-the-art methods.

II. Preliminaries
Network pruning and quantization are two commonly used model compression techniques to improve the energy efficiency of model inference and/or to shrink the size of the model. Moreover, the dataflow design employed by edge devices and the coding scheme applied to store the weight matrix both have a significant impact on the performance of model compression.

A. Network Pruning and Quantization
To make training easy, networks are usually over-parameterized [14]. Pruning is a widely used model compression technique that can effectively reduce the energy consumption of edge devices and shrink the model size [10]. Network pruning removes some of the redundant parameters in the network by setting their values to zero. A well-trained neural network usually contains a large number of weights whose values are relatively small compared to other parameters. In most cases, these parameters are not particularly important when performing model inference. Hence, one can sort all the parameters in the model and replace those parameters with the least absolute values by zeros, while the accuracy of the model can still be maintained. For instance, if the pruning amount is set to 33%, then one-third of the parameters in the model will be replaced by zeros. In the inference process, if the processing elements (PEs)2 whose input ...

1 The technique aims to shrink the size of the neural network model without a significant drop of accuracy.
2 The PE is a basic unit to conduct computation in processors.
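To make the magnitude-based pruning described above concrete, the following sketch zeros out the smallest third of a weight matrix. It is a minimal NumPy illustration of the general technique, not the authors' implementation; the function name and the 33% setting are our own.

import numpy as np

def magnitude_prune(weights, amount=0.33):
    """Zero out the fraction `amount` of weights with the smallest
    absolute values, keeping the large-magnitude weights intact."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * amount)              # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    # Ties at the threshold may zero slightly more than k entries.
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Pruning 33% of a random weight matrix replaces roughly one-third
# of its entries with zeros while the accuracy-critical large weights remain.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_pruned = magnitude_prune(w, amount=0.33)
print("sparsity:", np.mean(w_pruned == 0.0))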
A. Model Compression
Model compression aims to compress and accelerate DNN models. Different approaches target different objectives, such as ...

FIGURE 2 The CSR coding scheme with relative positions, which stores the values of non-zero elements and their relative positions in the array.
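As a rough illustration of the relative-position idea behind the CSR scheme of Figure 2, the sketch below stores one row of a pruned weight matrix as (value, gap) pairs. This is our own simplified encoding under an assumed max_gap bound on the stored gap (long gaps are bridged with filler zeros, as in Han et al.'s scheme [10]); the paper's exact format may differ.

import numpy as np

def encode_relative_csr(row, max_gap=15):
    """Encode one row of a sparse matrix as (value, relative position)
    pairs, where the relative position is the gap since the previous
    non-zero element. Gaps larger than `max_gap` emit filler zeros."""
    pairs, last = [], -1
    for idx in np.flatnonzero(row):
        gap = idx - last
        while gap > max_gap:          # bridge long gaps with padding zeros
            pairs.append((0.0, max_gap))
            gap -= max_gap
        pairs.append((float(row[idx]), gap))
        last = idx
    return pairs

row = np.array([0, 0, 3.5, 0, 0, 0, 1.2, 0])
print(encode_relative_csr(row))  # [(3.5, 3), (1.2, 4)]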
[Panels: energy consumption (mJ) vs. accuracy for MobileNet, VGG-16, and LeNet-5.]
FIGURE 5 The solution sets obtained from the bi-objective optimization of accuracy and energy consumption on CIFAR-10 (MobileNet and VGG-16) and MNIST (LeNet-5). The four different dataflow designs are marked with different colors. In the legends, the quoted number after the dataflow design indicates its energy consumption (mJ) on the original model before the model compression.
[Panels: model size (MBytes) vs. accuracy for MobileNet, VGG-16, and LeNet-5.]
FIGURE 6 The solution sets obtained from the bi-objective optimization of accuracy and model size on CIFAR-10 (MobileNet and VGG-16) and MNIST (LeNet-5). The three different coding schemes are marked with different colors. In the legends, the quoted number after the normal coding scheme indicates the size of the original model before the model compression.
[Panels: normalized score vs. accuracy for MobileNet, VGG-16, and LeNet-5; legend: r/x = 5, 1, 0.2.]
FIGURE 7 The aggregation scores on CIFAR-10 (MobileNet and VGG-16) and MNIST (LeNet-5) under three different reward over penalty ratios. The scores are individually normalized by the aggregation score obtained by the uncompressed models.
For VGG-16, as shown in Figure 5, if a 2% accuracy loss is acceptable, the energy consumption can be reduced by around 80%. In the solution sets displayed in Figure 5, there are some knee points when considering the balance of both the model accuracy and the energy consumption. To help users select the model for deployment on edge devices, a new metric called the aggregation score is defined as

$\text{AScore} = (f_1 \cdot r + (1 - f_1) \cdot x) / f_2, \qquad (8)$

where $f_1$ is the accuracy of the model, and $f_2$ is the corresponding energy consumption. When classifying an image, if the result is correct, a reward $r$ is obtained; otherwise, a penalty $x$ is applied. Given a fixed energy budget, the number of images that can be classified is inversely proportional to the energy consumed per image $f_2$. From Equation (8), it can be seen that one of the key parameters in this aggregation score system is the ratio between the reward and the penalty, $r/x$, which indicates the significance of accuracy. The selection of the optimal solution highly depends on the ratio $r/x$.

... different parameter coding schemes. Each point stands for one compressed model in the solution set obtained by EMOMC. The results demonstrate that, in terms of diversity, the solution sets show similar patterns to the bi-objective optimization of accuracy and energy consumption.

Furthermore, it can be observed that although COO and CSR are developed to store a sparse matrix, sometimes they do not save memory space for the compressed models, compared with the normal coding scheme. For example, if pursuing high model accuracy, the normal coding scheme is the best among the three coding schemes for MobileNet. The reason is that although the COO and CSR coding schemes only store non-zero elements, they still need ...

[Figure: ratio (log scale, 2^-3 to 2^3) vs. accuracy (0.70–0.95) on CIFAR-10.]
FIGURE 9 The model size of VGG-16 over the model size of MobileNet under different accuracy scores on CIFAR-10.
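The aggregation score of Equation (8) can be computed directly. The sketch below scores a few hypothetical (accuracy, energy) pairs under different reward-over-penalty ratios; the numeric values are illustrative, not taken from the paper, and the penalty is represented as a negative value.

import numpy as np

def aggregation_score(accuracy, energy_mj, r=1.0, x=-1.0):
    """Equation (8): AScore = (f1*r + (1-f1)*x) / f2, where f1 is the
    model accuracy, f2 the energy consumed per image (mJ), r the reward
    for a correct classification and x the (negative) penalty for an
    incorrect one. Higher is better."""
    return (accuracy * r + (1.0 - accuracy) * x) / energy_mj

# Selecting from a Pareto set: the preferred model depends on r/x.
models = [(0.95, 4.0), (0.85, 1.5), (0.70, 0.5)]  # (accuracy, mJ/image)
for ratio in (5.0, 1.0, 0.2):
    scores = [aggregation_score(f1, f2, r=ratio) for f1, f2 in models]
    best = int(np.argmax(scores))
    print(f"r/x = {ratio}: best model = {models[best]}")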
Corresponding author: Baochang Zhang (e-mail: bczhang@buaa.edu.cn). Co-first authors: Song Xue and Hanlin Chen.
I. Introduction
Learning high-level representations from labeled data and deep learning models in an end-to-end manner is one of the biggest successes in computer vision in recent history. These techniques make manually specified features largely redundant and have greatly improved the state-of-the-art for many real-world applications, such as medical imaging, security, and autonomous driving. However, there are still many challenges. For example, a learned representation from supervised learning for image classification may lack information such as texture, which matters little for classification but can be more relevant for later tasks. Yet adding it makes the representation less general, and it might be irrelevant for tasks such as image captioning. Thus, improving representation learning requires features to be focused on solving a specific task. Unsupervised learning is an important stepping stone towards robust and generic representation learning [1]. The main challenge is the significant performance gap compared to supervised learning. Neural architecture search (NAS) has shown extraordinary potential in deep learning due to the customization of the network for target data and tasks. Intuitively, using the target data and searching the tasks directly without a proxy gap will result in the least domain bias. For performance reasons, however, early NAS [2] methods only searched small datasets. Later, transfer methods were utilized to search for an optimal architecture on one dataset, then adapt the architecture to work on larger datasets [3]. ProxylessNAS [4] was proposed to search larger datasets after directly learning the architecture for the target task and hardware, instead of using a proxy. Recent advances in NAS show a surge of interest in neural architecture evolution (NAE) [3], [5]. NAE first searches for an appropriate architecture on a small dataset and then transfers the architecture to a larger dataset by simply adjusting weights. One key to the success of NAS or NAE is the use of large-scale data, suggesting that more data leads to better performance. The problem, however, is that the cost of labeling these larger datasets may become prohibitive as their size grows.

In scenarios where we cannot obtain sufficient annotations, self-supervised learning is a popular approach to leverage the mutual information of unlabeled training data. However, the performance of unsupervised methods is still unsatisfactory compared with supervised methods. One obstacle is that only the parameters are learned using conventional self-supervised methods. To address the performance gap, a natural idea is to explore the use of NAE to optimize the architecture along with parameter training. Tackling these bottlenecks will allow us to design and implement efficient deep learning systems that will help to address a variety of practical applications. Specifically, we can initialize with an architecture found using NAS on a small supervised dataset and then evolve the architecture on a larger dataset using unsupervised learning. Currently, existing architecture evolution methods [3], [...] label and discovers that labels are not necessary for NAS, it cannot solve the aforementioned problems because it is computationally expensive and is trained using supervised learning for real applications. FaUNAE is introduced to evolve an architecture, manually designed or searched on one small-scale dataset, on another large-scale dataset. This partial optimization can utilize existing models to reduce the search cost and improve search efficiency. The strategy is more practical for real applications, as it can efficiently adapt to new scenarios with minimal data labeling requirements.

First, we adopt a trial-and-test method to evolve the initial architecture, which is more efficient than the traditional evolution methods, which are computationally expensive and require large amounts of labeled data. Second, we note that the quality of the architecture is hard to estimate due to the absence of labeled data. To address this, we explore contrastive loss [1] as the evaluation metric for the operation evaluation. Although our method is built based on contrastive loss [1], we model our method on the teacher-student framework to mimic supervised learning and then estimate the operation performance even without annotations. Then the architecture can be evolved based on the estimated performance. Third, we address the fact that one bottleneck in NAS is its explosive search space of up to 14^8. The search space issue is even more challenging for unsupervised NAS built on an ambiguous performance estimation that further deteriorates the training process. To address this issue, we build our search algorithm based on the principles of survival of the fittest and elimination of the inferior. This significantly improves search efficiency. Our framework is shown in Fig. 1. Our contributions are as follows:
❏ We propose unsupervised neural architecture evolution, which utilizes prior knowledge to search for an architecture using large-scale unlabeled data.
❏ We design a new search block and limit the model size according to the initial architecture during evolution to deal with the huge search space of ResNet. The search space is further reduced through a contrastive loss in a teacher-student framework by abandoning operations with less potential, to significantly improve the search efficiency.
❏ Extensive experiments demonstrate that the proposed methods achieve better performance than prior art on ImageNet, PASCAL VOC, and COCO.

II. Related Work

A. Unsupervised Learning
Recent progress on unsupervised/self-supervised1 learning began with artificially designed pretext tasks, such as patch ...

1 Self-supervised learning is a form of unsupervised learning.
FIGURE 1 The main framework of the proposed Teacher-Student search strategy. Left: Search space of FaUNAE. Middle: Evolution process of FaUNAE. Right: FaUNAE reduces the search space by ...

... adapts to new datasets with unsupervised learning.2 We first introduce an evolution strategy to effectively evolve the architecture from an existing architecture to a new one. We then train our network by exploiting contrastive learning to discriminate between positive samples and negative samples. Our framework is shown in Fig. 1.

A. Search Space
For different datasets, our architecture space is different. For CIFAR10 [33], we search for two kinds of computation cells. The reduction cells are located at 1/3 and 2/3 of the total depth of the network, and the rest are normal cells. Following [14], the set of operations includes skip connection (identity), 3×3 convolutions, 5×5 convolutions, 3×3 depth-wise separable convolutions, 5×5 depth-wise separable convolutions, 3×3 max pooling, 3×3 average pooling, and zero. The search space of a cell consists of operations on all edges.

On ImageNet, we have experimentally determined that for unsupervised learning, ResNet [34] is better than cell-based methods for building an architecture space. We denote this space as {X_i}, where i represents a given block. Rather than repeating the bottleneck (the building block in ResNet) with various operations, however, we allow a set of search blocks, shown in Fig. 2(a), with various operations, including traditional convolution with kernel size {3, 5} and split-attention convolution (SAConv) [35] with kernel sizes {3, 5} and radixes {2, 4}. SAConv is a Split-Attention block [35] which incorporates feature-map split attention within the individual network blocks. Each block divides the feature map into ...

2 The evolution is based on unsupervised learning.
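For reference, the two search spaces described above can be written out explicitly. The listing below is our paraphrase of the text; the operation names follow the usual DARTS-style convention and are not verbatim from the paper's code.

# The CIFAR-10 cell search space: candidate operations on each edge.
CIFAR10_OPERATIONS = [
    "none",            # the zero operation
    "skip_connect",    # identity
    "conv_3x3",
    "conv_5x5",
    "sep_conv_3x3",    # depth-wise separable convolutions
    "sep_conv_5x5",
    "max_pool_3x3",
    "avg_pool_3x3",
]

# The ImageNet space instead mutates ResNet-style search blocks over
# plain convolutions with kernel {3, 5} and SAConv with kernel {3, 5}
# and radix {2, 4}.
IMAGENET_BLOCK_CHOICES = [
    ("conv", 3), ("conv", 5),
    ("saconv", 3, 2), ("saconv", 3, 4),
    ("saconv", 5, 2), ("saconv", 5, 4),
]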
... the operation in each block in a trial-and-test manner. We first mutate the operation based on its mutation probability, followed by an evaluation step to make sure the mutation is ultimately used.

Mutation: An initial structure $a_0$ is manually designed (e.g., ResNet-50)3 or searched by another NAS (e.g., ProxylessNAS [4]) on a different dataset using supervised learning. The initial subnetwork $f^i_s$, which is generated by searching the over-parameterized network based on $a_0$, is then trained using (6) for $k$ steps to obtain the evaluation metric $l(o^i_k)$. A new architecture $a_n(o_{k,n})$ is then constructed from the old architecture $a_{n-1}(o_{k,n-1})$ by a transformation or a mutation. The mutation probability $p_{mt}$ is defined via the score

$w(o^i_{k,n}) = l(o^i_{k,n}) - \sqrt{2 \log N / sa^i_k},$

where $sa^i_k$ represents the sampling times of the operation $o^i_k$, the operation $o^i_k$ represents the $k$th operation in the $i$th block, and $n$ represents the current number of evolutions or the index of the epoch. In general, the operation in each block is kept constant with a probability $1 - \epsilon$ in the beginning. For the mutation, younger (less sampled) operations are more likely to be selected, with a probability of $\epsilon$. Intuitively, keeping the operation constant can be thought of as providing exploitation, while mutations provide exploration [36]. We use two main mutations that we call the depth mutation and the op mutation, as in AmoebaNet [36], to modify the structure generated by the search space described above.

The operation mutation pays attention to the selection of operations in each block. Once the operation in a block is ...

3 No weight parameters.
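The following sketch shows how the epsilon-greedy mutation rule above could be realized. The exploration-adjusted score is our reconstruction of the garbled formula (a UCB-style bonus on the loss metric; the exact form in the paper may differ), and epsilon and all numeric values are assumed for illustration.

import math
import random

def operation_score(loss, samples, total_evolutions):
    """Exploration-adjusted score w(o) = l(o) - sqrt(2*log(N)/sa):
    lower contrastive loss and fewer samplings both make an operation
    more attractive to try."""
    return loss - math.sqrt(2.0 * math.log(total_evolutions) / samples)

def mutate_block(current_op, op_losses, op_samples, n, epsilon=0.1):
    """Keep the current operation with probability 1 - epsilon
    (exploitation); otherwise mutate to the operation with the best
    exploration-adjusted score, which favors rarely sampled,
    'younger' operations (exploration)."""
    if random.random() < 1.0 - epsilon:
        return current_op
    scores = {op: operation_score(op_losses[op], op_samples[op], n)
              for op in op_losses}
    return min(scores, key=scores.get)   # smallest adjusted loss wins

ops = {"conv_3x3": 0.92, "saconv_3x3": 0.88, "saconv_5x5": 0.95}
samples = {"conv_3x3": 12, "saconv_3x3": 9, "saconv_5x5": 2}
print(mutate_block("conv_3x3", ops, samples, n=23))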
FIGURE 2 (a) The main framework of the Teacher-Student model, which focuses on both the unsupervised neural architecture evolution (left) and contrastive learning (right). (b) Compared with the original bottleneck (1) in ResNet, a new search block is designed for FaUNAE (2). "fmap in memory" means that we select the operation to participate in the feature map calculation.
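Since the operation evaluation relies on a contrastive (InfoNCE) loss in the teacher-student framework of Figure 2, a generic InfoNCE computation is sketched below. This is a plain NumPy illustration of the standard loss, not the authors' code; the temperature tau is the usual assumed hyper-parameter.

import numpy as np

def info_nce(query, positive, negatives, tau=0.07):
    """InfoNCE loss for one query embedding.
    query: (d,) student embedding; positive: (d,) teacher embedding of
    the same image; negatives: (n, d) teacher embeddings of other images."""
    def _norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    q, p, negs = _norm(query), _norm(positive), _norm(negatives)
    logits = np.concatenate(([q @ p], negs @ q)) / tau
    logits -= logits.max()                       # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(1)
q = rng.normal(size=128)
loss = info_nce(q, q + 0.1 * rng.normal(size=128), rng.normal(size=(8, 128)))
print(f"InfoNCE loss: {loss:.3f}")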
FIGURE 4 Detailed structure of the best cells discovered on CIFAR-10. (a) Normal cells found on CIFAR-10. (b) Reduction cells found on CIFAR-10.
FIGURE 5 Detailed structure of the best architecture discovered on ImageNet. "SAConv2" and "SAConv4" denote split-attention bottleneck convolution layers with radix of 2 and 4, respectively.
... and AMDIM small/large with higher accuracy. Compared with AMDIM large, which is by far the best manually designed method that we know of, FaUNAE (ProxylessNAS) not only achieves better performance (68.3 vs. 68.1 Top-1 accuracy) but also reduces the model size by about 20 times (30M vs. 626M). FaUNAE also performs better than the structure sampled randomly from the search space described in 3.1 on Top-1 accuracy (68.3 vs. 66.2). When compared with other NAS methods that use the same search space, such as ProxylessNAS, FaUNAE obtains better performance with higher accuracy (68.3 vs. 67.8) and is much faster (15.3 vs. 23.1 GPU days). In addition, we calculated the FLOPs of the different models separately. The FLOPs of ResNet50, FaUNAE (ResNet50), and FaUNAE (ProxylessNAS) are, respectively, 4.11G, 3.87G, and 4.08G. When the FLOPs are approximately the same, FaUNAE (ProxylessNAS) achieves better performance.

We also set different initial structures a_0, including a random structure, ResNet50, and the structure searched by ProxylessNAS on ImageNet100. As shown in Table 2, we find that the better the initial structure, the better the performance, which shows the importance of the prior knowledge. For the structure (Fig. 5) obtained by FaUNAE on ImageNet, we find that the structure obtained with unsupervised learning prefers a small kernel size and split-attention convolution [35], which also shows the effectiveness of split-attention convolution and the rationality of FaUNAE. We searched the architecture on ImageNet many times, which needs further research in our future work.

D. Results on Object Detection and Segmentation
Learning transferable features is the main goal of unsupervised learning. When used as fine-tuning initialization for object detection and segmentation, ImageNet supervised pre-training is the most influential (e.g., [21, 22, 20]). Next, we perform experiments on the COCO [18] data set and the PASCAL VOC [19] data set, and compare FaUNAE with other methods.

1) Results on PASCAL VOC
We use Faster R-CNN [22] as the detector and the evaluated network obtained on ImageNet as the backbone, with batch normalization tuned, implemented in [48]. All layers are fine-tuned end-to-end. We fine-tune for 18k iterations (~18 epochs) on PASCAL VOC trainval07+12. We evaluate the default VOC metric of AP50 on the VOC test2007 set.

In Table 3, we can see the results on trainval07+12. FaUNAE with ImageNet is better than ResNet50 with the same method (MoCo v2): up to +0.9 AP50, +2.3 AP, and +2.6 AP75. Also, our FaUNAE is better than the ImageNet supervised counterparts, which shows the usefulness and effectiveness of unsupervised learning.

TABLE 3 Results under object detection on PASCAL VOC with Faster R-CNN.
ARCHITECTURE  METHOD        AP50  AP    AP75
RESNET50      SUPER.        81.3  53.5  58.8
RESNET50      MOCO V1 [1]   81.5  55.9  62.6
RESNET50      MOCO V2 [11]  82.4  57.0  63.6
FAUNAE        MOCO V2       83.3  59.3  66.2

TABLE 4 Results of object detection and instance segmentation on COCO with Mask R-CNN. AP^bb means bounding-box AP and AP^mk means mask AP.
ARCHITECTURE  METHOD       AP^bb  AP^bb_50  AP^bb_75  AP^mk  AP^mk_50  AP^mk_75
RESNET50      SUPER.       40.0   59.9      43.1      34.7   56.5      36.9
RESNET50      MOCO V1 [1]  40.7   60.5      44.1      35.4   57.3      37.6
FAUNAE        MOCO V2      43.1   63.0      47.2      37.7   60.2      40.6
I. Introduction
Neural architecture search (NAS) refers to the use of certain search strategies to find the best performing neural architecture in a pre-defined search space with minimal search costs [1]. The search strategies sample potentially promising neural architectures from the search space, and the performance metrics of the sampled architectures, obtained from time-consuming training and validation procedures, are used to optimize the search strategies. To alleviate the time cost of training and validation procedures, some recently proposed NAS search strategies employed neural predictors to accelerate the performance estimation of the sampled architectures [2]–[6]. The capability of neural predictors to accurately predict the performance of the sampled architectures is critical to downstream search strategies [2], [5]–[8]. Because of the significant time cost of obtaining labeled training samples, acquiring accurate neural predictors using a small number of training samples is one of the key issues in NAS methods employing neural predictors.

Self-supervised representation learning, a type of unsupervised representation learning, has been successfully applied in areas such as image classification [9], [10] and natural language processing [11]. If a model is pre-trained by self-supervised representation learning and then fine-tuned by supervised learning using a few labeled training data, it is highly likely to outperform its supervised counterparts [9], [10], [12]. In this paper, self-supervised representation learning is investigated and applied to the NAS domain to enhance the performance of neural predictors built from graph neural networks [13] and employed in the downstream evolutionary search strategy.

Effective unsupervised representation learning falls into one of two categories: generative or discriminative [9]. Existing unsupervised representation learning methods for NAS [8], [14] belong to the generative category. Their learning objective is to compel the neural predictor to correctly reconstruct the input neural architecture, but it has limited relevance to NAS. This may result in the [...]tions in the neural architecture and efficiently identify unique neural architectures.

Since different pretext tasks may lead to different feature representations, two self-supervised learning methods are proposed from two different perspectives to improve the feature representation of neural architectures, and to investigate the effect of different pretext tasks on the predictive performance of neural predictors. The first method utilizes a handcrafted pretext task, while the second one learns feature representation by contrasting positive pairs against negative pairs.

The pretext task of the first self-supervised learning method is to predict the normalized GED of two different neural architectures in the search space. A graph neural network-based model with two independent identical branches is devised, and the concatenation of the output features from both branches is used to predict the normalized GED. After the self-supervised pre-training, only one branch of the model is adopted to build the neural predictor. This method is termed self-supervised regression learning.

The second self-supervised learning method is inspired by the prevalent contrastive learning for image classification [9], [10], [12], which maximizes the agreement between differently augmented views of the same image via a contrastive loss in latent space [9]. Since there is no guarantee that a neural architecture and its transformed form will have the same performance metrics, it is not reasonable to directly apply contrastive learning to NAS. This paper proposes a new contrastive learning algorithm, termed self-supervised central contrastive learning, that uses the feature vector of a neural architecture and its nearby neural architectures' feature vectors (with small GEDs) to build a central feature vector. Then, the contrastive loss is utilized to tightly aggregate the feature vectors of the architecture and its nearby architectures onto the central feature vector and push the feature vectors of other neural architectures away from the central feature vector.
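To illustrate the two-branch regression idea described above, the sketch below embeds two architectures with a shared (placeholder) branch, concatenates the embeddings, and regresses a value in [0, 1] standing in for the normalized GED. The real model uses GIN layers (see Figure 2); everything here, including the random weights and the node-feature representation, is a simplified stand-in.

import numpy as np

rng = np.random.default_rng(0)

def embed(architecture_graph, W):
    """Stand-in for one GNN branch: maps an architecture (here just a
    node-feature matrix) to a fixed-size embedding vector."""
    return np.tanh(architecture_graph @ W).mean(axis=0)

def predict_normalized_ged(arch_a, arch_b, W, w_out):
    """Two weight-sharing branches embed the two architectures; the
    concatenated embedding feeds a linear head whose output is squashed
    into [0, 1] as the predicted normalized GED."""
    h = np.concatenate([embed(arch_a, W), embed(arch_b, W)])
    return 1.0 / (1.0 + np.exp(-(h @ w_out)))

W = rng.normal(size=(8, 16)) * 0.1       # shared branch weights (random)
w_out = rng.normal(size=32) * 0.1        # regression head (random)
a = rng.normal(size=(7, 8))              # 7 nodes, 8-dim node features
b = rng.normal(size=(7, 8))
print(f"predicted normalized GED: {predict_normalized_ged(a, b, W, w_out):.3f}")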
FIGURE 1 Overview of the position-aware path-based encoding. (a) A neural architecture in the NASBench-101 search space. The green and red lines indicate two input-to-output paths. (b) Operations and their corresponding unique indices. (c) Two different input-to-output paths and their operation indices. (d) Position-aware path-based encoding of the two input-to-output paths in (c).
... in this batch but not in S_pos is denoted as S_neg. The model f_ccl is used to embed all of the neural architectures, and the feature vector sets of S_pos and S_neg are denoted as E_pos and E_neg, respectively. A central vector e_c is calculated as the average of all of the feature vectors in E_pos. The contrastive loss is then utilized to pull the feature vectors in E_pos to the central vector e_c and push the feature vectors in E_neg far away from e_c. The detailed procedure of central contrastive learning is summarized in Algorithm 1. An example of the central contrastive learning is illustrated in the Supplementary Materials.

To reduce the interaction between the central vectors, a central vector regularization term is added to the loss function that forces each pair of the central vectors to be orthogonal:

$L_{reg} = \frac{1}{2} \sum_{i=0}^{M} \sum_{j=0}^{M} \mathbb{1}[j \neq i] \, e_i^{\top} e_j, \qquad (7)$

FIGURE 2 Structure of the regression model f_rl.
FIGURE 3 Structure of the feature embedding model f_ccl.
Algorithm 1 (excerpt): central contrastive learning
28: end for
29: $l_t = \sum_{idx} l_{t,idx}$
30: end for
31: $L = \frac{1}{M} \sum_{t=1}^{M} l_t + \lambda L_{reg}(E_c)$   # $L_{reg}$: Eq. (7)
32: Update model f_ccl to minimize L.
33: end for
34: return model f_ccl

Algorithm 2 (excerpt): predictor-guided evolutionary search
5: Fine-tune the neural predictor f with dataset D = {(s_i, y_i), i = 1, 2, ..., n}.
6: end
7: Utilize the neural predictor f to guide the evolutionary neural architecture search. # Detailed code can be found in Algorithm 2 of NPENAS [6].
8: end for
9: Output: s* = argmin(y_i), (s_i, y_i) ∈ D
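The loss assembled in Algorithm 1, a contrastive term plus the regularizer of Equation (7), can be sketched as follows. The contrastive term here is our simplification (a softmax over cosine similarities to the central vector), and lambda = 0.1 is an assumed weight.

import numpy as np

def central_contrastive_loss(e_pos, e_neg, tau=0.1):
    """Central contrastive loss sketch: the central vector e_c is the
    mean of the positive embeddings E_pos; the loss pulls E_pos toward
    e_c and pushes E_neg away from it."""
    def _norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    e_c = _norm(e_pos.mean(axis=0))
    pos = _norm(e_pos) @ e_c / tau
    neg = _norm(e_neg) @ e_c / tau
    denom = np.exp(pos).sum() + np.exp(neg).sum()
    return -np.log(np.exp(pos) / denom).mean()

def central_regularizer(centers):
    """Equation (7): L_reg = 1/2 * sum_{i != j} e_i^T e_j, penalizing
    non-orthogonal pairs of central vectors."""
    g = centers @ centers.T
    return 0.5 * (g.sum() - np.trace(g))

rng = np.random.default_rng(2)
loss = central_contrastive_loss(rng.normal(size=(4, 32)), rng.normal(size=(16, 32)))
reg = central_regularizer(rng.normal(size=(3, 32)))
print(f"total loss: {loss + 0.1 * reg:.3f}")   # lambda = 0.1 is assumed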
... predictor, SS-RL and SS-CCL are compared under the search budgets of 20, 50, 100, 150, and 200. According to a search ...
[Panels: Kendall Tau correlation vs. search budget (20–200) at training epochs 150, 200, 250, and 300; legend: SUPERVISED, SS-RL, SS-CCL.]
FIGURE 4 Predictive performance of neural predictors on NASBench-101 with different training epochs.
[Panels: Kendall Tau correlation vs. search budget (20–200) at training epochs 150, 200, 250, and 300; legend: SUPERVISED, SS-RL, SS-CCL.]
FIGURE 5 Predictive performance of neural predictors on NASBench-201 with different training epochs.
[Panels: Kendall Tau correlation vs. search budget (20–200) at training epochs 150, 200, 250, and 300.]
FIGURE 6 Predictive performance of neural predictors pre-trained with different batch sizes N on NASBench-101.
[Figure 7: Kendall Tau correlation vs. search budget (20–200) for the MLP, NP_NAS, SS-CCL, SS-RL, SUPERVISED, and SemiNAS predictors.]
FIGURE 7 Performance comparison of different neural predictors.
[Figure 8: test error (%) vs. number of samples (10–150) for NPENAS-NP, NPENAS-SSRL, and NPENAS-SSCCL.]
FIGURE 8 Performance comparison of NAS algorithms on NASBench-101.
TABLE III Quantitative comparison of NAS algorithms on NASBench-101.
METHODS               SEARCH BUDGET  TEST ERR (%) AVG  ARCHITECTURE EMBEDDING  SEARCH METHOD
RS [50]               150            6.42 ± 0.20       –                       RANDOM SEARCH
REA [25]              150            6.32 ± 0.23       DISCRETE                EVOLUTION
BANANAS-PE [3]        150            5.90 ± 0.15       SUPERVISED              BAYESIAN OPTIMIZATION
BANANAS-AE [3]        150            5.85 ± 0.14       SUPERVISED              BAYESIAN OPTIMIZATION
BANANAS-PAPE [3]      150            5.86 ± 0.14       SUPERVISED              BAYESIAN OPTIMIZATION
NPENAS-NP [6]         150            5.83 ± 0.11       SUPERVISED              EVOLUTION
NPENAS-NP-FIXED [6]   90†            5.90 ± 0.16       SUPERVISED              EVOLUTION
ARCH2VEC-RL [8]       400            5.90              UNSUPERVISED            REINFORCE
ARCH2VEC-BO [8]       400            5.95              UNSUPERVISED            BAYESIAN OPTIMIZATION
NPENAS-SSRL           150            5.86 ± 0.14       SELF-SUPERVISED         EVOLUTION
NPENAS-SSRL-FIXED     90†            5.88 ± 0.15       SELF-SUPERVISED         EVOLUTION
NPENAS-SSCCL          150            5.83 ± 0.11       SELF-SUPERVISED         EVOLUTION
NPENAS-SSCCL-FIXED    90†            5.85 ± 0.13       SELF-SUPERVISED         EVOLUTION
† The neural predictor is trained with 90 evaluated neural architectures, while the other algorithms use 150 neural architectures for evaluation.
TABLE IV Impact of the search budget on NASBench-101.
METHODS       SEARCH BUDGET†  TEST ERR (%) AVG
NPENAS-SSRL   20              6.04 ± 0.25
NPENAS-SSRL   50              5.94 ± 0.18
NPENAS-SSRL   80              5.87 ± 0.14
NPENAS-SSRL   110             5.86 ± 0.11
NPENAS-SSRL   150             5.86 ± 0.14
NPENAS-SSCCL  20              5.99 ± 0.21
NPENAS-SSCCL  50              5.87 ± 0.15
NPENAS-SSCCL  80              5.83 ± 0.12
NPENAS-SSCCL  110             5.83 ± 0.12
NPENAS-SSCCL  150             5.83 ± 0.12
† The neural predictor is trained with the given number of search budgets.

... is greater than 70, the performance of the extra sampled neural architectures is predicted by the neural predictor. Upon completion of the search, the best performing neural architecture is selected for evaluation on the CIFAR-10 dataset, and the architecture is evaluated five times with different seeds. All the other evaluation settings are the same as DARTS [32]. These experiments are executed on two Nvidia RTX 2080Ti GPUs.

2) NAS Results on NASBench-101
The performance of different NAS algorithms on the NASBench-101 benchmark is illustrated in Figure 8, and the quantitative comparison is also provided in Table III. Except for RS and REA, the other algorithms achieve comparable performance on NASBench-101 (Figure 8), while NPENAS-SSCCL and NPENAS-NP have the best performance overall, as shown in Table III.
This shows that the proposed position-aware path-based encoding is an efficient and effective encoding scheme. In addition, the performance of NPENAS-NP improves from 5.86% [6] to 5.83% after filtering out isomorphic graphs using the proposed position-aware path-based encoding. Table III also shows that NPENAS methods using the proposed self-supervised pre-trained neural predictors (NPENAS-SSRL and NPENAS-SSCCL) have only a slight performance drop when the search budget is reduced from 150 to 90, but still outperform the unsupervised arch2vec that uses a large search budget of 400.

The impact of the search budget on the NPENAS-SSRL and NPENAS-SSCCL methods is shown in Table IV. The performance of NPENAS-SSRL continues to improve as the search budget increases, while NPENAS-SSCCL achieves its best performance using only 80 evaluated neural architectures. These results are consistent with those in Section IV-C and again show

[Figure 9: test error (%) vs. number of samples (10–100) for REA, BANANAS-PE, BANANAS-PAPE, NPENAS-NP, NPENAS-SSRL, and NPENAS-SSCCL.]
FIGURE 9 Performance comparison of NAS algorithms on NASBench-201 for CIFAR-10 classification.
[Figures 10 and 11: testing error of the best neural network vs. number of samples (10–100) for RS, REA, BANANAS-PE, BANANAS-PAPE, NPENAS-NP, NPENAS-SSRL, and NPENAS-SSCCL.]
FIGURE 10 Performance comparison of NAS algorithms on NASBench-201 for CIFAR-100 classification.
FIGURE 11 Performance comparison of NAS algorithms on NASBench-201 for ImageNet-16-120 classification.
TABLE V Quantitative comparison of NAS algorithms on NASBench-201.
METHODS              SEARCH BUDGET  TEST ERR (%) AVG CIFAR-10  TEST ERR (%) AVG CIFAR-100  TEST ERR (%) AVG IMAGENET-16-120
RS [50]              100            9.27 ± 0.32               28.55 ± 0.86               54.65 ± 0.80
REA [25]             100            8.98 ± 0.24               27.20 ± 0.89               53.89 ± 0.71
BANANAS-PE [3]       100            9.06 ± 0.31               27.41 ± 1.03               53.85 ± 0.64
BANANAS-AE [3]       100            8.95 ± 0.14               26.77 ± 0.67               53.67 ± 0.34
BANANAS-PAPE [3]     100            8.93 ± 0.16               26.86 ± 0.67               53.71 ± 0.46
NPENAS-NP [6]        100            8.95 ± 0.13               26.70 ± 0.64               53.90 ± 0.59
NPENAS-NP-FIXED [6]  50†            8.95 ± 0.15               26.72 ± 0.57               53.84 ± 0.57
NPENAS-SSRL          100            8.94 ± 0.11               26.55 ± 0.34               53.75 ± 0.51
NPENAS-SSRL-FIXED    50†            8.92 ± 0.11               26.57 ± 0.30               53.63 ± 0.38
NPENAS-SSCCL         100            8.94 ± 0.09               26.50 ± 0.12               53.80 ± 0.36
NPENAS-SSCCL-FIXED   50†            8.93 ± 0.10               26.58 ± 0.33               53.66 ± 0.36
† The neural predictor is trained with 50 evaluated neural architectures, while the other algorithms use 100 neural architectures for evaluation.
TABLE VI Performance comparison of NAS algorithms on the DARTS search space.
MODEL                PARAMS (M)  ERR (%) AVG  ERR (%) BEST  NO. OF SAMPLES EVALUATED  GPU DAYS
NASNET-A [23]        3.3         –            2.65          20000                     1800
NAONET [51]          10.6        –            3.18          1000                      200
ASHA [50]            2.2         –            2.85          700                       9
DARTS [20]           3.3         –            3.00 ± 0.14   –                         1.5
BANANAS [3]          3.6         2.64 ± 0.05  2.57          100                       11.8
GATES [5]            4.1         –            2.58          800                       –
ARCH2VEC-RL [8]      3.3         2.65 ± 0.05  2.60          100                       9.5
ARCH2VEC-BO [8]      3.6         2.56 ± 0.05  2.48          100                       10.5
NPENAS-NP [6]        3.5         2.54 ± 0.10  2.44          100                       1.8
NPENAS-SSCCL-FIXED   3.9         2.49 ± 0.06  2.41          70                        1.6
3) NAS Results on NASBench-201
The above algorithms are compared on CIFAR-10, CIFAR-100, and ImageNet-16-120 on NASBench-201, and the results are shown in Figure 9, Figure 10 and Figure 11, respectively. The quantitative comparison is presented in Table V. As arch2vec [8] does not report queries on this benchmark, it is not compared here. Table V shows that the methods proposed in this paper obtain the best performance on all three datasets. Specifically, NPENAS-SSCCL achieves the best performance on both CIFAR-100 and ImageNet-16-120, almost reaching the ORACLE baseline on CIFAR-100 (26.5% vs. 26.49%). In particular, the performance of NPENAS-SSCCL is the same as the ORACLE baseline on ImageNet-16-120. On CIFAR-10, NPENAS-SSRL-FIXED achieves the best performance, comparable to the ORACLE baseline (8.92% vs. 8.91%). In addition, as on NASBench-101, the performance of BANANAS using the position-aware path-based encoding exceeds that of the path-based encoding. Furthermore, it can be seen from Figures 9-11 that the performance of our methods improves faster as the search budget increases. Due to space limitations, more comparisons are provided in the Supplementary Materials.

4) NAS Results on DARTS
As shown in Table VI, NPENAS-SSCCL-FIXED achieves the best performance compared with the recently proposed NAS algorithms, and its search speed is nearly the same as that of the gradient-based method DARTS [20]. The searched normal cell and reduction cell are illustrated in Figure 12.

V. Conclusion
This paper presents a new neural architecture encoding scheme, position-aware path-based encoding, to calculate the GED of neural architectures. To enhance the performance of neural predictors, two self-supervised learning methods are proposed to pre-train the neural predictors' architecture embedding modules to generate meaningful representations of neural architectures. Extensive experiments demonstrate the superiority of the self-supervised pre-training. The results advocate the adoption of the self-supervised central contrastive representation learning method, while self-supervised regression learning can be considered when the search space is small. When the pre-trained neural predictors are integrated with NPENAS, state-of-the-art performance is achieved on the NASBench-101, NASBench-201 and DARTS search spaces. Since neural predictors can be combined with different search strategies, the proposed self-supervised representation learning methods are

FIGURE 12 (a) The normal cell and (b) reduction cell searched by NPENAS-SSCCL-FIXED on DARTS (operations include sep_conv_3×3, sep_conv_5×5, dil_conv_5×5, skip_connect, avg_pool_3×3, max_pool_3×3, and none).
I. Introduction
With the rapid development of society, it has become challenging for the remaining fossil fuel supply to meet societal requirements worldwide; thus, the demand for sustainable renewable energy is growing. In recent years, wind energy, as a sustainable renewable energy source, has attracted particular attention. Wind is an abundant, pollution-free energy source that is widely distributed, has few geographical restrictions, and can be easily used anywhere. Building wind farms is one of the main challenges in the use of wind energy. Nevertheless, the cost of building wind farms is low, no additional energy source is required, and such farms do not impact the surrounding environment.

Since the energy supplied by wind farms mainly depends on the wind power, precise wind power forecasting is imperative to enable reliable power system planning and wind farm operation [1, 2]. Wind power is closely related to wind speed, and these two parameters share several characteristics, such as randomness, uncontrollability and intermittency [3]. Because of these characteristics, wind farm management is extremely challenging. Calculating the energy production of a wind farm is essential for assessing the economic feasibility of such a project prior to construction planning [4]. Therefore, accurate wind speed prediction is being strongly prioritized in this context [5]. More accurate forecasting capabilities correspond to larger reductions in the construction costs of wind farms [6].

To date, various models have been used for wind speed prediction. These methods can be classified into three categories: physical models, statistical models, and artificial intelligence models [7]. Examples of physical models include the Mesoscale Model Version 5 [8] and the Weather Research and Forecasting Model [9]. These numerical prediction models can achieve satisfactory performance in long-term wind speed prediction; however, such models require complex atmospheric information pertaining to pressure, temperature and other environmental factors and exhibit high computational complexity [10], [11].

Compared to physical models, statistical models are more widely used to forecast wind speed. Examples of such models include direct random time series models [12], autoregressive models [13] and autoregressive integrated moving average (ARIMA) models [14]. ARIMA models are regarded as a typical class of statistical models, and their prediction performance in short-term wind speed forecasting has been verified [15]. However, in general, the prediction performance of statistical models is flawed [16] because most statistical models are based on the assumption that the wind speed series is linear and stationary; real wind speed series rarely satisfy this assumption, and their nonlinear behavior cannot be fully captured by the abovementioned methods. Considering the nonlinear nature of wind speed series, artificial intelligence algorithms that are designed for effectively solving nonlinear problems, such as artificial neural networks (ANNs) [16] and support vector machines (SVMs) [18], are suitable for wind speed forecasting. A previous comparison of prediction performance has demonstrated that artificial intelligence algorithms are faster and more accurate than statistical models [19]. SVMs are commonly used in prediction frameworks, and they can outperform ANNs in certain cases. However, the performance of an SVM is limited by its penalty settings and kernel parameters; consequently, algorithms for tuning these hyperparameters are necessary [20]. For example, a genetic algorithm was employed to enhance the prediction results of an SVM in [21]; a reduced SVM with feature selection, trained using the particle swarm optimization algorithm, was used to optimize the parameters of an SVM in [22]; and the performance of an SVM was enhanced using the cuckoo search algorithm in [23]. In addition to SVMs, an increasing number of ANNs and their variants have been proposed for wind speed prediction. For instance, a backpropagation (BP) neural network was employed to forecast a wind speed series in [24], a combination of an ANN and Markov chains was proposed for forecasting in [25], and a functional network was utilized for multistep wind speed prediction in [26]. Furthermore, a fine-tuned long short-term memory (LSTM) neural network hybridized with the crow search algorithm, the wavelet transform and feature selection was applied for short-term wind speed forecasting in [27]. In [28], a hybrid model involving a causal convolutional network and a gated recurrent unit architecture was used in wind speed prediction.

In general, physical models and traditional statistical models both have several limitations pertaining to the precision and robustness of wind speed time series prediction, whereas artificial intelligence models can effectively overcome these problems to offer more powerful prediction performance. Based on these considerations, the dendritic neuron model (DNM), which was recently developed based on inspiration from biological neurons in vivo [29], is adopted in this study for the prediction of wind speed time series. In the DNM, synaptic nonlinearity is implemented in a dendritic structure to effectively solve linearly inseparable problems, and this model has been applied to a variety of complex continuous functions [30]–[32]. The original DNM was specifically designed for classification problems. By discarding the unnecessary synapses and dendritic branches in the DNM, diverse dendritic structures can be produced to pursue an extremely high classification speed for each task. Notably, however, the structure of the original DNM is extremely simple, and it
FIGURE 1 The neuron architecture of the DNR model (inputs X1, X2, …, Xn, synaptic layer, dendritic branches, and soma) and the six condition functions of the synaptic layer (direct condition, inverse condition, constant-0 connection, etc.).
neurons. These elements are distributed throughout the dendritic tree and possess various receptors for specific ions. Depending on the potential of the ions entering a receptor, the synapse changes its connection state and enters either an excitatory or inhibitory state [45]. The process of signal transmission can be described using the following equation:

y_{im} = \frac{1}{1 + e^{-k (w_{im} x_i - \theta_{im})}},  (1)

where x_i represents the i-th input feature, whose range is [0,1], with i ∈ {1, 2, …, I}; y_{im} is the output of the i-th synapse on the m-th dendritic branch, with m ∈ {1, 2, …, M}; k is a hyperparameter that is a positive constant; and w_{im} and θ_{im} are connection parameters that represent a weight and a threshold, respectively. To obtain the appropriate values for each problem, these connection parameters in a DNR model can be trained using a learning algorithm.

B. Dendrite Layer
Each branch in the dendrite layer receives the output signals from all synapses on that branch. The nonlinear relationship among these signals plays a key role in neural information processing for several sensory systems in biological networks, such as the visual and auditory systems [46], [47]; in DNR, this relationship can be expressed in terms of multiplication operations. Let Z_m represent the output of the m-th dendritic branch. The equation for a dendritic branch can be expressed as follows:

Z_m = \prod_{i=1}^{I} y_{im}.  (2)

C. Membrane Layer
The membrane layer combines all outputs from the dendrite layer through a summation operation. Let V represent the output of the membrane layer. The corresponding equation can be expressed as follows:

V = \sum_{m=1}^{M} (u_m \cdot Z_m),  (3)

where u_m represents the strength of the m-th dendritic branch. This value is constant and is always set to 1 for each branch in the original DNM to simplify the neural architecture and accelerate the computation process [29]. However, in reality, the thicknesses and signal transformation strengths of the dendritic branches vary; thus, using a uniform u_m value for all branches may degrade the regression ability of the model.

D. Cell Body (Soma)
The soma fires depending on whether the membrane potential exceeds a given threshold. This process can be mathematically described as a sigmoid operation on the product terms, as follows:

O = \frac{1}{1 + e^{-k (V - \theta_s)}},  (4)

where V represents the output of the membrane layer and k and θ_s are positive constant hyperparameters.

E. Connection Cases
On the right side of Fig. 1, the six functions of the synaptic layer are illustrated for various combinations of w_{im} and θ_{im}. According to these six different functions, the connection states of the synaptic layer can be divided into four main categories, defined as follows: constant 1 connections, whose parameters satisfy w_{im} < 0 < θ_{im} or 0 < w_{im} < θ_{im}, implying that regardless of the input, the output is always excitatory; constant 0 connections, whose parameters satisfy θ_{im} < w_{im} < 0 or θ_{im} < 0 < w_{im}, implying that regardless of the input, the output remains inhibitory; excitatory connections, whose parameters satisfy 0 < θ_{im} < w_{im}, implying that the input and output are directly correlated; and inhibitory connections, whose parameters satisfy w_{im} < θ_{im} < 0, implying that the input and output are inversely correlated.

F. Learning Algorithm
Because of the multiplication operations applied in the dendrite layer, the parameter space of the model appears to be extremely large and complicated. Additionally, weights are added to the output of each dendritic branch in the DNR model, leading to an increase in the dimensionality of the parameter space, which further increases the difficulty of optimizing the parameters. Consequently, it is difficult to perfectly train the parameters when using the traditional BP algorithm. Therefore, in this study, the SMS algorithm is adopted as a more suitable global optimization algorithm to optimize the DNR model. The SMS algorithm is an evolutionary algorithm that mimics the variation in the states of matter. Compared with the traditional BP algorithm, the SMS algorithm exhibits a higher search ability, is less likely to fall into local optima, and seldom results in overfitting. In this subsection, this algorithm is described in detail.
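To make Eqs. (1)-(4) concrete, here is a minimal NumPy sketch of one DNR forward pass. The tensor shapes, the value of k, and the soma threshold θ_s are illustrative assumptions, not the settings used in the paper.

import numpy as np

def dnr_forward(x, w, theta, u, k=5.0, theta_s=0.5):
    """x: (I,) inputs in [0,1]; w, theta: (I, M) synaptic weights/thresholds;
    u: (M,) dendritic branch strengths (fixed to 1 in the original DNM)."""
    y = 1.0 / (1.0 + np.exp(-k * (w * x[:, None] - theta)))  # Eq. (1), synapses
    z = np.prod(y, axis=0)                                   # Eq. (2), dendrites
    v = np.sum(u * z)                                        # Eq. (3), membrane
    o = 1.0 / (1.0 + np.exp(-k * (v - theta_s)))             # Eq. (4), soma
    return o

rng = np.random.default_rng(0)
I, M = 4, 3                          # 4 input features, 3 dendritic branches
x = rng.uniform(size=I)
w, theta = rng.normal(size=(I, M)), rng.normal(size=(I, M))
u = np.ones(M)                       # original DNM choice; DNR trains u instead
print(dnr_forward(x, w, theta, u))

Note that u is fixed to ones here to mimic the original DNM; as discussed above, DNR instead treats u as a trainable branch strength.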
The process of searching for the best solution in the SMS algorithm can be expressed as a series of physical motions among molecules, which mimic the state transformations of matter [38]. Specifically, the SMS algorithm can be divided into stages corresponding to the gas, liquid, and solid states of matter over the evolutionary process. Collision behavior occurs when two individuals are sufficiently close to each other.

For the time series {x(i); i = 1, 2, …, N}, P(x_l) and P(x_{l+τ}) are the probabilities that x_l and x_{l+τ}, respectively, appear in {x(i); i = 1, 2, …, N}. Based on these definitions, the MI entropy I(x_l, x_{l+τ}) for a time delay τ can be specified as follows:

I(\tau) = I(x_l, x_{l+\tau}) = H(x_l) + H(x_{l+\tau}) - H(x_l, x_{l+\tau}).  (17)

According to the MI algorithm, the value of τ at which I(τ) reaches a local minimum for the first time is taken as the final solution.

C. False Nearest Neighbors Algorithm
The FNN algorithm, which was proposed by Kennel in 1992 [48], is used to calculate the embedding dimensionality for a chaotic time series. This algorithm is based on the premise that a chaotic time series, such as a series of wind speed data, can be regarded as a set of continuously varying particles in a high-dimensional space mapped to a one-dimensional space. If the number of embedding dimensions is too small, the particles will be compressed and folded onto one another due to the insufficient extent of their spatial orbits, meaning that two adjacent points in the one-dimensional space may correspond to two particles separated by a large distance in the high-dimensional space. Two such adjacent particles are defined as false nearest neighbor points. In this case, the embedding dimensionality should be gradually increased to fully expand the spatial orbits. With an increase in the number of embedding dimensions, the particles are expected to gradually separate, and the number of false nearest neighbor points is expected to gradually decrease. Once the embedding dimensionality is set to a sufficient value such that all false nearest neighbor points are eliminated, the corresponding solution is considered to be the optimal solution.

Suppose that z_i(m) = (y(i), y(τ+i), …, y((m-1)τ+i)) is a vector in an m-dimensional phase space and that z_j(m) is the nearest neighbor point of z_i; the distance between z_i(m) and z_j(m) can be calculated using the following equation:

R_i(m) = \| z_i(m) - z_j(m) \|.  (18)

This distance changes as the number of dimensions increases. Thus, the updated distance between z_i(m+1) and z_j(m+1) can be expressed as follows:

R_i^2(m+1) = R_i^2(m) + \| z_i(i+m\tau) - z_j(j+m\tau) \|^2.  (19)

R_x = \frac{\| z_i(i+m\tau) - z_j(j+m\tau) \|}{R_i(m)}.  (20)

R_x is compared against a given threshold, whose range is set to [10, 50]. If R_x is larger than this threshold, the points are determined to be false nearest neighbor points. For a real-world chaotic time series, such as a wind speed series, the initial number of embedding dimensions is usually set to 2. Once the proportion of false nearest neighbor points is less than 5%, the corresponding number of embedding dimensions is selected as the final solution. In certain extreme cases, the proportion of false nearest neighbor points cannot decrease to 5%; in such a case, the critical dimensionality at which the number of false nearest neighbor points stops decreasing is considered the final solution.

D. Maximum Lyapunov Exponent
Before a machine learning technique is employed to forecast the future wind speed, the chaotic characteristics of the historical wind speed series must be confirmed. Lyapunov exponents are commonly used for this purpose. In 1983, it was proven that if at least one of the Lyapunov exponents in a dynamical system is positive, the corresponding time series can be considered chaotic [50]. Therefore, the maximum Lyapunov exponent of the time series needs to be calculated. If this exponent exceeds zero, the time series is considered to have chaotic characteristics. Among the various approaches for calculating Lyapunov exponents, Wolf's method [51], which is based on the phase space reconstruction theory of Takens, is considered to be the most effective.

For a series X(t) = (x(t), x(τ+t), …, x((m-1)τ+t)) reconstructed using Takens' theory, as described above, the distance L_i between X(t_i) and the closest point X(t_n) can be expressed as follows:

L_i = \| X(t_n) - X(t_i) \|.  (21)

According to Wolf's method, the maximum Lyapunov exponent can be calculated as

\lambda_{\max} = \frac{1}{t_M - t_0} \sum_{i=0}^{M} \ln \frac{L_i}{L'_i},  (22)

where t_0 and t_M represent the initial time and the final time, respectively, and M = N - (m-1)\tau. In addition, the interval for time series prediction can be calculated as follows [52], [53]:

\Delta t = \frac{1}{\lambda_{\max}}.  (23)
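The following Python sketch strings the pieces together: Takens reconstruction with a delay τ and dimension m (which the MI and FNN steps above would supply), a nearest-neighbor divergence estimate of λ_max in the spirit of Eqs. (21)-(22), and the prediction interval of Eq. (23). Wolf's full procedure also renormalizes and replaces the neighbor as the separation grows; that refinement is omitted here, so this is only an approximation.

import numpy as np

def reconstruct(x, m, tau):
    """Takens reconstruction: row t is X(t) = (x(t), x(t+tau), ..., x(t+(m-1)tau))."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(m)])

def lyapunov_max(x, m, tau, dt=1.0):
    X = reconstruct(x, m, tau)
    n = len(X)
    logs = []
    for i in range(n - 1):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                               # exclude the point itself
        j = int(np.argmin(d[:n - 1]))               # nearest neighbor, cf. Eq. (21)
        L = d[j]
        L1 = np.linalg.norm(X[i + 1] - X[j + 1])    # separation one step later
        if L > 0 and L1 > 0:
            logs.append(np.log(L1 / L))
    return float(np.mean(logs)) / dt                # average log growth, cf. Eq. (22)

x = np.sin(0.3 * np.arange(2000)) + 0.1 * np.random.default_rng(1).normal(size=2000)
lam = lyapunov_max(x, m=4, tau=2)
print(lam, "prediction interval ~", 1.0 / lam if lam > 0 else "n/a")  # Eq. (23)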
[Flowchart: the wind speed time series is tested for chaotic characteristics; if it is estimated to be chaotic, the maximum Lyapunov exponent is calculated; otherwise, the procedure ends.]
curve of a prediction algorithm and the original curve. If R is equal to 1, the predicted curve fits the original curve perfectly.
(2) Nonparametric Statistical Test: This test detects whether a significant difference exists between the proposed method and another algorithm. In this study, the Wilcoxon rank sum test [56], [57] was conducted using the KEEL software [58]. The significance level was set to 5%. If the p-value of the Wilcoxon rank sum test was less than 5%, a significant difference in performance was considered to exist between the two algorithms being compared.
(3) Relative Graphs: Graphs related to an experiment enable more intuitive observation of the experimental results. The following graphs were generated in this study. First, fit graphs and correlation coefficient graphs were plotted to visualize how well the predicted results matched the target values. Second, a convergence graph was produced to illustrate the speed and stability of the proposed model during the convergence process.

C. Performance Comparison and Analysis
Following the approaches described above, the time delays, embedding dimensionalities and maximum Lyapunov exponents of the wind speed series at different time scales were calculated, and the results are presented in Table II. The maximum Lyapunov exponents presented in Table II imply that the two wind speed time series are chaotic. Next, phase space reconstruction was conducted based on the time delay and embedding dimensionality.

FIGURE 3 Wind speed curves for long-term and short-term predictions.

TABLE II Results for the time delays, embedding dimensionalities and maximum Lyapunov exponents of the wind speed series at different time scales.

TIME INTERVAL   TIME DELAY (τ)   EMBEDDING DIMENSIONALITY (m)   MAXIMUM LYAPUNOV EXPONENT (λmax)   CHAOTIC
ONE DAY         2                4                              0.1181                             YES
TEN MIN         5                7                              0.0330                             YES

TABLE III Parameter settings of algorithms for predicting wind speed time series.

ALGORITHM   PARAMETERS
MLP         HIDDENLAYER = 10, LEARNINGRATE = 0.01, EPOCH = 1000
ENN         LEARNINGRATE = 0.01, EPOCH = 1000
SVR         LINEAR, POLYNOMIAL, RBF AND SIGMOID KERNELS
LSTM        HIDDENUNITS = 200, EPOCH = 1000
DNM-BP      LEARNINGRATE = 0.01, EPOCH = 1000
EDNM        POPSIZE = 100, EPOCH = 1000
DNR-BP      LEARNINGRATE = 0.01, EPOCH = 1000
DNR-SMS     POPSIZE = 100, EPOCH = 1000
MSE and RMSE were calculated using the normalized data, whereas MAE and R were calculated using the raw data. Tables IV and V show that for most algorithms, the prediction results for both the long-term and short-term wind speed time series with a partition ratio of 4:1 were superior to those for a ratio of 1:1; moreover, the prediction results for a partition ratio of 9:1 were inferior to those for 4:1. These findings suggest that merely increasing the number of

FIGURE 4 Convergence curves of the DNR model for the experimental wind speed time series (loss vs. learning epoch; curves for the ten-minute series with partition ratios of 1:1, 4:1, and 9:1).
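Following the convention just stated, MSE and RMSE are computed on normalized data while MAE and R use the raw series. The sketch below assumes min-max normalization over the target series; this scaling choice is our assumption, since the paper's exact normalization is not shown in this excerpt.

import numpy as np

def metrics(y_true, y_pred):
    t, p = np.asarray(y_true, float), np.asarray(y_pred, float)
    lo, hi = t.min(), t.max()
    tn, pn = (t - lo) / (hi - lo), (p - lo) / (hi - lo)  # shared min-max scale
    mse = np.mean((tn - pn) ** 2)                        # on normalized data
    rmse = np.sqrt(mse)                                  # on normalized data
    mae = np.mean(np.abs(t - p))                         # on raw data
    r = np.corrcoef(t, p)[0, 1]                          # R = 1 means a perfect fit
    return mse, rmse, mae, r

print(metrics([5.0, 7.2, 6.1, 8.3], [5.4, 7.0, 6.5, 7.9]))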
FIGURE 5 Training and prediction results and predicted correlation coefficients of DNR-SMS for the wind speed time series with an interval of one day and partition ratios of (a) 1:1, (b) 4:1, and (c) 9:1.
FIGURE 6 Training and prediction results and predicted correlation coefficients of DNR-SMS for the wind speed time series with an interval of ten minutes and partition ratios of (a) 1:1, (b) 4:1, and (c) 9:1.
TABLE VI Long-term prediction performance of DNR models trained using different evolution algorithms for the experimental wind speed time series.

ALGORITHM     MSE (MEAN±STD)      P-VALUE    RMSE (MEAN±STD)     P-VALUE    MAE (MEAN±STD)      P-VALUE    R (MEAN±STD)
DNR-GA        1.91E-02±1.55E-03   1.01E-06   1.38E-01±5.50E-03   1.01E-06   1.84E+00±1.16E-01   4.89E-06   5.02E-01±5.98E-02
DNR-PSO       1.73E-02±6.64E-04   1.01E-02   1.32E-01±2.50E-03   1.06E-02   1.72E+00±1.73E-02   3.48E-01   5.46E-01±1.68E-02
DNR-JADE      1.74E-02±6.89E-04   9.13E-07   1.32E-01±2.59E-03   9.12E-07   1.73E+00±4.62E-02   9.12E-07   5.46E-01±2.10E-02
DNR-L-SHADE   1.81E-02±2.15E-03   1.46E-02   1.34E-01±7.48E-03   1.54E-02   1.76E+00±1.08E-01   8.41E-02   5.27E-01±6.00E-02
DNR-SMS       1.69E-02±2.04E-04   -          1.30E-01±7.85E-04   -          1.72E+00±1.07E-02   -          5.53E-01±5.73E-03

DNR-GA        1.91E-02±1.43E-03   9.13E-07   1.38E-01±5.03E-03   9.13E-07   1.79E+00±9.18E-02   1.01E-06   4.66E-01±5.43E-02
DNR-PSO       1.74E-02±4.84E-04   1.50E-02   1.32E-01±1.82E-03   1.54E-02   1.69E+00±1.97E-02   6.67E-04   5.06E-01±1.40E-02
DNR-JADE      1.78E-02±1.01E-03   2.65E-05   1.34E-01±3.66E-03   2.54E-05   1.72E+00±3.98E-02   7.13E-06   5.00E-01±3.17E-02
DNR-L-SHADE   1.85E-02±1.66E-03   1.03E-05   1.36E-01±5.93E-03   1.13E-05   1.73E+00±6.58E-02   7.13E-06   4.81E-01±4.08E-02
DNR-SMS       1.71E-02±1.84E-04   -          1.31E-01±7.05E-04   -          1.67E+00±9.46E-03   -          5.14E-01±6.22E-03

DNR-GA        2.46E-02±2.20E-03   2.25E-06   1.57E-01±6.81E-03   2.25E-06   1.97E+00±1.05E-01   3.66E-06   4.34E-01±6.40E-02
DNR-PSO       2.35E-02±2.80E-03   2.27E-03   1.53E-01±8.46E-03   2.27E-03   1.91E+00±6.54E-02   5.51E-05   4.66E-01±5.01E-02
DNR-JADE      2.30E-02±1.19E-03   8.25E-04   1.52E-01±3.91E-03   9.49E-04   1.90E+00±4.89E-02   1.86E-03   4.82E-01±2.57E-02
DNR-L-SHADE   2.35E-02±1.49E-03   2.13E-05   1.53E-01±4.80E-03   2.13E-05   1.90E+00±4.06E-02   6.52E-05   4.67E-01±3.16E-02
DNR-SMS       2.21E-02±3.55E-04   -          1.49E-01±1.20E-03   -          1.86E+00±1.35E-02   -          4.94E-01±9.41E-03
A Self-Adaptive Mutation Neural Architecture Search Algorithm Based on Blocks

A. Framework of SaMuNet
The framework of SaMuNet is shown in Algorithm 1. First, a population with size N is initialized. Then, each individual in the initialized population is evaluated. After that, the architecture of the CNN is evolved by the crossover and self-adaptive mutation operators. A new population will be generated by the environment selection operator. This process continues until the maximum generation number is reached. Finally, the best generated individual is decoded into a CNN with the best architecture. In the following parts, the encoding strategy, population initialization, individual evaluation, crossover operators, self-adaptive mutation mechanism, and the proposed environment selection operator are introduced.

FIGURE 3 An example of the DenseNet block with three convolutional layers.

TABLE I Notations and corresponding descriptions.

NOTATION   DESCRIPTION                     NOTATION   DESCRIPTION
N          Population size                 P          Current population
nIter      Maximal iteration               Q          Offspring population
m          Crossover probability           nsflag     Success matrix in a generation
n          Mutation probability            nfflag     Fail matrix in a generation
Dtrain     Train datasets                  S          Success matrix in defined generation
Dvalid     Validation datasets             F          Fail matrix in defined generation
p1, p2     Two parent individuals          COGS       Candidate offspring generation strategy
q1, q2     Two offspring individuals       Ns         Number of strategies
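Algorithm 1 itself is not reproduced in this excerpt; the sketch below is a generic Python rendering of the loop just described, with stub operators. Plain truncation stands in for the semi-complete binary competition introduced later, and all names and the toy bit-string encoding are our own illustrative choices.

import random

def truncate(pop, fit, off, off_fit):
    # plain truncation; SaMuNet instead uses semi-complete binary competition
    ranked = sorted(zip(pop + off, fit + off_fit), key=lambda t: -t[1])[:len(pop)]
    return [ind for ind, _ in ranked], [f for _, f in ranked]

def samunet_loop(init, evaluate, crossover, mutate, select, n=10, n_iter=5):
    pop = [init() for _ in range(n)]              # initialize population of size N
    fit = [evaluate(ind) for ind in pop]          # individual evaluation
    for _ in range(n_iter):                       # until the maximum generation
        off = []
        while len(off) < n:
            p1, p2 = random.sample(pop, 2)
            off.extend(crossover(p1, p2))         # crossover operator
        off = [mutate(ind) for ind in off]        # self-adaptive mutation
        pop, fit = select(pop, fit, off, [evaluate(ind) for ind in off])
    return max(zip(pop, fit), key=lambda t: t[1])[0]  # best individual to decode

best = samunet_loop(
    init=lambda: [random.randint(0, 1) for _ in range(8)],
    evaluate=sum,                                 # stub for validation accuracy
    crossover=lambda a, b: ([*a[:4], *b[4:]], [*b[:4], *a[4:]]),
    mutate=lambda ind: [b ^ (random.random() < 0.1) for b in ind],
    select=truncate,
)
print(best)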
F. Self-Adaptive Mechanism Strategy Based on Blocks
The mutation operator usually explores the search space to obtain individuals with promising performance. The proposed algorithm uses three types of mutation methods, i.e., adding, removing, and replacing. Adding refers to adding a unit at the selected location of the individual, removing refers to removing the current unit at the designated location of the individual, and replacing refers to randomly replacing the unit at the selected location of the individual with a unit of a new type.

The self-adaptive mechanism has been widely used in the evolutionary computation community [24], [34]. In this paper, a self-adaptive mutation strategy pool with three candidate offspring generation strategies (COGSs) is proposed. This mechanism enables the algorithm to adaptively find mutation operators that are more suitable for the current task. Thus, it can determine a more suitable generation strategy to match the evolution process at different stages.

The self-adaptive mutation operator mechanism is described as follows. First, each COGS has an initial probability 1/Ns, where Ns represents the number of COGSs in the strategy pool. Let Pq represent the probability of the qth strategy being selected (q = 1, 2, …, Ns). Next, the roulette strategy is used to select a COGS. Through this strategy, a new individual is generated, and it is compared with its parent. The result is recorded in the initially zero matrices nsflag_{i,s} and nfflag_{i,s} (i = 1, 2, …, N, s = 1, 2, …, Ns), where N represents the number of individuals in the population. After a generation, the sums of each column of nsflag_{i,s} and nfflag_{i,s} are calculated and recorded in two new matrices S_{k,q} and F_{k,q} (k = 1, …, Ng, q = 1, …, Ns, where Ng represents the update period). S records the number of successes of the qth strategy in the kth generation, and F records the number of failures of the qth strategy in the kth generation. After a generation, both nsflag and nfflag are reset.

After Ng generations, the total number of successes and failures of all COGSs is calculated, and the probability of each COGS is updated as follows:

S^1_q = \sum_{k=1}^{N_g} S_{k,q}  (1)

S^2_q = \begin{cases} m, & \text{if } S^1_q = 0 \\ S^1_q, & \text{otherwise} \end{cases}  (2)

where q represents the strategy in the qth strategy pool, and m is set to 0.0001 to prevent the probability of some strategies from dropping to zero. S^3_q represents the new probability of the qth COGS after the update; in addition, this probability needs to be normalized by Equation (4). The above equations are used to update the probability of each COGS according to the performance of the offspring. Obviously, the better the performance of a COGS, the higher the probability that it will be selected during the entire evolutionary process.

In Algorithm 5, the implementation of the self-adaptive mutation strategy is shown in detail. First, three operations are added to the strategy pool (line 1). Then, a random number is generated to determine whether to perform mutation operations. If a mutation occurs, a block operation is selected to generate new offspring. The information of the generated offspring is used to update the probability of the COGS. Finally, the mutated offspring population is output.

G. Environmental Selection Based on Semi-Complete Binary Competition
In the environmental selection stage, suitable individuals need to be selected from the population to serve as the next generation population. The simplest way to accomplish this is to select the individual with the highest fitness value from the population each time, but this will cause the population to lose diversity and fall into a local optimum [48], [49].
For NAS applications, researchers have proposed many selection strategies. Real et al. [50] used aging evolution to

Algorithm 4 Crossover operator of SaMuNet.
Input: Two parents p1, p2, the crossover probability m.
Output: Two offspring q1, q2.
1: Randomly generate a number v in the range [0,1];
2: if v < m then
3:   Select two parents p1, p2 from Pt by binary tournament selection;
4:   Randomly generate two integer numbers to separate p1, p2 into two parts, respectively;
5:   Combine the first part of p1 and the second part of p2;
6:   Combine the first part of p2 and the second part of p1;
7: else
8:   q1 ← p1;
9:   q2 ← p2;
10: end if
11: Output two offspring q1, q2;
specific function and type constraint. The input layer takes the original features as inputs, which are the terminals, and the output layer returns the constructed features for fault diagnosis. The tree depths of the input and output layers are one, and those of the feature construction and feature combination layers are automatically adjusted according to the given task. Figure 4 shows the program structure and an example program of MFCGPE, respectively. This example program represents a feature vector containing three constructed features. The operators and the terminals are described in detail in the following subsections.

C. Function Set
The function set of MFCGPE is composed of two types of operators, i.e., the feature construction operators and the feature combination operators. Table I lists the detailed information of the function set. Four arithmetic operators, including +, −, ×, and ÷, are used for feature construction, where ÷ is protected by returning 0 if the divisor is 0. The inputs of the +, −, ×, and ÷ operators are two features, and their output is a new feature. It should be noted that their

FIGURE 3 Overall structure of the MFCGPE fault diagnosis approach (Step 2: multi-view feature construction, in which GP1, GP2, and GP3 are evolved to obtain a best model per view; Step 3: ensemble diagnosis, in which classifiers trained on the training and test sets with the constructed features are combined by majority voting to yield the final diagnosis).
An effective fitness function is crucial for guiding GP to construct high-level features for fault diagnosis. When the number of training samples is small, the features constructed by GP can easily achieve good classification performance on the training set, but poor generalization performance on the test set. To address this issue, a new fitness function based on the classification accuracy and the distance measure of the training samples is proposed to guide the search of MFCGPE. To calculate the diagnosis accuracy, KNN is used to perform classification based on the constructed features. The reason for using KNN is that it is a simple classification algorithm, easy to implement, and treats each feature equally without any feature weighting or selection [22], [52]. With the use of KNN, MFCGPE can automatically construct discriminative features and avoid redundant or irrelevant features. The distance measure is to minimize the intra-class distance and maximize the inter-class distance of the training samples based on the constructed features. In the fitness function, Acc represents the diagnosis accuracy of KNN using the constructed features and Dist represents the distance measure of the training set; the average value over the three sub-test sets is taken as the diagnosis accuracy.

The Dist is calculated according to Equations (2)-(5), which evaluate the distance of the training samples with the constructed features. The calculation of Dist is based on the Euclidean distance. For the given samples X_k = {x_{k1}, x_{k2}, …, x_{kn}} and X_l = {x_{l1}, x_{l2}, …, x_{ln}}, the Euclidean distance between them can be calculated as Equation (2), where x_{ko} and x_{lo} represent the features of samples X_k and X_l, respectively, and n is the number of features. The calculation of the intra-class and inter-class distances is based on Equation (3) and Equation (4), where X represents the samples in one single class, and N is the number of samples in the X class. Equation (5) is the calculation of Dist, where the sigmoid function is used to transform the difference:

d(X_k, X_l) = \sqrt{\sum_{o=1}^{n} (x_{ko} - x_{lo})^2}  (2)

D_{intra}(X_i) = \frac{1}{N_i N_i} \sum_{k=1}^{N_i} \sum_{l=1}^{N_i} d(X_k^{(i)}, X_l^{(i)})  (3)

D_{inter}(X_i, X_j) = \frac{1}{N_i N_j} \sum_{k=1}^{N_i} \sum_{l=1}^{N_j} d(X_k^{(i)}, X_l^{(j)})  (4)

Dist = \frac{1}{1 + e^{-(\min(D_{inter}) - \max(D_{intra}))}}  (5)

The proposed fitness function optimizes the classification accuracy and the distances of the training samples simultaneously. When the fitness value approaches one, it indicates that the features constructed by GP have the best classification performance and the training samples have a small intra-class distance and a large inter-class distance.

TABLE III Terminal set corresponding to View2 (symbols F1-F12 with their defining formulas).

F. Ensemble for Fault Diagnosis
The MFCGPE system is able to find the three best GP trees/programs that
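Returning to the fitness function above, the distance term Dist of Eqs. (2)-(5) can be sketched directly in NumPy; the toy feature matrix and labels are our own example.

import numpy as np

def dist_measure(X, y):
    """X: (n_samples, n_features) constructed features; y: class labels."""
    classes = np.unique(y)
    intra, inter = [], []
    for a in classes:
        Xa = X[y == a]
        # Eq. (3): mean pairwise Euclidean distance within class a
        intra.append(np.mean(np.linalg.norm(Xa[:, None] - Xa[None], axis=-1)))
        for b in classes:
            if a < b:
                Xb = X[y == b]
                # Eq. (4): mean pairwise distance between classes a and b
                inter.append(np.mean(np.linalg.norm(Xa[:, None] - Xb[None], axis=-1)))
    # Eq. (5): a large min inter-class gap and a small max intra-class
    # spread push Dist toward 1
    return 1.0 / (1.0 + np.exp(-(min(inter) - max(intra))))

X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
y = np.array([0, 0, 1, 1])
print(dist_measure(X, y))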
j=1
Amplitude (m/s2)
ORF REF
12,000 Hz. Similar to the NCEPU 400 500
dataset, the first 102,400 data points 0 0
of vibration signals under each run- –400 –500
ning condition are divided into 50 0 0.08 0.16 0 0.08 0.16
samples on average. Five samples of IOCF ROCF
400 200
each running condition are ran-
0 0
domly selected to form the training
–400 –200
set, and the remaining samples are 0 0.08 0.16 0 0.08 0.16
used as the test set. Table V lists the Time (s) Time (s)
detailed information of the CWRU
dataset. Figure 9 shows the time FIGURE 7 Time domain waveform of vibration signals under six running conditions in NCEPU.
domain waveform of vibration sig-
nals under ten running conditions domain waveform of vibration sig- sification. The TDF and FDF features
in CWRU. nals under four running conditions have been described in Section III-D.
3) X JTU [55]: It is a run-to-failure in XJTU. The numbers of features in MDF,
bearing fault dataset collected by MMSDE, and IMDE are 37, 20, and 20,
Xi’an Jiaotong University (XJTU), B. Comparison Methods respectively. The comparisons aim to
in which the reason for fault occur- To verify the effectiveness of MFCGPE, investigate whether the features con-
rence is the natural damage as the four categories of competitive methods, structed by MFCGPE are more effective
running time increases. Figure 10 i.e., 19 methods, are employed for com- for fault diagnosis than these manually
shows the used test rig of XJTU. In parisons. The first category includes five crafted features.
the experiments, the vibration sig- different classification algorithms, i.e., The third category contains six GP
nals of the test bearings are collect- KNN, SVM, Naive Bayes (NB), logistic based methods using TDF, FDF, and
ed with a sampling frequency of regression (LR), and multilayer percep- TFDF (described in Section III-D) as
25,600 Hz until the bearing fails, tron (MLP), which use raw signals
and the time interval for each sam- amplitude to train the classifiers for fault
pling is one minute. The vibration classification. The second category are
signals of the last two recorded data KNN using five manually crafted fea-
file of the Bearing 2_1, Bearing tures, i.e., TDF, FDF, multi-domain fea-
2_2, and Bearing 2_3 datasets are tures (MDF) [25], modified multi-scale
used as the analyzed data. The fault symbolic dynamic entropy (MMSDE)
types are inner ring fault (IRF), [14], and improved multi-scale disper-
outer ring fault (ORF), and cage sion entropy (IMDE) [15], for fault clas- FIGURE 8 CWRU test rig.
fault (CF). The signals of the first
two recorded data file of the Bear-
TABLE V Description of the CWRU dataset.
ing 2_1 dataset is used as the nor-
mal (NOR) vibration signals for CLASS DEFECT TRAINING TEST
LABEL RUNNING CONDITION SIZE (IN) SAMPLES SAMPLES
analysis. Every vibration signal
contains 32,768 data points and is 1 Normal 0 5 45
divided into 32 samples on average 2 Inner ring fault 0.007 5 45
for conducting the experiments. 3 Inner ring fault 0.014 5 45
Each sample contains 2,048 data 4 Inner ring fault 0.021 5 45
points. Five samples under each 5 Outer ring fault 0.007 5 45
running condition are randomly
6 Outer ring fault 0.014 5 45
selected to form the training set,
7 Outer ring fault 0.021 5 45
and the remaining samples are used
as the test set.Table VI lists the 8 Rolling element fault 0.007 5 45
The Wilcoxon rank-sum test with a 5% significance level is employed to evaluate the significant difference in performance improvement of MFCGPE compared to a method. In Table VII, the "+" symbol indicates that the performance of MFCGPE is significantly better than that of the comparison method. The summary of

FIGURE 11 Time domain waveform of vibration signals under four running conditions in XJTU (NOR, IRF, ORF, and CF).
TABLE VII Diagnosis accuracy (%) of MFCGPE and the comparison methods on the NCEPU, CWRU, and XJTU datasets.
FIGURE 12 Example trees evolved by MFCGPE on the NCEPU dataset. (a) Best tree obtained under View1. (b) Best tree obtained under View2. (c) Best tree obtained under View3. (The trees combine terminals such as T2, T5, and T10 and features such as F5, F8, and F11 using +, −, ×, ÷ and the feature combination operators FC2 and FCm.)
MFCGPE constructs multi-view features and gains a higher diagnosis accuracy by using an ensemble built from the features constructed from different views.

VII. Conclusions
The goal of this paper was to develop a new GP-based approach to achieving effective fault type diagnosis of rolling bearings using a small number of training samples. This goal has been successfully achieved by developing the MFCGPE approach. A new GP program structure, a function set, and a terminal set were developed to enable MFCGPE to construct a flexible number of features.

FIGURE 14 Diagnosis results on the NCEPU dataset using the constructed features of different views and the constructed ensemble. (a) Diagnosis results using the features constructed on View1. (b) Diagnosis results using the features constructed on View2. (c) Diagnosis results using the features constructed on View3. (d) Diagnosis results using the constructed ensemble. (Each panel plots true and predicted class labels against sample number.)
by O. A. Ibrahim, J. M. Keller, and J. C. Bezdek, IEEE Transactions on Emerging Topics in Computational Intelligence, Vol. 5, No. 2, April 2021, pp. 262–273.
Digital Object Identifier: 10.1109/TETCI.2019.2909521
"Dunn's internal cluster validity index is used to assess partition quality and identify a "best" crisp c-partition of n objects built from static data sets. This index is quite sensitive to inliers and outliers in the input data, so a subsequent study developed a family of 17 generalized Dunn's indices that extend and improve the original measure in various ways. This paper presents online versions of two modified generalized Dunn's indices that can be used for the dynamic evaluation of an evolving (cluster) structure in streaming data. We argue that this method is a good way to monitor the ongoing performance of streaming clustering algorithms, and we illustrate several types of inferences that can be drawn from such indices. Streaming clustering algorithms are incremental, process incoming data points only once and then discard them, adapt as the data stream evolves, flag outliers, and most importantly, spawn new emerging structures. We compare the two new indices to the incremental Xie-Beni and Davies-Bouldin indices, which to our knowledge offer the only comparable approach, with numerical examples on a variety of synthetic and real datasets."

IEEE Transactions on Artificial Intelligence

Procedural Memory Augmented Deep Reinforcement Learning, by Y. Ma, J. Brooks, H. Li, and J. C. Principe, IEEE Transactions on Artificial Intelligence, Vol. 1, No. 2, Oct 2020, pp. 105–120.
Digital Object Identifier: 10.1109/TAI.2021.3054722
"Inspired by the human brain, we propose an external memory-augmented decision-making architecture for video processing. A self-organizing object detector is employed as a frontend to deconstruct the environment. This is done by extracting events from the flow of time and detecting objects within the frames. By employing an extra working memory where objects are temporarily stored, the system can extract properties of the stored objects related to the task. We propose a deep reinforcement learning (RL) neural network to learn affordances, i.e., a sequence of actions to manipulate these objects. The RL network and object detector are trained alternatively. After both the network and detector are trained, the objects and their affordances are transferred to an external memory. They are then utilized when the same objects are detected in input frames. Here, we use a combination of a dictionary and a linked list for the external memory that can be accessed by either content or temporal order. This dual access is motivated by the temporal property of human procedural memory. The proposed memory-augmented RL framework brings advantages of transferability, explainability and computational efficiency with respect to conventional deep learning architectures. We validate the framework on the video game Super Mario Brothers to show superiority to some classical deep RL architectures and exemplify these three advantages."
IEEE Computational Intelligence Society Publications

IEEE Transactions on Neural Networks and Learning Systems. Editor-in-Chief: Haibo He. Impact Factor: 8.793; Eigenfactor: 0.04821; Article Influence Score: 2.584. Ranking (JCR 2019): #3 Computer Science—Theory & Methods, #3 Computer Science—Artificial Intelligence, #3 Computer Science—Hardware & Architecture.

IEEE Transactions on Fuzzy Systems. Editor-in-Chief: Jonathan Garibaldi. Impact Factor: 9.518; Eigenfactor: 0.02514; Article Influence Score: 2.203. Ranking (JCR 2019): #7 Computer Science—Artificial Intelligence, #10 Computer Science—Electrical and Electronic Engineering.

IEEE Transactions on Evolutionary Computation. Editor-in-Chief: Carlos A. Coello Coello. Impact Factor: 11.169; Eigenfactor: 0.0117; Article Influence Score: 3.059. Ranking (JCR 2019): #2 Computer Science—Theory & Methods.

IEEE Computational Intelligence Magazine. Editor-in-Chief: Chuan-Kang Ting. Impact Factor: 9.083; Eigenfactor: 0.00277; Article Influence Score: 2.468. Ranking (JCR 2019): #9 Computer Science—Artificial Intelligence.

IEEE Transactions on Emerging Topics in Computational Intelligence. Editor-in-Chief: Yew Soon Ong. Indexing: Scopus. Publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.

IEEE Transactions on Artificial Intelligence. Editor-in-Chief: Hussein Abbass. Sole transactions for all AI topics. Uniquely communicates AI benefits for every paper using published open-access impact statements. DBLP indexed.

IEEE Transactions on Cognitive and Developmental Systems. Editor-in-Chief: Yaochu Jin. Impact Factor: 2.667; Eigenfactor: 0.00123; Article Influence Score: 0.694. Focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems.

IEEE Transactions on Games. Editor-in-Chief: Julian Togelius. Impact Factor: 1.886; Eigenfactor: 0.00019; Article Influence Score: 0.449. Publishes original high-quality articles covering scientific, technical, and engineering aspects of games.

IEEE Press Books, Computational Intelligence Series. Editor-in-Chief: David Fogel; CIS Liaison: Alice Smith. The IEEE Press Series on Computational Intelligence includes books on neural, fuzzy, and evolutionary computation, and related technologies, of interest to the engineering and scientific communities.

Additionally, IEEE CIS technically co-sponsors the following other titles: the IEEE Transactions on Smart Grid, the IEEE Transactions on Big Data, the IEEE Transactions on Nanobioscience, the IEEE Transactions on Information Forensics and Security, and the IEEE Transactions on Affective Computing.