
Buffer-based Adaptive Fuzzy Classifier

SAJAL DEBNATH

BACHELOR OF COMPUTER SCIENCE & ENGINEERING

UNIVERSITY OF BARISHAL

DECLARATION OF THESIS AND COPYRIGHT

Author’s Full Name : Sajal Debnath

ID : 16CSE025

Date of Birth : 19-09-1997

Title : Buffer-based Adaptive Fuzzy Classifier

Academic Session : 2015-2016

Certified by:

(Student’s Signature)
Sajal Debnath
Date: 29 December 2021

(Supervisor’s Signature)
Dr. Md Manjur Ahmed
Date: 29 December 2021
SUPERVISOR’S DECLARATION

I hereby declare that I have checked this thesis/project and, in my opinion, this
thesis/project is adequate in terms of scope and quality for the award of the degree of
Bachelor of Computer Science & Engineering.

Full Name : Dr. Md Manjur Ahmed


Position : Assistant Professor and Chairman of Computer Science & Engineering
Department.
Date : 29 December 2021
STUDENT’S DECLARATION

I, Sajal Debnath, declare that this thesis titled, Buffer-based Adaptive Fuzzy Classifier, and
the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at
this University.
• Where any part of this thesis has previously been submitted for a degree or any
other qualification at this University or any other institution, this has been clearly
stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the
exception of such quotations, this thesis is entirely my own work.

(Student’s Signature)
Full Name : SAJAL DEBNATH

Number : 16CSE025
Date :
Buffer-based Adaptive Fuzzy Classifier

SAJAL DEBNATH

Thesis submitted in fulfillment of the requirements


for the award of the degree of
Bachelor of Computer Science & Engineering

Faculty of Science and Technology


UNIVERSITY OF BARISHAL

29 December 2021
Acknowledgements
First, I thank the Almighty for His blessings, which allowed me to proceed with my
work without inconvenience; I feel exceptionally blessed by the way He has smoothed
the path for me. I would like to express my sincere gratitude to my supervisor, Dr. Md
Manjur Ahmed, for his germinal ideas, continuous encouragement, invaluable guidance,
and support in making this research possible. I am sincerely thankful for the time he
spent proofreading and correcting my mistakes; no thanks are enough for all his help
with the research directions and experimentation. I am extremely grateful to my beloved
parents and family for their continued support, advice, and motivation, and to all my
friends who kept me company throughout. Most of all, I would like to thank the
University of Barishal and the Department of Computer Science and Engineering for
allowing me to write an honors thesis.

Abstract
Owing to the technological revolution, heterogeneous sources are generating data streams
at a high rate, and online classification of these data facilitates data mining and data
analysis. Among such classifiers, Fuzzy-System-Based (FSB) classifiers have made
remarkable contributions and shown dominant efficiency with their antecedent-consequent
structure. Mamdani and Takagi-Sugeno type structures always use the same antecedent
part with fuzzy sets, which are themselves defined by parameterized scalar membership
functions combined through logical AND/OR operations. These membership functions are
determined either by experts or from data. Moreover, most FSB classification algorithms
ignore temporarily irrelevant data points or data-clouds that may become relevant in
the future. Here, we develop a novel data-cloud-based classification algorithm for
stream data classification, namely the Buffer-based Adaptive Fuzzy Classifier (BAFC).
The offline training stage of this algorithm identifies data-clouds from a static dataset
to construct AnYa type fuzzy rules, which gives it the capability to cope with the
dynamic nature of stream data. In the online, one-pass training stage, it updates its
rule base by creating and merging data-clouds based on their potential area. The
algorithm also introduces a recursive formula for calculating data-cloud density,
together with a buffer used to store temporarily irrelevant data-clouds, and an online
pruning scheme for data-clouds that avoids storage problems. This approach solves the
parameterization and redundant-rule-base problems of other streaming data classification
algorithms (for sensor data, bank transactions, intruder detection, images, video, stock
market prediction, disease prediction, etc.). The two-stage algorithm has been evaluated
on several benchmark datasets and proves its superiority over several well-established
classifiers.
Publications
1. “Buffer-based Adaptive Fuzzy Classifier,” IEEE Transactions on Fuzzy Systems,
2021. (Submitted)
Contents

Acknowledgements i

Abstract ii

Publications iii

Contents iv

List of Figures vi

List of Tables vii

1 Introduction 1
1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Contributions of this Research . . . . . . . . . . . . . . . . . . . . . . . . 4
1.6 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Background Study 7
2.1 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Fuzzy rule based classification approach . . . . . . . . . . . . . . . 7
2.1.1.1 eClass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1.2 eClass0 and eClass1 . . . . . . . . . . . . . . . . . . . . . 8
2.1.1.3 EVABCD . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.2 Data-cloud and AnYa type fuzzy rule based classifiers . . . . . . . 9
2.1.2.1 Incremental simplified FRB classifier . . . . . . . . . . . 9
2.1.2.2 SOFLP classifier . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.3 Empirical Data analyst (EDA) based classifiers . . . . . . . . . . . 10
2.1.3.1 Naïve EDA classifier . . . . . . . . . . . . . . . . . . . . 10
2.1.3.2 Naive TEDA classifier . . . . . . . . . . . . . . . . . . . . 10
2.1.3.3 ALMMo-0 . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.3.4 TEDA classifier . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.3.5 FRB classifier . . . . . . . . . . . . . . . . . . . . . . . . 11


2.2 Related Theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11


2.2.1 Data analysis and EDA operators . . . . . . . . . . . . . . . . . . 11
2.2.2 Data-cloud and AnYa type fuzzy rule . . . . . . . . . . . . . . . . 14

3 Buffer-based Adaptive Fuzzy Classifier 16


3.1 Theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 Offline training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Online training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4 Decision Making Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 Algorithm Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5.1 Computational complexity analysis of offline training process . . . 28
3.5.2 Computational complexity analysis of online training process . . . 29
3.5.3 Memory complexity analysis . . . . . . . . . . . . . . . . . . . . . 29

4 Evaluations 30
4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5 Conclusion and Future Work 40


List of Figures

2.1 Self-organizing fuzzy logic prototype (SOFLP) based classifier. . . . . . . 9


2.2 Insertion of xk in a data-cloud . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Data-cloud of samples (Multiple feature dataset, three of the features). . . 15

3.1 System diagram of Offline training stage. . . . . . . . . . . . . . . . . . . 19


3.2 Illustration of a tree generated from 100 data of class 2 from Occupancy
detection Dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Illustration of a tree generated from 100 data of class 2 from Letter recog-
nition dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 Illustrates merging procedure of data-cloud in offline mode, where fig. (a)
describes the state before cloud merging and fig. (b) describes the state
after merging data-cloud 1 with 2. . . . . . . . . . . . . . . . . . . . . . . 22
3.5 System diagram of Online training stage. . . . . . . . . . . . . . . . . . . 23
3.6 Merging procedure of data-cloud in online mode, where fig. (a) describes
the state before data-cloud merging and fig. (b) describes the state after
merging data-cloud 1 with 2 according to condition 7. . . . . . . . . . . 25

4.1 Data samples from the MNIST dataset. . . . . . . . . . . . . . . . . . . . 31


4.2 Highlighting data-cloud and data sample during Offline training. . . . . . 32
4.3 Highlighting data-cloud and data sample during Evolving training. . . . . 32
4.4 Highlighting data-cloud and data sample during Offline training. . . . . . 33
4.5 Highlighting data-cloud and data sample during Evolving training. . . . . 33
4.6 Reading of data-cloud number during evolving training phase for class 0. 34
4.7 Per sample training time graph for Mnist data. . . . . . . . . . . . . . . . 34
4.8 Memory use during training of Mnist data. . . . . . . . . . . . . . . . . . 35
4.9 Total training time for various decay parameter value. . . . . . . . . . . . 36

List of Tables

2.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

4.1 Describing accuracy for different algorithm with dataset. . . . . . . . . . 38


4.2 Comparison of performance (in accuracy, %) with different amounts of the
MNIST dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3 A part of Fuzzy-rule generated from evolving stage. . . . . . . . . . . . . 39

Chapter 1

Introduction

Owing to revolutionary developments in technology, data are generated through numerous
sources such as computer simulations, consumer markets, smartphones, business
transactions, autonomous systems, satellites, sensors, social media, various applications,
and so on. According to research, more than 2.5 quintillion bytes of data were generated
every day in 2018 [30], and the International Data Corporation (IDC) has averred that
this volume will double every two years [42]. These overwhelming volumes of unbounded,
structured or unstructured series of data arriving continuously are defined as data
streams [54]. Extracting the knowledge preserved in a data stream is quite challenging
for data mining or classification algorithms, since streams have three basic features:
volume, velocity, and variety. In recent decades, data classification has been a popularly
studied topic in machine learning [27]. This chapter narrates the objectives and
motivations of this thesis, then explains the challenges and our contributions. Finally,
the chapter closes with an outline of the thesis.

1.1 Objectives

The objective of this thesis is to build an AnYa type fuzzy rule-based classification model
that performs this task well. Generally, people classify stream data by using prototypes,
potential data points, data-cloud density, and defined parameters. Sometimes it is
difficult for those with less knowledge and experience to identify an efficient rule base
or suitable data-clouds. In modern data processing and intelligent technology,
computer-based data/image classification and recognition are effective and widely used in
different sectors, where researchers have applied image processing and recognition
techniques and proposed different approaches to identify and classify image or numeric
data. Angelov et al. [10] introduced prototype- and density-based classification of stream
data using a parametric Gaussian-bell type membership function. Yager and Angelov
proposed an alternative data analysis approach that is more efficient for both
classification and clustering, using the data analysis operators cumulative proximity,
standardized eccentricity, and data-cloud density [8].

1.2 Motivations

Alongside Fuzzy-Rule-Based systems, we have various alternative data classification
algorithms such as K-Nearest Neighbor (KNN) [46], Bayesian classifiers [21, 31], decision
trees [34, 62], random forests [45, 55], support vector machines [56, 58], and
neural-network-based classifiers [16, 26], which are widely used in multiple areas, e.g.,
image classification [37], remote sensing [59, 60], pattern recognition [19], fault
detection [49], user behaviour modeling [32], and handwritten digit and face recognition
[18, 22, 35, 38]. Many of these algorithms have proven to be powerful tools for data
mining but remain incomprehensible black boxes. This is a critical impediment to the
widespread deployment of the technology, as decades of research have found that users
simply will not trust a model whose solutions cannot be explained. Fuzzy systems, on
the other hand, are by design much more easily understood. As a family of effective
and promising data stream classification, clustering, and regression approaches, evolving
fuzzy systems (EFSs) have received a great deal of attention. EFSs are fuzzy rule-based
systems with self-learning structures and parameters [25]. Therefore, EFSs have emerged
as a prominent methodology during the last decade for properly addressing the demands
of (fast) online modeling and stream mining processes with changing characteristics [40].
EFSs have two remarkable characteristics:

1. EFSs are equipped with a real-time evolving structure that can capture data
dynamics and concept drift.

2. EFSs can present the knowledge learned online in an accurate, transparent, and
interpretable way [25].

According to their architectures and operating mechanisms, the existing techniques can
be divided into two parts: offline [23, 39] and online [6, 13, 15, 35, 43, 48, 50].
Offline algorithms extract knowledge from a static dataset and never develop their
knowledge-based learning architectures; as a result, they are less applicable for Big
Data manipulation. On the other hand, online classification techniques (incremental and
evolving) can be of one-pass type [27]. They store only the extracted interpretation or
information (i.e., rule base [15], key prototypes [32], cloud density [8], etc.) in the
system from the non-stationary data stream, meanwhile abandoning previously processed
samples [2, 15, 35]. To cope with the overwhelming amount of data in real time, both
can address changing data patterns by updating their system structures and
meta-parameters recursively [27]. These approaches are frequently used in different
applications because of their low memory use and computational efficiency. However, the
ordering of data samples may badly affect the performance of online classification
algorithms. In many real situations, some data samples are available in static form
while the rest arrive sequentially as streaming data. In that case, learning "from
scratch" is inappropriate; the available static data can be trained on offline, and the
acquired knowledge can then be applied to the streaming data [27]. Additionally, the
online structure can also revise the prior knowledge for non-stationary data by
developing its rule base and structure.

1.3 Challenges

Many satisfactory research works on stream or one-pass data classification have been
carried out over a long period. However, none of them solves the problem completely.
Moreover, work covering both numeric and image data is very rare. So there are always
some difficulties to face in Buffer-based adaptive fuzzy classification.

1. Catastrophic forgetting: Adaptive and evolving modeling from online data
streams are distinct concepts. Adaptive models are suitable for coping with smooth,
gradual changes of system parameters and statistical properties of the data (concept
drift). However, when an adaptive model is changed to learn a new behavior, the
knowledge about some previous behaviors tends to be partially lost.

2. Evolving structure: Evolving fuzzy rule-based models are appropriate for online
detection and classification in non-stationary data stream environments. Abrupt
changes in online or stream data change a variable or parameter of the model
(concept shift and drift). This requires both parametrical and structural adaptation
of models, which demands a higher level of model flexibility.

3. Online pruning: For an evolving or one-pass fuzzy-system-based classification
model, a redundant rule base always deteriorates performance. So, finding the
irrelevant fuzzy rules or redundant rule base online is a challenging task for
different kinds of data.

4. Identifying the most essential data-clouds: In an offline fuzzy classification
model or expert-system-based model, the rule base built on data-clouds can be
predefined. But for a data-cloud-based offline-online classification model, it is
tough to identify effective and efficient data-clouds for the fuzzy rule base.

5. Time series data: As our world gets increasingly instrumented, sensors and
systems constantly emit a relentless stream of time series data. The two common
traits of time series data are drift and shift, which indicate gradual or sudden
changes in data-cloud-based rules. For time series data in a real environment,
building a rule base for fuzzy systems is quite challenging.

1.4 Research Questions

Considering the above issues, the purpose of this research is to design a model that can
classify numeric, image, and time series data. In this thesis, the following research
questions will be answered.

1. How can stream data classification accuracy be enhanced on challenging datasets?

2. How can a stream data classifier be designed that is more capable of extracting
information than existing processes?

3. Which classifier is better for stream data classification using fuzzy-rule-based
systems?

4. How can the rule base be used effectively and efficiently by minimizing rule-base
redundancy?

1.5 Contributions of this Research

In this thesis, we propose and describe a data-cloud and fuzzy-rule-based (FRB) classifier
grounded in Empirical Fuzzy Sets [12] and the recently introduced self-organising fuzzy
logic classifier (SOF) [27]. The algorithm is divided into offline and online training
stages. Offline, a 0-order AnYa type fuzzy rule is created from knowledge extracted from
the available static dataset [8]. In the online stage, the whole rule base identified in
the offline stage can be updated continuously by processing streaming data. The online
procedure is a "one-pass" system and supports drift and/or shift in the data pattern. In
short, the contributions of this study are as follows:

1. In SOF [27], a state-of-the-art classification technique, a predefined user input
variable (granularity level) is used as a threshold, and a greedy ranking of the
multimodal-density-based list can reduce the classifier's efficacy. Here, a
Prim's-algorithm-based ranking technique and the local average distance of a
data-cloud as a threshold are used to reduce the dependency on the user and
increase the efficiency of the classifier.

2. A new recursive formula based on unimodal density [5] and cosine dissimilarity
is introduced for ease of calculation in the online stage. It also makes the
process free from the "curse of dimensionality".

3. Some data-clouds from the offline or online stage may seem irrelevant but can
become relevant in the future; such data-clouds are not considered by existing
classifier algorithms. This classifier uses a temporary memory storage (buffer)
for storing seemingly irrelevant data-clouds. These data-clouds can be retrieved
from the buffer if they are found relevant in the future, while a pruning
operation permanently removes completely irrelevant data-clouds. This avoids the
complexity of creating new data-clouds and minimizes streaming data processing
time.

4. It introduces an efficient and elegant approach to merging existing data-clouds
to handle dynamic and evolving streaming data.
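The recursive density idea behind the second contribution can be illustrated with a
minimal sketch. This is not the exact BAFC formula (that derivation appears in
Chapter 3); it uses the standard EDA-style recursive unimodal density, with samples
L2-normalised so that squared Euclidean distance tracks cosine dissimilarity, and the
class and variable names below are ours:

```python
import numpy as np

class RecursiveDensity:
    """Sketch of an EDA-style recursive unimodal density estimator.

    Samples are L2-normalised so that squared Euclidean distance is
    proportional to cosine dissimilarity; only a running mean and a
    running mean of squared norms are stored, never past samples.
    """

    def __init__(self, dim):
        self.k = 0                 # number of samples seen so far
        self.mu = np.zeros(dim)    # running mean of normalised samples
        self.X = 0.0               # running mean of squared norms

    def update(self, x):
        x = x / np.linalg.norm(x)  # work on the unit sphere (cosine)
        self.k += 1
        self.mu += (x - self.mu) / self.k
        self.X += (x @ x - self.X) / self.k
        # D_k(x) = 1 / (1 + ||x - mu_k||^2 / (X_k - ||mu_k||^2))
        var = max(self.X - self.mu @ self.mu, 1e-12)
        return 1.0 / (1.0 + ((x - self.mu) @ (x - self.mu)) / var)
```

Feeding it a run of similar vectors followed by a near-orthogonal one shows the
one-pass behaviour: in-cluster samples keep a high density while the outlier's density
drops, without storing any previous sample.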

1.6 Thesis Organization

The rest of the chapters of this thesis are arranged as follows:

1. Chapter 2: Background Study. The overall advancement in stream data,
data-cloud-based, and empirical data analysis (EDA) operator-based classification is
concisely but comprehensively discussed in 2.1. The different techniques and
theories used or applied in this study are described in 2.2.

2. Chapter 3: Buffer-based Adaptive Fuzzy Classifier. The derivation of the
proposed cosine-dissimilarity-based recursive formula is described in 3.1, whereas
the proposed methodology, with the algorithms of the fuzzy-rule-based and
data-cloud-based classifier, is described in 3.2 and 3.3 with proper explanation.
Further, the complexity of the offline and online training stages, and the memory
complexity, are analysed in 3.5.

3. Chapter 4: Evaluation. Experimental analysis exhibits distinguishable results on
different performance measures such as accuracy, memory use, processing time, and
the transparent dynamic data-clouds of the proposed method, compared with
different well-known classification methods. The evaluation of the proposed
approach is described and analyzed with seven benchmark datasets and eight
different classification methods in Chapter 4.

4. Chapter 5: Conclusion and Future Work. Finally, this thesis is wrapped up by
summarizing the overall work on stream data classification and giving future
directions in Chapter 5.
Chapter 2

Background Study

This chapter presents the background review carried out for this thesis. In the first
section, a brief discussion of stream data classification is given. Then the relevant
components of semi-online stream data classification using fuzzy systems, namely
data-clouds, fuzzy rules, and EDA, are discussed together with the related state of the
art for each step of the respective works. Finally, the chapter concludes with a brief
summary.

2.1 Literature Review

A variety of successful data classification algorithms from machine learning and data
science are available in the literature, as classification has been the most popular
research topic for the last three decades. Because of the limited scope of this study,
it is practically impossible to discuss all of them. The review of relevant studies in
this thesis focuses on three categories of well-known data classification approaches:
(1) fuzzy-rule-based, (2) data-cloud-based, and (3) empirical data analysis (EDA) based.

2.1.1 Fuzzy rule based classification approach

The fuzzy rule-based classification approach has an arresting capability. Evolving fuzzy
systems have received increased attention from the research community for the purpose
of data stream modeling in an incremental, single-pass, and transparent manner. To
date, a wide variety of EFS approaches have been developed and successfully used in
real-world applications, addressing structural evolution and parameter adaptation in
single- or multiple-stage models. A few of these techniques are described below.


2.1.1.1 eClass

eClass [10], a prototype/data-point-potentiality-based online classifier, is considered a
primitive algorithm for stream data classification that creates fuzzy rules from the
most potential data points. This process is grounded in two empirical studies, Evolving
Fuzzy Systems from Data Streams in Real-Time [9] and Evolving Rule-Based Models [11].
In eClass, an antecedent-consequent fuzzy rule is built using recursive calculation of
prototype potentiality and the estimated value of a Gaussian-bell type membership
function. The overall architecture is one-pass and can develop its rule base "from
scratch".

2.1.1.2 eClass0 and eClass1

Some problems of the first member [10] of the eClass family were further addressed in
[15]. In this primitive and empirical study, parameter estimation and a pruning
operation for the fuzzy rule base were also introduced, using the global and local
potentiality of a data point. To make the procedure more deterministic and free from
the "curse of dimensionality", it uses cosine dissimilarity, whereas [10] used a
Euclidean-type distance for the potentiality calculation. However, it does not solve the
problem of membership functions that generate estimated values through aggregation,
which differ significantly from the real data distribution [8]. On the other hand, it
has shown arresting classification accuracy on numerous benchmark problems, including
intrusion detection stream data.

2.1.1.3 EVABCD

A well-known application of eClass is EVABCD [32], which determines user behavior from
sequences of UNIX commands. Using a trie-based data structure, relative frequencies are
calculated from the command sequence to characterize the user profile. This evolving
classifier has the ability to adapt to behavioral changes in the commands. It calculates
the potentiality of a prototype using a Cauchy-type function with cosine distance to
create a user profile, and evolves the profile considering the high/low potential value
of an arriving data sample.

2.1.2 Data-cloud and AnYa type fuzzy rule based classifiers

2.1.2.1 Incremental simplified FRB classifier

All parameterization problems (i.e., centre and spread) of traditional fuzzy rule-based
systems were solved by introducing a data-cloud [7, 8] based antecedent part in the
rule. The data-cloud concept needs no membership function and no predefined parameters
such as centre or spread, and has no shapes or boundaries. Meanwhile, it can calculate
the density of a data-cloud recursively to define the antecedent part, which represents
the real data distribution in a fully data-driven process from the data stream.
Moreover, this incremental fuzzy rule-based classifier is of one-pass type and does not
need to store data samples in the system.

2.1.2.2 SOFLP classifier

In [40], a new fuzzy rule-based hybrid optimization approach for dimension reduction and
multi-classification problems using a self-organizing fuzzy logic prototype is
introduced. It classifies levels of pain perception from patients' fMRI for clinical
pain assessment. In this multi-layer classifier, the first and second layers depict the
antecedent part of the fuzzy rules based on a 0-order AnYa fuzzy rule and data-clouds.
In the offline stage, it creates AnYa type fuzzy rules based on cluster-like
data-clouds; it also uses a level of granularity as a predefined user input. Further,
in the online stage, it updates the meta-parameters of the algorithm based on the
stream data. Although the online stage is capable of ingesting stream data, it does not
allow the data-clouds to evolve by merging or pruning the rule base obtained from the
offline stage. Fig. 2.1 describes the structure of the SOFLP classifier.

Figure 2.1: Self-organizing fuzzy logic prototype (SOFLP) based classifier.



2.1.3 Empirical Data analyst (EDA) based classifiers

2.1.3.1 Naïve EDA classifier

EDA [5] introduced essential, alternative data analysis operators that are more efficient
for both classification [13, 14, 35] and clustering [28]. Data analysis operators such
as cumulative proximity, standardized eccentricity, density, and finally global
typicality are non-parametric. These non-parametric, data-driven operators are extracted
entirely from empirical observation of the arriving data samples and do not require any
prior restrictive assumptions. Moreover, a multimodal-typicality-based classifier named
the Naïve EDA classifier was also introduced there, which is effective for representing
the extracted information directly and recursively.

2.1.3.2 Naive TEDA classifier

A well-known application of EDA [5] to analyze data in a data distribution system was
introduced in [4]. The non-parametric, recursive operators eccentricity and typicality
are applied to anomaly detection, where a higher value of eccentricity indicates
anomalous data. These non-parametric, data-driven operators are able to extract
information from ensemble properties of the data samples via recursive calculation,
which is well suited to streaming data analysis. In addition, the local and global
typicality based classifier (Naïve T-EDA classifier) has proved its dominant performance
against numerous alternative classifiers [4].

2.1.3.3 ALMMo-0

Data-clouds and EDA operators have also been used in parallel for deep rule-based
classifiers in Autonomous Learning Multi-Model Systems [13] for stream data
classification. In this autonomous learning system, the structure is composed of
non-parametric data-clouds built from the extracted knowledge of observed data, with
recursive calculation of meta-parameters. AnYa type fuzzy rules and Cauchy-function-based
density are also used for rule-base identification. Further, this method was also used
in [3] for a handwritten digit recognition problem, offering transparency and a high
level of interpretability.

2.1.3.4 TEDA classifier

Another zero-order AnYa type fuzzy rule-based stream data classifier is TEDAClass [35].
This classifier uses typicality and eccentricity for data analysis, where two types of
distance (Mahalanobis and Euclidean) are used for recursive calculation. Moreover, it is
a one-pass classifier in which every data sample is processed only once, and no
relearning is needed when a new data sample arrives. This self-adapting, non-parametric
system also introduced the merging of AnYa type fuzzy rules for real-time classification
and computational efficiency.

2.1.3.5 FRB classifier

To enhance the performance of fuzzy rule-based classifiers, the FRB system [12]
introduced a new form of fuzzy set and fuzzy rule-based system. Traditional fuzzy
systems have a parameterization problem with the centre, shape, and spread of the
membership function, whereas the AnYa type FRB system discards all these limitations by
introducing non-parametric, shape-free data-clouds. Although data-clouds enhance
objectiveness, they lead to less interpretability and a loss of information [12]. As a
result, [12] combined traditional fuzzy systems and the AnYa type fuzzy rule-based
system into a more efficient system using a unimodal-density-based membership function.

This section described the existing works relevant to this thesis. In the next section,
we present the related theories, which are the core components of this thesis.

2.2 Related Theories

As previously mentioned, this section discusses all the theories related to this thesis.
The related theories are divided into two parts: (1) data analysis and EDA operators,
and (2) data-clouds and AnYa type fuzzy rules.

2.2.1 Data analysis and EDA operators

EDA is an evolving method for data analysis introduced by Angelov [5]. The ideas of
cumulative proximity, unimodal density, and multimodal density have been successfully
applied in different classification and regression approaches [12, 27]. The concept of
cumulative proximity describes the similarity of an n-dimensional data point to its
surrounding data in a data distribution system, and density expresses the level of
closeness of a data point to the others. Hence, a data point with low density and high
cumulative proximity can be considered an anomaly or isolated data point. All notations
used in this study, with their descriptions, are tabulated in Table 2.1.
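These two operators can be made concrete with a short batch computation. The sketch
below uses the standard EDA definitions with Euclidean distance, π_K(x_i) = Σ_j d(x_i, x_j)
and D_K(x_i) = Σ_j π_K(x_j) / (2K·π_K(x_i)); the function names are ours, and the
recursive per-sample forms used later in the thesis follow the same definitions:

```python
import numpy as np

def cumulative_proximity(X):
    """pi_K(x_i) = sum_j d(x_i, x_j): total distance from x_i to all samples."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist.sum(axis=1)

def unimodal_density(X):
    """D_K(x_i) = sum_j pi_K(x_j) / (2 K pi_K(x_i)):
    inversely proportional to cumulative proximity (EDA definition)."""
    pi = cumulative_proximity(X)
    return pi.sum() / (2 * len(X) * pi)

# three close samples and one isolated point
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
pi = cumulative_proximity(X)
D = unimodal_density(X)
print(D.argmin())  # → 3: the isolated point has the lowest density
```

A handy sanity check of these definitions is that the reciprocals of the unimodal
densities always sum to 2K, so the anomalous point stands out as the one whose density
falls far below the average.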
Chapter 2. Background Study 12

Table 2.1: Notations

C — total number of classes in the data
c — a specific class in the data
π_K(x_i) — cumulative proximity of x_i
D_k(x_i) — unimodal density of x_i at the k-th time instance
x^c — data sample belonging to class c
x^c_k — data sample of class c arriving at the k-th time instance
D^c_k(x^c_k) — unimodal density of a data sample arriving at the k-th time instance
Ad^c_i — average distance of the i-th data-cloud of the c-th class
S^c_i — number of members in the i-th data-cloud of the c-th class
Cloud_centre^c_new — centre of a newly created data-cloud
Ad^c_new — average distance of a newly created data-cloud
Age^c_new — age of a newly created data-cloud
S^c_new — members of a new data-cloud
D^c_new — density of a new data-cloud
D^c_k(Cloud_centre^*) — unimodal density of a data-cloud centre
σ^c, γ^c — two meta-parameters
Cloud_centre^c_i — centre of the i-th data-cloud of class c
N — total number of data-clouds
d²(∗, ∗) — square of the distance between two data samples
root — root of the Prim's tree, i.e. the data sample with maximum density
u^c_i — i-th unique data sample of class c
U^c — total number of unique data samples in class c
f^c_i — frequency of the i-th unique data sample of class c
t — temporary storage buffer
X — static dataset
x_i — a specific data sample with all its features
d_cos(∗, ∗) — cosine dissimilarity between two data samples
s_age^c_{i,k} — survival age of the i-th data-cloud of class c at the k-th time instance in the buffer
Age^c_{i,k} — age of the i-th data-cloud of class c at the k-th time instance
γ^c(x) — firing strength of a rule for a data sample x in the validation stage
Ad^c_{i,k} — average distance of the i-th data-cloud of the c-th class at the k-th time instance
S^c_{i,k} — S^c_i at the k-th time instance
sum_of_k^c_{i,k} — sum of the time indices k at which samples joined the i-th data-cloud
d(∗, ∗) — distance between two data points
Cloud_centre^c_{i,k} — data-cloud centre at the k-th time instance
Cloud_centre^C — all the data-clouds of a specific class C
Cloud_centre^* — all the data-clouds in the system
k — time instance during online training
σ^c_k, γ^c_k — meta-parameters belonging to the c-th class
D^MM_c(u_i) — multimodal density of a unique data sample of the c-th class
D^MM_c(node_i) — multimodal density of a unique data sample (tree node) of the c-th class
U^c_i — i-th unique data sample of class c
X^c_i — all data samples belonging to the i-th data-cloud of class c
{u}^c_{U^c} — set of unique data samples of class c
{f}^c_{U^c} — frequencies of all unique data samples of class c
X^c — total number of data samples in class c

Cumulative proximity can be summarized as the sum of the squared distances from a
particular data sample to all other existing data samples in the data distribution
[1, 2], expressed as:

π_K(x_i) = Σ_{j=1}^{K} d²(x_i, x_j),  i = 1, 2, 3, ..., K; (2.1)

where x_i and x_j are data samples of a real-time dataset or stream X, X ⊂ R^n. At the
k-th time instance the data space is X_k = {x_1, x_2, x_3, ..., x_k}, with
x_i = [x_{i,1}, x_{i,2}, x_{i,3}, ..., x_{i,n}] an n-dimensional data sample and i the
time instance at which x_i arrived. The unimodal density D_k(x_i), i = 1, ..., k, is
expressed as [4, 14]:

D_k(x_i) = Σ_{j=1}^{k} π_k(x_j) / (2K π_k(x_i))
         = Σ_{j=1}^{k} Σ_{l=1}^{k} d²(x_j, x_l) / (2K Σ_{l=1}^{k} d²(x_i, x_l)),  i = 1, 2, 3, ..., k; (2.2)

Here, the density can be read as the sum of the distances from all data samples to all
other data samples, divided by the sum of the distances from a particular data sample to
all other existing data samples of the data space at the k-th time instance [5, 12].
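As a concrete illustration of equations 2.1 and 2.2, the sketch below computes cumulative proximity and unimodal density for a small batch. Squared Euclidean distance is assumed for d²(·, ·), and the function names are illustrative, not taken from the thesis implementation.

```python
import numpy as np

def cumulative_proximity(X, i):
    """pi_K(x_i): sum of squared distances from x_i to every sample (Eq. 2.1)."""
    return np.sum((X - X[i]) ** 2)

def unimodal_density(X, i):
    """D_k(x_i): sum of all cumulative proximities over 2K * pi_K(x_i) (Eq. 2.2)."""
    K = len(X)
    total = sum(cumulative_proximity(X, j) for j in range(K))
    return total / (2 * K * cumulative_proximity(X, i))

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
dens = [unimodal_density(X, i) for i in range(len(X))]
# The outlying point [5, 5] has the largest cumulative proximity and hence the
# lowest density, matching the anomaly interpretation in the text.
assert np.argmin(dens) == 3
```

The inverse relation between cumulative proximity and density is exactly what the text uses to flag isolated samples.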

In the case of a static dataset, the data distribution may contain data samples X in
which a sample is repeated more than once, i.e. ∃ x_i = x_j, i ≠ j. The set of unique
data samples is then {u}^c_{U^c} = {u^c_1, u^c_2, ..., u^c_{U^c}}, with
u^c_i = [u^c_{i,1}, u^c_{i,2}, u^c_{i,3}, ..., u^c_{i,n}] and {u}^c_{U^c} ⊆ X^c, where
U^c is the number of unique data samples in class c and Σ_{c=1}^{C} U^c equals the total
number of unique data samples of the static dataset [12, 27]. The corresponding
frequencies of the unique data are {f}^c_{U^c} = {f^c_1, f^c_2, ..., f^c_{U^c}}, and the
sum of all frequencies (Σ_{i=1}^{U^c} f^c_i) equals the number of data samples in class
c. The multimodal density D^MM_c of the unique data sample u^c_i can then be defined
as [4, 14]:
D^MM_c(u^c_i) = f^c_i Σ_{j=1}^{X^c} Σ_{l=1}^{X^c} d²(x^c_j, x^c_l) / (2 X^c Σ_{l=1}^{X^c} d²(u^c_i, x^c_l)),  i = 1, 2, 3, ..., U^c; (2.3)

In equations 2.1, 2.2 and 2.3, the functions d²(x_j, x_l) and d²(u_i, x_l) denote the
squared distance between two data points. In this study, we use multimodal density for
static datasets and unimodal density for the online or real-time situation. Different
formulae for evolving or online data classification systems have been developed using
the Mahalanobis distance [35] and the Euclidean distance [13].
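Equation 2.3 can be checked on a toy class in the same way: the sketch below weights the unimodal density of each unique sample by its frequency. Squared Euclidean distance is assumed and all names are hypothetical.

```python
import numpy as np
from collections import Counter

def multimodal_density(Xc):
    """D^MM_c(u_i) for every unique sample of one class (Eq. 2.3): the
    frequency f_i times the all-pairs distance sum, over 2*X^c times the
    distance sum from u_i to every (non-unique) sample."""
    freq = Counter(tuple(r) for r in Xc)
    # all-pairs sum of squared distances over the full (non-unique) class data
    total = sum(np.sum((Xc - x) ** 2) for x in Xc)
    out = {}
    for u, f in freq.items():
        denom = 2 * len(Xc) * np.sum((Xc - np.array(u)) ** 2)
        out[u] = f * total / denom
    return out

Xc = np.array([[0.0], [0.0], [0.0], [1.0], [4.0]])
dmm = multimodal_density(Xc)
# The value repeated three times dominates the class, so its multimodal
# density is the highest of the three unique values.
assert max(dmm, key=dmm.get) == (0.0,)
```

This is the quantity the offline stage ranks when it picks prototypes.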

d_cos(a_k, b_l) = 1 − (Σ_{j=1}^{n} a_{kj} b_{lj}) / (√(Σ_{j=1}^{n} a²_{kj}) · √(Σ_{j=1}^{n} b²_{lj})),  a and b are n-dimensional data; (2.4)

In Section 3.1, a new recursive formula for online density calculation is introduced
using the cosine dissimilarity (2.4).

2.2.2 Data-cloud and AnYa type fuzzy rule

The data-cloud, introduced in [8] and since applied successfully in numerous
applications [4, 13, 14, 29], opened a new era for data classification and clustering.
A data-cloud can be identified as a set of data with the common properties of a cluster,
but it has no predefined boundary or particular shape. Compared with traditional
membership functions (i.e. trapezoidal, triangular and Gaussian), the data-cloud is a
much more objective representation of the real data distribution.
Equation 3.5 can calculate the density of an arriving data sample x_k independently in
the online training stage to identify the cloud to which it belongs, as shown in figure
2.2. In this figure, the proposed system determines the membership of the arrived data
x_k between the two data-clouds c1 and c2 based on the calculated density and the
proposed threshold (the average distance of a data-cloud as threshold). Under that
condition, the proposed classifier determines which data-cloud the arrived data x_k will
significantly affect, or whether a new data-cloud is needed.

Figure 2.2: Insertion of xk in a data-cloud

Since this algorithm is recursive, it does not store past data in the online training
stage. Instead, some information extracted from the data is kept in variables for every
data-cloud: the density of the centre (Cden^k_i), the data-cloud centre
(Cloud_centre^k_i), the number of samples (S^k_i), the average distance between the
centre and each sample (Ad^k_i), and the age of the data-cloud (Age^k_i) at the k-th
time instant.
For example, figure 2.3 depicts a scenario (two features of the Multiple feature
dataset, class 1) for clouds c1 and c2 after the arrival of the k-th data sample. The
density of each data-cloud is Cden^k_1 = 0.9397 and Cden^k_2 = 0.9116, the number of
samples belonging to each data-cloud is S^k_1 = 9 and S^k_2 = 8, the average distances
are Ad^k_1 = 2.424 and Ad^k_2 = 2.0391, and the ages are Age^k_1 = 89.04 and
Age^k_2 = 54.81.

Figure 2.3: Data-cloud of samples (Multiple feature dataset, three of the features).

The AnYa-type FRB system, introduced in [8], is an alternative to the widely used
Takagi-Sugeno [57] and Mamdani [41] fuzzy rule-based systems. The traditional Mamdani
approach is more linguistic than the TS system, but both provide the same type of
antecedent part, one fuzzy-set-based and the other function-based, whereas AnYa-type
fuzzy rules are more objective, non-parametric and compact. They do not need any
predefined membership function. An AnYa-type fuzzy rule is [8]:

IF (x ∼ Cloud_centre^c_1) OR (x ∼ Cloud_centre^c_2) OR ... OR (x ∼ Cloud_centre^c_N)
THEN Class^c ← x (2.5)

where Cloud_centre^c_i denotes the i-th data-cloud of the c-th class for the
n-dimensional input data vector x = [x_1, x_2, x_3, ..., x_n]; N is the number of
data-clouds extracted from the data samples, which can vary from class to class; and
'∼' expresses the similarity between x and the i-th data-cloud. The class label of the
arrived data can be estimated by the winner-takes-all strategy described in Section 3.4.
Chapter 3

Buffer-based Adaptive Fuzzy


Classifier

In this section, the recursive formula for density calculation, the offline training
procedure, the online training procedure and the decision-making process of the proposed
classifier BAFC are described in detail. A broad description of the computational
complexity of the proposed BAFC algorithm is given at the end of the section. The
offline training operates on a static dataset and prepares per-class variables for use
in the online training stage; in the online training phase, all calculations are
conducted at the k-th time instance.

3.1 Theories

This study introduces a cosine-distance-based formula for efficient online calculation
and to free the system from the "curse of dimensionality". Let us assume that the
distance is the cosine dissimilarity (2.4), and derive formulae for the recursive
calculation of the unimodal density D_k [4, 14]:

D_k(x_i) = Σ_{j=1}^{k} Σ_{l=1}^{k} d²(x_j, x_l) / (2K Σ_{l=1}^{k} d²(x_i, x_l)),  i = 1, 2, 3, ..., k
         = Σ_{j=1}^{k} Σ_{l=1}^{k} d²(x_j, x_l) / (2K Σ_{l=1}^{k} (1 − (Σ_{j=1}^{n} x_{ij} x_{lj}) / (√(Σ_{j=1}^{n} x²_{ij}) · √(Σ_{j=1}^{n} x²_{lj})))²) (3.1)


Now, considering the denominator and simplifying it, we find:

2K Σ_{l=1}^{k} (1 − (Σ_{j=1}^{n} x_{ij} x_{lj}) / (√(Σ_{j=1}^{n} x²_{ij}) · √(Σ_{j=1}^{n} x²_{lj})))²
  = 2K Σ_{l=1}^{k} (1 − f_l / (h g_l))²;  where f_l = Σ_{j=1}^{n} x_{ij} x_{lj}, h = √(Σ_{j=1}^{n} x²_{ij}), g_l = √(Σ_{j=1}^{n} x²_{lj});
  = 2K Σ_{l=1}^{k} (1 − 2 f_l / (h g_l) + f_l² / (h² g_l²))
  = 2K (Σ_{l=1}^{k} 1 − Σ_{l=1}^{k} 2 f_l / (h g_l) + Σ_{l=1}^{k} f_l² / (h² g_l²))
  = 2K (k − (2/h) Σ_{l=1}^{k} f_l / g_l + (1/h²) Σ_{l=1}^{k} f_l² / g_l²)
  = 2K (k − (2/h) B_k + (1/h²) D_k) (3.2)

where B_k = Σ_{l=1}^{k} f_l / g_l and D_k = Σ_{l=1}^{k} f_l² / g_l². Observing the
variables f_l and g_l, they depend on all the data samples, indexed by l, whereas the
variable h depends only on the attributes of the arriving data sample. Both B_k and D_k
can be calculated recursively as:

For the arriving sample (i = k):

B_k = Σ_{j=1}^{n} x_{kj} b_{kj};  b_{kj} = b_{(k−1)j} + x_{kj} / √(Σ_{l=1}^{n} x²_{kl});  b_{1j} = x_{1j} / √(Σ_{l=1}^{n} x²_{1l}) (3.3)

and

D_k = Σ_{j=1}^{n} Σ_{p=1}^{n} x_{kj} x_{kp} d^p_{kj};  d^p_{kj} = d^p_{(k−1)j} + x_{kj} x_{kp} / Σ_{l=1}^{n} x²_{kl};  d^p_{1j} = x_{1j} x_{1p} / Σ_{l=1}^{n} x²_{1l} (3.4)

Finally, we obtain the recursive form of the unimodal density:

D_k(x_k) = Σ_{j=1}^{k} Σ_{l=1}^{k} d²(x_j, x_l) / (2K Σ_{l=1}^{k} (1 − (Σ_{j=1}^{n} x_{kj} x_{lj}) / (√(Σ_{j=1}^{n} x²_{kj}) · √(Σ_{j=1}^{n} x²_{lj})))²)
         = Σ_{j=1}^{k} Σ_{l=1}^{k} d²(x_j, x_l) / (2K (k − (2/h) B_k + (1/h²) D_k))
         = (σ_{k−1} + 2 γ_k) / (2K (k − (2/h) B_k + (1/h²) D_k)) (3.5)

where γ_k = k − (2/h) B_k + (1/h²) D_k is the sum of the squared distances between the
k-th data sample and all other data in the data space, and σ_k = σ_{k−1} + 2 γ_k is the
sum of the squared distances between all pairs of the streaming data; σ_1 for the online
training stage is initialized with the sum over all pairs of the static data available
in the offline stage.
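The derivation above can be sanity-checked numerically: the sketch below maintains b_{kj}, d^p_{kj} and σ_k exactly as in equations 3.2-3.5 and compares the one-pass density of the latest sample with the batch form of equation 2.2 under cosine dissimilarity. The class name and the random stream are illustrative only.

```python
import numpy as np

def dcos2(a, b):
    """Squared cosine dissimilarity (Eq. 2.4)."""
    c = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return c * c

class RecursiveDensity:
    """One-pass unimodal density with cosine dissimilarity (Eqs. 3.1-3.5).
    Keeps b (vector), dmat (matrix) and sigma instead of past samples."""
    def __init__(self, n):
        self.k = 0
        self.b = np.zeros(n)          # b_kj  = sum_l x_lj / g_l          (Eq. 3.3)
        self.dmat = np.zeros((n, n))  # d_kj^p = sum_l x_lj x_lp / g_l^2  (Eq. 3.4)
        self.sigma = 0.0              # squared dissimilarities over all pairs

    def update(self, x):
        g = np.linalg.norm(x)
        self.b += x / g
        self.dmat += np.outer(x, x) / g ** 2
        self.k += 1
        h = g                          # h in the text: norm of the arriving sample
        B = x @ self.b                 # = sum_l f_l / g_l
        D = x @ self.dmat @ x          # = sum_l f_l^2 / g_l^2
        gamma = self.k - 2 * B / h + D / h ** 2
        self.sigma += 2 * gamma        # sigma_k = sigma_{k-1} + 2 gamma_k
        return self.sigma / (2 * self.k * gamma)  # D_k(x_k), Eq. 3.5

rng = np.random.default_rng(0)
stream = rng.random((6, 3)) + 0.1      # positive features, non-zero norms
rd = RecursiveDensity(3)
for x in stream:
    d_rec = rd.update(x)

# Batch check against Eq. 2.2 with d = cosine dissimilarity:
K = len(stream)
total = sum(dcos2(a, b) for a in stream for b in stream)
d_batch = total / (2 * K * sum(dcos2(stream[-1], y) for y in stream))
assert abs(d_rec - d_batch) < 1e-9
```

The recursion stores only an n-vector, an n×n matrix and one scalar, which is what makes the "one-pass" claim later in the chapter possible.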
In addition, for the online process we have to calculate the density of the existing
data-clouds. Using the variables and the density calculated at the (k−1)-th time
instance:

D_{k−1}(Cloud_centre^*) = Σ_{j=1}^{k−1} Σ_{l=1}^{k−1} d²(x_j, x_l) / (2(K−1) Σ_{l=1}^{k−1} d²(Cloud_centre^*, x_l))
                        = σ_{k−1} / (2(K−1) Σ_{l=1}^{k−1} d²(Cloud_centre^*, x_l)) (3.6)
⇒ Σ_{l=1}^{k−1} d²(Cloud_centre^*, x_l) = σ_{k−1} / (2(K−1) D_{k−1}(Cloud_centre^*))

Further, using formula 3.6, the density of an existing data-cloud at the k-th time
instance is calculated as:

D_k(Cloud_centre^*) = Σ_{j=1}^{k} Σ_{l=1}^{k} d²(x_j, x_l) / (2K Σ_{l=1}^{k} d²(Cloud_centre^*, x_l))
                    = (σ_{k−1} + 2 γ_k) / (2K [Σ_{l=1}^{k−1} d²(Cloud_centre^*, x_l) + d²(Cloud_centre^*, x_k)])
                    = σ_k / (2K [σ_{k−1} / (2(K−1) D_{k−1}(Cloud_centre^*)) + d²(Cloud_centre^*, x_k)]) (3.7)
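A small numerical check of equations 3.6-3.7 (squared Euclidean distance is assumed purely for illustration; the recursion itself is distance-agnostic, and all names are hypothetical): the stored density at k−1 lets us recover the centre-to-sample distance sum, from which the density at k follows without touching past samples.

```python
import numpy as np

def update_cloud_density(D_prev, sigma_prev, sigma_k, K, d2_centre_xk):
    """Eqs. 3.6-3.7: density of an existing cloud centre at time k, computed
    from its density at k-1 without revisiting past samples."""
    dist_sum = sigma_prev / (2 * (K - 1) * D_prev)   # Eq. 3.6 rearranged
    return sigma_k / (2 * K * (dist_sum + d2_centre_xk))

def d2(a, b):
    return float(np.sum((a - b) ** 2))

rng = np.random.default_rng(1)
X = rng.random((5, 2))
centre = X[:4].mean(axis=0)

# State at k-1 = 4 samples:
sigma_prev = sum(d2(a, b) for a in X[:4] for b in X[:4])
D_prev = sigma_prev / (2 * 4 * sum(d2(centre, x) for x in X[:4]))

# Arrival of the 5th sample:
sigma_k = sum(d2(a, b) for a in X for b in X)
D_rec = update_cloud_density(D_prev, sigma_prev, sigma_k, K=5,
                             d2_centre_xk=d2(centre, X[4]))
D_batch = sigma_k / (2 * 5 * sum(d2(centre, x) for x in X))
assert abs(D_rec - D_batch) < 1e-9
```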

This section has described the EDA operators and the newly introduced recursive formulae
for density calculation; data-clouds and the AnYa-type fuzzy rule were discussed earlier
in Section 2.2.2.

3.2 Offline training

The proposed classifier constructs potential data-clouds from each class independently
and forms an AnYa-type fuzzy rule (as in equation 2.5) based on the data-cloud centres
determined for every class. For a better demonstration, the architecture of the offline
training of the proposed classifier is given in Fig. 3.1. Data-clouds identified from
one class are not affected by those of another. As mentioned in Section 2.2.1, all EDA
operators operate at the k-th time instance for the c-th class of data during online
training, but in the offline stage we consider a static dataset.

Figure 3.1: System diagram of Offline training stage.

First, the proposed classifier determines superior data samples from the dataset by
using the multimodal density formula (2.3) [4, 14] per class. A ranking is then produced
using the concept of Prim's algorithm, the estimated density values, and the distance
between every pair of unique values.
To prepare the tree, the unique data sample with maximum density
(arg max_{i=1..U^c} D^MM_c(u^c_i)) becomes the root node, and the data sample nearest to
the root (arg min_{i=1..U^c} d(root, u^c_i)) becomes its first descendant. The data
sample nearest to any tree node is then added as the next node; since a newly added node
may be nearest to the root or to the first node, the node it is nearest to becomes its
parent, and a unique data sample cannot be selected for the tree more than once.
Repeating this process over all unique data generates the tree. An illustration using
100 data samples of class 2 from the Occupancy detection and Letter recognition datasets
is given in figures 3.2 and 3.3. The node number denotes the serial number of a unique
data sample in the static dataset; node-61 (fig. 3.2) and node-65 (fig. 3.3) are the
roots. The sign "←" indicates a child of a parent node, and a parent node with more
than one child (out-degree) is called a junction parent node in this study.
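The ranking step above can be sketched as a naive Prim-style construction. Euclidean distance and the helper names are assumptions for illustration; a production version would use a priority queue rather than the quadratic scan here.

```python
import numpy as np

def prim_tree(uniques, density):
    """Rank unique samples with Prim's algorithm: the highest-density sample
    is the root; the tree repeatedly attaches the out-of-tree sample that is
    closest to any in-tree node (Section 3.2 ranking step)."""
    n = len(uniques)
    root = int(np.argmax(density))
    parent = {root: None}
    order = [root]
    while len(order) < n:
        best = None
        for j in range(n):
            if j in parent:
                continue
            for i in order:
                d = np.linalg.norm(uniques[i] - uniques[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        parent[j] = i          # the in-tree node it is nearest to becomes its parent
        order.append(j)
    return root, parent, order  # `order` is the insertion (rank) order

U = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.1, 1.0]])
dens = np.array([0.9, 0.8, 0.7, 0.6])
root, parent, order = prim_tree(U, dens)
assert root == 0 and parent[1] == 0
```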
Chapter 3. Buffer-based Adaptive Fuzzy Classifier 20

Figure 3.2: Illustration of a tree generated from 100 data of class 2 from Occupancy
detection Dataset.

Figure 3.3: Illustration of a tree generated from 100 data of class 2 from Letter
recognition dataset.

Condition 1:

IF (D^MM_c(node_i) > D^MM_c(node_{i+1})) AND (D^MM_c(node_i) > D^MM_c(node_{i−1}))
THEN node_i ∈ Cloud_centre^c (3.8)

From the tree, three types of path segment are used to identify the superior data points
or prototypes: 1) root to the node before a junction parent node, 2) junction parent
node to the node before another junction parent node, and 3) junction parent node to a
leaf node. If one junction parent node has more than one valid path, the path with the
highest density is chosen first. Applying Condition 1 [27] to the three path segments of
the tree (as in figures 3.2 and 3.3), the identified unique data points or prototypes
are assigned to Cloud_centre^c as data-cloud centres of the c-th class.
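Along one path segment, Condition 1 amounts to picking local density maxima; a minimal sketch with hypothetical names:

```python
def select_prototypes(path, density):
    """Condition 1 (Eq. 3.8) along one path segment of the tree: a node is a
    prototype when its multimodal density exceeds both its neighbours'."""
    protos = []
    for i in range(1, len(path) - 1):
        left, mid, right = density[path[i - 1]], density[path[i]], density[path[i + 1]]
        if mid > left and mid > right:
            protos.append(path[i])
    return protos

# Nodes 1 and 3 are local density peaks on this path, so they become centres.
assert select_prototypes([0, 1, 2, 3, 4],
                         {0: .2, 1: .9, 2: .3, 3: .8, 4: .1}) == [1, 3]
```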
Before cloud merging in offline mode, data-clouds are first formed from all the sample
data available offline, using the widely used Voronoi tessellation [17]:

winning Cloud_centre = arg min_{i=1..m} d(x_j, Cloud_centre^c_i);  x_j ∈ X^c (3.9)

Each cloud centre grabs its nearby data samples to form an effective data-cloud, with
average distance between the cloud centre and its samples
Ad^c_i = Σ_j d(Cloud_centre^c_i, x_j) / (number of samples in the i-th data-cloud). If a
cloud's centre falls into the potential area of another cloud (the distance between the
two cloud centres is less than or equal to Ad^c_i), then according to Condition 2 (3.10)
those clouds are merged (multiple data-clouds into one), the i-th one. The offline
merging procedure is delineated in figure 3.4, where a seemingly elliptical structure is
used for simplicity (a data-cloud has no shape or boundary). In figure 3.4a, the centre
of data-cloud 2 is located on/inside data-cloud 1, while data-cloud 3 is located outside
the potential zone of data-cloud 1. Hence, the result of merging data-clouds 1 and 2 is
depicted in figure 3.4b; in contrast, no merge operation takes place between data-clouds
1 and 3.
Condition 2:

IF ∃ j: (d(Cloud_centre^c_i, Cloud_centre^c_{j*}) ≤ Ad^c_i AND i ≠ j)
THEN ∀ (Cloud_centre^c_{j*}) ∈ Cloud_centre^c_i (3.10)

If Condition 2 is satisfied, the parameters of the i-th data-cloud (number of samples,
data-cloud centre and average distance) are updated as follows:

S^c_i = S^c_i + S^c_{j*}
Cloud_centre^c_i = Σ X^c_i / S^c_i (3.11)
Ad^c_i = Σ d(Cloud_centre^c_i, X^c_i) / S^c_i

where S^c_i denotes the number of data samples in cloud i of class c and S^c_{j*}
denotes the total number of samples of all j-th clouds of class c that satisfy Condition
2. The updated centre and average distance of the i-th cloud are Cloud_centre^c_i and
Ad^c_i.
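One absorption pass of Condition 2 and equation 3.11 might look like the following sketch. Euclidean distance, a single pivot cloud i = 0, and all names are illustrative simplifications of the thesis procedure.

```python
import numpy as np

def merge_offline(centres, support, avg_dist, members):
    """Condition 2 (Eq. 3.10) / Eq. 3.11 for pivot cloud i = 0: clouds whose
    centre lies inside cloud i's potential area (distance <= Ad_i) are absorbed
    into i. `members` maps cloud index -> array of its samples."""
    i = 0
    absorb = [j for j in range(len(centres))
              if j != i and np.linalg.norm(centres[i] - centres[j]) <= avg_dist[i]]
    for j in absorb:
        members[i] = np.vstack([members[i], members[j]])
        support[i] += support[j]                              # S_i += S_j*
    centres[i] = members[i].mean(axis=0)                      # Eq. 3.11 centre
    avg_dist[i] = np.mean(np.linalg.norm(members[i] - centres[i], axis=1))
    return [k for k in range(len(centres)) if k not in absorb]  # surviving clouds

# Mirror of figure 3.4: cloud 2's centre lies inside cloud 1's area, cloud 3 is far.
members = [np.array([[0.0, 0.0], [2.0, 0.0]]),   # cloud 1
           np.array([[0.5, 0.5]]),               # cloud 2
           np.array([[9.0, 9.0]])]               # cloud 3
centres = [m.mean(axis=0) for m in members]
support = [len(m) for m in members]
avg_dist = [np.mean(np.linalg.norm(m - c, axis=1)) for m, c in zip(members, centres)]
keep = merge_offline(centres, support, avg_dist, members)
assert keep == [0, 2] and support[0] == 3
```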

Before the offline training stage ends, it calculates the parameters σ^c and γ^c and the
unimodal density of the identified data-clouds (D(Cloud_centre^c_i)) per class, for the
recursive calculation of the online training stage. After updating the meta-parameters
of all classes, the offline training stage prepares the AnYa-type fuzzy rule base as in
2.5.

Algorithm 1 illustrates the overall procedure of the offline training architecture.



Figure 3.4: Merging procedure of data-clouds in offline mode, where fig. (a) shows the
state before cloud merging and fig. (b) shows the state after merging data-cloud 1
with 2.

Algorithm 1: Offline Training

Input: Static dataset.
Output: Data-clouds, average distance of each data-cloud (Ad^c_i), data-cloud centres
(centre^c_i), support of each data-cloud (S^c_i), σ^c and γ^c.
Find the unique data samples.
Calculate the multimodal density (D^MM_c(u^c_i)) of each unique data sample.
Prepare the tree to rank the unique data samples using Prim's algorithm.
Apply Condition 1 (3.8) to the tree to find the superior unique data samples.
Prepare the data-clouds using 3.9.
for all existing data-clouds, apply Condition 2 (3.10) do
    if Condition 2 is satisfied then
        Update the i-th data-cloud parameters (Ad^c_i, S^c_i and centre^c_i) using 3.11.
        Delete all j-th data-clouds.
    end
end
Create the AnYa-type data-cloud-based fuzzy rule using 2.5.
Calculate the per-class parameters σ^c and γ^c.

3.3 Online training

This subsection details the architecture of evolving training and the removal of
irrelevant data-clouds. For a better demonstration, the architecture of the online
training of the proposed classifier is given in Fig. 3.5. As this online approach is
"one-pass", it does not store previous data samples; rather, it updates its data-clouds
and the per-class variables or parameters obtained from the offline training stage.
Irrelevant data-clouds are stored in temporary memory storage, from which, under certain
conditions, they are retrieved or removed permanently. Data-clouds are also merged when
the relevant conditions are satisfied, to make the approach more computation- and
memory-efficient. Before the evolving training process begins, the age (Age^c_0) of
every data-cloud of every class is initialized to zero; considering the age of a
particular data-cloud, we later identify it as relevant or irrelevant. In the following,
we assume that the online training process is executing on a data sample belonging to
the c-th class.

Figure 3.5: System diagram of Online training stage.

For each arriving data sample, the system calculates its unimodal density using the
recursive formula 3.5 with the variables σ^c_k and γ^c_k. At the same time, the existing
data-cloud densities are updated recursively from the densities at the (k−1)-th time
instance using formula 3.7.

Condition 3:

IF (D^c_k(x^c_k) > max D^c_k(Cloud_centre^*)) OR (D^c_k(x^c_k) < min D^c_k(Cloud_centre^*))
OR (d(x^c_k, Cloud_centre^*) > max(Ad^c_k))
THEN x^c_k ∈ Cloud_centre^c (3.12)

Condition 3 [27] is satisfied when the newly arrived k-th data sample has a higher (or
lower) density than every existing data-cloud, or when its distance to every existing
data-cloud is greater than the maximum average distance of all data-clouds. In that
case, a new data-cloud is created with the following parameters:

Cloud_centre^c_new = x^c_k;  Ad^c_new = 0;  Age^c_new = 0;  S^c_new = 1;  D^c_new = D^c_k(x^c_k) (3.13)

When the condition is not satisfied, the data sample joins its nearest data-cloud,
identified as i = arg min d(x^c_k, Cloud_centre^c_{∗,k}). The parameters of the
identified data-cloud are then updated as follows:

Cloud_centre^c_{i,k} = (Cloud_centre^c_{i,k−1} · S^c_{i,k−1} + x^c_k) / (S^c_{i,k−1} + 1);
Ad^c_{i,k} = (Ad^c_{i,k−1} · S^c_{i,k−1} + d(x^c_k, Cloud_centre^c_{i,k−1})) / (S^c_{i,k−1} + 1); (3.14)
sum_of_k^c_{i,k} = sum_of_k^c_{i,k−1} + k;  S^c_{i,k} = S^c_{i,k−1} + 1;

where sum_of_k denotes the sum of the time indices at which data arrived in a specific
data-cloud; it is needed for the recursive age calculation of a data-cloud.

The age of a data-cloud is now calculated recursively to find irrelevant data-clouds:

Age^c_{i,k} = k − sum_of_k^c_{i,k} / S^c_{i,k} (3.15)

If data samples are assigned to an existing data-cloud, its age stays small; if no data
sample is assigned to it, its age grows by one per step [15]. The age of a data-cloud
always lies in the range [0, k]. The following simple Condition 4 [15] determines which
data-clouds are stored in temporary memory storage as seemingly irrelevant:
Condition 4:

IF Age^c_{i,k} > mean(Age^c_k) + std(Age^c_k)
THEN Cloud_centre^c_{i,k} ∈ temporary storage AND survival_age^c_{i,k} = 0.5 (3.16)

According to this condition, a data-cloud is considered irrelevant when its age exceeds
the "one-sigma" value. As mentioned earlier, the system retrieves a data-cloud from the
buffer and removes its survival age when the following condition is satisfied:

Condition 5:

IF Age^c_{i,k} < mean(Age^c_k) + std(Age^c_k)
THEN retrieve Cloud_centre^c_{i,k} AND remove survival_age^c_{i,k} (3.17)

A data-cloud is retrieved from the buffer by Condition 5 when it is found to be relevant
to the current data stream.
At every iteration, the survival age of a data-cloud stored in temporary memory storage
is reduced by 1/decay [33]. This process identifies dying data-clouds that have been
completely irrelevant to the recent data flow or stream for a long time. A dead
data-cloud is identified when its survival age (survival_age^c_{i,k}) is less than or
equal to zero, and it is then removed permanently from the system. The identification
and removal proceed as follows:
Condition 6:

IF survival_age^c_{i,k} ≤ 0 THEN remove Cloud_centre^c_{i,k} (3.18)

By Condition 6, irrelevant data-clouds with non-positive survival age are identified and
completely removed from memory storage. A data-cloud with zero or negative survival age
could also be interpreted as anomalous data; however, that concept is outside the scope
of this study.
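Conditions 4-6 together form a small buffer state machine. The following sketch assumes the "one-sigma" threshold is the mean age plus one standard deviation over the clouds of the class, and uses an illustrative decay value; the thesis treats decay as a tunable parameter, and all names are hypothetical.

```python
from statistics import mean, stdev

def step_buffer(active, buffer, decay=20):
    """Conditions 4-6 (Eqs. 3.16-3.18): move stale clouds to the buffer with
    survival age 0.5, shrink every buffered survival age by 1/decay, retrieve
    clouds that become relevant again, and drop those reaching zero."""
    ages = [c["age"] for c in active + buffer]
    thr = mean(ages) + (stdev(ages) if len(ages) > 1 else 0.0)
    for c in list(active):                      # Condition 4: park stale clouds
        if c["age"] > thr:
            c["survival"] = 0.5
            active.remove(c)
            buffer.append(c)
    for c in list(buffer):
        c["survival"] -= 1.0 / decay            # decay step
        if c["age"] < thr:                      # Condition 5: retrieve
            del c["survival"]
            buffer.remove(c)
            active.append(c)
        elif c["survival"] <= 0:                # Condition 6: remove for good
            buffer.remove(c)
    return active, buffer

active = [{"age": 1.0}, {"age": 2.0}, {"age": 50.0}]
buffer = []
step_buffer(active, buffer)
# The age-50 cloud exceeds mean + std and is parked with survival 0.5 - 1/20.
assert len(active) == 2 and len(buffer) == 1
```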

Figure 3.6: Merging procedure of data-cloud in online mode, where fig. (a) describes
the state before data-cloud merging and fig. (b) describes the state after merging
data-cloud 1 with 2 according to condition 7.

To keep the number of data-clouds limited while still extracting the dynamic or evolving
characteristics of the data, the proposed system can merge two overlapping data-clouds.
A merging operation is executed when the distance between two nearest data-clouds is
significantly low. The condition for this fully autonomous process is as follows:
Condition 7:

IF d(Cloud_centre^c_{i,k}, Cloud_centre^c_{j,k}) < Ad^c_{i,k}/2 + Ad^c_{j,k}/2
THEN Cloud_centre^c_{j,k} ∈ Cloud_centre^c_{i,k}; (3.19)
j = arg min_{j=1..N^c} d(Cloud_centre^c_{i,k}, Cloud_centre^c_{j∗,k})

In summary, two clouds are merged only when the distance between the i-th and j-th
data-clouds is less than the sum of their half local average distances, and the j-th
data-cloud is selected on the basis of minimum distance from the i-th data-cloud. Figure
3.6 describes the online data-cloud merging procedure: in fig. 3.6a both data-clouds
satisfy Condition 7, and the resulting merged data-cloud is shown in fig. 3.6b.

This merging process controls the otherwise uncontrolled growth of data-clouds, making
the system more dynamic. When two data-clouds are merged, the parameters of the
resulting cloud are updated as follows:

Cloud_centre^c_{i,k} = (Cloud_centre^c_{i,k} · S^c_{i,k} + Cloud_centre^c_{j,k} · S^c_{j,k}) / (S^c_{i,k} + S^c_{j,k});
Ad^c_{i,k} = (Ad^c_{i,k} + Ad^c_{j,k} + d(Cloud_centre^c_{i,k}, Cloud_centre^c_{j,k})) / 2;
Age^c_{i,k} = k − (sum_of_k^c_{i,k} + sum_of_k^c_{j,k}) / (S^c_{i,k} + S^c_{j,k}); (3.20)
sum_of_k^c_{i,k} = sum_of_k^c_{i,k} + sum_of_k^c_{j,k};
S^c_{i,k} = S^c_{i,k} + S^c_{j,k};

After an online merging operation, the system updates its AnYa-type fuzzy rule base
according to 2.5 and reads the next data sample, or shifts to the validation stage.
Algorithm 2 illustrates the overall procedure of the online training architecture.

Algorithm 2: Evolving Training

Input: Data sample x^c_k from the data stream and the parameters (Ad^c_cloud,
centre^c_cloud, S^c_cloud, σ^c and γ^c) from the offline training stage.
Output: AnYa-type data-cloud-based fuzzy rule.
while the data stream has not ended do
    Initialize Age^c_{i,k}, sum_of_k^c_{i,k}, B^c_k and D^c_k.
    Calculate the unimodal density of the arrived data D^c_k(x^c_k) using formula 3.5.
    Calculate the density of the existing data-clouds D^c_k(Cloud_centre^*) using the
    recursive formula 3.7.
    if Condition 3 (3.12) is satisfied then
        Create a new data-cloud with the parameters of (3.13).
    else
        Insert the sample into an existing data-cloud and update its parameters with (3.14).
    end
    for every existing data-cloud do
        Update the age (Age^c_{i,k}) recursively using 3.15.
    end
    for every existing data-cloud do
        if Condition 4 (3.16) is satisfied then
            Move the data-cloud into temporary storage (buffer).
            Assign the survival age of the data-cloud, survival_age^c_{i,k} = 0.5.
        end
    end
    for every data-cloud in temporary storage (buffer) do
        Reduce the survival age: survival_age^c_{i,k} = survival_age^c_{i,k} − 1/decay.
        if Condition 5 (3.17) is satisfied then
            Retrieve the data-cloud from temporary storage (buffer).
            Remove the survival age of the data-cloud.
        end
    end
    for every data-cloud in temporary storage (buffer) do
        if Condition 6 (3.18) is satisfied then
            Remove the data-cloud permanently.
        end
    end
    for every existing data-cloud do
        if Condition 7 (3.19) is satisfied then
            Merge the i-th and j-th data-clouds into the i-th data-cloud.
            Update the properties of the i-th data-cloud with 3.20.
        end
    end
    Create the AnYa-type data-cloud-based fuzzy rule using (2.5).
end

3.4 Decision Making Procedure

This subsection describes the procedure for validating a data sample's label using the
pre-selected data-clouds. The construction of the AnYa-type data-cloud-based fuzzy rule
has been described for both the offline and online training architectures. For an
arriving data sample x, each active fuzzy rule provides a firing strength by computing
the similarity between x and its pre-selected data-clouds, denoted γ^c(x)
(c = 1, 2, 3, ..., C) and determined by the following formula [27]:

γ^c(x) = max_{Cloud_centre ∈ {Cloud_centre^C}} e^{−d²(x, Cloud_centre)} (3.21)

Then, depending on the per-class firing strengths (one rule per class), the class label
of x is set in a straightforward way by a general decision-maker using the "winner takes
all" formula [15]:

class label = arg max_{c=1,2,...,C} γ^c(x) (3.22)
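The decision rule of equations 3.21-3.22 is a two-level maximum; a minimal sketch with an invented two-class rule base and squared Euclidean distance:

```python
import math

def classify(x, rule_base):
    """Eqs. 3.21-3.22: each class fires with the best similarity between x and
    its cloud centres; the label goes to the class with the largest strength."""
    def fire(centres):
        return max(math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)))
                   for c in centres)
    strengths = {label: fire(centres) for label, centres in rule_base.items()}
    return max(strengths, key=strengths.get), strengths

rule_base = {"c1": [(0.0, 0.0), (1.0, 0.0)],   # two clouds for class c1
             "c2": [(5.0, 5.0)]}               # one cloud for class c2
label, s = classify((0.9, 0.1), rule_base)
assert label == "c1"   # nearest cloud centre (1, 0) belongs to c1
```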

The next section, 3.5, analyses the computational complexity of both the offline and
online algorithms of BAFC.

3.5 Algorithm Complexity Analysis

This section analyses the computational complexity of the offline training procedure,
the computational complexity of the online training procedure, and the memory complexity
of BAFC.

3.5.1 Computational complexity analysis of offline training process

According to Algorithm 1, the offline procedure is performed on a static dataset, and
its computational complexity depends chiefly on the uniquely identified data samples.
The computational complexity of the entire offline training procedure is
O(c × (n + n(n−1) + m)), i.e. O(c × n²) (with n ≥ m always), where c is the number of
classes, n is the number of uniquely identified data samples in the static dataset, and
m is the number of data-clouds after offline merging. In this procedure, the ranking
process takes at most O(n × (n−1)) and offline data-cloud merging takes O(m) time.
Analysing the offline training algorithm, the data-cloud creation procedure starts with
the ranking of the uniquely identified data samples and then settles each cloud's
structure with its neighbouring or supporting data samples using 3.9. Finally, the
entire offline training procedure outputs meta-parameters per class and per existing
data-cloud: the data-cloud centre (Cloud_centre), the density of the data-cloud
(D(Cloud_centre)), the support of the data-cloud (S), and the average distance of the
data-cloud (Ad).

3.5.2 Computational complexity analysis of online training process

To compute the complexity of the online training procedure, assume that the arrived data
sample x_k belongs to class c. Although updating the meta-parameters of each data-cloud
increases the computational cost, the traits of the proposed algorithm ('one-pass'
operation and merging with deletion of data-clouds) minimize it. The computational
complexity of processing one arrived data sample x_k of class c is
O(n + n + m + m + n), i.e. O(n), where n is the number of data-clouds at the k-th
instance and m is the number of data-clouds stored in temporary memory (buffer). So the
overall complexity of the online training procedure is O(n × m), where n is now the
total number of data samples arriving in the stream and m is the number of active
data-clouds at the k-th time instance.

3.5.3 Memory complexity analysis

The memory required by the proposed BAFC includes the storage of the data-clouds, the
decay parameter, and the current data sample. If there are n data-clouds, each with a
d-dimensional Cloud_centre and m meta-parameters, then BAFC requires O(n × (m + d) + d)
memory. As the memory requirements for the meta-parameters and the decay parameter are
constant, the memory space complexity is O(n × d).

Chapter 4 presents numerical examples and analyses on benchmark datasets for performance
evaluation. All tabulated outcomes are average performances obtained after 10 runs on
randomized datasets, and the validation stage is used to classify the test dataset.
Chapter 4

Evaluations

This chapter presents the experimental results, conducted on numerous real-world and
synthetic datasets together with a well-known image dataset, to demonstrate the
effectiveness of the proposed BAFC algorithm. The large datasets help establish the
method's capability for time-series prediction. The performance of the proposed
algorithm is compared with a number of "state-of-the-art" classifiers; all experiments
are performed in MATLAB 2014a.

4.1 Dataset

To demonstrate the efficiency of the newly proposed data-cloud-based classifier,
well-known benchmark datasets are used, and the classifier is compared with
well-established classifiers, namely the Evolving classifier (eClass) [15], the Support
vector machine (SVM) classifier [23], the K-nearest neighbour (KNN) classifier [24], the
Decision tree (DT) classifier [51], the Self-organising map (SOM) classifier [47], the
DENFIS classifier [36], the SOF classifier [27] and the TEDAClass classifier [35]. The
experiments are conducted on the following well-established benchmark and numerical
datasets:

• Occupancy detection: A real-world multivariate time-series dataset on room occupancy
(binary) as a function of the room's environmental conditions [20]. The dataset contains
20,560 instances and 7 attributes for two classes: occupied (1) and non-occupied (0).

• Optical recognition: Contains 5,620 instances of 10 classes. The 64 features of this dataset are extracted from bitmaps of handwritten digits.


Figure 4.1: Data samples from the MNIST dataset.

• Multiple features: Consists of features of handwritten numerals (0-9) extracted from a collection of Dutch utility maps. 200 patterns per class (2,000 patterns in total) have been digitized from binary images, and every instance has 649 features.

• Letter recognition: This English capital-letter recognition dataset contains 20,000 instances with 16 features and 26 class labels.

• SEA: A synthetic dataset with concept drift, consisting of 50,000 instances with three features, of which only two are relevant [53]. The two-class decision boundary is given by the condition f1 + f2 ≤ θ, where f1 and f2 are the (relevant) first and second features and θ is a predefined threshold. To simulate sudden drift, four different concepts are applied by altering the value of θ after every 12,500 samples, with 10% class noise added.

• Electricity pricing: Another widely used dataset, collected from the Australian New South Wales Electricity Market [Street and Kim]. In this market, prices are not fixed but are affected by supply and demand, and are set every five minutes. The dataset contains 45,312 instances; the class label identifies the change of the price relative to a moving average over the last 24 hours.

• MNIST: A handwritten-numeral database for symbol-image recognition, known as the MNIST dataset [61]. It contains 60,000 gray-level training images and 10,000 gray-level validation images of the 10 handwritten digits (0-9). Each image is 28 pixels high and 28 pixels wide, 784 pixels in total. Fig. 4.1 shows example images from this set.
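The SEA generation scheme described in the list above can be sketched as follows (a minimal illustration; the θ values, their order, and the noise mechanism are assumptions, not the original SEA constants):

```python
import random

def sea_stream(n=50000, thetas=(8.0, 9.0, 7.0, 9.5), block=12500,
               noise=0.10, seed=1):
    # Three features in [0, 10]; only f1 and f2 are relevant.
    # The label is given by f1 + f2 <= theta; theta switches every
    # `block` samples (sudden drift) and 10% of labels are flipped.
    rng = random.Random(seed)
    for i in range(n):
        f1, f2, f3 = (rng.uniform(0, 10) for _ in range(3))
        theta = thetas[(i // block) % len(thetas)]
        label = int(f1 + f2 <= theta)
        if rng.random() < noise:
            label = 1 - label
        yield (f1, f2, f3), label
```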

4.2 Result

To evaluate the performance of the proposed BAFC, it is compared with numerous well-known classifiers. First, the proposed approach is evaluated on the six aforementioned benchmark datasets and compared with eight evolving, autonomous and offline methods. The results of the comparison are tabulated in Table 4.1 and show the high accuracy of the proposed method.

Figure 4.2: Highlighting data-clouds and data samples during offline training.

Figure 4.3: Highlighting data-clouds and data samples during evolving training.

Figure 4.4: Highlighting data-clouds and data samples during offline training.

Figure 4.5: Highlighting data-clouds and data samples during evolving training.

As the occupancy-detection and optical-recognition datasets contain both a training and a testing set, 15% of the training data is used for offline training and the rest for the online training stage. For the multiple-features, SEA, letter-recognition and electricity-pricing datasets, 15% of the overall data is used for the offline training stage, 50% for the online training stage, and the remaining 35% for the validation stage.
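A minimal sketch of this three-way split (assuming the data have already been shuffled):

```python
def split_stream(data, offline=0.15, online=0.50):
    # Returns (offline-training, online-training, validation) portions,
    # e.g. 15% / 50% / 35% of an already-shuffled dataset.
    n = len(data)
    a = int(n * offline)
    b = a + int(n * online)
    return data[:a], data[a:b], data[b:]
```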

Figs. 4.2, 4.3, 4.4 and 4.5 show the state of the data space before and after online training. Fig. 4.2 shows data-cloud centres marked by blue asterisks (∗) and data samples marked by red dots (·) after offline training on 1,000 randomized samples, while Fig. 4.3 shows the newly arrived data samples and the data-cloud centres after online training on 2,823 randomized samples of the optical-recognition data. Similarly, Fig. 4.4 shows the merged data-clouds (green dots) after offline training on 2,000 randomized samples, and in Fig. 4.5 the black dots indicate the merged data-cloud centres during online training on 3,000 randomized samples of the SEA dataset.

Furthermore, the minimization and deletion of large numbers of data-clouds, which is the focal point of the proposed algorithm, is examined on the occupancy-detection dataset. Fig. 4.6 shows the number of active data-clouds, data-clouds in the buffer (temporary

[Figure 4.6 plot: number of data-clouds (0-200) versus number of data samples (0-6000) for the occupancy-detection dataset, with curves for active data-clouds, data-clouds in the buffer, and dead data-clouds.]

Figure 4.6: Number of data-clouds during the evolving training phase for class 0.

[Figure 4.7 plot: per-sample training time (0-0.8 s) versus amount of data (%) on the MNIST dataset, with curves for BAFC, AutoClass1, eClass1, TEDAClass and SOF.]

Figure 4.7: Per-sample training time on the MNIST data.

storage) and deleted (dead) data-clouds during the online training stage of the occupancy-detection dataset for class 1. One can observe that the offline training stage passes roughly 150 data-clouds to the online training stage; after reading about 2,000 samples the number of data-clouds increases, and as newer samples arrive as a stream, data-clouds are merged or deleted from the buffer. After the online training process, roughly 130 data-clouds are passed to the validation stage.

Figure 4.7 compares the average per-sample training time on the MNIST dataset with two prototype-based and two data-cloud-based evolving algorithms, namely AutoClass1, eClass1, TEDAClass and SOF. As shown in Fig. 4.7, the training times of TEDAClass, eClass1 and AutoClass1 are considerably higher than those of BAFC and SOF. The computational complexity and the number of prototypes or data points in those classifiers explain this result; in contrast, the limited number of data-clouds yields an efficient training time for BAFC.

The memory efficiency on the MNIST data is depicted in Fig. 4.8. As shown in Section 3.5.3, the required memory of a data-cloud-based classification algorithm is directly related to the number of data-clouds, and the performance of BAFC is superior in this respect. The number of data-clouds created during the training stage is compared with the data-cloud-based classifiers SOF and TEDAClass, where SOF shows comparatively poor performance because it has no online data-cloud pruning or merging technique.

[Figure 4.8 plot: number of data-clouds per class (0-6000) versus amount of data (%) on the MNIST dataset, with curves for BAFC, SOF and TEDAClass.]

Figure 4.8: Memory use during training on the MNIST data.

The total training time for different values of the decay parameter is shown in Fig. 4.9. The relation between the decay value and the training time indicates that with a large decay value, data-clouds stored in the buffer wait for a long time; consequently, as described in Section 3.5, extra processing time is incurred for the data-clouds stored in the buffer.
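The effect can be illustrated with a toy pruning routine (the field names and data structure below are assumptions made for illustration, not BAFC's actual implementation):

```python
def prune_buffer(buffer, current_time, decay):
    # A cloud whose last activation is older than `decay` samples is
    # deleted; otherwise it stays in the buffer and must be rescanned
    # at every step, so a larger decay value means more clouds waiting
    # in the buffer and more processing time per sample.
    kept, dead = [], []
    for cloud in buffer:
        if current_time - cloud["last_active"] > decay:
            dead.append(cloud)
        else:
            kept.append(cloud)
    return kept, dead
```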

For the comparison with state-of-the-art classifiers, cosine dissimilarity with granularity level G = 12 is used for SOF; a linear kernel is used for the SVM classifier; k = 10 is used for the k-NN classifier; and the "winner-takes-all" principle with a net size of 9×9 is applied for the SOM classifier.
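For reference, cosine dissimilarity in its textbook form, 1 − cos(x, y), can be sketched as below (shown as a standard definition, not taken from the SOF implementation, which may use a variant):

```python
import math

def cosine_dissimilarity(x, y):
    # 1 minus the cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)
```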
In Table 4.1, Acc denotes the accuracy obtained by a method on a dataset and Time(s) denotes the training time required per data sample in seconds. The evaluated results show that the proposed BAFC method outperforms the other methods, obtaining 94.96% and 77.08% accuracy for the letter-recognition and electricity-pricing datasets, respectively. For the occupancy-detection dataset, k-NN shows remarkable performance with 96.64% accuracy, and

[Figure 4.9 plot: training time (0-6000 s) versus amount of data (%) on the MNIST dataset, with curves for decay values 500, 750, 1000, 1250 and 1500.]

Figure 4.9: Total training time for various decay parameter values.

for the optical-recognition dataset SOF performs better, with 98.39% accuracy. In addition, the offline algorithm SVM shows better performance than the other methods for both the multiple-features and SEA datasets, obtaining 96.71% and 82.65% accuracy respectively, with 65% of the data used for training and the remaining 35% for testing. In this experiment, eClass, a pioneering method for stream-data classification, gives lower accuracy than all the other methods, and the DENFIS classifier fails on the high-dimensional optical-recognition and multiple-features datasets [27]. The average accuracy over all datasets shows that BAFC, at 90.82%, can compete with acknowledged classifiers.

Table 4.2 shows the classification rates used to evaluate the accuracy of six classification methods on the MNIST dataset [61]. As can be seen, training sets of various sizes (5% to 100%) are used for both the online and offline methods. For the online training results, 10% of the data is trained offline, producing a comparable accuracy of 97.26%. Notably, on image data the proposed BAFC algorithm establishes significantly better results, achieving good performance even with only 15% of the data in the online training stage. The BAFC algorithm, like the offline SVM classifier, gives more stable results than SOF, eClass1, AutoClass1 and TEDAClass. The performance on image datasets could be further enhanced by using other image feature descriptors such as GIST [44].

Furthermore, because of the data-cloud-based structure of the BAFC classifier, one can obtain extremely transparent, human-understandable AnYa-type fuzzy rules, which is another advantage of BAFC. Table 4.3 shows illustrative examples of the fuzzy rules created for three benchmark problems; for better visual clarity, the images extracted from the AnYa-type fuzzy rules in the table are scaled. As the numerical examples above show, the proposed BAFC classifier is a powerful alternative to existing techniques, offering excellent accuracy, transparency, and fast evolving learning (online training stage).
Table 4.1: Accuracy (Acc, %) and per-sample training time (Time, s) for each algorithm and dataset.

Algorithm        Metric   Letter Rec.  Occupancy  Optical   Multiple  Electricity  SEA      Average
Proposed         Acc      94.96        96.03      97.90     96.61     77.08        82.38    90.82
                 Time(s)  0.37         3.32       2.91      1.066     1.30         1.05
SOF [27]         Acc      92.98        95.88      98.39     93.66     71.39        74.56    87.81
                 Time(s)  0.40         4.70       1.11      1.05      1.89         5.02
Eclass [15]      Acc      51.25        88.63      86.81     82.64     67.27        72.34    74.82
                 Time(s)  4.74         8.72       9.69      6.59      12.10        13.97
KNN [24]         Acc      91.80        96.64      97.66     91.51     72.20        73.03    87.14
                 Time(s)  1.05         1.11       1.08      1.02      1.05         1.09
TEDAClass [35]   Acc      51.54        96.34      91.20     86.37     72.63        76.22    79.05
                 Time(s)  2998.32      512.11     1817.19   18001.44  3342.11      1981.45
SVM [23]         Acc      85.33        95.77      96.27     96.71     71.11        82.65    87.97
                 Time(s)  32.34        199.23     6.94      22.71     35.14        179.23
DT [51]          Acc      82.43        93.14      85.25     92.44     68.18        69.91    81.89
                 Time(s)  1.19         2.18       1.21      1.60      2.09         2.10
DENFIS [36]      Acc      32.56        89.09      –         –         51.08        63.40    –
                 Time(s)  160.60       26.18      –         –         139.13       116.81
SOM [47]         Acc      54.49        94.71      95.03     74.91     64.05        80.56    77.29
                 Time(s)  23.81        19.34      22.90     44.29     1.94         5.54

(–: DENFIS produced no valid result on the optical-recognition and multiple-features datasets.)

Table 4.2: Performance comparison (accuracy, %) with different amounts of the MNIST dataset.

Training sample  Proposed Online  Proposed Offline  SOF [27]  SVM [23]  Autoclass1 [6]  Eclass1 [15]  TEDAclass [35]
5%               97.11            –                 96.66     97.03     96.51           96.37         96.63
10%              97.19            –                 96.87     97.21     96.81           96.55         97.01
15%              97.26            97.21             96.99     97.29     97.02           96.83         97.12
25%              97.42            97.29             97.12     97.41     97.13           96.91         97.26
35%              97.66            97.54             97.18     97.59     97.31           97.04         97.31
45%              97.73            97.66             97.23     97.63     97.39           97.31         97.33
55%              97.91            97.78             97.21     97.88     97.35           97.39         97.51
65%              97.96            97.89             97.39     97.95     97.38           97.41         97.63
75%              97.95            97.81             97.54     98.01     97.39           97.42         97.67
100%             98.13            98.01             97.70     98.15     97.44           97.41         97.61

Table 4.3: A part of Fuzzy-rule generated from evolving stage.

Dataset IF part of AnYa type Fuzzy rule THEN part


SEA x ∼ (8.231264, 6.158971, 0.652521) OR x ∼ (3.240307, 9.291315, 0.819171) OR .... OR x ∼ (8.131284, 5.072553, 3.631046) Class 1
x ∼ (0.263657, 1.5065, 7.252238) OR x ∼ (0.356543, 0.585792, 8.048625) OR .... OR x ∼ (1.659741, 5.539075, 0.534608) Class 2
Electricity Pricing x ∼ (0.148936, 0.0455798, 0.149194, 0.003467, 0.422915, 0.414912) OR x ∼ (0.127660, 0.045717, 0.151032, 0.003467, 0.422915, 0.414912) OR .. Class 1
.. OR x ∼ (0.042553, 0.053233, 0.242189, 0.003467, 0.422915, 0.414912) OR x ∼ (0.170213, 0.046265, 0.155432, 0.003467, 0.422915, 0.414912)
x ∼ (0.423292, 0.090428, 0.532360, 0.003467, 0.422915, 0.414912) OR x ∼ (0.189249, 0.072599, 0.209023, 0.003467, 0.422915, 0.414912) OR .. Class 2
.. OR x ∼ (0.382979, 0.084680, 0.539956, 0.003467, 0.422915, 0.414912) OR x ∼ (0.240300, 0.087543, 0.366233, 0.003467, 0.422915, 0.414912)

MNIST x ∼ [image] OR x ∼ [image] OR .... OR x ∼ [image] Class 1

x ∼ [image] OR x ∼ [image] OR .... OR x ∼ [image] Class 4

x ∼ [image] OR x ∼ [image] OR .... OR x ∼ [image] Class 7

x ∼ [image] OR x ∼ [image] OR .... OR x ∼ [image] Class 10

Chapter 5

Conclusion and Future Work

In this study, a new buffer-based adaptive fuzzy classifier (BAFC) is proposed on the basis of data-clouds and the AnYa-type FRB system within the Empirical Data Analytics computational framework. The proposed classifier is free from predefined parameters and prior assumptions about the data-generation model, and its highly transparent structure is composed of meaningful data-clouds. The classifier can recognize potential data-clouds from the offline-training data samples effectively and efficiently, and continues learning from the data as a stream by utilizing the newly introduced recursive density-calculation formulae. The memory used by the proposed algorithm is relatively greater than that of the offline and online prototype-based SOF, because temporarily irrelevant data-clouds are stored in the buffer (temporary storage), but it achieves considerably better results than the aforementioned methods. Furthermore, the proposed method can detect changes in the arriving stream data by utilizing the online data-cloud-merging concept: its evolving online training structure can update the properties of existing data-clouds and create new data-clouds to adapt to changes in the data stream. Experimental analysis on the benchmark datasets exhibits the high performance of the BAFC approach and shows its ability to manage different sorts of problems.

In future work, we will extend our classification model into a fully online and evolving method, considering the shortcomings identified in state-of-the-art research. We will also attempt to utilize BAFC in other image-classification fields, using various image-feature-descriptor methods, in further research.

Bibliography

[1] Angelov, P. (2014). Anomaly detection based on eccentricity analysis. In 2014 IEEE
symposium on evolving and autonomous learning systems (EALS), pages 1–8. IEEE.

[2] Angelov, P. (2015). Typicality distribution function—a new density-based data an-
alytics tool. In 2015 International Joint Conference on Neural Networks (IJCNN),
pages 1–8. IEEE.

[3] Angelov, P. and Gu, X. (2017). Mice: Multi-layer multi-model images classifier
ensemble. In 2017 3rd IEEE International Conference on Cybernetics (CYBCONF),
pages 1–8. IEEE.

[4] Angelov, P., Gu, X., and Kangin, D. (2017a). Empirical data analytics. International
Journal of Intelligent Systems, 32(12):1261–1284.

[5] Angelov, P., Gu, X., Kangin, D., and Principe, J. (2016). Empirical data analysis:
a new tool for data analytics. In 2016 IEEE International Conference on Systems,
Man, and Cybernetics (SMC), pages 000052–000059. IEEE.

[6] Angelov, P., Kangin, D., Zhou, X., and Kolev, D. (2014). Symbol recognition with a
new autonomously evolving classifier autoclass. In 2014 IEEE Conference on Evolving
and Adaptive Intelligent Systems (EAIS), pages 1–7. IEEE.

[7] Angelov, P. and Yager, R. (2011). Simplified fuzzy rule-based systems using non-
parametric antecedents and relative data density. In 2011 IEEE Workshop on Evolving
and Adaptive Intelligent Systems (EAIS), pages 62–69. IEEE.

[8] Angelov, P. and Yager, R. (2012). A new type of simplified fuzzy rule-based system.
International Journal of General Systems, 41(2):163–185.

[9] Angelov, P. and Zhou, X. (2006). Evolving fuzzy systems from data streams in
real-time. In 2006 International symposium on evolving fuzzy systems, pages 29–35.
IEEE.


[10] Angelov, P., Zhou, X., and Klawonn, F. (2007). Evolving fuzzy rule-based classi-
fiers. In 2007 IEEE Symposium on Computational Intelligence in Image and Signal
Processing, pages 220–225. IEEE.

[11] Angelov, P. P. (2013). Evolving rule-based models: a tool for design of flexible
adaptive systems, volume 92. Physica.

[12] Angelov, P. P. and Gu, X. (2018). Empirical fuzzy sets. International Journal of
Intelligent Systems, 33(2):362–395.

[13] Angelov, P. P., Gu, X., and Prı́ncipe, J. C. (2017b). Autonomous learning multi-
model systems from data streams. IEEE Transactions on Fuzzy Systems, 26(4):2213–
2224.

[14] Angelov, P. P., Gu, X., and Prı́ncipe, J. C. (2017c). A generalized methodology for
data analysis. IEEE transactions on cybernetics, 48(10):2981–2993.

[15] Angelov, P. P. and Zhou, X. (2008). Evolving fuzzy-rule-based classifiers from data streams. IEEE Transactions on Fuzzy Systems, 16(6):1462–1475.

[16] Bishop, C. M. et al. (1995). Neural networks for pattern recognition. Oxford uni-
versity press.

[17] Boots, B., Okabe, A., and Sugihara, K. (1999). Spatial tessellations. Geographical
information systems, 1:503–526.

[18] Borgi, M. A., Labate, D., El’Arbi, M., and Amar, C. B. (2014). Regularized shear-
let network for face recognition using single sample per person. In 2014 IEEE In-
ternational Conference on Acoustics, Speech and Signal Processing (ICASSP), pages
514–518. IEEE.

[19] Byun, H. and Lee, S.-W. (2002). Applications of support vector machines for pattern
recognition: A survey. In International Workshop on Support Vector Machines, pages
213–236. Springer.

[20] Candanedo, L. M. and Feldheim, V. (2016). Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models. Energy and Buildings, 112:28–39.

[21] Cheeseman, P. C., Self, M., Kelly, J., Taylor, W., Freeman, D., and Stutz, J. C.
(1988). Bayesian classification. In AAAI, volume 88, pages 607–611.

[22] Ciregan, D., Meier, U., and Schmidhuber, J. (2012). Multi-column deep neural
networks for image classification. In 2012 IEEE conference on computer vision and
pattern recognition, pages 3642–3649. IEEE.

[23] Cristianini, N., Shawe-Taylor, J., et al. (2000). An introduction to support vector
machines and other kernel-based learning methods. Cambridge university press.

[24] Cunningham, P. and Delany, S. J. (2020). k-nearest neighbour classifiers. arXiv preprint arXiv:2004.04523.

[25] Ge, D. and Zeng, X.-J. (2020). Learning data streams online—an evolving fuzzy sys-
tem approach with self-learning/adaptive thresholds. Information Sciences, 507:172–
184.

[26] Gibert, D., Mateu, C., Planes, J., and Vicens, R. (2019). Using convolutional neural
networks for classification of malware represented as images. Journal of Computer
Virology and Hacking Techniques, 15(1):15–28.

[27] Gu, X. and Angelov, P. P. (2018). Self-organising fuzzy logic classifier. Information
Sciences, 447:36–51.

[28] Gu, X., Angelov, P. P., Kangin, D., and Principe, J. C. (2017). A new type of
distance metric and its use for clustering. Evolving Systems, 8(3):167–177.

[29] Gu, X., Angelov, P. P., and Prı́ncipe, J. C. (2018). A method for autonomous data
partitioning. Information Sciences, 460:65–82.

[30] Hariri, R. H., Fredericks, E. M., and Bowers, K. M. (2019). Uncertainty in big data
analytics: survey, opportunities, and challenges. Journal of Big Data, 6(1):44.

[31] Hastie, T., Tibshirani, R., and Friedman, J. (2009). The elements of statistical
learning: data mining, inference, and prediction. Springer Science & Business Media.

[32] Iglesias, J. A., Angelov, P., Ledezma, A., and Sanchis, A. (2011). Creating evolving
user behavior profiles automatically. IEEE Transactions on Knowledge and Data
Engineering, 24(5):854–867.

[33] Islam, M. K., Ahmed, M. M., and Zamli, K. Z. (2019). A buffer-based online
clustering for evolving data stream. Information Sciences, 489:113–135.

[34] Jin, R. and Agrawal, G. (2003). Efficient decision tree construction on streaming
data. In Proceedings of the ninth ACM SIGKDD international conference on Knowl-
edge discovery and data mining, pages 571–576.

[35] Kangin, D., Angelov, P., and Iglesias, J. A. (2016). Autonomously evolving classifier
tedaclass. Information Sciences, 366:1–11.

[36] Kasabov, N. K. and Song, Q. (2002). Denfis: dynamic evolving neural-fuzzy in-
ference system and its application for time-series prediction. IEEE transactions on
Fuzzy Systems, 10(2):144–154.

[37] Kim, J., Kim, B., and Savarese, S. (2012). Comparing image classification methods: K-nearest-neighbor and support-vector-machines. In Proceedings of the 6th WSEAS International Conference on Computer Engineering and Applications, and Proceedings of the 2012 American Conference on Applied Mathematics, volume 1001, pages 48109–2122.

[38] Larrain, T., Bernhard, J. S., Mery, D., and Bowyer, K. W. (2017). Face recognition
using sparse fingerprint classification algorithm. IEEE Transactions on Information
Forensics and Security, 12(7):1646–1657.

[39] Lu, L., Di, L., and Ye, Y. (2014). A decision-tree classifier for extracting transparent
plastic-mulched landcover from landsat-5 tm images. IEEE Journal of Selected Topics
in Applied Earth Observations and Remote Sensing, 7(11):4548–4558.

[40] Lughofer, E. and Pratama, M. (2020). Online sequential ensembling of fuzzy sys-
tems. In 2020 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS),
pages 1–10. IEEE.

[41] Mamdani, E. H. and Assilian, S. (1975). An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7(1):1–13.

[42] McAfee, A., Brynjolfsson, E., Davenport, T. H., Patil, D., and Barton, D. (2012).
Big data: the management revolution. Harvard business review, 90(10):60–68.

[43] Noorbehbahani, F., Fanian, A., Mousavi, R., and Hasannejad, H. (2017). An incre-
mental intrusion detection system using a new semi-supervised stream classification
method. International Journal of Communication Systems, 30(4):e3002.

[44] Oliva, A. and Torralba, A. (2001). Modeling the shape of the scene: A holis-
tic representation of the spatial envelope. International journal of computer vision,
42(3):145–175.

[45] Pal, M. (2005). Random forest classifier for remote sensing classification. Interna-
tional journal of remote sensing, 26(1):217–222.

[46] Peterson, L. E. (2009). K-nearest neighbor. Scholarpedia, 4(2):1883.

[47] Ploński, P. and Zaremba, K. (2012). Self-organising maps for classification with
metropolis-hastings algorithm for supervision. In International Conference on Neural
Information Processing, pages 149–156. Springer.

[48] Pratama, M., Anavatti, S. G., Angelov, P. P., and Lughofer, E. (2013). Panfis:
A novel incremental learning machine. IEEE Transactions on Neural Networks and
Learning Systems, 25(1):55–68.

[49] Rahman Fahim, S., Sarker, S. K., Muyeen, S., Sheikh, M., Islam, R., and Das,
S. K. (2020). Microgrid fault detection and classification: Machine learning based
approach, comparison, and reviews. Energies, 13(13):3460.

[50] Rong, H.-J., Sundararajan, N., Huang, G.-B., and Saratchandran, P. (2006). Se-
quential adaptive fuzzy inference system (safis) for nonlinear system identification and
prediction. Fuzzy sets and systems, 157(9):1260–1275.

[51] Safavian, S. R. and Landgrebe, D. (1991). A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, 21(3):660–674.

[Street and Kim] Street, W. N. and Kim, Y. https://moa.cms.waikato.ac.nz/datasets/.

[53] Street, W. N. and Kim, Y. (2001). A streaming ensemble algorithm (sea) for
large-scale classification. In Proceedings of the seventh ACM SIGKDD international
conference on Knowledge discovery and data mining, pages 377–382.

[54] Su, N., Liu, J., Yan, C., Liu, T., and An, X. (2017). An arbitrary shape clustering
algorithm over variable density data streams. Journal of Algorithms & Computational
Technology, 11(1):93–99.

[55] Subudhi, A., Dash, M., and Sabut, S. (2020). Automated segmentation and classi-
fication of brain stroke using expectation-maximization and random forest classifier.
Biocybernetics and Biomedical Engineering, 40(1):277–289.

[56] Suykens, J. A. and Vandewalle, J. (1999). Least squares support vector machine
classifiers. Neural processing letters, 9(3):293–300.

[57] Takagi, T. and Sugeno, M. (1985). Fuzzy identification of systems and its applica-
tions to modeling and control. IEEE transactions on systems, man, and cybernetics,
(1):116–132.

[58] Wang, M. and Chen, H. (2020). Chaotic multi-swarm whale optimizer boosted
support vector machine for medical diagnosis. Applied Soft Computing, 88:105946.

[59] Weng, Q., Mao, Z., Lin, J., and Guo, W. (2017). Land-use classification via extreme
learning classifier based on deep convolutional features. IEEE Geoscience and Remote
Sensing Letters, 14(5):704–708.

[60] Xia, G.-S., Hu, J., Hu, F., Shi, B., Bai, X., Zhong, Y., Zhang, L., and Lu, X. (2017).
Aid: A benchmark data set for performance evaluation of aerial scene classification.
IEEE Transactions on Geoscience and Remote Sensing, 55(7):3965–3981.

[61] LeCun, Y. and Cortes, C. (2010). MNIST handwritten digit database. Available at: http://yann.lecun.com/exdb/mnist/.

[62] Zhang, W. and Ntoutsi, E. (2019). Faht: an adaptive fairness-aware decision tree
classifier. arXiv preprint arXiv:1907.07237.
